Loops with unknown nesting depth

We often encounter nested loops, like

for (i = 0; i < n; i++) {
    for (j = 0; j < m; j++) {
        doSomething(i, j);
    }
}

This can be nested to a few more levels without too much pain, as long as we observe that the number of iterations for each level need to be multiplied to get the number of iterations for the whole thing and that total numbers of iterations beyond a few billions (10^9, German: Milliarden, Russian Миллиарди) become unreasonable no matter how fast the doSomethings(...) is. Just looking at this example program

public class Modular {
    public static void main(String[] args) {
        long n = Long.parseLong(args[0]);
        long t = System.currentTimeMillis();
        long m = Long.parseLong(args[1]);
        long prod = 1;
        long sum  = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
            sum %= m;
            prod *= (i*i+1);
            prod %= m;
        }
        System.out.println("--> sum=" + sum + " prod=" + prod + " dt=" + (System.currentTimeMillis() - t));
    }
}

which measures it net run time and runs 0 msec for 1000 iterations and almost three minutes for 10 billions (10^{10}):

> java Modular 10000 1001 # 10'000
--> sum=55 prod=520 dt=1
> java Modular 100000 1001 # 100'000
--> sum=45 prod=299 dt=4
> java Modular 1000000 1001 # 1'000'000
--> sum=0 prod=806 dt=20
> java Modular 10000000 1001 # 10'000'000
--> sum=45 prod=299 dt=161
> java Modular 100000000 1001 # 100'000'000
--> sum=946 prod=-364 dt=1570
> java Modular 1000000000 1001 # 1'000'000'000
--> sum=1 prod=0 dt=15635
> java Modular 10000000000 1001 # 10'000'000'000
--> sum=55 prod=0 dt=160365

As soon as we do I/O, network access, database access or simply a bit more serious calculation, this becomes of course easily unbearably slow. But today it is cool to deal with big data and to at least call what we are doing big data, even though conventional processing on a laptop can do it in a few seconds or minutes... And there are of course ways to process way more iterations than this, but it becomes worth thinking about the system architecture, the hardware, parallel processing and of course algorithms and software stacks. But here we are in the "normal world", which can be a "normal subuniverse" of something really big, so running on one CPU and using a normal language like Perl, Java, Ruby, Scala, Clojure, F# or C.

Now sometimes we encounter situations where we want to nest loops, but the depth is unknown, something like

for (i_0 = 0; i_0 < n_0; i_0++) {
  for (i_1 = 0; i_1 < n_1; i_1++) {
    \cdots
      for (i_m = 0; i_m < n_m; i_m++) \{\cr
        dosomething(i_0, i_1,\ldots, i_m);
      }
    \cdots
  }
}

Now our friends from the functional world help us to understand what a loop is, because in some of these more functional languages the classical C-Style loop is either missing or at least not recommended as the everyday tool. Instead we view the set of values we iterate about as a collection and iterate through every element of the collection. This can be a bad thing, because instantiating such big collections can be a show stopper, but we don't. Out of the many features of collections we just pick the iterability, which can very well be accomplished by lazy collections. In Java we have the Iterable, Iterator, Spliterator and the Stream interfaces to express such potentially lazy collections that are just used for iterating.

So we could think of a library that provides us with support for ordinary loops, so we could write something like this:

Iterable range = new LoopRangeExcludeUpper<>(0, n);
for (Integer i : range) {
    doSomething(i);
}

or even better, if we assume 0 as a lower limit is the default anyway:

Iterable range = new LoopRangeExcludeUpper<>(n);
for (Integer i : range) {
    doSomething(i);
}

with the ugliness of boxing and unboxing in terms of runtime overhead, memory overhead, and additional complexity for development. In Scala, Ruby or Clojure the equivalent solution would be elegant and useful and the way to go...
I would assume, that a library who does something like LoopRangeExcludeUpper in the code example should easily be available for Java, maybe even in the standard library, or in some common public maven repository...

Now the issue of loops with unknown nesting depth can easily be addressed by writing or downloading a class like NestedLoopRange, which might have a constructor of the form NestedLoopRange(int ... ni) or NestedLoopRange(List li) or something with collections that are more efficient with primitives, for example from Apache Commons. Consider using long instead of int, which will break some compatibility with Java-collections. This should not hurt too much here and it is a good thing to reconsider the 31-bit size field of Java collections as an obstacle for future development and to address how collections can grow larger than 2^{31}-1 elements, but that is just a side issue here. We broke this limit with the example iterating over 10'000'000'000 values for i already and it took only a few minutes. Of course it was just an abstract way of dealing with a lazy collection without the Java interfaces involved.

So, the code could just look like this:

Iterable range = new NestedLoopRange(n_0, n_1, \ldots, n_m);
for (Tuple t : range) {
    doSomething(t);
}

Btw, it is not too hard to write it in the classical way either:

        long[] n = new long[] { n_0, n_1, \ldots, n_m };
        int m1 = n.length;
        int m  = m1-1; // just to have the math-m matched...
        long[] t = new long[m1];
        for (int j = 0; j < m1; j++) {
            t[j] = 0L;
        }
        boolean done = false;
        for (int j = 0; j < m1; j++) {
            if (n[j] <= 0) {
                done = true;
                break;
            }
        }
        while (! done) {
            doSomething(t);
            done = true;
            for (int j = 0; j < m1; j++) {
                t[j]++;
                if (t[j] < n[j]) {
                    done = false;
                    break;
                }
                t[j] = 0;
            }
        }

I have written this kind of loop several times in my life in different languages. The first time was on C64-basic when I was still in school and the last one was written in Java and shaped into a library, where appropriate collection interfaces were implemented, which remained in the project or the organization, where it had been done, but it could easily be written again, maybe in Scala, Clojure or Ruby, if it is not already there. It might even be interesting to explore, how to write it in C in a way that can be used as easily as such a library in Java or Scala. If there is interest, please let me know in the comments section, I might come back to this issue in the future...

Links

Share Button

Not all projects are on ideal paths

It is nice to write about positive things, how things have been done well, how they should be done well and how good we are and how good we could be if we were just applying the right technology and methodology and process and management…

Tom Rocket

Let’s leave that for today and write about some projects that did not go too well. I picked an example that is more than ten years ago and I am not going to mention any names. Maybe someone who knew this projects as well will recognize something, maybe not. There would be more than one article to write about something like this, and I guess almost anybody who has been around in IT for a couple of years would agree. If not, enjoy being so lucky. 🙂 I change all the names, of course. Otherwise, I hope it is enjoyable to read about others who did funny stuff and to find some anti-patterns between the lines.

So project dinosaur was taking place in the same rooms where I was working. I had some low volume consulting task for this team, but was mostly absorbed by totally different projects.

Tom Rocket is the project manager. There are some more project managers who represent the customers side, but Tom Rocket is running the show and is present in the room. It is easy to recognize that he is the boss, because he calls his team members like the way somebody treats his doggy, who happens for some unintelligible reasons to have a dog, even though he does not particularly like dogs. The team is very engaged, they work hard. Not every day very motivated, but who can understand that. Of course some parts of the development is done by another team of the same company, at another location. The managers of both locations are not really best friends. Donald Peak, the manager of this location, does not care too much, as long as he can sit in his office, smoke cigars, enjoy his newspaper, do some strategy work and manage his car collection. Having been a communist in his younger days, he still knows very well how to be a capitalist. He has his people to deal with the operational issues. They are the bosses of Tom Rocket. And they do not like the other office, where they seem to do interesting and good work. Anyway, to put the pieces together, a complicated server and firewall infrastructure is needed. The other team can dump its code in some neutral zone (DMZ) from where it is fetched and integrated by Tom’s team. And the really cool stuff is done by Tom’s team.

It was a bit before web applications where the thing, at least in a time when people like Tom Rocket did not know and did not care about web applications. The client was developed in MS-Access-Basic. This resulted in interesting opportunities concerning the data. It was possible to store some part of the data in the client database and some part of the data in the server database.

The project was really important. A big company in the country where this happened did an advertising campaign for their end users concerning the services provided by the product developed by Tom Rocket. It was seen a lot in cinemas, in news papers, on the streets. Everybody knew about it. It was coming, it was there already.

We know, there are some organizations that use IT and that actually perform backups. Tom Rocket and the company where he was working did not belong to this group. Backups were not yet introduced, but talking about the possibility of having backups had already begun.

Some IT development projects use source code management. We are talking about the times of source safe and RCS. CVS was already on the horizon, but not yet widely adopted. But Tom’s project did not employ source code management. Whenever a bug in production occurred, they had to get their current development version ready for production, fix the bug there and try to get it installed. Certain end of month operations were causing big pressure on the team each month, because they did not really work fully automatically. The data quality was kind of poor and it was hard for developers to find anybody responsible for this and Tom Rocket was not at all helpful. So they tried to find workarounds that allowed the software to kind of work with such poor data quality, but this was very hard and not really a success story. The introduction of source code management was on the radar, but since the whole team always worked overtime to get the most urgent things done, this never became a serious priority.

A new developer was added to the team. They tried to copy the sources to his home directory, but that did not work. So the solution was, that the senior developer had to give his login and password to the new colleague in order to let him participate in the development.

There was a happy end to all of this. Most if not all people in Tom’s team left the company. The project was stopped, even though it was a bit embarrassing after all the advertisements. And the mother company bought another company in the same country and somehow merged these too operations.

Lessons

We can look at this story from different angles. As a distant spectator it is kind of funny, maybe even as a spectator working on something else in the same room, if you have that kind of humor.

As a team member: It might be hard to enjoy working in such an environment and it would not get better by itself. So options are to see if something can be done about Tom becoming a better team leader or being replaced by a better team leader. Or as most of the guys did, just find a better job.

As a project leader in a project that is so deep in the mud? Treat the people working for it even worse, so they will be encouraged to work harder and solve the problems, that can’t be solved at their level? I guess, it is important to admit the situation and either get enough air to put the project on a good track or even risk that it is abandoned. In the end that happened anyway.

As a manager of the company or the office where this took place? I think beyond all cigars, car collections, stock options and strategy it is important to know what is going on in the company, how people are working, how people are feeling, how people are treating each other and how chances are to get to a success that will encourage customers to do another project in the same way. Just ignoring this or demanding „just a bit more“, because the project is in danger and it is no time for addressing any long term issues did not work out. The teams are the real value of an IT company, not the hardware, the furniture and the rooms. Burning the real values like coal, as the word „resources“ might suggest, might not be the recipe for long term success. I would mention in favor of Donald Peak, that I have good reasons to assume that he was a decent and probably good manager in earlier times, but the time of Tom’s project was probably not one the best years of his carreer.

As a customer? I think it is important to have an idea about the team, how they are working and what kind of people they are. In this case it was most likely the right decision to stop the whole thing.

As far as I know there was a happy end, at least for the team members and for the mother company.

Share Button

Microsoft is buying Github

It seems that Microsoft is buying Github for about 7.5 billion USD worth in their own stock.

Is this a good thing or a bad thing? Probably no reason to celebrate, but Microsoft under Nadella seems to run a totally different strategy than Microsoft under Steve Ballmer. Selling licenses of MS-Windows and MS-Office is probably still a good business, if done efficiently, but it cannot be the future of a company of the size of Microsoft. The dominating operating system is now Linux, mostly due to cell phones and tablets running Android, network devices and of course servers. The religious rejection of open source, as it was propagated by Steve Ballmer, is no longer visible. Some MS-Projects like C# and F# have in large parts or completely become open source and on the MS-Azure-platform virtual servers with Linux and PostgreSQL can be chosen, with the limitation that there seems to be a need to have an MS-Windows-box with dedicated software to manage the Azure cloud services. If we ask Richard Stallman this is probably all tactics to achieve bad goals by harming the free software movement. This can be true or not.

But we no longer have to automatically assume that this acquisition is bad. Most likely Github will continue to operate as it did before. Nadella will not shut down Github, while Ballmer might have done so, maybe. He would not have bought Github anyway for so much money. The real asset of Github as well as of LinkedIn and Skype for Microsoft might be the large collection of high quality identity data. Today Google and Facebook have identity data for about two thirds of the adult world population by my estimation. We see it in action when we find a „login with Google“ or „login with Facebook“ option for a web site. Yes, „login with Skype“, „login with Github“ and „login with LinkedIn“ would also be possible or even exist. This is the spot were these acquisitions might or might not be seriously negative. But increasing the pool of identity data makes a lot of sense for the new strategy of the company and I would assume that it was one of the major reasons for this acquisition of Github as well as Skype and LinkedIn before. Is it worth so much money? If the answer is quite obvious, we will probably know in the future.

Links

Links are in English or in German…

Share Button