# Loops with unknown nesting depth

We often encounter nested loops, like

for (i = 0; i < n; i++) {
for (j = 0; j < m; j++) {
doSomething(i, j);
}
}


This can be nested to a few more levels without too much pain, as long as we observe that the number of iterations for each level need to be multiplied to get the number of iterations for the whole thing and that total numbers of iterations beyond a few billions (, German: Milliarden, Russian Миллиарди) become unreasonable no matter how fast the doSomethings(...) is. Just looking at this example program

public class Modular {
public static void main(String[] args) {
long n = Long.parseLong(args[0]);
long t = System.currentTimeMillis();
long m = Long.parseLong(args[1]);
long prod = 1;
long sum  = 0;
for (long i = 0; i < n; i++) {
sum += i;
sum %= m;
prod *= (i*i+1);
prod %= m;
}
System.out.println("--> sum=" + sum + " prod=" + prod + " dt=" + (System.currentTimeMillis() - t));
}
}


which measures it net run time and runs 0 msec for 1000 iterations and almost three minutes for 10 billions ():

> java Modular 10000 1001 # 10'000
--> sum=55 prod=520 dt=1
> java Modular 100000 1001 # 100'000
--> sum=45 prod=299 dt=4
> java Modular 1000000 1001 # 1'000'000
--> sum=0 prod=806 dt=20
> java Modular 10000000 1001 # 10'000'000
--> sum=45 prod=299 dt=161
> java Modular 100000000 1001 # 100'000'000
--> sum=946 prod=-364 dt=1570
> java Modular 1000000000 1001 # 1'000'000'000
--> sum=1 prod=0 dt=15635
> java Modular 10000000000 1001 # 10'000'000'000
--> sum=55 prod=0 dt=160365


As soon as we do I/O, network access, database access or simply a bit more serious calculation, this becomes of course easily unbearably slow. But today it is cool to deal with big data and to at least call what we are doing big data, even though conventional processing on a laptop can do it in a few seconds or minutes... And there are of course ways to process way more iterations than this, but it becomes worth thinking about the system architecture, the hardware, parallel processing and of course algorithms and software stacks. But here we are in the "normal world", which can be a "normal subuniverse" of something really big, so running on one CPU and using a normal language like Perl, Java, Ruby, Scala, Clojure, F# or C.

Now sometimes we encounter situations where we want to nest loops, but the depth is unknown, something like

for ( = 0;  < ; ++) {
for ( = 0;  < ; ++) {

for ( = 0;  < ; ++) \{\cr
dosomething();
}

}
}


Now our friends from the functional world help us to understand what a loop is, because in some of these more functional languages the classical C-Style loop is either missing or at least not recommended as the everyday tool. Instead we view the set of values we iterate about as a collection and iterate through every element of the collection. This can be a bad thing, because instantiating such big collections can be a show stopper, but we don't. Out of the many features of collections we just pick the iterability, which can very well be accomplished by lazy collections. In Java we have the Iterable, Iterator, Spliterator and the Stream interfaces to express such potentially lazy collections that are just used for iterating.

So we could think of a library that provides us with support for ordinary loops, so we could write something like this:

Iterable range = new LoopRangeExcludeUpper<>(0, n);
for (Integer i : range) {
doSomething(i);
}


or even better, if we assume 0 as a lower limit is the default anyway:

Iterable range = new LoopRangeExcludeUpper<>(n);
for (Integer i : range) {
doSomething(i);
}


with the ugliness of boxing and unboxing in terms of runtime overhead, memory overhead, and additional complexity for development. In Scala, Ruby or Clojure the equivalent solution would be elegant and useful and the way to go...
I would assume, that a library who does something like LoopRangeExcludeUpper in the code example should easily be available for Java, maybe even in the standard library, or in some common public maven repository...

Now the issue of loops with unknown nesting depth can easily be addressed by writing or downloading a class like NestedLoopRange, which might have a constructor of the form NestedLoopRange(int ... ni) or NestedLoopRange(List li) or something with collections that are more efficient with primitives, for example from Apache Commons. Consider using long instead of int, which will break some compatibility with Java-collections. This should not hurt too much here and it is a good thing to reconsider the 31-bit size field of Java collections as an obstacle for future development and to address how collections can grow larger than elements, but that is just a side issue here. We broke this limit with the example iterating over 10'000'000'000 values for i already and it took only a few minutes. Of course it was just an abstract way of dealing with a lazy collection without the Java interfaces involved.

So, the code could just look like this:

Iterable range = new NestedLoopRange();
for (Tuple t : range) {
doSomething(t);
}


Btw, it is not too hard to write it in the classical way either:

        long[] n = new long[] {  };
int m1 = n.length;
int m  = m1-1; // just to have the math- matched...
long[] t = new long[m1];
for (int j = 0; j < m1; j++) {
t[j] = 0L;
}
boolean done = false;
for (int j = 0; j < m1; j++) {
if (n[j] <= 0) {
done = true;
break;
}
}
while (! done) {
doSomething(t);
done = true;
for (int j = 0; j < m1; j++) {
t[j]++;
if (t[j] < n[j]) {
done = false;
break;
}
t[j] = 0;
}
}


I have written this kind of loop several times in my life in different languages. The first time was on C64-basic when I was still in school and the last one was written in Java and shaped into a library, where appropriate collection interfaces were implemented, which remained in the project or the organization, where it had been done, but it could easily be written again, maybe in Scala, Clojure or Ruby, if it is not already there. It might even be interesting to explore, how to write it in C in a way that can be used as easily as such a library in Java or Scala. If there is interest, please let me know in the comments section, I might come back to this issue in the future...

# Clojure-Art

It is an interesting idea to generate colorful images using or music. In both areas Clojure seems to be quite attractive. Not having explored the music side, I did find the idea of creating images fun and inspiring. It also shows us something about the functions we are working with, if we learn to read the images right, but that will come or not, depending on the circumstances. It is useful not to be too scared of some mathematics when reading this.

Now the challenge is to create an image on a two dimensional array of points, for example 1000×1000 pixel, with x- and y-coordinates ranging from 0 to 999. Each pixel needs to be colored. While it is very interesting to explore different color models, we can for simplicity assume that we need 3 numbers each ranging from 0 to 255 for the red, green and blue channels. This is how most displays work, more or less. Now the goal is to create something that looks good. And of course is reasonable to program, otherwise we could just color one million points individually using for example GIMP, but a million is a lot.

Now we can apply any function on x and y and play around with functions like exp, log, sqrt, sin, cos, tan, sec, csc, sinh, … and of course the basic operations +, -, * and /. It turns out that in most cases we do not get interesting images, but experience will show what is promising to explore. I tried to create pictures by keeping the three channels fairly independent, but this did not work so well. It seems that it is better to keep some connection. One approach that actually works quite well is to consider the pair as a complex number and to apply just one complex function on it, again exp, log, sqrt, sin, …. are good building blocks. Now these complex functions have a tendency to grow to infinity somewhere. While real functions can avoid this issue by constraining themselves just to one strait line on the plane, complex functions almost have to go to infinity somewhere. By making the square small enough or by changing the scale we can avoid this, but it imposes quite severe constraints. The Riemann Sphere allows us to map any complex number to a point on the surface of a sphere. With some scaling we can already get to RGB-space and get coordinates that are using, but not exceeding the desired range. There are more ways to visualize complex numbers, but this is a possibility worth exploring.

Another way is to just use functions that calculate a real number and to apply a to it. With some shifting and scaling the values will be between 0 and 255 only and there are nor abrupt changes in color, unless the function we calculated is very steep or very chaotic. Using phase shifts by and the three color channels can be served and we get nice rainbow-waves like the following:

Clojure Art: angle + log(r)

Another experiment was to just assume the HSV-model and to calculate the colors from assuming the function is the H-part. But this ended up looking like plastic and I did not like it too much.

An important issue to observe is that functions may end up in exceptions. I wrapped the functions, so that they do not stop the calculation of the image half way through, but instead provide default values in cases where an exceptions occured.

It can also be fun to explore bitwise-functions like bitxor or even functions like the p-adic exponential function, which yields totally different kind of images.

I have put some of the code from my experiments into Github and licensed it with the GPL, so you can use it as a starting point. Others have worked with this as well, for example Clojure Art on Tumblr, Clojure Art Collective on github, another „clojure art“ on github or creative computing with clojure on O’Reilly’s blog.

Enjoy it and learn some Clojure. I sometimes use this when teaching Clojure.

# DB Persistence without UPDATE and DELETE

When exploring the usage of databases for persistence, the easiest case is a database that does only SELECT. We can cache as much as we like and it is more or less the functional immutable world brought to the database. For working on fixed data and analyzing data this can sometimes be useful.

Usually our data actually changes in some way. It has been discussed in this Blog already, that it would be possible to extend the idea of immutability to the database, which would be achieved by allowing only INSERT and SELECT. Since data can correlate, an INSERT in a table that is understood as a sub-entity via a one-to-many-relationship by the application actually is mutating the containing entity. So it is necessary to look at this in terms of the actual OR-mapping of all applications that are running on that DB schema.

Life can be simple, if we actually have self contained data as with MongoDB or by having a JSON-column in PostgreSQL, for example. Then inter-table-relations are eliminated, but of course it is not even following the first normal form. This can be OK or not, but at least there are good reasons why best practices have been introduced in the relational DB world and we should be careful about that. Another approach is to avoid the concept of sub entities and only work with IDs that are foreign keys. We can query them explicitly when needed.

An interesting approach is to have two ID-columns. One is an id, that is unique in the DB-table and increasing for newly created data. One is the entity-ID. This is shared between several records referring to different generations of the same object. New of them are generated each time we change something and persist the changes and in a simple approach we just consider the newest record with that entity-ID valid. It can of course be enhanced with validFrom and validTo. Then each access to the database also includes a timestamp, usually close to current time, but kept constant across a transaction. Only records for which validFrom <= timestamp < validTo are considered, and within these the newest. The validFrom and validTo can form disjoint intervals, but it is up to the application logic if that is needed or not. It is also possible to select the entry with the highest ID among the records with a given entityID and timestamp-validTo/From-condition. Deleting records can be simulated by this as well, by allowing a way to express a "deleted" record, which means that in case we find this deleted record by our rules, we pretend not having found anything at all. But still referential integrity is possible, because the pre-deletion-data are still there. This concept of having two IDs has been inspired by a talk on that I saw during Clojure Exchange 2017: Immutable back to front.

# Lazy Collections, Strings or Numbers

The idea is, that we have data that is obtained or calculated to give us on demand as much of it as we request. But it is not necessarily initially present. This concept is quite common in the functional world, where we in a way hide the deprecated concept of state in such structures, by the way in a way that lets use retain the benefits that led to the desire for statelessness.

Actually the concept is quite old. We have it for I/O in Unix and hence in Linux since the 1970ies. „Everything is a file“, at least as long as we constrain ourselves to a universal subset of possible file operations. It can be keyboard input, a named or anonymous pipe, an actual file, a TCP-connection, to name the most important cases. These are „lazy“ files, behave more or less like files as far as sequential reading is concerned, but not for random access reading. The I/O-concept has been done in such a way that it takes the case into account that we want to read bytes, but get only bytes. This can happen with files when we reach their end, but then we can obtain an indication that we reached the end of the file, while it is perfectly possible that we read less then we want in one access, but eventually get bytes including subsequent reads. Since the API has been done right, but by no means ideal, it generalizes well to the different cases that exist in current OS environments.

We could consider a File as an array of bytes. There is actually a way to access it in this way by memory-mapping it, but this assumes a physically present file. Now we could assume that we think of the array as a list that is optimized for sequential access and iterating, but not for random access. Both list types actually exist in languages like Java. Actually the random access structure can be made lazy as well, within certain constraints. If the source is actually sequential, we can just assume that the data is obtained up to the point where we actually read. The information about the total length of the stream may or may not be available, it is always available somehow in the case of structures that are completely available in memory. This random access on lazy collections works fine if the reason of laziness is to actually save us from doing expensive operations to obtain data that we do not actually need or to obtain them in parallel to the computation that processes the data. But we loose another potential drawback in this case. If the data is truly sequential, we can actually process data that is way beyond our memory capacity.

So the concept transfers easily from I/O-streams to lists and even arrays, most naturally to iterables that can be iterated only once. But we can easily imagine that this also applies to Strings, which can be seen a sequence of characters. If we do not constrain us to what a String is in C or Java or Ruby, but consider String to be a more abstract concept, again possibly dropping the idea of knowing the length or having a finite length. Just think of the output of the Unix command „yes“ or „cat /dev/zero“, which is infinite, in a theoretical way, but the computer won’t last forever in real life, of course. And we always interrupt the output at some time, usually be having the consumer shut down the connection.

Even numbers can be infinite. For real numbers this can happen only after the decimal point, for p-adic numbers it happens only before the decimal point, if you like to look into that. Since we rarely program with p-adic numbers this is more or less an edge case that is not part of our daily work, unless we actually do math research. But we could have integers with so many digits that we actually obtain and process them sequentially.

Reactive programming, which is promoted by lightbend in the Reactive Manifesto relies heavily on lazy structures, in this case data streams. An important concept is the so called „backpressure“, that allows the consumer to slow down the producer, if it cannot read the data fast enough.

Back to the collections, we can observe different approaches. Java 8 has introduced streams as lazy collections and we need to transform collections into streams and after the operation a stream back into a collection, at least in many real life situations. But putting all into one structure has some drawbacks as well. But looking at it from an abstract point of view this does not matter. The java8-streams to not implement a collection interface, but they are lazy collections from a more abstract point of view.

It is interesting that this allows us to relatively easily write nested loops where the depth of the nesting is a parameter that is not known at compile time. We just need a lazy collections of -tuples, where is the actual depth of the nesting and the contents are according to what the loops should iterate through. In this case we might or might not know the size of the collection, possibly not fitting into a 32-bit-integer. We might be able to produce a random member of the collection. And for sure we can iterate through it and stop the iteration wherever it is, once the desired calculation has been completed.

# Clojure

Functional programming languages have become a bit of a hype.

But the ideas are not really so new.
The first languages beyond Assembly language that have maintained some relevance up to today were FORTRAN, COBOL and Lisp. Indirectly also Algol, because it inspired pretty much any modern mainstream programming language in some way through some intermediate ancestors. The early Algol Variants itself have disappeared .

It can be argued if the early Lisp Dialects were really functional languages, but they did support some functional patterns and made them popular at least in certain communities. I would say that popular scripting languages like the Ruby programming language, the Perl programming language, the Lua programming language and especially JavaScript brought them to the main stream.

Lisp has always remained in its niche. But the question arose on creating a new Lisp that follows more strictly the functional paradigm and is somewhat more modern, cleaner and simpler than the traditional Lisps. It was done and it is called Clojure.

So anybody who has never used any Lisp will at first be lost, because it is a jungle of parentheses „((((())))()()()(…)“ with some minor stuff in between…
Actually that is an issue, when we move from today’s common languages to Clojure. But it is not that bad. The infix-notation is familiar to us, but it has its benefits to use one simple syntax for almost everything.

An expression that consists of a function call is written like this (function-name param1 param2 parm3...). +, -, *,…. are just functions like anything else, so if we want to write we just write .

In the early days of calculators it was easier to build something that works with a notion called „RPN“, so there we would write 3 ENTER 4 * 5 ENTER 6 * +, which is similar to the Lisp way, but just the other way round.

It is easy to add a different number of values:
* (+) -> 0
* (+ 7) -> 7
* (+ 1 2 3 4 5 6 7) -> 28

In Clojure functions are just normal values like numbers, arrays, lists,… that can be passed around.. It is good programming practice to rely on this where it helps. And with more experience it will be helpful more often.

Immutability is king. Most of the default structures of Clojure are immutable. They can be passed around without the fear that they might change once they have been constructed. This helps for multithreading.

Clojure provides lists, arrays, sets, hashmaps, and the sorted variants of the latter. These can be written easily:
* List: (list 1 2 3) -> (1 2 3) (entries are evaluated in this case)
* List: '(1 2 3) -> (1 2 3) (entries are not evaluated in this case)
* Array: [1 2 3] (entries are evaluated in this case)
* Set: #{1 2 3} (entries are evaluated in this case)
* Map: {1 2, 3 4} (entries are evaluated in this case. key-value-Pairs are usually grouped with a comma, which is whitespace for Clojure)

All of these are immutable. So methods that change collections, always create a copy that contains the changes. The copy can be done lazily or share data with the original.

Actually I can teach Clojure in course of two to five days duration depending on the experience of the participants and the goals they want to achieve.

There is much more to write about Clojure…