Observer Effect

Scientists have to deal with the observer effect, which means that observing something actually changes it. Typically we think of quantum physics, where this effect is very strong and surprising and closely related to the Heisenberg Uncertainty Principle, but it is actually something that in a more abstract sense is present in a multitude of situtations. Just think of human interaction. If we want to find out about people, we can ask them. But this conversation actually changes the people, sometimes in a way that we can neglect or tolerate.

But we also have this in the case of IT. If we think of a software and we want to observe if the software behaves well, we need ways to observe the software. Very often we use logging, sometimes monitoring tools, and sometimes debugging or even profiling. We think that they do not hurt us, apart from using resources, but we have to be quite careful. The example of logging is quite good, because it is quite common and usually something that we do a lot, without wasting too much thoughts about it.

Now logging slows our application down that is known already. Now we tend to use a slightly less noisy log level, because terabytes of log are still a pain, even today. But usually the messages are calculated and then discarded by the logging framework. With functional language features there are quite elegant ways to deal with this, by just passing a function that calculates the message on demand instead of passing the message. It has always been possible, but too clumsy to actually do it, unless the logging framework can rely on macro facilities, even such simple ones as the C-preprocessor. The deferred evaluation has its dangers as well. If an object that is passed as an ingrediant for a potential message already changes, while the message is created, we might get funny effects. Maybe only in the log, but maybe it could crash the application or stop the main program flow from doing its work. We need to be careful, unless the object is immutable.

In case of Hibernate or JPA or similar frameworks this can be specially interesting, even with eager message calculation. Accessing attributes of the object can actually lead to database operations. They can fail. They can create load, maybe deadlocks. They can have lost their transaction. A lot of things can happen in places far away from where we assumed to do the DB-work. This actually changes the objects. Do we want such operations to occur during logging? Maybe differently depending on the log level? Immutability is our friend, especially in conjunction with JPA, but that is a long story. We may at least be lucky that we actually have some tables that we only read. We can make the objects „pseudo-immutable“, but still JPA-magic must mutate them at least during the read operation. It is tempting to let tools generate the toString-methods of objects, but it is very dangerous here. We should avoid including any potentially lazily loaded attributes in the toString-output, because otherwise they will be loaded during the logging or even worse differently depending on how we log.

The next thing is the NullPointerException during logging. It is quite common in Java, for example. And we do not want to burden our program logic with NullPointerExceptions from the logging, especially not with those that occur only sporadically. So it is a good idea to be careful and to test well. Only the combination is possibly good enough.

Modern times create more demand for some kind of real multi threading, not in the JEE-sense with a couple of EJBs that can run in parallel, but with massively parallel operations. Even though we have a multitude of logging frameworks and unifying logging frameworks and even more of them, there is a common weakness that they tend to share. Writing into one single target is achieved by some kind of synchronizing, which can slow our application down and change the timing behavior in ways that we did not desire. Asynchronous logging could be good, but in a way this only shifts the problem a bit.

Share Button

Functional Programming: Article in another Blog

I wrote a Guest Blog Article about Functional Programming on Adesso’s Blog in German.

Share Button

Testing and Micro Components

It is an interesting trend to create micro components, micro services and use building blocks that are very easy to understand. As always, a new approach will not eliminate all problems. The problem that we are solving has an inherent complexity that we cannot beat. But usually we create a lot of complexity that would not be necessary to solve the problem and this can be avoided. Any approach that really helps us here is welcome. If and how micro components and micro services can help us, is another issue, I will just assume that they are in use.

Think of constructing a building. Now this is a very complex task, but we simplified it by building Lego pieces instead. They are simple, easy to understand, well tested and we just have to compose them to get the building. The building does not become trivial buy that, but maybe it becomes easier to create it.

In the end we want to sell a building. The fact that it is constructed of ISO and TÜV and CE and whatever certified lego blocks is not really relevant for the users of the building. They actually want a building, not lego blocks. Maybe they do not even care how it has been made, as long as it is good. They should not care, but usually they do at least a bit. We do not live in an ideal world and I would care too.

Now the building has to fit into the city. It has a function to play. It is itself a lego block for building the city. But for the moment we are actually only doing one building, not a whole city. So we need to provide tests that it works with its environment and provides the functionality that we expect. There is an API and an API-contract. We need to test this API in the way it will be used in production. That is on Linux, if the productive servers run Linux, not on MS-Windows. But Linux is anyway better for developement than MS-Windows, unless we are working on MS-specific technologies, where the servers will run MS-Windows anyway. We have to test with the database product that is used in production, for example PostgreSQL or Oracle. Tests with In-Memory-DBs or products like sqlite are interesting and maybe helpful for finding bugs by having tests that run presumably faster or with less overhead, but there are ways to provide the real DB to each developer and the overhead is not much, once this has been prepared. And a local Oracle instance is not much slower than an in-memory-DB, but allows us to inspect and change data with SQL, which can be very useful. And if we use application servers or middleware, it is mandatory to test with the same middleware as in production, no matter how much compatibility between the implementations of different vendors has been promised. All of these issues are in theory not existent, but I have seen them in practice and that is what counts. Yes, I love theory. 🙂

The other thing is that the fact that the lego pieces have been tested does not guarantee that the whole building will work. The composition creates something special, that is more than the sum of its pieces. That is what we want to achieve. So we need to test it.

So it is good to have automated tests that run against the service using the API that is used in production, possibly REST or SOAP, for example. And to run the test against an instance of the application that runs on the same kind of server against the same kind of database and on the same kind of middleware as in production. At least continuous integration should work like that. The smaller our components are, the more important it becomes to test them in conjunction. Problems will occur there, even though the components by themselves seem to work perfectly.

Now there is demand for running more local tests, unit tests, as they are commonly called, to actually test lego blocks or small groups of lego blocks. It is good to locate problems, because the end-to-end-tests just provide the information that there is a bug, but it is hard to locate. And without thorough testing of the sub components it is possible that we overlook edge cases on the overall tests that will anyway hit us in production, because only the more local tests can explore the boundaries that are visible to them, but obfuscated when accessed indirectly. Also we do not want to be happy with two bugs that cancel each other, because they will not always work together in our favor. So in the end of the day it is important to have automated tests at all levels. And to make them an important issue.

I have seen many projects, where Unit-Tests eroded. They were once done, but did not work any more and were impossible or hard to fix, off course lacking the time to do so. Or the software was hiding access points that were needed for testing. Just two examples for this, for our Java friends: methods can be package private and test classes in the same package to allow testing. And application servers need to actually allow certain remote access, usually via REST or SOAP these days. And we need remote debugging access. This is all easy to achieve, but can be made hard by company policies, even on development workstations and continuous integration servers.

Share Button


I like comments that really address the content that I have written, no matter if they agree or disagree with me.

But I am getting a lot of spam these days and comments that advertise pharmaceutical products, credits, dating sites or other non-related issues are spam, when applied to this site, unless we discuss the IT of their web shops, and I delete them. Generally I will delete comments that do not contain content referring to the article, but only consist of links. I like languages and I speak German, English, Swedish, Norwegian, Spanish, Russian and French, but I will usually delete comments in languages other than the article, if their text is not obviously an interesting reference to the article. Even comments like „great blog“ with no content, however nice they are, do not create any value for the readers and will not be released.

Share Button


It is an interesting idea to generate colorful images using or music. In both areas Clojure seems to be quite attractive. Not having explored the music side, I did find the idea of creating images fun and inspiring. It also shows us something about the functions we are working with, if we learn to read the images right, but that will come or not, depending on the circumstances. It is useful not to be too scared of some mathematics when reading this.

Now the challenge is to create an image on a two dimensional array of points, for example 1000×1000 pixel, with x- and y-coordinates ranging from 0 to 999. Each pixel needs to be colored. While it is very interesting to explore different color models, we can for simplicity assume that we need 3 numbers each ranging from 0 to 255 for the red, green and blue channels. This is how most displays work, more or less. Now the goal is to create something that looks good. And off course is reasonable to program, otherwise we could just color one million points individually using for example GIMP, but a million is a lot.

Now we can apply any function on x and y and play around with functions like exp, log, sqrt, sin, cos, tan, sec, csc, sinh, … and of course the basic operations +, -, * and /. It turns out that in most cases we do not get interesting images, but experience will show what is promising to explore. I tried to create pictures by keeping the three channels fairly independent, but this did not work so well. It seems that it is better to keep some connection. One approach that actually works quite well is to consider the pair (x,y) as a complex number z = x+iy and to apply just one complex function on it, again exp, log, sqrt, sin, …. are good building blocks. Now these complex functions have a tendency to grow to infinity somewhere. While real functions can avoid this issue by constraining themselves just to one strait line on the plane, complex functions almost have to go to infinity somewhere. By making the square small enough or by changing the scale we can avoid this, but it imposes quite severe constraints. The Riemann Sphere allows us to map any complex number to a point on the surface of a sphere. With some scaling we can already get to RGB-space and get coordinates that are using, but not exceeding the desired range. There are more ways to visualize complex numbers, but this is a possibility worth exploring.

Another way is to just use functions that calculate a real number and to apply a \sin to it. With some shifting and scaling the values will be between 0 and 255 only and there are nor abrupt changes in color, unless the function we calculated is very steep or very chaotic. Using phase shifts by \frac{2\pi}{3} and \frac{4\pi}{3} the three color channels can be served and we get nice rainbow-waves like the following:

Clojure Art: angle + log(r)

Clojure Art: angle + log(r)

Another experiment was to just assume the HSV-model and to calculate the colors from assuming the function is the H-part. But this ended up looking like plastic and I did not like it too much.

An important issue to observe is that functions may end up in exceptions. I wrapped the functions, so that they do not stop the calculation of the image half way through, but instead provide default values in cases where an exceptions occured.

It can also be fun to explore bitwise-functions like bitxor or even functions like the p-adic exponential function, which yields totally different kind of images.

I have put some of the code from my experiments into Github and licensed it with the GPL, so you can use it as a starting point. Others have worked with this as well, for example Clojure Art on Tumblr, Clojure Art Collective on github, another „clojure art“ on github or creative computing with clojure on O’Reilly’s blog.

Enjoy it and learn some Clojure. I sometimes use this when teaching Clojure.

Share Button

DB Persistence without UPDATE and DELETE

When exploring the usage of databases for persistence, the easiest case is a database that does only SELECT. We can cache as much as we like and it is more or less the functional immutable world brought to the database. For working on fixed data and analyzing data this can sometimes be useful.

Usually our data actually changes in some way. It has been discussed in this Blog already, that it would be possible to extend the idea of immutability to the database, which would be achieved by allowing only INSERT and SELECT. Since data can correlate, an INSERT in a table that is understood as a sub-entity via a one-to-many-relationship by the application actually is mutating the containing entity. So it is necessary to look at this in terms of the actual OR-mapping of all applications that are running on that DB schema.

Life can be simple, if we actually have self contained data as with MongoDB or by having a JSON-column in PostgreSQL, for example. Then inter-table-relations are eliminated, but off course it is not even following the first normal form. This can be OK or not, but at least there are good reasons why best practices have been introduced in the relational DB world and we should be careful about that. Another approach is to avoid the concept of sub entities and only work with IDs that are foreign keys. We can query them explicitly when needed.

An interesting approach is to have two ID-columns. One is an id, that is unique in the DB-table and increasing for newly created data. One is the entity-ID. This is shared between several records referring to different generations of the same object. New of them are generated each time we change something and persist the changes and in a simple approach we just consider the newest record with that entity-ID valid. It can off course be enhanced with validFrom and validTo. Then each access to the database also includes a timestamp, usually close to current time, but kept constant across a transaction. Only records for which validFrom <= timestamp < validTo are considered, and within these the newest. The validFrom and validTo can form disjoint intervals, but it is up to the application logic if that is needed or not. It is also possible to select the entry with the highest ID among the records with a given entityID and timestamp-validTo/From-condition. Deleting records can be simulated by this as well, by allowing a way to express a "deleted" record, which means that in case we find this deleted record by our rules, we pretend not having found anything at all. But still referential integrity is possible, because the pre-deletion-data are still there. This concept of having two IDs has been inspired by a talk on that I saw during Clojure Exchange 2017: Immutable back to front.

Share Button

Lazy Collections, Strings or Numbers

The idea is, that we have data that is obtained or calculated to give us on demand as much of it as we request. But it is not necessarily initially present. This concept is quite common in the functional world, where we in a way hide the deprecated concept of state in such structures, by the way in a way that lets use retain the benefits that led to the desire for statelessness.

Actually the concept is quite old. We have it for I/O in Unix and hence in Linux since the 1970ies. „Everything is a file“, at least as long as we constrain ourselves to a universal subset of possible file operations. It can be keyboard input, a named or anonymous pipe, an actual file, a TCP-connection, to name the most important cases. These are „lazy“ files, behave more or less like files as far as sequential reading is concerned, but not for random access reading. The I/O-concept has been done in such a way that it takes the case into account that we want to read n bytes, but get only m < n bytes. This can happen with files when we reach their end, but then we can obtain an indication that we reached the end of the file, while it is perfectly possible that we read less then we want in one access, but eventually get \ge n bytes including subsequent reads. Since the API has been done right, but by no means ideal, it generalizes well to the different cases that exist in current OS environments.

We could consider a File as an array of bytes. There is actually a way to access it in this way by memory-mapping it, but this assumes a physically present file. Now we could assume that we think of the array as a list that is optimized for sequential access and iterating, but not for random access. Both list types actually exist in languages like Java. Actually the random access structure can be made lazy as well, within certain constraints. If the source is actually sequential, we can just assume that the data is obtained up to the point where we actually read. The information about the total length of the stream may or may not be available, it is always available somehow in the case of structures that are completely available in memory. This random access on lazy collections works fine if the reason of laziness is to actually save us from doing expensive operations to obtain data that we do not actually need or to obtain them in parallel to the computation that processes the data. But we loose another potential drawback in this case. If the data is truly sequential, we can actually process data that is way beyond our memory capacity.

So the concept transfers easily from I/O-streams to lists and even arrays, most naturally to iterables that can be iterated only once. But we can easily imagine that this also applies to Strings, which can be seen a sequence of characters. If we do not constrain us to what a String is in C or Java or Ruby, but consider String to be a more abstract concept, again possibly dropping the idea of knowing the length or having a finite length. Just think of the output of the Unix command „yes“ or „cat /dev/zero“, which is infinite, in a theoretical way, but the computer won’t last forever in real life, off course. And we always interrupt the output at some time, usually be having the consumer shut down the connection.

Even numbers can be infinite. For real numbers this can happen only after the decimal point, for p-adic numbers it happens only before the decimal point, if you like to look into that. Since we rarely program with p-adic numbers this is more or less an edge case that is not part of our daily work, unless we actually do math research. But we could have integers with so many digits that we actually obtain and process them sequentially.

Reactive programming, which is promoted by lightbend in the Reactive Manifesto relies heavily on lazy structures, in this case data streams. An important concept is the so called „backpressure“, that allows the consumer to slow down the producer, if it cannot read the data fast enough.

Back to the collections, we can observe different approaches. Java 8 has introduced streams as lazy collections and we need to transform collections into streams and after the operation a stream back into a collection, at least in many real life situations. But putting all into one structure has some drawbacks as well. But looking at it from an abstract point of view this does not matter. The java8-streams to not implement a collection interface, but they are lazy collections from a more abstract point of view.

It is interesting that this allows us to relatively easily write nested loops where the depth of the nesting is a parameter that is not known at compile time. We just need a lazy collections of n-tuples, where n is the actual depth of the nesting and the contents are according to what the loops should iterate through. In this case we might or might not know the size of the collection, possibly not fitting into a 32-bit-integer. We might be able to produce a random member of the collection. And for sure we can iterate through it and stop the iteration wherever it is, once the desired calculation has been completed.

Share Button


This Blog is now using https. So the new URL is, while the old URL will continue to work for some time.

If you like to read more about changing the links within the Blog you can find information on Vladimir’s Blog including a recipe, both in German.

Share Button

Is Java becoming non-free?

We are kind of used to the fact that Java is „free“.
It has been free in the sense of „free beer“ pretty much forever.
And more recently also „free“ in the sense of „free speech“.

In spite of the fact that we read that „Oracle is going to monetize on Java“, as can be read in articles like this, it is remaining like that, at least for now. This is also written in the article.
But it seems that they are looking for loopholes. For example we download and install Java SE including X, Y and Z, because it comes like that. Agree to hundred pages of license text and confirm having read and understood everything, as always… Now we really need X, which is the JDK, which is actually free. But we just accidentally also install Y and Z, which we do not need, but which has a price tag on which they are trying to get us.

Even if nothing will really happen, issues like that help undermining the trust in the platform in general, not only for Java, but also for other JVM-languages. Eventually there could be forks like we have seen with LibreOffice vs. OpenOffice or with mariaDB vs. mySQL, which kind of took over by avoiding the ties to Oracle. Solaris seems to have a similar fork, but in this case people are just moving to Linux anyway, so the issue is less relevant.

These prospects are not desirable, but I think we do not have to panic, because there are ways to solve this that are going to be pursued if necessary. Maybe it is a good idea to be more careful when installing software. And to think twice when starting a new project if Oracle or PostgreSQL is the right DB product in the long term, taking into consideration Oracle’s attitude towards loyal long term customers.

It is regrettable. Oracle has great technology from their own history and from SUN in databases, Java including the surrounding universe, Solaris and hardware. Let us hope that they will stay reasonable at least with Java.

Share Button


Java has always not just been a language, but it brought us libraries and frameworks. Some of them proved to be bad ideas, some become hyped without having any obvious advantages, but some were really good.

In the JEE-stack, messaging (JMS) was included pretty much from the beginning. In those days, when Java belonged to Sun Microsystems and Sun did not belong to Oracle, an aim was to support databases, which was in those days mostly Oracle, via JDBC and so called Message oriented middleware, which was available in the IBM-world via JMS. JMS is a common interface for messaging, that is like sending micro-email-message not between human, but between software components. It can be used within one JVM, but even between geographically distant servers, provided a safe network connection exists. Since we all know EMail this is in principle not too hard to understand, but the question is, what it really means and if it brings us something that we do not already have otherwise.

We do have web services as an established way to communicate between different servers across the network and of course they can also be used locally, if desired. Web services are neither the first nor the only way to communicate between servers nor are they the most efficient way. But I would say that they are the way how we do it in typical distributed applications that are not tied to any legacy. In principal web services are network capable and synchronous. This is well understood and works fine for many applications. But it also forces us to block processes or threads while waiting for responses, thus occupying valuable resources. And we tend to loose responsiveness, because of the waiting for the response. It needs to be observed that DB-access is typically only available synchronously. In a way understandable because of the transactions, but it also blocks resources to a huge extent, because we know that the performance of many applications is DB driven.

Now message based software architectures think mostly asynchronously. Sending a message is a „fire and forget“. There is such a thing as making message transactional, but this has to be understood correctly. There is one transaction for sending the message. It is guaranteed that the message is sent. Delivery guarantees can only be given to a limited extent, because we do not know anything about the other side and if it is at all working. This is not checked as part of the transaction. We can imagine though that the messaging system has its own transactional database and stores the message there within the transaction. It then retries delivering it forever, until it succeeds. Then it is deleted from this store as part of the receiving transaction. Both these transactions can be part of a distributed transaction and thus be combined with other transactions, usually against databases, for a combined transaction. This is what we usually have in mind when talking about this. I have to mention that the distributed transaction, usually based on the so called two phase commit, is not quite as water proof as we might hope, but it can be broken by construction of a worst case scenario regarding the timing of failures of network and systems. But it is for practical purposes reasonable good to use.

While it is extremely interesting to investigate purely message based architectures, especially in conjunction with functional paradigm, this may not be the only choice. Often it is a good option to use a combination of messaging with synchronous services.

We should observe that messaging is a more abstract concept. It can be implemented by some middle ware and even be accessible by a standardized kind of interface like JMS. But it can also be more abstract as a queuing system or as something like Akka uses for its internal communication. And messaging is not limited to Java or JVM languages. Interoperability does impose some constraints on how to use it, because it bans usage of Object-messages which store serialized Java objects, but there are ways to address this by using JSON or BSON or XML or Protocol Buffers as message contents.

What is interesting about JMS and messaging in general are two major communication modes. We can have queues, which are point to point connections. Or we can have „topics“, which are channels into which messages are sent. They are then received by all current subscribers of the topic. This is interesting to notify different components about an event happening in the system, while possibly details about the event must be queried via synchronous services or requested by further messaging via queues.

Generally JMS in Java has different implementations, usually there are those coming with the application servers and there are also some standalone implementations. They can be operated via the same interface, at least as long as we constrain us to the common set of functionality. So we can exchange the JMS implementation for the whole platform (which is a nightmare in real life), but we cannot mix them, because the wire protocol is usually incompatible. There is now something like a standard network protocol for messaging, which is followed by some, but not all implementations.

As skeptical as I am against Java Enterprise edition, I do find the JMS part of enterprise Java very interesting and worthwhile exploring for projects that have a size and characteristics justifying this.

Share Button