Forrest roads

In forests we usually find unpaved roads like this:

Forrest Road

Forest Road

Now we should observe how they are constructed. Usually the need to somehow allow access to areas in the forest. It does not have to be the shortest connection, it does not have to be as flat as possible, it is not required to build for high speed and it is not at all required to build for high capacity.

Often it is necessary that trucks and heavy machinery can use the road. So it needs to be constructed in a way that it withstands occasional usage by heavy vehicles, for example to extract wood from the forest or in bad cases to fight a forest fire. And the network usually provides a reasonable way for a truck with a trailer to get out again by driving forward only.

Apart from this there is one very important requirement. The construction must be cheap. It should usually be so cheap that it can easily be paid by a relatively small fraction of the money that is obtained by selling the wood from the forest. There can be a significant secondary use for recreational purposes or even as route through the forest, typically for MTBs and pedestrians, which might justify using money from other sources than the revenue from the forest. Interestingly forests in Switzerland have much denser road networks than forests in Norway or Sweden.

What we never see is forest roads that are somehow prepared for the case that it might be decided in the future to expand them to eight lanes or to transform them into high speed highways. It would be good to use a route that allows this, to already pass hills with cuts or tunnels and to build bridges already much wider than currently necessary. Real highways usually run on a dam or at least they include a thick sequence of layers that often add up to a few meters under the upper asphalt layer. There are „best practices“ about building highways, even relatively narrow highways like this one:

Highway E45 in the Taiga in northern Sweden 2014

Highway E45 in the Taiga in northern Sweden 2014

They are mostly ignored, apart from very universal principals that apply to any kind of construction.

One kilometer of forest road costs a very tiny fraction of one kilometer of such a highway with two lanes.

We should learn from this for our IT solutions. We should think how big our IT solution might actually become. How many customers do we need to serve? What kind of sophisticated functionality needs to be added? Very often we see the mistake that IT solution are built too small. They do not scale, cannot easily be expanded or simply not serve the load or the availability requirements.

Companies like Google. Facebook, Twitter, Netflix or VK serve millions of users 7×24. There is even an implicit promise that there will be no down times and people start relying on this.

We expect banking software to be accurate to the cent. Not sometimes, but always.

Running a device in a chemical plant or steering a rocket requires absolutely reliable software. Errors can be very expensive and cost human lives.

On the other hand there is plenty of software that is useful. Off course it needs to work properly. But the requirements are much lower. A typical app for a mobile phone does not need to be able to run on millions of servers simultaneously. Downtimes during software updates are no problem. Usually a small development team is sufficient to build them. And when it comes to money, the real business logic for this is usually on the server. And we just should not control a chemical plant by a mobile phone app. But mobile phone apps usually have to come at neglectable prices to the users. They either have to pay themselves by the business that they indirectly promote or by a very small amount of money per user, usually combined with a very small number of users.

It is important to really understand the requirements well and build the right size of application. Building too small is very bad, but building too big can be as bad for the project and the organization. And even if the company is lucky and discovers that the software is so attractive that a bigger solution is necessary, then maybe this is a good moment to rewrite it and consider the first version as proof of concept.

Whenever a new highway is built on a route that is previously only covered by a forest road, the highway will usually be built from scratch, ignoring the routing of the old forest road. If the forest road becomes obsolete by that, then the money for the original construction of the forest road is neglectable. Trying to transform a forest road into a highway did happen in many small steps over centuries to create part of our current road network, but it is usually not a recommended approach.

Share Button

ä ö … in HTML

In the old days of the web, more than 20 years ago, we found a possibility to write German Umlaut letters and a lot of other letters and symbols using pure ASCII. These are called „entities“, btw.

Many people, including myself, started writing web pages using these transcriptions, in the assumption that they were required. Actually in the early days of the web there were some rumors that some browsers actually did not understand the character encodings that contained these letters properly, which was kind of plausible, because the late 80s and the 90s were the transition period where people discovered that computers are useful outside of the United States and at least for non-IT-guys it was or it should have been a natural requirement that computers understand their language or at least can process, store and transmit texts using the proper characters of the language. In case of German language this was not so terrible, because there were transcriptions for the special characters (ä->ae, ö->oe, ü->ue, ß->ss) that were somewhat ugly, but widely understandable to native German speakers. Other languages like Russian, Greek, Arabic or East-Asian languages were in more trouble, because they consist of „special characters“ only.

Anyway, this „ä“-transcription for web pages, which is actually superior to the „ae“, because the reader of the web page will read the correct „ä“, was part of the HTML-standard to support writing characters not on the keyboard. This was a useful feature in those days, but today we can find web pages that help use with the transliteration or just look up the word with the special characters in the web in order to write it correctly. Then we can as well copy it into our HTML-code, including all the special characters.

There could be some argument about UTF-8 vs. UTF-16 vs. ISO-8859-x as possible encodings of the web page. But in the area of the web this was never really an issue, because the web pages have headers that should be present and inform the browser about the encoding. Now I recommend to use UTF-8 as the default, because that includes all the potential characters that we might want to use sporadically. And then the Umlaut kann just directly be part of the HTML content. I convereted all my web pages to use Umlaut-letters properly, where needed, without using entities in the mid 90s.

Some entities are still useful:

  • „&lt;“ for „<“, because „<“ is used as part of the HTML-syntax
  • „&amp;“ for „&“ for the same reason
  • „&gt;“ for „>“ for consistency with „<„
  • „&nbsp;“ for no-break-space, to emphasize it, since most editors make it hard to tell the difference between regular space and no-break-space.
Share Button

Observer Effect

Scientists have to deal with the observer effect, which means that observing something actually changes it. Typically we think of quantum physics, where this effect is very strong and surprising and closely related to the Heisenberg Uncertainty Principle, but it is actually something that in a more abstract sense is present in a multitude of situtations. Just think of human interaction. If we want to find out about people, we can ask them. But this conversation actually changes the people, sometimes in a way that we can neglect or tolerate.

But we also have this in the case of IT. If we think of a software and we want to observe if the software behaves well, we need ways to observe the software. Very often we use logging, sometimes monitoring tools, and sometimes debugging or even profiling. We think that they do not hurt us, apart from using resources, but we have to be quite careful. The example of logging is quite good, because it is quite common and usually something that we do a lot, without wasting too much thoughts about it.

Now logging slows our application down that is known already. Now we tend to use a slightly less noisy log level, because terabytes of log are still a pain, even today. But usually the messages are calculated and then discarded by the logging framework. With functional language features there are quite elegant ways to deal with this, by just passing a function that calculates the message on demand instead of passing the message. It has always been possible, but too clumsy to actually do it, unless the logging framework can rely on macro facilities, even such simple ones as the C-preprocessor. The deferred evaluation has its dangers as well. If an object that is passed as an ingrediant for a potential message already changes, while the message is created, we might get funny effects. Maybe only in the log, but maybe it could crash the application or stop the main program flow from doing its work. We need to be careful, unless the object is immutable.

In case of Hibernate or JPA or similar frameworks this can be specially interesting, even with eager message calculation. Accessing attributes of the object can actually lead to database operations. They can fail. They can create load, maybe deadlocks. They can have lost their transaction. A lot of things can happen in places far away from where we assumed to do the DB-work. This actually changes the objects. Do we want such operations to occur during logging? Maybe differently depending on the log level? Immutability is our friend, especially in conjunction with JPA, but that is a long story. We may at least be lucky that we actually have some tables that we only read. We can make the objects „pseudo-immutable“, but still JPA-magic must mutate them at least during the read operation. It is tempting to let tools generate the toString-methods of objects, but it is very dangerous here. We should avoid including any potentially lazily loaded attributes in the toString-output, because otherwise they will be loaded during the logging or even worse differently depending on how we log.

The next thing is the NullPointerException during logging. It is quite common in Java, for example. And we do not want to burden our program logic with NullPointerExceptions from the logging, especially not with those that occur only sporadically. So it is a good idea to be careful and to test well. Only the combination is possibly good enough.

Modern times create more demand for some kind of real multi threading, not in the JEE-sense with a couple of EJBs that can run in parallel, but with massively parallel operations. Even though we have a multitude of logging frameworks and unifying logging frameworks and even more of them, there is a common weakness that they tend to share. Writing into one single target is achieved by some kind of synchronizing, which can slow our application down and change the timing behavior in ways that we did not desire. Asynchronous logging could be good, but in a way this only shifts the problem a bit.

Share Button

Functional Programming: Article in another Blog

I wrote a Guest Blog Article about Functional Programming on Adesso’s Blog in German.

Share Button

Testing and Micro Components

It is an interesting trend to create micro components, micro services and use building blocks that are very easy to understand. As always, a new approach will not eliminate all problems. The problem that we are solving has an inherent complexity that we cannot beat. But usually we create a lot of complexity that would not be necessary to solve the problem and this can be avoided. Any approach that really helps us here is welcome. If and how micro components and micro services can help us, is another issue, I will just assume that they are in use.

Think of constructing a building. Now this is a very complex task, but we simplified it by building Lego pieces instead. They are simple, easy to understand, well tested and we just have to compose them to get the building. The building does not become trivial buy that, but maybe it becomes easier to create it.

In the end we want to sell a building. The fact that it is constructed of ISO and TÜV and CE and whatever certified lego blocks is not really relevant for the users of the building. They actually want a building, not lego blocks. Maybe they do not even care how it has been made, as long as it is good. They should not care, but usually they do at least a bit. We do not live in an ideal world and I would care too.

Now the building has to fit into the city. It has a function to play. It is itself a lego block for building the city. But for the moment we are actually only doing one building, not a whole city. So we need to provide tests that it works with its environment and provides the functionality that we expect. There is an API and an API-contract. We need to test this API in the way it will be used in production. That is on Linux, if the productive servers run Linux, not on MS-Windows. But Linux is anyway better for developement than MS-Windows, unless we are working on MS-specific technologies, where the servers will run MS-Windows anyway. We have to test with the database product that is used in production, for example PostgreSQL or Oracle. Tests with In-Memory-DBs or products like sqlite are interesting and maybe helpful for finding bugs by having tests that run presumably faster or with less overhead, but there are ways to provide the real DB to each developer and the overhead is not much, once this has been prepared. And a local Oracle instance is not much slower than an in-memory-DB, but allows us to inspect and change data with SQL, which can be very useful. And if we use application servers or middleware, it is mandatory to test with the same middleware as in production, no matter how much compatibility between the implementations of different vendors has been promised. All of these issues are in theory not existent, but I have seen them in practice and that is what counts. Yes, I love theory. 🙂

The other thing is that the fact that the lego pieces have been tested does not guarantee that the whole building will work. The composition creates something special, that is more than the sum of its pieces. That is what we want to achieve. So we need to test it.

So it is good to have automated tests that run against the service using the API that is used in production, possibly REST or SOAP, for example. And to run the test against an instance of the application that runs on the same kind of server against the same kind of database and on the same kind of middleware as in production. At least continuous integration should work like that. The smaller our components are, the more important it becomes to test them in conjunction. Problems will occur there, even though the components by themselves seem to work perfectly.

Now there is demand for running more local tests, unit tests, as they are commonly called, to actually test lego blocks or small groups of lego blocks. It is good to locate problems, because the end-to-end-tests just provide the information that there is a bug, but it is hard to locate. And without thorough testing of the sub components it is possible that we overlook edge cases on the overall tests that will anyway hit us in production, because only the more local tests can explore the boundaries that are visible to them, but obfuscated when accessed indirectly. Also we do not want to be happy with two bugs that cancel each other, because they will not always work together in our favor. So in the end of the day it is important to have automated tests at all levels. And to make them an important issue.

I have seen many projects, where Unit-Tests eroded. They were once done, but did not work any more and were impossible or hard to fix, off course lacking the time to do so. Or the software was hiding access points that were needed for testing. Just two examples for this, for our Java friends: methods can be package private and test classes in the same package to allow testing. And application servers need to actually allow certain remote access, usually via REST or SOAP these days. And we need remote debugging access. This is all easy to achieve, but can be made hard by company policies, even on development workstations and continuous integration servers.

Share Button

Comments

I like comments that really address the content that I have written, no matter if they agree or disagree with me.

But I am getting a lot of spam these days and comments that advertise pharmaceutical products, credits, dating sites or other non-related issues are spam, when applied to this site, unless we discuss the IT of their web shops, and I delete them. Generally I will delete comments that do not contain content referring to the article, but only consist of links. I like languages and I speak German, English, Swedish, Norwegian, Spanish, Russian and French, but I will usually delete comments in languages other than the article, if their text is not obviously an interesting reference to the article. Even comments like „great blog“ with no content, however nice they are, do not create any value for the readers and will not be released.

Share Button

Clojure-Art

It is an interesting idea to generate colorful images using or music. In both areas Clojure seems to be quite attractive. Not having explored the music side, I did find the idea of creating images fun and inspiring. It also shows us something about the functions we are working with, if we learn to read the images right, but that will come or not, depending on the circumstances. It is useful not to be too scared of some mathematics when reading this.

Now the challenge is to create an image on a two dimensional array of points, for example 1000×1000 pixel, with x- and y-coordinates ranging from 0 to 999. Each pixel needs to be colored. While it is very interesting to explore different color models, we can for simplicity assume that we need 3 numbers each ranging from 0 to 255 for the red, green and blue channels. This is how most displays work, more or less. Now the goal is to create something that looks good. And off course is reasonable to program, otherwise we could just color one million points individually using for example GIMP, but a million is a lot.

Now we can apply any function on x and y and play around with functions like exp, log, sqrt, sin, cos, tan, sec, csc, sinh, … and of course the basic operations +, -, * and /. It turns out that in most cases we do not get interesting images, but experience will show what is promising to explore. I tried to create pictures by keeping the three channels fairly independent, but this did not work so well. It seems that it is better to keep some connection. One approach that actually works quite well is to consider the pair (x,y) as a complex number z = x+iy and to apply just one complex function on it, again exp, log, sqrt, sin, …. are good building blocks. Now these complex functions have a tendency to grow to infinity somewhere. While real functions can avoid this issue by constraining themselves just to one strait line on the plane, complex functions almost have to go to infinity somewhere. By making the square small enough or by changing the scale we can avoid this, but it imposes quite severe constraints. The Riemann Sphere allows us to map any complex number to a point on the surface of a sphere. With some scaling we can already get to RGB-space and get coordinates that are using, but not exceeding the desired range. There are more ways to visualize complex numbers, but this is a possibility worth exploring.

Another way is to just use functions that calculate a real number and to apply a \sin to it. With some shifting and scaling the values will be between 0 and 255 only and there are nor abrupt changes in color, unless the function we calculated is very steep or very chaotic. Using phase shifts by \frac{2\pi}{3} and \frac{4\pi}{3} the three color channels can be served and we get nice rainbow-waves like the following:

Clojure Art: angle + log(r)

Clojure Art: angle + log(r)

Another experiment was to just assume the HSV-model and to calculate the colors from assuming the function is the H-part. But this ended up looking like plastic and I did not like it too much.

An important issue to observe is that functions may end up in exceptions. I wrapped the functions, so that they do not stop the calculation of the image half way through, but instead provide default values in cases where an exceptions occured.

It can also be fun to explore bitwise-functions like bitxor or even functions like the p-adic exponential function, which yields totally different kind of images.

I have put some of the code from my experiments into Github and licensed it with the GPL, so you can use it as a starting point. Others have worked with this as well, for example Clojure Art on Tumblr, Clojure Art Collective on github, another „clojure art“ on github or creative computing with clojure on O’Reilly’s blog.

Enjoy it and learn some Clojure. I sometimes use this when teaching Clojure.

Share Button

DB Persistence without UPDATE and DELETE

When exploring the usage of databases for persistence, the easiest case is a database that does only SELECT. We can cache as much as we like and it is more or less the functional immutable world brought to the database. For working on fixed data and analyzing data this can sometimes be useful.

Usually our data actually changes in some way. It has been discussed in this Blog already, that it would be possible to extend the idea of immutability to the database, which would be achieved by allowing only INSERT and SELECT. Since data can correlate, an INSERT in a table that is understood as a sub-entity via a one-to-many-relationship by the application actually is mutating the containing entity. So it is necessary to look at this in terms of the actual OR-mapping of all applications that are running on that DB schema.

Life can be simple, if we actually have self contained data as with MongoDB or by having a JSON-column in PostgreSQL, for example. Then inter-table-relations are eliminated, but off course it is not even following the first normal form. This can be OK or not, but at least there are good reasons why best practices have been introduced in the relational DB world and we should be careful about that. Another approach is to avoid the concept of sub entities and only work with IDs that are foreign keys. We can query them explicitly when needed.

An interesting approach is to have two ID-columns. One is an id, that is unique in the DB-table and increasing for newly created data. One is the entity-ID. This is shared between several records referring to different generations of the same object. New of them are generated each time we change something and persist the changes and in a simple approach we just consider the newest record with that entity-ID valid. It can off course be enhanced with validFrom and validTo. Then each access to the database also includes a timestamp, usually close to current time, but kept constant across a transaction. Only records for which validFrom <= timestamp < validTo are considered, and within these the newest. The validFrom and validTo can form disjoint intervals, but it is up to the application logic if that is needed or not. It is also possible to select the entry with the highest ID among the records with a given entityID and timestamp-validTo/From-condition. Deleting records can be simulated by this as well, by allowing a way to express a "deleted" record, which means that in case we find this deleted record by our rules, we pretend not having found anything at all. But still referential integrity is possible, because the pre-deletion-data are still there. This concept of having two IDs has been inspired by a talk on that I saw during Clojure Exchange 2017: Immutable back to front.

Share Button

2017 — Happy New Year

Sylvester in Zürich New Years Eve in Zurich (C) Karl Brodowsky 2016

Sylvester in Zürich
New Years Eve in Zurich


Let’s do it with Clojure this time:

(let [ texts [ "Akemashite omedetô!" "Ath bhliain faoi mhaise!" "Boldog új évet!" "Bonne année!" "Bun di bun an!" "Cung chúc tân xuân!" "Een gelukkig nieuwjaar!" "FELIX SIT ANNUS NOVUS!" "Felice Anno Nuovo!" "Feliz ano novo!" "Feliz año nuevo!" "Feliĉan novan jaron!" "Frohes neues Jahr!" "Gelukkig nieuwjaar!" "Gleðilegt nýtt ár!" "Godt Nyttår!" "Godt nytår!" "Gott nytt år!" "Gott nýggjár!" "Happy New Year!" "Hääd uut aastat!" "Laimingų naujųjų metų!" "Laimīgu Jauno gadu!" "Laimīgu jauno gadu!" "Lokkich nijjier!" "Nav varsh ki subhkamna!" "Naya barsa ko hardik shuvakamana!" "Onnellista uutta vuotta!" "Próspero ano novo!" "Sala we ya nû pîroz be!" "Selamat tahun baru!" "Shnorhavor nor tari!" "Srechno novo leto!" "Sretna nova godina!" "Subho nababarsho!" "Sugeng warsa enggal!" "Szczęśliwego nowego roku!" "Un an nou fericit!" "Yeni yılınız kutlu olsun!" "¡Próspero año nuevo!" "Šťastný nový rok!" "Καλή Χρονια!" "Весёлого нового года!" "С новым годом!" "Срећна нова година!" "Среќна нова година!" "Честита нова година!" "Щасливого нового року!" "السنة الجديدة المبتهجة" "سال نو مبارک" "عام سعيد" "สวัสดีปีใหม่" "うれしい新しい年" "新年好" "새해 복 많이 받으세요" ] ]
(reduce (fn [ x y ] (str x " — " y)) (shuffle texts)))


Próspero ano novo! — Godt Nyttår! — السنة الجديدة المبتهجة — Sretna nova godina! — Καλή Χρονια! — 新年好 — Felice Anno Nuovo! — Cung chúc tân xuân! — Een gelukkig nieuwjaar! — Gelukkig nieuwjaar! — Gott nýggjár! — Un an nou fericit! — سال نو مبارک — Subho nababarsho! — Akemashite omedetô! — Ath bhliain faoi mhaise! — Lokkich nijjier! — Sugeng warsa enggal! — Весёлого нового года! — Naya barsa ko hardik shuvakamana! — 새해 복 많이 받으세요 — Laimīgu Jauno gadu! — Šťastný nový rok! — Godt nytår! — Shnorhavor nor tari! — FELIX SIT ANNUS NOVUS! — うれしい新しい年 — Gleðilegt nýtt ár! — Onnellista uutta vuotta! — Hääd uut aastat! — Feliz año nuevo! — Szczęśliwego nowego roku! — Frohes neues Jahr! — Bun di bun an! — Feliĉan novan jaron! — Честита нова година! — Laimingų naujųjų metų! — Sala we ya nû pîroz be! — Gott nytt år! — Happy New Year! — Bonne année! — С новым годом! — عام سعيد — Nav varsh ki subhkamna! — Selamat tahun baru! — ¡Próspero año nuevo! — Laimīgu jauno gadu! — Срећна нова година! — Srechno novo leto! — สวัสดีปีใหม่ — Boldog új évet! — Yeni yılınız kutlu olsun! — Среќна нова година! — Feliz ano novo! — Щасливого нового року!

Share Button

Lazy Collections, Strings or Numbers

The idea is, that we have data that is obtained or calculated to give us on demand as much of it as we request. But it is not necessarily initially present. This concept is quite common in the functional world, where we in a way hide the deprecated concept of state in such structures, by the way in a way that lets use retain the benefits that led to the desire for statelessness.

Actually the concept is quite old. We have it for I/O in Unix and hence in Linux since the 1970ies. „Everything is a file“, at least as long as we constrain ourselves to a universal subset of possible file operations. It can be keyboard input, a named or anonymous pipe, an actual file, a TCP-connection, to name the most important cases. These are „lazy“ files, behave more or less like files as far as sequential reading is concerned, but not for random access reading. The I/O-concept has been done in such a way that it takes the case into account that we want to read n bytes, but get only m < n bytes. This can happen with files when we reach their end, but then we can obtain an indication that we reached the end of the file, while it is perfectly possible that we read less then we want in one access, but eventually get \ge n bytes including subsequent reads. Since the API has been done right, but by no means ideal, it generalizes well to the different cases that exist in current OS environments.

We could consider a File as an array of bytes. There is actually a way to access it in this way by memory-mapping it, but this assumes a physically present file. Now we could assume that we think of the array as a list that is optimized for sequential access and iterating, but not for random access. Both list types actually exist in languages like Java. Actually the random access structure can be made lazy as well, within certain constraints. If the source is actually sequential, we can just assume that the data is obtained up to the point where we actually read. The information about the total length of the stream may or may not be available, it is always available somehow in the case of structures that are completely available in memory. This random access on lazy collections works fine if the reason of laziness is to actually save us from doing expensive operations to obtain data that we do not actually need or to obtain them in parallel to the computation that processes the data. But we loose another potential drawback in this case. If the data is truly sequential, we can actually process data that is way beyond our memory capacity.

So the concept transfers easily from I/O-streams to lists and even arrays, most naturally to iterables that can be iterated only once. But we can easily imagine that this also applies to Strings, which can be seen a sequence of characters. If we do not constrain us to what a String is in C or Java or Ruby, but consider String to be a more abstract concept, again possibly dropping the idea of knowing the length or having a finite length. Just think of the output of the Unix command „yes“ or „cat /dev/zero“, which is infinite, in a theoretical way, but the computer won’t last forever in real life, off course. And we always interrupt the output at some time, usually be having the consumer shut down the connection.

Even numbers can be infinite. For real numbers this can happen only after the decimal point, for p-adic numbers it happens only before the decimal point, if you like to look into that. Since we rarely program with p-adic numbers this is more or less an edge case that is not part of our daily work, unless we actually do math research. But we could have integers with so many digits that we actually obtain and process them sequentially.

Reactive programming, which is promoted by lightbend in the Reactive Manifesto relies heavily on lazy structures, in this case data streams. An important concept is the so called „backpressure“, that allows the consumer to slow down the producer, if it cannot read the data fast enough.

Back to the collections, we can observe different approaches. Java 8 has introduced streams as lazy collections and we need to transform collections into streams and after the operation a stream back into a collection, at least in many real life situations. But putting all into one structure has some drawbacks as well. But looking at it from an abstract point of view this does not matter. The java8-streams to not implement a collection interface, but they are lazy collections from a more abstract point of view.

It is interesting that this allows us to relatively easily write nested loops where the depth of the nesting is a parameter that is not known at compile time. We just need a lazy collections of n-tuples, where n is the actual depth of the nesting and the contents are according to what the loops should iterate through. In this case we might or might not know the size of the collection, possibly not fitting into a 32-bit-integer. We might be able to produce a random member of the collection. And for sure we can iterate through it and stop the iteration wherever it is, once the desired calculation has been completed.

Share Button