How procurement can create value for IT projects

We know this, in many IT projects we need to make use of services and software and hardware that needs to be bought.

Actually it often makes a huge difference, what kind of deals are made and how efficient the projects can work on this basis.

I will just briefly tell a few stories and tell a bigger story by that.

A common pattern is a „preferred supplier“. It is nice to be a preferred supplier and in the phase when the partner is chosen and the contracts are made, companies often offer their best people and services to show how good it will be later to have them as preferred supplier. And then, when the deal has been fixed, they send the juniors for the same hourly rate and make a lot of profit. Or the price has been made so low, that only the juniors can be sent. This can be a problem in the long run, because it might get really difficult to get enough really good people in order to progress with strategic long term projects and not just maintenance of the daily business. Another interesting pattern can occur, when the preferred supplier is very strong with their employees in a project. Now they are getting some money from the hourly rates and in order to make profit the salaries should be significantly lower than this. In order for this to make sense for the employees they can provide non-monetary incentives, like some kind of career steps. Being in a powerful position in the project, the preferred supplier has some possibilities to choose who is in the project and who gets more responsible positions. So there is a temptation to kick out people who are not from their company and to provide these attractive positions not to the person, who would be the best choice for the customer, but to those whom they want to give an incentive. This is on the expense of their customer. So in the end of the day it is usually good to rely on multiple providers for „external“ people. There are serious companies who behave professionally and correctly even when they have become a „preferred supplier“, but this is not always the case.

When the preferred supplier is providing software, for example a database, it may be possible to get a really good deal for five years. Then in five years the deal needs to be extended and becomes magically more expensive. Especially if the company knows itself in the position of a „preferred supplier“. And when this issue is discovered, maybe even a year before, it is already too late to migrate. And then the expensive software needs to be used for another four years until it is again too late… And being from a big, impressive company does not necessarily make the software good. Counterexamples exist. In the case of databases I have seen companies that follow a strategy of multiple databases and that require a good reason for using the more expensive solutions. And magically the position of the buyer becomes much stronger when the deal needs to be extended for another five years. Maybe the overly expensive database will even be kicked out at some time. And yes, this expensive database has some really cool and pretty unique features. Unfortunately they only come in some enterprise edition that would be even much more expensive than the regular one, while open source databases provide decent, but less sophisticated variants of these enterprise features for a price that is less than the base version of the expensive database product. But, since the DB product cannot easily be changed, it is important to make a wise choice and to consider different options, including the more expensive ones, when starting a project. And to pick what makes sense for the specific needs.

Some interesting observations where made, when some preferred supplier made a really tempting offer for operating all the servers of a larger company. The annual price was really low. Much cheaper than doing it with their own team. Now suddenly the need arose, to store some really large amount of log files. I mean, I am talking about what could be stored on a few USB-discs, that could have been bought in the supermarket for a few hundred Euros. But of course this was forbidden, because the servers had to be run by the preferred supplier and even putting Linux on a few PCs that were no longer needed and attaching a few cheap disks would have been ruled out by the cheap overall contract. But this cheap solution would have been absolutely sufficient for the purpose. Now the diskspace could be bought from this supplier. Or more precisely rented. It was not a few hundred Euros, but a few hundred thousand Euros a year. Yeah, trey needed to make some money somehow… And a few hundred Euros once every few years, or maybe even a few thousand Euros every year would have been totally acceptable to pay by the project. But there are limits. You cannot do certain things under such conditions. The deal kills important possibilities of the IT people. I am not going to write, how this was resolved…

Another story, with a really cheap preferred supplier: They actually ran an important database for a stunningly low fixed base price. And on top of that it was paid per query. So what did they do? They designed the software in such a way that it used an Oracle database as a cache for the pay-per-query DB2 database. So the same query had to be made only once to DB2 as long as the data did not change. And when the data changed, the Oracle database just had to be cleaned up. Since this happened only a few times a year, this technically stupid architecture saved really a lot of money every year. Big money.

Yet another example: The management had already bought clearcase licenses. They were really expensive and the money was already gone. Now the setup that was used for clearcase and that was allowed by the licensing was not really optimized for part of the team working remotely. To do that efficiently would have required a much more expensive license that no one wanted to pay. So every day synchronizing the software took like 30 to 45 minutes. And one team member had to work full time to maintain clearcase. There were some other pains, like it crashed when files contained only linefeeds instead of carriage return-linefeed and some other annoying details that I do not really remember. Just for the record, some of these issues have been fixed in later releases… And clearcase had a lot of really interesting features that were not at all used. The seriously useful features can all be found in git now, in a contemporary way, of course. But not in those days, when there was still neither git nor subversion. So some tests were performed and it looked like the free software CVS (which really sucks big time when compared with contemporary systems like git) would have worked much much better for the concrete project. But clearcase had to be used because it was so expensive and the money had already been paid.

So in the end of the day, when the procurement does make good deals, this can create a lot of value for the project and allow for efficient and innovative work and for solutions that make sense technically instead of finding tricks how ot bypass the worst parts of the contract.

So a good procurement team and a good communication with the technical staff that knows what is needed for their work is a big plus for everybody, for the project and for the company.

Share Button

Just run it twice

Often we use some kind of „clustered“ environment to run our software.

This promises higher performance and better availability.

And the frameworks seem to suggest that it is just a matter of starting it twice and it will magically work correctly.

There is nothing wrong with investing some thoughts on this issue. It can actually quite wrong otherwise…

So some questions to think about:

Where are the data? Does each service have its own set of data? Or do they share the data? Or is there some kind of synchronization mechanism between the copies of data? Or is some data shared and some data as copy for each instance of the service?

Do we gain anything in terms of performance or is the additional power of the second instance eaten up by the overhead of synchronizing data? Or if data is only stored once, does this become the bottleneck?

Then there is an issue with sending the requests to the right service. Usually it is a good idea to use something like „sticky sessions“ to keep a whole session or collections of related requests on one instance. Even if the protocol is „stateless“ and „restful“.

Is there some magic caching that happens automatically, for example in persistence frameworks like Hibernate? What does this mean when running two instances? Do we really understand what is happening? Or do we trust that hibernate does it correctly anyway? I would not trust hibernate (or any other JPA implementation) on this issue.

What about transactions? If storage is not centralized, we might need to do distributed transactions. What does that mean?

Now messaging can become fun. Modern microservice architectures favor asynchronous communication over synchronous communication, where it can be applied. That means some kind of messaging or transmission of „events“ or whatever it is called. Now events can be subscribed. Do we want them to be processed exactly once, at least once or by every instance? How do we make sure it is happening as we need it? Especially the „exactly once“-case is tricky, but of course it can be done.

How do we handle tasks that run like once in a certain period of time, like cronjobs in Linux? Do we need to enforce that they run exactly once or is it ok to run them on each instance? If so, is it ok to run them at the same time?

Do we run the service multiple times on the productive system, but only a single instance on the test and development systems?

Running the service twice or more times is of course something we need to do quite often and it will become more common. But it is not necessarily easy. Some thinking needs to be done. Some questions need to be asked. And we need to find answwers to them. Consider this as a starting point for your own thinking processes for your specific application landscape. Get the knowledge, if it is not yet in the team by learning or by involving specialists who have experience…

Share Button

Project-local Libraries

Many projects have these „project-local“ or „company-local“ libraries, that can be used optionally or are even strongly imposed on developers. They may be called something like

  • core
  • toolkit
  • toolbox
  • base
  • platform
  • utils
  • lib
  • common
  • framework
  • baselib
  • sdk
  • tools

or even with a meaningful good name.

Generally there is nothing wrong with such libraries, if they are used correctly.

Some things should be observed, though: Such libraries are there to be useful. If that is not the case, it is better not to use them and to deprecate and discard them. Quite a lot of projects suffer from the fact that inferior local libraries are imposed on them and have to be used. The strive for „consistency“ is not bad either, but it should be kept to a level at which it is useful and not become a primary goal by itself.

So when need for a functionality arises, an existing library might cover this well enough. If it is not too hard to integrate, there is little need to write a „local library functionality“ for this. If no good library can be found or reasonably be integrated, it is a good idea to write what is needed oneself. It should be observed, that such a library should have slightly higher quality standards and very good automated testing, because it is more universal and the actual and future usage cannot be anticipated as easily as with „business code“. There are places, where it is good to impose a local library, to make sure that certain fields are validated in the same way across the software or even across the organization, for example. For data like phone numbers, dates, email addresses etc. that are commonly used in our world, it is probably possible to find good libraries that do this much better than any local library with reasonable team effort could do.

Now the world moves and we might discover better replacements for parts of the local library. It is a good idea to move on to these better replacements.

Things that can go into a local library are small pieces of business logic that need to be used everywhere. Think of an insurance company. Customers have customer numbers. They have to be entered into web application forms and they have to be checked for correctness, they have to be formatted etc. This could go into a library and we could be sure that everyone in the larger project uses the same rules for what is a valid customer number and how to format and parse them. It is a bit too hard to implement it again and again, because tiny errors sneak in, for example into the code to check for validity with some check sum, but it is too easy to put it into a service that can be called to check formal validity. Such a service will be there to check if some formally correct customer number is actually a real customer’s number, maybe with constraints on who can see this for which customer. And of course to generate new customer numbers, when needed.

It is always good to ask twice if certain functionality should rather sit in a service that can be called or in a library, of course depending also on the local architecture preferences. In cases where the functionality is useful for others in the organization, but where the exact same behavior is less important, it can also be an option to just copy the code to different teams. But this should be used with care, because it means that improvements do not easily flow back to benefit other users of the functionality.

Generally it is an observation that organizations rarely have good „local libraries“. The good libraries are found on the internet. Or only exist within the team. And the bad company libraries are often forced on those who cannot find ways to opt out. Good team libraries that are not known outside of the team can sometimes be a loss for the organization, because things are done twice or even worse are done differently where it would be good to have the same behavior.

Share Button

Phone Numbers and E-Mail Addresses

Most data that we deal with are strings or numbers or booleans and combinations of these into classes and collections. Dates can be expressed as string or number, but have enough specific logic to be seen as a fourth group of data. All these have interesting aspects, some of which have been discussed in this blog already.

Now phone numbers are by an naïve approach numbers or strings, but very soon we see that they have their own specific aspects. The same applies for email addresses which can be represented as strings.

Often projects go by their own „simplified“ specification of what an email address or a phone number is, how to parse, compare and render them. In the end of the day the simplification is harder to tame than the real solution, because it needs to be maintained and specified by the project team rather than being based on a proven library. And once in a while „edge cases“ occur, that cannot be ignored and that make the „home grown“ library even more complex.

Behind phone numbers and email addresses there are well defined and established standards and they are hard to understand thoroughly within the constrained time budget of a typical „business project“, because the time should be allocated to enhancing the business logic and not to reinventing the basics. Unless there is a real need to do so, of course.

Just to give an idea: When phone numbers are parsed or provided by user input, they can start with a „+“ sign or use some country specific logic to express, to which country they belong. And then the „+1“, for example, does not stand for the United States alone, but also for Canada and some smaller countries that are in some way associated with the United States or Canada. Further analysis of the number is required to know about that. The prefix for international number is often „00“, but in the United States it is „011“ and there were and are some other variants, that are still frequently used. Some people like to write something like „+49(0)431 77 88 99 11 1“ instead of „+49 431 77 88 99 11 1“. We can constrain the input to the variants we happen to think of and force the supplier of data to comply, but why bother? Why not accept legitimate formats, as long as they are correct and unambiguous?

Now for E-Mail-addresses there is the famous one page regular expression to recognize correct email addresses which is even by itself not totally complete. Find it at the bottom of the article…

Of course it includes some rarely used variants of email addresses that were once used and have not been completely abolished officially, but it is hard to draw and exact border for this.

So the general recommendation is to find a good library for working with email addresses and phone numbers. Maybe the library can even to some extent eliminate input strings that are formally complying the format, but know to be incorrect by knowing about numbering schemes world wide or about email domains or even by performing lookups.

Another strong recommendation is to store data like email addresses and phone numbers in a technical format, that is in the example of phone numbers always starting with a „+“ followed by digits only. For input any positioning of spaces is accepted, for output the library knows how to format it correctly. This allows selecting by the numbers without dealing with complex formatting, by just using the technical format in the query as well.

For Java (and thus for many JVM-languages), C++ and JavaScript there is an excellent library from Google for dealing with phone numbers. For E-Mails something like apache commons email validator is a way to go.

Keep in mind that for E-Mail addresses and phone numbers, the ultimate way of verification is to send them a link or a code that they need to enter. In the end of the day it is insufficient to rely only on formal verification without this final step.

But still issues remain for transforming data into a canonical technical format for storing them, formatting data for display etc. And there is a huge added value, if we can reliably recognize formally false entries early, when the user can still easily react to it, rather than waiting for an email/SMS/phone call being processed, which may fail when the user is no longer on our „registration site“. And we can process data which has already been verified by a third party, but still we want to parse it to recognize obvious errors.

The concrete libraries may be outdated by the time you are reading this, or they may not be applicable for the language environment that you are using, but please make an effort to find something similar.

So, please use good libraries, that are like to be found for the environment that you are using and write yourself what creates value for your project or organization. Unless your goal is really to write a better library. Better invest the time into areas where there are still no good libraries around.

And as always, you may understand email addresses and phone numbers as an example for a more general idea.


E-Mail Regex


(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?: \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\ ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?: (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t] )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*) *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\]( ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(? :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(? :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<> @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(? :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:( ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(? :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n) ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<> @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)? (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?: \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t]) *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\ .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:( ?:\r\n)?[ \t])*))*)?;\s*)

Share Button

Devoxx UA and Devoxx BE 2019

In 2019 I visited Devoxx UA in Kiev and Devoxx BE in Antwerp.
Traveling was actually a little story by itself, so for now we can just assume that I magically was at the locations of DevoxxUA and DevoxxBE.

In Kiew I attended the following talks:

On Wednesday I attended the following talks in Antwerp:

On Thursday I attended the following talks in Antwerp:

On Friday I attended the following talks in Antwerp:

That’s it…
As always, a lot of these topics deserve an article in this blog. And a lot of video recordings from the conference are worth viewing.


Share Button


Microservices are a Hype, but they have their pros and cons. Sometimes people say, that this is the magic tool to solve all problems. They hear it on conference talks, read it in the internet or even in books. It is not the first time and it won’t be the last time that we hear a promise like that. It was promised with a lot of new or newly sold technologies, like object oriented languages, scripting languages, functional languages, spiral instead of waterfall, agile instead of rup, client server instead of dumb terminals or web applications instead of fat clients or fat javascript clients instead of serverside rendered pages, cloud technologies, serverless, zero-something and now microservices. And of course a lot more. Usually we get promised an efficiency gain by a factor of two or three. So we should by now be million times faster in developing a functionality than in the good old days with assembly language on punch cards. Are we?

Such promises should not stop us from being critical and analyzing pros and cons.

But in order to achieve any benefit from micro services it is crucial to understand them well enough and to apply the concept well enough. Badly implemented micro service architectures just add the disadvantages of microservices to what we already have.

It is often heard that microservices should be made so small that it is „trivial“ to rewrite them. That may be hard to achieve and there are some good reasons why.

If we make microservices so small that they are easy to rewrite and we need only a few of them, probably our application is so trivial that we should question if it is at all necessary to split it into microservices or if we should rather build a monolith and structure it internally. On the other hand, if we have a big application, in the best case it can be combined of a huge number of such „trivial“ microservices. And we get a lot of complexity in the combination. Just imagine you are getting a house built. And the construction company just dumps a few truckloads of lego blocks in the construction site. They are all well designed, well tested, high quality and you just need to plug them together. I cannot imagine that this will ever be trivial for a huge number of services and a non-trivial application.

Another problem is that there are typically one or more spots of the application where the real complexity resides. This is not a complexity made by us, but it is there because of the business logic, no matter how well we look at the requirements, understand them, work on them to get something better that is easier to build. Either these microservices tend to become bigger or they are more connected with other parts of the system than would be desirable for a microservice. So we end up with one or a few relatively fat „micro“services and some smaller ones, the smallest ones actually relatively trivial to rewrite. But we can keep everything non-essential out of these central services and make them at least as simple as reasonably possible.

Now we do have issues. How do the services communicate? How do they share data? By the book each service should have its own database schema. And they should not access each other’s DB schemes, because services should be independent from each other and not be using the DB as integration layer. In practice we see this rule applied, but in the end there is one database server that runs all the schemes for all the microservices. But we could split that up and move the database closer to the service.

Now there is some data that needs to be shared between services. There are several ideas how to accomplish this. The most basic principle is that services should be cut in such a way that little data needs to be shared. But there is also the pattern of having a microservice that operates like a daemon and a companion service that can be used for configuring the daemon service. There are some advantages in splitting this up, because the optimizations are totally different, the deployment patterns are different, the availability requirements are different. And it can be a good idea to build a configuration on the config service, test it on a test system and them publish it to the productive service, when it is complete, consistent and tested. Still these two services closely belong together, so sharing between them is well understood and works well. More difficult is it when it comes to data between really different services. We assume that it has been tried to eliminate such access to data from other services and it only remains for a few cases. There can be data that really needs to be in one place and the same for everybody. We can for example think of the accounts to log in. They should be the same throughout the system and without any delays when changes occur. This can be accomplished by having one service with really good availability. In other cases we can consider one service as the owner of the data and then publish the data to other services via mechanisms like Kafka. This is powerful, but setting up a good and reliable Kafka infrastructure for high performance and throughput is not trivial. When it comes to multimaster data and bidirectional synchronization, it gets really hard. We should avoid that. But it can be done when absolutely needed.

When services talk to each other, we should prefer mechanisms like JMS or Kafka over REST or SOAP calls, because it avoids tight coupling and works better, if one service is temporarily not available. Also we should avoid letting one service wait for the response from another service. Sometimes REST calls are of course the only reasonable choice and then they should be used. We need to address issues as to what happens if the service is not available and how to find the service.

When moving to a microservice architecture it is very easy to „prove“ that this approach „does not work“ or has „only disadvantages“ over a monolithic architecture. But microservices are a legitimate way to create large applications and we should give this approach a chance to succeed if we have decided to follow this road. It is important to embrace the new style, not to try to copy what we would have done in a monolith, but to do things in the microservice way. And it is important to find the right balance, the right distribution and the right places for some allowed exceptions like REST instead of JMS or data sharing.


Share Button



Software often contains a logging functionality. Usually entries one or sometimes multiple lines are appended to a file, written to syslog or to stdout, from where they are redirected into a file. They are telling us something about what the software is doing. Usually we can ignore all of it, but as soon as something with „ERROR“ or worse and more visible stack traces can be found, we should investigate this. Unfortunately software is often not so good, which can be due to libraries, frameworks or our own code. Then stack traces and errors are so common that it is hard to look into or to find the ones that are really worth looking into. Or there is simply no complete process in place to watch the log files. Sometimes the error shows up much later than it actually occurred and stack traces do not really lead us to the right spot. More often than we think logging actually introduces runtime errors, that were otherwise not present. This is related to a more general concept, which is called observer effect, where logging actually changes the business logic.

It is nice that log files keep to some format. Usually they start with a time stamp in ISO-format, often to the millisecond. Please add trailing zeros to always have 3 digits after the decimal point in this case. It is preferable to use UTC, but people tend to stick to local date and time zones, including the issues that come with switching to and from daylight saving time. Usually we have several processes or threads that run simultaneously. This can result in a wild mix of logging entries. As long as even multiline entries stay together and as long as beginning and end of one multiline entry can easily be recognized, this can be dealt with. Tools like splunk or simple Perl, Ruby or Python scripts can help us to follow threads separately. We could actually have separate logs for each thread in the first place, but this is not a common practice and it might hit OS-limitations on the number of open files, if we have many threads or even thousands of actors as in Erlang or Akka. Keeping log entries together can be achieved by using an atomic write, like the write system call in Linux and other Posix systems. Another way is to queue the log entries and to have a logger thread that processes the queue.

Overall this area has become very complex and hard to tame. In the Java world there used to be log4j with a configuration file that was a simple properties file, at least in the earlier version. This was so good that other languages copied it and created some log4X. Later the config file was replaced by XML and more logging frame works were added. Of course quite a lot of them just for the purpose of abstracting from the large zoo of logging frameworks and providing a unique interface for all of them. So the result was, that there was one more to deal with.

It is a good question, how much logic for handling of log files do we really want to see in our software. Does the software have to know, into which file it should log or how to do log rotation? If a configuration determines this, but the configuration is compiled into the jar file, it does have to know… We can keep our code a bit cleaner by relying on program functionality without code, but this still keeps it as part of the software.

Log files have to please the system administrator or whoever replaced them in a pure devops shop. And in the end developers will have to be able to work with the information provided by the logs to find issues in the code or to explain what is happening, if the system administrator cannot resolve an issue by himself. Should this system administrator have to deal with a different special complex setup for the logging for each software he is running? Or should it be necessary to call for developer support to get a new version of the software with just another log setting, because the configurations are hard coded in the deployment artifacts? Interesting is also, what happens when we use PAAS, where we have application server, database etc., but the software can easily move to another server, which might result in losing the logs. Moving logs to another server or logging across the network is expensive, maybe more expensive than the rest of this infrastructure.

Is it maybe a good idea to just log to stdout, maintaining a decent format and to run the software in such a way that stdout is piped into a log manager? This can be the same for all software and there is one way to configure it. The same means not only the same for all the java programs, but actually the same for all programs in all languages that comply to a minimal standard. This could be achieved using named pipes in conjunction with any hard coded log file that the software wants to use. But this is a dangerous path unless we really know what the software is doing with its log files. Just think of what weird errors might happen if the software tries to apply log rotation to the named pipe by renaming, deleting, creating new files and so on. A common trick to stop software from logging into a place where we do not want this is to create a directory with the name of the file that the software usually uses and to write protect this directory and its parent directory for the software. Please find out how to do it in detail, depending on your environment.

What about software, that is a filter by itself, so its main functionality is to actually write useful data to stdout? Usually smaller programs and scripts work like this. Often they do not need to log and often they are well tested relyable parts of our software installation. Where are the log files of cp, ls, rm, mv, grep, sort, cat, less,…? Yes, they do tend to write to stderr, if real errors occur. Where needed, programs can turn on logging with a log file provided on the command line, which is also a quite operations friendly approach. Named pipes can help here.

And we had a good logging framework in place for many years. It was called syslog and it is still around, at least on Linux.

A last thought: We spend really a lot of effort to get well performing software, using multiple processes, threads or even clusters. And then we forget about the fact that logging might become the bottle neck.

Share Button

Forrest roads

In forests we usually find unpaved roads like this:

Forrest Road

Forest Road

Now we should observe how they are constructed. Usually the need to somehow allow access to areas in the forest. It does not have to be the shortest connection, it does not have to be as flat as possible, it is not required to build for high speed and it is not at all required to build for high capacity.

Often it is necessary that trucks and heavy machinery can use the road. So it needs to be constructed in a way that it withstands occasional usage by heavy vehicles, for example to extract wood from the forest or in bad cases to fight a forest fire. And the network usually provides a reasonable way for a truck with a trailer to get out again by driving forward only.

Apart from this there is one very important requirement. The construction must be cheap. It should usually be so cheap that it can easily be paid by a relatively small fraction of the money that is obtained by selling the wood from the forest. There can be a significant secondary use for recreational purposes or even as route through the forest, typically for MTBs and pedestrians, which might justify using money from other sources than the revenue from the forest. Interestingly forests in Switzerland have much denser road networks than forests in Norway or Sweden.

What we never see is forest roads that are somehow prepared for the case that it might be decided in the future to expand them to eight lanes or to transform them into high speed highways. It would be good to use a route that allows this, to already pass hills with cuts or tunnels and to build bridges already much wider than currently necessary. Real highways usually run on a dam or at least they include a thick sequence of layers that often add up to a few meters under the upper asphalt layer. There are „best practices“ about building highways, even relatively narrow highways like this one:

Highway E45 in the Taiga in northern Sweden 2014

Highway E45 in the Taiga in northern Sweden 2014

They are mostly ignored, apart from very universal principals that apply to any kind of construction.

One kilometer of forest road costs a very tiny fraction of one kilometer of such a highway with two lanes.

We should learn from this for our IT solutions. We should think how big our IT solution might actually become. How many customers do we need to serve? What kind of sophisticated functionality needs to be added? Very often we see the mistake that IT solution are built too small. They do not scale, cannot easily be expanded or simply not serve the load or the availability requirements.

Companies like Google. Facebook, Twitter, Netflix or VK serve millions of users 7×24. There is even an implicit promise that there will be no down times and people start relying on this.

We expect banking software to be accurate to the cent. Not sometimes, but always.

Running a device in a chemical plant or steering a rocket requires absolutely reliable software. Errors can be very expensive and cost human lives.

On the other hand there is plenty of software that is useful. Of course it needs to work properly. But the requirements are much lower. A typical app for a mobile phone does not need to be able to run on millions of servers simultaneously. Downtimes during software updates are no problem. Usually a small development team is sufficient to build them. And when it comes to money, the real business logic for this is usually on the server. And we just should not control a chemical plant by a mobile phone app. But mobile phone apps usually have to come at neglectable prices to the users. They either have to pay themselves by the business that they indirectly promote or by a very small amount of money per user, usually combined with a very small number of users.

It is important to really understand the requirements well and build the right size of application. Building too small is very bad, but building too big can be as bad for the project and the organization. And even if the company is lucky and discovers that the software is so attractive that a bigger solution is necessary, then maybe this is a good moment to rewrite it and consider the first version as proof of concept.

Whenever a new highway is built on a route that is previously only covered by a forest road, the highway will usually be built from scratch, ignoring the routing of the old forest road. If the forest road becomes obsolete by that, then the money for the original construction of the forest road is neglectable. Trying to transform a forest road into a highway did happen in many small steps over centuries to create part of our current road network, but it is usually not a recommended approach.

Share Button

DB Persistence without UPDATE and DELETE

When exploring the usage of databases for persistence, the easiest case is a database that does only SELECT. We can cache as much as we like and it is more or less the functional immutable world brought to the database. For working on fixed data and analyzing data this can sometimes be useful.

Usually our data actually changes in some way. It has been discussed in this Blog already, that it would be possible to extend the idea of immutability to the database, which would be achieved by allowing only INSERT and SELECT. Since data can correlate, an INSERT in a table that is understood as a sub-entity via a one-to-many-relationship by the application actually is mutating the containing entity. So it is necessary to look at this in terms of the actual OR-mapping of all applications that are running on that DB schema.

Life can be simple, if we actually have self contained data as with MongoDB or by having a JSON-column in PostgreSQL, for example. Then inter-table-relations are eliminated, but of course it is not even following the first normal form. This can be OK or not, but at least there are good reasons why best practices have been introduced in the relational DB world and we should be careful about that. Another approach is to avoid the concept of sub entities and only work with IDs that are foreign keys. We can query them explicitly when needed.

An interesting approach is to have two ID-columns. One is an id, that is unique in the DB-table and increasing for newly created data. One is the entity-ID. This is shared between several records referring to different generations of the same object. New of them are generated each time we change something and persist the changes and in a simple approach we just consider the newest record with that entity-ID valid. It can of course be enhanced with validFrom and validTo. Then each access to the database also includes a timestamp, usually close to current time, but kept constant across a transaction. Only records for which validFrom <= timestamp < validTo are considered, and within these the newest. The validFrom and validTo can form disjoint intervals, but it is up to the application logic if that is needed or not. It is also possible to select the entry with the highest ID among the records with a given entityID and timestamp-validTo/From-condition. Deleting records can be simulated by this as well, by allowing a way to express a "deleted" record, which means that in case we find this deleted record by our rules, we pretend not having found anything at all. But still referential integrity is possible, because the pre-deletion-data are still there. This concept of having two IDs has been inspired by a talk on that I saw during Clojure Exchange 2017: Immutable back to front.

Share Button

Brass Music and what we can learn for IT

The English term „music“ refers to what we actually listen to, but also to how we write it down on paper, like this:

Music handwritten by Johann Sebastian Bach

Music handwritten by Johann Sebastian Bach

This musical notation is actually like a programming language, because it allows to write down complex musical pieces on paper.

But there is of course more to it than just mechanically playing what is on the paper and dealing with the inconveniences of the musical instruments. What makes it pleasant is the interpretation and that requires skill and intuition and experience and feelings. Since this is not a music blog, I will leave this as it is and stick with the relatively irrelevant side issue of the musical notation language, how we write music.

Generally there might be issues that it is hard to read, because things look too similar, but on the other hand musicians just see it immediately and at least fast enough to work efficiently with it, so I guess that this way of writing music is generally OK.

Now we have the possibility to cover a certain range, slightly more than two octaves, efficiently. Beyond that it will get hard to count the auxiliary lines. To cover different instruments, at least three kinds of Clefs are in use and the same note usually means the same. I think that there are ways to shift the whole system by one ocatve, at least for beginners, but usually with the three clefs that is not necessary for the whole piece of music.

Now for some brass instruments we have different sizes, as for other instruments as well. So the same way to play it yields a different tone on different sizes of the instrument. Just take the recorder, which has five common sizes. They are based on different f and c notes and when you play an f-based recorder you have to adopt to this by playing an f when reading an f-note, for example by closing all wholes. On a c-based recorder you read a c (the deepest you can regularly play) and close all holes to play this. Normally people know this and can deal with this. For brass instruments a different approach was chosen. For situations where you actually want to hear an F and might actually write an F in the notes for one size of the instrument, the larger or smaller instruments just call something an „F“ which is actually not an F at all for the rest of the musical world. So for these instruments „F“ does not mean the tone that you hear, but the grip combination that you do to achieve the tone, simplified. It was supposedly meant to make it easier for relatively unskilled musicians to adapt to different sizes of their instruments, but now even professionals have to live with this.

So they invented a new mechanism to simplify things, which in the overall view makes things a lot more complicated and simplifies just something trivial that even average skilled musicians can easily learn.

I respect the musicians for what they do and I guess since they can deal with this irregularity, it is kind of OK. Or at least up to the musicians to decide if they want to fix this or not.

But we can learn a lot for IT solutions from it.

We often have the situation that we need to adapt a software for another related, but slightly different use case. And we often get the request to simplify things.

It is important to think carefully at which level we do the adaption, so that it will make sense in the long run.

And we should simplify things, but there is no point in trying to make things simpler than they actually are, this simply cannot work an will backfire.

Share Button