Project-local Libraries

Many projects have these "project-local" or "company-local" libraries, which can be used optionally or are even strongly imposed on developers. They may be called something like

  • core
  • toolkit
  • toolbox
  • base
  • platform
  • utils
  • lib
  • common
  • framework
  • baselib
  • sdk
  • tools

or they may even have a meaningful, descriptive name.

Generally there is nothing wrong with such libraries if they are used correctly.

Some things should be observed, though. Such libraries are there to be useful. If that is not the case, it is better not to use them and to deprecate and discard them. Quite a lot of projects suffer from the fact that inferior local libraries are imposed on them and have to be used. Striving for "consistency" is not bad either, but it should be kept to a level at which it is useful and not become a primary goal in itself.

So when the need for some functionality arises, an existing library might cover it well enough. If it is not too hard to integrate, there is little need to write a "local library functionality" for this. If no good library can be found or reasonably be integrated, it is a good idea to write what is needed oneself. It should be observed that such a library needs slightly higher quality standards and very good automated testing, because it is more universal and its actual and future usage cannot be anticipated as easily as with "business code". There are places where it is good to impose a local library, for example to make sure that certain fields are validated in the same way across the software or even across the organization. But for data like phone numbers, dates, email addresses etc. that are commonly used in our world, it is probably possible to find good libraries that do this much better than any local library could with reasonable team effort.
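
For phone numbers, for example, Google's libphonenumber handles parsing, validation and formatting for practically every country, far better than a home-grown local library realistically could. A small sketch of its use in Java (the phone number is just an example value):

import com.google.i18n.phonenumbers.NumberParseException;
import com.google.i18n.phonenumbers.PhoneNumberUtil;
import com.google.i18n.phonenumbers.Phonenumber.PhoneNumber;

public class PhoneCheck {
    public static void main(String[] args) throws NumberParseException {
        PhoneNumberUtil util = PhoneNumberUtil.getInstance();
        // "CH" is the default region for numbers entered without a country code.
        PhoneNumber number = util.parse("+41 44 668 18 00", "CH");
        // Validity check according to the known numbering plans.
        System.out.println(util.isValidNumber(number));
        // Canonical formatting comes for free as well.
        System.out.println(util.format(number,
                PhoneNumberUtil.PhoneNumberFormat.INTERNATIONAL));
    }
}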

Now the world moves and we might discover better replacements for parts of the local library. It is a good idea to move on to these better replacements.

Things that can go into a local library are small pieces of business logic that need to be used everywhere. Think of an insurance company. Customers have customer numbers. They have to be entered into web application forms, they have to be checked for correctness, they have to be formatted etc. This could go into a library, and we could be sure that everyone in the larger project uses the same rules for what is a valid customer number and how to format and parse one. It is a bit too hard to implement again and again, because tiny errors sneak in, for example into the code that checks validity with some checksum, but it is also too lightweight to justify calling a service just to check formal validity. Such a service will still be there to check whether a formally correct customer number is actually a real customer's number, maybe with constraints on who can see this for which customer. And of course to generate new customer numbers when needed.
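
As a minimal sketch, assuming a ten-digit customer number whose last digit is a Luhn-style check digit (the actual rules would of course be project-specific), such a shared validation could look like this in Java:

public final class CustomerNumber {

    private CustomerNumber() {
        // utility class, no instances
    }

    public static boolean isValid(String s) {
        if (s == null || !s.matches("\\d{10}")) {
            return false;
        }
        // Luhn checksum: double every second digit from the right,
        // subtract 9 from results above 9, then sum everything up.
        int sum = 0;
        boolean doubleIt = false;
        for (int i = s.length() - 1; i >= 0; i--) {
            int d = s.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    // Canonical display format: groups of three, three and four digits.
    public static String format(String s) {
        if (!isValid(s)) {
            throw new IllegalArgumentException("not a valid customer number: " + s);
        }
        return s.substring(0, 3) + "-" + s.substring(3, 6) + "-" + s.substring(6);
    }
}

With this in one shared library, every form and every batch job in the project agrees on what a valid customer number is and how it is displayed.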

It is always good to ask twice whether certain functionality should rather sit in a service that can be called or in a library, depending of course also on the local architecture preferences. In cases where the functionality is useful for others in the organization, but where exactly the same behavior is less important, it can also be an option to just copy the code to different teams. But this should be used with care, because it means that improvements do not easily flow back to benefit other users of the functionality.

Generally, the observation is that organizations rarely have good "local libraries". The good libraries are found on the internet, or they exist only within a single team. And the bad company libraries are often forced on those who cannot find ways to opt out. Good team libraries that are not known outside the team can be a loss for the organization, because things are done twice, or even worse, done differently where the same behavior would be desirable.


Testing and Micro Components

It is an interesting trend to create micro components and microservices and to use building blocks that are very easy to understand. As always, a new approach will not eliminate all problems. The problem that we are solving has an inherent complexity that we cannot beat. But usually we create a lot of complexity that would not be necessary to solve the problem, and this can be avoided. Any approach that really helps us here is welcome. If and how micro components and microservices can help us is another issue; I will just assume that they are in use.

Think of constructing a building. Now this is a very complex task, but we simplify it by building with Lego pieces instead. They are simple, easy to understand, well tested, and we just have to compose them to get the building. The building does not become trivial by that, but maybe it becomes easier to create.

In the end we want to sell a building. The fact that it is constructed of ISO and TÜV and CE and whatever certified Lego blocks is not really relevant for the users of the building. They actually want a building, not Lego blocks. Maybe they do not even care how it has been made, as long as it is good. They should not care, but usually they do, at least a bit. We do not live in an ideal world, and I would care too.

Now the building has to fit into the city. It has a role to play. It is itself a Lego block for building the city. But for the moment we are actually only constructing one building, not a whole city. So we need to provide tests showing that it works with its environment and provides the functionality that we expect. There is an API and an API contract. We need to test this API in the way it will be used in production. That means on Linux, if the production servers run Linux, not on MS-Windows. But Linux is better for development than MS-Windows anyway, unless we are working on MS-specific technologies, where the servers will run MS-Windows as well. We have to test with the database product that is used in production, for example PostgreSQL or Oracle. Tests with in-memory DBs or products like SQLite are interesting and maybe helpful for finding bugs, because such tests presumably run faster or with less overhead. But there are ways to provide the real DB to each developer, and the overhead is small once this has been prepared. A local Oracle instance is not much slower than an in-memory DB, but it allows us to inspect and change data with SQL, which can be very useful. And if we use application servers or middleware, it is mandatory to test with the same middleware as in production, no matter how much compatibility between the implementations of different vendors has been promised. In theory none of these issues exist, but I have seen them in practice, and that is what counts. Yes, I love theory. 🙂
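
One way to give every developer and every CI build the real database with little overhead is to start it in a container from the test itself, for example with something like Testcontainers and JUnit 5. A sketch (the image tag is chosen arbitrarily):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;
import static org.junit.jupiter.api.Assertions.assertTrue;

@Testcontainers
class RealDatabaseTest {

    // A real PostgreSQL instance is started in a container for the test run,
    // so the tests hit the same database product as production.
    @Container
    static final PostgreSQLContainer<?> postgres =
            new PostgreSQLContainer<>(DockerImageName.parse("postgres:16"));

    @Test
    void talksToRealPostgres() throws Exception {
        try (Connection conn = DriverManager.getConnection(
                postgres.getJdbcUrl(), postgres.getUsername(), postgres.getPassword());
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT version()")) {
            assertTrue(rs.next());
            // The version string proves we are talking to PostgreSQL itself,
            // not to an in-memory lookalike.
            assertTrue(rs.getString(1).startsWith("PostgreSQL"));
        }
    }
}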

The other thing is that the fact that the Lego pieces have been tested does not guarantee that the whole building will work. The composition creates something special that is more than the sum of its pieces. That is what we want to achieve. So we need to test it.

So it is good to have automated tests that run against the service using the API that is used in production, for example REST or SOAP. And to run these tests against an instance of the application that runs on the same kind of server, against the same kind of database, and on the same kind of middleware as in production. At least continuous integration should work like that. The smaller our components are, the more important it becomes to test them in conjunction. Problems will occur there, even though the components by themselves seem to work perfectly.
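
Such a test can be as simple as calling the deployed instance over HTTP, exactly as a production client would. A sketch with JUnit 5 and the JDK's own HttpClient (the URL and the endpoint are made up):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class CustomerApiIT {

    // In CI this would point at an instance deployed on the
    // production-like stack (same OS, database and middleware).
    private static final String BASE_URL =
            System.getProperty("api.baseUrl", "http://localhost:8080");

    @Test
    void existingCustomerIsFound() throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + "/customers/1234567890"))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // The API contract says an existing customer returns 200.
        assertEquals(200, response.statusCode());
    }
}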

Now there is demand for running more local tests, unit tests as they are commonly called, to actually test Lego blocks or small groups of Lego blocks. They are good for locating problems, because the end-to-end tests only provide the information that there is a bug, which is then hard to locate. And without thorough testing of the subcomponents, it is possible that we overlook edge cases in the overall tests that will hit us in production anyway, because only the more local tests can explore the boundaries that are visible to them but obfuscated when accessed indirectly. Also we do not want to be happy with two bugs that cancel each other out, because they will not always work together in our favor. So at the end of the day it is important to have automated tests at all levels. And to make them an important issue.

I have seen many projects where unit tests eroded. They were once written, but at some point no longer worked and were hard or even impossible to fix, with the time to do so of course lacking. Or the software was hiding access points that were needed for testing. Just two examples of this for our Java friends: methods can be made package private and test classes placed in the same package to allow testing. And application servers need to actually allow certain remote access, usually via REST or SOAP these days. And we need remote debugging access. This is all easy to achieve, but it can be made hard by company policies, even on development workstations and continuous integration servers.
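
The package private trick looks like this (class and package names are made up):

// src/main/java/com/example/billing/InvoiceCalculator.java
package com.example.billing;

public class InvoiceCalculator {

    // Deliberately package private: not part of the public API,
    // but reachable from test classes placed in the same package.
    long roundToCents(double amount) {
        return Math.round(amount * 100.0);
    }
}

// src/test/java/com/example/billing/InvoiceCalculatorTest.java
package com.example.billing;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class InvoiceCalculatorTest {

    // Same package as the class under test, so the package private
    // method is visible without making it public.
    @Test
    void roundsToWholeCents() {
        InvoiceCalculator calc = new InvoiceCalculator();
        assertEquals(1234, calc.roundToCents(12.34));
    }
}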


Micro Component Antipattern

Larger software certainly has to be structured, otherwise we build ourselves a monster.

Now one can try to define components that are as small as possible. The complexity within each component is thereby reduced and becomes manageable. But we get a problem, because the number of components becomes very large, and connecting all these components with each other brings a very high complexity of its own.

We can see it in books with a well-done structure: there is a hierarchy of sections, and on each level we find about 2-10 subentries per entry of the next higher level, not 50 chapters that are not grouped in any way.

We have such levels as "part", "chapter", "section"... available in software architecture as well, although it depends on the technology which hierarchy levels are available for this structuring, for example method – class – package – library – application, and we should use them with care. What belongs together and what does not?

There is another aspect, though. An overly fine-grained decomposition often closes off paths that we would like to take.

An example: a multiplication of matrices with complex numbers as elements is to be implemented. Now it is very elegant to implement the matrix multiplication once, using an abstract numeric type. For each of these numeric types, the basic operations are implemented.
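
A minimal sketch of this polymorphic approach in Java (the interface and all names are made up):

// The abstract numeric type: just the operations the matrix code needs.
interface Arithmetic<T extends Arithmetic<T>> {
    T add(T other);
    T mul(T other);
}

// Complex numbers as one implementation of the abstract type.
record Complex(double re, double im) implements Arithmetic<Complex> {
    public Complex add(Complex o) { return new Complex(re + o.re, im + o.im); }
    public Complex mul(Complex o) {
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }
}

final class Matrices {

    // One generic multiplication for all element types. Elegant, but it
    // cannot reorder the element operations or reach into the real and
    // imaginary parts, which is exactly the limitation discussed below.
    static <T extends Arithmetic<T>> T[][] multiply(T[][] a, T[][] b, T zero) {
        int n = a.length;
        int k = b.length;
        int m = b[0].length;
        @SuppressWarnings("unchecked")
        T[][] c = (T[][]) java.lang.reflect.Array.newInstance(zero.getClass(), n, m);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                T sum = zero;
                for (int l = 0; l < k; l++) {
                    sum = sum.add(a[i][l].mul(b[l][j]));
                }
                c[i][j] = sum;
            }
        }
        return c;
    }
}

The same multiply then works for rationals, intervals or any other element type that implements the interface.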

But this can lead to problems with respect to performance and also with respect to numerical accuracy when working with floating point numbers. Much more performant algorithms for this matrix multiplication can be found if it is written specifically for complex numbers and can access the real and imaginary parts of the matrix elements directly. The rounding errors, however, are particularly treacherous. To reduce rounding errors, one needs access to the calculations and must be able to control their order and association.

Here is an example:

a=3.0
b=4.0
c=5e30
d=-5e30

If we now compute (a+b)+(c+d), we get 7.0, but with (a+c)+(b+d) we get 0.0.
In irb (Ruby) it looks roughly like this:


$ irb
irb(main):001:0> a=3.0
=> 3.0
irb(main):002:0> b=4.0
=> 4.0
irb(main):003:0> c=5e30
=> 5.0e+30
irb(main):004:0> d=-5e30
=> -5.0e+30
irb(main):005:0> (a+b)+(c+d)
=> 7.0
irb(main):006:0> (a+c)+(b+d)
=> 0.0
irb(main):007:0>

Of course, this error can then propagate arbitrarily far.

This does not change the fact that an approach based on polymorphism is the right way in the vast majority of cases, as long as one does not depend on this kind of optimization.

It remains true, however, that when splitting software into components the right granularity should be chosen: no micro components, but also no arbitrary merging of things that do not belong together just to reach the right component size.
