Unit and Integration Test with Databases

From an ideological point of view, tests that involve the database may be unit tests or may be integration tests. For many applications the functional logic is quite trivial, so it can be quite pointless to write „real unit tests“ that work without the database. Or it may be useful, depending on the application.

But there is a category of tests that we should usually write: tests that involve the database and the database access layer and possibly the service layer on top of that, if the layers are separated in that way. If not, they still test the service that relies heavily on a database. I do not care too much whether these tests are called unit tests or integration tests, as long as the terms are used consistently within the project or within the organization. If they are called integration tests, then achieving a good coverage with these integration tests is important.

Now some people promote using an in-memory database for this. At the beginning of the tests, this database is initialized, filled with empty tables and on top of that with some test data. This costs almost no time, because the DB is in memory.

While this is a valid way to go, I do not recommend it. First of all, different database products always behave differently. And usually it is a bad idea to support more than one, because it significantly adds to the complexity of the software or degrades the performance. So there is the one database product that is used for production, for example PostgreSQL or Oracle or MS SQL Server or DB2. That is what needs to work. That means the tests need to run against this database product, at least sometimes. This may be slow, so there is a temptation to use H2 instead, but that means writing tests in such a way that they support both DB products and enhancing the code to support the H2 database as well, at least to a minimal extent. This is additional work and additional complexity to maintain.

What I recommend is: use the same database product as for production, but tune it specifically for testing. For example: you can run it on a RAM disk. Or you can configure the database in such a way that transactions do not insist on the „D“ of „ACID“. This does not break functionality, but causes data loss in case of a crash, which is something we can live with when doing automated tests. PostgreSQL allows this optimization for sure, possibly other products as well.
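With PostgreSQL, for example, the relevant switches are the so-called non-durable settings. A sketch of a postgresql.conf fragment for a disposable test instance (never for production) could look like this:

# trade durability for speed -- acceptable for throwaway test databases
fsync = off
synchronous_commit = off
full_page_writes = off
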
Now for setting up the database installation, possibly with all tables, we can use several techniques that allow us to have a database instance with all table structures and some test data on demand. Just a few ideas that might work:

  • Have a directory with everything in place, copy it onto the RAM disk and start the database from there
  • Have a few instances ready to use, dispose of them after the tests and immediately prepare fresh ones for the next test run
  • Have a docker image that is copied (see the sketch below)
  • Have a docker image that is immutable in terms of its disk image and use a RAM disk for all variable data, much like Knoppix
  • Do it with VMware or VirtualBox instead…
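
Just as a rough sketch of the docker variant (image tag, password and port are arbitrary choices here): the official postgres image lets us keep the data directory on a tmpfs and pass the non-durable settings on the command line:

docker run -d --name pg-test \
  --tmpfs /var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=test \
  -p 5432:5432 \
  postgres:13 \
  -c fsync=off -c synchronous_commit=off -c full_page_writes=off

Every test run can start from such a container and the container can simply be thrown away afterwards.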

At the end of the day, there are many valid ways. And in some way or other, a few master images have to be maintained and all tests start from such a master image. How it is done, whether on classical servers, on virtual servers, on a private cloud, on a public cloud or on the developer’s machine, does not really matter, as long as it works efficiently.

If there is no existing efficient infrastructure, why not start with docker images that contain the database?

We need to leave the old traditional thinking that the database instance is expensive and that there are few of them on the server…

We can use the database product that is used for production environments for development and testing as well, and have separate, fresh images for testing. We should demand the right thing and work hard to get it, instead of spending too much time on non-solutions that only work 80%. It can be done right. And laziness is a virtue, and doing it right in this case serves exactly this virtue.


Electronic Vaccination Certificates

It is becoming increasingly important to have a way to easily prove being vaccinated against Covid-19 in a way that works at least throughout Europe.

This can be a piece of paper and it can be an app.

People are very concerned about faking the certificate.

This can happen and it is of course a serious crime.

The problem is that this is needed really fast, and thinking of a typical software project, it is very ambitious to go live in two months if it has not really started yet.

But leaving that timing issue aside, is it a reasonable idea?

Actually there are already apps for railway tickets, boarding passes and of course for helping to prove one’s identity when logging into some e-banking systems.

In Switzerland railroad tickets are either bought for a specific ride, or people buy a „flat rate“ and can use any train within Switzerland as often as they want, having already paid for it with the annual payment. And there is a combination of both: people pay a much lower annual rate and get their individual tickets cheaper.

Now there is a mobile app and a chip card. Both provide a QR code. This allows the trainmen to scan it with the app on their phones and to see the photo of the person and the person’s subscriptions. The actual ticket for passengers without the flat rate is another QR code. This has been in use for a few years now and seems to work quite well.

Of course this is based on the fact that railroad trainmen are a relatively small, trusted group that has access to this kind of information about travelers. That is a different situation from having a terminal to scan the app at every cinema, restaurant etc. throughout Europe. The photo is a convenience, but it could be replaced by showing an ID. The QR code could be signed with the private key of a health agency, whose public key is available to the public and makes it possible to verify that the content is correct. This way the QR code could carry all the information without using a server. The other variant would be that the QR code is a key for a lookup on a server. But not depending on storing the important data on a server and instead having it all in the QR code is actually a good idea.

If it is desired to provide test results as well, then it might be a bit easier to provide access to a server and store all the information there. But vaccinations and tests can be done in any country, it is probably neither reasonable nor desirable to implement some mutual data exchange between the servers, and it is probably also not possible to tap the NSA servers for this information. So it will probably end up with one QR code for each event anyway, where events are tests, vaccinations and maybe recoveries from the actual Covid-19 sickness. In this case the certificates could work without accessing a server.
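
To make the signing idea a bit more concrete, here is a minimal sketch in Java using only the JDK’s built-in cryptography. The payload string and the choice of elliptic curve signatures are assumptions for the purpose of illustration; the real certificate format, encoding and key management would of course look different.

import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.util.Base64;

public class CertificateSignatureSketch {
    public static void main(String[] args) throws Exception {
        // The health agency holds the private key, the public key is published.
        KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
        KeyPair agencyKeys = generator.generateKeyPair();

        // Hypothetical payload describing one event (vaccination, test or recovery).
        String payload = "name=...;birthdate=...;event=vaccination;date=2021-05-01";
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);

        // Sign with the agency's private key; payload plus signature go into the QR code.
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(agencyKeys.getPrivate());
        signer.update(data);
        String signatureBase64 = Base64.getEncoder().encodeToString(signer.sign());

        // Verification needs only the public key, no server access.
        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(agencyKeys.getPublic());
        verifier.update(data);
        boolean valid = verifier.verify(Base64.getDecoder().decode(signatureBase64));
        System.out.println("signature valid: " + valid);
    }
}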

The whole functionality is probably not too difficult to build, but these security related software projects tend to require working even more thoroughly than usual. It is probably also important to look into the issue of data privacy, which might complicate things a lot, especially because anybody in the world who is shown the certificate can read it and thereby access sensitive data of the user. And users have to show it every day on many occasions…


Spline Approximation (Mathematics II)

This is the third article in a series about spline approximation. If you have not done so, you should start by reading the previous articles.

So we are looking for a function g which approximates a given set of points \{(\xi_j, \eta_j)\}_{j=1\ldots N} as a linear combination

    \[(1)\thickspace g(x) = \sum_{i} a_i f_i(x)\]

of functions f_i, as described in the previous article with

    \[(2)\thickspace f(x)= \begin{cases} 0 &\text{for } x \le -2\\ (x+2)^3&\text{for } -2 < x \le -1\\ 1+3(x+1)+3(x+1)^2-3(x+1)^3&\text{for } -1 < x \le 0\\ 1+3(1-x)+3(1-x)^2-3(1-x)^3&\text{for } 0 < x \le 1\\ (2-x)^3&\text{for } 1 < x \le 2\\ 0&\text{for } x > 2 \end{cases}\]

and with h chosen such that x_i=x_0+i*h for all i

    \[(3)\thickspace f_i(x)=f(\frac{x-x_i}{h}) \text{ for } i=-1,\ldots,n+1\]

Now up to four of the f_i can overlap on one sub-interval. Assume that the sub-interval is

    \[S=[x_i,x_{i+1}]\]

and the arithmetic mean of the borders is

    \[X = \frac{1}{2}(x_i+x_{i+1}) = x_i + \frac{h}{2} = x_{i+1} - \frac{h}{2}\]

This interval is touched by the basis functions f_{i-1}, f_i, f_{i+1}, f_{i+2}, so the function g can be written for x\in S as

    \[g(x)=af_{i-1}(x)+bf_i(x)+cf_{i+1}(x)+df_{i+2}(x)\]

    \[= af\left(\frac{x-x_{i-1}}{h}\right)    +bf\left(\frac{x-x_{i}}{h}\right)    +cf\left(\frac{x-x_{i+1}}{h}\right)    +df\left(\frac{x-x_{i+2}}{h}\right)\]

    \[= a\left(2-\frac{x-x_{i-1}}{h}\right)^3\]

    \[+b\left(1+3\left(1-\frac{x-x_{i}}{h}\right)+3\left(1-\frac{x-x_{i}}{h}\right)^2-3\left(1-\frac{x-x_{i}}{h}\right)^3\right)\]

    \[+c\left(1+3\left(\frac{x-x_{i+1}}{h}+1\right)+3\left(\frac{x-x_{i+1}}{h}+1\right)^2-3\left(\frac{x-x_{i+1}}{h}+1\right)^3\right)\]

    \[+d\left(\frac{x-x_{i+2}}{h}+2\right)^3 \text{ using (2)}\]

Now we have

    \[x_{i-1}=X-\frac{3h}{2}\]

    \[x_{i}=X-\frac{h}{2}\]

    \[x_{i+1}=X+\frac{h}{2}\]

    \[x_{i+2}=X+\frac{3h}{2}\]

and thus

    \[g(x) =a\,\left(2-{{x+{{3\,h}\over{2}}-X}\over{h}}\right)^3\]

    \[+b\,\left(-3\,\left(1-{{x+{{h}\over{2}}-X}\over{h}}\right)^3+3\,\left(1-{{x+{{h}\over{2}}-X}\over{h}}\right)^2+3\,\left(1-{{x+{{h}\over{2}}-X}\over{h}}\right)+1\right)\]

    \[+c\,\left(-3\,\left({{x-{{h}\over{2}}-X}\over{h}}+1\right)^3+3\,\left({{x-{{h}\over{2}}-X}\over{h}}+1\right)^2+3\,\left({{x-{{h}\over{2}}-X}\over{h}}+1\right)+1\right)\]

    \[+d\,\left({{x-{{3\,h}\over{2}}-X}\over{h}}+2\right)^3\]

Now let

    \[y=x-X\]

and hence

    \[x=y+X\]

and substitute that:

    \[g(x)=a\,\left(2-{{y+{{3\,h}\over{2}}}\over{h}}\right)^3\]

    \[+b\,\left(-3\,\left(1-{{y+{{h}\over{2}}}\over{h}}\right)^3+3\,\left(1-{{y+{{h}\over{2}}}\over{h}}\right)^2+3\,\left(1-{{y+{{h}\over{2}}}\over{h}}\right)+1\right)\]

    \[+c\,\left(-3\,\left({{y-{{h}\over{2}}}\over{h}}+1\right)^3+3\,\left({{y-{{h}\over{2}}}\over{h}}+1\right)^2+3\,\left({{y-{{h}\over{2}}}\over{h}}+1\right)+1\right)\]

    \[+d\,\left({{y-{{3\,h}\over{2}}}\over{h}}+2\right)^3\]

    \[={{d\,y^3}\over{h^3}} -{{3\,c\,y^3}\over{h^3}} +{{3\,b\,y^3}\over{h^ 3}} -{{a\,y^3}\over{h^3}}\]

    \[+{{3\,d\,y^2}\over{2\,h^2}} -{{3\,c\,y^2}\over{2\,h^2}} -{{3\,b\,y^2}\over{2\,h^2}} +{{3\,a\,y^2}\over{2\,h^2}}\]

    \[+{{3\,d\,y}\over{4\,h}} +{{15\,c\,y}\over{4\,h}} -{{15\,b\,y}\over{4\,h}} -{{3\,a\,y}\over{4\,h}}\]

    \[+{{d}\over{8}} +{{23\,c}\over{8}} +{{23\,b}\over{8}} +{{a}\over{8}}\]

    \[=\frac{d-3\,c+3\,b-a}{h^3}\,y^3 +\frac{3\,d-3\,c-3\,b+3\,a}{2\,h^2}\,y^2\]

    \[+\frac{3\,d+15\,c-15\,b-3\,a}{4\,h}\,y +\frac{d+23\,c+23\,b+a}{8}\]

This will be used to actually calculate g in programs.
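
As an illustration of how this last formula could be turned into code, here is a minimal sketch in Java; the class and method names are made up for this example, and the upcoming „Cookbook“ article may of course structure this differently. It evaluates g on the sub-interval [x_i, x_{i+1}] from the four coefficients a=a_{i-1}, b=a_i, c=a_{i+1} and d=a_{i+2}, using Horner’s scheme on the polynomial in y.

public class SplineEvaluationSketch {

    /**
     * Evaluates g(x) on the sub-interval [xi, xi + h].
     * a, b, c, d are the coefficients of the four basis functions
     * f_{i-1}, f_i, f_{i+1}, f_{i+2} that are non-zero on this sub-interval.
     */
    public static double evaluate(double x, double xi, double h,
                                  double a, double b, double c, double d) {
        double bigX = xi + 0.5 * h; // midpoint X of the sub-interval
        double y = x - bigX;        // y lies in [-h/2, h/2]
        double c3 = (d - 3 * c + 3 * b - a) / (h * h * h);
        double c2 = (3 * d - 3 * c - 3 * b + 3 * a) / (2 * h * h);
        double c1 = (3 * d + 15 * c - 15 * b - 3 * a) / (4 * h);
        double c0 = (d + 23 * c + 23 * b + a) / 8;
        return ((c3 * y + c2) * y + c1) * y + c0; // Horner's scheme
    }
}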

Disclaimer: I am not an expert in numerical analysis. While I believe that the approach this article comes up with is sound and useful, I do believe that an expert in numerical analysis could still improve the accuracy of the calculations.

How to actually program it will be covered in the upcoming article „Spline Approximation (Cookbook)“. The link will be added when it is available.


Git for Linux System Engineering

Git has now become the standard version control software used by software developers.

For system engineering and system administration purposes, it used to be a common approach to just log in to the server, do something and remember it, or maybe note it down somewhere.

Some people tried to just use RCS or SCCS on the server to create a local repository for the versions of a configuration file. Some thought is needed as to whether this is OK, because in the process of checking in a file it might be temporarily unavailable or inconsistent, since RCS can modify files in the process. Usually this is not a problem, but when these kinds of problems occur, they are kind of weird to find and fix.

Today the approach has changed a lot. With virtualization it has become possible to build up thousands of servers, or even millions if you go to companies like Google. Typically one server performs a specific task and another server is used for another specific task, and since almost all of them are virtual, that is possible. Thus it has become necessary to run and maintain thousands of servers, and that means that things have to be done really efficiently.

An important approach is to automate a lot of things. This can be done for example using Ansible or Perl, Ruby or Python. Or combinations of these. Or there can be a cloud that supports a lot of features out of the box. And tools that automate certain tasks…

So at the end of the day, it is no longer the typical approach to log in to a server and do certain things there, but to work on some kind of master server, prepare things and then run them on multiple servers from there. The scripts and configuration files are created on this master server, and now it becomes perfectly natural to use git to check them in.

Another engineer can clone the repository and perform the same task on a new server, for example.

Also it is a good idea to create branches, work with a branch on a test server until it looks good and then merge it into the main branch. Even tags and release numbers make sense in some environments.
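
A typical flow could look like this (branch and tag names are made up):

git checkout -b tune-tomcat-memory        # work on a branch
# ... edit scripts and configuration files, try them against the test server ...
git commit -a -m "tune tomcat memory settings"
git checkout main                         # or master, depending on the repository
git merge tune-tomcat-memory
git tag -a rollout-2021-05 -m "configuration rolled out to all servers"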

Only one tiny difference: while software developers tend to use IDEs like IntelliJ IDEA, Eclipse or Visual Studio, or editors like Atom or Emacs, system engineers tend to use vi. This has several advantages: first of all, it is installed on every Linux server (unless you create one and uninstall it to disprove this, of course), it runs well across an ssh session even with relatively slow internet connections, and it is powerful once you learn how it works. So I recommend learning just the basics to get simple stuff done. And I recommend avoiding heavy customization of vi, because that might cause problems when working on a server without the customization.

And even though today work is done mostly on these master servers, it does of course still happen that it becomes necessary to log in to a specific server, to check whether things worked as desired or to do tiny tasks that have not been automated. But remember: laziness is good. You should hate doing repetitive work on multiple servers and figure out how to automate it, if it happens for the third time or if it can be anticipated that it will happen more often. Automation not only creates efficiency, it also allows for better collaboration (via git), better consistency and better stability. It is the way to go.


Starting processes while booting (Linux)

When a Linux system is booted, we want certain processes to run immediately.

In the old days, that is 25 years ago or so, this was done in „BSD-style“ by having certain magical shell scripts that start everything in the right order. When adding another service, this just had to be added to the shell script and that was it.

Then this was replaced by System-V-style init. So the system had certain runlevels. The desired runlevel was configured somewhere. And for each runlevel there was a directory which contained a lot of scripts or mostly softlinks to scripts. The scripts starting with S were used for starting and the scripts with K were used for stopping. The ordering was achieved by adding a two digit number immediately after the letter S or K.

Important runlevels were:

  • single user mode, which is a very special way to boot the system for maintenance tasks that cannot otherwise be achieved. I have used this only a few times in my life when the system was really seriously messed up…
  • multi user mode with network and no graphical UI. This is what most servers are running
  • multi user mode with graphical UI. This is what most Laptops are running

It was possible to boot into a certain runlevel by configuration. And to switch to another runlevel…

Now this has given way to another, more versatile approach called systemd.

Processes that need to be started are configured by a so-called service file. It contains information about how to start and stop the process and about dependencies on other services or on abstract groups called „targets“. Runlevels are expressed as targets and have names instead of numbers.
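
A minimal service file could look roughly like this; the name „something“ and the path of the executable are placeholders:

# /etc/systemd/system/something.service
[Unit]
Description=Something server (example)
After=network.target

[Service]
ExecStart=/usr/local/bin/something --serve
Restart=on-failure

[Install]
WantedBy=multi-user.target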

A service can be enabled by

systemctl enable something.service

which means it will be started automatically when booting.

It can be disabled with

systemctl disable something.service

And it can be started and stopped with

systemctl start something.service
systemctl stop something.service

The status is queried with

systemctl status something.service

There is a lot more to discover… For example, there are timers to run a certain task repeatedly (instead of putting it into crontab).
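
As a sketch, such a timer is just another small unit file (names are placeholders) that triggers a service of the same name:

# /etc/systemd/system/something-cleanup.timer
[Unit]
Description=Run something-cleanup.service every night

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target

It is enabled and started like any other unit, for example with systemctl enable --now something-cleanup.timer.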

If you need to shut down or start a service at certain times, it is a good idea to always do this via systemctl and to call systemctl from the script instead of going directly to the startup script of the process, because systemctl start/stop stores information that allows systemctl status to work.

It should be observed that systemd not only runs the commands or scripts to start and stop services, but also keeps track of the processes that are created by this. So it is very important to always start and stop services via systemctl and not to bypass it.

It is nice how beautiful and consistent this solution is compared to the previous solutions…


Perl & Java

How do we use Perl and Java together? Unlike with many other languages, which for example run in the JVM, it is not particularly easy to combine the two directly. But that is not the idea.

A good starting point is thinking about houses and furniture. While it is perfectly possible to build houses of wood or furniture of concrete, the common practice is to build the house of concrete, stone (plus some other „hard“ materials) and to build furniture mostly of wood or materials that are somewhat like wood. There is a point behind this. The house should be durable and it remains the same for a long time. Changing the house is an expensive operation.

Furniture is individual, and different people who use different parts of the house usually bring their own furniture. It is easy to change and it does not need to be as hard, because being inside the house it is protected from wind, snow, rain and to some extent even temperature changes.

And now think about the tools that we use to build the house. They are not made of concrete or stone at all.

Materials have their strength and weaknesses. We can use the common materials for the task. We can substitute them to some extent. Good wooden houses exist, but wooden skyscrapers are rare.

We also have different programming languages with certain strengths and weaknesses. We can substitute them to some extent as well. Good programs exist that are written in many different languages, even combinations of languages. Who likes PHP? And who likes Wikipedia?

There are many good JVM languages, like Java, Scala, Kotlin, Clojure, JRuby, Ceylon, Groovy… They are widely accepted by the people who manage the budgets. And the operations teams know how to work with software in JVM languages. It is easy to find developers, maybe a lot easier for Java than for Ceylon. Ceylon and Kotlin came out at the same time and with very similar goals. Simply said: Kotlin won and Ceylon lost. And the performance of JVM languages is close enough to that of native C/C++, which is an amazing success of the developers who wrote and improved the JVM. We kindly ignore that this comes at the expense of startup times and memory consumption, but even these issues are being worked on and we will see further progress with GraalVM. It runs on Linux servers, which is what we usually want to use, but it would also run on other common servers. This is excellent building material for houses.

The cons of JVM software: high memory footprint and long startup times. The memory footprint can be solved by throwing some money at it, and usually that is a reasonable option when looking at the big picture. Startup times do not really matter, because we just let our servers run for a long time.

So for scripts that do small to moderate-sized jobs on the command line, Perl seems to be superior to Java. When it comes to parsing text and using regular expressions, Perl profits from having the best regular expression engine, second maybe to that of Raku, the former Perl 6.

Perl can be used to edit Java programs. So you write a Perl script that performs simple or more complex transformations on a possibly large set of Java classes, or that even creates Java classes. This allows doing things that are not available in the „refactoring tab“ of the IDE.
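
Just as an illustration (the method names are made up): a simple rename across a whole source tree can be a one-liner, while anything more complex becomes a small Perl script:

perl -p -i.bak -e 's/\bgetCustomerNumber\b/getCustomerId/g' $(find src -name '*.java')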

You can call Perl programs from Java to perform tasks that are impossible or too difficult to do in Java. Because Java chose to be „OS-independent“, certain low-level OS functionality cannot be accessed from Java, but it can be from Perl.
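
As a small illustration (the script name and its arguments are made up), calling a Perl script from Java can be as simple as this:

import java.io.IOException;

public class CallPerlFromJava {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical Perl script that gathers some low-level OS information.
        ProcessBuilder builder = new ProcessBuilder("perl", "collect-os-info.pl", "--verbose");
        builder.inheritIO(); // forward the script's stdout and stderr to our own
        Process process = builder.start();
        int exitCode = process.waitFor();
        System.out.println("perl script finished with exit code " + exitCode);
    }
}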

And Perl can be used to start Java programs. Today we see really horrible shell scripts to start something like Tomcat. For the Windows variant a bat file is added that is even worse. A Perl script can set up the environment for such a Java program and start it much more easily, and there is no need to write it twice. Or Perl scripts can be used to perform installation tasks for the software.

Another interesting application is testing. Perl can be used to create automated tests, to prepare test data or to parse result data. Or to anonymize data.

Most of these things can just as well be done with Python and Ruby and probably some other languages. For certain tasks Ruby or Python are probably even better than Perl. And you can very well think of other JVM languages instead of „Java“. Many of these things can also be done with specific tools that serve the task well. That can be a good idea, but it can also constrain you to what the tool supports, as in the refactoring case with the IDE as a tool.

It is sometimes a good idea, to combine a language like Clojure, Java, C, C++, Scala with a scripting language like Ruby, Python or Perl.


Happy Easter

Happy Easter!


Getters and Setters


When programming in Java, it is kind of part of the language to write classes with attributes and equip these attributes with „getters“ and „setters“. You could do otherwise, but you just don’t. But some criticism is of course allowed. Even if it only applies to the design of future languages or to minor improvements in the Java language.

The reason for using this pattern is of course that we do not want to expose inner implementation details of classes, but only interfaces. The implementation can change; for example the getter can calculate the attribute instead of just returning it, or it might possibly happen in the future that it will be calculated. And the setter can perform sanity checks or even some adjustments of dependent attributes. Or, most interestingly, the setter can be omitted in an approach towards immutability.

Now there are whole big categories of classes in many projects that never ever contain any business logic. It is an architectural decision to put all business logic into certain places. This does not sound like the idea of the OO paradigm, and Martin Fowler actually considers it an antipattern, but it is done and it makes sense for classes that carry data in interface definitions. So a typical Java application has tons of layers, each with almost the same data classes, mostly without any business logic, because the business logic resides in classes that are reserved for business logic. Basically procedural programming, but with cool frameworks, and OO because it is written in an OO language. Data is copied multiple times between the different layers. One of the layers is the DB layer and its classes are managed by Hibernate. Now interesting questions arise as to whether Hibernate goes through the getters and setters or directly to the attributes. The latter seems to be more common and it allows for a work-around for an important Oracle bug.
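
For reference, a minimal example of the pattern under discussion; the class name and attribute are of course made up:

public class Customer {

    private String name;

    public String getName() {
        return name; // could later be replaced by a calculated value
    }

    public void setName(String name) {
        // sanity checks or adjustments of dependent attributes could go here
        this.name = name;
    }
}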

Now „stupid“ getters and setters make the code larger, harder to maintain and harder to read. They can be generated automatically by IntelliJ, Eclipse, Perl scripts or Emacs Lisp code, and I have always done it that way. But when changing code it becomes more difficult at some point.

There is also a subtle issue in terms of the name space. It is highly unusual to start attribute names with „get“ or „set“, but it would be possible and would create a lot of confusion. Since getters for boolean attributes often start with „is“ or even „has“ instead of „get“, this problem does actually exist there, because people naturally like to name some boolean attributes „isNew“ or „hasEngine“, and then the getters become „isIsNew“, „isHasEngine“ or something like that. Also some funny effects can occur when all-capital abbreviations like HTML or XML are part of the attribute name. This causes some pain, but of course, we live with it…

Interestingly Java creates „internal“ getters and setters for each attribute; they are called when accessing attributes and they are used as hooks for Hibernate in some setups. So there is seemingly one indirection too many, because the getter that we write calls the internal getter, so why not just make get… an alias for the internal getter? This does not seem to be a problem, because the optimization of Java is so good that it is fair to assume that such a common case is really well optimized. So do not mess around with this for the sake of optimization, unless you really know what you are doing and run serious benchmarks for it.

Now having a look at other languages shows that things can be done in a better way without losing the original benefit of getters and setters.

Ruby, Scala and C# show that this is possible. The getters are just named like we would name an attribute, so we can use something like

point.x

to access the x-coordinate of the point. In Scala it is quite common that classes are immutable, so the setters go away anyway. C# and Ruby allow setters to be defined in such a way that they look like an assignment:

point.x = 9

Just two final remarks:

Do not write unit tests for trivial getters and setters, but do write unit tests for getters and setters that are not trivial and contain at least a bit of logic. And configure your SonarQube to be happy with that.

And in terms of documentation, if the project encourages documentation, agree on where to write it and write it in this one place. For example, always write javadoc for the getter and never for the attribute, or the other way round. And for tons of class hierarchies that are more or less isomorphic, agree on which layer the documentation goes into and write it only there, unless another layer actually differs in a non-trivial way for this attribute. Having documentation for basically the same thing more than once is usually a bad idea. The burden of maintenance will increase and it will become outdated even faster than javadoc usually does.



New Java Framework: Functionativity

A new Java framework, Functionativity, has been announced today. It will totally change the way we work and attract all the people who are now using other languages like Scala, Kotlin, Clojure, Ruby, Perl, C#, PHP, Python, JavaScript … to move to Java, just to be able to use this new framework and gain more efficiency than ever before within a week after the transition.

Functionativity is installed only on the developer’s machine. The program is written there in any style and automatically transformed into highly functional modern Java code that allows for high scalability and runs on any device without any installation of software and without any testing. The new super sandbox prevents any bugs from actually being executed. With artificial intelligence the framework finds out what was meant to be programmed, even if there are some inaccuracies and bugs in the program, and it will correct itself. The web interface will design itself automatically according to the current taste and corporate design and create the maximally user friendly UEX. It uses only HTML, and with a trick it can put functionality that used to require JavaScript into pure executable HTML. So it works even with very old browsers and JavaScript turned off, and it provides a rich and highly responsive user interface that minimizes network traffic between the browser and the server.

There are integrations to combine Functionativity code with existing legacy frameworks and of course migration scripts to transform code written in other languages into Java code that works with Functionativity…


Ansible

We are talking about system administration and system engineering for Linux. Most of this also works more or less the same on Unix, but that is much less relevant today. It is even possible to do some of these things on MS-Windows, but that is another story… So just assume for the time being OS=Linux. Or wait for the next article in two weeks, if this is not relevant for you…

In the good old days system administration meant logging into the server and doing something there. Of course good system engineers wrote scripts in Bash, Ruby, Perl or Python and got things done more efficiently. When we were maintaining 1000 vending machines of a public transport company, we wrote scripts that ran on one Linux-machine in our office and iterated through a whole list of machines to perform a task on them. Usually one, then another one, then five or so and at some point the rest.

It is not totally gone; it is still necessary to log in to a machine to do things that are not yet automated, that are not worthwhile automating or that simply did not work as desired. Or to check whether things did work as desired.

But with tools like Ansible it is now possible to do what was done with these scripts in the old days. Ideally we describe the desired end state, for example (a small example playbook follows after this list):
– We want a certain user to be present. If it is present, nothing is done; if not, the user is created.
– We want a certain file system to be mounted on a certain mount point. Same idea…
– We want a certain package to be installed. If it is missing, it is installed.
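
A minimal sketch of such a playbook could look like this; the group name, paths and package name are made up, and on newer Ansible versions the mount module lives in the ansible.posix collection:

- hosts: appservers
  become: yes
  tasks:
    - name: ensure the application user exists
      user:
        name: appuser
        state: present

    - name: ensure the data filesystem is mounted
      mount:
        path: /data
        src: /dev/vg0/data
        fstype: ext4
        state: mounted

    - name: ensure the PostgreSQL client is installed
      package:
        name: postgresql-client
        state: present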

The files that describe the desired outcome are in a directory tree and this is called a playbook. We can think of it as a high-level scripting language, a bit like SQL, where we also describe the outcome and not how to get there. And as always, there are ways to fall back to Python and bash where the built-in features are not sufficient.

Ideally, Ansible playbooks are idempotent. That means they can be executed as many times as we want without creating harm. That makes it much easier to update 1000 hosts. We can just try the update on 1, 2, 5, 10, 50 and 100 and then on all of them, and it does not matter if we actually give the whole list in the last step or if we have the same host in the list multiple times by mistake. It does matter a bit, because the whole thing becomes slower the more hosts we have in the list. But there is still no big risk of messing things up terribly.

But this depends on how the playbooks are written. So writing idempotent playbooks is a goal that needs to be taken care of. In cases where standard Ansible features are used, this is usually the case. But when the playbooks are using our own extensions or calling Python or bash scripts, we are on our own and need to keep an eye on this or deal with the fact that the playbook is not idempotent.

Now usually there is a file called the inventory that contains „all“ hosts. This needs to be provided. Often the hosts in the file are put into groups and certain steps apply only to one group. The actual installation can, and usually should, be limited to a subset of this „all“, for example when we are installing on a test system and not on production servers. It is possible to limit this to a group, a single host, a short list of hosts or to the hosts found in another file (not the inventory). With some simple scripts it is for example possible to run the playbook on the next n hosts that have not been processed so far, or to split up the work with a colleague.
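
For example (group names and file names are made up):

ansible-playbook -i inventory site.yml --limit testservers
ansible-playbook -i inventory site.yml --limit host0042.example.com
ansible-playbook -i inventory site.yml --limit @next-batch.txt    # hosts listed in a separate file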

And the playbooks should of course be managed by a source code management system, for example git.
