Unit and Integration Test with Databases

From an ideological point of view, tests that involve the database may be unit tests or may be integration tests. For many applications the functional logic is quite trivial, so it can be quite pointless to write „real unit tests“ that work without the database. Or it may be useful, depending on the application.

But there is a category of tests that we should usually write: tests that involve the database and the database access layer and possibly the service layer on top of that, if the layers are separated in that way. If not, they still test the service that relies heavily on a database. I do not care too much whether these tests are called unit tests or integration tests, as long as the terms are used consistently within the project or within the organization. If they are called integration tests, then achieving a good coverage with these integration tests is important.

Now some people promote using an in-memory database for this. At the beginning of the tests, this database is initialized, filled with empty tables and on top of that with some test data. This costs almost no time, because the DB is in memory.

While this is a valid way to go, I do not recommend it. First of all, different database products always behave differently. And usually it is a bad idea to support more than one, because it significantly adds to the complexity of the software or degrades the performance. So there is the one database product that is used for production, for example PostgreSQL or Oracle or MS SQL Server or DB2. That is what needs to work. That means the tests need to run against this database product, at least sometimes. This may be slow, so there is a temptation to use H2 instead, but that means writing tests in such a way that they support both DB products and enhancing the code to support the H2 database as well, at least to a minimal extent. This is additional work and additional complexity to maintain.

What I recommend is: use the same database product as for production, but tune it specifically for testing. For example: you can run it on a RAM disk. Or you can configure the database in such a way that transactions do not insist on the „D“ of „ACID“. This does not break functionality, but causes data loss in case of a crash, which is something we can live with when doing automated tests. PostgreSQL allows this optimization for sure, possibly other products as well.
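With PostgreSQL, for example, the relevant switches are the so-called non-durable settings. A sketch of a postgresql.conf fragment for a disposable test instance (never for production) could look like this:

# trade durability for speed -- acceptable for throwaway test databases
fsync = off
synchronous_commit = off
full_page_writes = off
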
Now for setting up the database installation, possibly with all tables, we can use several techniques that allow us to have a database instance with all table structures and some test data on demand. Just a few ideas that might work:

  • Have a directory with everything in place, copy it onto the RAM disk and start the database from there
  • Have a few instances ready to use, dispose of them after the tests and immediately prepare fresh ones for the next test run
  • Have a docker image that is copied (see the sketch below)
  • Have a docker image that is immutable in terms of its disk image and use a RAM disk for all variable data, much like Knoppix
  • Do it with VMware or VirtualBox instead…
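
Just as a rough sketch of the docker variant (image tag, password and port are arbitrary choices here): the official postgres image lets us keep the data directory on a tmpfs and pass the non-durable settings on the command line:

docker run -d --name pg-test \
  --tmpfs /var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=test \
  -p 5432:5432 \
  postgres:13 \
  -c fsync=off -c synchronous_commit=off -c full_page_writes=off

Every test run can start from such a container and the container can simply be thrown away afterwards.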

At the end of the day, there are many valid ways. And in some way or other, a few master images have to be maintained and all tests start from such a master image. How it is done, whether on classical servers, on virtual servers, on a private cloud, on a public cloud or on the developer’s machine, does not really matter, as long as it works efficiently.

If there is no existing efficient infrastructure, why not start with docker images that contain the database?

We need to leave the old traditional thinking that the database instance is expensive and that there are few of them on the server…

We can use the database product that is used for production environments for development and testing as well, and have separate, fresh images for testing. We should demand the right thing and work hard to get it, instead of spending too much time on non-solutions that only work 80%. It can be done right. And laziness is a virtue, and doing it right in this case serves exactly this virtue.


Electronic Vaccination Certificates

It is becoming increasingly important to have a way to easily prove being vaccinated against Covid-19 in a way that works at least throughout Europe.

This can be a piece of paper and it can be an app.

People are very concerned about faking the certificate.

This can happen and it is of course a serious crime.

The problem is that this is needed really fast, and thinking of a typical software project, it is very ambitious to go live in two months if it has not really started yet.

But leaving that timing issue aside, is it a reasonable idea?

Actually there are already apps for railway tickets, boarding passes and of course for helping to prove one’s identity when logging into some e-banking systems.

In Switzerland railroad tickets are either bought for a specific ride, or people buy a „flat rate“ and can use any train within Switzerland as often as they want, having already paid for it with the annual payment. And there is a combination of both: people pay a much lower annual rate and get their individual tickets cheaper.

Now there is a mobile app and a chip card. Both provide a QR code. This allows the trainmen to scan it with the app on their phones and to see the photo of the person and the person’s subscriptions. The actual ticket for passengers without the flat rate is another QR code. This has been in use for a few years now and seems to work quite well.

Of course this is based on the fact that railroad trainmen are a relatively small, trusted group that has access to this kind of information about travelers. That is a different situation from having a terminal to scan the app at every cinema, restaurant etc. throughout Europe. The photo is a convenience, but it could be replaced by showing an ID. The QR code could be signed with the private key of a health agency, whose public key is available to the public and makes it possible to verify that the content is correct. This way the QR code could carry all the information without using a server. The other variant would be that the QR code is a key for a lookup on a server. But not depending on storing the important data on a server and instead having it all in the QR code is actually a good idea.

If it is desired to provide test results as well, then it might be a bit easier to provide access to a server and store all the information there. But vaccinations and tests can be done in any country, it is probably neither reasonable nor desirable to implement some mutual data exchange between the servers, and it is probably also not possible to tap the NSA servers for this information. So it will probably end up with one QR code for each event anyway, where events are tests, vaccinations and maybe recoveries from the actual Covid-19 sickness. In this case the certificates could work without accessing a server.
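
To make the signing idea a bit more concrete, here is a minimal sketch in Java using only the JDK’s built-in cryptography. The payload string and the choice of elliptic curve signatures are assumptions for the purpose of illustration; the real certificate format, encoding and key management would of course look different.

import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.util.Base64;

public class CertificateSignatureSketch {
    public static void main(String[] args) throws Exception {
        // The health agency holds the private key, the public key is published.
        KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
        KeyPair agencyKeys = generator.generateKeyPair();

        // Hypothetical payload describing one event (vaccination, test or recovery).
        String payload = "name=...;birthdate=...;event=vaccination;date=2021-05-01";
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);

        // Sign with the agency's private key; payload plus signature go into the QR code.
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(agencyKeys.getPrivate());
        signer.update(data);
        String signatureBase64 = Base64.getEncoder().encodeToString(signer.sign());

        // Verification needs only the public key, no server access.
        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(agencyKeys.getPublic());
        verifier.update(data);
        boolean valid = verifier.verify(Base64.getDecoder().decode(signatureBase64));
        System.out.println("signature valid: " + valid);
    }
}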

The whole functionality is probably not too difficult to build, but these security related software projects tend to require working even more thoroughly than usual. It is probably also important to look into the issue of data privacy, which might complicate things a lot, especially because anybody in the world who is shown the certificate can read it and thereby access sensitive data of the user. And users have to show it every day on many occasions…


Spline Approximation (Mathematics II)

This is the third article in a series about spline approximation. If you have not done so, you should start by reading the previous articles.

So we are looking for a function g which approximates a given set of points \{(\xi_j, \eta_j)\}_{j=1\ldots N} as a linear combination

    \[(1)\thickspace g(x) = \sum_{i} a_i f_i(x)\]

of functions f_i, as described in the previous article with

    \[(2)\thickspace f(x)= \begin{cases} 0 &\text{for } x \le -2\\ (x+2)^3&\text{for } -2 < x \le -1\\ 1+3(x+1)+3(x+1)^2-3(x+1)^3&\text{for } -1 < x \le 0\\ 1+3(1-x)+3(1-x)^2-3(1-x)^3&\text{for } 0 < x \le 1\\ (2-x)^3&\text{for } 1 < x \le 2\\ 0&\text{for } x > 2 \end{cases}\]

and with h chosen such that x_i=x_0+i*h for all i

    \[(3)\thickspace f_i(x)=f(\frac{x-x_i}{h}) \text{ for } i=-1,\ldots,n+1\]

Now up to four of the f_i can overlap on one sub-interval. Assume that the sub-interval is

    \[S=[x_i,x_{i+1}]\]

and the arithmetic mean of the borders is

    \[X = \frac{1}{2}(x_i+x_{i+1}) = x_i + \frac{h}{2} = x_{i+1} - \frac{h}{2}\]

This interval is touched by the basis functions f_{i-1}, f_i, f_{i+1}, f_{i+2}, so the function g can be written for x\in S as

    \[g(x)=af_{i-1}(x)+bf_i(x)+cf_{i+1}(x)+df_{i+2}(x)\]

    \[= af\left(\frac{x-x_{i-1}}{h}\right)    +bf\left(\frac{x-x_{i}}{h}\right)    +cf\left(\frac{x-x_{i+1}}{h}\right)    +df\left(\frac{x-x_{i+2}}{h}\right)\]

    \[= a\left(2-\frac{x-x_{i-1}}{h}\right)^3\]

    \[+b\left(1+3\left(1-\frac{x-x_{i}}{h}\right)+3\left(1-\frac{x-x_{i}}{h}\right)^2-3\left(1-\frac{x-x_{i}}{h}\right)^3\right)\]

    \[+c\left(1+3\left(\frac{x-x_{i+1}}{h}+1\right)+3\left(\frac{x-x_{i+1}}{h}+1\right)^2-3\left(\frac{x-x_{i+1}}{h}+1\right)^3\right)\]

    \[+d\left(\frac{x-x_{i+2}}{h}+2\right)^3 \text{ using (2)}\]

Now we have

    \[x_{i-1}=X-\frac{3h}{2}\]

    \[x_{i}=X-\frac{h}{2}\]

    \[x_{i+1}=X+\frac{h}{2}\]

    \[x_{i+2}=X+\frac{3h}{2}\]

and thus

    \[g(x) =a\,\left(2-{{x+{{3\,h}\over{2}}-X}\over{h}}\right)^3\]

    \[+b\,\left(-3\,\left(1-{{x+{{h}\over{2}}-X}\over{h}}\right)^3+3\,\left(1-{{x+{{h}\over{2}}-X}\over{h}}\right)^2+3\,\left(1-{{x+{{h}\over{2}}-X}\over{h}}\right)+1\right)\]

    \[+c\,\left(-3\,\left({{x-{{h}\over{2}}-X}\over{h}}+1\right)^3+3\,\left({{x-{{h}\over{2}}-X}\over{h}}+1\right)^2+3\,\left({{x-{{h}\over{2}}-X}\over{h}}+1\right)+1\right)\]

    \[+d\,\left({{x-{{3\,h}\over{2}}-X}\over{h}}+2\right)^3\]

Now let

    \[y=x-X\]

and hence

    \[x=y+X\]

and substitute that:

    \[g(x)=a\,\left(2-{{y+{{3\,h}\over{2}}}\over{h}}\right)^3\]

    \[+b\,\left(-3\,\left(1-{{y+{{h}\over{2}}}\over{h}}\right)^3+3\,\left(1-{{y+{{h}\over{2}}}\over{h}}\right)^2+3\,\left(1-{{y+{{h}\over{2}}}\over{h}}\right)+1\right)\]

    \[+c\,\left(-3\,\left({{y-{{h}\over{2}}}\over{h}}+1\right)^3+3\,\left({{y-{{h}\over{2}}}\over{h}}+1\right)^2+3\,\left({{y-{{h}\over{2}}}\over{h}}+1\right)+1\right)\]

    \[+d\,\left({{y-{{3\,h}\over{2}}}\over{h}}+2\right)^3\]

    \[={{d\,y^3}\over{h^3}} -{{3\,c\,y^3}\over{h^3}} +{{3\,b\,y^3}\over{h^ 3}} -{{a\,y^3}\over{h^3}}\]

    \[+{{3\,d\,y^2}\over{2\,h^2}} -{{3\,c\,y^2}\over{2\,h^2}} -{{3\,b\,y^2}\over{2\,h^2}} +{{3\,a\,y^2}\over{2\,h^2}}\]

    \[+{{3\,d\,y}\over{4\,h}} +{{15\,c\,y}\over{4\,h}} -{{15\,b\,y}\over{4\,h}} -{{3\,a\,y}\over{4\,h}}\]

    \[+{{d}\over{8}} +{{23\,c}\over{8}} +{{23\,b}\over{8}} +{{a}\over{8}}\]

    \[=\frac{d-3\,c+3\,b-a}{h^3}\,y^3 +\frac{3\,d-3\,c-3\,b+3\,a}{2\,h^2}\,y^2\]

    \[+\frac{3\,d+15\,c-15\,b-3\,a}{4\,h}\,y +\frac{d+23\,c+23\,b+a}{8}\]

This will be used to actually calculate g in programs.
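
As an illustration of how this last formula could be turned into code, here is a minimal sketch in Java; the class and method names are made up for this example, and the upcoming „Cookbook“ article may of course structure this differently. It evaluates g on the sub-interval [x_i, x_{i+1}] from the four coefficients a=a_{i-1}, b=a_i, c=a_{i+1} and d=a_{i+2}, using Horner’s scheme on the polynomial in y.

public class SplineEvaluationSketch {

    /**
     * Evaluates g(x) on the sub-interval [xi, xi + h].
     * a, b, c, d are the coefficients of the four basis functions
     * f_{i-1}, f_i, f_{i+1}, f_{i+2} that are non-zero on this sub-interval.
     */
    public static double evaluate(double x, double xi, double h,
                                  double a, double b, double c, double d) {
        double bigX = xi + 0.5 * h; // midpoint X of the sub-interval
        double y = x - bigX;        // y lies in [-h/2, h/2]
        double c3 = (d - 3 * c + 3 * b - a) / (h * h * h);
        double c2 = (3 * d - 3 * c - 3 * b + 3 * a) / (2 * h * h);
        double c1 = (3 * d + 15 * c - 15 * b - 3 * a) / (4 * h);
        double c0 = (d + 23 * c + 23 * b + a) / 8;
        return ((c3 * y + c2) * y + c1) * y + c0; // Horner's scheme
    }
}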

Disclaimer: I am not an expert in numerical analysis. While I believe that the approach this article comes up with is sound and useful, I do believe that an expert in numerical analysis could still improve the accuracy of the calculations.

How to actually program it will be covered in the upcoming article „Spline Approximation (Cookbook)“. The link will be added when it is available.


Git for Linux System Engineering

Git has now become the standard version control software used by software developers.

For system engineering and system administration purposes, it used to be a common approach to just log in to the server, do something and remember it, or maybe note it down somewhere.

Some people tried to just use RCS or SCCS on the server to create a local repository for the versions of a configuration file. Some thought is needed as to whether this is OK, because in the process of checking in a file it might be temporarily unavailable or inconsistent, since RCS can modify files in the process. Usually this is not a problem, but when these kinds of problems occur, they are kind of weird to find and fix.

Today the approach has changed a lot. With virtualization it has become possible to build up thousands of servers, or even millions if you go to companies like Google. Typically one server performs a specific task and another server is used for another specific task, and since almost all of them are virtual, that is possible. Thus it has become necessary to run and maintain thousands of servers, and that means that things have to be done really efficiently.

An important approach is to automate a lot of things. This can be done for example using Ansible or Perl, Ruby or Python. Or combinations of these. Or there can be a cloud that supports a lot of features out of the box. And tools that automate certain tasks…

So at the end of the day, it is no longer the typical approach to log in to a server and do certain things there, but to work on some kind of master server, prepare things and then run them on multiple servers from there. The scripts and configuration files are created on this master server, and now it becomes perfectly natural to use git to check them in.

Another engineer can clone the repository and perform the same task on a new server, for example.

Also it is a good idea to create branches, work with a branch on a test server until it looks good and then merge it into the main branch. Even tags and release numbers make sense in some environments.
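
A typical flow could look like this (branch and tag names are made up):

git checkout -b tune-tomcat-memory        # work on a branch
# ... edit scripts and configuration files, try them against the test server ...
git commit -a -m "tune tomcat memory settings"
git checkout main                         # or master, depending on the repository
git merge tune-tomcat-memory
git tag -a rollout-2021-05 -m "configuration rolled out to all servers"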

Only one tiny difference: while software developers tend to use IDEs like IntelliJ IDEA, Eclipse or Visual Studio, or editors like Atom or Emacs, system engineers tend to use vi. This has several advantages: first of all, it is installed on every Linux server (unless you create one and uninstall it to disprove this, of course), it runs well across an ssh session even with relatively slow internet connections, and it is powerful once you learn how it works. So I recommend learning just the basics to get simple stuff done. And I recommend avoiding heavy customization of vi, because that might cause problems when working on a server without the customization.

And even though today work is done mostly on these master servers, it does of course still happen that it becomes necessary to log in to a specific server, to check whether things worked as desired or to do tiny tasks that have not been automated. But remember: laziness is good. You should hate doing repetitive work on multiple servers and figure out how to automate it, if it happens for the third time or if it can be anticipated that it will happen more often. Automation not only creates efficiency, it also allows for better collaboration (via git), better consistency and better stability. It is the way to go.


Starting processes while booting (Linux)

When a Linux system is booted, we want certain processes to run immediately.

In the old days, that is 25 years ago or so, this was done in „BSD-style“ by having certain magical shell scripts that start everything in the right order. When adding another service, this just had to be added to the shell script and that was it.

Then this was replaced by System-V-style init. So the system had certain runlevels. The desired runlevel was configured somewhere. And for each runlevel there was a directory which contained a lot of scripts or mostly softlinks to scripts. The scripts starting with S were used for starting and the scripts with K were used for stopping. The ordering was achieved by adding a two digit number immediately after the letter S or K.

Important runlevels were:

  • single user mode, which is a very special way to boot the system for maintenance tasks that cannot otherwise be achieved. I have used this only a few times in my life when the system was really seriously messed up…
  • multi user mode with network and no graphical UI. This is what most servers are running
  • multi user mode with graphical UI. This is what most Laptops are running

It was possible to boot into a certain runlevel by configuration. And to switch to another runlevel…

Now this has given way to another, more versatile approach called systemd.

Processes that need to be started are configured by a so-called service file. It contains information about how to start and stop the process and about dependencies on other services or on abstract groups called „targets“. Runlevels are expressed as targets and have names instead of numbers.
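
A minimal service file could look roughly like this; the name „something“ and the path of the executable are placeholders:

# /etc/systemd/system/something.service
[Unit]
Description=Something server (example)
After=network.target

[Service]
ExecStart=/usr/local/bin/something --serve
Restart=on-failure

[Install]
WantedBy=multi-user.target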

A service can be enabled by

systemctl enable something.service

which means it will be started automatically when booting.

It can be disabled with

systemctl disable something.service

And it can be started and stopped with

systemctl start something.service
systemctl stop something.service

The status is queried with

systemctl status something.service

There is a lot more to discover… For example, there are timers to run a certain task repeatedly (instead of putting it into crontab).
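
As a sketch, such a timer is just another small unit file (names are placeholders) that triggers a service of the same name:

# /etc/systemd/system/something-cleanup.timer
[Unit]
Description=Run something-cleanup.service every night

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target

It is enabled and started like any other unit, for example with systemctl enable --now something-cleanup.timer.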

If you need to shut down or start a service at certain times, it is a good idea to always do this via systemctl and to call systemctl from the script instead of going directly to the startup script of the process, because systemctl start/stop stores information that allows systemctl status to work.

It should be observed that systemd not only runs the commands or scripts to start and stop services, but also keeps track of the processes that are created by this. So it is very important to always start and stop services via systemctl and not to bypass it.

It is nice how beautiful and consistent this solution is compared to the previous solutions…


Perl & Java

How do we use Perl and Java together? Unlike with many other languages, which for example run in the JVM, it is not particularly easy to combine the two directly. But that is not the idea.

A good starting point is thinking about houses and furniture. While it is perfectly possible to build houses of wood or furniture of concrete, the common practice is to build the house of concrete, stone (plus some other „hard“ materials) and to build furniture mostly of wood or materials that are somewhat like wood. There is a point behind this. The house should be durable and it remains the same for a long time. Changing the house is an expensive operation.

Furniture is individual, and different people who use different parts of the house usually bring their own furniture. It is easy to change and it does not need to be as hard, because being inside the house it is protected from wind, snow, rain and to some extent even temperature changes.

And now think about the tools that we use to build the house. They are not made of concrete or stone at all.

Materials have their strength and weaknesses. We can use the common materials for the task. We can substitute them to some extent. Good wooden houses exist, but wooden skyscrapers are rare.

We also have different programming languages with certain strengths and weaknesses. We can substitute them to some extent as well. Good programs exist that are written in many different languages, even combinations of languages. Who likes PHP? And who likes Wikipedia?

There are many good JVM languages, like Java, Scala, Kotlin, Clojure, JRuby, Ceylon, Groovy… They are widely accepted by the people who manage the budgets. And the operations teams know how to work with software in JVM languages. It is easy to find developers, maybe a lot easier for Java than for Ceylon. Ceylon and Kotlin came out at the same time and with very similar goals. Simply said: Kotlin won and Ceylon lost. And the performance of JVM languages is close enough to that of native C/C++, which is an amazing success of the developers who wrote and improved the JVM. We kindly ignore that this comes at the expense of startup times and memory consumption, but even these issues are being worked on and we will see further progress with GraalVM. It runs on Linux servers, which is what we usually want to use, but it would also run on other common servers. This is excellent building material for houses.

The cons of JVM software: high memory footprint and long startup times. The memory footprint can be solved by throwing some money at it, and usually that is a reasonable option when looking at the big picture. Startup times do not really matter, because we just let our servers run for a long time.

So for scripts that do small to moderate-sized jobs on the command line, Perl seems to be superior to Java. When it comes to parsing text and using regular expressions, Perl profits from having the best regular expression engine, second maybe to that of Raku, the former Perl 6.

Perl can be used to edit Java programs. So you write a Perl script that performs simple or more complex transformations on a possibly large set of Java classes, or that even creates Java classes. This allows doing things that are not available in the „refactoring tab“ of the IDE.
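
Just as an illustration (the method names are made up): a simple rename across a whole source tree can be a one-liner, while anything more complex becomes a small Perl script:

perl -p -i.bak -e 's/\bgetCustomerNumber\b/getCustomerId/g' $(find src -name '*.java')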

You can call Perl programs from Java to perform tasks that are impossible or too difficult to do in Java. Because Java chose to be „OS-independent“, certain low-level OS functionality cannot be accessed from Java, but it can be from Perl.
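
As a small illustration (the script name and its arguments are made up), calling a Perl script from Java can be as simple as this:

import java.io.IOException;

public class CallPerlFromJava {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical Perl script that gathers some low-level OS information.
        ProcessBuilder builder = new ProcessBuilder("perl", "collect-os-info.pl", "--verbose");
        builder.inheritIO(); // forward the script's stdout and stderr to our own
        Process process = builder.start();
        int exitCode = process.waitFor();
        System.out.println("perl script finished with exit code " + exitCode);
    }
}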

And Perl can be used to start Java programs. Today we see really horrible shell scripts to start something like Tomcat. For the Windows variant a bat file is added that is even worse. A Perl script can set up the environment for such a Java program and start it much more easily, and there is no need to write it twice. Or Perl scripts can be used to perform installation tasks for the software.

Another interesting application is testing. Perl can be used to create automated tests, to prepare test data or to parse result data. Or to anonymize data.

Most of these things can just as well be done with Python and Ruby and probably some other languages. For certain tasks Ruby or Python are probably even better than Perl. And you can very well think of other JVM languages instead of „Java“. Many of these things can also be done with specific tools that serve the task well. That can be a good idea, but it can also constrain you to what the tool supports, as in the refactoring case with the IDE as a tool.

It is sometimes a good idea, to combine a language like Clojure, Java, C, C++, Scala with a scripting language like Ruby, Python or Perl.


Happy Easter

Happy Easter!


Getters and Setters


When programming in Java, it is kind of part of the language to write classes with attributes and equip these attributes with „getters“ and „setters“. You could do otherwise, but you just don’t. But some criticism is of course allowed. Even if it only applies to the design of future languages or to minor improvements in the Java language.

The reason for using this pattern is of course that we do not want to expose inner implementation details of classes, but only interfaces. The implementation can change; for example the getter can calculate the attribute instead of just returning it, or it might possibly happen in the future that it will be calculated. And the setter can perform sanity checks or even some adjustments of dependent attributes. Or, most interestingly, the setter can be omitted in an approach towards immutability.

Now there are whole big categories of classes in many projects that never ever contain any business logic. It is an architectural decision to put all business logic into certain places. This does not sound like the idea of the OO paradigm, and Martin Fowler actually considers it an antipattern, but it is done and it makes sense for classes that carry data in interface definitions. So a typical Java application has tons of layers, each with almost the same data classes, mostly without any business logic, because the business logic resides in classes that are reserved for business logic. Basically procedural programming, but with cool frameworks, and OO because it is written in an OO language. Data is copied multiple times between the different layers. One of the layers is the DB layer and its classes are managed by Hibernate. Now interesting questions arise as to whether Hibernate goes through the getters and setters or directly to the attributes. The latter seems to be more common and it allows for a work-around for an important Oracle bug.
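
For reference, a minimal example of the pattern under discussion; the class name and attribute are of course made up:

public class Customer {

    private String name;

    public String getName() {
        return name; // could later be replaced by a calculated value
    }

    public void setName(String name) {
        // sanity checks or adjustments of dependent attributes could go here
        this.name = name;
    }
}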

Now „stupid“ getters and setters make the code larger, harder to maintain and harder to read. They can be generated automatically by IntelliJ, Eclipse, Perl scripts or Emacs Lisp code, and I have always done it that way. But when changing code it becomes more difficult at some point.

There is also a subtle issue in terms of the name space. It is highly unusual to start attribute names with „get“ or „set“, but it would be possible and would create a lot of confusion. Since getters for boolean attributes often start with „is“ or even „has“ instead of „get“, this problem does actually exist there, because people naturally like to name some boolean attributes „isNew“ or „hasEngine“, and then the getters become „isIsNew“, „isHasEngine“ or something like that. Also some funny effects can occur when all-capital abbreviations like HTML or XML are part of the attribute name. This causes some pain, but of course, we live with it…

Interestingly Java creates „internal“ getters and setters for each attribute; they are called when accessing attributes and they are used as hooks for Hibernate in some setups. So there is seemingly one indirection too many, because the getter that we write calls the internal getter, so why not just make get… an alias for the internal getter? This does not seem to be a problem, because the optimization of Java is so good that it is fair to assume that such a common case is really well optimized. So do not mess around with this for the sake of optimization, unless you really know what you are doing and run serious benchmarks for it.

Now having a look at other languages shows that things can be done in a better way without losing the original benefit of getters and setters.

Ruby, Scala and C# show that this is possible. The getters are just named like we would name an attribute, so we can use something like

point.x

to access the x-coordinate of the point. In Scala it is quite common that classes are immutable, so the setters go away anyway. C# and Ruby allow setters to be defined in such a way that they look like an assignment:

point.x = 9

Just two final remarks:

Do not write unit tests for trivial getters and setters, but do write unit tests for getters and setters that are not trivial and contain at least a bit of logic. And configure your SonarQube to be happy with that.

And in terms of documentation, if the project encourages documentation, agree on where to write it and write it in this one place. For example, always write javadoc for the getter and never for the attribute, or the other way round. And for tons of class hierarchies that are more or less isomorphic, agree on which layer the documentation goes into and write it only there, unless another layer actually differs in a non-trivial way for this attribute. Having documentation for basically the same thing more than once is usually a bad idea. The burden of maintenance will increase and it will become outdated even faster than javadoc usually does.



New Java Framework: Functionativity

A new Java framework, Functionativity, has been announced today. It will totally change the way we work and attract all the people who are now using other languages like Scala, Kotlin, Clojure, Ruby, Perl, C#, PHP, Python, JavaScript … to move to Java, just to be able to use this new framework and gain more efficiency than ever before within a week after the transition.

Functionativity is installed only on the developer’s machine. The program is written there in any style and automatically transformed into highly functional modern Java code that allows for high scalability and runs on any device without any installation of software and without any testing. The new super sandbox prevents any bugs from actually being executed. With artificial intelligence the framework finds out what was meant to be programmed, even if there are some inaccuracies and bugs in the program, and it will correct itself. The web interface will design itself automatically according to the current taste and corporate design and create the maximally user friendly UEX. It uses only HTML, and with a trick it can put functionality that used to require JavaScript into pure executable HTML. So it works even with very old browsers and JavaScript turned off, and it provides a rich and highly responsive user interface that minimizes network traffic between the browser and the server.

There are integrations to combine Functionativity code with existing legacy frameworks and of course migration scripts to transform code written in other languages into Java code that works with Functionativity…


Ansible

We are talking about system administration and system engineering for Linux. Most of this also works more or less the same on Unix, but that is much less relevant today. It is even possible to do some of these things on MS-Windows, but that is another story… So just assume for the time being OS=Linux. Or wait for the next article in two weeks, if this is not relevant for you…

In the good old days system administration meant logging into the server and doing something there. Of course good system engineers wrote scripts in Bash, Ruby, Perl or Python and got things done more efficiently. When we were maintaining 1000 vending machines of a public transport company, we wrote scripts that ran on one Linux-machine in our office and iterated through a whole list of machines to perform a task on them. Usually one, then another one, then five or so and at some point the rest.

It is not totally gone; it is still necessary to log in to a machine to do things that are not yet automated, that are not worthwhile automating or that simply did not work as desired. Or to check whether things did work as desired.

But with tools like Ansible it is now possible to do what was done with these scripts in the old days. Ideally we describe the desired end state, for example (a small example playbook follows after this list):
– We want a certain user to be present. If it is present, nothing is done; if not, the user is created.
– We want a certain file system to be mounted on a certain mount point. Same idea…
– We want a certain package to be installed. If it is missing, it is installed.
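
A minimal sketch of such a playbook could look like this; the group name, paths and package name are made up, and on newer Ansible versions the mount module lives in the ansible.posix collection:

- hosts: appservers
  become: yes
  tasks:
    - name: ensure the application user exists
      user:
        name: appuser
        state: present

    - name: ensure the data filesystem is mounted
      mount:
        path: /data
        src: /dev/vg0/data
        fstype: ext4
        state: mounted

    - name: ensure the PostgreSQL client is installed
      package:
        name: postgresql-client
        state: present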

The files that describe the desired outcome are in a directory tree and this is called a playbook. We can think of it as a high-level scripting language, a bit like SQL, where we also describe the outcome and not how to get there. And as always, there are ways to fall back to Python and bash where the built-in features are not sufficient.

Ideally, Ansible playbooks are idempotent. That means they can be executed as many times as we want without creating harm. That makes it much easier to update 1000 hosts. We can just try the update on 1, 2, 5, 10, 50 and 100 and then on all of them, and it does not matter if we actually give the whole list in the last step or if we have the same host in the list multiple times by mistake. It does matter a bit, because the whole thing becomes slower the more hosts we have in the list. But there is still no big risk of messing things up terribly.

But this depends on how the playbooks are written. So writing idempotent playbooks is a goal that needs to be taken care of. In cases where standard Ansible features are used, this is usually the case. But when the playbooks are using our own extensions or calling Python or bash scripts, we are on our own and need to keep an eye on this or deal with the fact that the playbook is not idempotent.

Now usually there is a file called the inventory that contains „all“ hosts. This needs to be provided. Often the hosts in the file are put into groups and certain steps apply only to one group. The actual installation can, and usually should, be limited to a subset of this „all“, for example when we are installing on a test system and not on production servers. It is possible to limit this to a group, a single host, a short list of hosts or to the hosts found in another file (not the inventory). With some simple scripts it is for example possible to run the playbook on the next n hosts that have not been processed so far, or to split up the work with a colleague.
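
For example (group names and file names are made up):

ansible-playbook -i inventory site.yml --limit testservers
ansible-playbook -i inventory site.yml --limit host0042.example.com
ansible-playbook -i inventory site.yml --limit @next-batch.txt    # hosts listed in a separate file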

And the playbooks should of course be managed by a source code management system, for example git.
