Logging

Deutsch

Software often contains a logging functionality. Usually entries one or sometimes multiple lines are appended to a file, written to syslog or to stdout, from where they are redirected into a file. They are telling us something about what the software is doing. Usually we can ignore all of it, but as soon as something with „ERROR“ or worse and more visible stack traces can be found, we should investigate this. Unfortunately software is often not so good, which can be due to libraries, frameworks or our own code. Then stack traces and errors are so common that it is hard to look into or to find the ones that are really worth looking into. Or there is simply no complete process in place to watch the log files. Sometimes the error shows up much later than it actually occurred and stack traces do not really lead us to the right spot. More often than we think logging actually introduces runtime errors, that were otherwise not present. This is related to a more general concept, which is called observer effect, where logging actually changes the business logic.

It is nice that log files keep to some format. Usually they start with a time stamp in ISO-format, often to the millisecond. Please add trailing zeros to always have 3 digits after the decimal point in this case. It is preferable to use UTC, but people tend to stick to local date and time zones, including the issues that come with switching to and from daylight saving time. Usually we have several processes or threads that run simultaneously. This can result in a wild mix of logging entries. As long as even multiline entries stay together and as long as beginning and end of one multiline entry can easily be recognized, this can be dealt with. Tools like splunk or simple Perl, Ruby or Python scripts can help us to follow threads separately. We could actually have separate logs for each thread in the first place, but this is not a common practice and it might hit OS-limitations on the number of open files, if we have many threads or even thousands of actors as in Erlang or Akka. Keeping log entries together can be achieved by using an atomic write, like the write system call in Linux and other Posix systems. Another way is to queue the log entries and to have a logger thread that processes the queue.

Overall this area has become very complex and hard to tame. In the Java world there used to be log4j with a configuration file that was a simple properties file, at least in the earlier version. This was so good that other languages copied it and created some log4X. Later the config file was replaced by XML and more logging frame works were added. Of course quite a lot of them just for the purpose of abstracting from the large zoo of logging frameworks and providing a unique interface for all of them. So the result was, that there was one more to deal with.

It is a good question, how much logic for handling of log files do we really want to see in our software. Does the software have to know, into which file it should log or how to do log rotation? If a configuration determines this, but the configuration is compiled into the jar file, it does have to know… We can keep our code a bit cleaner by relying on program functionality without code, but this still keeps it as part of the software.

Log files have to please the system administrator or whoever replaced them in a pure devops shop. And in the end developers will have to be able to work with the information provided by the logs to find issues in the code or to explain what is happening, if the system administrator cannot resolve an issue by himself. Should this system administrator have to deal with a different special complex setup for the logging for each software he is running? Or should it be necessary to call for developer support to get a new version of the software with just another log setting, because the configurations are hard coded in the deployment artifacts? Interesting is also, what happens when we use PAAS, where we have application server, database etc., but the software can easily move to another server, which might result in losing the logs. Moving logs to another server or logging across the network is expensive, maybe more expensive than the rest of this infrastructure.

Is it maybe a good idea to just log to stdout, maintaining a decent format and to run the software in such a way that stdout is piped into a log manager? This can be the same for all software and there is one way to configure it. The same means not only the same for all the java programs, but actually the same for all programs in all languages that comply to a minimal standard. This could be achieved using named pipes in conjunction with any hard coded log file that the software wants to use. But this is a dangerous path unless we really know what the software is doing with its log files. Just think of what weird errors might happen if the software tries to apply log rotation to the named pipe by renaming, deleting, creating new files and so on. A common trick to stop software from logging into a place where we do not want this is to create a directory with the name of the file that the software usually uses and to write protect this directory and its parent directory for the software. Please find out how to do it in detail, depending on your environment.

What about software, that is a filter by itself, so its main functionality is to actually write useful data to stdout? Usually smaller programs and scripts work like this. Often they do not need to log and often they are well tested relyable parts of our software installation. Where are the log files of cp, ls, rm, mv, grep, sort, cat, less,…? Yes, they do tend to write to stderr, if real errors occur. Where needed, programs can turn on logging with a log file provided on the command line, which is also a quite operations friendly approach. Named pipes can help here.

And we had a good logging framework in place for many years. It was called syslog and it is still around, at least on Linux.

A last thought: We spend really a lot of effort to get well performing software, using multiple processes, threads or even clusters. And then we forget about the fact that logging might become the bottle neck.

Share Button

Perl 5 and Perl 6

We have now two Perls. Perl 5, which has been around for more than 20 years just as the „Perl programming language“ and Perl 6, which has been developed for more than a decade and of which now stable versions exist.

The fact, that they are both called „Perl“ is a bit misleading. They are two different and incompatible programming languages. But they share the same community. And Perl conferences are usually covering both languages.

So this rises the question about the differences or about which of the two Perls to use.

Here are some differences:

  • Perl 5 is well established and many people know it. Perl 6 has to be learned, even if it is relatively easy to learn for someone with a Perl 5 background.
  • Perl 5 runs about three times faster than Perl 6
  • Perl 6 programs are a bit shorter than Perl 5 programs
  • Perl 6 regular expressions are even better than Perl 5’s regular expressions
  • Perl 6 is more logical than Perl 5
  • Perl 6 uses by default better numerical types
  • Perl 6 makes it easier and more natural to do object oriented programming and functional programming
  • Perl 6 has come up with a useful approach for doing multithreading.
  • Perl 5 has so many cool libraries on CPAN, Perl 6 just a few.

Links:

Share Button

Swiss Perl Workshop 2017

I have attended the Swiss Perl Workshop.
We were a group of about 40 people, one track and some very interesting talks, including by Damian Conway.
I gave a regular talk and a lightning talk myself.
The content of my talk might go into another Blog post in the future.
The Perl programming language is still interesting, and of course it was covered in both variants: Perl 5 and Perl 6.
But many of the talks were about general issues like security and architecture and just exemplified by Perl.

The Video recording of talks was optional. Here are those that have been recorded and already uploaded: Youtube: Swiss Perl Workshop

Share Button

Lazy Collections, Strings or Numbers

The idea is, that we have data that is obtained or calculated to give us on demand as much of it as we request. But it is not necessarily initially present. This concept is quite common in the functional world, where we in a way hide the deprecated concept of state in such structures, by the way in a way that lets use retain the benefits that led to the desire for statelessness.

Actually the concept is quite old. We have it for I/O in Unix and hence in Linux since the 1970ies. „Everything is a file“, at least as long as we constrain ourselves to a universal subset of possible file operations. It can be keyboard input, a named or anonymous pipe, an actual file, a TCP-connection, to name the most important cases. These are „lazy“ files, behave more or less like files as far as sequential reading is concerned, but not for random access reading. The I/O-concept has been done in such a way that it takes the case into account that we want to read n bytes, but get only m < n bytes. This can happen with files when we reach their end, but then we can obtain an indication that we reached the end of the file, while it is perfectly possible that we read less then we want in one access, but eventually get \ge n bytes including subsequent reads. Since the API has been done right, but by no means ideal, it generalizes well to the different cases that exist in current OS environments.

We could consider a File as an array of bytes. There is actually a way to access it in this way by memory-mapping it, but this assumes a physically present file. Now we could assume that we think of the array as a list that is optimized for sequential access and iterating, but not for random access. Both list types actually exist in languages like Java. Actually the random access structure can be made lazy as well, within certain constraints. If the source is actually sequential, we can just assume that the data is obtained up to the point where we actually read. The information about the total length of the stream may or may not be available, it is always available somehow in the case of structures that are completely available in memory. This random access on lazy collections works fine if the reason of laziness is to actually save us from doing expensive operations to obtain data that we do not actually need or to obtain them in parallel to the computation that processes the data. But we loose another potential drawback in this case. If the data is truly sequential, we can actually process data that is way beyond our memory capacity.

So the concept transfers easily from I/O-streams to lists and even arrays, most naturally to iterables that can be iterated only once. But we can easily imagine that this also applies to Strings, which can be seen a sequence of characters. If we do not constrain us to what a String is in C or Java or Ruby, but consider String to be a more abstract concept, again possibly dropping the idea of knowing the length or having a finite length. Just think of the output of the Unix command „yes“ or „cat /dev/zero“, which is infinite, in a theoretical way, but the computer won’t last forever in real life, of course. And we always interrupt the output at some time, usually be having the consumer shut down the connection.

Even numbers can be infinite. For real numbers this can happen only after the decimal point, for p-adic numbers it happens only before the decimal point, if you like to look into that. Since we rarely program with p-adic numbers this is more or less an edge case that is not part of our daily work, unless we actually do math research. But we could have integers with so many digits that we actually obtain and process them sequentially.

Reactive programming, which is promoted by lightbend in the Reactive Manifesto relies heavily on lazy structures, in this case data streams. An important concept is the so called „backpressure“, that allows the consumer to slow down the producer, if it cannot read the data fast enough.

Back to the collections, we can observe different approaches. Java 8 has introduced streams as lazy collections and we need to transform collections into streams and after the operation a stream back into a collection, at least in many real life situations. But putting all into one structure has some drawbacks as well. But looking at it from an abstract point of view this does not matter. The java8-streams to not implement a collection interface, but they are lazy collections from a more abstract point of view.

It is interesting that this allows us to relatively easily write nested loops where the depth of the nesting is a parameter that is not known at compile time. We just need a lazy collections of n-tuples, where n is the actual depth of the nesting and the contents are according to what the loops should iterate through. In this case we might or might not know the size of the collection, possibly not fitting into a 32-bit-integer. We might be able to produce a random member of the collection. And for sure we can iterate through it and stop the iteration wherever it is, once the desired calculation has been completed.

Share Button

Alpine Perl Workshop

On 2016-09-02 and 2016-09-03 I was able to visit the Alpine Perl Workshop. This was a Perl conference with around 50 participants, among them core members of the Perl community. We had mostly one track, so the documented information about the talks that were given is actually quite closely correlated to the list of talks that I have actually visited.

We had quite a diverse set of talks about technical issues but also about the role of Perl programming language in projects and in general. The speeches were in English and German…

Perl 6 is now a reality. It can be used together with Perl 5, there are ways to embed them within each other and they seem to work reasonably well. This fills some of the gaps of Perl 5, since the set of modules is by far not as complete as for Perl 5.

Perl 5 has since quite a few years established a time boxed release schedule. Each year they ship a new major release. The previous two releases are supported for bugfixes. The danger that major Linux distributions remain on older releases has been banned. Python 3 has been released in 2008 and still in 2016 Python 2.7 is what is usually used and shipped with major Linux distributions. It looks like Perl 5 is there to stay, not be replaced by Perl 6, which is a quite different language that just shares the name and the community. But the recent versions are actually adopted and the incompatible changes are so little that they do not hurt too much, usually. An advantage of Perl is the CPAN repository for libraries. It is possible to test new versions against a ton of such libraries and to find out, where it might break or even providing fixes for the library.

An interesting issue is testing of software. For continuous integration we can now find servers and they will run against a configurable set of Perl versions. But using different Linux distributions or even non-Linux-systems becomes a more elaborate issue. People willing to test new versions of Perl or of libraries on exotic hardware and OS are still welcome and often they discover a weakness that might be of interest even for the mainstream platforms in the long run.

I will leave it with this. You can find more information in the web site of the conference.

And some of the talks are on youtube already.

It was fun to go there, I learned a lot and met nice people. It would be great to be able to visit a similar event again…

Share Button

Integers in Perl 6

The language Perl 6 has been announced to be production ready by the beginning of this year. Its implementation is Rakudo, while the Perl 6 programming language itself is an abstract language definition that allows any language implementation that passes the test suite to call itself an Perl 6 implementation. The idea is not totally new, we see the Ruby language being implemented more than once (Ruby, JRuby, Rubinius, IronRuby), but we can also learn from the Ruby guys that it is a challenge to keep this up to date and eventually it is likely that one implementation will fall back or go its own way at some point of time.

Perl 6 is also called „Perl“ as part of its name, but quite different from its sister language Perl, which is sometimes called „Perl 5“ to emphasize the distinction, so it is absolutely necessary to call it „Perl 6“ or maybe „Rakudo“, but not just „Perl“.

Even though many things can be written in a similar way, a major change to Perl 5 is the way of dealing with numeric types. You can find an article describing Numeric Types in Perl [5]. So now we will see how to do the same things in Perl 6.

Dealing with numeric types in Perl 6 is neither like in Perl 5 nor like what we are used to in many other languages.

So when we just use numbers in a naïve way, we get long integers automatically:

my $f = 2_000_000_000;
my $p = 1;
loop (my Int $i = 0; $i < 10; $i++) {
    say($i, " ", $p);
    $p *= $f;
}

creates this output:

0 1
1 2000000000
2 4000000000000000000
3 8000000000000000000000000000
4 16000000000000000000000000000000000000
5 32000000000000000000000000000000000000000000000
6 64000000000000000000000000000000000000000000000000000000
7 128000000000000000000000000000000000000000000000000000000000000000
8 256000000000000000000000000000000000000000000000000000000000000000000000000
9 512000000000000000000000000000000000000000000000000000000000000000000000000000000000

This is an nice default, similar to what Ruby, Clojure and many other Lisps use, but most languages have a made a choice that is weird for application development.

Now we can also statically type this:

my Int $f = 2_000_000_000;
my Int $p = 1;
loop (my Int $i = 0; $i < 10; $i++) {
    say($i, " ", $p);
    $p *= $f;
}

and we get the exact same result:

0 1
1 2000000000
2 4000000000000000000
3 8000000000000000000000000000
4 16000000000000000000000000000000000000
5 32000000000000000000000000000000000000000000000
6 64000000000000000000000000000000000000000000000000000000
7 128000000000000000000000000000000000000000000000000000000000000000
8 256000000000000000000000000000000000000000000000000000000000000000000000000
9 512000000000000000000000000000000000000000000000000000000000000000000000000000000000

Now we can actually use low-level machine integers which do an arithmetic modulo powers of 2, usually 2^{32} or 2^{64}:

my int $f = 2_000_000_000;
my int $p = 1;
loop (my Int $i = 0; $i < 10; $i++) {
    say($i, " ", $p);
    $p *= $f;
}

and we get the same kind of results that we would get in java or C with (signed) long, if we are on a typical 64-bit environment:

0 1
1 2000000000
2 4000000000000000000
3 -106958398427234304
4 3799332742966018048
5 7229403301836488704
6 -8070450532247928832
7 0
8 0
9 0

We can try it in Java. I was lazy and changed as little as possible and the "$" is allowed as part of the variable name by the language, but of course not by the coding standards:

public class JavaInt {
    public static void main(String[] args) {
        long $f = 2_000_000_000;
        long $p = 1;
        for (int $i = 0; $i < 10; $i++) {
            System.out.println($i + " " +  $p);
            $p *= $f;
        }
    }
}

We get this output:

0 1
1 2000000000
2 4000000000000000000
3 -106958398427234304
4 3799332742966018048
5 7229403301836488704
6 -8070450532247928832
7 0
8 0
9 0

And we see, with C# we get the same result:

using System;

public class CsInt {

    public static void Main(string[] args) {
        long f = 2000000000;
        long p = 1;
        for (int i = 0; i < 10; i++) {
            Console.WriteLine(i + " " +  p);
            p *= f;
        }
    }
}

gives us:

0 1
1 2000000000
2 4000000000000000000
3 -106958398427234304
4 3799332742966018048
5 7229403301836488704
6 -8070450532247928832
7 0
8 0
9 0

If you like, you can try the same in C using signed long long (or whatever is 64 bits), and you will get the exact same result.

Now we can simulate this in Perl 6 also using Int, to understand what int is really doing to us. The idea has already been shown with Ruby before:

my Int $MODULUS = 0x10000000000000000;
my Int $LIMIT   =  0x8000000000000000;
sub mul($x, $y) {
    my Int $result = ($x * $y) % $MODULUS;
    if ($result >= $LIMIT) {
        $result -= $MODULUS;
    } elsif ($result < - $LIMIT) {
        $result += $MODULUS;
    }
    $result;
}

my Int $f = 2_000_000_000;
my Int $p = 1;
loop (my Int $i = 0; $i < 10; $i++) {
    say($i, " ", $p);
    $p = mul($p, $f);
}

and we get the same again:

0 1
1 2000000000
2 4000000000000000000
3 -106958398427234304
4 3799332742966018048
5 7229403301836488704
6 -8070450532247928832
7 0
8 0
9 0

The good thing is that the default has been chosen correctly as Int and that Int allows easily to do integer arithmetic with arbitrary precision.

Now the question is, how we actually get floating point numbers. This will be covered in another blog posting, because it is a longer story of its own interest.

Share Button

Perl 6

Perl 6 has silently reached its first production ready release on Christmas 2015, called v6c. It will be interesting to explore what this language can do, which features it offers and how it compares to existing relevant and interesting languages like Java, C, Ruby, Perl (5), Clojure, Scala, F#, C++, Python, PHP and others in different aspects. It looks like Perl 5 is there to stay and will be continued. Perl 6 should actually be considered to be a different programming language than Perl 5, so the name is somewhat misleading, because it suggests slightly more similarity than there really is. On the other hand, it was done by the same people, good ideas concepts from Perl 5 were retained and so it does look somewhat similar.

Today I will just provide some links

* Main web page
* Documentation
* Documentation II
* Rakudo: Currently the major implementation
* Wikipedia
* Wikipedia (Russian)
* Wikipedia (Spanish)
* Wikipedia (French)
* Wikipeida (Norwegian)

Maybe I will write more about this in the future…

Share Button