The little obstacles of interoperability

Deutsch

A lot of things in today’s IT landscape have been unified and interoperability is much better than 20 years ago.

Some examples:

  • Networking: Today networking is TCP/IP. Even the physical cables with RJ45/Ethernet and the wireless networks have been standardized and all kinds of devices can use the same networks. In the old days there were tons of mutually incompatible proprietary network technologies like BitNet (IBM), NetBios (MS), DecNet (DEC), IPX (Novell),….
  • Character Sets: Today we have Unicode and a few standard encodings. At least for Web and Emails we have ways to provide meta information about the actual encoding. This area is not totally free of problems, but on a very good way. In earlier days we had to deal with different EBCDIC-encodings or with character sets that only fully supported English langauge and a few languages using the same alphabet without additional letters like „ä“, „ö“, „ü“, „ø“, „ñ“,… So there we have encountered a great progress.
  • Numbers: For floating point numbers and integers a relatively small set of standardized numerical types has been established, that are used more or less everywhere and behave almost in the same way. The issue of integer overflow remains problematic, though.
  • Software: In the old days software was written for a specific machine, one CPU architecture and one OS. Today we have common platforms on different kind of hardware: Linux runs on almost any physical and virtual hardware from mobile phones and routers to super computers with practically the same kernel. Java, Ruby, Perl, Scala and some other programming languages are available on a range of platforms and provide some kind of abstract platform. And the web is often a good way to develop applications once for a large range of devices.
  • File system: At least we now have a common understanding what a file system looks like. There are some OS specific specialties, but the general idea is still the same. It is possible to share file systems between different operating systems by using technologies like Samba.
  • GNU-Tools: The GNU-Tools (bash, ls, cp, mv,……..) have become the standard on Linux boxes and they are way superior to the traditional Unix tools of the same role and name, that we can still find in Solaris, for example. You can (and should) install them on any Unix and even via cygwin on MS-Windows.

Interoperability is today for many of us interoperability between Linux (and possibly some other rare Posix/Unix-like systems) and Win32/Win64 (MS-Windows).

Experienced Linux users are used to having the forward slash („/“) as file path separator. The backslash is used for escaping special characters. In the MS-Windows-world we often see the backslash („\“) as file path separator. That is enforced in the CMD-window, because it does not pass through the forward slash. My experience with low level Win32/Win64 libraries shows that they understand both variants equally well. Anyway do Ruby, Perl, Java, Cygwin and others support the forward slash. So there is hardly ever any need to check the OS and use backslash or forward slash depending on that, other than for cmd/bat-scripts, but who wants them for more then five lines? I strongly recommend just using the forward slash also under MS-Windows when writing programs in Java, Perl, Scala, Ruby, … It makes life also easier, because the backslash often has to be doubled and it is sometimes hard to keep track of how many backslashes need to be written and to read it.

The line ending is a bit tricky. Linux and Unix use just a „Linefeed“ („\n“=Ctrl-J). For MS-DOS and MS-Windows the combination „Carriage-Return+Linefeed“ („\r\n“=Ctrl-M Ctrl-J) has become the default. Most of today’s programs do not care and understand both variants more or less equally well. Only Notepad does get confused with Linefeed-only files, but notepad is a bad choice anyway. Better Editors (gvim, emacs, ultraedit, scite, …) exist. On the other hand we get problems with the MS-Windows-line-termination in the case of executable scripts. Usually the contain a first line like „#/usr/bin/ruby“. The OS uses that as hint on how to execute them, in this case by calling /usr/bin/ruby. If the line ends with Ctrl-M Ctrl-J, then the OS tries to find a program „/usr/bin/ruby^M“ (^M = Ctrl-M = „\r“), which of course does not exist and we just get an obscure error message.

It is easy to do the conversion:

$ perl -i~ -p -e ’s/\r//g;‘ script

Or the other direction:

$ perl -i~ -p -e ’s/\n/\r\n/g;‘ textfile

For those who use subversion there are ways to enforce a certain way of line endings. Even git supports this.

Share Button

ScalaUA 2018

About a week ago I visited Scala UA in Kiev.

This was the schedule.

It was a great conference once again, as it was already in 2017 and I really enjoyed everything, including the food, which was great again… 🙂

I listened to the following talks on the first day:

And I gave this talk:

On the second day I listened to the following talks:

  • Advanced Patterns in Asynchronous Programming, Michael Arenzon & Assaf Ronen
  • Akka: Actors Design And Communication Techniques, Alex Zvolinskiy
  • Monad Stacks or: How I Learned to Stop Worrying and Love the Free Monad, and other stories, Harry Laoulakos
  • Scala on Wire: How event streams help us build Android apps, Maciej Gorywoda
  • Purely functional microservices with http4s and doobie, Jasper van Zandbeek
  • Tame Your Data with Reactive Streams and Monix, Jacek Kunicki
  • Future and issues of the Scala Ecosystem, Panel Discussion
  • Roll your own Event Sourcing, Lina Krutulytė-Kriščiūnė

And I gave this talk:

As always there was a lot of inspiration coming from the talks and a lot worth exploring in future posts. So there will be Scala posts once in a while, as in the past…

Links

  • Scala Days 2018
  • Scala UA 2017
  • Scala UA (official page)
  • NoSQL
  • Scala
  • Share Button

Daylight Saving

In many countries of Europe we have to readjust our watches and clocks today, unless they do it automatically.

It is interesting, that dealing with this has always been a great challenge for software engineers and a very high two digit percentage of software that in some way or other deals with time, does not handle it correctly. Operating systems and standard software are doing a very good job on this, but specialized software that is written today very often does not properly handle the switch. It usually does not matter, because it just results in effects like having rare phone calls charged for an hour too much, to give an example. And really critical software is properly tested for this, I hope. But software developers who are able to deal with this properly are less much than 100%, and even those who are at least able to accept that they do not know and prepared to listen to somebody who knows it, are less than 100%. And daylight saving is really a very very minor invisible side issue for most software projects. They have a task to perform and usually they do that task well enough… So we will continue to develop software that is not really properly handling daylight saving.

One more reason to stop this changing of clocks twice a year, especially since the saving of energy, that was once mentioned as advantage, does not seem to be significant.

Share Button

When to use Scala and Ruby

There are many interesting languages that have their sweet spots and of course a larger set of languages than just two should be considered for new projects.

But Ruby and Scala are both very interesting languages that did not just pick up and sell concepts that were already known, but brought them to a new level and to new beauty. Interestingly, both were started by a single person and finally became community projects.

There are some differences to observe.

Ruby is mostly a dynamic language, which means that it is easier and more natural to change the program at runtime. This is not necessarily a bad thing and different Lisp variants including today’s Clojure have successfully used and perfected this kind of capability for many decades. Consequently more things happen at runtime, especially dynamic typing is used, which means that types only exist at runtime.

Scala is mostly a static language, which means that all program structures have to be created at compile time. But this has been brought to perfection in the sense that a lot of things that are typically available only in dynamic languages, can be done. The type system is static and it is in this sense more consistent and more rigorous than the type system of Java, where we sometimes encounter areas that cannot reasonably be covered by Generics and fall back to the old flavor of untyped collections. This does not happen too often, but the static typing of Scala goes further.

In general this gives more flexibility to Ruby and makes it somewhat harder to tame the ways to do similar things in a static way in Scala. But the type system at compile time of course helps to match things, to find a certain portion of errors and even to make the program more self explanatory without relying on comments. In IDEs it is hard to properly support Scala, but the most common IDEs have achieved this to a very useful level. This should not be overvalued, because there are enough errors that cannot be detected by just using common types. It is possible to always define more specific types which include tight constraints and thus perform really tight checking of certain errors at compile time, but the built in types and the types from common libraries are to convenient and the time effort for this is too high, so it does not seem to be the usual practice. In any case it is a recommended practice to achieve a good test coverage of non-trivial functionality with automated tests. They implicitly cover type errors that are detected by the compiler in Scala, but of course only to the level of the test coverage. Ruby is less overhead to compile and run. We just write the program and run it, while we need a somewhat time intensive compile step for Scala. If tests are included, it does not make so much of a difference, because running the tests or preceding them with a compile job is kind of a minor difference.

An interesting feature of Ruby is called „monkey patching“. This means that it is possible to change methods of an existing class or even of a single object. This can be extremely powerful, but it should be used with care, because it changes the behavior of the class in the whole program and can break libraries. Usually this is not such a bad thing, because it is not used for changing existing methods, but for adding new methods. So it causes problems only when two conflicting monkey patches occur in different libraries. But for big programs with many libraries there is some risk in this area. Scala tries to achieve the same by using „implicit conversions“. So a conversion rule is implicitly around and when a method is called on an object that does not exist in its type, the adequate conversion is applied prior to the method. This works at compile time. Most of the time it is effectively quite similar to monkey patching, but it is a bit harder to tame, because writing and providing implicit conversions is more work and harder to understand than writing monkey patches. On the other hand, Scala avoids the risks of Ruby’s monkey patching.

An increasingly important issue is making use of multiple CPU cores. Scala and especially Scala in combination with Akka is very strong on this. It supports a reasonably powerful and tamable programming model for using multiple threads. The C- or JavaSE-way is very powerful, but it is quite difficult to avoid shooting oneself into the foot and even worse there is a high likelihood that such errors show up in production, in times of heavy load, while all testing seemed to go well. This is the way to go in some cases, but it requires a lot of care and a lot of thinking and a team of skillful developers. There are more developers who think they belong to this group than are actually able to do this well. Of course Scala already filters out some less skilled developers, but still I think its aproach with Akka is more sound.
Ruby on the other hand has very little support for multithreading, and cannot as easily make use of multiple cores by using threads. While the language itself does support the creation of threads, for many years the major implementation had very little support for this in the sense that not actually multiple threads were running at the same time. This propagated into the libraries, so this will probably never become the strength of Ruby. The way to go is to actually start multiple processes. This is not so bad, because the overhead of processes in Ruby is much less than in JVM-languages. Still this is an important area and Scala wins this point.

Concerning web GUIs Ruby has Rails, which is really a powerful and well established way to do this. Scala does provide Play, which is in a way a lot of concepts from Rails and similar frameworks transferred to Scala. It is ok to use it, but rails is much more mature and more mainstream. So I would give this point to Ruby. Rails includes Active Record, about which I do have doubts, but this is really not a necessary component of a pure WebGUI, but more a backend functionality…

So in the end I would recommend to use Scala and Akka for the solution, if it is anticipated that a high throughput will be needed. For smaller solutions I would favor Ruby, because it is a bit faster and easier to get it done.

For larger applications a multi tier architecture could be a reasonable choice, which opens up to combinations. The backend can be done with Scala. If server side rendering is chosen, Ruby and Rails with REST-calls to the backend can be used. Or a single page application which is done in JavaScript or some language compiling to JavaScript and again REST-calls to the backend.

Share Button

Carry Bit, Overflow Bit and Signed Integers

It has already been explained how the Carry Bit works for addition. Now there was interest in a comment about how it would work for negative numbers.

The point is, that the calculation of the carry bit does not have any dependency on the sign. The nature of the carry bit is that it is meant to be used for the less significant parts of the addition. So assuming we add two numbers x and y that are having k and l words, respectively. We assume that n=\max(k,l) and make sure that x and y are both n words long by just providing the necessary number of 0-words in the most significant positions. Now the addition is performed as described by starting with a carry bit of 0 and adding with carry x[0]+y[0], then x[1]+y[1] and so on up to x[n-1]+y[n-1], assuming that x[0] is the least significant word and x[n-1] the most significant word, respectively. Each addition includes the carry bit from the previous addition. Up to this point, it does not make any difference, if the numbers are signed or not.

Now for the last addition, we need to consider the question, if our result still fits in n words or if we need one more word. In the case of unsigned numbers we just look at the last carry bit. If it is 1, we just add one more word in the most significant position with the value of 1, otherwise we are already done with n words.

In case of signed integers, we should investigate what can possibly happen. The input for the last step is two signed words and possibly a carry bit from the previous addition. Assuming we have m-Bit-words, we are adding numbers between -2^{m-1} and 2^{m-1}-1 plus an optional carry bit c. If the numbers have different signs, actually an overflow cannot occur and we can be sure that the final result fits in at most n words.

If both are not-negative, the most significant bits of x[n-1] and y[n-1] are both 0. An overflow is happening, if and only if the sum x[n-1]+y[n-1]+c \ge 2^{n-1}, which means that the result „looks negative“, although both summands were not-negative. In this case another word with value 0 has to be provided for the most significant position n to express that the result is \ge 0 while maintaining its already correctly calculated result. It cannot happen that real non-zero bits are going into this new most significant word. Consequently the carry bit can never become 1 in this last addition step.

If both are negative, the most significant bits of x[n-1] and y[n-1] are both 1. An overflow is happening, if and only if the sum x[n-1]+y[n-1]+c \lt 2^{n-1}, which means that the result „looks positive or 0“, although both summands were negative. In this case another word with value 2^n-1 or -1, depending on the viewpoint, has to be prepended as new most significant word. In this case of two negative summands the carry bit is always 1.

Now typical microprocessors provide an overflow flag (called „O“ or more often „V“) to deal with this. So the final addition can be left as it is in n words, if the overflow bit is 0. If it is 1, we have to signal an overflow or we can just provided one more word. Depending on the carry flag it is 0 for C=0 or all bits 1 (2^n-1 or -1, depending on the view point) for C=1.

The overflow flag can be calculated by o=\mathrm{signbit}(x) = \mathrm{signbit}(y) \land \mathrm{signbit}(x+y\mod 2^n) \ne \mathrm{signbit}(x).
There are other ways, but they lead to the same results with approximately the same or more effort.

The following table shows the possible combinations and examples for 8-Bit arithmetic and n=1:

x<0 or x≥0y<0 or y≥ 0(x+y)%2^8 < 0 or ≥ 0Overflow BitCarry Bitadditional word neededvalue additional wordExamples (8bit)
x≥0y≥0≥000no-0+0
63+64
x≥0y≥0<010yes064+64
127+127
x≥0y<0≥000 or 1no-65+(-1)
127+(-127)
x≥0y<0<000 or 1no-7+(-8)
127+(-128)
0+(-128)
x<0y≥0≥000 or 1no--9 + 12
-1 + 127
-127+127
x<0y≥0<000 or 1no--128+127
-128+0
-1 + 0
x<0y<0≥011yes-1-64 + (-65)
-128+(-128)
x<0y<0<001no--1 + (-1)
-1 + (-127)
-64 + (-64)

If you like, you can try out examples that include the carry bit and see that the concepts still work out as described.

Links

Share Button

Java Properties Files and UTF-8

Java uses a nice pragmatic file format for simple configuration tasks and for internationalization of applications. It is called Java properties file or simply „.properties file“. It contains simple key value pairs. For most configuration task this is useful and easy to read and edit. Nested configurations can be expressed by simple using dots („.“) as part of the key. This was introduced already in Java 1.0. For internationalization there is a simple way to create properties files with almost the same name, but a language code just before the .properties-suffix. The concept is called „resource bundle“. Whenever a language specific string is needed, the program just knows a unique key and performs a lookup.

The unpleasant part of this is that these files are in the style of the 1990es encoded in ISO-8859-1, which is only covering a few languages in western, central and northern Europe. For other languages as a workaround an \u followed by the 4 digit hex code can be used to express UTF-16 encoding, but this is not in any way readable or easy to edit. Usually we want to use UTF-8 or in some cases real UTF-16, without this \u-hack.

A way to deal with this is using the native2ascii-converter, that can convert UTF-8 or UTF-16 to the format of properties files. By using some .uproperties-files, which are UTF-8 and converting them to .properties-files using native2ascee as part of the build process this can be addressed. It is still a hack, but properly done it should not hurt too much, apart from the work it takes to get this working. I would strongly recommend to make sure the converted and unconverted files never get mixed up. This is extremely important, because this is not easily detected in case of UTF-8 with typical central European content, but it creates ugly errors that we are used to see like „sch�ner Zeichensalat“ instead of „schöner Zeichensalat“. But we only discover it, when the files are already quite messed up, because at least in German the umlaut characters are only a small fraction of the text, but still annoying if messed up. So I would recommend another suffix to make this clear.

The bad thing is that most JVM-languages have been kind of „lazy“ (which is a good thing, usually) and have used some of Java’s infrastructures for this, thus inherited the problem from Java.

Another way to deal with this is to use XML-files, which are actually by default in UTF-8 and which can be configured to be UTF-16. With some work on development or search of existing implementations there should be ways to do the internationalization this way.

Typically some process needs to be added, because translators are often non-IT-people who use some tool that displays the texts in the original languages and accepts the translation. For good translations, the translator should actually use the software to see the context, but this is another topic for the future. Possibly there needs to be some conversion from the data provided by the translator into XML, uproperties, .properties or whatever is used. These should be automated by scripts or even by the build process and merge new translations properly with existing ones.

Anyway, Java 9 Java 9 will be helpful in this issue. Finally Java-9-properties that are used as resource bundles for internationalization can be UTF-8.

Links

Share Button

Scala Exchange 2017

I have visited Scala Exchange („#ScalaX“) in London on 2017-12-14 and 2017-12-15. It was great, better than 2015 in my opinion. In 2016 I missed Scala Exchange in favor of Clojure Exchange.

This time there were really many talks about category theory and of course its application to Scala. Spark, Big Data and Slick were less heavily covered this time. Lightbend (former Typesafe), the company behind Scala, did show some presence, but less than in other years. But 800 attendees are a number by itself and some talks about category theory were really great.

While I have always had a hard time accepting why we need this „Über-Mathematics“ like category theory for such a finite task as programming, I start seeing its point and usefulness. While functors and categories provide a meta layer that is actually accessible in Scala there are actually quite rich theories that can even be useful when constrained to a less infinite universe. This helps understanding things in Java. I will leave details to another post. Or forget about it until we have the next Scala conference.

So the talks that I visited were:

  • Keynote: The Maths Behind Types [Bartosz Milewski]
  • Free Monad or Tagless Final? How Not to Commit to a Monad Too Early [Adam Warski]
  • A Pragmatic Introduction to Category Theory [Daniela Sfregola]
  • Keynote: Architectural patterns in Building Modular Domain Models [Debasish Ghosh]
  • Automatic Parallelisation and Batching of Scala Code [James Belsey and Gjeta Gjyshinca]
  • The Path to Generic Endpoints Using Shapeless [Maria-Livia Chiorean]
  • Lightning talk – Optic Algebras: Beyond Immutable Data Structures [Jesus Lopez Gonzalez]
  • Lightning Talk – Exploring Phantom Types: Compile-Time Checking of Resource Patterns [Joey Capper]
  • Lightning Talk – Leave Jala Behind: Better Exception Handling in Just 15 Mins [Netta Doron]
  • Keynote: The Magic Behind Spark [Holden Karau]
  • A Practical Introduction to Reactive Streams with Monix [Jacek Kunicki]
  • Building Scalable, Back Pressured Services with Akka [Christopher Batey]
  • Deep Learning data pipeline with TensorFlow, Apache Beam and Scio [Vincent Van Steenbergen]
  • Serialization Protocols in Scala: a Shootout [Christian Uhl]
  • Don’t Call Me Frontend Framework! A Quick Ride on Akka.Js [Andrea Peruffo]
  • Keynote: Composing Programs [Rúnar Bjarnason]
Share Button

Collection Initializiation in Java

There is this so called „double brace“ pattern for initializing collection. We will see if it should be a pattern or an anti-pattern later on…

The idea is that we should consider the whole initializion of a collection one big operation. In other languages we write something like
[element1 element2 element3]
or
[element1, element2, element3]
for array-like collections and
{key1 val1, key2 val2, key3 val3}
or
{key1 => val1, key2 => val2, key3 => val3}.
Java could not do it so well until Java 9, but actually there was a way to construct sets and lists:
Arrays.asList(element1, element2, element3);
or
new HashSet<>(Arrays.asList(element1, element2, element3));.
Do not ask about immutability (or unmodifyability), which is not very well solved in the standard java library until now, unless you are willing to take a look into Guava, which we will in another article… Let us stick with Java’s own facilities for today.

So the double brace pattern would be something like this:

import java.util.*;

public class D {
    public static void main(String[] args) {
        List<String> l = new ArrayList<String>() {{
                add("abc");
                add("def");
                add("uvw");
            }};
        System.out.println("l=" + l);

        Set<String> s = new HashSet<String>() {{
                add("1A2");
                add("2B707");
                add("3DD");
            }};
        System.out.println("s=" + s);

        Map<String, String> m = new HashMap<String, String>() {{
                put("k1", "v1");
                put("k2", "v2");
                put("k3", "v3");
            }};
        System.out.println("m=" + m);
    }
}

What does this do?

First of all having an opening brace after the new XXX() creates an anonymous class extending XXX. Then we open the body of the extended class. What is well known to many is that there can be a static {....} section, that is called exactly once for each class. The same applies for a non-static section, which is achieved by omitting the static keyword. This is of course called once for each instance of the class, so in this case it will be called after the constructor of the base class and serves kind of as a replacement for the constructor. To make it look cooler the two pairs of braces are placed together.

It is not so magic, but it creates a lot of overhead by creating anonymous classes with no real additional functionality just for the sake of an initialization. It is even worse, because these anonymous inner classes are not static, so they actually can refer to their surrounding instance. They do not make use of this, but anyway they carry a reference to their surrounding class which might be a very serious problem for serialization, if that is used. And for garbage collection. So please consider the double-brace-initialization as an anti-pattern. Others have blogged about this too…

There are more legitimate ways to group the initialization together. You can put the initialization into a static method and call that. Or you could group it with single braces, just to indicate the grouping. This is a bit unusual, but at least correct:

import java.util.*;

public class E {
    public static void main(String[] args) {
        List<String> l = new ArrayList<String>();
        {
            l.add("abc");
            l.add("def");
            l.add("uvw");
        }
        System.out.println("l=" + l);

        Set<String> s = new HashSet<String>();
        {
            s.add("1A2");
            s.add("2B707");
            s.add("3DD");
        }
        System.out.println("s=" + s);

        Map<String, String> m = new HashMap<String, String>();
        {
            m.put("k1", "v1");
            m.put("k2", "v2");
            m.put("k3", "v3");
        }
        System.out.println("m=" + m);
    }
}

While the first two can somehow be written using Arrays.asList(...), now in Java 9 there are nicer ways for writing all three using List.of("abc", "def", "uvw");, Set.of("1A2", "2B707", "3DD"); and Map.of("k1", "v1", "k2", "v2", "k3", "v3");, which is recommended over any other way because there are some additional runtime and compile time checks and because these are efficient immutable collections. This has been blogged about too.

The aspect of immutability which we should consider today, is not very well covered by the java collections (apart from the new internal one for the new factory methods. Wrapping in Collections.unmodifyableXXX(...) is a bit of overhead in terms of code, memory and CPU-usage but it does not give a guarantee that the collection wrapped into this is actually not being modified elsewhere.

Share Button

Perl 5 and Perl 6

We have now two Perls. Perl 5, which has been around for more than 20 years just as the „Perl programming language“ and Perl 6, which has been developed for more than a decade and of which now stable versions exist.

The fact, that they are both called „Perl“ is a bit misleading. They are two different and incompatible programming languages. But they share the same community. And Perl conferences are usually covering both languages.

So this rises the question about the differences or about which of the two Perls to use.

Here are some differences:

  • Perl 5 is well established and many people know it. Perl 6 has to be learned, even if it is relatively easy to learn for someone with a Perl 5 background.
  • Perl 5 runs about three times faster than Perl 6
  • Perl 6 programs are a bit shorter than Perl 5 programs
  • Perl 6 regular expressions are even better than Perl 5’s regular expressions
  • Perl 6 is more logical than Perl 5
  • Perl 6 uses by default better numerical types
  • Perl 6 makes it easier and more natural to do object oriented programming and functional programming
  • Perl 6 has come up with a useful approach for doing multithreading.
  • Perl 5 has so many cool libraries on CPAN, Perl 6 just a few.

Links:

Share Button

Swiss Perl Workshop 2017

I have attended the Swiss Perl Workshop.
We were a group of about 40 people, one track and some very interesting talks, including by Damian Conway.
I gave a regular talk and a lightning talk myself.
The content of my talk might go into another Blog post in the future.
The Perl programming language is still interesting, and of course it was covered in both variants: Perl 5 and Perl 6.
But many of the talks were about general issues like security and architecture and just exemplified by Perl.

The Video recording of talks was optional. Here are those that have been recorded and already uploaded: Youtube: Swiss Perl Workshop

Share Button