Will Java, C, C++ and C# be the new Cobols?

A few decades ago most programming was done in Cobol (I do not want to shout COBOL), Fortran, Rexx and other typical mainframe languages that hardly made it into the Linux, Unix or MS-Windows world. These languages are still present, but they are mostly used for maintaining and extending existing software and less often for writing new software from scratch.
These days languages like C, C++, Java and, to a slightly lesser extent, C# dominate the list of the most commonly used languages. I would assume that JavaScript is also quite prominent in that list, because it has become more popular to write rich web clients using frameworks like AngularJS, and there are tons of them, including some really good stuff. Some people like to see JavaScript on the server side as well, and in spite of really interesting frameworks like Node.js I do not really consider this a good idea. If you like you may add Objective-C to this list, which I hardly know at all, even though it has been part of my gcc since my first Linux installation in the early 1990s.

Anyway, C goes back to the 1970s and I think that it was a great language to create at that time, and it still is for a certain set of purposes. When writing operating systems, database engines, compilers and interpreters for other languages, editors, or embedded software, in short everything that is very close to the hardware, explicit control and direct access to very powerful OS-APIs are features that prove to be useful. It has been said that Java runs as fast as C, which is at least close to the truth, but only if we do not take into account the memory usage. C has some shortcomings that could be fixed without sacrificing its strengths in the areas where it is useful, but that does not seem to be happening.

C++ started out as the OO-extension of C, but I would say that it has evolved into a quite different language for different purposes. There is some overlap: it is relatively easy to call functionality written in C from C++, and a little bit harder the other way round… I have not used it very much recently, so I will refrain from commenting further on it.

Java has introduced an infrastructure that is very common now, with its virtual machine. The JVM is running on a large number of servers and any JVM-language can be used there. The platform independence is an advantage, but I think that its importance on servers has diminished a little bit. There used to be all kinds of servers with different operating systems and different CPU-architectures, but now we are moving towards servers being mostly Linux with Intel-compatible CPUs, so it is becoming less of an issue. This may of course change again in the future.

With Mono, C# can be used in ways similar to Java; at least that is what the theory says, and it seems to be quite true up to a certain level. C# is a bit ahead of Java with some language features, just think of operator overloading, undeclared exceptions, properties, generics or lambdas, which have been introduced earlier, or in a more elegant way, or for which we are still waiting in Java. I think the case of lambdas also shows the limitations, because they seem to behave differently than you would expect from real closures, which is the way lambdas should be done and are done in more functionally oriented languages, or even in the Ruby programming language, in the Perl programming language or in typical Lisps.
Try this:

// a list of parameter-less functions returning int
List<Func<int>> actions = new List<Func<int>>();

int variable = 0;
while (variable < 5)
{
    // the lambda captures the variable itself, not its current value
    actions.Add(() => variable * 2);
    ++variable;
}

// by now variable == 5, and all five lambdas see that same variable
foreach (var act in actions)
{
    Console.WriteLine(act.Invoke());
}

We would expect the output 0, 2, 4, 6, 8, but we actually get 10, 10, 10, 10, 10 (one number per line).
But it can be fixed:

List<Func<int>> actions = new List<Func<int>>();

int variable = 0;
while (variable < 5)
{
    // a fresh copy per iteration, so each lambda captures its own value
    int copy = variable;
    actions.Add(() => copy * 2);
    ++variable;
}

foreach (var act in actions)
{
    Console.WriteLine(act.Invoke());
}

I would say that the concept of inner classes is better in Java, even though what is called „static" there should be the default; but now that we have lambdas, this is less important…
There are more issues, for example with class loaders, which are kind of hard to tame in Java, but extremely powerful.
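
Coming back to the inner classes, here is a small illustration (a made-up sketch, names invented for the example): a Java inner class silently captures a hidden reference to its enclosing instance, while a static nested class makes the dependency explicit.

public class Outer {
    private int value = 42;

    // inner class: every instance carries a hidden reference to an Outer instance
    class Inner {
        int read() {
            return value; // really Outer.this.value
        }
    }

    // static nested class: no hidden reference, the dependency is passed explicitly
    static class Nested {
        int read(Outer o) {
            return o.value;
        }
    }

    public static void main(String[] args) {
        Outer outer = new Outer();
        Inner inner = outer.new Inner();
        Nested nested = new Nested();
        System.out.println(inner.read() + " " + nested.read(outer));
    }
}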

Anyway, I think that all of these languages tend to be similar in their syntax, at least within a method or function, and require a lot of boilerplate code. Another issue that I see is that the basic types, including strings, even if they are seen as basic types by the language design, are not powerful enough or are full of pitfalls.

While the idea to use just null-terminated character arrays as strings in C has its beauty, I think it is actually not really good enough, and for more serious C applications a more advanced string library would be good, with the disadvantage that different applications will end up using different string libraries… Anyway, for the stuff that is legitimately done with C now, this is not so much of an issue, and legacy software has its legacy ways of dealing with strings anyway, with possibly painful limitations in conjunction with Unicode. Java and also C# were introduced at a time when Unicode was already around and the standard already claimed more than 65536 code points (characters in Unicode-speak), but at that time 65536 seemed to be quite OK to cover the needs of all common languages, and so UTF-16 was picked as the encoding. This blows up the memory, because strings occupy most of the memory in typical application software, but it still leaves us with uncertainties about length and position, because a code point can be one or two 16-bit „characters" long, which can only be seen by actually iterating through the string. That leaves us where we were with null-terminated strings in C. And strings are really hard to replace or enhance in this aspect, because they are used everywhere.
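
To make the pitfall concrete, a minimal Java sketch (C# behaves analogously); U+1D11E is the musical G clef, a code point outside the Basic Multilingual Plane:

public class Utf16Demo {
    public static void main(String[] args) {
        String s = "G\uD834\uDD1E"; // "G" followed by U+1D11E as a surrogate pair
        System.out.println(s.length());                      // 3 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 2 actual code points
        // so finding the n-th code point requires iterating through the string
    }
}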

Numbers are not good either. As application developers we should not have to care about counting bits, unless we are in an area that needs to be specifically optimized. We mostly use integer types in application development, at least we should, and these overflow silently. Just to see it in C#:

int s = 1;
for (int i = 0; i < 20; i++)
{
    s *= 7; // silently wraps around once the product exceeds int.MaxValue
    Console.WriteLine("i=" + i + " s=" + s);
}

which gives us:

i=0 s=7
i=1 s=49
i=2 s=343
i=3 s=2401
i=4 s=16807
i=5 s=117649
i=6 s=823543
i=7 s=5764801
i=8 s=40353607
i=9 s=282475249
i=10 s=1977326743
i=11 s=956385313
i=12 s=-1895237401
i=13 s=-381759919
i=14 s=1622647863
i=15 s=-1526366847
i=16 s=-2094633337
i=17 s=-1777531471
i=18 s=442181591
i=19 s=-1199696159

So it silently overflows, or rather just takes the remainder modulo 2^{32} with the representation range \{-2^{31} \ldots 2^{31}-1\}. Java, C and C++ behave exactly the same way, only that in C we need to know what „int" means for our compiler; if we use 32-bit ints, it is the same. This should throw an exception or switch to some unlimited long integer. Clojure offers both options, depending on whether you use * or *' as the operator. As application developers we should not have to care about these bits, and most developers do not think about it. Usually it goes well, but a lot of software bugs are around due to this pattern. It is just wrong in C#, Java, and C++. In C I find it more acceptable, because the typical area for using C for new software is actually quite related to bits and bytes, so the developers need to be aware of such issues all the time anyway.
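
Java at least offers ways to opt out of the silent wrap-around, if we remember to use them; a small sketch of both options mentioned above:

import java.math.BigInteger;

public class OverflowDemo {
    public static void main(String[] args) {
        // option 1: fail loudly instead of wrapping silently (since Java 8)
        try {
            int s = 1;
            for (int i = 0; i < 20; i++) {
                s = Math.multiplyExact(s, 7); // throws ArithmeticException on overflow
            }
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }

        // option 2: switch to unlimited precision, like Clojure's *'
        BigInteger t = BigInteger.ONE;
        for (int i = 0; i < 20; i++) {
            t = t.multiply(BigInteger.valueOf(7));
        }
        System.out.println("7^20 = " + t); // 79792266297612001
    }
}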

I would consider it desirable to move to more expressive languages like Clojure, Scala, F#, Ruby or Perl for application development. Ruby and Perl have better strings. Clojure and Scala inherit theirs from the JVM, and F# has the same strings as C#. Ruby and Clojure have a good way to deal with integers; Scala, Perl and F# can do it right if we actually want to, but not by default. Perl and Ruby are very weak when it comes to multithreading. Compared to Java this can be dealt with by just using more processes instead of threads, because the overhead of a Ruby or Perl process is much less than the overhead of a Java process, but I would still see it as a major drawback. C, C#, Java and C++ offer good facilities for multithreading, but avoiding typical multithreading bugs is a big deal and actually too hard for a large fraction of typical application developers, or at least too far away from their point of focus. Moving to a more functional paradigm might be a way to go. Java Enterprise Edition is a failure if the goal is to get multithreading done well without having to worry about it, because the overhead is too much. On the other hand, if you are willing to go the extra mile, having more explicit access to the multithreading mechanisms and using them correctly is extremely powerful, for example in C with pthreads or with a deliberate combined usage of processes, shared memory and threads. But for which kind of projects do we have the time and the team for this? I am not talking about multithreaded applications that work well on the developer's laptop, but fail during some high-load processing in production with concurrent modification issues a few months after deployment. Thinking cannot be replaced by testing.

So now we have a lot of software in C, C++, Java and C#, and a lot of new software is written in these languages, even from scratch. We could do better; sometimes we do, sometimes we don't. It is possible to write excellent application software with Java, C++, C# and even C. It just takes a bit longer, but if we use them with care, it will be OK. Some companies are very conservative and want to use stuff that has been around for a long time. This is sometimes right and sometimes wrong. And since none of the more modern languages has really picked up so much speed that it can be considered a new mainstream, it is understandable that some organizations are scared of marching into a dead end.

On the other hand, many businesses can differentiate themselves by providing services that are only possible with a very innovative IT. Banks like UBS and Credit Suisse in Switzerland are not likely to be there, while banks like ING are on that road. As long as they compete for totally different customer bases, and as long as the business has enough strengths that do not depend so heavily on an innovative IT, but just on a working, robust IT, this will be fine. But time moves on, and innovation will eventually out-compete conservative businesses.


Oracle buys the NSA

Wonderful coincidences have been discovered.

The NSA has excellent technological knowledge, especially in the area of storing and processing huge amounts of data. And the US government wants to move one step ahead with the privatization of state-run activities. So the US government and the Oracle corporation have agreed to sell the NSA to Oracle. Oracle will be able to remain the leading DB vendor, even with higher prices than today, because of unprecedented technological advances. The NSA will become more efficient when run as part of a private company. And the whole country becomes more efficient, because less lobbying is needed if companies control important organizations directly and not indirectly through the US government. Another important synergy is that additional backups of databases will now be available.


Frameworks for Unit Testing and Mocking

Unit testing has fortunately become an important issue in many software projects. The idea of automated, software-based unit and integration tests is actually quite old. The typical Linux software that is downloaded as source code and then built with steps like

tar xvfz «software-name-with-version».tar.gz
cd «software-name-with-version»
./configure
make
sudo make install

often allows a step

make test

or

make check

or even both before the

make install

It was already like that in the 1990s, when the term „unit test" was unknown and the whole concept had not yet been popularized to the mainstream.

What we need is to write those automated tests to an extent that we have good confidence that the software will be reliable enough in terms of bugs if it passes the test suite. The tests can be written in any language, and I do encourage you to think about using other languages, in order to be less biased and more efficient when writing the tests. We may choose to write a software in C, C++ or Java for the sake of efficiency or easier integration into the target platform. But while these languages are efficient in their usage of CPU power, they are not at all efficient in their usage of developer time for writing a lot of functionality. This is OK for most projects, because the effort it takes to develop with these languages is accepted in exchange for the anticipated benefits. For testing it is another issue.

On the other hand there are of course advantages in using the same language for writing the tests, because it is easier to access the APIs and even internal functionality during the tests. So it may very well be that unit tests are written in the same language as the software, and this is actually what I am doing most of the time. But do think twice about your choice.

Now writing automated tests is actually no magic. It does not really need frameworks, but is quite easy to accomplish manually. All we need are two areas in our source code tree: one area that goes into the production code and one area that is only used for the tests and remains on the development and continuous integration machines. Since writing automated tests without frameworks is not really a big deal, we should only look at frameworks that are really simple and easy to use, or that give us really good features that we actually need. This is the case with many such frameworks, so the way to go is to actually use them, save some time and make the structure more accessible to other team members who know the same testing framework. Writing and running unit tests should be really easy, otherwise it is not done, or the unit tests are disabled, lose contact with the actual software and become worthless.

Bugs are much more expensive the later they are discovered, so we should try to find as many of them as possible while developing. Writing unit tests and automated integration tests is a good thing, and writing them early is even better. The pure test-driven approach writes them even before the code itself. I recommend this for bug fixing, whenever possible.

There is one exception to this rule. When writing GUIs, automated testing is possible, but quite hard. We should have UX guys involved and present them with early drafts of the software. If we had already developed elaborate Selenium tests by then, it would be painful to change the software according to the advice of the UX guys and rewrite the tests. So I would keep this area flexible until we are on the same page as the UX guys and add the tests later.

Frameworks that I like are CUnit for C, JUnit for Java, where TestNG would be a viable alternative, and Google-Test for C++. CUnit works extremely well on Linux and probably on other Unix-like systems like Solaris, AIX, Mac OS X, BSD etc. There is no reason why it should not work on MS-Windows. With Cygwin it is actually extremely easy to use, but with native Win32/Win64 it seems to take an effort to get it working, probably because MS-Windows is no priority for the developers of CUnit.
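
Just to show how little ceremony JUnit needs, a minimal sketch of a JUnit-4-style test; StringUtil.reverse is a made-up method standing in for whatever we actually want to test:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class StringUtilTest {
    @Test
    public void reverseOfAbcIsCba() {
        assertEquals("cba", StringUtil.reverse("abc"));
    }

    @Test
    public void reverseOfEmptyStringIsEmpty() {
        assertEquals("", StringUtil.reverse(""));
    }
}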

Now we should use our existing structures, but there can be reasons to mock a component or functionality. It can be because the component does not exist yet during development. Maybe we want to see if the component is accessed the right way, and this is easier to track with a mock that records the calls than with the real thing that does some processing and gives us only the result. Or maybe we have a component which is external and not always available, or available but too time-consuming for most of our tests.

Again, mocking is no magic and can be done without tools and frameworks. So the frameworks should again be very easy and friendly to use, otherwise they are just a pain in the neck. Early mocking frameworks were often too ambitious and too hard to use, and I would have avoided them whenever possible. In Java, mocking manually is quite easy: we just need an interface of the mocked component and create an implementing class. Then we need to add all missing methods, which tools like Eclipse will do for us, and change some of them. That's it. Now we have Mockito for Java and Google-Mock, which is now part of Google-Test, for C++. In C++ we create a class that behaves like a Java interface by making all methods pure virtual, with the keyword „virtual" and „=0" instead of the implementation, and by giving it a virtual destructor with an empty implementation. These frameworks are so easy to use and provide such useful features that they are actually good ways to go.
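
To show how little effort the manual approach in Java takes, a sketch with a made-up MailSender interface; the mock records the calls so the test can verify them afterwards:

import java.util.ArrayList;
import java.util.List;

interface MailSender {
    void send(String recipient, String text);
}

class RecordingMailSender implements MailSender {
    final List<String> recordedCalls = new ArrayList<>();

    @Override
    public void send(String recipient, String text) {
        // record the call instead of sending anything
        recordedCalls.add(recipient + ":" + text);
    }
}

In the test we inject a RecordingMailSender into the component under test and inspect recordedCalls afterwards.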

For C the approach is a little bit harder, because we do not have interfaces. So the way to go is to create a library of the code that we want to test and that should go to production. Then we write one or more .c-files for the test, which end up in an executable that actually runs the test. In these .c-files we can provide a mock implementation for any function, and it takes precedence over the implementation from the library. For complete tests we will need more than one executable, because within one executable the set of mocked functions is fixed. There are tools on the web to help with this. I find the approach of generating the C code for the mocked functions from the header files, using scripts in the Ruby programming language or in the Perl programming language, quite charming.

Automated testing is so important that I strongly recommend making changes to the software in order to make it accessible to tests, of course within reason. A common trick is to make certain Java methods package private and to put the tests in the same package, but in a different directory. Document why they are package private.
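
A sketch of the trick, with made-up names; both files declare the same package, but live in different directory trees:

// src/main/java/com/example/PriceCalculator.java
package com.example;

public class PriceCalculator {
    // package private instead of private, deliberately, so that the unit test
    // in the same package (but under src/test/java) can call it directly
    static long addVat(long netAmountInCents) {
        return Math.round(netAmountInCents * 1.077);
    }
}

// src/test/java/com/example/PriceCalculatorTest.java
package com.example;

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class PriceCalculatorTest {
    @Test
    public void vatIsAdded() {
        assertEquals(1077, PriceCalculator.addVat(1000));
    }
}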

It is important to discuss and develop the automated testing within the team and to find and improve a common approach. Laziness is a good thing, but laziness means running many automated tests to avoid manual testing, not being too lazy to write the tests and eventually spending more time on manual repetitive activities.

I can actually teach this in a two-day or three-day course.


The Language Issue

I had started a poll about the question whether this blog should be written in German or in English.

I would consider the result a tie, but I have now been writing this blog in English for about 18 months, translating only a small fraction of the articles to German, and I will keep it like that for now. If you want to volunteer to translate more articles, please let me know.

Here are the results:

[Image: poll results for „The language issue"]


Perl Training in Switzerland

Very soon we will have the opportunity to participate in advanced Perl trainings and even in some trainings on giving presentations.

Here are the details.

I found these trainings useful when I attended them. They are given by Damian Conway, one of the core developers of the Perl programming language.

The courses will be held in English. Unlike in previous years, the location will be the FHNW, within walking distance of the railroad station of Brugg, or, if you are more road-oriented, within walking distance of the point where the Swiss national highways 5 and 3 meet in Brugg. You can find it on the map.


Microsoft SQL Server will be available for Linux in 2017


Microsoft has officially announced that their database MS SQL Server will become available for Linux in 2017.

I think the time has come for this. Since the departure of Steve Ballmer, Microsoft has become a little bit less religious and more pragmatic. There are good reasons to be skeptical about companies like Microsoft and Oracle, but having more competition and more choice is a good thing. Maybe the database product from Oracle is slightly better than MS SQL Server, but there are very few projects where this difference really matters. So now we have five important relational DB products: DB2, Oracle, MS-SQL-Server, PostgreSQL and MariaDB (the successor of MySQL). When starting a new project with no specific constraints regarding the DB, I would usually look at PostgreSQL first, because it is a feature-rich and powerful open-source database. Since the database product usually cannot reasonably be changed within a software system for decades, being open source is a good thing here, because we never know what the big companies will do on such a long time scale. If the migration to another DB product is easy, then the software does not really make use of the power and the features of the DB. And it will not be easy anyway.

There are a lot of cases where the combination of MS-SQL-Server with Linux will make a lot of sense. For software systems that already use this DB product, it gives the flexibility to run the DB on Linux servers, and maybe to avoid an expensive migration to another DB product. As I already said, it gives us one more choice. In development environments where MS-products are commonly used, it gives one more combination. And eventually it will encourage Oracle and IBM a little bit to refrain from excessive price increases.


Microsoft SQL Server also for Linux from 2017


Microsoft has announced that it will offer MS SQL Server for Linux as well.

Yes, the time is ripe for this. There are enough reasons to have reservations about the companies Microsoft and Oracle, but their DB products are good, and more competition is a good thing too. I think that Oracle is still a little bit better, but on the other hand it is only in a very small fraction of projects that this difference matters. Beyond that, with DB2, MariaDB and above all PostgreSQL, other interesting DB products are of course still available, also under Linux.



How to make a scanned PDF smaller (Linux)

When scanning a paper document, it is possible to use a lot of parameters within xsane. The output format can be chosen as well, for example PNG, JPG or PDF. The outcome may be a PDF file that is way too big, easily more than 10 megabytes for a single page. It is quite easy to transform it into a smaller file:

convert -density 200x200 -quality 60 -compress jpeg \
big-scanned-file.pdf compressed-scanned-file.pdf

Unless you scan very often, it is easier to scan once with a relatively high resolution and then run this conversion with different values for quality and density, rather than running the time-consuming scan several times with different xsane settings.


Chemnitzer Linux-Tage

In the German city of Chemnitz the conference „Chemnitzer Linux-Tage" (Chemnitz Linux Days) will take place from 2016-03-19 to 2016-03-20.

Links:
* Informatik aktuell (German)
* Wikipedia (German)
* Official page (German)


Laziness

In conservative circles, laziness has a slightly negative connotation, but in IT it should be seen as something positive, if we understand it right. If we as humans get our job done in a more efficient way, by working less, this laziness is a good thing. So let's be lazy by becoming more efficient. For now, just remember that laziness is something good.

Laziness in software refers to performing a certain operation only when needed. Imagine we store a value that is the result of some calculation in a data structure, and we have only read access to this value, speaking in the Java world through a „getter". Under certain circumstances it is functionally equivalent to store the function that calculates the value instead and to call it whenever the getter is called. Usually this is combined with the memoize pattern, so the value is retained once it has been calculated. This whole idea breaks down if the function that calculates the value has side effects, is influenced by side effects, or generally does not yield the same result on the same inputs whenever it is called. Obviously the parameters for the function have to be retained as well for implementing such a lazy value, either by storing them in the structure too or by wrapping the function and the parameters into a parameter-less function that includes them. And obviously these parameters have to be immutable for this approach to work.
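
In Java this can be sketched with a small generic holder; a minimal sketch, assuming the supplied function is free of side effects and its captured inputs are immutable:

import java.util.function.Supplier;

public final class Lazy<T> {
    private Supplier<T> supplier; // set to null once the value has been computed
    private T value;

    public Lazy(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    public synchronized T get() {
        if (supplier != null) {     // not computed yet
            value = supplier.get(); // compute once...
            supplier = null;        // ...then memoize and release the function
        }
        return value;
    }
}

The calculation and its parameters are wrapped into the parameter-less Supplier, e.g. new Lazy<>(() -> expensiveCalculation(x, y)), and the calculation happens only if get() is ever called.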

Now the question is: what do we gain from this kind of laziness?
At first glance we gain nothing, because the calculation has to be done anyway and we are just procrastinating, so the eventual calculation of whatever we are doing will be just as expensive. Actually this might not be true: it is quite possible that the value is never needed, in which case it would have been wasteful to calculate it in advance.

Actually many of us have seen both approaches in text books for Java or C++, when the singleton pattern is explained. The typical lazy implementation looks like this:

public class S {
    private static S instance = null;

    public static synchronized S getInstance() {
        if (instance == null) {
            instance = new S();
        }
        return instance;
    }

    private S() {
        // private constructor, so no other instances can be created
        // ...
    }

    //...
}

This „synchronized" is a bit of a pain, but necessary to ensure that the instance is created only once. Smart people came up with the „double-checked locking" pattern, so it looked like this:

public class S {
    // "volatile" is needed for this to work with the memory model of Java 5 and newer
    private static volatile S instance = null;

    public static S getInstance() {
        if (instance == null) {
            synchronized (S.class) {
                if (instance == null) {
                    instance = new S();
                }
            }
        }
        return instance;
    }

    private S() {
        // ...
    }

    //...
}

It is kind of cute, because it takes into account that between the first check and the acquisition of the lock some other thread may already have initialized the instance. With the „volatile" it works with the memory model of Java 5 and newer, but it is for sure going to fail in older versions. Anyway, it is kind of scary that something that really looks correct, especially without the „volatile", won't work anyway. Or even worse, it will work fine during testing and miserably fail in production when we don't expect it. But it would be so easy with eager initialization:

public class S {
    private static final S instance = new S();

    public static S getInstance() {
        return instance;
    }

    private S() {
        // ...
    }

    //...
}

Or even use an enum to implement the singleton:
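
public enum S {
    INSTANCE;

    // methods of the singleton go here
    public void doSomething() {
        // ...
    }
}

The JVM guarantees that INSTANCE is created exactly once and thread-safely, lazily on first access to the enum class; callers simply use S.INSTANCE.doSomething().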

Other languages like Scala support this kind of laziness in a more natural way:

case class S(s : String) {
  // t is computed on the first access only, then memoized
  lazy val t : String = s + s
  def tt() : String = t
}

In this case t would be calculated when tt is called for the first time.

In this example it does not help a lot, because the calculation of t is so cheap. But imagine we wanted to create a huge collection and then do something with its elements only until we reach a certain goal. Eagerly initializing the collection would blow up our program, while lazily we may be able to iterate through it or work with the first n elements, where n is significantly smaller than the size the collection would have. Just think of it as an iterate-only collection, or as a collection that is too big to keep in memory completely.

Just do this in Clojure:

(def r (range 1000000000000N))

It defines a collection with 10^{12} elements, which would most likely exceed our memory if it were materialized. But it is accepted, and we can do stuff with it as long as we are dealing only with elements near the beginning. It could know its size, but it does not, so

(count r)

will fail or take forever. Or maybe work in some future version of Clojure…
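
For the Java-minded: since Java 8 the stream API offers a similar kind of laziness; a small sketch:

import java.util.stream.LongStream;

public class LazyStreams {
    public static void main(String[] args) {
        // conceptually infinite, but nothing is computed yet
        LongStream naturals = LongStream.iterate(0, n -> n + 1);

        // only the first five elements are ever produced
        naturals.map(n -> n * n)
                .limit(5)
                .forEach(System.out::println); // 0 1 4 9 16
    }
}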

So laziness allows us to write more elegant, expressive programs that still have a good efficiency. Of course all of this can be written by ourselves without such help, but it is a good laziness to rely on programming languages, libraries or tools that allow us to do it in such an elegant way.

Now we find that Hibernate and JPA use some kind of laziness as well. They have to, because objects tend to be connected, and fetching one eagerly would require fetching a really big bunch. Unfortunately this laziness is simply not correct, because we do have side effects that influence the outcome; that is what databases are about… Data may change, transactions may have been committed or rolled back or whatever. So we get a „LazyInitializationException" when we try to access some data too late. And when we try to adopt a beautiful programming style that mimics functional patterns in Java with non-static inner classes (prior to Java 8) in conjunction with Hibernate, it is bound to fail miserably unless we take absolute care about this issue.
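
A sketch of the typical situation, with made-up entities (assuming an Order entity that has a customer field):

import javax.persistence.*;
import java.util.List;

@Entity
public class Customer {
    @Id
    @GeneratedValue
    private Long id;

    // the orders are only fetched when the collection is actually accessed
    @OneToMany(mappedBy = "customer", fetch = FetchType.LAZY)
    private List<Order> orders;

    public List<Order> getOrders() {
        return orders;
    }
}

// Customer c = entityManager.find(Customer.class, id);
// entityManager.close();
// c.getOrders().size(); // throws org.hibernate.LazyInitializationException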

While we are always moving on thin ice with this side-effect-dependent laziness of Hibernate and JPA, laziness is really powerful in functional languages like Scala, Clojure or Haskell, if we go the extra mile to ensure that the function does not have side effects and does not depend on or get influenced by side effects.
