Meaningless Whitespace in Textfiles

We use different file formats that are more or less tolerant to certain changes. Most well known is white space in text files.

In some programming languages white space (space, newline, carriage return, form feed, tabulator, vertical tab) has no meaning, as long as any whitespace is present. Examples for this are Java, Perl, Lisp or C. Whitespace, that is somehow part of String content is always significant, but white space that is used within the program can be combination of one or more of the white space characters that are in the lower 128 positions (ISO-646, often referred to as ASCII or 7bit ASCII. It is of course recommended to have a certain coding standard, which gives some guidelines of when to use newlines, if tabs or spaces are preferred (please spaces) and how to indent. But this is just about human readability and the compiler does not really care. Line numbers are a bit meaningful in compiler and runtime error messages and stack traces, so putting everything into one line would harm beyond readability, but there is a wide range of ways that are all correct and equivalent. Btw. many teams limit lines to 80 characters, which was a valid choice 30 years ago, when some terminals were only 80 characters wide and 132 character wide terminals where just coming up. But as a hard limit it is a joke today, because not many of us would be able to work with a vt100 terminal efficiently anyway. Very long lines might be harder to read, so anything around 120 or 160 might still be a reasonable idea about line lengths…

Languages like Ruby and Scala put slightly more meaning into white space, because in most cases a semicolon can be skipped if it is followed by a newline and not just horizontal white space. And Perl (Perl 5) is for sure so hard to compile that only its own implementation can properly format or even recognize which white space is part of a literal string. Special cases like having the language in a string and parsing and then executing that should be ignored here.

Now we put this program files into a source code management system, usually Git. Some teams still use legacy systems like subversion, source safe, clear case or CVS, while there are some newer systems that are probably about as powerful as git, but I never saw them in use. Git creates an MD5 hash of each file, which implies that any minor change will result in a new version, even if it is just white space. Now this does not hurt too much, if we agree on the same formatting and on the same line ending (hopefully LF only, not CR LF, even on MS-Windows). But our tooling does not make any difference between significant changes and insignificant formatting only changes. This gets worse, if users have different IDEs, which they should have, because everyone should use the IDE or editor, with which he or she is most efficient and the formal description of the preferred formatting is not shared between editors or differs slightly.

I think that each programming language should come with a command line diff tool and a command line formatting tool, that obey a standard interface for calling and can be plugged into editors and into source code management systems like git. Then the same mechanisms work for C, Java, C#, Ruby, Python, Fortran, Clojure, Perl, F#, Scala, Lua or your favorite programming language.

I can imaging two ways of working: Either we have a standard format and possibly individual formats for each developer. During „git commit“ the file is brought into the standard format before it is shown to git. Meaning less whitespace changes disappear. During checkout the file can optionally be brought into the preferred format of the developer. And yes, there are ways to deal with deliberate formatting, that for some reason should be kept verbatim and for dealing differently with comments and of course all kinds of string literals. Remember, the formatting tool comes from the same source as the compiler and fully understands the language.

The other approach leaves the formatting up to the developer and only creates a new version, when the diff tool of the language signifies that there is a relevant change.

I think that we should strive for this approach. It is no rocket science, the kind of tools were around for many decades as diff and as formatting tools, it would just be necessary to go the extra mile and create sister diff and formatting tools for the compiler (or interpreter) and to actually integrate these into build environments, IDEs, editors and git. It would save a lot of time and leave more time for solving real problems.

Is there any programming language that actually does this already?

How to handle XML? Is XML just the new binary with a bit more bloat? Can we do a generic handling of all XML or should it depend on the Schema?

Share Button

Loops with unknown nesting depth

We often encounter nested loops, like

for (i = 0; i < n; i++) {
    for (j = 0; j < m; j++) {
        doSomething(i, j);
    }
}

This can be nested to a few more levels without too much pain, as long as we observe that the number of iterations for each level need to be multiplied to get the number of iterations for the whole thing and that total numbers of iterations beyond a few billions (10^9, German: Milliarden, Russian Миллиарди) become unreasonable no matter how fast the doSomethings(...) is. Just looking at this example program

public class Modular {
    public static void main(String[] args) {
        long n = Long.parseLong(args[0]);
        long t = System.currentTimeMillis();
        long m = Long.parseLong(args[1]);
        System.out.println("n=" + n + " t=" + t + " m=" + m);
        long prod = 1;
        long sum  = 0;
        for (long i = 0; i < n; i++) {
            long j = i % m;
            sum += j;
            sum %= m;
            prod *= (j*j+1) % m;
            prod %= m;
        }
        System.out.println("sum=" + sum + " prod=" + prod + " dt=" + (System.currentTimeMillis() - t));
    }
}

which measures it net run time and runs 0 msec for 1000 iterations and almost three minutes for 10 billions (10^{10}):

> java Modular 1000 1001 # 1'000
--> sum=1 prod=442 dt=0
> java Modular 10000 1001 # 10'000
--> sum=55 prod=520 dt=1
> java Modular 100000 1001 # 100'000
--> sum=45 prod=299 dt=7
> java Modular 1000000 1001 # 1'000'000
--> sum=0 prod=806 dt=36
> java Modular 10000000 1001 # 10'000'000
--> sum=45 prod=299 dt=344
> java Modular 100000000 1001 # 100'000'000
--> sum=946 prod=949 dt=3314
> java Modular 1000000000 1001 # 1'000'000'000
--> sum=1 prod=442 dt=34439
> java Modular 10000000000 1001 # 10'000'000'000
--> sum=55 prod=520 dt=332346

As soon as we do I/O, network access, database access or simply a bit more serious calculation, this becomes of course easily unbearably slow. But today it is cool to deal with big data and to at least call what we are doing big data, even though conventional processing on a laptop can do it in a few seconds or minutes... And there are of course ways to process way more iterations than this, but it becomes worth thinking about the system architecture, the hardware, parallel processing and of course algorithms and software stacks. But here we are in the "normal world", which can be a "normal subuniverse" of something really big, so running on one CPU and using a normal language like Perl, Java, Ruby, Scala, Clojure, F# or C.

Now sometimes we encounter situations where we want to nest loops, but the depth is unknown, something like

for (i_0 = 0; i_0 < n_0; i_0++) {
  for (i_1 = 0; i_1 < n_1; i_1++) {
    \cdots
      for (i_m = 0; i_m < n_m; i_m++) {
        dosomething(i_0, i_1,\ldots, i_m);
      }
    \cdots
  }
}

Now our friends from the functional world help us to understand what a loop is, because in some of these more functional languages the classical C-Style loop is either missing or at least not recommended as the everyday tool. Instead we view the set of values we iterate about as a collection and iterate through every element of the collection. This can be a bad thing, because instantiating such big collections can be a show stopper, but we don't. Out of the many features of collections we just pick the iterability, which can very well be accomplished by lazy collections. In Java we have the Iterable, Iterator, Spliterator and the Stream interfaces to express such potentially lazy collections that are just used for iterating.

So we could think of a library that provides us with support for ordinary loops, so we could write something like this:

Iterable range = new LoopRangeExcludeUpper<>(0, n);
for (Integer i : range) {
    doSomething(i);
}

or even better, if we assume 0 as a lower limit is the default anyway:

Iterable range = new LoopRangeExcludeUpper<>(n);
for (Integer i : range) {
    doSomething(i);
}

with the ugliness of boxing and unboxing in terms of runtime overhead, memory overhead, and additional complexity for development. In Scala, Ruby or Clojure the equivalent solution would be elegant and useful and the way to go...
I would assume, that a library who does something like LoopRangeExcludeUpper in the code example should easily be available for Java, maybe even in the standard library, or in some common public maven repository...

Now the issue of loops with unknown nesting depth can easily be addressed by writing or downloading a class like NestedLoopRange, which might have a constructor of the form NestedLoopRange(int ... ni) or NestedLoopRange(List li) or something with collections that are more efficient with primitives, for example from Apache Commons. Consider using long instead of int, which will break some compatibility with Java-collections. This should not hurt too much here and it is a good thing to reconsider the 31-bit size field of Java collections as an obstacle for future development and to address how collections can grow larger than 2^{31}-1 elements, but that is just a side issue here. We broke this limit with the example iterating over 10'000'000'000 values for i already and it took only a few minutes. Of course it was just an abstract way of dealing with a lazy collection without the Java interfaces involved.

So, the code could just look like this:

Iterable range = new NestedLoopRange(n_0, n_1, \ldots, n_m);
for (Tuple t : range) {
    doSomething(t);
}

Btw, it is not too hard to write it in the classical way either:

        long[] n = new long[] { n_0, n_1, \ldots, n_m };
        int m1 = n.length;
        int m  = m1-1; // just to have the math-m matched...
        long[] t = new long[m1];
        for (int j = 0; j < m1; j++) {
            t[j] = 0L;
        }
        boolean done = false;
        for (int j = 0; j < m1; j++) {
            if (n[j] <= 0) {
                done = true;
                break;
            }
        }
        while (! done) {
            doSomething(t);
            done = true;
            for (int j = 0; j < m1; j++) {
                t[j]++;
                if (t[j] < n[j]) {
                    done = false;
                    break;
                }
                t[j] = 0;
            }
        }

I have written this kind of loop several times in my life in different languages. The first time was on C64-basic when I was still in school and the last one was written in Java and shaped into a library, where appropriate collection interfaces were implemented, which remained in the project or the organization, where it had been done, but it could easily be written again, maybe in Scala, Clojure or Ruby, if it is not already there. It might even be interesting to explore, how to write it in C in a way that can be used as easily as such a library in Java or Scala. If there is interest, please let me know in the comments section, I might come back to this issue in the future...

In C it is actually quite possible to write a generic solution. I see an API like this might work:

struct nested_iteration {
  /* implementation detail */
};

void init_nested_iteration(struct nested_iteration ni, size_t m1, long *n);
void dispose_nested_iteration(struct nested_iteration ni);
int nested_iteration_done(struct nested_iteration ni); // returns 0=false or 1=true
void nested_iteration_next(struct nested_iteration ni);

and it would be called like this:

struct nested_iteration ni;
int n[] = { n_0, n_1, \ldots, n_m };
for (init_nested_iteration(ni, m+1, n); 
     ! nested_iteration_done(ni); 
     nested_iteration_next(ni)) {
...
}

So I guess, it is doable and reasonably easy to program and to use, but of course not quite as elegant as in Java 8, Clojure or Scala.
I would like to leave this as a rough idea and maybe come back with concrete examples and implementations in the future.

Links

Share Button

Scala Days 2018

In 2018 I visited Scala Days in Berlin with this schedule. Other than in previous years I missed the opening keynote, which is traditionally on the evening before the main conference starts, because this is not so well compatible with my choice of traveling with a night train, especially because I did not want to miss more than two days of my current project.

I might provide Links to Videos of the talks, if they become available, including the interesting keynote that I have missed.

So I did visit the following talks on the first full day:

And on the second full day:

Links

Share Button

ScalaUA 2018

About a week ago I visited Scala UA in Kiev.

This was the schedule.

It was a great conference once again, as it was already in 2017 and I really enjoyed everything, including the food, which was great again… 🙂

I listened to the following talks on the first day:

And I gave this talk:

On the second day I listened to the following talks:

  • Advanced Patterns in Asynchronous Programming, Michael Arenzon & Assaf Ronen
  • Akka: Actors Design And Communication Techniques, Alex Zvolinskiy
  • Monad Stacks or: How I Learned to Stop Worrying and Love the Free Monad, and other stories, Harry Laoulakos
  • Scala on Wire: How event streams help us build Android apps, Maciej Gorywoda
  • Purely functional microservices with http4s and doobie, Jasper van Zandbeek
  • Tame Your Data with Reactive Streams and Monix, Jacek Kunicki
  • Future and issues of the Scala Ecosystem, Panel Discussion
  • Roll your own Event Sourcing, Lina Krutulytė-Kriščiūnė

And I gave this talk:

As always there was a lot of inspiration coming from the talks and a lot worth exploring in future posts. So there will be Scala posts once in a while, as in the past…

Links

  • Scala Days 2018
  • Scala UA 2017
  • Scala UA (official page)
  • NoSQL
  • Scala
  • Share Button

When to use Scala and Ruby

There are many interesting languages that have their sweet spots and of course a larger set of languages than just two should be considered for new projects.

But Ruby and Scala are both very interesting languages that did not just pick up and sell concepts that were already known, but brought them to a new level and to new beauty. Interestingly, both were started by a single person and finally became community projects.

There are some differences to observe.

Ruby is mostly a dynamic language, which means that it is easier and more natural to change the program at runtime. This is not necessarily a bad thing and different Lisp variants including today’s Clojure have successfully used and perfected this kind of capability for many decades. Consequently more things happen at runtime, especially dynamic typing is used, which means that types only exist at runtime.

Scala is mostly a static language, which means that all program structures have to be created at compile time. But this has been brought to perfection in the sense that a lot of things that are typically available only in dynamic languages, can be done. The type system is static and it is in this sense more consistent and more rigorous than the type system of Java, where we sometimes encounter areas that cannot reasonably be covered by Generics and fall back to the old flavor of untyped collections. This does not happen too often, but the static typing of Scala goes further.

In general this gives more flexibility to Ruby and makes it somewhat harder to tame the ways to do similar things in a static way in Scala. But the type system at compile time of course helps to match things, to find a certain portion of errors and even to make the program more self explanatory without relying on comments. In IDEs it is hard to properly support Scala, but the most common IDEs have achieved this to a very useful level. This should not be overvalued, because there are enough errors that cannot be detected by just using common types. It is possible to always define more specific types which include tight constraints and thus perform really tight checking of certain errors at compile time, but the built in types and the types from common libraries are to convenient and the time effort for this is too high, so it does not seem to be the usual practice. In any case it is a recommended practice to achieve a good test coverage of non-trivial functionality with automated tests. They implicitly cover type errors that are detected by the compiler in Scala, but of course only to the level of the test coverage. Ruby is less overhead to compile and run. We just write the program and run it, while we need a somewhat time intensive compile step for Scala. If tests are included, it does not make so much of a difference, because running the tests or preceding them with a compile job is kind of a minor difference.

An interesting feature of Ruby is called „monkey patching“. This means that it is possible to change methods of an existing class or even of a single object. This can be extremely powerful, but it should be used with care, because it changes the behavior of the class in the whole program and can break libraries. Usually this is not such a bad thing, because it is not used for changing existing methods, but for adding new methods. So it causes problems only when two conflicting monkey patches occur in different libraries. But for big programs with many libraries there is some risk in this area. Scala tries to achieve the same by using „implicit conversions“. So a conversion rule is implicitly around and when a method is called on an object that does not exist in its type, the adequate conversion is applied prior to the method. This works at compile time. Most of the time it is effectively quite similar to monkey patching, but it is a bit harder to tame, because writing and providing implicit conversions is more work and harder to understand than writing monkey patches. On the other hand, Scala avoids the risks of Ruby’s monkey patching.

An increasingly important issue is making use of multiple CPU cores. Scala and especially Scala in combination with Akka is very strong on this. It supports a reasonably powerful and tamable programming model for using multiple threads. The C- or JavaSE-way is very powerful, but it is quite difficult to avoid shooting oneself into the foot and even worse there is a high likelihood that such errors show up in production, in times of heavy load, while all testing seemed to go well. This is the way to go in some cases, but it requires a lot of care and a lot of thinking and a team of skillful developers. There are more developers who think they belong to this group than are actually able to do this well. Of course Scala already filters out some less skilled developers, but still I think its aproach with Akka is more sound.
Ruby on the other hand has very little support for multithreading, and cannot as easily make use of multiple cores by using threads. While the language itself does support the creation of threads, for many years the major implementation had very little support for this in the sense that not actually multiple threads were running at the same time. This propagated into the libraries, so this will probably never become the strength of Ruby. The way to go is to actually start multiple processes. This is not so bad, because the overhead of processes in Ruby is much less than in JVM-languages. Still this is an important area and Scala wins this point.

Concerning web GUIs Ruby has Rails, which is really a powerful and well established way to do this. Scala does provide Play, which is in a way a lot of concepts from Rails and similar frameworks transferred to Scala. It is ok to use it, but rails is much more mature and more mainstream. So I would give this point to Ruby. Rails includes Active Record, about which I do have doubts, but this is really not a necessary component of a pure WebGUI, but more a backend functionality…

So in the end I would recommend to use Scala and Akka for the solution, if it is anticipated that a high throughput will be needed. For smaller solutions I would favor Ruby, because it is a bit faster and easier to get it done.

For larger applications a multi tier architecture could be a reasonable choice, which opens up to combinations. The backend can be done with Scala. If server side rendering is chosen, Ruby and Rails with REST-calls to the backend can be used. Or a single page application which is done in JavaScript or some language compiling to JavaScript and again REST-calls to the backend.

Share Button

Scala Exchange 2017

I have visited Scala Exchange („#ScalaX“) in London on 2017-12-14 and 2017-12-15. It was great, better than 2015 in my opinion. In 2016 I missed Scala Exchange in favor of Clojure Exchange.

This time there were really many talks about category theory and of course its application to Scala. Spark, Big Data and Slick were less heavily covered this time. Lightbend (former Typesafe), the company behind Scala, did show some presence, but less than in other years. But 800 attendees are a number by itself and some talks about category theory were really great.

While I have always had a hard time accepting why we need this „Über-Mathematics“ like category theory for such a finite task as programming, I start seeing its point and usefulness. While functors and categories provide a meta layer that is actually accessible in Scala there are actually quite rich theories that can even be useful when constrained to a less infinite universe. This helps understanding things in Java. I will leave details to another post. Or forget about it until we have the next Scala conference.

So the talks that I visited were:

  • Keynote: The Maths Behind Types [Bartosz Milewski]
  • Free Monad or Tagless Final? How Not to Commit to a Monad Too Early [Adam Warski]
  • A Pragmatic Introduction to Category Theory [Daniela Sfregola]
  • Keynote: Architectural patterns in Building Modular Domain Models [Debasish Ghosh]
  • Automatic Parallelisation and Batching of Scala Code [James Belsey and Gjeta Gjyshinca]
  • The Path to Generic Endpoints Using Shapeless [Maria-Livia Chiorean]
  • Lightning talk – Optic Algebras: Beyond Immutable Data Structures [Jesus Lopez Gonzalez]
  • Lightning Talk – Exploring Phantom Types: Compile-Time Checking of Resource Patterns [Joey Capper]
  • Lightning Talk – Leave Jala Behind: Better Exception Handling in Just 15 Mins [Netta Doron]
  • Keynote: The Magic Behind Spark [Holden Karau]
  • A Practical Introduction to Reactive Streams with Monix [Jacek Kunicki]
  • Building Scalable, Back Pressured Services with Akka [Christopher Batey]
  • Deep Learning data pipeline with TensorFlow, Apache Beam and Scio [Vincent Van Steenbergen]
  • Serialization Protocols in Scala: a Shootout [Christian Uhl]
  • Don’t Call Me Frontend Framework! A Quick Ride on Akka.Js [Andrea Peruffo]
  • Keynote: Composing Programs [Rúnar Bjarnason]
Share Button

ScalaUA 2017

About a month ago I visted the conference ScalaUA in Kiev.

This was the schedule.

It was a great conference and I really enjoyed everything, including the food, which is quite unusual for an IT-conference.. 🙂

I listened to the following talks:
First day:

  • Kappa Architecture, Juantomás García Molina
  • 50 shades of Scala Compiler, Krzysztof Romanowski
  • Functional programming techniques in real world microservices, András Papp
  • Scala Refactoring: The Good the Bad and the Ugly, Matthias Langer
  • ScalaMeta and the Future of Scala, Alexander Nemish
  • ScalaMeta semantics API, Eugene Burmako

I gave these talks:

  • Some thoughts about immutability, exemplified by sorting large amounts of data
  • Lightning talK: Rounding

Day 2:

  • Mastering Optics in Scala with Monocle, Shimi Bandiel
  • Demystifying type-class derivation in Shapeless, Yurii Ostapchuk
  • Reactive Programming in the Browser with Scala.js and Rx, Luka Jacobowitz
  • Don’t call me frontend framework! A quick ride on Akka.Js, Andrea Peruffo
  • Flawors of streaming, Ruslan Shevchenko
  • Rewriting Engine for Process Algebra, Anatolii Kmetiuk

Find recording of all the talks here:
https://www.scalaua.com/speakers-speeches-at-scalaua2017/

Share Button

Lazy Collections, Strings or Numbers

The idea is, that we have data that is obtained or calculated to give us on demand as much of it as we request. But it is not necessarily initially present. This concept is quite common in the functional world, where we in a way hide the deprecated concept of state in such structures, by the way in a way that lets use retain the benefits that led to the desire for statelessness.

Actually the concept is quite old. We have it for I/O in Unix and hence in Linux since the 1970ies. „Everything is a file“, at least as long as we constrain ourselves to a universal subset of possible file operations. It can be keyboard input, a named or anonymous pipe, an actual file, a TCP-connection, to name the most important cases. These are „lazy“ files, behave more or less like files as far as sequential reading is concerned, but not for random access reading. The I/O-concept has been done in such a way that it takes the case into account that we want to read n bytes, but get only m < n bytes. This can happen with files when we reach their end, but then we can obtain an indication that we reached the end of the file, while it is perfectly possible that we read less then we want in one access, but eventually get \ge n bytes including subsequent reads. Since the API has been done right, but by no means ideal, it generalizes well to the different cases that exist in current OS environments.

We could consider a File as an array of bytes. There is actually a way to access it in this way by memory-mapping it, but this assumes a physically present file. Now we could assume that we think of the array as a list that is optimized for sequential access and iterating, but not for random access. Both list types actually exist in languages like Java. Actually the random access structure can be made lazy as well, within certain constraints. If the source is actually sequential, we can just assume that the data is obtained up to the point where we actually read. The information about the total length of the stream may or may not be available, it is always available somehow in the case of structures that are completely available in memory. This random access on lazy collections works fine if the reason of laziness is to actually save us from doing expensive operations to obtain data that we do not actually need or to obtain them in parallel to the computation that processes the data. But we loose another potential drawback in this case. If the data is truly sequential, we can actually process data that is way beyond our memory capacity.

So the concept transfers easily from I/O-streams to lists and even arrays, most naturally to iterables that can be iterated only once. But we can easily imagine that this also applies to Strings, which can be seen a sequence of characters. If we do not constrain us to what a String is in C or Java or Ruby, but consider String to be a more abstract concept, again possibly dropping the idea of knowing the length or having a finite length. Just think of the output of the Unix command „yes“ or „cat /dev/zero“, which is infinite, in a theoretical way, but the computer won’t last forever in real life, of course. And we always interrupt the output at some time, usually be having the consumer shut down the connection.

Even numbers can be infinite. For real numbers this can happen only after the decimal point, for p-adic numbers it happens only before the decimal point, if you like to look into that. Since we rarely program with p-adic numbers this is more or less an edge case that is not part of our daily work, unless we actually do math research. But we could have integers with so many digits that we actually obtain and process them sequentially.

Reactive programming, which is promoted by lightbend in the Reactive Manifesto relies heavily on lazy structures, in this case data streams. An important concept is the so called „backpressure“, that allows the consumer to slow down the producer, if it cannot read the data fast enough.

Back to the collections, we can observe different approaches. Java 8 has introduced streams as lazy collections and we need to transform collections into streams and after the operation a stream back into a collection, at least in many real life situations. But putting all into one structure has some drawbacks as well. But looking at it from an abstract point of view this does not matter. The java8-streams to not implement a collection interface, but they are lazy collections from a more abstract point of view.

It is interesting that this allows us to relatively easily write nested loops where the depth of the nesting is a parameter that is not known at compile time. We just need a lazy collections of n-tuples, where n is the actual depth of the nesting and the contents are according to what the loops should iterate through. In this case we might or might not know the size of the collection, possibly not fitting into a 32-bit-integer. We might be able to produce a random member of the collection. And for sure we can iterate through it and stop the iteration wherever it is, once the desired calculation has been completed.

Share Button

JMS

Java has always not just been a language, but it brought us libraries and frameworks. Some of them proved to be bad ideas, some become hyped without having any obvious advantages, but some were really good.

In the JEE-stack, messaging (JMS) was included pretty much from the beginning. In those days, when Java belonged to Sun Microsystems and Sun did not belong to Oracle, an aim was to support databases, which was in those days mostly Oracle, via JDBC and so called Message oriented middleware, which was available in the IBM-world via JMS. JMS is a common interface for messaging, that is like sending micro-email-message not between human, but between software components. It can be used within one JVM, but even between geographically distant servers, provided a safe network connection exists. Since we all know EMail this is in principle not too hard to understand, but the question is, what it really means and if it brings us something that we do not already have otherwise.

We do have web services as an established way to communicate between different servers across the network and of course they can also be used locally, if desired. Web services are neither the first nor the only way to communicate between servers nor are they the most efficient way. But I would say that they are the way how we do it in typical distributed applications that are not tied to any legacy. In principal web services are network capable and synchronous. This is well understood and works fine for many applications. But it also forces us to block processes or threads while waiting for responses, thus occupying valuable resources. And we tend to loose responsiveness, because of the waiting for the response. It needs to be observed that DB-access is typically only available synchronously. In a way understandable because of the transactions, but it also blocks resources to a huge extent, because we know that the performance of many applications is DB driven.

Now message based software architectures think mostly asynchronously. Sending a message is a „fire and forget“. There is such a thing as making message transactional, but this has to be understood correctly. There is one transaction for sending the message. It is guaranteed that the message is sent. Delivery guarantees can only be given to a limited extent, because we do not know anything about the other side and if it is at all working. This is not checked as part of the transaction. We can imagine though that the messaging system has its own transactional database and stores the message there within the transaction. It then retries delivering it forever, until it succeeds. Then it is deleted from this store as part of the receiving transaction. Both these transactions can be part of a distributed transaction and thus be combined with other transactions, usually against databases, for a combined transaction. This is what we usually have in mind when talking about this. I have to mention that the distributed transaction, usually based on the so called two phase commit, is not quite as water proof as we might hope, but it can be broken by construction of a worst case scenario regarding the timing of failures of network and systems. But it is for practical purposes reasonable good to use.

While it is extremely interesting to investigate purely message based architectures, especially in conjunction with functional paradigm, this may not be the only choice. Often it is a good option to use a combination of messaging with synchronous services.

We should observe that messaging is a more abstract concept. It can be implemented by some middle ware and even be accessible by a standardized kind of interface like JMS. But it can also be more abstract as a queuing system or as something like Akka uses for its internal communication. And messaging is not limited to Java or JVM languages. Interoperability does impose some constraints on how to use it, because it bans usage of Object-messages which store serialized Java objects, but there are ways to address this by using JSON or BSON or XML or Protocol Buffers as message contents.

What is interesting about JMS and messaging in general are two major communication modes. We can have queues, which are point to point connections. Or we can have „topics“, which are channels into which messages are sent. They are then received by all current subscribers of the topic. This is interesting to notify different components about an event happening in the system, while possibly details about the event must be queried via synchronous services or requested by further messaging via queues.

Generally JMS in Java has different implementations, usually there are those coming with the application servers and there are also some standalone implementations. They can be operated via the same interface, at least as long as we constrain us to the common set of functionality. So we can exchange the JMS implementation for the whole platform (which is a nightmare in real life), but we cannot mix them, because the wire protocol is usually incompatible. There is now something like a standard network protocol for messaging, which is followed by some, but not all implementations.

As skeptical as I am against Java Enterprise edition, I do find the JMS part of enterprise Java very interesting and worthwhile exploring for projects that have a size and characteristics justifying this.

Share Button

Scala Days 2016

I have visited Scala Days in Berlin 2016-06-15 to 2016-06-17. A little remark on the format might be of interest. The conference is scheduled for 3 days. On the first day, there is only one speech, the first keynote, some time in the late afternoon. During Scala Days 2015 the rest of the day was put into use by organizing a Scala training session, where volunteers could teach Scala to other volunteers who wanted to learn it. But I think two or three sessions on the first day would be better and would still allow starting in the late afternoon with the first keynote. The venue and of course Berlin were great and I enjoyed the whole event.

The talks that I visited were:

Wednesday 2016-06-15

  • First keynote: Scala’s Road Ahead by Martin Odersky about the future of Scala. Very interesting ideas for future versions that are currently explored in dotty.

Thursday 2016-06-16

Friday 2016-06-17

Summary

The whole event was great, I got a lot of inspiration and met great people. Looking forward to the next event.

I might write more on some topics, where I consider it interesting, but for the moment this summary should be sufficient.

Links

Share Button