ScalaUA 2020

I like visiting ScalaUA conference in Kiev every year in March or April. I did so in 2017, 2018 and 2019.

So for 2020 it was kind of difficult to perform a regular conference. So there are two options, either it could have been cancelled or it could have been postponed. That is what all other conferences did. ScalaUA being an innovative tech conference opted for a third route. The conference was held, but totally online. It is agreed that it is worth to travel to the location and meet in person. But since that was not an option, it was an innovative and reasonable approach to run the conference online.

So the question is, how did this work?

There was a bit of a fight with the tools. We used Zoom, which was probably a good choice, because it has rich features. The schedule was more or leas as it would have been normally, so the 15 minute break between was enough to get some coffee or whatever, because no lines and no distances had to be dealt with. The talk was delivered in such a way that the speaker was seen full screen when no slides were shown, but usually the slides took up the whole screen and there was a small window in the corner, which could be moved around on the screen which could show the speaker. It was possible to ask questions and put oneself on video as well as a spectator. But for better support a Slack channel was provided for each talk plus a general one and a few that I did not use. They are kept around for a week after the event, so questions can still be dealt with.

One talk was done by two people. One of them wrote on a transparent board, that was between the camera and him, which was very impressive. Of course it was converted to allow him to write from left to write and us to read from left to write. His presentation partner showed code on her computer. This talk actually did something that is not possible in the usual situation.

What was important: Starting the Zoom channel for the talk about five minutes before, because it sometimes took some time to start.

What is more challenging: If the talk is not really very interesting, it is a bit harder not to get distracted. Sitting in the audience, you have to listen.

You can see the agenda. I did not speak myself this year.

Scala 3 (former Dotty) took up a lot of space, because many things have to be redone in Scala 3 or at least can be redone in a much more elegant way.

Share Button

Borrow and Carry Bit for Subtraction

Similar to the usage of the carry bit when adding there are mechanisms for subtracting that allow to integrate the result of subtraction of the lower bits into the subtraction of the next higher block of bits, where necessary.

There are two ways to do this, that are trivially equivalent by a simple not operation:

  • borrow bit (also borrow flag)
  • carry bit (also carry flag)

Often CPUs use what is the carry bit for addition and interpret it as borrow bit for subtraction.

Here are the two ways:

Borrow Bit

It is assumed, that the CPU-word is n bits long, so calculations are performed modulo N=2^n. Further it is assumed, that the range 0\ldots N-1 is preferred, so all sign issues are excluded for the moment.

When subtracting two numbers x, y from each other like x-y and y > x, the provided result is

    \[x-y+N \equiv x-y \mod N\]

and the borrow bit is set (b=1), to indicate that the subtraction caused „underflow“, which had to be corrected by added N in order to get into the desired range.

In the „normal“ case where y \le x, the provided result is simply


and the borrow bit is not set (b=0).

The next round of subtraction takes the borrow bit into account and calculates x-y-b, where the condition becomes y+b > x and the result is

    \[x-y-b+N \equiv x-y \mod N\]



respectively. This is how some of the older readers used to do it in school on paper, but of course with N=10.

Now the typical integer arithmetic of current CPUs uses Two’s complement, which means that -y=(\mathrm{NOT}\; y)+1. Combining this with the previous results in calculating

    \[x-y = x + (\mathrm{NOT}\; y) + 1 - b \mod N\]

At this point some CPU-designers found it more natural, to use the carry bit c=1-b instead of the borrow bit b.

Carry Bit

When subtracting two numbers x, y from each other like x-y and we have y > x, the provided result is

    \[x-y+N \equiv x-y \mod N\]

and the carry bit is not set (c=0), to indicate that the subtraction caused „underflow“, which had to be corrected by added N in order to get into the desired range.

In the „normal“ case where y \le x, the provided result is simply


and the carry bit is set (c=1).

The next round of subtraction takes the borrow bit into account and calculates x-y-1+c, where the condition becomes y+1-c > x and the result is

    \[x-y-1+c+N \equiv x-y \mod N\]




Now two’s complement with -y=(\mathrm{NOT}\; y)+1 this can be written as

    \[x-y = x + (\mathrm{NOT}\; y) + 1 - b \mod N\]

or with c=1-b

    \[x-y = x + (\mathrm{NOT}\; y) + c \mod N\]

These two ways are really equivalent and easily transformed into each other. Neither of them provides big advantages, apart from the fact that we have the unnecessary confusion, because it depends on the CPU design, which of the two variants is used.

Recovery of Borrow or Carry bit

The borrow bit is calculated and used during subtractions that are done at assembly language level, but higher languages like C or Java do provide access to this information. It is relatively easy to recover the carry bit in the case of addition based on x, y and x+y \mod N.

This possible as well for the subtraction. Quite easily the comparison between x and y or y+b could be done before the subtraction. This would work, but it is kind of inefficient, because under the hood the comparison is just a subtraction with discarding the result and keeping the flags. So the subtraction is performed twice. This might not be such a bad idea, because a compiler could recognize it or the work of subtracting twice could be neglectable compared to the logic for an „optimized“ recovery, based on some logic expression of certain bits from x, y and x-y \mod N.


Share Button

How to draw lines, circles and other curves

These ideas were developed more than 30 years without knowing that they were already known at that time…

Today the graphics cards can easily do things like this in very little time. And today’s CPUs are of course really good at multiplying. So this has lost a lot of its immediate relevance, but it is a fun topic and why not have some fun…

Let us assume we have a two dimensional coordinate system and a visible area that goes from x_{\min} to x_{\max} and y_{\min} to y_{\max}. Coordinates are discrete.

In this world we can easily measure an angle against a (directed) line parallel to the x-axis, for example up to an accuracy of 45^\circ=\frac{\pi}{4}:

  • y=0 \wedge x > 0 \implies \alpha = 0 (= 0^\circ)
  • 0 < y < x \implies 0 < \alpha < \frac{\pi}{4}(=45^\circ)
  • 0 < y = x \implies \alpha = \frac{\pi}{4}
  • 0 < x < y \implies \frac{\pi}{4} < \alpha < \frac{\pi}{2}(=90^\circ)
  • x = 0 \land y > 0\implies \alpha = \frac{\pi}{2}
  • x < 0 \land y > 0 \land |x| < |y|\implies \frac{\pi}{2} < \alpha < \frac{3\pi}{4}(=135^\circ)
  • x < 0 \land y > 0 \land -x = y\implies \alpha = \frac{3\pi}{4}(=135^\circ)

So let us assume we have a curve that is described by a polynomial function in two variables x and y, like this:

    \[f(x, y) = \sum_{j=0}^m\sum_{k=0}^n a_{j,k}x^jy^k = 0\]

We have to apply some math to understand that the curve behaves nicely in the sense that it does not behave to chaotic in scales that are below our accuracy, that it is connected etc. We might possibly scale and move it a bit by substituting something like c_1u+c_2 for x and c_3v+c_4 for y.

For example we may think of

  • line: f(x,y)=ax+by+c
  • circle: f(x, y)=x^2+y^2-r^2
  • eclipse: f(x, y)=\frac{x^2}{a^2}+\frac{y^2}{b^2}-1

We can assume our drawing is done with something like a king of chess. We need to find a starting point that is accurately on the curve or at least as accurately as possible. You could use knights or other chess figures or even fictive chess figures..

Now we have a starting point (x_0, y_0) which lies ideally exactly on the curve. We have a deviation from the curve, which is f(x_0, y_0)=d_0. So we have f(x_n, y_n)=d_n. Than we move to x_{n+1}=x_n + s and y_{n+1}=y_n+t with s, t = \{-1, 0, 1\}. Often only two or three combinations of (s, t) need to be considered. When calculating d_{n+1} from d_n for the different variants, it shows that for calculating d_{n+1}-d_n the difference becomes a polynomial with lower degree, because the highest terms cancel out. So drawing a line between two points or a circle with a given radius around a given point or an ellipse or a parabola or a hyperbola can be drawn without any multiplications… And powers of n-th powers of x can always be calculated with additions and subtractions only from the previous x-values, by using successive differences:
These become constant for l=n, just as the lth derivatives, so by using this triangle, successive powers can be calculated with some preparational work using just additions.
It was quite natural to program these in assembly language, even in 8-bit assembly languages that are primitive by today’s standards. And it was possible to draw such figures reasonably fast with only one MHz (yes, not GHz).

We don’t need this stuff any more. Usually the graphics card is much better than anything we can with reasonable effort program. Usually the performance is sufficient when we just program in high level languages and use standard libraries.

But occasionally situations occur where we need to think about how to get the performance we need:
Make it work,
make it right,
make it fast,
but don’t stop after the first of those.

It is important that we choose our steps wisely and use adequate methods to solve our problem. Please understand this article as a fun issue about how we could write software some decades ago, but also as an inspiration to actually look into bits and bytes when it is really helping to get the necessary performance without defeating the maintainability of the software.

Share Button



Software often contains a logging functionality. Usually entries one or sometimes multiple lines are appended to a file, written to syslog or to stdout, from where they are redirected into a file. They are telling us something about what the software is doing. Usually we can ignore all of it, but as soon as something with „ERROR“ or worse and more visible stack traces can be found, we should investigate this. Unfortunately software is often not so good, which can be due to libraries, frameworks or our own code. Then stack traces and errors are so common that it is hard to look into or to find the ones that are really worth looking into. Or there is simply no complete process in place to watch the log files. Sometimes the error shows up much later than it actually occurred and stack traces do not really lead us to the right spot. More often than we think logging actually introduces runtime errors, that were otherwise not present. This is related to a more general concept, which is called observer effect, where logging actually changes the business logic.

It is nice that log files keep to some format. Usually they start with a time stamp in ISO-format, often to the millisecond. Please add trailing zeros to always have 3 digits after the decimal point in this case. It is preferable to use UTC, but people tend to stick to local date and time zones, including the issues that come with switching to and from daylight saving time. Usually we have several processes or threads that run simultaneously. This can result in a wild mix of logging entries. As long as even multiline entries stay together and as long as beginning and end of one multiline entry can easily be recognized, this can be dealt with. Tools like splunk or simple Perl, Ruby or Python scripts can help us to follow threads separately. We could actually have separate logs for each thread in the first place, but this is not a common practice and it might hit OS-limitations on the number of open files, if we have many threads or even thousands of actors as in Erlang or Akka. Keeping log entries together can be achieved by using an atomic write, like the write system call in Linux and other Posix systems. Another way is to queue the log entries and to have a logger thread that processes the queue.

Overall this area has become very complex and hard to tame. In the Java world there used to be log4j with a configuration file that was a simple properties file, at least in the earlier version. This was so good that other languages copied it and created some log4X. Later the config file was replaced by XML and more logging frame works were added. Of course quite a lot of them just for the purpose of abstracting from the large zoo of logging frameworks and providing a unique interface for all of them. So the result was, that there was one more to deal with.

It is a good question, how much logic for handling of log files do we really want to see in our software. Does the software have to know, into which file it should log or how to do log rotation? If a configuration determines this, but the configuration is compiled into the jar file, it does have to know… We can keep our code a bit cleaner by relying on program functionality without code, but this still keeps it as part of the software.

Log files have to please the system administrator or whoever replaced them in a pure devops shop. And in the end developers will have to be able to work with the information provided by the logs to find issues in the code or to explain what is happening, if the system administrator cannot resolve an issue by himself. Should this system administrator have to deal with a different special complex setup for the logging for each software he is running? Or should it be necessary to call for developer support to get a new version of the software with just another log setting, because the configurations are hard coded in the deployment artifacts? Interesting is also, what happens when we use PAAS, where we have application server, database etc., but the software can easily move to another server, which might result in losing the logs. Moving logs to another server or logging across the network is expensive, maybe more expensive than the rest of this infrastructure.

Is it maybe a good idea to just log to stdout, maintaining a decent format and to run the software in such a way that stdout is piped into a log manager? This can be the same for all software and there is one way to configure it. The same means not only the same for all the java programs, but actually the same for all programs in all languages that comply to a minimal standard. This could be achieved using named pipes in conjunction with any hard coded log file that the software wants to use. But this is a dangerous path unless we really know what the software is doing with its log files. Just think of what weird errors might happen if the software tries to apply log rotation to the named pipe by renaming, deleting, creating new files and so on. A common trick to stop software from logging into a place where we do not want this is to create a directory with the name of the file that the software usually uses and to write protect this directory and its parent directory for the software. Please find out how to do it in detail, depending on your environment.

What about software, that is a filter by itself, so its main functionality is to actually write useful data to stdout? Usually smaller programs and scripts work like this. Often they do not need to log and often they are well tested relyable parts of our software installation. Where are the log files of cp, ls, rm, mv, grep, sort, cat, less,…? Yes, they do tend to write to stderr, if real errors occur. Where needed, programs can turn on logging with a log file provided on the command line, which is also a quite operations friendly approach. Named pipes can help here.

And we had a good logging framework in place for many years. It was called syslog and it is still around, at least on Linux.

A last thought: We spend really a lot of effort to get well performing software, using multiple processes, threads or even clusters. And then we forget about the fact that logging might become the bottle neck.

Share Button

Carry Bit, Overflow Bit and Signed Integers

It has already been explained how the Carry Bit works for addition. Now there was interest in a comment about how it would work for negative numbers.

The point is, that the calculation of the carry bit does not have any dependency on the sign. The nature of the carry bit is that it is meant to be used for the less significant parts of the addition. So assuming we add two numbers x and y that are having k and l words, respectively. We assume that n=\max(k,l) and make sure that x and y are both n words long by just providing the necessary number of 0-words in the most significant positions. Now the addition is performed as described by starting with a carry bit of 0 and adding with carry x[0]+y[0], then x[1]+y[1] and so on up to x[n-1]+y[n-1], assuming that x[0] is the least significant word and x[n-1] the most significant word, respectively. Each addition includes the carry bit from the previous addition. Up to this point, it does not make any difference, if the numbers are signed or not.

Now for the last addition, we need to consider the question, if our result still fits in n words or if we need one more word. In the case of unsigned numbers we just look at the last carry bit. If it is 1, we just add one more word in the most significant position with the value of 1, otherwise we are already done with n words.

In case of signed integers, we should investigate what can possibly happen. The input for the last step is two signed words and possibly a carry bit from the previous addition. Assuming we have m-Bit-words, we are adding numbers between -2^{m-1} and 2^{m-1}-1 plus an optional carry bit c. If the numbers have different signs, actually an overflow cannot occur and we can be sure that the final result fits in at most n words.

If both are not-negative, the most significant bits of x[n-1] and y[n-1] are both 0. An overflow is happening, if and only if the sum x[n-1]+y[n-1]+c \ge 2^{n-1}, which means that the result „looks negative“, although both summands were not-negative. In this case another word with value 0 has to be provided for the most significant position n to express that the result is \ge 0 while maintaining its already correctly calculated result. It cannot happen that real non-zero bits are going into this new most significant word. Consequently the carry bit can never become 1 in this last addition step.

If both are negative, the most significant bits of x[n-1] and y[n-1] are both 1. An overflow is happening, if and only if the sum x[n-1]+y[n-1]+c \lt 2^{n-1}, which means that the result „looks positive or 0“, although both summands were negative. In this case another word with value 2^n-1 or -1, depending on the viewpoint, has to be prepended as new most significant word. In this case of two negative summands the carry bit is always 1.

Now typical microprocessors provide an overflow flag (called „O“ or more often „V“) to deal with this. So the final addition can be left as it is in n words, if the overflow bit is 0. If it is 1, we have to signal an overflow or we can just provided one more word. Depending on the carry flag it is 0 for C=0 or all bits 1 (2^n-1 or -1, depending on the view point) for C=1.

The overflow flag can be calculated by o := \mathrm{signbit}(x) = \mathrm{signbit}(y) \land \mathrm{signbit}(x+y\mod 2^n) \ne \mathrm{signbit}(x).
There are other ways, but they lead to the same results with approximately the same or more effort.

The following table shows the possible combinations and examples for 8-Bit arithmetic and n=1:

x<0 or x≥0y<0 or y≥ 0(x+y)%2^8 < 0 or ≥ 0Overflow BitCarry Bitadditional word neededvalue additional wordExamples (8bit)
x≥0y<0≥000 or 1no-65+(-1)
x≥0y<0<000 or 1no-7+(-8)
x<0y≥0≥000 or 1no--9 + 12
-1 + 127
x<0y≥0<000 or 1no--128+127
-1 + 0
x<0y<0≥011yes-1-64 + (-65)
x<0y<0<001no--1 + (-1)
-1 + (-127)
-64 + (-64)

If you like, you can try out examples that include the carry bit and see that the concepts still work out as described.


Share Button

Virtual machines

We all know that Java uses a „virtual machine“ that is it simulates a non-existing hardware which is the same independent of the real hardware, thus helping to achieve the well known platform independence of Java. Btw. this is not about virtualization like VMWare, VirtualBox, Qemu, Xen, Docker and similar tools, but about byte code interpreters like the Java-VM.

We tend to believe that this is the major innovation of Java, but actually the concept of virtual machines is very old. Lisp, UCSD-Pascal, Eumel/Elan, the Perl programming language and many other systems have used this concept long before Java. The Java guys have been good in selling this and it was possible to get this really to the mainstream when Java came out. The Java guys deserve the credit for bringing this in the right time and bringing it to the main stream.

Earlier implementations where kind of cool, but the virtual machine technology and the hardware were to slow, so that they were not really attractive, at least not for high performance applications, which are now actually a domain of Java and other JVM languages. Some suggest that Java or other efficient JVM languages like Scala would run even faster than C++. While it may be true to show this in examples, and the hotspot optimization gives some theoretical evidence how optimization that takes place during run time can be better than static optimization at compile time, I do not generally trust this. I doubt that well written C-code for an application that is adequate for both C and Java will be outperformed by Java. But we have to take two more aspects into account, which tend to be considered kind of unlimited for many such comparisons to make them possible at all.

The JVM has two weaknesses in terms of performance. The start-up time is relatively long. This is addressed in those comparisons, because the claim to be fast is only maintained for long running server applications, where start-up time is not relevant. The hotspot optimization requires anyway a long running application in order to show its advantages. Another aspect that is very relevant is that Java uses a lot of memory. I do not really know why, because more high level languages like Perl or Ruby get along with less memory, but experience shows that this is true. So if we have a budget X to buy hardware and then put software written in C on it, we can just afford to buy more CPUs because we save on the memory or we can make use of the memory that the JVM would otherwise just use up to make our application faster. When we view the achievable performance with a given hardware budget, I am quite sure that well written C outperforms well written Java.

The other aspect is in favor of Java. We have implicitly assumed until now that the budget for development is unlimited. In practice that is not the case. While we fight with interesting, but time consuming low level issues in C, we already get work done in Java. A useful application in Java is usually finished faster than in C, again if it is in a domain that can reasonably be addressed with either of the two languages and if we do not get lost in the framework world. So if the Java application is good enough in terms of performance, which it often is, even for very performance critical applications, then we might be better off using Java instead of C to get the job done faster and to have time for optimization, documentation, testing, unit testing.. Yes, I am in a perfect world now, but we should always aim for that. You could argue that the same argument is valid in terms of using a more high-level language than Java, like Ruby, Perl, Perl 6, Clojure, Scala, F#,… I’ll leave this argument to other articles in the future and in the past.

What Java has really been good at is bringing the VM technology to a level that allows real world high performance server application and bringing it to the main stream.
That is already a great achievement. Interestingly there have never been serious and successful efforts to actually build the JavaVM as hardware CPU and put that as a co-processor into common PCs or servers. It would have been an issue with the upgrade to Java8, because that was an incompatible change, but other than that the JavaVM remained pretty stable. As we see the hotspot optimization is now so good that the urge for such a hardware is not so strong.

Now the JVM has been built around the Java language, which was quite legitimate, because that was the only goal in the beginning. It is even started using the command line tool java (or sometimes javaw on MS-Windows 32/64 systems). The success of Java made the JVM wide spread and efficient, so it became attractive to run other languages on it. There are more than 100 languages on the JVM. Most of them are not very relevant. A couple of them are part of the Java world, because they are or used to be specific micro languages closely related to java to achieve certain goals in the JEE-world, like the now almost obsolete JSP, JavaFX, .

Relevant languages are Scala, Clojure, JRuby, Groovy and JavaScript. I am not sure about Jython, Ceylon and Kotlin. There are interesting ideas coming up here and there like running Haskell under the name Frege on the JVM. And I would love to see a language that just adds operator overloading and provides some preprocessor to achieve this by translating for example „(+)“ in infix syntax to „.add(..)“ mainstream, to allow seriously using numeric types in Java.

Now Perl 6 started its development around 2000. They were at that time assuming that the JVM is not a good target for a dynamic language to achieve good performance. So they started developing Parrot as their own VM. The goal was to share Parrot between many dynamic languages like Ruby, Python, Scheme and Perl 6, which would have allowed inter-language inter-operation to be more easily achievable and using libraries from one of these languages in one of the others. I would not have been trivial, because I am quite sure that we would have come across issues that each language has another set of basic types, so strings and numbers would have to be converted to the strings and numbers of the library language when calling, but it would have been interesting.

In the end parrot was a very interesting project, theoretically very sound and it looked like for example the Ruby guys went for it even faster than the the Perl guys, resulting in an implementation called cardinal. But the relevant Perl 6 implementation, rakudo, eventually went for their own VM, Moar. Ruby also did itself a new better VM- Many other language, including Ruby and JavaScript also went for the JVM, at least as one implementation variant. Eventually the JVM proved to be successful even in this area. The argument to start parrot in the first place was that the JVM is not good for dynamic languages. I believe that this was true around 2000. But the JVM has vastly improved since then, even resulting in Java being a serious alternative to C for many high performance server applications. And it has been improved for dynamic languages, mostly by adding the „invoke_dynamic“-feature, that also proved to be useful for implementing Java 8 lambdas. The experience in transforming and executing dynamic languages to the JVM has grown. So in the end parrot has become kind of obsolete and seems to be maintained, but hardly used for any mainstream projects. In the end we have Perl 6 now and Parrot was an important stepping stone on this path, even if it becomes obsolete. The question of interoperability between different scripting languages remains interesting…

Share Button

How to multiply long numbers

Assuming we have a multiplication of n-bit-numbers already available.
Other than typical compiled languages, which assume that the multiplication result fits into the same size as the two factors, we should take a closer look.
Unsigned n-bit-numbers as factors x and y fullfill

    \[0 \le x \le 2^n-1\]

    \[0 \le y \le 2^n-1\]

So we have for the product z=x*y

    \[0 \le z \le (2^n-1)^2 \le 2^{2n}-1\]

So we should be prepared for a 2n-bit long result. Assembly language provides for this. In Java we are kind of running out of luck, because we have the 64-bit-long as the largest possible value, so we cannot make use of the 64-bit capabilities of our CPU by getting a 128-bit-result. Even worse we do not have unsigned primitive types. For addition and subtraction it is possible to cheat by assuming that the longs are unsigned, because the addition for signed and unsigned numbers is almost the same, but for multiplication this is slightly more complex.

In C we do have better chances, but we need to go compiler and OS-specific to some extent. gcc on 64-bit platforms has a type __uint128_t and we can hope that something like this

__uint128_t mul(unsigned long long x, unsigned long long y) {
  __uint128_t xx = (__uint128_t) x;
  __uint128_t yy = (__uint128_t) y;
  __uint128_t z = xx*yy;
  return z;

is optimized by the compiler to one assembly language instruction that multiplies two unsigned 64-bit-integers into one unsigned 128-bit integer.

Multiplication of longer numbers is achieved by just assuming

    \[ x= \sum_{j=0}^{m-1}2^{nj}x_j, y= \sum_{j=0}^{m-1}2^{nj}y_j\]


    \[ \bigwedge_{j=0}^{m-1} 0\le x_j, y_j <2^n.\]

Now the product is

    \[ z = \sum_{j=0}^{m-1}\sum_{k=0}^{m-1}2^{n(j+k)}x_j y_k \]

where each x_j y_k summand needs to be split into a lower and an upper part such that

    \[x_j y_k = 2^n h(x_j y_k) + l(x_j y_k).\]

All these half products need to be added with propagating the carry bits to the next higher level.
This can be sorted out and done with a lot of additions with carry-bit, finally giving the desired result.

For reasonably short numbers this is a good approach, if it is done well, but it can be replaced by better algorithms for larger numbers.

The most obvious next step is to move to the Karatsuba algorithm.
Assuming we have two factors x=x_l + 2^m x_h and y = y_l + 2^m y_h the naïve approach of having to calculate the four products
x_h y_h, x_h y_l, x_l y_h, x_l y_l can be replaced by calculation of only three products x_h y_h, (x_l + x_h)(y_l+ y_h), x_l y_l, because what is actually needed is the sum x_h y_l+x_l y_h which can be obtained by

    \[(x_l + x_h)(y_l+ y_h)-x_l y_l-x_h y_h=x_h y_l+x_l y_h.\]

That can be iterated by subdividing the numbers subsequently into halves until a size is reached that can be handled well enough by the naïve approach.
An issue that needs to be observed is that x_l + x_h and y_l+ y_h need one bit more than their summands.

For even larger numbers the so called Toom-Cook algorithm, an extension of the idea of the Karatsuba-algorithm for more than two summands, can be useful and for numbers larger than that the Schönhage-Strassen multiplication proves to be faster. It is implemented in some libraries for long integer arithmetic, like GMP. Since around 2007 this can be improved by using the Fürer algorithm which is assymptotically faster, but I do not know of any useful implementations and for practical purposes the improvement does not seem to be very relevant.

Addition 2019-10-08: In 2019 an algorithm has been discovered, that is assumed to be assymptotically O(n \log n), which is according to a conjecture of Schönhage the best possible result. It has not yet been proven rigorously and the conjecture is still a conjecture. The question is, if for practical purposes of multiplying numbers that still fit into real computers, there is a real gain from these algorithms. So the questions is, if they become faster than Schönhage-Strassen on numbers that are not too long to be practically multiplied. So right now this is really mostly an issue of mathematics and theoretical informatics. See also Maths whiz solves 48-year-old multiplication problem.. And see for the paper.

Share Button

How to recover the Carry Bit

As frequent readers might have observed, I like the concept of the Carry Bit as it allows for efficient implementations of long integer arithmetic, which I would like to use as default integer type for most application development. And unfortunately such facilities are not available in high level languages like C and Java. But it is possible to recover the carry bit from what is available in C or Java, with some extra cost of performance, but maybe neglectable, if the compiler does a good optimization on this. We might assume gcc on a 64-Bit-Linux. It should be possible to do similar things on other platforms.

So we add two unsigned 64-bit integers x and y and an incoming carry bit i\in\{0, 1\} to a result

    \[z\equiv x+y+i \mod 2^{64}\]


    \[0 \le z < 2^{64}\]

using the typical „long long“ of C. We assume that



    \[x_h \in \{0,1\}\]


    \[0 \le x_l < 2^{63}\]

. In the same way we assume y=2^{63}y_h + y_l and z=2^{63}z_h + z_l with the same kind of conditions for x_h, y_h, z_h or x_l, y_l, z_l, respectively.

Now we have

    \[0 \le x_l+y_l + i \le 2^{64}-1\]

and we can see that

    \[x_l + y_l + i = 2^{63}u + z_l\]

for some

    \[u\in \{0,1\}\]

And we have

    \[x+y+i = 2^{64}c+z\]



is the carry bit.
When looking just at the highest visible bit and the carry bit, this boils down to

    \[2c+z_h = x_h + y_h + u\]

This leaves us with eight cases to observe for the combination of x_h, y_h and u:


Or we can check all eight cases and find that we always have

    \[c = x_h \wedge\neg z_h \vee y_h \wedge\neg z_h \vee x_h \wedge y_h \wedge z_h\]


    \[c = (x_h \vee y_h) \wedge\neg z_h \vee x_h \wedge y_h \wedge z_h.\]

So the result does not depend on u anymore, allowing to calculate it by temporarily casting x, y and z to (signed long long) and using their sign.
We can express this as „use x_h \wedge y_h if z_h=1 and use x_h \vee y_h if z_h = 0„.

An incoming carry bit d does not change this, it just allows for x_l + y_l + d < 2^{64}, which is sufficient for making the previous calculations work.

In a similar way subtraction can be dealt with.

The basic operations add, adc, sub, sbb, mul, xdiv (div is not available) have been implemented in this library for C. Feel free to use it according to the license (GPL). Addition and subtraction could be implemented in a similar way in Java, with the weirdness of declaring signed longs and using them as unsigned. For multiplication and division, native code would be needed, because Java lacks 128bit-integers. So the C-implementation is cleaner.

Share Button



Das Konzept der Rundung kennt man grundsätzlich. Aber versuchen wir es etwas systematischer zu erfassen.

Man hat eine Menge M von Zahlen und eine Teilmenge N \subseteq M davon, deren Elemente in der verwendeten Programmierumgebung dargestellt werden. Dazu hat man noch eine Metrik d : M \times M \rightarrow \Bbb R_+, (x, y) \mapsto r=d(x,y)>=0. Man verlangt normalerweise noch gewisse Eigenschaften von d:

  • Positive Definitheit: d(x,y) = 0 \iff x = y
  • Symmetrie: d(x,y)=d(y,x)
  • Dreiecksungleichung: d(x,z) \le d(x,y) + d(y,z)

Typischerweise sind die Zahlen, mit denen wir uns meistens beschäftigen, in der Welt der reellen und komplexen Zahlen gedacht, man kann also fast immer sicher sein, dass M \subseteq \Bbb C ist, meistens sogar M \subseteq \Bbb R oder wenn wir ehrlich sind sogar M \subseteq \Bbb Q. Dann ist normalerweise d(x,y) = |x-y|. Wer die komplexen Zahlen noch nicht kennt, denke einfach an reelle und rationale Zahlen, das ist das, womit wir normalerweise bewusst zu tun haben. Dabei sind natürlich Konzepte wie Rundung speziell in p-adischen Zahlen hochinteressant, aber dafür muss ich die erstmal erklären und das lasse ich heute…

Was stellen wir uns nun unter einer Rundung vor?
Vielleicht eine Abbildung
r: M \rightarrow N, x \mapsto r(x),
die mit gewissen Nebenbedingungen für jedes x jeweils so gewählt wird, dass d(x, r(x)) minimal ist.
Die Nebenbedingungen brauchen wir einerseits, um es eindeutig zu machen, wenn mehrere Werte y\in N existieren, für die d(x,y) minimal ist. Der klassische Fall ist das Runden auf ganzzahlige Werte und die Frage mit der 0.5. Wenn N eine Teilmenge der reellen Zahlen ist, was ja „meistens“ der Fall ist, hat man eine Anordung. Da kommen dann die folgenden Constraints zum Tragen (Beispiel immer mit Rundung auf ganze Zahlen):

|r(x)|\ge |x| Z.B. r(0.4)=1 und r(0.5)=1 und r(-0.5)=-1
|r(x)|\le |x| Z.B. r(0.6)=0 und r(0.5)=0 und r(-0.5)=0
r(x) \ge x Z.B. r(0.4)=1 und r(0.5)=1 und r(-0.5)=0
r(x) \le x Z.B. r(0.6)=0 und r(0.5)=0 und r(-0.5)=-1
Minimiere d(r(x),x), aber wenn es mehrere optimale Werte für r(x) gibt, wähle den am weitesten von 0 entfernten, z.B. r(0.4)=0 und r(0.6)=1 und r(0.5)=1 und r(-0.5)=-1
Minimiere d(r(x),x), aber wenn es mehrere optimale Werte für r(x) gibt, wähle den am nächsten an 0, z.B. r(0.4)=0 und r(0.6)=1 und r(0.5)=0 und r(-0.5)=0
Minimiere d(r(x),x), aber wenn es mehrere optimale Werte für r(x) gibt, wähle den größten, z.B. r(0.4)=0 und r(0.6)=1 und r(0.5)=1 und r(-0.5)=0
Minimiere d(r(x),x), aber wenn es mehrere optimale Werte für r(x) gibt, wähle den kleinesten, z.B. r(0.4)=0 und r(0.6)=1 und r(0.5)=0 und r(-0.5)=-1
Minimiere d(r(x),x), aber wenn es mehrere optimale Werte für r(x) gibt, wähle den mit gerader Endziffer. Achtung, dieser Constraint ist im „klassischen“ Fall anwendbar, aber nicht allgemeingültig. Z.B.: r(0.4)=0 und r(0.6)=1 und r(0.5)=0 und r(-0.5)=0 und (1.5)=2
Dieser Constraint ist im mathematischen Sinne nicht geeignet (oder nur mit hässlichen Verrenkungen), aber programmatisch können wir das: Wir nehmen r(x)=x und schmeißen eine Exception wenn nicht x\in N schon gilt.

Typischerweise denken wir im Dezimalsystem und dann wählen wir eine Zehnerpotenz 10^n mit n \in \Bbb N, also n \ge 0. Nun ist einfach
N = \{ x \in M : 10^n x \in \Bbb Z\},
also umgangsprachlich sind in N alle Zahlen aus M mit maximal n Stellen nach dem Komma. Diese Rundung funktioniert ganz gut mit so etwas wie LongDecimal in Ruby oder BigDecimal in Scala oder Java, wobei BigDecimal weniger Rundungsmodi anbietet als LongDecimal für Ruby.

Nun kommen wir zur Restklassenrundung. Wir gehen wieder von dieser Zehnerpotenz 10^n aus. Dann brauchen wir noch eine natürlich Zahl m \ge 2 und eine Menge von Resten R \subseteq \{0,...,m-1\}. Nun ist
N = \{ x \in M : 10^n x \in {\Bbb Z} \wedge \bigvee_{r \in R} 10^n x \equiv r \mod{m}\}.
Das bedeutet, wenn wir mit Nullen auf die angegebene Anzahl von Nachkommastellen auffüllen, das „Komma“ (normalerweise als „.“ geschrieben) weglassen und dann diese Zahl mit Rest durch m teilen, der dabei herauskommende Rest in R liegt.
In diesem Fall wird es mit dem ROUND_HALF_EVEN eventuell schwierig, da es undefiniert oder mehrdeutig werden kann. Aber wir müssen auch den Fall abdecken, dass die 0 nicht in R ist und Regeln angeben, wohin die 0 gerundet werden soll. Die Kandidaten sind hier mit selbsterklärenden Namen versehen:


Ein wichtiger Anwendungsfall ist hier m=10 und R=\{0, 5\}. So kann man in der Schweiz Geldbeträge auf Vielfache von 5 Rappen (0.05 CHF) runden. Dies wurde in Rundung bei Geldbeträgen bereits genauer beschrieben.

Share Button

Scala Exchange 2014


Ich war am 8. und 9. Dezember 2014 auf der Konferenz Scala Exchange ( #scalaX ) in London.

Von mir besuchte Vorträge waren etwa folgende:

The Binary Compatibility Challenge

Martin Odersky

Es gibt Beispiele von allen vier Kombinationen von Sourcecode- und Binärkompatibilität.
Man wünscht sich von beidem mehr, Schwerpunkt heute aber Binärkompatibilität.
Konflikt für Scala: Innovation vs. Kompatibilität. Java hat den einen Extrempfad gewählt, Kompatibilität über alles, hat es aber auch leichter gehabt, weil Java die JVM mit „kontrolliert“. Clojure, Ruby, JRuby, Perl,… haben alle kein Problem mit Binärkompatibilität, weil sie in der Regel „just in time“ compiliert werden, also nur Quelltexte herumgereicht werden.
Reproduzierbare Builds sind in Scala schwierig, man denke alleine an die viele Build-Tools (ivy, gradle, maven, sbt,…)
Idee: nach etwa 10 der etwa 30 Schritte des Kompilierens den Baum des Programms in einem cleveren Binärformat speichern. Dieses Format ist ein bessere Kompromiss und von dort kann man leichter fertig-kompilieren.

REST on Akka: Connect to the world

Mathias Doenitz

Akka-http ist quasi Spray 2.0 oder Nachfolger für Spray.
Das sollte wegen der reactive Streams in Akka sein.
Verwende TCP-Flow-Control für „Backpressure“.

Bootstrapping the Scala.js Ecosystem

Haoyi Li

Scala wird nach Javascript statt JVM compiliert. Target ist eindeutig der Browser, weniger serverseitiges JS, wo man ja die JVM meist einsetzen kann.
Vorteil für Web-Applikationen: selbe Tag-Generierung, Validierung etc. auf Browser- und Serverseite. Eleganter als zwei Sprachen.
Einschränkungen: Libraries zu übertragen ist schwierig, da sie groß sind.
Keine Reflection auf scala.js verfügbar. Damit geht vieles nicht, aber es bleibt genug, was geht. Serialisierung eine Herausforderung, aber gelöst.
Testing riesige Herausforderung, gelöst halbwegs mit Teilmenge von scalatest.
Integer-Typen sind ein bisschen ein Gebastel, weil JS nur double kennt, was 53 bit entspricht. Long muss also mit zwei doubles nachgebildet werden.

Introduction to Lambda Calculus

Maciek Makowski

Sehr theoretischer Vortrag. Lambdacalculus wie Programmiersprache, Berechenbarkeitstheorie etc. Viele schöne Beweise möglich. Die theoretische Essenz der funktionalen Programmierung ist das. Stichworte: „Church Rosser Theorem“, „Programmierung mit Lambda-Kalkül“, „Zahlen als Lambda-Ausdrücke“ (church encoding), „y combinator“, „fixed point combinator“, „lambda cube“, „vierte Dimension für Subtypen“, ….
Sehr kleine Sprache, toll für Beweise, nicht für die Praxis brauchbar.

State of the Typelevel

Lars Hupel

Typelevel ist von Haskell inspiriert. Bibliotheken u.a. scalaz, shapeless, spire, scalaz-stream, monocle,….
Wir sollten anstreben, korrekte Programme zu schreiben und optimieren wenn nötig.
Die JVM-Integer-Typen sind nicht gut. Fließkommazahlen sind gut für numerische Mathematiker und in dem Bereich gebildete Wissenschaftler, die sich mit Fehlerfortpflanzung auskennen.
Natürlich behandeln wir Zahlen als Funktionen, z.B.
x=f(.) mit \bigwedge_{n \in \Bbb N: n>0} \frac{f(n)-1}{n} < x < \frac{f(n)+1}{n}
Gleichheit von Reelen Zahlen ist nicht in endlicher Zeit entscheidbar.
Was ist „Costate command coalgebra“? Monocle behandelt „lenses“ und ähnliche Dinge, die man aus Haskell kennt…
Gute binäre Serialisierungsformate sind in der JVM-Welt rar.
Wie geht man mit der Angst vor scalaZ und Monaden um?

Slick: Bringing Scala’s Powerful Features to Your Database Access

Rebecca Grenier

Slick ist eine Library, die SQL-Queries generiert und dann ausführt. Die konzeptionellen Schwächen von JPA werden geschickt umgangen.
Hat Treiber für die fünf großen DBs und einige kleinere. Aber für Oracle, DB2 und MS-SQL-Server nicht frei.
Innere und äußere Joins sind verfügbar und elegant zu schreiben. Slick kann mit dem Datenbank-Dictionary sogar Code generieren.

Upcoming in Slick 2.2

Jan Christopher Vogt

Monaden kommen auch hier vor. Zeit das endlich zu lernen. Ich werde einen Vortrag in der Ruby-on-Rails-Usergroup darüber halten, dann muss ich es lernen.
Monadische Sessions…
Sessions in Kombination mit Futures gibt lustige Albträume.
Wenigstens ist Slick theoretisch korrekt, im Gegensatz zu JPA.
Mehrere Statements kombinieren mit andThen oder for. Achtung, default verschiedene Sessions.
Threads sind teuer, Reactiveness ist wichtig. Futures und Threadpools, geht aber schief beim „blocking“.
JDBC kann nicht assynchron. Workaround oberhalb von JDBC scheint aber ausreichend zu sein und nur ein paar mehr Threads zu brauchen als die richtige Lösung, also verantwortbar.
Statisch gechecktes SQL?

No more Regular Expressions

Phil Wills

Ich mag Regex in Perl sehr. Warum also so etwas aufgeben?
Nun, man gibt es nicht wirklich auf, ändert nur die Schreibweise.
Die Schreibweise als String ist in Perl natürlich, in Scala ein Fremdkörper.
Daher typische programmatische Schreibweise angemessener, aber auch robuster bei Fehlern.
Verwende org-paraboiled2
Capture lieder positionell, tut aber viel weniger weh als bei klassischen regex.

Scala eXchange – Q&A Panel

Jon Pretty, Kingsley Davies, Lars Hupel, Martin Odersky, and Miles Sabin

Einige interessante Diskussionen…

Why Scala is Taking Over the Big Data World

Dean Wampler

Zitat (übersetzt): ‚“Hadoop“ ist das „EJB“ unserer Zeit.‘
MapReduce ist konzeptionell eigentlich schon funktionales Programmieren. Warum also Java und nicht Scala??
Stichwörter: „Scalding“, „Storm“, „Summing bird“, „Spark“.
Scala kann performanter als Python sein, das hier beliebt ist. Aber nicht überstürzt ablösen.

Case classes a la carte with shapeless, now!

Miles Sabin

Beispiel: Baum mit Rücklinks. Schwierig mit konsequenter Immutability. Shapeless hilft.

Reactive Programming with Algebra

André Van Delft

Algebra kann Spaß machen und nützlich für die Programmierung sein. Algebraische Interpretation eingeführt.
Algebra der kommunizierenden Prozesse. Sehr mächtig, auch für andere Gebiete anwendbar, etwa Betrieb von Bahnsystemen.
Jedes Programm, das Eingaben behandelt, ist auf eine Art ein Parser, also Ideen von yacc und bison interessant auch hier.

High Performance Linear Algebra in Scala

Sam Halliday

Lineare Algebra ist sehr gut gelöst, also sollte man das Rad nicht neu erfinden.
TL;D, Netflix und Breeze. Anwendungsbeispiel: Kalman Filter.
netlib hat Referenzimplementierung in Fortran. Wie bringt man das in die Scala-Welt?
Fortran mit C-Wrapper versehen, für JNI erreichbar. (cblas)
Fortran nach Java kompilieren. Ja.
Alternative Implementierung mit derselben Testsuite in C.
High-Performance geht nicht nur um Geschwindigkeit per se, sondern immer unter dem Aspekt der Korrektheit, Genauigkeit und Stabilität.
Hardware ist interessant: Die CPU-Hersteller reden mit dem netlib-Team.
Kann man GPU benutzen? Ja, aber Transfer der Daten schwierig.
FPGA? Vielleicht bald?
Oder sowas wie GPU, aber ohne echte Grafik und mit dem normalen RAM?

An invitation to functional programming

Rúnar Bjarnason

Referenzielle Transparenz, etwa die Kompatibilität mit dem Memoize-Pattern.
Pure Functions…
Verständlichkeit ist das ewige Versprechen. Warum wird es diesmal gehalten?
Funktionale Programmierung ist nicht gut für die „Schützengräben“ der reellen Aufgaben.
Aber gut, um aus den Schützengräben rauszukommen in vernünftigere Welten… So der Anspruch…

Building a Secure Distributed Social Web using Scala & Scala-JS

Henry Story

Spargl ist wie SQL für das „semantic web“.
Entwickelt mit Unterstützung von Oracle.
Wir können Relativität der Wahrheit haben, während wir die absolute Wahrheit aufrechterhalten. Der Vortrag war recht philosophisch.
Graphen können isomorph sein, aber verschieden hohes Vertrauen genießen, je nachdem, wo sie liegen.
Linked Data Protocol kommt ins Spiel
Wie verhindert man Spam und Missbrauch? WebID?
Hier geht es nicht um „Big Data“, sondern um massiv verteilte und parallele „small data“.

TableDiff – a library for showing the difference between the data in 2 tables

Sue Carter

Was ist für ein Diff die richtige Semantik?
Was will man sehen? Wie werden Zahlen und Zeichenketten in den Feldern verglichen?

Evolving Identifiers and Total Maps

Patrick Premont

Idee ganz kurz: eine Map, deren Get immer etwas zurückgibt. Durch geschickte Typ-Konzepte wird verhindert, dass ein Aufruf, der ins Leere greifen würde, überhaupt compiliert.
Interessante Idee, finde sie aber eher theoretisch nützlich.

Share Button