What do +, – and * with Integer do?

When using integers in C, Java or Scala, we often use what is called int.

It is presented to us as the default.

And it is extremely fast.

Ruby uses by default arbitrary length integers.

But what do +, – and * mean?

We can rebuild them, in Ruby, kind of artificially restrict the integers to what we have in other programming langauges as int:


MODULUS = 0x100000000;
LIMIT   =  0x80000000;

def normalize(x)
  r = x % MODULUS;
  if (r < -LIMIT) then     return r + MODULUS;   elsif (r >= LIMIT) 
    return r - MODULUS;
  else
    return r;
  end
end

def intPlus(x, y)
  normalize(x+y);
end

def intMinus(x, y)
  normalize(x-y);
end

def intTimes(x, y)
  normalize(x*y);
end

x = 0x7fffffff;
y = intPlus(x, x);
z = intPlus(x, x);
puts("x=#{x} y=#{y} z=#{z}");

What is the outcome?

Exactly what you get when doing 32-Bit-Ints in C, Java, Scala or C#:

x=2147483647 y=-2 z=-2

int is always calculated modulo a power of two, usually 2^{32}. That is the
x % MODULUS in normalize(). The Rest of the function is just for normalizing the result to the range -2^{31} \ldots 2^{31}-1.

So we silently get this kind of result when an overflow situation occurs, without any notice.
The overflow is not trivial to discover, but it can be done.
For addition I have described how to do it.

See also Overflow of Integer Types, an earlier post about these issues…

I gave a talk that included this content adapted for Scala at Scala Exchange 2015.

Share Button

System Programming on Linux and MS-Windows

Quite honestly I admit that I really love the Posix-APIs for system programming and even some Linux specific extensions to it. I/O, Locking, Semaphores, Shared Memory, Message Queues, Signals, named and anonymous pipes, Unix Domain Sockets, TCP/IP programming, Terminal I/O, pthreads and a lot more are very powerful and fun to program. I do discover some points where I regret why they have not done it better, for example the fact that almost all system calls return a value, which is interpreted in one of the following ways:

  • 0 means ok, -1 means a issue has occurred, which can be explored by calling the errno-macro.
  • Values >= 0 are useful responses and -1 is indicating an error, which again requires calling errno.
  • A pointer is returned. If the pointer is NULL, this indicates an error and requires calling errno. Sometimes (void *)-1 or similar return values are also special.
  • pthreads-methods return 0 when successful or directly the error code otherwise.

Originally errno was a variable, which had to be replaced by some weird macro construction to allow multithreading and remain backwards compatible.
I would find it most natural if there where an exception mechanism in place like in Perl, Ruby, Java and many other languages, which would transport the error information. C cannot do this, at least not without breaking the language standard. The pthreads way looks good as well. Returning a struct containg the value actually needed and the errorcode, which is 0 if everything is OK, would also be a good approach, whenever a real return value is needed, but arguable a little bit clumpsy in case of functions returning a pointer. Maybe providing a pointer to some integer variable as argument would be the way for this case, even though I find it kind of ugly to have „return values done by a parameter“. Semaphores are a little bit clumsy to handle. And fcntl and ioctl are for sure overused instead of adding specific function for specific tasks. Reading a single character from a terminal or keyboard input without waiting for return is difficult, but at least logical.

Anyway, these issues can be dealt with and the power and elegance of the API is just great. The documentation is always available by using man pages that are installed on almost every system and by using great online resources on top of that.

So how does the win32- and win64-API look like? I mean apart from the religious questions like the lack of freedom? Most of the things can be done on the MS-Windows-APIs as well. There are some differences. First of all, all the code that uses system-APIs has to be rewritten. Very few typical POSIX-functions like open, close, read and write exist in the windows world as well to facilitate such a transition, but the general answer is like „it can be done, but the code has to be rewritten from scratch“. So programs that should run on both platforms and should do basically the same on both platforms need to encapsulate their system specific code, which might be anywhere between 20 and 50 percent of the code base, in specific files and organize their structure in such a way that the remaining half or more can actually be the same. It has been done by database products (PostgreSQL, MariaDB, Oracle, DB2), interpreters and compilers for programming languages (Ruby, Perl, Scala, Java, C#, F#, PHP), browsers (Firefox, Chrome), image processing software (gimp), office software (other than MS-office), web servers (Apache) and many others and they do achieve the goal to be doing more or less the same on both platforms.

Now how does the Win32- and Win64-API look like? Obviously the code looks very different. Unimportant, but very visible differences are that function names are mixed case and start with capital letter instead of being smaller case with underscores. Parameters and variables are mixed case starting with lower case. The C-type system is not directly used, but all types are #defined in some header file and all capital, even pointer types. Some care is needed to understand how these types work together, because it is not as self documenting as the original C types, but really no big deal to get used to. A MS-specific C-extension does allow using some kind of exceptions, if that is good or bad is hard say. Function names are generally longer and have huge parameter lists with very long parameter names. When they are outdated, because more parameters or different behavior or 64-bit support is needed, often an 64 or an Ex is added to the original name to create a new name for the replacement function, retaining the old one as it is for backwards compatibility.

Shared memory can more or less easily be replaced by memory mapped files and that is what needs to be done on MS-Windows.

The named pipe of Windows kind of unifies the message queue, the Unix domain sockets, the named pipes of Unix/Posix/Linux and even allows network communication within the local network. There have been linux specific extensions to Posix-pipes that achieve this unification, but not the network transparency, as well. Mutex and Semaphore work slightly differently, but can basically achieve the same results as Mutex and Semaphore on Posix. What is beautiful is that almost all operating system objects are accessed by so called HANDLEs which unifies many functions accessing them, but brings functions like WaitForSingleObject and WaitForMultipleObjects also some fcntl-like flavor, because it depends of course very much on the type of kernel object what waiting for it means. When being aware of this, it can be very powerful.

When looking for features that are really missing on one platform we observe immediately that MS-Windows does a mandatory locking on files by default and that such a mandatory locking does not at all exist in Posix or on Unix-like operating systems like MacOS-X, even though it does exist on Linux. Discussing this issue and how to deal with it should be worth its own article. In short, it is not as bad as it sounds, but the choice of the MS-Windows-guys to implement this feature in the way it is and to make it the default does look good.

The signals are missing on the windows side. This can be overcome by using Mutexes and Conditions to replace the communication part of signals, or to simply use HANDLEs to end a specific process instead of sending a signal, provided the permissions exist to do so.

Another painful omission is the fork. Most of the time fork is accompanied by an exec and exactly that can be doe by the CreateProcess in MS-Windows. Often we do like to share open files with the forked process and there are ways to do this, at least to some extent. But to use fork for creating a couple identical processes that run on the same code and data initialized once, which is sometimes a good idea, just does not exist on MS-Windows. It can be overcome by using threads and dealing with the issues of having to take responsibility for really separating the threads or by using multiple processes and memory mapped files for sharing that initial data structure.

The Win32- and Win64-APIs are documented quite well on some Microsoft-Webpage. I find the Linux-man pages slightly more useful, but both systems are documented in a way that it should be easy to find and use the original documentation and additional resources on the web.

Generally I would recommend all system programmers to have a look at the other world and how things work there. It helps enjoy and understand the beauty and power of both systems and probably maintain or even challenge the preference.

I have been teaching system programming for both platforms to college students and I enjoyed teaching and exploring these platforms with my students very much.

Share Button

How to multiply long numbers

Assuming we have a multiplication of n-bit-numbers already available.
Other than typical compiled languages, which assume that the multiplication result fits into the same size as the two factors, we should take a closer look.
Unsigned n-bit-numbers as factors x and y fullfill

    \[0 \le x \le 2^n-1\]

    \[0 \le y \le 2^n-1\]

So we have for the product z=x*y

    \[0 \le z \le (2^n-1)^2 \le 2^{2n}-1\]

So we should be prepared for a 2n-bit long result. Assembly language provides for this. In Java we are kind of running out of luck, because we have the 64-bit-long as the largest possible value, so we cannot make use of the 64-bit capabilities of our CPU by getting a 128-bit-result. Even worse we do not have unsigned primitive types. For addition and subtraction it is possible to cheat by assuming that the longs are unsigned, because the addition for signed and unsigned numbers is almost the same, but for multiplication this is slightly more complex.

In C we do have better chances, but we need to go compiler and OS-specific to some extent. gcc on 64-bit platforms has a type __uint128_t and we can hope that something like this

__uint128_t mul(unsigned long long x, unsigned long long y) {
__uint128_t xx = (__uint128_t) x;
__uint128_t yy = (__uint128_t) y;
__uint128_t z = xx*yy;
return z;
}

is optimized by the compiler to one assembly language instruction that multiplies two unsigned 64-bit-integers into one unsigned 128-bit integer.

Multiplication of longer numbers is achieved by just assuming

    \[ x= \sum_{j=0}^{m-1}2^{nj}x_j, y= \sum_{j=0}^{m-1}2^{nj}y_j\]

with

    \[ \bigwedge_{j=0}^{m-1} 0\le x_j, y_j <2^n.\]

Now the product is

    \[ z = \sum_{j=0}^{m-1}\sum_{k=0}^{m-1}2^{n(j+k)}x_j y_k \]

where each x_j y_k summand needs to be split into a lower and an upper part such that

    \[x_j y_k = 2^n h(x_j y_k) + l(x_j y_k).\]

All these half products need to be added with propagating the carry bits to the next higher level.
This can be sorted out and done with a lot of additions with carry-bit, finally giving the desired result.

For reasonably short numbers this is a good approach, if it is done well, but it can be replaced by better algorithms for larger numbers.

The most obvious next step is to move to the Karatsuba algorithm.
Assuming we have two factors x=x_l + 2^m x_h and y = y_l + 2^m y_h the naïve approach of having to calculate the four products
x_h y_h, x_h y_l, x_l y_h, x_l y_l can be replaced by calculation of only three products x_h y_h, (x_l + x_h)(y_l+ y_h), x_l y_l, because what is actually needed is the sum x_h y_l+x_l y_h which can be obtained by

    \[(x_l + x_h)(y_l+ y_h)-x_l y_l-x_h y_h=x_h y_l+x_l y_h.\]

That can be iterated by subdividing the numbers subsequently into halves until a size is reached that can be handled well enough by the naïve approach.
An issue that needs to be observed is that x_l + x_h and y_l+ y_h need one bit more than their summands.

For even larger numbers the so called Toom-Cook algorithm, an extension of the idea of the Karatsuba-algorithm for more than two summands, can be useful and for numbers larger than that the Schönhage-Strassen multiplication proves to be faster. It is implemented in some libraries for long integer arithmetic, like GMP. Since around 2007 this can be improved by using the Fürer algorithm which is assymptotically faster, but I do not know of any useful implementations and for practical purposes the improvement does not seem to be very relevant.

Share Button

Testing Java- and C-programs with Ruby and Perl

It is very important to write good unit tests for software that is non-trivial and that is relied on by other pieces of software.
Often the logic of the software can easily be covered by the native testing facilities of the programming language, like JUnit for Java or, much less well known but available, CUnit for C. When a lot of framework code is involved or third party libraries are used heavily, there is almost no other way for certain tests, because setting up the environment cannot easily be achieved elsewhere.

But we also encounter cases where writing good unit tests in the same language as the library itself becomes a pain. We procrastinate the issue of writing them and end up with way too little or no unit tests at all. A lot of software deals with doing some calculations or transformations of numbers and strings, usually a lot of numbers and a lot of strings. Now Strings are the strength of the Perl programming language and are not really implemented very well or at least not very powerful or easy to use in most other languages. Specifically the String facilities of Perl are much superior to those of Java and C. Off course our software needs to perform well, it needs to integrate into the environment and follow the global corporate software standards, so Java or C or some other programming language is the choice that should not be challenged here for the productive software. But some tests of the functionality can more easily be achieved by iterating over some input data and creating output of the input data combined with the results. This can be perl code already or something really easy to parse. Perl is really the tool that can parse almost anything, but we do not really want to be distracted by unnecessary work but get our job done. So something like generating Perl code or CSV or YAML or JSON, but please not XML if not really needed, should do. Then we can pipe the output to perl or to a perl script and this will tell us, if everything is ok. When we know our platform, it can even be done that the Java- or C-Unit-test stores the output to a file or pipe and calls the Perl script on it and fails or succeeds depending on its output.

When it comes to numeric types, Ruby is very strong. It has unlimited size integers by default, which can be casted to n-bit-integers using constructions like
xx=x&(1<<n)-1,
it has Rational, LongDecimal (as an external gem) and Complex and is easily extendible.

Usually we can expect that corporate constraints on which tools and programming languages may be used are less restrictive when it comes to unit tests. Integrating this on a continuous integration platform is a job that needs to be addressed but it is worth the effort, if a lot of tests become easier with this approach. And doing tests in another language makes tests more credible.

Off course the general idea is applicable for other combinations. Look into Scala, F#, Clojure, JavaScript, Python and some others as well, if they seem to be more helpful than Ruby or Perl for your unit testing automation. But this does indeed raise the question if a world where corporate policies allowed Scala and Closure instead of Java, F# instead of C# and Elixir instead of Erlang and PL/I instead of Cobol would be better.

Share Button