Thread local variables

The idea of threads is to share almost all information between the threads of the same process. This implies some extra effort to avoid inconsistencies caused by simultaneous access to the same data. But sometimes it is necessary to have data private to one thread. Usually this is easily accomplished by using local variables and method or function parameters, which are naturally not visible to other threads, unless they are referenced by some global structures or by structures shared with other threads. But there are some cases where it is useful to have thread local variables.

The general idea is to call a function or method several times and to retain intermediate results somewhere. In object oriented programming, attributes may be used for this; in procedural programming, global variables or a structure that is passed around. In functional programming this is simply forbidden; a workaround is always possible, but it may be painful, complicated, inefficient or even impossible when existing functionality written by others has to be used.

Now the global variable fails miserably here, which is one of the reasons it should not be used. And so does its object-oriented blessed brother, the static attribute in Java or C++. Multiple threads may run simultaneous sessions with these functionalities. So thread local variables are the better approach.

They can easily be built manually. Just use a hash map and make sure access to it is thread safe, using RWLocks in C or a ConcurrentHashMap or some synchronized HashMap in Java. Then use the thread ID as key into the hash map and store some structure in it where all the bad stuff is kept. Or use multiple hash maps. A sketch of this approach follows below.
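
Here is a minimal sketch of this manual approach in C, using a small fixed-size table guarded by an rwlock instead of a real hash map; all names are illustrative, and a real implementation would need collision handling, growth and cleanup when threads exit:

#include <pthread.h>

#define SLOTS 1024

/* one slot per thread: the thread id as key, an opaque pointer as value */
typedef struct {
    pthread_t tid;
    int used;
    void *data;
} slot_t;

static slot_t table[SLOTS];
static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

/* retrieve the calling thread's private data, or NULL if none is set */
void *get_thread_data(void) {
    pthread_t self = pthread_self();
    void *result = NULL;
    pthread_rwlock_rdlock(&lock);
    for (int i = 0; i < SLOTS; i++) {
        if (table[i].used && pthread_equal(table[i].tid, self)) {
            result = table[i].data;
            break;
        }
    }
    pthread_rwlock_unlock(&lock);
    return result;
}

/* store private data for the calling thread; slots are never freed
   in this simplified sketch, so used slots always precede free ones */
void set_thread_data(void *data) {
    pthread_t self = pthread_self();
    pthread_rwlock_wrlock(&lock);
    for (int i = 0; i < SLOTS; i++) {
        if (!table[i].used || pthread_equal(table[i].tid, self)) {
            table[i].tid = self;
            table[i].used = 1;
            table[i].data = data;
            break;
        }
    }
    pthread_rwlock_unlock(&lock);
}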

But both languages provide better facilities for this purpose. In newer Java versions, ThreadLocal should be used. It needs an initializer and then provides a get method that accesses the thread local copy of the variable, creating it if it is not yet present. This approach is strongly recommended when dealing with DateFormat and NumberFormat, which are not thread safe in Java. Using synchronized on the format object or creating it on demand each time works as well, but the thread local approach seems to be the most straightforward and most efficient way. These particular examples should lose their importance, though. Using Joda-Time or the new Java-8 functionality for date and time should make SimpleDateFormat, DateFormat, Calendar and java.util.Date obsolete. And NumberFormat is still useful, but in many cases replaceable by String.format() and the C-like functionality.

In C the function pthread_key_create() can be used to create a thread local variable, and pthread_setspecific() and pthread_getspecific() allow storing and retrieving a thread local value and hence thread local data. Please read the man pages instead of memorizing the exact API.
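
A minimal sketch of how these calls fit together (error handling omitted; see the man pages for the exact semantics):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_key_t key;

/* called automatically when a thread that stored a value exits */
static void destructor(void *value) {
    free(value);
}

static void *worker(void *arg) {
    (void) arg;
    int *counter = malloc(sizeof *counter);
    *counter = 0;
    pthread_setspecific(key, counter);     /* store thread local data    */
    int *mine = pthread_getspecific(key);  /* retrieve it again later on */
    (*mine)++;
    printf("my counter: %d\n", *mine);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_key_create(&key, destructor);  /* create the thread local variable */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_key_delete(&key);
    return 0;
}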

I might provide or link example programs for these concepts in some future posting.


O’Reilly Closes Its Activities in Germany

This article is of minor relevance to English speaking readers, because it concerns the German O’Reilly books, while the English O’Reilly books continue to be available.

For many years the O’Reilly publisher had a German branch in Cologne, where German language versions of the popular computer science books were published. These were often translations, but in some cases also titles that appeared exclusively in German, or titles that were later translated from German into English. The d-Punkt-Verlag will take over these activities and continue them under the O’Reilly name.

More about this at Informatik Aktuell.


How to multiply long numbers

Assume we have a multiplication of n-bit numbers already available.
Unlike typical compiled languages, which assume that the multiplication result fits into the same size as the two factors, we should take a closer look.
Unsigned n-bit numbers as factors x and y fulfill

    \[0 \le x \le 2^n-1\]

    \[0 \le y \le 2^n-1\]

So we have for the product z=x*y

    \[0 \le z \le (2^n-1)^2 \le 2^{2n}-1\]

So we should be prepared for a result 2n bits long. Assembly language provides for this. In Java we are kind of out of luck, because the 64-bit long is the largest primitive type available, so we cannot make use of the 64-bit capabilities of our CPU by getting a 128-bit result. Even worse, we do not have unsigned primitive types. For addition and subtraction it is possible to cheat by treating the longs as unsigned, because addition is almost the same for signed and unsigned numbers, but for multiplication this is slightly more complex.

In C we do have better chances, but we need to go compiler and OS-specific to some extent. gcc on 64-bit platforms has a type __uint128_t and we can hope that something like this

/* multiply two unsigned 64-bit integers into an unsigned 128-bit result */
__uint128_t mul(unsigned long long x, unsigned long long y) {
  __uint128_t xx = (__uint128_t) x;  /* widen the factors first */
  __uint128_t yy = (__uint128_t) y;
  __uint128_t z = xx*yy;             /* full 128-bit product */
  return z;
}

is optimized by the compiler to one assembly language instruction that multiplies two unsigned 64-bit-integers into one unsigned 128-bit integer.

Multiplication of longer numbers is achieved by writing

    \[ x= \sum_{j=0}^{m-1}2^{nj}x_j, y= \sum_{j=0}^{m-1}2^{nj}y_j\]

with

    \[ \bigwedge_{j=0}^{m-1} 0\le x_j, y_j <2^n.\]

Now the product is

    \[ z = \sum_{j=0}^{m-1}\sum_{k=0}^{m-1}2^{n(j+k)}x_j y_k \]

where each x_j y_k summand needs to be split into a lower and an upper part such that

    \[x_j y_k = 2^n h(x_j y_k) + l(x_j y_k).\]

All these half products need to be added, propagating the carry bits to the next higher level.
This can be sorted out and done with a lot of additions with carry bit, finally giving the desired result, roughly as in the following sketch.
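
A compact sketch of this schoolbook scheme in C, assuming gcc's __uint128_t as above and the numbers given as little endian arrays of 64-bit words; the function and parameter names are made up for illustration:

#include <stdint.h>

/* multiply two m-word numbers x and y (little endian arrays of
   64-bit words) into the 2m-word result z; each 128-bit partial
   product is split into its low and high half and the carries are
   propagated to the next higher word */
void mul_long(const uint64_t *x, const uint64_t *y, uint64_t *z, int m) {
    for (int i = 0; i < 2 * m; i++) {
        z[i] = 0;
    }
    for (int j = 0; j < m; j++) {
        uint64_t carry = 0;
        for (int k = 0; k < m; k++) {
            __uint128_t t = (__uint128_t) x[j] * y[k] + z[j + k] + carry;
            z[j + k] = (uint64_t) t;       /* low half l(x_j y_k) plus sums   */
            carry = (uint64_t) (t >> 64);  /* high half h(x_j y_k) plus carry */
        }
        z[j + m] = carry;
    }
}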

For reasonably short numbers this is a good approach, if it is done well, but it can be replaced by better algorithms for larger numbers.

The most obvious next step is to move to the Karatsuba algorithm.
Assuming we have two factors x=x_l + 2^m x_h and y = y_l + 2^m y_h, the naïve approach of calculating the four products
x_h y_h, x_h y_l, x_l y_h, x_l y_l can be replaced by calculating only three products x_h y_h, (x_l + x_h)(y_l+ y_h), x_l y_l, because what is actually needed is the sum x_h y_l+x_l y_h, which can be obtained as

    \[(x_l + x_h)(y_l+ y_h)-x_l y_l-x_h y_h=x_h y_l+x_l y_h.\]

That can be iterated by subdividing the numbers successively into halves until a size is reached that can be handled well enough by the naïve approach.
An issue that needs to be observed is that x_l + x_h and y_l+ y_h need one bit more than their summands.
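
As a small illustration of the trick, here is a single Karatsuba step applied at the word level, computing a 64x64-bit product from three 32x32-bit multiplications; in this sketch the extra bits of x_l+x_h and y_l+y_h are simply absorbed into a 128-bit intermediate, whereas a real long-number implementation has to handle them with explicit carries:

#include <stdint.h>

/* one Karatsuba step: 64x64 -> 128 bits with three multiplications */
__uint128_t mul64_karatsuba(uint64_t x, uint64_t y) {
    uint64_t xl = (uint32_t) x, xh = x >> 32;  /* x = 2^32 x_h + x_l */
    uint64_t yl = (uint32_t) y, yh = y >> 32;  /* y = 2^32 y_h + y_l */
    uint64_t ll = xl * yl;                     /* x_l y_l */
    uint64_t hh = xh * yh;                     /* x_h y_h */
    /* (x_l+x_h)(y_l+y_h) - x_l y_l - x_h y_h = x_h y_l + x_l y_h */
    __uint128_t mm = (__uint128_t) (xl + xh) * (yl + yh) - ll - hh;
    return ((__uint128_t) hh << 64) + (mm << 32) + ll;
}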

For even larger numbers the so-called Toom-Cook algorithm, an extension of the idea of the Karatsuba algorithm to more than two summands, can be useful, and for numbers larger than that the Schönhage-Strassen multiplication proves to be faster. It is implemented in some libraries for long integer arithmetic, like GMP. Since around 2007 this can be improved by using the Fürer algorithm, which is asymptotically faster, but I do not know of any useful implementations and for practical purposes the improvement does not seem to be very relevant.

Addition 2019-10-08: In 2019 an algorithm was discovered that is assumed to be asymptotically O(n \log n), which according to a conjecture of Schönhage is the best possible result. It has not yet been proven rigorously and the conjecture is still a conjecture. The question is whether, for practical purposes of multiplying numbers that still fit into real computers, there is a real gain from these algorithms, i.e. whether they become faster than Schönhage-Strassen on numbers that are not too long to be practically multiplied. So right now this is really mostly an issue of mathematics and theoretical informatics. See also the article "Maths whiz solves 48-year-old multiplication problem" and the paper itself.


Testing Java and C programs with Ruby and Perl

It is very important to write good unit tests for software that is non-trivial and that is relied on by other pieces of software.
Often the logic of the software can easily be covered by the native testing facilities of the programming language, like JUnit for Java or, much less well known but available, CUnit for C. When a lot of framework code is involved or third party libraries are used heavily, there is almost no other way for certain tests, because the environment cannot easily be set up elsewhere.

But we also encounter cases where writing good unit tests in the same language as the library itself becomes a pain. We procrastinate on writing them and end up with way too few unit tests or none at all. A lot of software deals with calculations or transformations of numbers and strings, usually a lot of numbers and a lot of strings.

Now strings are the strength of the Perl programming language and are not implemented very well, or at least not very powerfully or conveniently, in most other languages. Specifically the string facilities of Perl are much superior to those of Java and C. Of course our software needs to perform well, it needs to integrate into the environment and follow the global corporate software standards, so Java or C or some other programming language is the choice that should not be challenged here for the productive software. But some tests of the functionality can more easily be achieved by iterating over some input data and emitting the input data combined with the results. This can be Perl code already or something really easy to parse. Perl is really the tool that can parse almost anything, but we do not want to be distracted by unnecessary work; so something like generating Perl code or CSV or YAML or JSON, but please not XML if not really needed, should do. Then we can pipe the output to perl or to a Perl script and this will tell us if everything is ok. When we know our platform, the Java or C unit test can even store the output to a file or pipe and call the Perl script on it, failing or succeeding depending on its output. A sketch of this follows below.
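
A minimal sketch of the C side of such a test; the function transform() under test and the Perl script check_transform.pl are hypothetical placeholders:

#include <stdio.h>

/* hypothetical function under test */
extern long transform(long x);

/* dump input/output pairs as CSV; a small Perl script at the other
   end of a pipe does the actual checking */
int main(void) {
    for (long x = -1000; x <= 1000; x++) {
        printf("%ld,%ld\n", x, transform(x));
    }
    return 0;
}

Something like ./testdump | perl check_transform.pl then reports whether all pairs satisfy the expected properties.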

When it comes to numeric types, Ruby is very strong. It has unlimited size integers by default, which can be cast to n-bit integers using constructions like
xx = x & ((1 << n) - 1),
it has Rational, LongDecimal (as an external gem) and Complex, and it is easily extensible.

Usually we can expect corporate constraints on which tools and programming languages may be used to be less restrictive when it comes to unit tests. Integrating this with a continuous integration platform is a job that needs to be addressed, but it is worth the effort if a lot of tests become easier with this approach. And doing tests in another language makes the tests more credible.

Of course the general idea is applicable to other combinations. Look into Scala, F#, Clojure, JavaScript, Python and some others as well, if they seem to be more helpful than Ruby or Perl for your unit testing automation. But this does indeed raise the question whether a world where corporate policies allowed Scala and Clojure instead of Java, F# instead of C#, Elixir instead of Erlang and PL/I instead of Cobol would be better.


How to recover the Carry Bit

As frequent readers might have observed, I like the concept of the carry bit, because it allows for efficient implementations of long integer arithmetic, which I would like to use as the default integer type for most application development. Unfortunately such facilities are not available in high level languages like C and Java. But it is possible to recover the carry bit from what is available in C or Java, at some extra cost in performance, which may be negligible if the compiler optimizes this well. We assume gcc on a 64-bit Linux here; it should be possible to do similar things on other platforms.

So we add two unsigned 64-bit integers x and y and an incoming carry bit i\in\{0, 1\} to a result

    \[z\equiv x+y+i \mod 2^{64}\]

with

    \[0 \le z < 2^{64}\]

using the typical „unsigned long long“ of C. We assume that

    \[x=2^{63}x_h+x_l\]

where

    \[x_h \in \{0,1\}\]

and

    \[0 \le x_l < 2^{63}\]

. In the same way we assume y=2^{63}y_h + y_l and z=2^{63}z_h + z_l with the same kind of conditions for x_h, y_h, z_h or x_l, y_l, z_l, respectively.

Now we have

    \[0 \le x_l+y_l + i \le 2^{64}-1\]

and we can see that

    \[x_l + y_l + i = 2^{63}u + z_l\]

for some

    \[u\in \{0,1\}\]

.
And we have

    \[x+y+i = 2^{64}c+z\]

where

    \[c\in\{0,1\}\]

is the carry bit.
When looking just at the highest visible bit and the carry bit, this boils down to

    \[2c+z_h = x_h + y_h + u\]

This leaves us with eight cases to observe for the combination of x_h, y_h and u:

x_h  y_h  u  |  z_h  c
 0    0   0  |   0   0
 1    0   0  |   1   0
 0    1   0  |   1   0
 1    1   0  |   0   1
 0    0   1  |   1   0
 1    0   1  |   0   1
 0    1   1  |   0   1
 1    1   1  |   1   1

Checking all eight cases, we find that we always have

    \[c = x_h \wedge\neg z_h \vee y_h \wedge\neg z_h \vee x_h \wedge y_h \wedge z_h\]

or

    \[c = (x_h \vee y_h) \wedge\neg z_h \vee x_h \wedge y_h \wedge z_h.\]

So the result does not depend on u anymore, which allows calculating it by temporarily casting x, y and z to signed long long and using their signs.
We can express this as „use x_h \wedge y_h if z_h=1 and use x_h \vee y_h if z_h = 0“.

An incoming carry bit i does not change this; it still leaves x_l + y_l + i < 2^{64}, which is sufficient for making the previous calculations work.
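
A minimal sketch in C of this recovery, under the assumptions above; the function name is made up for illustration:

#include <stdint.h>

/* add x + y + i for i in {0,1}, store the 64-bit result in *z and
   recover the carry bit from the top bits of x, y and z */
unsigned add_with_carry(uint64_t x, uint64_t y, unsigned i, uint64_t *z) {
    *z = x + y + i;
    uint64_t xh = x >> 63, yh = y >> 63, zh = *z >> 63;
    /* c = (x_h or y_h) and not z_h, or x_h and y_h and z_h */
    return (unsigned) ((((xh | yh) & ~zh) | (xh & yh & zh)) & 1u);
}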

In a similar way subtraction can be dealt with.

The basic operations add, adc, sub, sbb, mul, xdiv (div is not available) have been implemented in this library for C. Feel free to use it according to the license (GPL). Addition and subtraction could be implemented in a similar way in Java, with the weirdness of declaring signed longs and using them as unsigned. For multiplication and division, native code would be needed, because Java lacks 128-bit integers. So the C implementation is cleaner.


Microsoft dropping Windows-Support

Microsoft’s CEO Satya Nadella announced that Microsoft is withdrawing its support for the MS-Windows platform and concentrating on its core business, which is „services – services – services“. For a transition period of a few months, support for MS-Windows can still be obtained at an increased rate from Microsoft, but development and support have become too inefficient to be profitable, so the platform will no longer be supported by Microsoft in the future. Negotiations are being conducted with Richard Stallman of the Free Software Foundation about open sourcing Windows and thus keeping it continuously available for the existing customer base. The FSF could take over the source code, continue the development as open source software and charge money for the support. But these talks seem to be extremely difficult, because Richard Stallman insists on not using the word „open source“ during the negotiations. Microsoft recommends using Linux as the operating system for both server and desktop and plans to complete the transition of its own infrastructure within the next 8 months.

Nadella points out that this will increase the profitability of Microsoft and provide benefit to the stockholders, because the core business, services, can now be dealt with in a much more focused and intense way.


Binary Data

Having discussed strings and text data to some extent, it is time to look at the other case, binary data.

Usually we think of arrays of bytes or sequences of bytes stored in some media.

Why bytes? 8-bit computers are no longer so common, but the byte is still the typical unit of binary data. This is actually how files on typical operating systems work: they have a length in bytes and that number of bytes as content. When we use the file system, we have to deal with this.

But assuming a perfect lossless world, the typical smallest unit of information is not a byte, but a bit. So we could actually assume that we have an array of bits of a given length l, and the length does not need to be a multiple of 8. This might be subtle, but it does describe the most basic case. Padding with 0-bits does not usually do the job; there need to be other provisions to deal with the bit length as opposed to the storage length. In case of files this could be achieved by storing the last three bits b of the bit length l in the first byte of the file; using the file size s, the bit length can then be recovered as l = 8(s-2)+b when b > 0 and l = 8(s-1) when b = 0.
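
A short sketch of this recovery in C, assuming exactly the layout just described (one header byte whose low three bits hold l mod 8, followed by the padded data bytes):

#include <stdint.h>

/* recover the bit length l from the total file size s in bytes and
   the header byte holding b = l mod 8 in its low three bits */
uint64_t bit_length(uint64_t s, uint8_t header) {
    uint8_t b = header & 0x7u;         /* l mod 8 */
    return b == 0 ? 8 * (s - 1)        /* last data byte fully used     */
                  : 8 * (s - 2) + b;   /* last data byte partially used */
}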

How about memory? Often the array of bytes is just fine. The approach of having an array of booleans as bits usually hurts, because typical programming languages reserve a whole byte or even a 32-bit word for each boolean, while we actually want a packed array. On the other hand we have 64-bit computers and we would like to use this when moving data around, so it might actually be better to think of an array of 64-bit integers plus information about how many of the bits of the last entry actually count. This does bring up the issue of endianness at some point.

But we actually want to use the data. In the good old C days we could use quite a powerful trick: we read an array of bytes and cast the pointer to a pointer to a struct. Then we could access the struct. Poor man's serialization, but it was efficient and worked especially well for records of fixed size, for both reading and writing. Of course variable length records can be dealt with as well, by imposing rules that derive the type of the next record from the contents of the current record.
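
A sketch of the trick in modern C; the record layout is a made-up example, and memcpy is used instead of the raw pointer cast to stay clear of alignment and strict-aliasing problems (packing and endianness still have to be pinned down for a real interchange format):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* hypothetical fixed-size record layout */
typedef struct {
    uint32_t id;
    uint16_t type;
    uint16_t length;
    uint8_t  payload[24];
} record_t;

/* read one record from a file: poor man's deserialization */
int read_record(FILE *f, record_t *r) {
    unsigned char buf[sizeof(record_t)];
    if (fread(buf, 1, sizeof buf, f) != sizeof buf) {
        return -1;
    }
    memcpy(r, buf, sizeof *r);
    return 0;
}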

The Perl programming language and subsequently the Ruby programming language have much more powerful and flexible mechanisms, the methods or functions called pack and unpack.


PI-Day

When using the arguably stupid American date format, today’s date looks like „03/14/15“ instead of 2015-03-14, which gives the first few digits of \pi.

Now we all know how to calculate \pi: we just use the Taylor series of \arcsin or \arctan, which unfortunately does not converge too well when used naïvely, but there are tricks to make it converge by calculating \arctan(x) or \arcsin(x) of some x with 0 < x < \frac{1}{2}, resulting in a fraction of \pi that can easily be multiplied to get \pi from it. This approach is quite ok and understandable with a relatively basic mathematical background, when allowing for some inaccuracy and intuition in understanding the Taylor theorem. But we can actually do better. \pi is somehow weird, being so irrational and even transcendental, and also in some other aspects, when investigating it.

So here are some cool approaches to calculating \pi that are efficient and beautiful, but require a little more advanced math, or trust in the ones who provided it (which should be ok for most people, but a no-go for mathematicians):

AGM

Use the arithmetic-geometric mean, also mentioned briefly in Geometric and Harmonic Rounding:

The arithmetic-geometric mean is included in long-decimal, and long-decimal uses the AGM-method for calculating pi.
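
As a small illustration, here is a sketch of the AGM-based Gauss-Legendre iteration in plain C doubles; each iteration roughly doubles the number of correct digits, so a handful of steps already exhausts double precision, and real use, as in long-decimal, needs arbitrary precision arithmetic:

#include <stdio.h>
#include <math.h>

/* Gauss-Legendre (AGM) iteration for pi in double precision */
int main(void) {
    double a = 1.0, b = 1.0 / sqrt(2.0), t = 0.25, p = 1.0;
    for (int i = 0; i < 5; i++) {
        double an = (a + b) / 2.0;      /* arithmetic mean */
        b = sqrt(a * b);                /* geometric mean  */
        t -= p * (a - an) * (a - an);
        a = an;
        p *= 2.0;
    }
    printf("%.15f\n", (a + b) * (a + b) / (4.0 * t));  /* approximates pi */
    return 0;
}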

Elliptic Integrals

Elliptic integrals are somewhat weird, because they are easy to write down, but not really easy to solve. But numerical approximations to their calculation can also help finding the value of \pi to some desired precision. Since elliptic functions, elliptic curves and elliptic integrals have fascinated mathematicians (and, at least in the case of elliptic curves over finite fields, also computer scientists) a lot, this must be cool.

Other

If you just want to calculate 10000 digits on your computer (should be Linux or something with Ruby preinstalled), just type:

gem install long-decimal
ruby -e 'require "long-decimal";puts(LongMath.pi(10000))'

That should give you some bedtime reading for the \pi-day. Enjoy it.
And please, do not memorize \pi to 10000 digits. It is possible, but there is stuff more worthwhile memorizing.

And please, do use a decent date format, even if you are American.


Why I still like Ruby

Some years ago Ruby in conjunction with Rails was an absolute hype. In the Rails User Group in Zürich we had meetups with 30 people every two weeks. The biweekly meetings have been retained, but often we are just five to ten people now. Is the great time of Ruby over, or is it just a temporary decline? Is Perl 6 now taking over, since it is ready?

Perl 6 is cool and could challenge Ruby, and it is now getting closer and closer to production readiness. This will happen next year. And this is what we will say next year as well. But Perl 6 is, or will be, very cool, stay tuned… What about Perl 5? At least Allison Randall still likes Perl (5). But it has other areas of strength than Ruby, so I will keep this opinionated writing focused on Ruby.

But the hype is right now in the area of JavaScript.

There are approaches to use server side JavaScript, which are interesting, because JavaScript needs to be dealt with on the client anyway and then it is tempting to have the same language on both sides.

There are also approaches to move functionality back from the server to the client, making the server less relevant; at least the percentage of development effort spent on the client grows and that spent on the server shrinks.

There are cool frameworks for client development in JavaScript, and in conjunction with HTML5 these frameworks can achieve a lot.

On the other hand we are exploring NoSQL databases like MongoDB, and MongoDB uses JavaScript with some extension library as its query language.

The general hype is toward functional programming and voilà, JavaScript is functional; it is the most functional of the languages that have been established for a long time, and it brought the functional paradigm to every developer and even to the more sophisticated web designer, all under the radar and long before we knew how cool functional programming is. On the other hand, Ruby (and by the way Perl and especially Perl 6) allows programming according to the functional paradigm as well, if used appropriately. But in JavaScript it is slightly more natural.

Then we have more contenders. Java is not dying any time soon and it is still strong on its web frameworks, JavaServerFaces and a million others, not so bad actually.

And the idea of Rails, or at least its name, has been taken over by the Groovy guys to develop Groovy on Grails, which integrates nicely with a Java enterprise backend.

And the more functional languages like F#, Scala, Haskell, Erlang and Elixir (and some others for the guys that don’t find Haskell theoretical enough) are around and gaining or retaining some popularity. Will Scala support the Erlang VM as an alternative backend, like JavaScript and the JVM are supported today (and the dotnet CLI in the past)? And Scala has a promising web framework with Play, which does web services right, which is what is actually needed for the modern client side JavaScript frameworks, just as an example.

PHP was always the ugly baby: why did they implement a second Perl, based on the subset of Perl they understood? Well, I am not advocating PHP at all, but there are some pretty decent PHP applications that work really well under high load, like Wikipedia.

The unequal twin of Ruby, Python, is also doing pretty well and has a web framework called Django. I have not investigated it at all, but it seems to exist.

So, where is ruby and rails? Ruby is a beautiful language, that has done many things better than any of the languages mentioned here:

It is easy to learn.

It has a nice, complete and consistent concept of object orientation.

It allows all the functional programming, most of it in a pretty natural way.

It has decent numerical types by default. Integers grow to arbitrary length, Rationals are included and Complex numbers are included. JavaScript does not even have integers; it just has one numerical type: double precision floating point.

It allows extension, even operator overloading, for new numerical types.

And it has a useful and easily accessible web framework in Rails, and actually some contenders as alternative web frameworks, like Camping, Sinatra and some others.

And the development effort and the amount of code needed for a certain functionality are still lower than in Java or C# or COBOL.

I do believe that the web applications that are mostly server based and are using just HTML or minimal JavaScript still have their place and will be discovered again as useful for many application areas.

To conclude, I think that Ruby did not become the replacement for Java that some saw in it and that would have been due, judging by the history of how other mainstream languages have moved to legacy in the past. But I do believe that it will play an important role for some years to come, beyond the current JavaScript hype. Of course it will be challenged again in the future and maybe something will replace the mainstream and make Ruby on Rails just another legacy area for web applications. But I don’t even have an idea what that will be. I don’t think JavaScript. Perl 6, Scala, F# or Clojure do have the technical potential to do so, but I do not see that happening in the near future. Maybe something new will come up? Or something old? Just remember, Ruby is about as old as Java.

We will see.

Read more in this discussion on Quora.

For those who are interested, I actually teach Ruby and it is possible to arrange trainings for two to five days, depending on the previous knowledge of the participants and the goals.


Indexing of long UTF-8 files or strings

The UTF-8 format has the disadvantage that the byte length of characters and code points varies. Accessing a given position counted in characters is only possible by scanning from the beginning or by providing an indexing structure. It is a good idea to find a balance between size and speed. So indexing blocks of several kilobytes and scanning through one block to reach the exact desired position should achieve both goals. The ideas described here can also be applied to compressing a long string or file, by compressing these multi-kilobyte blocks separately. In a way UTF-8 is itself a compression scheme, allowing many of the four byte long code points to be stored in one, two or three bytes. Let us just assume we have characters and bytes; it could be arbitrary length numbers and bytes. And let us talk about compressed and uncompressed blocks.

Grouping and indexing can be done by putting approximately a certain number of bytes into each compressed block, probably the largest number of whole characters that fit into the block. Now blocks can start at multiples of their maximum size, and for each block we keep meta information about its position in the chain of blocks, its uncompressed size and the position of its first character in the totally uncompressed file or string. If the indexing structure is kept in memory only, this is quite easy to achieve. If it is stored in the file, several approaches are possible: add an additional indexing file to each „compressed“ text file, add the meta information at the beginning of the file, thus limiting the number of blocks at creation time, or add the meta information to the end of the file. Or, in some file systems, put it into a second stream, which seems to be the cleanest approach, but is only applicable for some file systems that are not mainstream in the Unix and Linux world.
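
A sketch of what such per-block meta information and the block lookup could look like in C; the struct fields and names are illustrative assumptions:

#include <stdint.h>

/* per-block meta information for random access into a long UTF-8
   file: a binary search over first_char finds the right block, a
   short scan inside that block finds the exact character */
typedef struct {
    uint64_t first_byte;  /* offset of the block in the file     */
    uint64_t byte_len;    /* stored (possibly compressed) length */
    uint64_t first_char;  /* index of its first character        */
    uint64_t char_len;    /* number of characters in the block   */
} block_index_t;

/* find the index of the block containing character position pos */
uint64_t find_block(const block_index_t *idx, uint64_t nblocks, uint64_t pos) {
    uint64_t lo = 0, hi = nblocks;
    while (hi - lo > 1) {
        uint64_t mid = lo + (hi - lo) / 2;
        if (idx[mid].first_char <= pos) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return lo;
}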

The other way around the uncompressed data can be grouped into blocks and then stored in a dense way. Then the meta information needs to contain the address of the first byte of the block. Both approaches seem to be reasonable.

It should be observed that random access reading is possible with this approach, but random access writing is doomed to fail, so it cannot easily be supported, unless the blocks can be stored anywhere and the meta information is used to contain all the information: current block size in bytes, current block size in characters, current location of the first byte, position of the block in the chain, and number of its first character. When inserting characters, all meta information needs to be updated and the block in which the change happens needs to be rewritten completely, but there could be ways to achieve this.

The question arises whether this has been standardized in some way. I have not heard about such a standard, but I am observing that dealing with UTF-8 is always a mess or that this issue is just neglected. Databases have of course answered these kinds of questions in their way quite well, but I think there could be a useful standard for compressed random access text files, maybe even supported by the libc or by the Posix standard.

If a typical compression is applied, it is possible to have a common dictionary for frequently occurring blocks and their encoding in one place of the file and just store the encoded data in the actual blocks, avoiding duplication. That would require a third area for storing data.
