Scala Days 2015

I have visited Scala Days 2015 in Amsterdam.
As usual, I took the night train to get there and back.

Creative Scala

Prior to the conference we did a training session for Scala beginners together with underscore. Some volunteers, including myself, joined the effort and so we were actually more teachers than students. It was fun.

Talks I listend to:

Monday 2015-06-08

* Monday-Keynote: Scala – where it came from, where it’s going / Martin Odersky

Tuesday 2015-06-09

* Tuesday-Keynote: Life Beyond the Illusion of Present / Jonas Bonér
* GraphX: Graph analytics for insights about developer communities / Paco Nathan @pacoid
* Meerkat parsers: a general parser combinator library for real programming languages / Ali Afroozeh @afruze & Anastasia Izmaylova @IAnastassija
* Fixing Reactive Code at 100 Miles per Hour: Five Techniques to Improve How You Debug Scala and Akka / Tal Weiss @weisstal
* Options in Futures, how to unsuck them / Erik Bakker @eamelink
* State of the Meta, Summer 2015 / Eugene Burmako @xeno_by
* Function-Passing Style, A New Model for Asynchronous and Distributed Programming / Philipp Haller @philippkhaller & Heather Miller @heathercmiller
* Essential Scala: Six Core Principles for Learning Scala / Noel Welsh @noelwelsh
* Project Gålbma: Actors vs. Types / Roland Kuhn @rolandkuhn

Wednesday 2015-06-10

* Wednesday-keynote: The Future of AI in Scala, and on the JVM / Adam Gibson
* Why Spark Is the Next Top (Compute) Model / Dean Wampler @deanwampler
* The Reactive Streams Implementation Landscape / Mathias Doenitz @sirthias
* Reactive Slick for Database Programming / Stefan Zeiger @StefanZeiger
* So how do I do a 2-phase-commit with Akka then? / Lutz Huehnken @lutzhuehnken
* Functional Data Validation (or How to Think Functionally) / Dave Gurnell @davegurnell
* Don’t Block Yourself / Flavio Brasil
* Closing Panel and Thank You

More details will come…

Link: Scala Days 2014

Share Button

Update of WordPress

Deutsch

The update of WordPress and its Plugins did not really work.

A small change helped a lot:

In .htaccess put the following line to the beginning of the file:

AddHandler php56-cgi .php

Then all updates seemed to work fine.

Maybe this helps others observing the same problem.

Share Button

Aktualisierung von WordPress

English

Die Aktualisierung von WordPress und Plugins hat hier nicht so gut funktioniert.

Es hat eine kleine Änderung geholfen, jetzt ist alles gut:

In .htaccess am Anfang diese Zeile

AddHandler php56-cgi .php

einfügen…
Danach funktionierten alle Updates gut.

Vielleicht hilft das auch anderen, die damit Probleme hatten….

Share Button

Find the next entry in a sequence

In Facebook, Xing, Google+, Vk.com, Linkedin and other of these social media networks we are often encountered with a trivial question like this:

1->2
2->8
3->18
4->32
5->50
6->72
7->?

There are some easy patterns. Either it is some polynomial formula or some trick with the digits.
But the point is, that any such sequence can easily be fullfilled by a polynomial formula. That means we can put any value for 7 and make it work. Or any answer is correct. So what would probably be the real question is the most simple function to full-fill the given constraints. Simplicity can be measured in some way… If the solution is unique is unclear, but let us just look at the polynomial solution.

A function is needed that takes as parameter a list of key-value-pairs (or a hash map) and that yields a function such that the function of any of the key is the associated value.

Assuming a polynomial function in one variable we can make use of the chinese remainder theorem, which can be applied to univariate polynomials over a field F as well as to integral numbers. For a polynomial p(X) we have

    \[p(x) \equiv p(X) \mod X-x\]

where X is the polynomial variable and x\in F is a concrete value.

We are looking for a polynomial p(X) such that for given values x_0,\ldots x_{n-1}, y_0,\ldots,y_{n-1} \in F we have

    \[\bigwedge_{i=0}^{n-1} p(x_i) = y_i\]

or in another way

    \[\bigwedge_{i=0}^{n-1} p(X) \equiv y_i \mod X-x_i\]

which is exactly the Chinese remainder theorem.
Let

    \[I=\{0,\ldots,n-1\}\]

and

    \[\bigwedge_{j=0}^{n-1} I_j = I \setminus \{j\}\]

We can see that for all i \in I the polynomials

    \[e_i = \prod_{j \in I_j} \frac{X-x_j}{x_i-x_j}\]

have the properties

    \[e_i(x_i)=1\]

    \[\bigwedge_{j \in I_i} e_i(x_j)=0\]

or

    \[\bigwedge_{i \in I}\bigwedge_{j \in J} e_i(x_j)=\delta_{i,j}\]

where \delta_{i,j} is the Kronecker symbol, which is 0 if the two indices differ and 1 if they are equal.
Or as congruence:

    \[\bigwedge_{i \in I}\bigwedge_{j \in J} e_i(X)\equiv \delta_{i,j} \mod X-x_j\]

Then we can just combine this and use

    \[p(X) =\sum_{i \in I} y_i e_i(X)\]

This can easily be written as a Ruby function

def fun_calc(pairs)
  n = pairs.size
  result = lambda do |x|
    y = 0
    n.times do |i|
      p_i = pairs[i]
      x_i = p_i[0].to_r
      y_i = p_i[1].to_r
      z = y_i
      n.times do |j|
        if (j != i)
          p_j = pairs[j]
          x_j = p_j[0]
          z *= (x - x_j) / (x_i - x_j)
        end
      end
      y += z
    end
    y
  end
  result
end

This takes a list of pairs as a parameter and returns the polynomial function als lambda.
It can be used like this:

lop = [[0, 0], [1, 1], [2, 4], [3, 9], [4, 16], [5, 25], [6, 36], [7, 64]]

f = fun_calc(lop)

20.times do |x|
  y = f.call(x)
  puts sprintf("%6d -> %6d", x, y)
end

Put this together into a ruby program and add some parsing for the list of pairs or change the program each time you use it and all these „difficult“ questions „that 99.9% fail to solve“ are not just easy, but actually soluble automatically.

This is interesting for more useful applications. I assume that there will always be situations where a function is needed that meets certain exact values a certain inputs and is an interpolation or extrapolation of this.

Please observe that there are other interesting and useful ways to approach this:

  • Use a „best“ approximation from a set of functions, for example polynomials with a given maximum degree
  • use cubic splines, which are cubic polynomials within each section between two neighboring input values such that at the input values the two adjacent functions have the same value (y_i, of course), the same first derivative and the same second derivative.

For highway and railroad construction other curves are used, because the splines are making an assumption on what is the x-axis and what is the y-axis, which does not make sense for transport facilities. They are using a curve called Clothoid.

Use Java, C, Perl, Scala, F# or the programming language of your choice to do this. You only need Closures, which are available in Java 8, F#, Scala, Perl, Ruby and any decent Lisp dialect. In Java 7 they can be done with an additional interface as anonymous inner classes. And for C it has been described in this blog how to do closures.

Share Button

Thread local variables

The idea of threads is to share almost all information between the threads of the same process. This implies more effort to avoid creating inconsistencies by simultaneous access to the same data. But sometimes it is necessary to have data private to one thread. This is usually easily accomplished by just using local variables and method or function parameters that are simply not naturally available to the other threads, unless they are referenced by some global structures or by structures shared with other threads. But there are some cases when it is useful to have thread local variables.

The general idea is to call a function or method several times and to retain intermediate results somewhere. In object oriented programming attributes may be used for this, in procedural programming global variables or a structure that is passed around. In functional programming this is just forbidden and a workaround is always possible but this may be painful, complicated, inefficient or impossible due to the usage of existing functionality written by others.

Now the global variable fails miserably, one of the reasons, it should not be used. And its object-oriented blessed brother, like a static attribute in Java or C++, as well. Multiple threads may run simultanous sessions with these functionalities. So thread local variables are the better approach.

They can easily be built manually. You just need to use a hash map, make sure its access is thread safe by using RWLocks in C or ConcurrentHashMap or some synchronized HashMap in Java. Then use the thread ID as key to the hash map and store some structure in it where all the bad stuff is stored. Or use multiple hash maps.

But both languages provide better facilities for this purpose. In newer Java versions ThreadLocal should be used. This needs an initializer. Then it provides a get method that accesses the thread local copy of the variable and creates it, if it is not present. This approach is strongly recommended when dealing the DateFormat and NumberFormat, which are not thread safe in Java. Using synchronized on the format or creating it on demand each time works as well, but the thread local approach seems to be the most strait-forward and most efficient way. But these special examples should lose their importance. Using Joda-Time or the new Java-8 functionality for Date and Time should make SimpleDateFormat, DateFormat, Calendar and java.util.Date obsolete. And NumberFormat is still useful, but in many cases replaceable by String.format() and the C-like functionality.

In C a function pthread_key_create can be used to create a thread local veriable, pthread_setspecific() and pthread_getspecific() allow to store and retrieve a thread local veriable and hence thread local data. Please read the man pages instead of memorizing the exact API.

I might provide or link example programs for these concepts in some future posting.

Share Button

O’Reilly schließt Aktivitäten in Deutschland

This article is of minor relevance to English speaking readers, because it concerns the German O’Reilly books, while the English O’Reilly books continue to be available. So I will write in German.

Der O’Reilly-Verlag hatte jahrelang eine deutsche Niederlassung in Köln, wo deutschsprachige Versionen der beliebten Informatik-Fachbücher herausgegeben wurden. Dies waren oft Übersetzungen, aber in einigen Fällen auch exklusiv in deutscher Sprache erscheinende Titel oder doch Titel, die später vom Deutschen ins Englische übersetzt wurden. Der d-Punkt-Verlag wird die Aktivitäten übernehmen und auch unter dem Name O’Reilly weiterführen.

Mehr dazu unter
Informatik Aktuell.

Share Button

How to multiply long numbers

Assuming we have a multiplication of n-bit-numbers already available.
Other than typical compiled languages, which assume that the multiplication result fits into the same size as the two factors, we should take a closer look.
Unsigned n-bit-numbers as factors x and y fullfill

    \[0 \le x \le 2^n-1\]

    \[0 \le y \le 2^n-1\]

So we have for the product z=x*y

    \[0 \le z \le (2^n-1)^2 \le 2^{2n}-1\]

So we should be prepared for a 2n-bit long result. Assembly language provides for this. In Java we are kind of running out of luck, because we have the 64-bit-long as the largest possible value, so we cannot make use of the 64-bit capabilities of our CPU by getting a 128-bit-result. Even worse we do not have unsigned primitive types. For addition and subtraction it is possible to cheat by assuming that the longs are unsigned, because the addition for signed and unsigned numbers is almost the same, but for multiplication this is slightly more complex.

In C we do have better chances, but we need to go compiler and OS-specific to some extent. gcc on 64-bit platforms has a type __uint128_t and we can hope that something like this

__uint128_t mul(unsigned long long x, unsigned long long y) {
  __uint128_t xx = (__uint128_t) x;
  __uint128_t yy = (__uint128_t) y;
  __uint128_t z = xx*yy;
  return z;
}

is optimized by the compiler to one assembly language instruction that multiplies two unsigned 64-bit-integers into one unsigned 128-bit integer.

Multiplication of longer numbers is achieved by just assuming

    \[ x= \sum_{j=0}^{m-1}2^{nj}x_j, y= \sum_{j=0}^{m-1}2^{nj}y_j\]

with

    \[ \bigwedge_{j=0}^{m-1} 0\le x_j, y_j <2^n.\]

Now the product is

    \[ z = \sum_{j=0}^{m-1}\sum_{k=0}^{m-1}2^{n(j+k)}x_j y_k \]

where each x_j y_k summand needs to be split into a lower and an upper part such that

    \[x_j y_k = 2^n h(x_j y_k) + l(x_j y_k).\]

All these half products need to be added with propagating the carry bits to the next higher level.
This can be sorted out and done with a lot of additions with carry-bit, finally giving the desired result.

For reasonably short numbers this is a good approach, if it is done well, but it can be replaced by better algorithms for larger numbers.

The most obvious next step is to move to the Karatsuba algorithm.
Assuming we have two factors x=x_l + 2^m x_h and y = y_l + 2^m y_h the naïve approach of having to calculate the four products
x_h y_h, x_h y_l, x_l y_h, x_l y_l can be replaced by calculation of only three products x_h y_h, (x_l + x_h)(y_l+ y_h), x_l y_l, because what is actually needed is the sum x_h y_l+x_l y_h which can be obtained by

    \[(x_l + x_h)(y_l+ y_h)-x_l y_l-x_h y_h=x_h y_l+x_l y_h.\]

That can be iterated by subdividing the numbers subsequently into halves until a size is reached that can be handled well enough by the naïve approach.
An issue that needs to be observed is that x_l + x_h and y_l+ y_h need one bit more than their summands.

For even larger numbers the so called Toom-Cook algorithm, an extension of the idea of the Karatsuba-algorithm for more than two summands, can be useful and for numbers larger than that the Schönhage-Strassen multiplication proves to be faster. It is implemented in some libraries for long integer arithmetic, like GMP. Since around 2007 this can be improved by using the Fürer algorithm which is assymptotically faster, but I do not know of any useful implementations and for practical purposes the improvement does not seem to be very relevant.

Addition 2019-10-08: In 2019 an algorithm has been discovered, that is assumed to be assymptotically O(n \log n), which is according to a conjecture of Schönhage the best possible result. It has not yet been proven rigorously and the conjecture is still a conjecture. The question is, if for practical purposes of multiplying numbers that still fit into real computers, there is a real gain from these algorithms. So the questions is, if they become faster than Schönhage-Strassen on numbers that are not too long to be practically multiplied. So right now this is really mostly an issue of mathematics and theoretical informatics. See also Maths whiz solves 48-year-old multiplication problem.. And see for the paper.

Share Button

Testing Java- and C-programs with Ruby and Perl

It is very important to write good unit tests for software that is non-trivial and that is relied on by other pieces of software.
Often the logic of the software can easily be covered by the native testing facilities of the programming language, like JUnit for Java or, much less well known but available, CUnit for C. When a lot of framework code is involved or third party libraries are used heavily, there is almost no other way for certain tests, because setting up the environment cannot easily be achieved elsewhere.

But we also encounter cases where writing good unit tests in the same language as the library itself becomes a pain. We procrastinate the issue of writing them and end up with way too little or no unit tests at all. A lot of software deals with doing some calculations or transformations of numbers and strings, usually a lot of numbers and a lot of strings. Now Strings are the strength of the Perl programming language and are not really implemented very well or at least not very powerful or easy to use in most other languages. Specifically the String facilities of Perl are much superior to those of Java and C. Off course our software needs to perform well, it needs to integrate into the environment and follow the global corporate software standards, so Java or C or some other programming language is the choice that should not be challenged here for the productive software. But some tests of the functionality can more easily be achieved by iterating over some input data and creating output of the input data combined with the results. This can be perl code already or something really easy to parse. Perl is really the tool that can parse almost anything, but we do not really want to be distracted by unnecessary work but get our job done. So something like generating Perl code or CSV or YAML or JSON, but please not XML if not really needed, should do. Then we can pipe the output to perl or to a perl script and this will tell us, if everything is ok. When we know our platform, it can even be done that the Java- or C-Unit-test stores the output to a file or pipe and calls the Perl script on it and fails or succeeds depending on its output.

When it comes to numeric types, Ruby is very strong. It has unlimited size integers by default, which can be casted to n-bit-integers using constructions like
xx=x&(1<<n)-1,
it has Rational, LongDecimal (as an external gem) and Complex and is easily extendible.

Usually we can expect that corporate constraints on which tools and programming languages may be used are less restrictive when it comes to unit tests. Integrating this on a continuous integration platform is a job that needs to be addressed but it is worth the effort, if a lot of tests become easier with this approach. And doing tests in another language makes tests more credible.

Off course the general idea is applicable for other combinations. Look into Scala, F#, Clojure, JavaScript, Python and some others as well, if they seem to be more helpful than Ruby or Perl for your unit testing automation. But this does indeed raise the question if a world where corporate policies allowed Scala and Closure instead of Java, F# instead of C# and Elixir instead of Erlang and PL/I instead of Cobol would be better.

Share Button

How to recover the Carry Bit

As frequent readers might have observed, I like the concept of the Carry Bit as it allows for efficient implementations of long integer arithmetic, which I would like to use as default integer type for most application development. And unfortunately such facilities are not available in high level languages like C and Java. But it is possible to recover the carry bit from what is available in C or Java, with some extra cost of performance, but maybe neglectable, if the compiler does a good optimization on this. We might assume gcc on a 64-Bit-Linux. It should be possible to do similar things on other platforms.

So we add two unsigned 64-bit integers x and y and an incoming carry bit i\in\{0, 1\} to a result

    \[z\equiv x+y+i \mod 2^{64}\]

with

    \[0 \le z < 2^{64}\]

using the typical „long long“ of C. We assume that

    \[x=2^{63}x_h+x_l\]

where

    \[x_h \in \{0,1\}\]

and

    \[0 \le x_l < 2^{63}\]

. In the same way we assume y=2^{63}y_h + y_l and z=2^{63}z_h + z_l with the same kind of conditions for x_h, y_h, z_h or x_l, y_l, z_l, respectively.

Now we have

    \[0 \le x_l+y_l + i \le 2^{64}-1\]

and we can see that

    \[x_l + y_l + i = 2^{63}u + z_l\]

for some

    \[u\in \{0,1\}\]

.
And we have

    \[x+y+i = 2^{64}c+z\]

where

    \[c\in\{0,1\}\]

is the carry bit.
When looking just at the highest visible bit and the carry bit, this boils down to

    \[2c+z_h = x_h + y_h + u\]

This leaves us with eight cases to observe for the combination of x_h, y_h and u:

x_hy_huz_hc
00000
10010
01010
11001
00110
10101
01101
11111

Or we can check all eight cases and find that we always have

    \[c = x_h \wedge\neg z_h \vee y_h \wedge\neg z_h \vee x_h \wedge y_h \wedge z_h\]

or

    \[c = (x_h \vee y_h) \wedge\neg z_h \vee x_h \wedge y_h \wedge z_h.\]

So the result does not depend on u anymore, allowing to calculate it by temporarily casting x, y and z to (signed long long) and using their sign.
We can express this as „use x_h \wedge y_h if z_h=1 and use x_h \vee y_h if z_h = 0„.

An incoming carry bit d does not change this, it just allows for x_l + y_l + d < 2^{64}, which is sufficient for making the previous calculations work.

In a similar way subtraction can be dealt with.

The basic operations add, adc, sub, sbb, mul, xdiv (div is not available) have been implemented in this library for C. Feel free to use it according to the license (GPL). Addition and subtraction could be implemented in a similar way in Java, with the weirdness of declaring signed longs and using them as unsigned. For multiplication and division, native code would be needed, because Java lacks 128bit-integers. So the C-implementation is cleaner.

Share Button

Microsoft dropping Windows-Support

Microsoft’s CEO Satya Nadella announced that Microsoft is withdrawing its support for the MS-Windows platform and concentrating on its core business, which is „services – services – services“. For a transition period of a few months, support for MS-Windows can still be obtained at an increased rate from Microsoft, but development and support have become too inefficient to be profitable and so the platform will no longer be supported by Microsoft in the future. Negotiations with Richard Stallman from the Free Software Foundation are performed about open sourcing Windows and having it thus continuously available for the existing customer basis. The FSF could take over the source code, continue the development as open source software and charge money for the support. But these talks seem to become extremely difficult, because Richard Stallman insists on not using the word „open source“ during the negotiations. Microsoft recommends using Linux as operating system for both server and desktop and plans to complete the transition of its own infrastructure within the next 8 months.

Nadella points out that this will increase the profitability of Microsoft and provide benefit to the stock holders, because the core business, services, can now be dealt much more focussed and intensely.

Share Button