Christmas 2016

Christmas Tree 2016

З Рiздвом Христовим − کريسمس مبارک − Nollaig Shona Dhuit − Kellemes Karácsonyi Ünnepeket − Feliz Natal − Gëzuar Krishtlindjet − Joyeux Noël − καλά Χριστούγεννα − Vesele Vianoce − Fröhliche Weihnachten − Vesele bozicne praznike − Bon nadal − God Jul − Glædelig Jul − Честита Коледа − Feliz Navidad − Sretan božić − Häid jõule − Zalig Kerstfeest − Prettige Kerstdagen − Merry Christmas − 즐거운 성탄, 성탄 축하 − Срећан Божић − Crăciun fericit − 圣诞快乐 − Feliĉan Kristnaskon − Buon Natale − Priecîgus Ziemassvçtkus − Veselé Vánoce − ميلاد مجيد − God Jul − क्रिसमस मंगलमय हो − Natale hilare − Wesołych Świąt Bożego Narodzenia − Hyvää Joulua − Gledhilig jól − Gleðileg jól − С Рождеством − Selamat Hari Natal − クリスマスおめでとう ; メリークリスマス − Su Šventom Kalėdom − Bella Festas daz Nadal − Mutlu Noeller

Times have changed. In 2015 I used a Perl 5 program to generate the Christmas greetings in some arbitrary order. But in 2016 Perl 6 has become production ready, so it should be used instead. It is much shorter anyway:


#!/usr/bin/env perl6
# the greetings, in alphabetical order
my @texts = ( 'Bella Festas daz Nadal', 'Bon nadal', 'Buon Natale', 'Crăciun fericit', 'Feliz Natal', 'Feliz Navidad', 'Feliĉan Kristnaskon', 'Fröhliche Weihnachten', 'Gledhilig jól', 'Gleðileg jól', 'Glædelig Jul', 'God Jul', 'God Jul', 'Gëzuar Krishtlindjet', 'Hyvää Joulua', 'Häid jõule', 'Joyeux Noël', 'Kellemes Karácsonyi Ünnepeket', 'Merry Christmas', 'Mutlu Noeller', 'Natale hilare', 'Nollaig Shona Dhuit', 'Prettige Kerstdagen', 'Priecîgus Ziemassvçtkus', 'Selamat Hari Natal', 'Sretan božić', 'Su Šventom Kalėdom', 'Vesele Vianoce', 'Vesele bozicne praznike', 'Veselé Vánoce', 'Wesołych Świąt Bożego Narodzenia', 'Zalig Kerstfeest', 'καλά Χριστούγεννα', 'З Рiздвом Христовим', 'С Рождеством', 'Срећан Божић', 'Честита Коледа', 'ميلاد مجيد', 'کريسمس مبارک', 'क्रिसमस मंगलमय हो', 'クリスマスおめでとう ; メリークリスマス', '圣诞快乐', '즐거운 성탄, 성탄 축하');
# pick(*) draws all elements of the list in random order
my @shuffled = @texts.pick(*);
say @shuffled.join(" − ");


HTTPS

This blog is now using HTTPS. The new URL is https://brodowsky.it-sky.net/. The old URL http://brodowsky.it-sky.net/ is no longer supported, but it is automatically forwarded to the HTTPS URL.

If you would like to read more about changing the links within the blog, you can find information on Vladimir’s Blog, including a recipe, both in German.


Is Java becoming non-free?

We are kind of used to the fact that Java is „free“.
It has been free in the sense of „free beer“ pretty much forever.
And more recently also „free“ in the sense of „free speech“.

In spite of articles claiming that „Oracle is going to monetize on Java“, such as this one, Java remains free, at least for now; the article itself says so as well.
But it seems that they are looking for loopholes. For example, we download and install Java SE including X, Y and Z, because it comes like that, agree to a hundred pages of license text and confirm having read and understood everything, as always… Now we really need X, which is the JDK and actually free. But we just accidentally also installed Y and Z, which we do not need, but which carry a price tag that they may try to collect on.

Even if nothing really happens, issues like this help to undermine the trust in the platform in general, not only for Java, but also for other JVM languages. Eventually there could be forks like those we have seen with LibreOffice vs. OpenOffice or with MariaDB vs. MySQL, which more or less took over by avoiding the ties to Oracle. Solaris seems to have a similar fork, but in this case people are just moving to Linux anyway, so the issue is less relevant.

These prospects are not desirable, but I think we do not have to panic, because there are ways to solve this that will be pursued if necessary. Maybe it is a good idea to be more careful when installing software. And to think twice when starting a new project whether Oracle or PostgreSQL is the right DB product in the long term, taking into consideration Oracle’s attitude towards loyal long-term customers.

It is regrettable. Oracle has great technology from its own history and from Sun: databases, Java including the surrounding universe, Solaris and hardware. Let us hope that they will stay reasonable, at least with Java.


JMS

Java has never been just a language; it brought us libraries and frameworks as well. Some of them proved to be bad ideas, some became hyped without having any obvious advantages, but some were really good.

In the JEE stack, messaging (JMS) was included pretty much from the beginning. In those days, when Java belonged to Sun Microsystems and Sun did not yet belong to Oracle, one aim was to support databases, which meant mostly Oracle at the time, via JDBC, and so-called message oriented middleware, which was available in the IBM world, via JMS. JMS is a common interface for messaging, that is, for sending micro-email-like messages not between humans, but between software components. It can be used within one JVM, but also between geographically distant servers, provided a safe network connection exists. Since we all know email, this is in principle not too hard to understand, but the question is what it really means and whether it brings us something that we do not already have otherwise.

We do have web services as an established way to communicate between different servers across the network, and of course they can also be used locally, if desired. Web services are neither the first nor the only way to communicate between servers, nor are they the most efficient way, but I would say that they are the way we do it in typical distributed applications that are not tied to any legacy. In principle web services are network capable and synchronous. This is well understood and works fine for many applications. But it also forces us to block processes or threads while waiting for responses, thus occupying valuable resources, and we tend to lose responsiveness because of the waiting. It should be observed that DB access is typically only available synchronously. This is understandable because of the transactions, but it also blocks resources to a huge extent, since we know that the performance of many applications is DB driven.

Now message based software architectures think mostly asynchronously. Sending a message is „fire and forget“. There is such a thing as making messages transactional, but this has to be understood correctly: there is one transaction for sending the message, and it guarantees that the message is sent. Delivery guarantees can only be given to a limited extent, because we do not know anything about the other side, not even whether it is working at all; this is not checked as part of the transaction. We can imagine, though, that the messaging system has its own transactional database and stores the message there within the sending transaction. It then retries delivering the message until it succeeds, and the message is deleted from this store as part of the receiving transaction. Both of these transactions can be part of a distributed transaction and thus be combined with other transactions, usually against databases. This is what we usually have in mind when talking about transactional messaging. I have to mention that the distributed transaction, usually based on the so-called two-phase commit, is not quite as waterproof as we might hope: it can be broken by constructing a worst case scenario regarding the timing of failures of network and systems. But for practical purposes it is reasonably good to use.
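To make this a bit more concrete, here is a minimal sketch in Java using the classic JMS API of sending a message within a transacted session. How the ConnectionFactory is obtained (typically via JNDI) and the queue name „example.orders“ are assumptions for the illustration, not part of any particular product.

import javax.jms.*;

public class OrderSender {
    // sketch: „fire and forget“ within a transacted session
    public static void sendOrder(ConnectionFactory factory, String text) throws JMSException {
        Connection connection = factory.createConnection();
        try {
            // first parameter true: the session is transacted
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            Queue queue = session.createQueue("example.orders");
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage(text));
            // the commit only guarantees that the message has been handed over,
            // not that the receiver has processed or even received it
            session.commit();
        } finally {
            connection.close();
        }
    }
}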

While it is extremely interesting to investigate purely message based architectures, especially in conjunction with the functional paradigm, this need not be the only choice. Often it is a good option to use a combination of messaging and synchronous services.

We should observe that messaging is a more abstract concept. It can be implemented by some middleware and even be accessible through a standardized interface like JMS. But it can also be more abstract, as in a queuing system or in what Akka uses for its internal communication. And messaging is not limited to Java or JVM languages. Interoperability does impose some constraints on how to use it, because it rules out Object messages, which store serialized Java objects, but there are ways to address this by using JSON, BSON, XML or Protocol Buffers as message contents.

What is interesting about JMS and messaging in general are the two major communication modes. We can have queues, which are point-to-point connections, or we can have „topics“, which are channels into which messages are sent and which deliver each message to all current subscribers of the topic. Topics are interesting for notifying different components about an event happening in the system, while details about the event may have to be queried via synchronous services or requested by further messaging via queues.
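As an illustration of the publish/subscribe mode, here is a sketch of a topic subscriber, again with the classic JMS API; the topic name „example.events“ and the way the ConnectionFactory is obtained are assumptions.

import javax.jms.*;

public class EventListener {
    // sketch: subscribe to a topic; every current subscriber receives each message
    public static void listenForEvents(ConnectionFactory factory) throws JMSException {
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("example.events");
        MessageConsumer consumer = session.createConsumer(topic);
        consumer.setMessageListener(message -> {
            try {
                System.out.println("event: " + ((TextMessage) message).getText());
            } catch (JMSException e) {
                e.printStackTrace();
            }
        });
        // messages are only delivered after the connection has been started
        connection.start();
    }
}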

Generally JMS in Java has different implementations: usually there are those coming with the application servers, and there are also some standalone implementations. They can be operated via the same interface, at least as long as we restrict ourselves to the common set of functionality. So we can exchange the JMS implementation for the whole platform (which is a nightmare in real life), but we cannot mix them, because the wire protocols are usually incompatible. There is now something like a standard network protocol for messaging (AMQP), which is followed by some, but not all implementations.

As skeptical as I am about the Java Enterprise Edition, I do find the JMS part of enterprise Java very interesting and worth exploring for projects whose size and characteristics justify it.


Clojure Exchange 2016

I have just visited Clojure Exchange. Since it had only one track, there is no point in listing the talks I attended; they can easily be seen on the web page of the conference.

It was interesting, there were many great talks, and I also met great people among the other participants.


Devoxx 2016 Visit

As already written in Devoxx 2016, I visited Devoxx in Antwerp in 2016.

Hot topics were Java 9 and the functional features of Java 8, but there was a wide range of talks. As in previous years, visitors can afterwards watch online all the talks that they missed or found interesting enough to re-watch. In earlier years this was done with „Parleys“ and was only available to visitors or to those who paid for it, while it is now available on YouTube for everybody. Since the conference had been sold out long before it started, this does not seem to stop people from buying tickets.

So here is what I did.
Wednesday:

Thursday:

Friday:

Find Links here….

I guess that’s it for today… I hope to visit Antwerp for Devoxx next year again.


Devoxx 2016

I am going to the Devoxx in Antwerp 2016.
Updates about what I did will follow soon.

As a starter, here is my Devoxx talk. Let this be the main content of this posting, which is mostly video instead of text. Here is the GitHub repo with the code examples.

Other Links:


How to calculate transcendental functions

There is sometimes a need to calculate transcendental functions like \sin, \exp, \log or \tan^{-1}. We get them from the library, and the library relies on implementations in the CPU for most of them. This is true if we want them in the „double“ format, which is the standard way of doing floating point arithmetic. But it can be interesting how they can be calculated to a given precision, or how to calculate functions that are not in the library and not easily composed from library functions. There are many ways to do this, and actually the naïve way of using the Taylor series

    \[f(x) = \sum_{j=0}^\infty a_j (x-x_0)^j\]

is often not such a bad idea, if done correctly.
We know from math what to use for the coefficients a_j and for which ranges of x the series converges. For limited fixed precision it is possible to tune the coefficients a bit and get better results with a fixed number of summands. For arbitrary precision we need to be more flexible and cannot prepare for one exact precision.
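To illustrate this, here is a minimal sketch in Java of the Taylor series for \exp using BigDecimal. It assumes the argument has already been reduced to a small range like |x| \le \frac{1}{2}, so that the terms shrink quickly, and it works with a few extra digits internally; the class name ExpTaylor is made up for the example.

import java.math.BigDecimal;
import java.math.MathContext;

public class ExpTaylor {
    // sketch: exp(x) as the sum of x^j/j!, assuming |x| is already small
    public static BigDecimal exp(BigDecimal x, MathContext mc) {
        // use some extra precision for the intermediate results
        MathContext inner = new MathContext(mc.getPrecision() + 10);
        BigDecimal threshold = BigDecimal.ONE.movePointLeft(inner.getPrecision());
        BigDecimal term = BigDecimal.ONE; // x^0/0! = 1
        BigDecimal sum = BigDecimal.ONE;
        for (int j = 1; term.abs().compareTo(threshold) > 0; j++) {
            // term_j = term_{j-1} * x / j
            term = term.multiply(x, inner).divide(BigDecimal.valueOf(j), inner);
            sum = sum.add(term, inner);
        }
        return sum.round(mc);
    }
}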

Now mathematically we can often have a converging series, for example if we have

    \[f(x) = \sum_{j=1}^\infty \frac{x^j}{j^2}.\]

This converges for |x|\le 1, which can be proved easily, but the convergence is not necessarily computer friendly: for |x|=1 it converges slowly. To give an idea, if we are calculating with 100 digits after the decimal point, then at j=10^{50} the single terms are still in the area of our desired precision, and since they shrink only slowly, we would have to go much further. This is impossible to use.

As a rule of thumb, the coefficients are not our friends. They may or may not converge towards zero, but we really have to rely on the (x-x_0)^j part to get diminishing summands. A good idea is to require |x-x_0| \le \frac{1}{2} if the coefficients are bounded, which they usually are in real life examples; that means there is a bound C>0 such that |a_j|<C for each j. So we absolutely need to use some mathematical knowledge about the function in order to get reasonable convergence.

In the case of periodic functions like the trigonometric functions, we can normalize x to values within one period, but that reduces x or x-x_0 only to a range like [-\pi, \pi). Using common trigonometric identities, we can reduce this further to the range [0, \pi/2], which is still not good enough. Here we have to use formulas like \sin(3x)=3\sin(x)-4\sin^3(x) and similar formulas for the other trigonometric functions, which allow us to move to smaller values of x. For the exponential function we have an even easier way. Let n be a natural number such that |\frac{x}{n}| < \frac{1}{2}. Then we let y=\frac{x}{n} and calculate z=e^y=\exp(y). Now we have \exp(x)=e^x=e^{ny}=(e^y)^n=z^n, so we just need to take the n-th power of the intermediate result. This can be done using algorithms like square-and-multiply or even some improvements over that.
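A sketch of this reduction for \exp in Java might look as follows. It reuses the hypothetical ExpTaylor sketch from above for the small argument, chooses n as a power of two so that the n-th power reduces to repeated squaring, and assumes x lies in a moderate range so that n fits into an int.

import java.math.BigDecimal;
import java.math.MathContext;

public class ExpReduced {
    // sketch: exp(x) = (exp(x/n))^n with n a power of two and |x/n| < 1/2
    public static BigDecimal exp(BigDecimal x, MathContext mc) {
        MathContext inner = new MathContext(mc.getPrecision() + 10);
        BigDecimal half = new BigDecimal("0.5");
        BigDecimal y = x;
        int n = 1;
        while (y.abs().compareTo(half) >= 0) {
            y = y.divide(BigDecimal.valueOf(2), inner);
            n *= 2;
        }
        BigDecimal z = ExpTaylor.exp(y, inner); // the Taylor sketch from above
        // z^n by repeated squaring, since n is a power of two
        BigDecimal result = z;
        for (int k = n; k > 1; k /= 2) {
            result = result.multiply(result, inner);
        }
        return result.round(mc);
    }
}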

In the end we end up writing a lot of code for different cases, optimized in different ways for the given function. For example the power function p(x,y)=x^y has two parameters and quite wild behavior, and writing an implementation that provides reasonable performance and precision requires handling a lot of cases. Just look at the power function of the standard Java library, which is written in native C code. Its beauty is not conciseness, but with some understanding of what it takes to do this well, you might eventually appreciate the given implementation, not only when using it, but also when reading it.

Dealing with the precision is a delicate question, which again requires mathematics. As a general rule, we usually need more precision for intermediate results than for the final result. A good tool is to take the derivative, or the partial derivatives in the case of functions with multiple parameters, to see how much changes of a parameter influence changes of the value. Taylor’s theorem gives definite, but possibly hard to apply answers. And it can also be useful to look at lower and upper bounds for the operations performed.

When writing such functions, unit tests are a big deal. Often they are not so hard to write: we can rely on inverse functions, or we can increase the precision and check that the lower precision result is at least as precise as it claims to be. In some cases existing implementations for double can be used to check whether the calculation is correct at smaller precisions.
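For example, a test of the hypothetical ExpTaylor sketch from above against the double implementation of the standard library could look like this (JUnit 4):

import java.math.BigDecimal;
import java.math.MathContext;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class ExpTest {
    @Test
    public void compareWithDouble() {
        // the 30 digit result should agree with Math.exp up to double rounding
        for (double x = -0.5; x <= 0.5; x += 0.01) {
            BigDecimal result = ExpTaylor.exp(new BigDecimal(x), new MathContext(30));
            assertEquals(Math.exp(x), result.doubleValue(), 1.0e-14);
        }
    }
}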

Most of all it is important to think and use some mathematics, or to get help from somebody with appropriate knowledge.

Just to give you a hint: there are tons of transcendental functions that do not exist in standard libraries and that may be interesting to use. For some of them there are libraries. For others we still need to find libraries or write them.


Unit Testing in a non-perfect World

Test Driven Development

We all know how good test driven development is and that we should move in that direction.

How much coverage

There are some serious obstacles. Most of all, we have an obligation to actually finish software, and resources are usually limited. If they were not limited by money and time constraints, they would hit the limit of efficient team sizes and organizational structures.

We can just look at a simple application that does „CRUD“ operations. Ideally we start with a known data set and reset the database to exactly this content before starting the tests, maybe even before each single test… If we have a huge and well managed server farm to run the tests, this may be possible. For the „read“ methods we need to write some tests that succeed in reading, probably performing a few reads with a single read method to cover different outcomes of a successful read or different parameter combinations. Then there are unsuccessful reads that just do not find anything and return null or an empty result collection, or even fail with an exception. It is also of interest to check the maximum and minimum allowed values, if there are such limits. So we end up writing five to a few dozen test methods for a single read method. And this is the simple case.

For delete and update we should create our own data at the beginning of the test. Probably there are dependencies and constraints in conjunction with other data, so it is necessary to cover these as well. Create and update actually need a variety of at least two values for each of the most simple attributes of the created data object, just to deal with not-null constraints. Usually we have more constraints on attributes, concerning lengths, value ranges and some kind of compatibility with other data. So there will be up to around ten tests for each attribute of the created or updated entity, with both successful and unsuccessful operations expected. In total we will end up writing hundreds or even thousands of unit test methods just to obtain the most basic coverage of a relatively simple „CRUD“ application.

Writing many similar tests is not so difficult, and it would be interesting to explore ways to cut down on the repetitive work involved: using less verbose languages for writing the tests, generating them partially with scripts, or simply writing very powerful helper methods in the test class that just get called with slightly different parameters to do all the tests, as in the sketch below. It will be a lot of work to write the tests in any case. I think 60% of the time for the unit tests and 40% for the actual code is a reasonable estimate for a relatively fair coverage of most of the code.
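Just as an illustration of the helper method idea, here is a hypothetical sketch in Java with JUnit 4; the repository class, the method findByName and the test data are all assumptions, not a real API.

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class CustomerRepositoryTest {

    private final CustomerRepository repository = new CustomerRepository(); // hypothetical

    // one powerful helper, called with slightly different parameters
    private void checkFindByName(String name, int expectedCount) {
        assertEquals(expectedCount, repository.findByName(name).size());
    }

    @Test public void findExistingName() { checkFindByName("Smith", 2); }
    @Test public void findMissingName()  { checkFindByName("Nobody", 0); }
    @Test public void findEmptyName()    { checkFindByName("", 0); }
}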

In practice we should really prioritize our unit testing efforts, because spending 2.5 times as much time on the whole thing as on the code itself is simply not always possible. On the other hand, the time we save in the long run with good testing is even more than what we spend, if we do the unit test development well.

But there are some aspects to think of:

  • Which parts of the application are fairly stable?
  • Which parts of the application are used and relied on heavily by other parts of the application?
  • Which parts of the application are used a lot by end users?
  • Which parts of the application are high risk because they have more inner complexity?
  • Which parts of the application actually showed errors? Fix the errors by writing a test to expose them first.
  • Which parts of the application are high risk in terms of reputation, money loss or data loss if they go wrong?
  • Which parts of the application are undergoing internal changes, while retaining the API?
  • Which parts of the application are migrated to another platform, OS, DB, architecture …, while retaining the API?

It is good to focus primarily on areas based on these questions and to do reduced testing for areas that are less critical.
The first question is quite delicate, because it exposes a contradiction we need to cope with. We should be agile and change the application easily when the requirements or the architecture are understood better. But with tests, even this effort multiplies by 2.5 or whatever factor we have for updating the unit tests. Or worse, it leads to disabling the unit tests or to a loss of agility. In areas that change quickly it may be better to write the complete set of tests only once they have become relatively stable.

Database

The next issue is the database. Typical organizations like to provide one DB instance and schema for the whole development team, because database instances and schemata are seen as expensive resources. They are hard to maintain, and for various reasons it is often difficult to install a local database on each developer’s machine. If it is Oracle, DB2 or MS SQL Server, some know-how is needed to install it, and maybe there are even some constraints in terms of the OS. MariaDB and PostgreSQL seem to be somewhat easier to install and involve fewer license issues, but even that is an effort. This can be overcome by virtualization: an image with the DB setup can be developed once and then copied to each team member. There are interesting and good ways to do something like this, so it is becoming less of an issue, but it is still very unusual.

There are two ways out of this. One way is to use different DB products for development and production. This is somewhat dangerous, because databases are so different that today’s common abstractions do not hide the differences, and we also might pay a high price in terms of performance if we do not use DB-specific features. So it requires extra development effort to support both DB types, and it remains very important to regularly run tests against the DB that is used in production anyway. Still, it may be helpful to move part of the tests to such a similar-but-not-equal local environment.

The regular development DB is unfortunately often shared between many developers. If tests run simultaneously from different development machines against the same DB, they will usually interfere, and some tests will fail just because of that; not all the time, but sporadically. This can be avoided by some team organization and some kind of reservation of the DB, but that is painful, so we just run the tests and, if they fail, assume it was someone else testing at the same time. It is possible to write the tests in such a way that they can withstand this, but compared to the effort of using a virtual image with a working DB instance, this extra effort is not justifiable.

So what we should aim for is a dedicated DB schema for each developer. Ideally it should use the DB software product used for production. It can be local, on some DB server or in a virtual image.


Modular Arithmetic

We have some articles in this blog about the integer types of typical programming languages and how they work. Time to introduce the underlying mathematical concepts, which have been covered implicitly until now, since they are also interesting in many other respects. And besides, this is a very beautiful area of mathematics.

Mathematics as we learn it in school is mostly inspired by what is needed for physics. This was quite a good choice 100 years ago, because it gave some motivation to why we do certain things, and it was the area where math was applied; of course also chemistry and engineering, but these use somewhat similar aspects of mathematics as physics. Now physics and chemistry make use of quite interesting areas of mathematics like group theory or non-Euclidean geometry, but these are advanced areas beyond what we typically learn in school, at least in the countries where I went to school. So school is about real numbers, some trigonometry, real analysis (calculus) and maybe complex numbers.

For more than 50 years mathematics has been heavily used in informatics as well; if we abstract informatics away from computers, even longer, because for example algorithms and cryptography have been in use for several thousand years, but they were a small niche and only became mainstream through the existence of computers. For informatics and computer science we need different areas of mathematics. Analysis is not so important, though not irrelevant. One area is information theory, which is based on probability theory and statistics. Numerical calculation has to a great extent remained a domain of mathematics itself, so this connection may be strong, but it is applied mathematicians using computers and using knowledge from IT to program them better, not the other way round. Still, numerical analysis is somewhat important, but not really what most of us need very often.

The areas of mathematics that are really interesting for informatics are discrete mathematics, algebra and number theory. There is enough material about this on the web, but for now we will deal with modular arithmetic, which lies in the intersection of discrete mathematics, algebra and number theory.

We start with the integral numbers:

    \[{\Bbb Z} = \{\ldots,-3, -2, -1, 0, 1, 2, 3, 4,\ldots\}\]

Now we take any positive integral number m \in {\Bbb N} with m \ge 2.
We say that two integral numbers x and y are congruent modulo m:

    \[x \equiv y \pmod m\]

if and only if x-y can be divided by m. We might also say that there is a k \in {\Bbb Z} such that y = x + k m.
Now we can make interesting observations:
We assume, that we have pairs of numbers such that

    \[u \equiv v \pmod m\]

and

    \[x \equiv y \pmod m\]

Then we can observe that also

    \[u+x \equiv v+y \pmod m\]

    \[u-x \equiv v-y \pmod m\]

    \[u\cdot x \equiv v\cdot y \pmod m\]

This can be proven easily.
We assume as above

    \[y = x + k \cdot m\]

and similarly

    \[v = u + l \cdot m\]

Then we have

    \[y+v = x+u+(k+l)m \equiv x+u \pmod m\]

    \[y-v = x-u+(k-l)m \equiv x-u \pmod m\]

    \[yv = xu+(xl+ku+klm)m \equiv x\cdot u \pmod m\]

We call the set of all numbers of \Bbb Z that are congruent to each other a remainder class and write this as

    \[\bar x = x + m{\Bbb Z}\]

There are exactly m remainder classes modulo m and usually we use a representation system of

    \[0,1,\ldots,m-1\]

or for even m we often use

    \[-\frac{m}{2}, -\frac{m}{2}+1,\ldots,-1,0,1,\ldots,\frac{m}{2}-1\]

or for odd m we often use

    \[-\frac{m-1}{2}, -\frac{m-1}{2}+1,\ldots,-1,0,1,\ldots,\frac{m-1}{2}\]

We encounter these representation systems when we do division with remainder, written as % in many programming languages, but it is necessary to do some quick research on which representation system % uses and which one we want, and possibly to adjust the result. The corresponding division may then not be /, but we can obtain it by subtracting our remainder from the dividend and dividing that, which is an exact division. A sketch of this adjustment follows below.
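In Java, for example, % takes the sign of the dividend, so -7 % 3 yields -1. A small sketch of adjusting this to the representation system 0,1,\ldots,m-1, together with the matching division:

public class Mod {
    // remainder in the representation system 0..m-1 (for m > 0);
    // since Java 8 the standard library offers Math.floorMod for the same purpose
    public static int mod(int x, int m) {
        int r = x % m;
        return r < 0 ? r + m : r;
    }

    // the corresponding division: subtract the remainder, then divide exactly
    public static int div(int x, int m) {
        return (x - mod(x, m)) / m;
    }
}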

Now we need to define a ring. A ring R is a set with operations + and \cdot such that the following rules apply:

  1. For any members x, y \in R we also have x+y \in R, x-y \in R and x\cdot y \in R. This is usually not mentioned, because it is part of how we define these operations in the first place in most mathematical texts.
  2. Addition is commutative: For any members x, y \in R we have x+y=y+x.
  3. Addition has a neutral element 0: For any member x \in R we have x+0=0+x=x.
  4. Addition has inverse elements: For any member x \in R we have a member x'\in R such that x+x'=x'+x=0. Usually we write -x for this inverse element of x and we write x-y instead of x+(-y).
  5. Addition is associative: For any members x, y, z \in R we have (x+y)+z=x+(y+z). We can omit the parentheses here and write x+y+z instead.
  6. Multiplication has a neutral element 1: For any member x \in R we have x\cdot 1=1\cdot x=x.
  7. Multiplication is associative: For any members x, y, z \in R we have (x\cdot y)\cdot z=x\cdot (y\cdot z). We can omit the parentheses here and write x\cdot y\cdot z or even xyz instead.
  8. Multiplication in conjunction with addition is distributive: For any members x, y, z \in R we have (x + y)\cdot z = x\cdot z + y\cdot z and z\cdot (x+y)=z\cdot x + z\cdot y.

If the multiplication is also commutative, we call it a commutative ring. If there is a multiplicative inverse for any element other than 0, we call it a skew field. And if both conditions hold, we call it a field.

Now we can see that \Bbb Z is actually a commutative ring.

And these remainder classes modulo m also form a ring. We call it {\Bbb Z}/m{\Bbb Z} or sometimes also {\Bbb Z}_m, but I do not use the second form, because it is ambiguous with something else (p-adic numbers). If m=p is a prime number, then {\Bbb Z}/p{\Bbb Z} is actually a field and in this case we may write {\Bbb F}_p instead of {\Bbb Z}/p{\Bbb Z}. Or GF(p) in some literature, if you prefer that. Why is it a field?

Now there is an extension of the Euclidean algorithm that calculates, along with the gcd of two numbers, coefficients u and v such that g=\gcd(x,y)=ux+vy. So these numbers exist. For a prime number p and a remainder class \bar x \ne 0 we know that x is not a multiple of p, and since p is prime we have

    \[1=\gcd(x,p)=ux+vp.\]

This yields a multiplicative inverse for \bar x, because

    \[u\cdot x \equiv 1 \pmod p.\]
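A sketch of the extended Euclidean algorithm in Java, yielding the coefficients u and v and thus the multiplicative inverse modulo a prime p:

public class ExtendedEuclid {
    // returns {g, u, v} with g = gcd(a, b) = u*a + v*b
    public static long[] extendedGcd(long a, long b) {
        long oldR = a, r = b;
        long oldU = 1, u = 0;
        long oldV = 0, v = 1;
        while (r != 0) {
            long q = oldR / r;
            long t;
            t = r; r = oldR - q * r; oldR = t;
            t = u; u = oldU - q * u; oldU = t;
            t = v; v = oldV - q * v; oldV = t;
        }
        return new long[] { oldR, oldU, oldV };
    }

    // inverse of x modulo a prime p: the u with u*x ≡ 1 (mod p)
    public static long inverseMod(long x, long p) {
        long u = extendedGcd(x, p)[1];
        return Math.floorMod(u, p); // normalize into 0..p-1
    }
}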

Now we often encounter m as a power of 2, and this modular arithmetic, at least for +, - and *, is what is sold to us as the integer arithmetic of Java, C or C#.

On the other hand it can be interesting and useful to use modular arithmetic for other values of m. Most interesting are prime numbers, which can be relatively small like 2, 3 or 5, but also really big. For non-prime m we have zero divisors, that is numbers x, y \not\equiv 0 \pmod m such that x\cdot y \equiv 0 \pmod m. This breaks a fundamental property that we are used to from integers and fields, but it is perfectly correct for such a modular ring.

In our daily life modular arithmetic is actually quite common: we have the week days with m=7, the hours of the clock with m=12 or m=24, the minutes and seconds of the clock with m=60, and quite a bit of m=2, which we do not really see as modular arithmetic, but rather as boolean arithmetic with + being the „exclusive or“, \cdot being the „and“, etc.
