Article 13

The European Parliament is considering passing an Article 13 that would interfere with internet usage and freedom of speech on the internet under the pretence of enforcing copyright. It is important to resist.

Flashsort in Scala

There is now also an implementation of Flashsort in Scala.

Flashsort needs to sort parts of an array, so a heapsort implementation in Scala that can be restricted to a subrange of an array has been included as well. Heapsort was chosen because it sorts in place and has a guaranteed worst-case running time of O(n \log n). Mergesort or quicksort would have been reasonable choices as well. Some implementations even use insertion sort for this step, because the sections are small.
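
The Scala code itself is not reproduced here. As a rough illustration of the idea, here is a minimal Ruby sketch of an in-place heapsort restricted to the half-open index range from...to of an array. The method name heapsort_range! and its parameters are hypothetical and not taken from that implementation.

```ruby
# Minimal sketch: in-place heapsort restricted to the half-open range
# a[from...to]; elements outside the range are never touched.
def heapsort_range!(a, from = 0, to = a.size)
  n = to - from
  return a if n < 2

  # Restore the max-heap property below heap index i, looking only at the
  # first `size` heap elements. Heap index k lives at array index from + k.
  sift_down = lambda do |i, size|
    loop do
      largest = i
      left = 2 * i + 1
      right = left + 1
      largest = left if left < size && a[from + left] > a[from + largest]
      largest = right if right < size && a[from + right] > a[from + largest]
      break if largest == i

      a[from + i], a[from + largest] = a[from + largest], a[from + i]
      i = largest
    end
  end

  # Build the heap bottom-up, then repeatedly move the maximum behind the heap.
  (n / 2 - 1).downto(0) { |i| sift_down.call(i, n) }
  (n - 1).downto(1) do |last|
    a[from], a[from + last] = a[from + last], a[from]
    sift_down.call(0, last)
  end
  a
end

values = [9, 5, 7, 3, 1, 8]
heapsort_range!(values, 1, 5)   # sorts only indices 1..4
p values                        # prints [9, 1, 3, 5, 7, 8]
```

Because only the index offset from changes, the same routine can sort each class of a Flashsort run without allocating extra memory.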

Flashsort in Ruby

There is now a simple implementation of Flashsort in Ruby, in addition to the implementation in C that was already provided. The C implementation is typically faster than the libc function qsort, but this always depends on the data and on how well the metric function has been written, which Flashsort needs in addition to the comparison function. You can think of this metric function as a kind of monotonic hash function. So we have

    \[\bigwedge_{a,b: a\le b} m(a) \le m(b) \]
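
As an illustration, here is a minimal Ruby sketch of such a metric function for strings, assuming byte-wise string comparison (which is what Ruby does for strings of the same encoding). The name string_metric and the choice of looking at the first three bytes are assumptions for this example, not part of the Flashsort implementation mentioned above.

```ruby
# Hypothetical metric for strings: the first k bytes are read as a
# big-endian base-256 number; shorter strings are padded with zero bytes.
# Since strings of the same encoding compare byte by byte,
# a <= b implies string_metric(a) <= string_metric(b).
def string_metric(s, k = 3)
  bytes = s.bytes.first(k)
  bytes += [0] * (k - bytes.size)
  bytes.inject(0) { |acc, byte| acc * 256 + byte }
end

words = %w[pear apple banana apricot cherry app]
# Check monotonicity on neighbours of the sorted list.
monotonic = words.sort.each_cons(2).all? { |a, b| string_metric(a) <= string_metric(b) }
puts monotonic   # prints true
```

Strings that share their first k bytes, like app and apple, get the same value. This is allowed, because only monotonicity is required, but many equal values put many elements into the same class and leave more work for the final sorting step.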

Apart from numerical values, such a function or method does not usually exist already, so we have to invest some time into writing it. This makes Flashsort a bit harder to use. A good metric function is crucial for good performance, but for typical text files even quite trivial implementations already outperform classical O(n \log n) algorithms like Heapsort, Quicksort and Mergesort for larger amounts of data.
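
To show where the metric function enters, here is a minimal Ruby sketch of Flashsort, assuming the metric is passed as a block that returns an Integer. It follows the classical scheme: classification into about 0.43·n classes (a common choice, assumed here), an in-place cycle permutation, and a final insertion sort. The method name flashsort! is hypothetical, and this is not the Ruby implementation referred to above.

```ruby
# Minimal Flashsort sketch. The block is the metric m; it must return an
# Integer and be monotonic: a <= b implies m(a) <= m(b).
def flashsort!(a, &metric)
  raise ArgumentError, "metric block required" unless metric

  n = a.size
  return a if n < 2

  min, max = a.map(&metric).minmax
  if max > min
    m = [n * 43 / 100, 2].max   # number of classes, a common choice
    # Class index 0..m-1; monotonic because the metric is monotonic.
    cls = ->(v) { (m - 1) * (metric.call(v) - min) / (max - min) }

    # Count the class sizes, then turn them into exclusive end indices.
    ends = Array.new(m, 0)
    a.each { |v| ends[cls.(v)] += 1 }
    (1...m).each { |k| ends[k] += ends[k - 1] }

    # In-place permutation along cycles: each element is dropped at the
    # current end of its class region, which then shrinks by one.
    moved = 0
    j = 0
    k = m - 1
    while moved < n - 1
      # Skip positions that already hold an element in its final region.
      while j >= ends[k]
        j += 1
        k = cls.(a[j])
      end
      flash = a[j]
      while j != ends[k]
        k = cls.(flash)
        ends[k] -= 1
        a[ends[k]], flash = flash, a[ends[k]]
        moved += 1
      end
    end
  end

  # Classes are now in order; insertion sort fixes the order inside them.
  (1...n).each do |i|
    v = a[i]
    pos = i - 1
    while pos >= 0 && a[pos] > v
      a[pos + 1] = a[pos]
      pos -= 1
    end
    a[pos + 1] = v
  end
  a
end

numbers = [42, 7, 19, 3, 88, 3, 56]
sorted = flashsort!(numbers) { |x| x }
p sorted   # prints [3, 3, 7, 19, 42, 56, 88]
```

For strings, even a trivial metric such as the first byte, `{ |s| s.getbyte(0) || 0 }`, already yields a valid (if coarse) classification.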

This blog article shows other sorting algorithms for Ruby.

Not all projects are on ideal paths II (Tim Finnerty)

This is another story of a project that did not go as well as it could have while I was there. We can learn a lot from unsuccessful projects, so there will be stories like this once in a while. The first one was about Tom Rocket.

Tim Finnerty

It was the time when all the cool companies tried to introduce Java. Some of the new Java projects failed, causing the companies to go back to C, which in turn scared other companies away from taking this step. But some companies were not scared off. They embraced the new Java fashion at a time when it was still not clear whether this was a good idea. What could possibly go wrong?

Well, in those days the experienced guys did not want to move to Java. It was slow, it was unreliable, it was immature… Maybe for applets, maybe more generally for GUIs to get rid of Visual Basic, but Java on the server was considered a bad joke. On the server, real people used real C, of course on Unix or maybe Linux, which was not really such a bad idea in those days. But there were the young people. Or the ones who had stayed young. They had often just finished their education and firmly believed that just by using technology „xyz“ everything would become great. xyz can be a development method (the spiral model in those days, agile today), an architecture (microservices), a paradigm („OO“, „functional“), a framework („enterprise edition“), a tool or a programming language (yeah, Java). Often this first enthusiasm ends in a disaster: the money has been spent, the developers are leaving and the software is further away from being useful than anybody would like to admit. In lucky cases there is still some money left to do it right. Maybe even to do it right in Java.

That is where we come in. I do not really know the earlier history, but according to the management it was a total disaster. Now Tim Finnerty (real name known to the author) was the new technical lead, team architect or whatever this role is called. He embraced the new technology, but promised not to overstretch it. A bit of old school. Sounds good, because it is exactly what managers want to hear. No more risky experiments; this time it needs to become a success.

So Tim Finnerty defined how we had to work. He knew Java, he knew databases and especially Oracle, he knew the web, he even knew Perl. And he knew OO, better than anybody else, so we did the real thing. Great head start. And everybody had to program according to Tim’s rules.

Of course we were using Java Enterprise Edition. That meant that we were programming against a WebLogic application server that was hard to install, hard to run, required a few minutes of startup time for each minor change to the software we were writing, and forced a very archaic and primitive programming model on us. But that was cool and it was the future, which unfortunately proved to be true, even though it has at least become usable by now. So far there is nothing to blame on Tim, because it was more or less the stack that everybody used.

Now to the OO and the database. Each database table represented a „Business Object“ with a name like XyzBO. So most of the time we wrote a class XyzBO plus a few more to satisfy the greed and need for boilerplate code of the old J2EE world. XyzBO was an Enterprise JavaBean, a stateless session bean, to be accurate. This meant that we wrote methods of this EJB which were basically procedures of the pre-OO world. But inside them we could of course use the whole OO toolset. Which we did. The class representing any data from the database was actually called Data. It was modelled on a minor subset of the standard Perl data structures: Data was a list of hash maps, which could behave just like a hash map if it had only one entry. Database queries returned Data, or of course null if nothing was found. Nobody would ever want to use an empty collection. Pretty much the opposite of what we are now doing the hard way by introducing Optional or Option to avoid the null. But it was easy to just write
if (data == null) { data = new Data(); }
at least for those who knew this trick.
So data mirrored the content of the database table or of the query result, with the column names as keys and the values as objects. Working with it was really easy: know the attribute name exactly, get the value from data, see if it is null, and if not, cast it to the real type, and voilà…

The database was designed according to Tim’s advice; he had to review every table. It was mandatory to have, as unique key and as first column, a string of around 700 characters called HANDLE. Each table had a business primary key, which always consisted of several columns. Because the system allowed multiple instances of the software to run on the same database, there was always a column called „SITE“ in this logical primary key. But no primary keys were defined in the database. The unique HANDLE was enough. It contained the name of the BO, like XyzBO, followed by a colon and the concatenation of all the logical primary key columns, separated by commas. Dates had to be converted to strings using a local US format, not ISO, of course. All foreign keys were defined using HANDLE. In the end more than half of the DB space was wasted on this stupidity. But every single handle value started with XyzBO:, to remind us that we were programming OO.

And now booleans. It was forbidden to use the boolean type of Java. All booleans were strings containing the words „true“ and „false“. This continued all the way to the web interface.

At that time web frameworks did not yet exist or at least were not widely known. So the way to go was to write JSPs, which contained the more or less dynamic web pages, and servlets to control the flow and access the EJBs. But according to Tim it was too hard to learn servlets as well, so they were forbidden, and instead a connection JSP had to be written, which did not display anything, but only contained a <% at the beginning, a %> at the end and the code in between.

A lot of small and large stupidities were forced on the team. Most people were new to Java and to OO and did not even realize that anything was wrong, apart from the fact that it was kind of hard to get stuff done. Some stupidities were due to the fact that early J2EE really sucked, but mostly it was Tim who forced everyone down to his level. This story happened a long time ago. Tim has already retired. I would say he is one of those guys who retire as a junior.

There is (almost) nothing wrong with stupidity. And there is (almost) nothing wrong with arrogance. But the combination really sucks. Especially if it is taken seriously by the manager or has to be taken seriously by the team.

Google+ will be shut down

Google is terminating its Google+ service, at least the consumer version.
For this reason I have removed the „share on Google+“ buttons.

WordPress-Plugin destabilizes WordPress Installation – How to Fix

I had a plugin installed and activated in WordPress. Then I got an error message like „Fatal error: Undefined class constant ‚plugin_version‘ in /www/htdocs/w00fb338/wp-content/plugins/share-on-diaspora/admin.php on line 17“ whenever I tried to use any admin or authoring functionality of my blog. Just reading the blog seemed to work fine. But since the dashboard itself failed, there was no way to get to its plugins section to deactivate or uninstall that plugin.

Since this blog is hosted on a typical web hosting service, there is no SSH access, which would have been helpful, because there are command line interfaces for managing a WordPress installation. What remained was FTPS access to the file system. The plugins can be found in the directory
wp-content/plugins
and the plugin causing the problems could be deleted there using ordinary file operations.

I think we can agree that this is kind of a hack, but it worked. WordPress detected that the plugin was missing and deactivated it, and the installation was usable again. I assume that this works for any plugin. It should of course be used with care.

LongDecimal

Disclaimer: This article is an occasion where you might need some of the presumably useless mathematics that you might have learned at school and university. If this bothers you, maybe you should wait for the next article in about two weeks’ time.

LongDecimal is a library that I have provided for Ruby. It is available as a Ruby gem. It was originally intended to provide something like Java’s BigDecimal for Ruby. Ruby has a BigDecimal, too, but it is not really the same. Such a class is useful for writing finance applications, so I wrote one that covers what Java’s BigDecimal offers. It ended up having a lot more, but we will get to that later.

So the general idea is that we do math with a subset \mathbb{D} = \{ \frac{x}{10^n} : x \in \mathbb{Z} \wedge n \in \mathbb{N}_0\} of the rational numbers \mathbb{Q}. This is not quite the whole truth, because n actually carries information that we care about, so we would actually define

    \[\mathbb{D} = \{ (\frac{x}{10^n}, n) : x \in \mathbb{Z} \wedge n \in \mathbb{N}_0\}.\]

So we explicitly allow the numerator x to be a multiple of 10, and we use n to express the precision, that is, how many digits after the decimal point are explicitly part of our number. For example, (\frac{150}{10^2}, 2) and (\frac{15}{10^1}, 1) have the same numerical value 1.5, but stand for 1.50 and 1.5 respectively. More decimal places after the decimal point express more precision.

Now we try to use mathematical operations +, - and \cdot on \mathbb{D}. It turns out that we have three different cases. The ring operations can be defined without problems, even though \mathbb{D} is not quite a ring, as we will see. But it is good enough for most purposes.

  • (\frac{x}{10^n}, n) + (\frac{y}{10^m}, m) = (\frac{x}{10^n} + \frac{y}{10^m}, \max(n, m))
  • (\frac{x}{10^n}, n) - (\frac{y}{10^m}, m) = (\frac{x}{10^n} - \frac{y}{10^m}, \max(n, m))
  • (\frac{x}{10^n}, n) \cdot (\frac{y}{10^m}, m) = (\frac{xy}{10^{n+m}}, m+n)
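
The following minimal Ruby sketch models elements of \mathbb{D} as pairs of an integer x and a scale n and implements the three rules above. The class name Dec and its methods are hypothetical; this is not the API of the long-decimal gem.

```ruby
# Hypothetical model of (x / 10^n, n): the integer x and the scale n are
# both stored, so 1.50 and 1.5 are different elements with the same value.
class Dec
  attr_reader :x, :n

  def initialize(x, n)
    raise ArgumentError, "scale must be >= 0" if n.negative?

    @x = x
    @n = n
  end

  # Numerical value as an exact Rational.
  def value
    Rational(@x, 10**@n)
  end

  # Addition and subtraction: bring both operands to the larger scale.
  def +(other)
    s = [@n, other.n].max
    Dec.new(@x * 10**(s - @n) + other.x * 10**(s - other.n), s)
  end

  def -(other)
    s = [@n, other.n].max
    Dec.new(@x * 10**(s - @n) - other.x * 10**(s - other.n), s)
  end

  # Multiplication: (x / 10^n) * (y / 10^m) = xy / 10^(n+m), scales add up.
  def *(other)
    Dec.new(@x * other.x, @n + other.n)
  end

  # Decimal notation showing exactly n digits after the point.
  def to_s
    sign = @x.negative? ? "-" : ""
    digits = @x.abs.to_s.rjust(@n + 1, "0")
    return sign + digits if @n.zero?

    "#{sign}#{digits[0...-@n]}.#{digits[-@n, @n]}"
  end
end

a = Dec.new(150, 2)        # 1.50
b = Dec.new(15, 1)         # 1.5
puts a.value == b.value    # prints true: same number, different precision
puts a + b                 # prints 3.00
puts a * b                 # prints 2.250
```

No operation ever rounds, which matches the rule that implicit rounding is avoided for the basic operations.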

Addition and subtraction actually lose information if n\ne m, because one input may have lower precision, while the result pretends to have the higher precision. But not losing numerical information is considered more important, and implicit rounding should be avoided at all costs, at least for the basic operations.

\mathbb{D} is not a ring, but it is a semiring. The zero is not unique: we seem to have many zeros (0, n). This is not the problem, because only (0, 0) acts as an additive neutral element. What we lack are additive inverse elements. If we have an element (x, n) with n>0, there is no element (y, m) such that (x,n)+(y,m)=(x+y, \max(n,m)) = (0, 0), because \max(n,m) \ge n > 0. The distributivity required for a semiring can be seen easily:

  • ((\frac{x}{10^n}, n) + (\frac{y}{10^m}, m))\cdot (\frac{z}{10^l}, l) = ((\frac{x}{10^n}+\frac{y}{10^m})\cdot\frac{z}{10^l}, l+\max(m,n))
  • (\frac{x}{10^n}, n)\cdot (\frac{z}{10^l}, l) + (\frac{y}{10^m}, m)\cdot (\frac{z}{10^l}, l) = (\frac{x}{10^n}\cdot\frac{z}{10^l}+\frac{y}{10^m}\cdot\frac{z}{10^l}, \max(l+m,l+n))
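
A small Ruby sketch can check these properties on pairs [x, n] standing for (\frac{x}{10^n}, n). The helper names add and mul are hypothetical. The sketch shows that (0, 0) is neutral, that other zeros change the scale, that adding the negation yields a zero of the wrong scale, and it tests distributivity on random triples.

```ruby
# Hypothetical helpers on pairs [x, n] meaning (x / 10^n, n).
def add(a, b)
  s = [a[1], b[1]].max
  [a[0] * 10**(s - a[1]) + b[0] * 10**(s - b[1]), s]
end

def mul(a, b)
  [a[0] * b[0], a[1] + b[1]]
end

d = [5, 1]                  # 0.5
p add(d, [0, 0])            # prints [5, 1]: (0, 0) is neutral
p add(d, [0, 2])            # prints [50, 2]: (0, 2) changes the scale
p add(d, [-5, 1])           # prints [0, 1]: value 0, but not (0, 0)

# Distributivity: both sides must be identical pairs (same x, same scale).
ok = (1..1000).all? do
  a, b, c = Array.new(3) { [rand(-999..999), rand(0..4)] }
  mul(add(a, b), c) == add(mul(a, c), mul(b, c))
end
p ok                        # prints true
```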

Both right-hand sides agree, because \max(l+m, l+n) = l+\max(m, n). Since we do computer programming and not math, and only use math as a tool to help us, it is OK that \mathbb{D} is only a semiring and not a ring, as long as we know it.

Division is a special case, because it is not always possible to express the exact numerical value of the quotient in \mathbb{D}. For example, 3.0/7.0 = \frac{3}{7}, and since no power of ten is divisible by 7, this fraction cannot be written with a power of ten as denominator. To do such operations, a rounding rule needs to be provided. This is cumbersome, because it blows up our formulas, so we define a set \mathbb{E}=\{(r, n) : r \in\mathbb{Q} \wedge n \in\mathbb{N}_0\}. Now the quotient of two elements of \mathbb{D} is a member of \mathbb{E}, and we have the rules

  • (\frac{x}{10^n}, n) / (\frac{y}{10^m}, m) = (\frac{x}{10^n} / \frac{y}{10^m}, p(n, m))
  • (r, n) + (s, m) = (r+s, \max(n, m))
  • (r, n) - (s, m) = (r-s, \max(n, m))
  • (r, n) \cdot (s, m) = (rs, n+m)
  • (r, n) / (s, m) = (\frac{r}{s}, q(n,m))

where p and q somehow try to estimate how precise the result of the division might be. The basic idea is to do the whole calculation that includes the division and round the result to the desired number of decimal places after the point and with the rounding mode desired.
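
The idea of computing exactly in \mathbb{E} and rounding only once at the end can be sketched in Ruby as follows. The array representation, the helper names and especially the stand-in for p are assumptions: the text leaves p unspecified, so the sketch simply takes it as a lambda, and it rounds half away from zero as one possible rounding mode.

```ruby
# D elements as [x, n] meaning x / 10^n; E elements as [r, n] with r Rational.
def d_to_e(d)
  [Rational(d[0], 10**d[1]), d[1]]
end

# Quotient of two D elements: exact Rational, scale from the stand-in for p.
def div(a, b, p_scale)
  ra, na = d_to_e(a)
  rb, nb = d_to_e(b)
  raise ZeroDivisionError, "division by zero" if rb.zero?

  [ra / rb, p_scale.call(na, nb)]
end

# Multiplication in E: exact product, scales add up.
def mul_e(e, f)
  [e[0] * f[0], e[1] + f[1]]
end

# Round an element of E back into D, half away from zero.
def round_to_d(e, places)
  scaled = e[0] * 10**places
  x = (scaled.abs + Rational(1, 2)).floor
  [scaled.negative? ? -x : x, places]
end

p_scale = ->(n, m) { [n, m].max }    # hypothetical choice for p(n, m)
three = [30, 1]                      # 3.0
seven = [70, 1]                      # 7.0
q = div(three, seven, p_scale)       # exactly 3/7, kept as a Rational
p round_to_d(q, 5)                   # prints [42857, 5], i.e. 0.42857
back = mul_e(q, d_to_e(seven))       # exactly 3, no intermediate rounding
p round_to_d(back, 2)                # prints [300, 2], i.e. 3.00
```

Rounding 3/7 first and multiplying by 7.0 afterwards would not give exactly 3; keeping the intermediate result in \mathbb{E} avoids this.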

Now the power is a hard one. Arbitrary powers can of course be defined and are supported, but most of the time, the exponent is actually an integer. These cases can be defined nicely. For exponents m\ge 0 we actually get a result in \mathbb{D} and for negative exponents m < 0 we get results in \mathbb{E}:

  • \bigwedge_{m\ge 0}:(\frac{x}{10^n}, n)^m = (\frac{x^m}{10^{mn}}, mn)
  • \bigwedge_{m < 0}:(\frac{x}{10^n}, n)^m = (1, 0) / (\frac{x}{10^n}, n)^{-m} = (\frac{10^{-mn}}{x^{-m}}, p(0, -mn)) for x \ne 0
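
A minimal Ruby sketch of integer powers on pairs [x, n]: for m\ge 0 the result stays in \mathbb{D}; for m < 0 the sketch computes the exact reciprocal of the positive power as a Rational. The method name int_pow and the scale used for negative exponents (a caller-supplied stand-in for a precision estimate like p) are assumptions.

```ruby
# Integer powers of (x / 10^n, n), with d given as [x, n].
# m >= 0: exact result [x^m, m*n] in D.
# m < 0:  exact value 10^(-mn) / x^(-m) as a Rational (an element of E);
#         its scale comes from the hypothetical p_scale lambda.
def int_pow(d, m, p_scale)
  x, n = d
  return [x**m, m * n] if m >= 0
  raise ZeroDivisionError, "zero to a negative power" if x.zero?

  k = -m
  [Rational(10**(k * n), x**k), p_scale.call(0, k * n)]
end

p_scale = ->(n, m) { [n, m].max }    # hypothetical choice for p(n, m)
p int_pow([12, 1], 3, p_scale)       # 1.2^3 = 1.728, prints [1728, 3]
p int_pow([12, 1], -2, p_scale)      # 1/1.44, prints [(25/36), 2]
```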

For non-integral exponents, the calculation of powers falls back to Ruby’s built-in power and converts the elements of \mathbb{D} and \mathbb{E} involved into rational numbers. These are of limited use, but they are provided, work and can be used when needed. There is a more general power function that has additional parameters for the desired rounding mode and number of digits after the decimal point. While this library goes a long way to achieve decent accuracy and speed, there are certainly input parameters that will result in extremely long calculation times or in results that are much less accurate than claimed. Such examples are „hard“ to find and should not harm the practical usefulness of the library too much. Similar libraries in the Java world like BigDecimal do not even try to calculate powers with arbitrary exponents, and Ruby’s built-in BigDecimal library (which is something slightly different) does have its issues when calculating arbitrary powers.

Rounding functions convert values of a numerical type that can be viewed as a subset of \mathbb{R} into \mathbb{D}. The rounding has to be implemented for each type, and this has been done for \mathbb{D}, \mathbb{E} and Ruby’s built-in numeric types. Complex numbers (\mathbb{C}) are not a subset of \mathbb{R}, so they are handled separately: the real and the imaginary part are rounded individually and combined into a new complex number.

Rounding needs two pieces of information: the desired precision (number of decimal places after the decimal point) and the rounding mode. There are different methods for rounding, but they all follow the same basic rules. A special case is round_to_allowed_remainders, which performs residue class rounding.

There are many rounding modes. Rounding can be towards 0, away from 0, towards infinity or towards negative infinity. This boils down to cutting off all but n digits after the decimal point (or appending zeros) and possibly adjusting the result by one unit in the last place if the cut-off part contained anything but zeros. Other rounding modes choose the nearer of the two adjacent candidates, which requires an extra rule for the case that the value to be rounded lies exactly halfway between them.
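
The rounding modes described above can be sketched in Ruby on exact Rational values: first determine the two neighbouring candidates, then choose by direction or by comparing with the halfway point. The mode names and the method round_rational are hypothetical and do not claim to match the names used by LongDecimal.

```ruby
# Hypothetical mode names mirroring the modes described above.
MODES = %i[up down ceiling floor half_up half_down half_even].freeze

# Round the rational r to `places` digits after the decimal point.
# Returns [x, places], meaning x / 10^places.
def round_rational(r, places, mode)
  raise ArgumentError, "unknown rounding mode #{mode}" unless MODES.include?(mode)

  scaled = Rational(r) * 10**places
  lower = scaled.floor                      # candidate towards -infinity
  upper = scaled.ceil                       # candidate towards +infinity
  return [lower, places] if lower == upper  # only zeros were cut off

  neg = scaled.negative?
  x =
    case mode
    when :floor   then lower
    when :ceiling then upper
    when :down    then neg ? upper : lower  # towards 0
    when :up      then neg ? lower : upper  # away from 0
    else
      diff = scaled - lower                 # distance to lower candidate, in (0, 1)
      half = Rational(1, 2)
      if diff < half then lower
      elsif diff > half then upper
      else
        # Exactly on the border: the tie-breaking rule decides.
        case mode
        when :half_up   then neg ? lower : upper  # away from 0
        when :half_down then neg ? upper : lower  # towards 0
        else lower.even? ? lower : upper          # :half_even
        end
      end
    end
  [x, places]
end

p round_rational(Rational(3, 7), 3, :half_up)    # prints [429, 3]
p round_rational(Rational(-5, 2), 0, :half_even) # prints [-2, 0]
p round_rational(Rational(-5, 2), 0, :half_up)   # prints [-3, 0]
```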

Generalized powers and all functions that return something irrational, like square roots, cube roots, exponential functions, logarithms and, in the future, trigonometric functions, need to be calculated with the required number of digits and a rounding mode. Currently square roots (sqrt) and cube roots (cbrt) are calculated accurately according to these rounding parameters. For the transcendental functions (logarithms, exponential functions, power, trigonometric functions) minor deviations from the mathematically accurate result are still possible. Since the library is expected to be used mostly for the basic operations, this is considered acceptable. To really work with the transcendental functions, it would anyway be better to use interval arithmetic in conjunction with LongDecimal. The necessary guarantee would then be to provide two results: one that is close to, but guaranteed to be less than or equal to, the exact mathematical result, and one that is guaranteed to be greater than or equal to it. Progress in this area is not going to happen very soon, unless someone volunteers to help with it or to sponsor the development.
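
For square roots, exact rounding is possible with integer arithmetic alone, and it yields exactly the kind of guaranteed lower and upper bound mentioned above. A minimal Ruby sketch, assuming Ruby 2.5 or later for Integer.sqrt; the name sqrt_bounds is hypothetical, and this is not necessarily how LongDecimal computes sqrt.

```ruby
# Lower and upper bounds for sqrt(r) with `places` decimal places, returned
# as pairs [x, places] meaning x / 10^places. Only integer square roots are
# used, so lo <= sqrt(r) <= hi holds exactly, and lo == hi iff sqrt is exact.
def sqrt_bounds(r, places)
  r = Rational(r)
  raise Math::DomainError, "negative argument" if r.negative?

  y = r * 10**(2 * places)          # sqrt(r) * 10^places == sqrt(y)
  lo = Integer.sqrt(y.floor)        # floor(sqrt(y)) == floor(sqrt(floor(y)))
  exact = y.denominator == 1 && lo * lo == y
  [[lo, places], [exact ? lo : lo + 1, places]]
end

p sqrt_bounds(2, 5)               # prints [[141421, 5], [141422, 5]]
p sqrt_bounds(Rational(1, 4), 3)  # prints [[500, 3], [500, 3]]
```

The lower bound is the result of rounding towards negative infinity and the upper bound that of rounding towards positive infinity; the other rounding modes can be derived by choosing between these two candidates.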

Also it might be interesting to port this library to other languages, even to Java, because it has become much more sophisticated than Java’s BigDecimal library. Again this is unlikely to happen too soon without any help.

The current priority is to keep this library working with recent Ruby versions and to add the missing trigonometric functions.

Use it as follows:
gem install long-decimal
to install it. Then use it in your code with:
require "long-decimal"

A remark for people who are mathematically inclined: The definition of the natural numbers \mathbb{N} is not totally universal. Sometimes we have \mathbb{N} = \{0, 1, 2, 3, 4,\ldots\} and sometimes we have \mathbb{N} = \{1, 2, 3, 4,\ldots\}. To avoid this, I am using \mathbb{N}_0 = \{0, 1, 2, 3, 4,\ldots\}, even though the index _0 is kind of ugly. I agree with Dijkstra that we should prefer to include the 0 in the natural numbers.
Another remark for mathematically interested readers: If we were defining \mathbb{D}=\{ \frac{x}{10^n} : x \in \mathbb{Z} \wedge n \in \mathbb{N}_0\}, we would actually have a ring. If we now replaced 10 with a prime number p, we would approach the realm of p-adic numbers (\mathbb{Q}_p). This is well worth supporting by a library as well, but it is quite a different story and of course only of interest to a small group who actually knows p-adic numbers and works with them.
