How to get rid of these HTML-entities in Files

It has been written here that HTML-entities (these ä etc) should be avoided with the exception of those that we need due to the HTML-syntax like <, >, & and maybe " and  . They were already mostly obsolete more than 20 years ago, but in those days we still did not automatically use UTF-8 or UTF-16, but often an 8-bit character encoding that could express only up to 256 characters, in reality around 200 due to control characters. At least these 200 could be used. That was enough for web pages in those days and texts in German, French, Russian, Greek, Hebrew, Arabic and many other language could well be written, as long as only one language or a few similar languages were used. For the rare occasions that required some characters that were not in this character set, it was an option to rely on these HTML-entities. Or for typing HTML-pages on an US-keyboard without any good tool support.

But now Unicode has been around for more than 25 years and more than 90% of the web pages use UTF-8.

Now some people think that these HTML-entities are kind of necessary or at least „safer“ and I see people still writing HTML-code with them in these days. Or tools by relatively well known companies, that produced such output not so long ago… It is a good thing to have some courage and to change something like this to readable and natural format. Or more generally to try out if a simpler or better solution works. Reasonable courage is good for this, too much of something good can go bad, as so often…

So, please teach your collegues not to use these ugly HTML-entities, where UTF-8-characters are the better option.

And here is a perl script that converts the HTML-entities with the exceptions mentioned above to UTF-8. In the project conversion-utils some more such scripts might be added. The script is a bit too long to be pasted inline in a code block, so it is better to find the current version on github.

Then you can do something like this:

git commit
for file in *.html ; do
echo $file
mv $file ${file}~entities~
html2utf8 < ${file}~entities~ > $file
echo /$file
done
git diff

to convert all files in a directory. I assume that you are using Linux or at least have bash like for example in cygwin.
There are other tools to do the same thing, I am sure. Just use anything that works for you to get away from this unreadable crap.

Share Button

Devoxx UA and Devoxx BE 2019

In 2019 I visited Devoxx UA in Kiev and Devoxx BE in Antwerp.
Traveling was actually a little story by itself, so for now we can just assume that I magically was at the locations of DevoxxUA and DevoxxBE.

In Kiew I attended the following talks:

On Wednesday I attended the following talks in Antwerp:

On Thursday I attended the following talks in Antwerp:

On Friday I attended the following talks in Antwerp:

That’s it…
As always, a lot of these topics deserve an article in this blog. And a lot of video recordings from the conference are worth viewing.

Links

Share Button

Travelling to Devoxx UA and Devoxx BE 2019

Travelling to this years Devoxx conferences is worth its own short article, even though it is not very IT oriented material…

On the last weekend in October I brought a bicycle to a place near Frankfurt, let’s call it FrankfurtE. I took the train to Karlruhe and then back from FrankfurtE, but cycled from Karlsruhe to FrankfurtE.

On Wednesday evening 2019-11-30 I took a night train from Basel to Hannover and then a day train to some place in North-Rhine-Westphalia for private reasons, then in the evening a flight from Düsseldorf to Vienna and from there to Kiev. There was a problem with the plane, so we returned to Vienna. Austrian Airlines did a good job on rebooking, because they told us that they would find different alternative connections for all of us and send them to our phone, if possible. So there was no need to stand in a line for hours. Plus a decent hotel was booked which was next to the terminal. I had to translate all this information to Russian, because many of the passengers were only comfortable with Ukrainian and Russian languages and the guys in the airport only with German and English.

Then on the 2019-11-31 I flew back from Vienna to Frankfurt and then finally to Kiev, where I could visit the second half of the first day of Devoxx UA and the second day.

On Sunday 2019-11-03 I flew from Kiev to Frankfurt, picked my bicycle in FrankfurtE and took a train to Trier, where I stayed for the night.

On Monday 2019-11-04 I cycled to Marche-en-Famenne.

On Tuesday 2019-11-05 I cycled to Antwerp, where I participated Devoxx Belgium.

On Friday 2019-11-08 I cycled to Lanaken.



On Saturday 2019-11-09 to Bastogne.

And on Sunday 2019-11-10 to Luxemburg, from where I took the train home to Switzerland.

Just to give an idea, it is absolutely possible to use a bicycle as a means of transport on a business trip, but it has to make sense by not consuming more than one or two working days plus weekends, so it is basically necessary to go relatively long distances of 150 to 200 km on a full day and not spend more time than necessary on breaks. And most of the time it is the right choice to use the bigger highways, at least the biggest that are not forbidden to use, because beautiful quite scenic route usually are longer and would be too time consuming. It is not a vacation, but it remains a business trip. Then, on the other hand, this is not such a bad idea, because it really gives some time for thinking about some of the more interesting talks while cycling.

Share Button

Company „Skillsmatter“ stops operations

The company Skillsmatter in London has been put „under administration“ and basically stopped its operations. The web site seems to suggest, that everything is still ok, but that is not the case and I have heard so from several sources. The owner Wendy Devolder writes on Twitter and on Linkedin. Or here are some more news from cbronline or from theregister. The adminstrator is Resolve. They had put a deadline on 2019-11-05 for potential buyers and nothing indicates that such a buyer could be found.

There are some hopes expressed, that either 10’000 people will donate 250 GPB each or that someone buys the company and keeps it afloat. Reasonably it is probably not going to happen.

Now it is hard to obtain further reliable information. Have the employees already been layed off? Have all conferences been cancelled, for example Clojure Exchange (ClojureX) and Scala Exchange (ScalaX)?

The websites mention nothing about it, but simply the fact that there is nothing mentioned indicates that the employees, who could update the site, are gone and that the conferences will probably not take place. Otherwise I would expect an update on the site mentioning that it is taking place in spite of the situation. In case of Clojure Exchange I have been informed by other participants that Clojure Exchange has been canceled and that there will probably be a „community conference“ instead. Being a speaker, I volunteered to perform my talk on this community conference instead.

In case of Scala Exchange there was a strange story. A keynote speaker, John de Goes, was „uninvited“ because of „inclusiveness“. As a result, he decided to create a competing conference, Functional Scala, at exactly the same time as Scala Exchange and also in London. Some speakers have reportedly decided to speak at Functional Scala instead of Scala Exchange and speakers were encouraged to do so. In the end this might come out as a good thing, because Functional Scala will probably take place and might be an option for those who have already booked their visit to Scala Exchange.

So what does all of this mean? If we are heading for bankruptcy of Skillsmatter and if the conferences (Clojure X for sure, Scala X probably) are canceled, we as speakers or simply visitors are entitled to refund for our ticket or our non refundable travel expenses as speakers to the extent that Skillsmatter would have covered them. But reasonably there will not be enough money left for this. A company can go bankrupt and still have funds that is hard to access, but in practice banks will help out if these funds can be documented. So in reality bankruptcy usually means that there are many debts and little money already. Now the salaries of the employees get the highest priority. When they have been paid, other open payments can be covered, according to the rules that apply in the country. Possibly the price for the ticket, that has already been paid, is simply lost. Possibly travel expenses are lost if they cannot be redirected to another event.

If you like to Donate 250 GBP and 10’000 people do so too, the company could continue. I do not think that this is going to happen.

I will keep you informed if I learn more about the issue that is interesting to potential conference visitors and speakers of events organized by skillsmatter.

Update 2019-11-12: I got in contact with the administrators. They did not want to confirm or deny that the conferences scheduled in December would take place. They just do not know, but it seems to be depending on finding a buyer. If a magical buyer appears and decides to reactivate the events, they might take place. Meanwhile all web pages of skillsmatter show a text that the company is „under administration“, so I guess each day it is getting less likely that there will be anything a buyer can reactivate. I know for sure that at least some employees have already been asked to leave.

Now the good news: The replacements for Scala Exchange and Clojure Exchange are already in place, meaning a conference about the same programming language at the same date and also in London. So if you have booked your hotel and your trip to London already, you might want to check them out:

Update 2019-12-09:
Scala Exchange is not going to happen. See web page.
And since so much time has passed, it is becoming unlikely that a buyer turns up, so the company will be gone.

Update 2020-02-12:
The company found a buyer and will start working again. (see comment)

Share Button

Checked Exceptions in Java

In Java it is possible to declare a method with a „throws“-clause. For certain exceptions, that are not extending „RuntimeException“ or „Error“, this is actually required.

What looked like a good idea 25 years ago has proven to be a dead end. I do not know of any other major programming language that opts for declaring exceptions in this way. Slightly newer frameworks extend all their exceptions from RuntimeException, thus avoiding the need to declare them. Even in relatively early Java there was a weird way of working with exceptions in EJB, when it was required to write an interface and an implementation for the EJB. But it was strongly discouraged to let the implementation implement the interface, because it threw different exceptions. It was not the only weird thing about early EJB, of course. But without checked exceptions it would at least have been possible to let the implementation implement its interface.

We are now able to use Java 13 and as of Java 8 lambdas were introduced. With the introduction of lambdas the declared exceptions became especially painful and for this reason even Oracle has created twins for some essential exceptions that derive from RuntimeException, especially IOException.

We should face it: The throws clause has turned out to be a mistake and we should avoid this mistake by just using exceptions that do not have to be declared, at least in our APIs. It is not the only mistake, see Criticism of Java. Some of my other favorites are the lack of operator overloading for numeric types, the weird concept of Serializable and the lack of natively immutable collections and the lack of a convenient way to write some collections as code. But these issues are being worked on and we will eventually see some progress.

Links

Share Button

How to recover the Borrow Bit

In a similar way as the carry bit for addition it is possible to recover the borrow bit for substraction, just based on the highest bits of three numbers that we deal with during the operation.

With this program, a subtraction operation of an 8-bit CPU can be simulated exhaustively


#!/usr/bin/perl

my $x, $x, $bi;

my %tab = ();

for ($bi = 0; $bi <= 1; $bi++) {     for ($x = 0; $x < 256; $x++) {         for ($y = 0; $y < 256; $y++) {             my $zz = $x - $y - $bi;             my $b  = $zz < 0 ? 1 : 0;             my $c  = 1 - $b;             my $z = ($zz + 256) & 0xff;             my $xs = $x >> 7;
            my $ys = $y >> 7;
            my $zs = $z >> 7;
            my $key = "$xs:$ys:$zs";
            $tab{$key} //= $b;
            my $bb = $tab{$key};
            if ($bb != $b) {
                print "b=$b bb=$bb c=$c xs=$xs ys=$ys zs=$zs x=$x y=$y z=$z zz=$zz bi=$bi\n";
            }
        }
    }
}

for my $key (sort keys %tab) {
    $key =~ m/(\d+):(\d+):(\d+)/;
    $xs=$1;
    $ys=$2;
    $zs=$3;
    $b =$tab{$key};
    $c = 1 - $b;
    $bb = $xs & $ys & $zs | !$xs & ($ys | $zs);
    print "b=$b bb=$bb c=$c xs=$xs ys=$ys zs=$zs\n";
}

This gives an idea, what is happening. But in real life, probably a 64bit-CPU is used, but the concepts would work with longer or shorter CPU words the same way.

So we subtract two unsigned 64-bit integers x and y and an incoming borrow bit i\in\{0, 1\} to a result

    \[z\equiv x-y-i \mod 2^{64}\]

with

    \[0 \le z < 2^{64}\]

using the typical „long long“ of C. We assume that

    \[x=2^{63}x_h+x_l\]

where

    \[x_h \in \{0,1\}\]

and

    \[0 \le x_l < 2^{63}.\]

In the same way we assume y=2^{63}y_h + y_l and z=2^{63}z_h + z_l with the same kind of conditions for x_h, y_h, z_h or x_l, y_l, z_l, respectively.

Now we have

    \[-2^{63} \le  x_l-y_l-i \le 2^{63}-1\]

and we can see that

    \[x_l - y_l - i = z_l-2^{63}u\]

for some

    \[u\in \{0,1\}\]

.
And we have

    \[x-y-i = z-2^{64}b\]

where

    \[b\in\{0,1\}\]

is the borrow bit.
When combining we get

    \[2^{63}x_h - 2^{63}y_h + z_l - 2^{63}u = 2^{63}x_h + x_l -2^{63}y_h-x_l -i = x-y-i = z - 2^{64}b = 2^{63}z_h + z_l -2^{64}b\]

When looking just at the highest visible bit and the borrow bit, this boils down to

    \[z_h-2b = x_h - y_h - u\]

This leaves us with eight cases to observe for the combination of x_h, y_h and u:

x_hy_huz_hb
00000
00111
01011
01101
10010
10100
11000
11111

Or we can check all eight cases and find that we always have

    \[b = x_h \wedge y_h \wedge z_h \vee \neg x_h \wedge (y_h \vee z_h)\]

So the result does not depend on u anymore, allowing to calculate the borrow bit by temporarily casting x, y and z to (signed long long) and using their sign.
We can express this as „use y_h \wedge z_h if x_h=1 and use y_h \vee z_h if x_h = 0„.

The incoming borrow bit i does not change this, it just allows for x_l - y_l - d \ge -2^{64}, which is sufficient for making the previous calculations work.

The basic operations add, adc, sub, sbb, mul, xdiv (div is not available) have been implemented in this library for C. Feel free to use it according to the license (GPL). Addition and subtraction could be implemented in a similar way in Java, with the weirdness of declaring signed longs and using them as unsigned. For multiplication and division, native code would be needed, because Java lacks 128bit-integers. So the C-implementation is cleaner.

Share Button

Exceptions to implement Program Logic

Sometimes it is conveniant to use exceptions for implementing the regular program logic.

Assuming we want to find some data and then process them. When no data is found, this step can be skipped, because there is nothing to do. So we could program something like this:


public Data readData(Param param) {
    Data data = db.read(param);
    if (data.isEmpty()) {
        throw new NotFoundException("nothing found");
    }
    return data;
}

public ProcessedData doWork(Param param) {
    try {
        Data input = readData(param);
        ....
        ....
        ....
        ProcessedData result = ....
        return result;
    } catch (NotFoundException nfex) {
        return ProcessedData.empty();
    }
}

And some other exceptions could also be handled in a similar way.

Of course some people say, that this is not good and an abuse of exceptions. But sometimes it is tempting.

So is this bad? And if so, why? Let’s find out.

This is some kind of weird obfuscation of the control flow, because throwing and catching of exceptions can be far apart and it can become quite unclear, from where in the stack which exceptions can be thrown. So there are good reasons to recommend using exceptions only for what they are meant for by their name. The Goto has never made it into Java and we are discouraged from using it in many other languages, like C. But languages like Java, C, Perl, Ruby and some others provide quite rich control flow relying neither on goto nor on exceptions by using „return“ anywhere in a function or method or subroutine, leaving loops with „break“ or „last“ or going to the next iteration with „next“ or „continue“. Perl and Java even allow to specify which of nested loops to leave with break or last. These mechanisms are very powerful and there is no urgent need to add exceptions or even gotos just to support the control flow.

Once moving to newer languages like Scala much of this is gone or at least strongly discouraged in a purely functional programming style. This makes programming Scala harder, and comes with benefits that might be worth the extra effort.

But in Java these functional purists have not become very strong yet, so using „break“, „continue“, „return“ etc. is still ok and quite powerful.

In Java there is another very major problem with exceptions. Many, if not most Java programs run in a framework or container like Spring, EJB/JEE, JBoss Fuse, for example. Now a piece of software becomes a software component, that can interact with other components through the framework. And exceptions are noticed by the framework. In many cases they have the effect that an ongoing transaction is marked as „rollback only“. So the whole processing continues normally, and when all the code from the components is finally done, the framework performs a rollback instead of a commit.

As long as exceptions are only used for handling errors or unsual situations, in which cases the rollback is probably the way to go anyway, everything is fine. But if we for example look up something and base the further processing on the outcome of this, then a NotFoundException will result in very counter intuitive behavior.

So the original rule of not abusing exceptions is actually not such a bad idea.

Share Button

How to use $ in Articles using WP QuickLaTeX

I use WP QuickLaTeX by Pavel Holoborodko in some of my articles to include mathematical formulas.
Now it can be an issue that the „$“-sign, that marks the beginning of a formula, is used as dollar sign.

This can be achieved by using [latexpage] in the beginning of the page. Sometimes it is desirable, to apply this only to part of the page. This is achieved using [latexregion].

The following

$ this is a dollar sign
[latexregion]
And this is a full line formular
$$a^2+b^2=c^2$$
And this is an inline formula $a^2+b^2=c^2$. With more stuff...
[/latexregion]
And here dollar signs $$$$ are dollar signs again.

results in:

$ this is a dollar sign

And this is a full line formular

    \[a^2+b^2=c^2\]

And this is an inline formula a^2+b^2=c^2. With more stuff…

And here dollar signs $$$$ are dollar signs again.

And yes, to show the

[latexregion]
...
[/latexregion]

above, I had to actually type

&#91;latexregion&#93;
...
&#91;/latexregion&#93;

Let’s stop the recursion here…

Links

Share Button

Borrow and Carry Bit for Subtraction

Similar to the usage of the carry bit when adding there are mechanisms for subtracting that allow to integrate the result of subtraction of the lower bits into the subtraction of the next higher block of bits, where necessary.

There are two ways to do this, that are trivially equivalent by a simple not operation:

  • borrow bit (also borrow flag)
  • carry bit (also carry flag)

Often CPUs use what is the carry bit for addition and interpret it as borrow bit for subtraction.

Here are the two ways:

Borrow Bit

It is assumed, that the CPU-word is n bits long, so calculations are performed modulo N=2^n. Further it is assumed, that the range 0\ldots N-1 is preferred, so all sign issues are excluded for the moment.

When subtracting two numbers x, y from each other like x-y and y > x, the provided result is

    \[x-y+N \equiv x-y \mod N\]

and the borrow bit is set (b=1), to indicate that the subtraction caused „underflow“, which had to be corrected by added N in order to get into the desired range.

In the „normal“ case where y \le x, the provided result is simply

    \[x-y\]

and the borrow bit is not set (b=0).

The next round of subtraction takes the borrow bit into account and calculates x-y-b, where the condition becomes y+b > x and the result is

    \[x-y-b+N \equiv x-y \mod N\]

or

    \[x-y-b\]

respectively. This is how some of the older readers used to do it in school on paper, but of course with N=10.

Now the typical integer arithmetic of current CPUs uses Two’s complement, which means that -y=(\mathrm{NOT}\; y)+1. Combining this with the previous results in calculating

    \[x-y = x + (\mathrm{NOT}\; y) + 1 - b \mod N\]

At this point some CPU-designers found it more natural, to use the carry bit c=1-b instead of the borrow bit b.

Carry Bit

When subtracting two numbers x, y from each other like x-y and we have y > x, the provided result is

    \[x-y+N \equiv x-y \mod N\]

and the carry bit is not set (c=0), to indicate that the subtraction caused „underflow“, which had to be corrected by added N in order to get into the desired range.

In the „normal“ case where y \le x, the provided result is simply

    \[x-y\]

and the carry bit is set (c=1).

The next round of subtraction takes the borrow bit into account and calculates x-y-1+c, where the condition becomes y+1-c > x and the result is

    \[x-y-1+c+N \equiv x-y \mod N\]

or

    \[x-y-1+c\]

respectively.

Now two’s complement with -y=(\mathrm{NOT}\; y)+1 this can be written as

    \[x-y = x + (\mathrm{NOT}\; y) + 1 - b \mod N\]

or with c=1-b

    \[x-y = x + (\mathrm{NOT}\; y) + c \mod N\]

These two ways are really equivalent and easily transformed into each other. Neither of them provides big advantages, apart from the fact that we have the unnecessary confusion, because it depends on the CPU design, which of the two variants is used.

Recovery of Borrow or Carry bit

The borrow bit is calculated and used during subtractions that are done at assembly language level, but higher languages like C or Java do provide access to this information. It is relatively easy to recover the carry bit in the case of addition based on x, y and x+y \mod N.

This possible as well for the subtraction. Quite easily the comparison between x and y or y+b could be done before the subtraction. This would work, but it is kind of inefficient, because under the hood the comparison is just a subtraction with discarding the result and keeping the flags. So the subtraction is performed twice. This might not be such a bad idea, because a compiler could recognize it or the work of subtracting twice could be neglectable compared to the logic for an „optimized“ recovery, based on some logic expression of certain bits from x, y and x-y \mod N.

Links

Share Button

Poor mans profiling with LOGs

We have professional profiling tools and we should use them.. They give really useful and extensive information.

So why bother about doing „poor man’s profiling“?

About ten years ago, running profilers was kind of constrained to very small examples and computers with really huge memory. We have this huge memory now on every development machine. But we can only run such tests on development and test systems.

Sometimes it is possible to copy data from the productive system to the test system and really simulate what is happening on the productive system. Often this is quite far away from what is really happening on the productive system and we often only know about a problem when it has already occurred.

What we usually do have are log files. Each entry has a time stamp and can somehow be connected to a line of code where it is created. We always know the class or the file or something like that from where the log statement was issued and since there is only a limited number of log statements, they can often be recognized. Sometimes it is a good idea to actually help finding where it was logged. For example could we artificially put the source code line number or some uniq number or a short random string into the source code for each log statement. Then we can find several pieces of information in the log statement:

  • The timestamp (please use ISO-format!!! And at least milliseconds)
  • The source code location (line number, class name, random string, relative position in file, meaningful description like entering/leaving methods etc.)
  • The thread and/or process
  • The payload information of the log

Now log statements analyzed by thread. The first step is to look from which pairs of locations in the source code we observe successive log statements from the same thread, ignoring all logging from other threads and processes. We can also estimate the time that was spent for going from the first to the second location. Aggregating this by basically adding up thes timestamp deltas for each such pair over all threads and the whole log file will already indicate some hotspots where time is spent. Sometimes it is even interesting to look at sequences of three log statements (from the same thread). This was useful, when DB-commits, i.e. one such pair like „begin commit“ to „commit done“ used around 50% of the time, if not more. This was not interesting without knowing the statement that was committed, which was found out by looking at the previous log entry of the same thread just before the commit.

Such analysis can be done with scripts for example in Perl, Ruby or Python. I use Perl for this kind of task. Maybe log analysis tools provide this out of the box or with a little bit of tweaking. A concept about what entries in log files look like that make recognizing the location of the program that created the log entry, are very helpful. Sometimes we actually want to see, with which kins of data the problem occurred, which means looking also into the payload.

Share Button