Cisactions

New research has analyzed the concept of transactions from a very theoretical point of view. An interesting result of this research is the concept of cisactions, which are in some way the opposite of transactions. A duality between cisactions and transactions has been proven, meaning that in principle every application that is based on transactions can also be written with cisactions. But there are some challenges:

  1. Currently there is no database that supports cisactions.
  2. Cisactions are even harder to understand than transactions.
  3. The whole program has to be written again in a totally different way.
  4. Even the business logic has to be redefined in a totally different way, even though eventually the same results can be achieved.
  5. The paradigm change from transactions to cisactions is much harder than the change from object-oriented to functional programming.
  6. Mixing cisactions and transactions in the same program is almost impossible.

But we will see applications that use cisactions properly in the future. And they will perform about 1000 times faster than programs using transactions.

The interesting questions are:

When will the first cisactional databases become available? How will they work? Which programming languages will support cisactions? Or will we rather have to invent totally new programming languages for proper support of cisactions?

A lot of questions still have to be answered, and this is still theoretical research. But it is so promising for high performance usage that we should absolutely expect it to become an important way to solve real high performance development tasks.


Unicode and C

It is common practice in C to use arrays of char as strings. The byte 0 is used as the end marker.

The whole thing was created like that in the 1970s, and at that time it was kind of cool to get away with one less language feature and to express it in terms of others instead. And people did not think enough about the necessity of expressing more than ISO 646 IRV (commonly called ASCII) as string content.

This extended out of the box to 8-bit character sets like ISO 8859-1 or KOI-8, which are identical to ISO 646 in the lower 128 characters and contain an extension in the upper 128 characters. But fortunately we have moved ahead and now have Unicode with its encodings UTF-8, UTF-16 and UTF-32.

How can we deal with this in C?

UTF-8 just works out of the box, because the byte 0 is only used to encode the code point U+0000. So the null termination can be kept as it is and a lot of functionality remains valid. Some issues arise, though: finding the logical length of a string (not its memory consumption) or finding the nth code point (not the nth byte) requires UTF-8 logic to be applied and the string to be parsed from the beginning up to the desired position, or the use of an indexing facility. So a lot of non-trivial string functionality of the standard library is just not as easy as people thought it would be in the 1970s and consequently does not work as needed. Libraries for better UTF-8 support in C can be found, leaving the „native“ C strings with UTF-8 content only for usage in interfaces that require them. I have not yet explored such libraries, but it would be interesting to find out how powerful and useful they are.
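
A minimal sketch, assuming valid UTF-8 input (my own illustration, not taken from any particular library): the logical length of a string can be computed by counting only the bytes that start a code point, i.e. by skipping the continuation bytes of the form 10xxxxxx.

    #include <stddef.h>

    /* Count code points, not bytes, in a null-terminated UTF-8 string.
       Assumes the input is valid UTF-8; a real library would also validate. */
    size_t utf8_length(const char *s) {
        size_t count = 0;
        for (; *s != '\0'; s++) {
            if (((unsigned char) *s & 0xC0) != 0x80) { /* not a continuation byte */
                count++;
            }
        }
        return count;
    }

Finding the nth code point works along the same lines, so both operations are O(n), instead of the O(1) that indexing into a char array would suggest.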

At the time when Unicode came out, it seemed to be sufficient to have 16 bits per character instead of 8. Java was built on this assumption. C added wchar_t to allow for this and just required it to be „long enough“. So Linux uses 32 bits and MS-Windows 16 bits. This is not too bad, because programming in C for MS-Windows and for Linux is quite different anyway, unless we abstract the differences into a library, which would then also include a common string definition and string handling functionality. While the Linux wchar_t is sufficient, it wastes a lot of memory, which is often undesirable if we go to the extra effort of programming in C in order to gain performance. The Windows wchar_t is „kind of sufficient“, as are Java's strings, because we can really do a lot by assuming that Unicode is only 16 bits, or by using UTF-16 and ignoring its complexities, which are in principle the same as for UTF-8, but can be ignored with fewer disadvantages most of the time. The good news is that wchar_t is well supported by standard library functions.
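
A small example of what this looks like in practice; the sizes printed are the typical ones, not guaranteed by the standard:

    #include <stdio.h>
    #include <wchar.h>

    int main(void) {
        wchar_t s[] = L"Grüße";  /* wide string literal */
        /* wcslen counts wchar_t units, not bytes: typically 4 bytes per unit
           on Linux and 2 on MS-Windows, where code points outside the BMP
           still need two units (surrogates) */
        printf("%zu units of %zu bytes\n", wcslen(s), sizeof(wchar_t));
        return 0;
    }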

Another way is to use char16_t and char32_t, which have a clearly defined size, but much less library support.
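
A minimal sketch using these C11 types, showing only their fixed sizes; for actual string handling we are mostly on our own:

    #include <stdio.h>
    #include <uchar.h>  /* C11: char16_t and char32_t */

    int main(void) {
        char16_t s16[] = u"Grüße";  /* UTF-16 code units */
        char32_t s32[] = U"Grüße";  /* UTF-32 code units */
        /* at least 16 and 32 bits per unit respectively, but no
           wcslen-like standard library support for these types */
        printf("%zu and %zu bytes per unit\n", sizeof s16[0], sizeof s32[0]);
        return 0;
    }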

Probably these facilities are sufficient for software whose string handling is relatively trivial. For more ambitious string handling in terms of functionality and performance, it will be necessary to find third party libraries or to write them.


Article 13

The European Parliament is considering passing an „Article 13“ that would interfere with internet usage and freedom of speech on the internet under the pretence of fighting copyright violations. It is important to resist.


Flashsort in Scala

There is now also an implementation of Flashsort in Scala.

In order to meet the requirement of sorting part of an array, which is needed as a step of flashsort, a heapsort implementation in Scala that can be constrained to a part of an array has been included as well. Heapsort was chosen because it sorts in place and has a guaranteed performance of O(n \log(n)). Mergesort or quicksort would have been reasonable choices as well. Some implementations even use insertion sort for this step, because the sections are small.
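
To illustrate the idea of constraining the sort to a section, here is a sketch in C (not the actual Scala implementation), sorting a[from..to-1] in place:

    #include <stddef.h>

    /* restore the max-heap property below `root` within a[from..end-1] */
    static void sift_down(long a[], size_t from, size_t root, size_t end) {
        while (2 * (root - from) + 1 < end - from) {
            size_t child = from + 2 * (root - from) + 1;
            if (child + 1 < end && a[child] < a[child + 1]) {
                child++;  /* pick the larger child */
            }
            if (a[root] >= a[child]) {
                return;
            }
            long tmp = a[root]; a[root] = a[child]; a[child] = tmp;
            root = child;
        }
    }

    /* heapsort constrained to the section a[from..to-1], O(n log n), in place */
    void heapsort_section(long a[], size_t from, size_t to) {
        if (to - from < 2) return;
        for (size_t i = from + (to - from) / 2; i-- > from; ) {
            sift_down(a, from, i, to);  /* build the heap */
        }
        for (size_t end = to - 1; end > from; end--) {
            long tmp = a[from]; a[from] = a[end]; a[end] = tmp;
            sift_down(a, from, from, end);  /* re-heapify the remainder */
        }
    }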


Flashsort in Ruby


There is now a simple implementation of Flashsort in Ruby, complementing the implementation in C that was already provided. The C implementation is typically faster than the libc function qsort, but this always depends on the data and on how well the metric function has been written, which Flashsort needs on top of the comparison function. You can think of this metric function as some kind of monotonic hash function. So we have

    \[\bigwedge_{a,b: a\le b} m(a) \le m(b) \]

This additionally needed function or method is not readily available, apart from numerical values, so we really have to invest some time into writing it. This makes the use of Flashsort a bit harder. A good metric function is crucial for good performance, but for typical text files quite trivial implementations already outperform classical O(n \log n) algorithms like heapsort, quicksort and mergesort for larger amounts of data.
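
For illustration, a very simple metric for null-terminated byte strings could interpret the first few bytes as a base-256 number (an assumption about how such a function might look, not the actual implementation):

    /* Monotonic metric for byte strings: if strcmp(a, b) <= 0 then
       string_metric(a) <= string_metric(b). Strings sharing the first
       four bytes map to the same value, which monotonicity allows. */
    double string_metric(const char *s) {
        double value = 0.0;
        for (int i = 0; i < 4; i++) {
            value *= 256.0;
            if (*s != '\0') {
                value += (double) (unsigned char) *s++;
            }
        }
        return value;
    }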

This blog article shows other sorting algorithms for Ruby.


Not all projects are on ideal paths II (Tim Finnerty)

This is another story of a project that did not go as well as it could have while I was there. From unsuccessful projects we can learn a lot, so there will be more stories like this once in a while. The first one was about Tom Rocket.

Tim Finnerty

It was the time when all the cool companies tried to introduce Java. And some of the new Java projects failed, causing the companies to go back to C, which in turn scared other companies away from taking this step. But some companies did not get scared by this. They embraced the new Java fashion at a time when it still was not clear whether or not this was a good idea. What could possibly go wrong?

Well, in those days the experienced guys did not want to move to Java. It was slow, it was unreliable, not mature… Maybe for applets, maybe more generally for GUIs to get rid of VisualBasic, but Java on the server was considered a bad joke. For the server, real people used real C, of course on Unix or maybe Linux, which was not really such a bad idea in those days. But there were the young people, or the ones who had stayed young. They often had just finished their education and firmly believed that by just using technology „xyz“ everything would become great. Xyz can be a development method (spiral model in those days, agile today), an architecture (microservices), a paradigm („OO“, „functional“), a framework („enterprise edition“), a tool or a programming language (yeah, Java). Often this first enthusiasm ends in a disaster: the money has been spent, the developers are leaving and the software is further away from being useful than anybody would like to admit. In lucky cases there is still some money left to do it right. Maybe even to do it right in Java.

That is where we came in. I do not really know the earlier history, but according to the management it was a total disaster. Now Tim Finnerty (real name known to the author) was the new technical lead, team architect or whatever this role is called. He embraced the new technology, but promised not to overstretch it. A bit of old school. Sounds good, because it is exactly what managers want to hear. No more risky experiments, but this time it needs to become a success.

So Tim Finnerty defined how we had to work. He knew Java, he knew databases and especially Oracle, he knew the web, he even knew Perl. And he knew OO. Better than anybody else, so we did the real thing. Great head start. And everybody had to program according to Tim's rules.

Of course we were using Java enterprise edition. That meant that we were programming against some Weblogic application server, which was hard to install, hard to run, required a few minutes of startup time for each minor change of the software that we were writing and forced us into a very archaic and primitive programming model. But that was cool and it was the future, which unfortunately proved to be true, even though it has at least become usable by now. So far nothing to blame on Tim, because it is kind of the stack that everybody used.

Now to the OO and the database. Each database table represented a „Business Object“, with a name like XyzBO. So most of the time we wrote a class XyzBO plus a few more to fulfill the greed and need for boilerplate code of the old J2EE world. XyzBO was an enterprise java bean, a stateless session bean to be accurate. Which meant that we wrote methods of this EJB, which were basically procedures of the pre-OO world. But within them we could of course use the whole OO toolset. Which we did. So the class to represent any data from the database was actually called Data. It was a minor subset of the standard Perl data structures, which meant that Data was a list of hash maps, which could behave just like a hash map if it had only one entry. Database queries returned Data, or of course null if nothing was found. Nobody would ever want to use an empty collection. Pretty much the opposite of what we are now doing the hard way by introducing Optional or Option to avoid the null. But it was easy to just write
if (data == null) { data = new Data(); }
at least for the ones who knew this trick.
So data resembled the content of the database or of the query result, with the column names as keys and the values as objects. Working with these was really easy: just know the attribute name exactly, get the value from data, see if it is null, and if not, cast it to the real type, et voilà…

The database was designed according to Tim's advice; he had to review every table. It was mandatory to have, as unique key and as first column, a string of around 700 characters, which was called HANDLE. Each table had a business primary key, which always consisted of several columns. Because the system allowed multiple instances of the software to run on the same database, there was always one column called „SITE“ in this logical primary key. But there were no primary keys defined in the database; the unique HANDLE was enough. It contained the name of the BO, like XyzBO, followed by a colon, followed by the concatenation of all the logical primary key columns, separated by commas. Dates had to be converted to strings using a local US format, not ISO, of course. All foreign keys were defined using HANDLE. In the end more than half of the DB space was wasted for this stupidity. But each single handle value started with an XyzBO:, to remind us that we were programming OO.

And now booleans. It was forbidden to use the boolean type of Java. All booleans were strings containing the words „true“ and „false“. This went like that all the way to the web interface.

At that time web frameworks did not yet exist or were at least unknown. So the way to go was to write JSPs, which contained the kind-of-dynamic web pages, and to write servlets to control the flow and access the EJBs. But according to Tim it was too hard to learn servlets as well, so it was forbidden to use them; instead a connection JSP had to be written, which did not display anything, but only contained a <% in the beginning, a %> in the end and the code in between.

These were a lot of small and larger stupidities that were forced on the team. Most people were new to Java and to OO and did not even realize that anything was wrong, apart from the fact that it was kind of hard to get stuff done. Some stupidities were due to the fact that the early J2EE really sucked, but mostly it was Tim, who forced everyone down to his level. This story happened a long time ago. Tim has already retired. I would say he is one of the guys who retire as a Junior.

There is (almost) nothing wrong with stupidity. And there is (almost) nothing wrong with arrogance. But the combination really sucks, especially if it is taken seriously by the manager or has to be taken seriously by the team.


Google+ will be shut down

Google is terminating its Google+ service, at least the consumer version.
For this reason I have removed the „share on google+“ buttons.


WordPress-Plugin destabilizes WordPress Installation – How to Fix

I had a plugin installed and activated in WordPress. Then I got an error message like „Fatal error: Undefined class constant ‚plugin_version‘ in /www/htdocs/w00fb338/wp-content/plugins/share-on-diaspora/admin.php on line 17“ whenever I went to any admin or authoring functionality of my blog. Just reading seemed to work fine. So there was no way to go to the plugins section of my dashboard to deactivate or uninstall that plugin, because there was no way to get there.

Since this blog is hosted on a typical web hosting service, there is no ssh access, which would have been helpful, because there are command line interfaces for managing a WordPress installation. What remained was ftps to access the file system. Then the plugins could be found in the directory
wp-content/plugins
and the plugin causing the problems could be deleted using file operations.

I think we agree that this is kind of a hack, but it worked. The WordPress installation detected that the plugin was missing and deactivated it. Then it was usable again. I assume that this applies to any plugin. It should of course be used with care.


LongDecimal

Disclaimer: This article is an occasion where you might need some of the presumably useless mathematics that you might have learned in school and university. If this bothers you, maybe you should wait for the next article, in about two weeks' time.

LongDecimal is a library that I have provided for Ruby. It is available as a Ruby gem. It was originally intended to provide something like Java's BigDecimal. Ruby has a BigDecimal, but it is not really the same. For writing finance applications such a class is useful, so I wrote one that covers what Java's BigDecimal has. It ended up having a lot more, but we will get to that later.

So the general idea is that we do math with a subset of the rational numbers (\mathbb{Q}), namely \mathbb{D} = \{ \frac{x}{10^n} : x \in \mathbb{Z} \wedge n \in \mathbb{N}_0\}. This is not quite the truth, because the n actually carries information that we care about, so we actually define

    \[\mathbb{D} = \{ (\frac{x}{10^n}, n) : x \in \mathbb{Z} \wedge n \in \mathbb{N}_0\}.\]

So we actually want to allow the numerator x to be a multiple of 10, and we use the n to express the precision, i.e. how many digits after the decimal point are explicitly part of our number. Having more digits after the decimal point expresses more precision.

Now we try to use mathematical operations +, - and \cdot on \mathbb{D}. It turns out that we have three different cases. The ring operations can be defined without problems, even though \mathbb{D} is not quite a ring, as we will see. But it is good enough for most purposes.

  • (\frac{x}{10^n}, n) + (\frac{y}{10^m}, m) = (\frac{x}{10^n} + \frac{y}{10^m}, \max(n, m))
  • (\frac{x}{10^n}, n) - (\frac{y}{10^m}, m) = (\frac{x}{10^n} - \frac{y}{10^m}, \max(n, m))
  • (\frac{x}{10^n}, n) \cdot (\frac{y}{10^m}, m) = (\frac{xy}{10^{n+m}}, m+n)

Addition and subtraction actually lose information if n\ne m, because we might have an input with lower precision and in the end pretend to have a result of the higher precision. But not losing numerical information is considered more important, and implicit rounding should be avoided at all costs, at least for the basic operations.
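
For example, adding a value with one digit after the decimal point to one with two digits yields a result claiming two digits, so 2.30 + 0.1 = 2.40:

    \[(\frac{230}{10^2}, 2) + (\frac{1}{10^1}, 1) = (\frac{240}{10^2}, 2)\]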

\mathbb{D} is not a ring, but it is a semiring. The zero is not unique; we seem to have many zeros (0, n). That is not the problem, because only (0, 0) would act as an additive neutral element anyway. What we lack are additive inverse elements: if we have an element (x, n) with n>0, there is no element (y, m) such that (x,n)+(y,m)=(x+y, \max(n,m)) = (0, 0). The distributivity required for a semiring can be seen easily:

  • ((\frac{x}{10^n}, n) + (\frac{y}{10^m}, m))\cdot (\frac{z}{10^l}, l) = ((\frac{x}{10^n}+\frac{y}{10^m})\cdot\frac{z}{10^l}, l+\max(m,n))
  • (\frac{x}{10^n}, n)\cdot (\frac{z}{10^l}, l) + (\frac{y}{10^m}, m)\cdot (\frac{z}{10^l}, l) = (\frac{x}{10^n}\cdot\frac{z}{10^l}+\frac{y}{10^m}\cdot\frac{z}{10^l}, \max(l+m,l+n))

But since we are doing computer programming and not math, and only use math as a tool to help us, it is kind of OK that it is only a semiring and not a ring, as long as we know it.

Division is a special case, because it is not always possible to express the exact numerical value of the quotient in \mathbb{D}, for example 3.0/7.0 = \frac{3}{7}, where the denominator is not a power of ten. To do such operations, a rule on how to round needs to be provided. This is cumbersome, because it blows up our formulas, so we define a set \mathbb{E}=\{(r, n) : r \in\mathbb{Q} \wedge n \in\mathbb{N}_0\}. Now the quotient of two elements of \mathbb{D} is a member of \mathbb{E}. And we have the rules

  • (\frac{x}{10^n}, n) / (\frac{y}{10^m}, m) = (\frac{x}{10^n} / \frac{y}{10^m}, p(n, m))
  • (r, n) + (s, m) = (r+s, \max(n, m))
  • (r, n) - (s, m) = (r-s, \max(n, m))
  • (r, n) \cdot (s, m) = (rs, n+m)
  • (r, n) / (s, m) = (\frac{r}{s}, q(n,m))

where p and q somehow try to estimate how precise the result of the division might be. The basic idea is to do the whole calculation that includes the division, and then round the result to the desired number of digits after the decimal point with the desired rounding mode.
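
A worked example, rounding to four digits after the point in half-up mode (just to illustrate the idea; the precision estimate p(1,1) is left symbolic):

    \[(\frac{30}{10^1}, 1) / (\frac{70}{10^1}, 1) = (\frac{3}{7}, p(1,1)), \quad \frac{3}{7} = 0.428571\ldots \approx (\frac{4286}{10^4}, 4)\]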

Now the power is a hard one. Arbitrary powers can of course be defined and are supported, but most of the time the exponent is actually an integer. These cases can be defined nicely. For exponents m\ge 0 we actually get a result in \mathbb{D}, and for negative exponents m < 0 we get results in \mathbb{E}:

  • \bigwedge_{m\ge 0}: (\frac{x}{10^n}, n)^m = (\frac{x^m}{10^{mn}}, mn)
  • \bigwedge_{m < 0}: (\frac{x}{10^n}, n)^m = (\left(\frac{x}{10^n}\right)^m, q(n, m))

For non-integral exponents, the calculation of powers falls back to Ruby's built-in power function and transforms the elements of \mathbb{D} and \mathbb{E} involved into rational numbers. These results are of limited use, but they are provided, work and can be used when needed. There is a more general power function that has additional parameters for the desired rounding mode and number of digits after the decimal point. While this library goes to great lengths to achieve decent accuracy and speed, there are certainly input parameters that will result in extremely long calculation times or in results that are much less accurate than claimed. Such examples are „hard“ to find and should not harm the practical usefulness of the library too much. Similar libraries in the Java world, like BigDecimal, do not even try to calculate powers with arbitrary exponents, and Ruby's built-in BigDecimal (which is something slightly different) does have its issues when calculating arbitrary powers.

Rounding functions are there to convert a numerical type that can at least be viewed as a subset of \mathbb{R} to \mathbb{D}. The actual rounding has to be implemented per type, and it has been done for \mathbb{D}, \mathbb{E} and the built-in types of Ruby except for Complex (\mathbb{C}). For complex numbers, the real and the imaginary part are rounded separately and stuffed into a new complex number.

Rounding needs two pieces of information: the desired precision (number of digits after the decimal point) and the rounding mode. There are different methods for rounding, but they all follow the same basic rules. A special case is round_to_allowed_remainders, which does residue class rounding.

There are many rounding modes. Rounding can be towards 0, away from 0, towards infinity or towards negative infinity. This boils down to cutting off all but n digits (or adding zeros) and possibly adjusting the result by one if the cut-off part contained anything but zeros. Other rounding modes take a mean between the two adjacent result candidates and decide by that which one to take, requiring an extra rule for the case that the value to be rounded happens to be exactly on the border.
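
For example, rounding 0.125 to two digits after the point gives different results depending on the mode (the mode names here are descriptive, not necessarily the exact constants of the library):

    \[0.125 \mapsto 0.12 \text{ (towards zero, half down, half even)}, \quad 0.125 \mapsto 0.13 \text{ (away from zero, half up)}\]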

Generalized powers and all functions that return something irrational, like square roots, cubic roots, exponential functions, logarithms and in the future also trigonometric functions, need to be calculated with the number of digits required and a rounding mode. Currently square roots (sqrt) and cube roots (cbrt) are calculated accurately according to these rounding parameters. For the transcendental functions (logarithms, exponential functions, power, trigonometric functions), minor deviations from the mathematically accurate result are still possible. Since the major usage of the library is expected to involve the basic operations only, this is considered acceptable. To really work with the transcendental functions, using interval arithmetic in conjunction with LongDecimal would be a better way anyway, so the guarantee to provide would be a result that is close, but guaranteed to be lower or equal than the true mathematical result, and one that is guaranteed to be greater or equal. Progress in this area is not going to happen very soon, unless someone volunteers to help with this or to sponsor the development.

Also it might be interesting to port this library to other languages, even to Java, because it has become much more sophisticated than Java’s BigDecimal library. Again this is unlikely to happen too soon without any help.

The current priority is to keep this library working with recent Ruby versions and to add the missing trigonometric functions.

Install it with
gem install long-decimal
and then use it in your code with
require "long-decimal"

A remark for people who are mathematically inclined: The definition of the natural numbers \mathbb{N} is not totally universal. Sometimes we have \mathbb{N} = \{0, 1, 2, 3, 4,\ldots\} and sometimes we have \mathbb{N} = \{1, 2, 3, 4,\ldots\}. To avoid this ambiguity, I am using \mathbb{N}_0 = \{0, 1, 2, 3, 4,\ldots\}, even though the index _0 is kind of ugly. I agree with Dijkstra that we should prefer to include the 0 in the natural numbers.
Another remark for mathematically interested readers: If we defined \mathbb{D}=\{ \frac{x}{10^n} : x \in \mathbb{Z} \wedge n \in \mathbb{N}_0\} without the precision information, we would actually have a ring. If we then replaced 10 by a prime number p, we would approach the realm of p-adic numbers (\mathbb{Q}_p). This would be well worth supporting by a library as well, but it is quite a different story and of course only of interest to the small group who actually know p-adic numbers and work with them.


Some thoughts about testing

We all do a bit of testing if we are in IT. Even though I would by no means call myself a professional test engineer or test manager, I can make some observations about the topic.

I have heard many times that testing is kind of the easy task in the IT industry, just for those who cannot do the „real“ stuff. Or, from the other perspective, the area where better careers can be made, because the task by itself is not so demanding… And I have seen test teams that matched this description. But a good test team can add a lot of value and has a lot of hard tasks to solve for this.

Some ideas:

There was a test team that tested ticket vending machines, mostly the software, but to some extent also the hardware. Out of around ten people, one guy was finding about two thirds of the defects. This does not necessarily mean two thirds of the value, because the count might include a lot of trivial errors, while the more deeply hidden errors need much more time to uncover. But while usually one person was running tests on one machine, he ran a second set of tests on his laptop on a simulation to make use of the waiting time. And he asked for a better laptop, in order to run three tests at the same time, which was declined by the test manager, because „running two tests at the same time is more than enough“. Please give good people the resources and empower them to contribute more. By the way, running two tests at the same time is not easy; I tried it myself with tests that had quite significant waiting times and ended up needing the help of others, thus defeating the whole idea. Maybe it can be learned to some extent.

In another project, a server application with very demanding logic, algorithmic complexity and no GUI whatsoever was developed. Nobody can test such software, especially not the people who cannot do the „real“ stuff. But the test engineer in this project was really good. He was part of the team during development and knew very well what the software was supposed to do and which features could already be tested. While we wrote the software, he figured out how he could test it, and when we were ready, he was ready too, with some extremely intelligent tests that achieved great coverage, and with workable ways of doing the testing at all in the first place. And this actually worked great in practice.

We all talk about test automation. This is a good thing and should be done, but it should be considered wisely. While it is possible to automate end to end, if we really want to go there, this can be hard. I have seen robots being used for testing ticket vending machines. Payment was simulated through software, but with better robots that might work as well, and it is an important part. It is quite possible to automate GUI testing with tools like Selenium, Selenide and others, but this can get extremely hard, depending on the GUI. And if the UX team wants a minor change, the tests might become worthless in a second and we have to start almost from scratch. Using IDs instead of positions, or more generally a good abstraction and a reasonably good separation of logic and design, helps a lot, but the problem still remains. I recommend writing such tests in a late phase, when the GUI is already pretty stable in terms of design and UX. This is no place for test driven design, unless the UX guys are very test-automation-affine. And trust me, they are not. They can be really good at creating good designs, and that is their job.

I have seen projects where there was no test team at all, and software that was committed to „trunk“ just had to pass the unit test suite and went immediately to production in case of success, at least during regular working hours. Non-trivial changes to trunk had to undergo a code review via a pull request before being merged, and the unit test suite was of course really good. I am not saying that I recommend this, but when it is well done, it actually works, at least in some cases.

What is important for the test team is to know what is going on in the development, to know where the documentation is and to contribute to it, and to have a feeling for the whole system in order to understand how good it is. Mathematics on the defect detection rate is not everything, even though it can help. It is important to understand the people. It is important to understand the system landscape. And the business logic. And to write good test cases. And for many projects it is important to understand multiple languages, if the software should run in a multilingual or international environment. And at the end of the day it is a hard decision whether there can be a „go“ or whether it needs to be delayed, and by how much. The guys from the top management might already be extremely eager to release the software, and they probably have good reasons. So a wise decision on when it is OK to go is really hard to make and valuable.

There are a lot of things that good test teams can contribute to add value to the product. And a lot of „real“ stuff that can and should be done by them.
