Perl and Scala: what can they learn from each other?

Ironically, Scala first drew my interest because of Perl. About ten years ago I discovered that there was no really good understanding of how to design multithreading for Perl 6. I thought that exploring how it is done in Scala, which was already known to be good at this, would give me a more general understanding of the issue. At that time the plan for Perl 6 (now named „Raku“) was to rather go without multithreading capabilities than to do them badly. In the end I got drawn into Scala and found it more interesting in itself than the original issue. And the Perl 6 community eventually found good answers for providing multithreading capabilities anyway.

So while there are technical concepts in both of these languages that are interesting and could possibly be applied to the other one in some way, there is also an interesting parallel.

Both Scala and Perl have been „cool languages“ that were really strong in one area or even in a broader range of application areas. Both of them found a competitor that was a kind of „inferior clone“ of them. PHP in its early versions was very similar to Perl, but „simplified“ and more or less a subset of what Perl provided. At that time Perl had a real boom, because the first web applications came up, the only reasonable way to build them was CGI, and of course that was done with Perl. There were some early alternatives like ColdFusion and ASP, but they never really became mainstream, at least not outside of their respective communities. PHP eventually took over most of Perl’s CGI territory and has become a major building block of our current WWW. Wikipedia and this blog run on PHP. Perl eventually also lost its leading position as the scripting language for system administration to Ruby, even more to Python, and to some others. But it is still there, and it has strong string parsing capabilities and a very useful ecosystem of libraries called CPAN.

Now Scala has found a similar competitor in Kotlin. Besides being somewhat simpler, Kotlin also shines with good tooling support. It comes from the same organization as IntelliJ IDEA, which is the usual IDE for most JVM languages among people who rely neither on Emacs nor on vi. So Kotlin support in IntelliJ is always going to be a high priority. And Kotlin is officially supported by Google as a programming language for Android apps. It seems to work well, allows for more modern development than the supported Java versions and is conceptually very similar to Swift, which is the most modern programming language supported by Apple for iOS apps. There have been heroic and admirable attempts to allow Android app development using other JVM languages, especially Scala. But they all suffer from the same set of problems. In order to avoid installing too much language-specific code in the app, dynamic language features that would require a compile capability at run time, as commonly used in Groovy or Clojure, have to be avoided. And excessive use of the language’s libraries has to be avoided, because they are not on the phone already, so a copy of them has to be shipped with each app. So the storage usage is much higher than for Kotlin and Java apps. Then we see attempts to reduce the size of the libraries by only including what is needed. That is necessary, but it looks too fragile to really trust it. So, for mobile apps, it is Kotlin. Period. And once Kotlin is already there, why not use it on the server as well? Yes, I do believe Scala is better, but that is not what everyone thinks, and it needs to be much better to justify an additional language where app development for Android is already happening in Kotlin.

Now both Perl and Scala had some problems. To some extent they even share the exact same problem: the possibility to write really „cool“ code that is very smart, very short and cannot be read by anybody else without a lot of time and a lot of knowledge. This can be done in any language, but among the languages that I know, Perl is number one for this, and I would put Scala as number two and C++ and C as numbers three and four. It is a good idea to use coding standards that allow for clean Scala or Perl code. But please remain reasonable and do not let bureaucrats take charge of the coding standards and create a monster that drains all creativity. Allow powerful features, but use them in a decent and readable way.

In both cases there was an effort to write a new version of the language that was meant to be slightly incompatible, clean up some of the weaknesses and bring some improvements. In the case of Perl this was Perl 6. It was developed for around 20 years and came out a few years ago. Eventually it turned out to be too different, so it was renamed to Raku. For Scala, a new language called „Dotty“ was developed, and it was decided to make this the next major version, Scala 3. Even though it is much closer to Scala 2 than Raku is to Perl 5, it is still incompatible and requires an effort to rewrite code. We can already see that large Perl 5 projects are hardly moving to Raku, so Perl 5 is here to stay and Raku is just a second language within the same community. It will probably not happen quite like that with Scala, and the core language team will probably concentrate on Scala 3 at some point. But large organizations that have invested heavily in Scala cannot easily migrate, simply because it takes a lot of time and money. So we will probably also see some long-term coexistence of Scala 2 and Scala 3. Maybe Scala 2 will be forked by major adopters. Or it will be supported for money by Lightbend.

Phone Numbers and E-Mail Addresses

Most data that we deal with are strings or numbers or booleans and combinations of these into classes and collections. Dates can be expressed as string or number, but have enough specific logic to be seen as a fourth group of data. All these have interesting aspects, some of which have been discussed in this blog already.

Now phone numbers are, in a naïve approach, numbers or strings, but very soon we see that they have their own specific aspects. The same applies to email addresses, which can be represented as strings.

Often projects go by their own „simplified“ specification of what an email address or a phone number is and how to parse, compare and render it. At the end of the day the simplification is harder to tame than the real solution, because it needs to be specified and maintained by the project team rather than being based on a proven library. And once in a while „edge cases“ occur that cannot be ignored and that make the „home grown“ library even more complex.

Behind phone numbers and email addresses there are well-defined and established standards. They are hard to understand thoroughly within the constrained time budget of a typical „business project“, where the time should be allocated to enhancing the business logic and not to reinventing the basics. Unless there is a real need to do so, of course.

Just to give an idea: When phone numbers are parsed or provided by user input, they can start with a „+“ sign or use some country-specific logic to express which country they belong to. And then the „+1“, for example, does not stand for the United States alone, but also for Canada and some smaller countries that are in some way associated with the United States or Canada. Further analysis of the number is required to find out which one it is. The prefix for international numbers is often „00“, but in the United States it is „011“, and there were and are some other variants that are still frequently used. Some people like to write something like „+49(0)431 77 88 99 11 1“ instead of „+49 431 77 88 99 11 1“. We can constrain the input to the variants we happen to think of and force the supplier of the data to comply, but why bother? Why not accept legitimate formats, as long as they are correct and unambiguous?

For email addresses there is the famous one-page regular expression for recognizing correct email addresses, and even that is not totally complete. Find it at the bottom of the article…

Of course it includes some rarely used variants of email addresses that were once used and have not been officially abolished, but it is hard to draw an exact border for this.

So the general recommendation is to find a good library for working with email addresses and phone numbers. Maybe the library can even, to some extent, eliminate input strings that formally comply with the format but are known to be incorrect, by knowing about numbering schemes worldwide or about email domains, or even by performing lookups.

Another strong recommendation is to store data like email addresses and phone numbers in a technical format. For phone numbers this means always starting with a „+“, followed by digits only. For input, any positioning of spaces is accepted; for output, the library knows how to format the number correctly. This allows selecting by phone number without dealing with complex formatting, by just using the technical format in the query as well.

For Java (and thus for many JVM languages), C++ and JavaScript there is an excellent library from Google for dealing with phone numbers. For email addresses, something like the Apache Commons email validator is a way to go.

Keep in mind that for email addresses and phone numbers the ultimate way of verification is to send the owner a link or a code that they need to enter. At the end of the day it is insufficient to rely only on formal verification without this final step.

But issues still remain, like transforming data into a canonical technical format for storage or formatting data for display. And there is a huge added value if we can reliably recognize formally wrong entries early, when the user can still easily react, rather than waiting for an email, SMS or phone call to be processed, which may fail when the user is no longer on our „registration site“. And even when we process data which has already been verified by a third party, we still want to parse it to recognize obvious errors.

The concrete libraries may be outdated by the time you are reading this, or they may not be applicable for the language environment that you are using, but please make an effort to find something similar.

So, please use good libraries, which are likely to be found for the environment that you are using, and write yourself what creates value for your project or organization. Unless your goal is really to write a better library. It is better to invest the time into areas where there are still no good libraries around.

And as always, you may understand email addresses and phone numbers as an example for a more general idea.

E-Mail Regex

Source: https://emailregex.com/:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?: \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\ ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?: (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t] )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*) *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\]( ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(? 
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(? :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<> @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(? :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? 
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:( ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(? :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n) ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<> @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)? (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". 
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?: \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t]) *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\ .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:( ?:\r\n)?[ \t])*))*)?;\s*)

Orthodox Christmas 2019/2020

Orthodox Christmas 2019/2020 in Ukraine and probably some other countries is on 2020-01-07.


God Jul! — Feliĉan Kristnaskon! — ميلاد مجيد — Natale hilare! — Hyvää Joulua! — Срећан Божић! — Prettige Kerstdagen! — クリスマスおめでとう ; メリークリスマス — З Рiздвом Христовим! — Buon Natale! — Joyeux Noël! — С Рождеством! — Frohe Weihnachten! — ¡Feliz Navidad! — Crăciun fericit! — Merry Christmas! — καλά Χριστούγεννα! — God Jul!

This text was generated with a C# program (using Mono on Linux):

using System;
using System.Collections.Generic;
using static System.Collections.Generic.KeyValuePair;
using System.Linq;

class OrthodoxChristmas20192020 {
    private static string[] arr = new string[] {
        "Prettige Kerstdagen!",
        "God Jul!",
        "Crăciun fericit!",
        "クリスマスおめでとう ; メリークリスマス",
        "God Jul!",
        "Feliĉan Kristnaskon!",
        "Hyvää Joulua!",
        "ميلاد مجيد",
        "Срећан Божић!",
        "καλά Χριστούγεννα!",
        "З Рiздвом Христовим!",
        "Natale hilare!",
        "Buon Natale!",
        "Joyeux Noël!",
        "Frohe Weihnachten!",
        "С Рождеством!",
        "Merry Christmas!",
        "¡Feliz Navidad!"
    };

    public static void Main() {
        Random rnd = new Random();

        var shuffled = from item in arr.Select(s => new KeyValuePair<int, string>(rnd.Next(), s)) orderby item.Key select item.Value;
        int count = 0;
        foreach (string s in shuffled) {
            if (count++ > 0) {
                Console.Write(" — ");
            }
            Console.Write(s);
        }
        Console.WriteLine();
    }
}

How to replace svn:keywords?

In the old days we used svn, cvs, rcs or other systems for source code management, that allowed enabling something like svn:keywords. This resulted in certain strings in the source code being replaced by strings containing some version information.

More often than we might think these were useful. The question „what version are we running?“ is often answered, but surprisingly often not correctly.

Putting the version information into a comment, or even better into a string that might be logged, allows us to find out. At the very least it can be extracted from the binary with something like

strings xyz | grep -E '\$Id.*\$'

Now we are using git instead of svn, or at least we should be using git or plan our migration to git. There are other tools like Mercurial, that are probably just as good as git, but git is most common and every developer knows it or has to learn it anyway to stay in business.

Now git does not support these svn:keywords, or at least not as easily, because it relies on SHA checksums, which do not allow for changing file contents. There are some tricks like pre-commit and post-checkout scripts that might solve such issues, but this is kind of difficult to tame, due to the distributed character of git, which includes a local repository on each developer’s machine.

So it is better to accept that the time of this svn:keywords-stuff is over and look for something new. As an example we will consider the world of Java and JVM languages. Most use a Jenkins server to compile the software.

To create a release, even a temporary release or a release just for testing, the right way is to first label the head of the branch we are working on, then check out based on this label, compile that and, if it is successful, upload it to the artifact repository. After that we might rename the label or add another label. If the build is not successful, we might delete the label, depending on the processes.

Now the jar-files contain a META-INF-directory and a MANIFEST.MF. This should be the right place to put version information during such a build. More or less this can provide the same benefit as the svn:keywords, but it works with git and needs only be done in one place.

Details about how to do it can be found out when needed.
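To give an idea of the reading side, here is a minimal sketch in Java. It assumes that the build has written an attribute named Implementation-Version into MANIFEST.MF; the class name VersionInfo, the method readVersion and the example version string are made up for this illustration and are not part of any particular build setup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

public class VersionInfo {

    /**
     * Reads the Implementation-Version attribute from a manifest stream.
     * Returns "unknown" if the attribute is missing.
     */
    static String readVersion(InputStream manifestStream) {
        try {
            Manifest manifest = new Manifest(manifestStream);
            String version = manifest.getMainAttributes().getValue("Implementation-Version");
            return version != null ? version : "unknown";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical manifest content as a build might write it (assumption).
        // In a real jar the stream would typically come from something like
        // VersionInfo.class.getResourceAsStream("/META-INF/MANIFEST.MF").
        String example = "Manifest-Version: 1.0\n"
                + "Implementation-Version: 1.4.2-g3f9c2ab\n";
        InputStream in = new ByteArrayInputStream(example.getBytes(StandardCharsets.UTF_8));
        System.out.println("running version " + readVersion(in));
    }
}
```

Logging this string once at startup gives more or less what the old keyword strings gave us. When several jars are on the classpath, looking up the manifest as a resource may find the manifest of a different jar, so for a class loaded from the jar in question `getClass().getPackage().getImplementationVersion()` may be the more robust way to ask.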

I assume that the same approach can also be accomplished for other environments. We can even find ways that the software logs its version by changing a string in a source code file during the build process.

Happy New Year 2020

Un an nou fericit! — Onnellista uutta vuotta! — Feliĉan novan jaron! — Καλή Χρονια! — ¡Feliz año nuevo! — С новым годом! — FELIX SIT ANNUS NOVUS — Godt nytt år! — Щасливого нового року! — Frohes neues Jahr! — Felice anno nuovo! — Bonne année! — Gott nytt år! — Срећна нова година! — عام سعيد — Gullukkig niuw jaar! — Happy new year!

This is generated with a Java 13 program using Lambdas and secure random numbers:

import java.security.SecureRandom;
import java.util.List;
import java.util.stream.Stream;
import java.util.stream.Collectors;

public class HappyNewYearJava8 {

    private static final class Element implements Comparable<Element> {
        Element(Long sortKey, String text) {
            this.sortKey = sortKey;
            this.text = text;
        }

        private Long sortKey;
        private String text;

        public String getText() {
            return text;
        }

        public int compareTo(Element e) {
            return this.sortKey.compareTo(e.sortKey);
        }
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        List<String> list = Stream.of("Frohes neues Jahr!",
                                      "Happy new year!",
                                      "Gott nytt år!",
                                      "¡Feliz año nuevo!",
                                      "Bonne année!",
                                      "FELIX SIT ANNUS NOVUS",
                                      "С новым годом!",
                                      "عام سعيد",
                                      "Felice anno nuovo!",
                                      "Godt nytt år!",
                                      "Gullukkig niuw jaar!",
                                      "Feliĉan novan jaron!",
                                      "Onnellista uutta vuotta!",
                                      "Срећна нова година!",
                                      "Un an nou fericit!",
                                      "Щасливого нового року!",
                                      "Καλή Χρονια!")
            .map(s->new Element(random.nextLong(), s))
            .sorted()
            .map(Element::getText)
            .collect(Collectors.toList());

        System.out.println(String.join(" — ", list));
    }
}

Christmas 2019

Joyeux Noël! — ميلاد مجيد — Crăciun fericit! — God Jul! — God Jul! — Natale hilare! — С Рождеством! — З Рiздвом Христовим! — Prettige Kerstdagen! — Hyvää Joulua! — クリスマスおめでとう ; メリークリスマス — καλά Χριστούγεννα! — Buon Natale! — Срећан Божић! — Frohe Weihnachten! — ¡Feliz Navidad! — Feliĉan Kristnaskon! — Merry Christmas!

This time the greetings were generated with a C program:

#include <stdio.h>
#include <stdint.h>
#include <openssl/rand.h>

#define N 18
static const uint32_t n = N;

int main(int argc, char **argv) {
  char greetings[N][60] = {
    "С Рождеством!",
    "Hyvää Joulua!",
    "καλά Χριστούγεννα!",
    "Buon Natale!",
    "Prettige Kerstdagen!",
    "З Рiздвом Христовим!",
    "Merry Christmas!",
    "Срећан Божић!",
    "God Jul!",
    "¡Feliz Navidad!",
    "ميلاد مجيد",
    "クリスマスおめでとう ; メリークリスマス",
    "Natale hilare!",
    "Joyeux Noël!",
    "God Jul!",
    "Frohe Weihnachten!",
    "Crăciun fericit!",
    "Feliĉan Kristnaskon!" };
  int32_t i, j;
  uint32_t x;
  uint32_t idx[N];
  int rtc;
  uint64_t r = 0;
  for (i = n-1; i >= 0; i--) {
    idx[i] = i;
  }
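  /* RAND_bytes expects an unsigned char buffer and returns 1 on success */
  rtc = RAND_bytes((unsigned char *) &r, sizeof(r));
  if (rtc != 1) {
    fprintf(stderr, "RAND_bytes failed\n");
    return 1;
  }
  for (i = n-1; i > 0; i--) {
    /* Fisher-Yates: pick j from 0..i inclusive, so an element may also stay in place */
    j = r % (i + 1);
    r = r / (i + 1);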
    x = idx[i];
    idx[i] = idx[j];
    idx[j] = x;
  }
  for (i = 0; i < n; i++) {
    if (i > 0) {
      printf(" — ");
    }
    printf("%s", greetings[idx[i]]);
  }
  printf("\n");
}

Ranges of Dates and Times

In Software we often deal with ranges of dates and times.

Let us look at it from the perspective of an end user.

When we say something like „from 2019-03-07 to 2019-03-10“ we mean the set of all timestamps t such that

    \[\text{2019-03-07} \le t < \text{2019-03-11}\]

or more accurately:

    \[\text{2019-03-07T00:00:00}+TZ \le t < \text{2019-03-11T00:00:00}+TZ\]

The important point is that we mean to include the whole 24-hour day of 2019-03-10. By the way, please try to get used to the ISO date format even when writing normal human-readable texts, it just makes sense…

Now when we are not talking about dates, but about times or instants of time, the interpretation is different.
When we say something like „from 07:00 to 10:00“ or „from 2020-03-10T07:00:00+TZ to 2020-04-11T09:00:00+TZ“, we actually mean the set of all timestamps t such that

    \[givenDate\text{T07:00:00}+TZ \le t < givenDate\text{T10:00:00}+TZ\]

or

    \[\text{2020-03-10T07:00:00}+TZ \le t < \text{2020-04-11T09:00:00}+TZ,\]

respectively. It is important that we have to add one day in the case of date-only ranges (accuracy of one day), and that we do not in the case of finer-grained date/time information. The question of whether the upper bound is included or not is not so important in our everyday life, but it turns out that the most useful way is usually not to include the upper bound. If you prefer to have all options, it is a better idea to employ an interval library, i.e. to find one or to write one. But for most cases it is enough to exclude the upper limit. This guarantees that adjacent intervals are disjoint, which is usually what we want. I have seen people write code that adds 23:59:59.999 to a date and compares with \le instead of <, but this is an ugly hack that needs a lot of boilerplate code and a lot of time to understand. Use the exclusive upper limit, because we have it.

Now the requirement is to add one day to the upper limit to get from the human-readable form of date-only ranges to something computers can work with. It is a good thing to agree on where this transformation is made. And it has to be done in such a way that it behaves correctly even on those dates where daylight saving time starts or ends, because adding one day might actually mean „23 hours“ or „25 hours“. If we need to be really very accurate, sometimes leap seconds need to be added.

Another issue has come up here: local time is much harder than UTC. We need to work with local time in all kinds of user interfaces for humans, with very few exceptions, like pilots, who actually work with UTC. But local date and time is ambiguous for one hour every year and at least a bit special to handle on the two days where daylight saving time starts and ends. So convert dates to UTC and work with that internally. And convert them to local dates in all kinds of user interfaces where it makes sense, including documents that are printed or provided as PDFs, for example. When we work with dates without time, we need to add one day to the upper limit and then round it to the nearest some-date\text{T00:00:00}+TZ for our timezone TZ. The alternative would be to know when to add 23, 24 or 25 hours, respectively, which we do not want to know. Instead we should use modern time libraries, like the java.time package in Java, for example.

Working with date and time is hard. It is important to avoid making it harder than it needs to be. Here are some recommendations:

  • Try to use UTC for the internal use of the software as much as possible
  • Use local date or time or date and time in all kinds of user interfaces (with few exceptions)
  • Add one day to the upper limit and round it to the nearest midnight of local time exactly once in the stack
  • Exclude the upper limit in date ranges
  • Use ISO date formats even in the user interfaces, if possible
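The conversion from a date-only range to a half-open interval of UTC timestamps could look roughly like this with java.time. This is just a sketch: the class DateRange and its method names are made up for this example, and Europe/Zurich is only an arbitrary choice for the timezone TZ.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

public class DateRange {
    final Instant startInclusive;
    final Instant endExclusive;

    DateRange(Instant startInclusive, Instant endExclusive) {
        this.startInclusive = startInclusive;
        this.endExclusive = endExclusive;
    }

    /**
     * Converts the human form "from first to last" (both days fully included)
     * into a half-open interval of UTC instants. The one day is added on the
     * calendar level, so days with 23 or 25 hours are handled by the library.
     */
    static DateRange ofLocalDates(LocalDate first, LocalDate last, ZoneId zone) {
        Instant start = first.atStartOfDay(zone).toInstant();
        Instant end = last.plusDays(1).atStartOfDay(zone).toInstant();
        return new DateRange(start, end);
    }

    /** True if startInclusive <= t < endExclusive. */
    boolean contains(Instant t) {
        return !t.isBefore(startInclusive) && t.isBefore(endExclusive);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Europe/Zurich");
        DateRange range = ofLocalDates(LocalDate.of(2019, 3, 7), LocalDate.of(2019, 3, 10), zone);
        System.out.println(range.startInclusive + " <= t < " + range.endExclusive);
    }
}
```

The point of the design is that the „add one day“ step happens in exactly one place, in ofLocalDates, and everything behind it only sees UTC instants and an exclusive upper limit.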

Functional Scala London 2019

In December 2019 I attended the conference Functional Scala in London, which was initiated and managed by John de Goes. See Skillsmatter about what happened to Scala Exchange. Of course a large part of the conference was related to ZIO, which seems to be a very dynamic part of the ecosystem surrounding Scala.

It was a single track conference with a lot of talks, so I attended all of them:

Day 1 (2019-12-12)

  • KEYNOTE: XS — A Collections CLI [Paul Phillips] (Video)
  • Introduction to Interruption [Jakub Kozlowski] (Video)
  • Making Algorithms work with Functional Scala [Karl Brodowsky] (Video)
  • Solving the Scala Notebook Experience [Jeremy Smith & Jonathan Indig] (Video)
  • Mixing Scala & Kotlin [Alexey Soshin] (Video)
  • Prototyping the Future with Functional Scala [Mike Kotsur] (Video)
  • Test Effects: First Class [Adam Fraser] (Video)
  • Let’s Gossip! [Dejan Mijic & Przemyslaw Wierzbicki] (Video)
  • Ray Tracing with ZIO [Pierangelo Cecchetto] (Video)
  • Invertible Programs [Sergei Shabanau] (Video)
  • Hyper-pragmatic Pure FP Testing with DIStage-Testkit [Pavel Shirshov & Kai] (Video)
  • KEYNOTE: Unleash Your Fury [Jon Pretty] (Video)

Day 2 (2019-12-14)

  • Modern Data-Driven Applications with ZIO Streams [Itamar Ravid] (Video)
  • Functional Architecture [Piotr Golebiewski] (Video)
  • ZIO Chunk: A Fast, Pure Alternative to Arrays [Aleksandra A. Holubitska]
  • Caliban: Designing a Functional GraphQL Library [Pierre Ricadat] (Video)
  • Macros and Environmental Effects [Maxim Schuwalow] (Video)
  • Streaming Analytics with Scala and Spark [Bas Geerdink] (Video)
  • ZIO Actors [Mateusz Sokol] (Video)
  • Adventures in Type-safe Error Handling [Jacob Wang]
  • Composition using Arrows and Monoidal Categories [Oleg Nizhnik]
  • Practical Logic(al) Programming with Dotty [Lander Lopez]
  • Next-Level Type Safety: An Intro to Generalized Algebraic Data Types [Matthias Berndt]
  • KEYNOTE: The Many Faces of Modularity [Eric Torreborre]

See Agenda

Maybe I will write more about some topics.

Talks will be on YouTube in the near future.

Visit to reClojure in London 2019

On 2019-12-02 I visited the conference reClojure.

This was an admirable community effort to create a replacement for Clojure eXchange, which simply did not happen because of the bankruptcy of Skillsmatter.

There was only one track, so the schedule is exactly what I visited.

I will just copy it below, because schedules from conference sites usually disappear after some time:

  • Building stuff with Clojure and 3D Printing. Clément Salaün.
    How to design objects with Clojure, OpenSCAD and then 3D print them. This talk covers the motivations, basic concepts and features with a live demo.
  • Clojure Art. Karl Brodowsky.
    Teaching or learning Clojure using images has been proven to be fun and beneficial! In this talk, learn how.
  • Growing Mobile Apps with ClojureScript and React Native. Daniel Neal.
    Starting things is fun, but growing them can be a real challenge – and mobile apps are no different…
  • Live Coding a Mandelbrot Renderer. Peter Westmacott.
    In this talk, Peter will demonstrate live coding of a fractal renderer, with the aim to show how complex beauty can emerge from simple mathematical rules and a little code.
  • Pizza Party Lunch (Thank You uSwitch!)
    Short 10 minute talks. Various Speakers.
  • Unleash the power of the REPL. Dana Borinski.
    Return to basics and dive into how to leverage the REPL to solve problems and debug more quickly – and with the added bonus of honing our Clojure skills!
  • Generating Generators. Andy Chambers.
    Generating data for use in tests can be laborious and boring. However, using the database’s information schema you can alleviate that! Discover the ways to achieve this.
  • Living in a Box. Life in Containers with the JVM. Matthew Gilliard.
    A focus on how containers and the JVM interact and what implications are there for Clojure Developers. Get the best results from the work gone into OpenJDK container support.
  • Closing Keynote – Code, meet data! Malcolm Sparks.
    Computers have 3 jobs: Input, process, output. How have we made such a mess of something so fundamental? Observations, opportunities for Clojurists and hope for the future.

There is a YouTube channel for reClojure, where we can now find recordings of the talks.

How to get rid of these HTML-entities in Files

It has been written here that HTML entities (these &auml; etc.) should be avoided, with the exception of those that we need due to the HTML syntax, like &lt;, &gt;, &amp; and maybe &quot; and &nbsp;. They were already mostly obsolete more than 20 years ago. In those days we did not yet automatically use UTF-8 or UTF-16, but often an 8-bit character encoding that could express only up to 256 characters, in reality around 200 due to control characters. But at least these 200 could be used directly. That was enough for web pages in those days, and texts in German, French, Russian, Greek, Hebrew, Arabic and many other languages could well be written, as long as only one language or a few similar languages were used. For the rare occasions that required characters that were not in this character set, it was an option to rely on HTML entities. The same was true for typing HTML pages on a US keyboard without any good tool support.

But now Unicode has been around for more than 25 years and more than 90% of the web pages use UTF-8.

Still, some people think that these HTML entities are somehow necessary or at least „safer“, and I see people writing HTML code with them even these days. And there are tools by relatively well-known companies that produced such output not so long ago… It is a good thing to have some courage and to change something like this to a readable and natural format. Or more generally, to try out whether a simpler or better solution works. Reasonable courage is good for this; too much of a good thing can go bad, as so often…

So, please teach your colleagues not to use these ugly HTML entities where UTF-8 characters are the better option.

And here is a Perl script that converts the HTML entities, with the exceptions mentioned above, to UTF-8. Some more such scripts might be added to the project conversion-utils. The script is a bit too long to be pasted inline in a code block, so it is better to find the current version on GitHub.

Then you can do something like this:

git commit
for file in *.html ; do
    echo "$file"
    mv "$file" "${file}~entities~"
    html2utf8 < "${file}~entities~" > "$file"
    echo "/$file"
done
git diff

to convert all files in a directory. I assume that you are using Linux or at least have bash available, for example in Cygwin.
There are other tools that do the same thing, I am sure. Just use anything that works for you to get away from this unreadable crap.
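Just to illustrate the idea of such a conversion, here is a small sketch in Java. It is not the Perl script mentioned above and not a complete tool: the class name EntityToUtf8 is made up, and the table of named entities is only a tiny subset chosen for this example, while a real converter needs the full HTML entity list. Entities that are needed due to the HTML syntax are left alone, also in their numeric form.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityToUtf8 {
    // Entities needed due to the HTML syntax stay untouched.
    private static final Set<String> KEEP = Set.of("lt", "gt", "amp", "quot", "nbsp");
    // The same characters, for numeric entities: < > & " and no-break space.
    private static final String SYNTAX_CHARS = "<>&\"\u00A0";

    // Tiny illustrative subset (assumption); unknown names are left as they are.
    private static final Map<String, String> NAMED = Map.of(
            "auml", "\u00e4", "ouml", "\u00f6", "uuml", "\u00fc",
            "Auml", "\u00c4", "Ouml", "\u00d6", "Uuml", "\u00dc",
            "szlig", "\u00df", "eacute", "\u00e9", "euro", "\u20ac");

    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    /** Replaces named and numeric entities by the characters they stand for. */
    static String convert(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = replacementFor(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String replacementFor(String name, String whole) {
        if (!name.startsWith("#")) {
            return KEEP.contains(name) ? whole : NAMED.getOrDefault(name, whole);
        }
        try {
            boolean hex = name.charAt(1) == 'x' || name.charAt(1) == 'X';
            int cp = hex ? Integer.parseInt(name.substring(2), 16)
                         : Integer.parseInt(name.substring(1));
            boolean surrogate = cp >= 0xD800 && cp <= 0xDFFF;
            if (cp == 0 || !Character.isValidCodePoint(cp) || surrogate
                    || SYNTAX_CHARS.indexOf(cp) >= 0) {
                return whole; // keep anything that is not safe to replace
            }
            return new String(Character.toChars(cp));
        } catch (NumberFormatException e) {
            return whole; // number too large for an int
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("Gr&uuml;&szlig;e &amp; K&#252;sse &lt;b&gt;"));
    }
}
```

Whatever tool is used, the output needs to be written as UTF-8 and the page has to declare that encoding, otherwise the result is worse than the entities were.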
