Phone Numbers and E-Mail Addresses

Most data that we deal with are strings, numbers or booleans, and combinations of these into classes and collections. Dates can be expressed as strings or numbers, but they carry enough specific logic to be seen as a fourth group of data. All of these have interesting aspects, some of which have already been discussed in this blog.

Naïvely, phone numbers are just numbers or strings, but we soon see that they have their own specific aspects. The same applies to email addresses, which can be represented as strings.

Projects often go by their own „simplified“ specification of what an email address or a phone number is and how to parse, compare and render them. At the end of the day the simplification is harder to tame than the real solution, because it has to be specified and maintained by the project team rather than being based on a proven library. And once in a while „edge cases“ occur that cannot be ignored and that make the „home-grown“ library even more complex.

Phone numbers and email addresses are backed by well-defined and established standards, but these are hard to understand thoroughly within the constrained time budget of a typical „business project“. That time should go into enhancing the business logic and not into reinventing the basics, unless there is a real need to do so, of course.

Just to give an idea: when phone numbers are parsed or provided by user input, they can start with a „+“ sign or use some country-specific logic to express which country they belong to. And „+1“, for example, does not stand for the United States alone, but also for Canada and some smaller countries that are in some way associated with the United States or Canada. Further analysis of the number is required to find out which one it is. The prefix for international numbers is often „00“, but in the United States it is „011“, and there were and are some other variants that are still frequently used. Some people like to write something like „+49(0)431 77 88 99 11 1“ instead of „+49 431 77 88 99 11 1“. We can constrain the input to the variants we happen to think of and force the supplier of data to comply, but why bother? Why not accept legitimate formats, as long as they are correct and unambiguous?

For email addresses there is the famous one-page regular expression for recognizing correct email addresses, and even that is not totally complete. Find it at the bottom of the article…

Of course it includes some rarely used variants of email addresses that were once in use and have never been officially abolished, but it is hard to draw an exact border here.

So the general recommendation is to find a good library for working with email addresses and phone numbers. Maybe the library can even, to some extent, reject input strings that formally comply with the format but are known to be incorrect, by knowing about numbering schemes worldwide or about email domains, or even by performing lookups.

Another strong recommendation is to store data like email addresses and phone numbers in a technical format. For phone numbers this means always starting with a „+“ followed by digits only. On input, any positioning of spaces is accepted; on output, the library knows how to format the number correctly. This allows selecting by number without dealing with complex formatting, simply by using the technical format in the query as well.
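
To illustrate the storage side only, here is a minimal Java sketch. It assumes that the technical format is the international form (a „+“ followed by at most 15 digits, the first of which is not 0), and it only strips whitespace from input that is already in that form. The class and method names are made up for this example. Everything beyond this, such as national prefixes or the „(0)“ notation, is exactly what a proven library should do for us.

```java
import java.util.regex.Pattern;

public final class StoredPhoneNumber {
    private StoredPhoneNumber() {}

    // Assumed technical format: "+" followed by 2 to 15 digits, first digit not 0.
    private static final Pattern TECHNICAL = Pattern.compile("\\+[1-9][0-9]{1,14}");

    /**
     * Strips whitespace and checks the result against the technical format.
     * This is NOT a phone number parser: anything that is not already in
     * international "+" form is rejected instead of being interpreted.
     */
    public static String canonical(String input) {
        String compact = input.replaceAll("\\s+", "");
        if (!TECHNICAL.matcher(compact).matches()) {
            throw new IllegalArgumentException("not in international + format: " + input);
        }
        return compact;
    }

    public static void main(String[] args) {
        // Both spellings lead to the same stored value and the same query key.
        System.out.println(canonical("+49 431 77 88 99 11 1"));
        System.out.println(canonical("+494317788 99111"));
    }
}
```

Because both the stored value and the search key go through the same function, a query can compare plain strings and ignore how the user grouped the digits.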

For Java (and thus for many JVM languages), C++ and JavaScript there is an excellent library from Google for dealing with phone numbers, libphonenumber. For email addresses, something like the EmailValidator from Apache Commons Validator is a way to go.

Keep in mind that for email addresses and phone numbers, the ultimate way of verification is to send a link or a code that the recipient needs to enter. At the end of the day it is insufficient to rely only on formal verification without this final step.

But issues still remain: transforming data into a canonical technical format for storage, formatting data for display, etc. And there is a huge added value if we can reliably recognize formally invalid entries early, when the user can still easily react, rather than waiting for an email, SMS or phone call to be processed, which may fail when the user is no longer on our „registration site“. We may also process data that has already been verified by a third party, but we still want to parse it to recognize obvious errors.

The concrete libraries may be outdated by the time you are reading this, or they may not be applicable for the language environment that you are using, but please make an effort to find something similar.

So, please use good libraries, which are likely to exist for the environment that you are using, and write yourself what creates value for your project or organization, unless your goal really is to write a better library. Better invest the time in areas where there are still no good libraries around.

And as always, you may understand email addresses and phone numbers as an example for a more general idea.

E-Mail Regex

Source: https://emailregex.com/:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?: \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\ ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?: (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t] )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*) *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\]( ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(? 
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(? :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<> @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(? :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? 
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:( ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(? :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n) ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<> @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)? (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". 
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?: \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t]) *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\ .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:( ?:\r\n)?[ \t])*))*)?;\s*)

Orthodox Christmas 2019/2020

Orthodox Christmas 2019/2020 in Ukraine and probably some other countries is on 2020-01-07.


God Jul! — Feliĉan Kristnaskon! — ميلاد مجيد — Natale hilare! — Hyvää Joulua! — Срећан Божић! — Prettige Kerstdagen! — クリスマスおめでとう ; メリークリスマス — З Рiздвом Христовим! — Buon Natale! — Joyeux Noël! — С Рождеством! — Frohe Weihnachten! — ¡Feliz Navidad! — Crăciun fericit! — Merry Christmas! — καλά Χριστούγεννα! — God Jul!

This text was generated with a C# program (using Mono on Linux):

using System;
using System.Collections.Generic;
using static System.Collections.Generic.KeyValuePair;
using System.Linq;

class OrthodoxChristmas20192020 {
    private static string[] arr = new string[] {
        "Prettige Kerstdagen!",
        "God Jul!",
        "Crăciun fericit!",
        "クリスマスおめでとう ; メリークリスマス",
        "God Jul!",
        "Feliĉan Kristnaskon!",
        "Hyvää Joulua!",
        "ميلاد مجيد",
        "Срећан Божић!",
        "καλά Χριστούγεννα!",
        "З Рiздвом Христовим!",
        "Natale hilare!",
        "Buon Natale!",
        "Joyeux Noël!",
        "Frohe Weihnachten!",
        "С Рождеством!",
        "Merry Christmas!",
        "¡Feliz Navidad!"
    };

    public static void Main() {
        Random rnd = new Random();

        var shuffled = from item in arr.Select(s => new KeyValuePair<int, string>(rnd.Next(), s)) orderby item.Key select item.Value;
        int count = 0;
        foreach (string s in shuffled) {
            if (count++ > 0) {
                Console.Write(" — ");
            }
            Console.Write(s);
        }
        Console.WriteLine();
    }
}

How to replace svn:keywords?

In the old days we used svn, cvs, rcs or other systems for source code management, that allowed enabling something like svn:keywords. This resulted in certain strings in the source code being replaced by strings containing some version information.

More often than we might think these were useful. The question „what version are we running?“ is often asked, but surprisingly often not answered correctly.

Putting the version information into a comment or, even better, into a string that can be logged or at least extracted with something like

strings xyz |egrep '\$Id.*\$'

makes it possible to find out.

Now we are using git instead of svn, or at least we should be using git or plan our migration to git. There are other tools like Mercurial, that are probably just as good as git, but git is most common and every developer knows it or has to learn it anyway to stay in business.

Git does not support these svn:keywords, or at least not as easily, because it relies on SHA checksums of the file contents, which does not allow for changing the contents along the way. There are some tricks like pre-commit and post-checkout scripts that might solve such issues, but this is kind of difficult to tame, due to the distributed character of git, which includes a local repo on each developer's machine.

So it is better to accept that the time of this svn:keywords-stuff is over and look for something new. As an example we will consider the world of Java and JVM languages. Most use a Jenkins server to compile the software.

To create a release, even a temporary release or a release just for testing, the right way is to first label the head of the branch we are working on, then check out based on this label, compile that and, if it is successful, upload the result to Artifactory. After that we may rename the label or add another label. If the build is not successful, we may delete the label, depending on the processes.

Now the jar files contain a META-INF directory with a MANIFEST.MF. This should be the right place to put version information during such a build. More or less this provides the same benefit as svn:keywords, but it works with git and needs to be done in only one place.

Details about how to do it can be found when needed.
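
As a rough sketch of the reading side, assuming the build has written the standard Implementation-Version attribute into MANIFEST.MF, the version could be extracted in Java as shown here. The choice of that attribute, the class name ManifestVersion and the fallback value „unknown“ are assumptions for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public final class ManifestVersion {
    private ManifestVersion() {}

    /** Reads Implementation-Version from manifest data; "unknown" if it is absent. */
    public static String versionFrom(InputStream manifestData) {
        try {
            Manifest manifest = new Manifest(manifestData);
            String version = manifest.getMainAttributes()
                                     .getValue(Attributes.Name.IMPLEMENTATION_VERSION);
            return version != null ? version : "unknown";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Version of the jar this class was loaded from, if its manifest provides one. */
    public static String ownVersion() {
        Package pkg = ManifestVersion.class.getPackage();
        String version = (pkg != null) ? pkg.getImplementationVersion() : null;
        return version != null ? version : "unknown";
    }

    public static void main(String[] args) {
        // Sample manifest content, similar to what a build might generate.
        String sample = "Manifest-Version: 1.0\nImplementation-Version: 1.2.3\n\n";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println("sample manifest: " + versionFrom(in));
        // When run outside of a jar this falls back to "unknown".
        System.out.println("own jar: " + ownVersion());
    }
}
```

Logging the result of ownVersion() once at startup answers the question „what version are we running?“ without touching any source file during the build.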

I assume that the same approach can also be accomplished for other environments. We can even find ways that the software logs its version by changing a string in a source code file during the build process.

Happy New Year 2020

Un an nou fericit! — Onnellista uutta vuotta! — Feliĉan novan jaron! — Καλή Χρονια! — ¡Feliz año nuevo! — С новым годом! — FELIX SIT ANNUS NOVUS — Godt nytt år! — Щасливого нового року! — Frohes neues Jahr! — Felice anno nuovo! — Bonne année! — Gott nytt år! — Срећна нова година! — عام سعيد — Gullukkig niuw jaar! — Happy new year!

This is generated with a Java 13 program using Lambdas and secure random numbers:

import java.security.SecureRandom;
import java.util.List;
import java.util.stream.Stream;
import java.util.stream.Collectors;

public class HappyNewYearJava8 {

    private static final class Element implements Comparable<Element> {
        Element(Long sortKey, String text) {
            this.sortKey = sortKey;
            this.text = text;
        }

        private Long sortKey;
        private String text;

        public String getText() {
            return text;
        }

        public int compareTo(Element e) {
            return this.sortKey.compareTo(e.sortKey);
        }
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        List<String> list = Stream.of("Frohes neues Jahr!",
                                      "Happy new year!",
                                      "Gott nytt år!",
                                      "¡Feliz año nuevo!",
                                      "Bonne année!",
                                      "FELIX SIT ANNUS NOVUS",
                                      "С новым годом!",
                                      "عام سعيد",
                                      "Felice anno nuovo!",
                                      "Godt nytt år!",
                                      "Gullukkig niuw jaar!",
                                      "Feliĉan novan jaron!",
                                      "Onnellista uutta vuotta!",
                                      "Срећна нова година!",
                                      "Un an nou fericit!",
                                      "Щасливого нового року!",
                                      "Καλή Χρονια!")
            .map(s->new Element(random.nextLong(), s))
            .sorted()
            .map(Element::getText)
            .collect(Collectors.toList());

        System.out.println(String.join(" — ", list));
    }
}

Christmas 2019

Joyeux Noël! — ميلاد مجيد — Crăciun fericit! — God Jul! — God Jul! — Natale hilare! — С Рождеством! — З Рiздвом Христовим! — Prettige Kerstdagen! — Hyvää Joulua! — クリスマスおめでとう ; メリークリスマス — καλά Χριστούγεννα! — Buon Natale! — Срећан Божић! — Frohe Weihnachten! — ¡Feliz Navidad! — Feliĉan Kristnaskon! — Merry Christmas!

This time the greetings were generated with a C program:

#include <stdio.h>
#include <stdint.h>
#include <openssl/rand.h>

#define N 18
static const uint32_t n = N;

int main(int argc, char **argv) {
  char greetings[N][60] = {
    "С Рождеством!",
    "Hyvää Joulua!",
    "καλά Χριστούγεννα!",
    "Buon Natale!",
    "Prettige Kerstdagen!",
    "З Рiздвом Христовим!",
    "Merry Christmas!",
    "Срећан Божић!",
    "God Jul!",
    "¡Feliz Navidad!",
    "ميلاد مجيد",
    "クリスマスおめでとう ; メリークリスマス",
    "Natale hilare!",
    "Joyeux Noël!",
    "God Jul!",
    "Frohe Weihnachten!",
    "Crăciun fericit!",
    "Feliĉan Kristnaskon!" };
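  int32_t i, j;
  uint32_t x;
  uint32_t idx[N];
  int rtc;
  uint64_t r = 0;
  for (i = n-1; i >= 0; i--) {
    idx[i] = i;
  }
  /* RAND_bytes returns 1 on success */
  rtc = RAND_bytes((unsigned char *) &r, sizeof(r));
  if (rtc != 1) {
    fprintf(stderr, "RAND_bytes failed\n");
    return 1;
  }
  /* Fisher-Yates shuffle: j is drawn from 0..i inclusive, */
  /* so that an element may also stay in its place */
  for (i = n-1; i > 0; i--) {
    j = r % (i + 1);
    r = r / (i + 1);
    x = idx[i];
    idx[i] = idx[j];
    idx[j] = x;
  }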
  for (i = 0; i < n; i++) {
    if (i > 0) {
      printf(" — ");
    }
    printf("%s", greetings[idx[i]]);
  }
  printf("\n");
}

Ranges of Dates and Times

In software we often deal with ranges of dates and times.

Let us look at it from the perspective of an end user.

When we say something like „from 2019-03-07 to 2019-03-10“ we mean the set of all timestamps t such that

    \[\text{2019-03-07} \le t < \text{2019-03-11}\]

or more accurately:

    \[\text{2019-03-07T00:00:00}+TZ \le t < \text{2019-03-11T00:00:00}+TZ\]

It is important that we mean to include the whole 24-hour day of 2019-03-10. By the way, please try to get used to the ISO date format even when writing normal human-readable texts, it just makes sense…

Now when we are not talking about dates, but about times or instants of time, the interpretation is different.
When we say something like „from 07:00 to 10:00“ or „from 2020-03-10T07:00:00+TZ to 2020-04-11T09:00:00+TZ“, we actually mean the set of all timestamps t such that

    \[givenDate\text{T07:00:00}+TZ \le t < givenDate\text{T10:00:00}+TZ\]

or

    \[\text{2020-03-10T07:00:00}+TZ \le t < \text{2020-04-11T09:00:00}+TZ,\]

respectively. It is important that we have to add one day in the case of date-only bounds (accuracy of one day), and that we do not in the case of finer-grained date/time information. Whether the upper bound is included or not is not so important in our everyday life, but it turns out that the most useful convention is usually to exclude the upper bound. If you prefer to have all options, it is a better idea to employ an interval library, i.e. to find one or to write one. But for most cases it is enough to exclude the upper limit. This guarantees that adjacent intervals are disjoint, which is usually what we want. I have seen people write code that adds 23:59:59.999 to a date and compares with ≤ instead of <, but this is an ugly hack that needs a lot of boilerplate code and a lot of time to understand. Use the exclusive upper limit, because we have it.

Now the requirement is to add one day to the upper limit to get from the human-readable form of date-only ranges to something computers can work with. It is a good thing to agree on where this transformation is made, and to do it in such a way that it even behaves correctly on those dates where daylight saving time starts or ends, because adding one day might actually mean „23 hours“ or „25 hours“. If we need to be really accurate, sometimes leap seconds need to be taken into account.

Another issue has come up here: local time is much harder than UTC. We need to work with local time on all kinds of user interfaces for humans, with very few exceptions like pilots, who actually work with UTC. But local date and time is ambiguous for one hour every year and at least a bit special to handle on the two days where daylight saving time starts and ends. Convert dates to UTC and work with that internally. And convert them to local dates on all kinds of user interfaces where it makes sense, including documents that are printed or provided as PDFs, for example. When we work with dates without time, we need to add one day to the upper limit and then round it to the nearest some-dateT00:00:00+TZ for our timezone TZ, or else know when to add 23, 24 or 25 hours, respectively. We do not want to know that ourselves, so we need to use modern time libraries like the java.time package in Java, for example.

Working with date and time is hard. It is important to avoid making it harder than it needs to be. Here are some recommendations:

  • Use UTC for the internal workings of the software as much as possible
  • Use local date or time or date and time in all kinds of user interfaces (with few exceptions)
  • Add one day to the upper limit and round it to the nearest midnight of local time exactly once in the stack
  • Exclude the upper limit in date ranges
  • Use ISO date formats even in the user interfaces, if possible
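
The recommendations above can be sketched with java.time roughly as follows. This is only an illustration under assumptions: the class DateRanges, the nested InstantRange and the method ofDates are names invented for this example, and the time zone is simply passed in by the caller.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

public final class DateRanges {
    private DateRanges() {}

    /** Half-open range of instants: start is included, endExclusive is not. */
    public static final class InstantRange {
        public final Instant start;
        public final Instant endExclusive;

        InstantRange(Instant start, Instant endExclusive) {
            this.start = start;
            this.endExclusive = endExclusive;
        }

        public boolean contains(Instant t) {
            return !t.isBefore(start) && t.isBefore(endExclusive);
        }
    }

    /**
     * Converts a date-only range as a user would write it (last day included)
     * into UTC-based instants. The one day is added here and only here, and
     * midnight is resolved by the time zone rules, not by adding 24 hours.
     */
    public static InstantRange ofDates(LocalDate first, LocalDate lastInclusive, ZoneId zone) {
        Instant start = first.atStartOfDay(zone).toInstant();
        Instant end = lastInclusive.plusDays(1).atStartOfDay(zone).toInstant();
        return new InstantRange(start, end);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Europe/Zurich");
        InstantRange range = ofDates(LocalDate.of(2019, 3, 7), LocalDate.of(2019, 3, 10), zone);
        System.out.println(range.start + " <= t < " + range.endExclusive);

        // A single day on which daylight saving time starts in this zone.
        InstantRange dstDay = ofDates(LocalDate.of(2019, 3, 31), LocalDate.of(2019, 3, 31), zone);
        System.out.println(Duration.between(dstDay.start, dstDay.endExclusive).toHours() + " hours");
    }
}
```

Since the upper bound is exclusive, the range for the following day can start exactly at endExclusive without overlap and without any 23:59:59.999 arithmetic.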

Functional Scala London 2019

In December 2019 I attended the conference Functional Scala in London, which was initiated and managed by John de Goes. See Skillsmatter about what happened to Scala Exchange. Of course a large part of the conference was related to ZIO, which seems to be a very dynamic part of the ecosystem surrounding Scala.

It was a single-track conference with a lot of talks, so I attended all of them:

Day 1 (2019-12-12)

  • KEYNOTE: XS — A Collections CLI [Paul Phillips] (Video)
  • Introduction to Interruption [Jakub Kozlowski] (Video)
  • Making Algorithms work with Functional Scala [Karl Brodowsky] (Video)
  • Solving the Scala Notebook Experience [Jeremy Smith & Jonathan Indig] (Video)
  • Mixing Scala & Kotlin [Alexey Soshin] (Video)
  • Prototyping the Future with Functional Scala [Mike Kotsur] (Video)
  • Test Effects: First Class [Adam Fraser] (Video)
  • Let’s Gossip! [Dejan Mijic & Przemyslaw Wierzbicki] (Video)
  • Ray Tracing with ZIO [Pierangelo Cecchetto] (Video)
  • Invertible Programs [Sergei Shabanau] (Video)
  • Hyper-pragmatic Pure FP Testing with DIStage-Testkit [Pavel Shirshov & Kai] (Video)
  • KEYNOTE: Unleash Your Fury [Jon Pretty] (Video)

Day 2 (2019-12-13)

  • Modern Data-Driven Applications with ZIO Streams [Itamar Ravid] (Video)
  • Functional Architecture [Piotr Golebiewski] (Video)
  • ZIO Chunk: A Fast, Pure Alternative to Arrays [Aleksandra A. Holubitska]
  • Caliban: Designing a Functional GraphQL Library [Pierre Ricadat] (Video)
  • Macros and Environmental Effects [Maxim Schuwalow] (Video)
  • Streaming Analytics with Scala and Spark [Bas Geerdink] (Video)
  • ZIO Actors [Mateusz Sokol] (Video)
  • Adventures in Type-safe Error Handling [Jacob Wang]
  • Composition using Arrows and Monoidal Categories [Oleg Nizhnik]
  • Practical Logic(al) Programming with Dotty [Lander Lopez]
  • Next-Level Type Safety: An Intro to Generalized Algebraic Data Types [Matthias Berndt]
  • KEYNOTE: The Many Faces of Modularity [Eric Torreborre]

See Agenda

Maybe I will write more about some topics.

Talks will be on youtube in the near future.

Visit to reClojure in London 2019

On 2019-12-02 I visited the conference reClojure.

This was an admirable community effort to create a replacement for Clojure eXchange, which simply did not happen because of the bankruptcy of Skillsmatter.

There was only one track, so the schedule is exactly what I attended.

I will just copy it below, because schedules from conference sites usually disappear after some time:

  • Building stuff with Clojure and 3D Printing. Clément Salaün.
    How to design objects with Clojure, OpenSCAD and then 3D print them. This talk covers the motivations, basic concepts and features with a live demo.
  • Clojure Art. Karl Brodowsky.
    Teaching or learning Clojure using images has been proven to be fun and beneficial! In this talk, learn how.
  • Growing Mobile Apps with ClojureScript and React Native. Daniel Neal.
    Starting things is fun, but growing them can be a real challenge – and mobile apps are no different…
  • Live Coding a Mandelbrot Renderer. Peter Westmacott.
    In this talk, Peter will demonstrate live coding of a fractal renderer, with the aim to show how complex beauty can emerge from simple mathematical rules and a little code.
  • Pizza Party Lunch (Thank You uSwitch!)
    Short 10 minute talks. Various Speakers.
  • Unleash the power of the REPL. Dana Borinski.
    Return to basics and dive into how to leverage the REPL to solve problems and debug more quickly – and with the added bonus of honing our Clojure skills!
  • Generating Generators. Andy Chambers.
    Generating data for use in tests can be laborious and boring. However, using the database’s information schema you can alleviate that! Discover the ways to achieve this.
  • Living in a Box. Life in Containers with the JVM. Matthew Gilliard.
    A focus on how containers and the JVM interact and what implications are there for Clojure Developers. Get the best results from the work gone into OpenJDK container support.
  • Closing Keynote – Code, meet data! Malcolm Sparks.
    Computers have 3 jobs: Input, process, output. How have we made such a mess of something so fundamental? Observations, opportunities for Clojurists and hope for the future.

There is a youtube channel for reClojure, where we can now find recordings of the talks.

How to get rid of these HTML-entities in Files

It has been written here that HTML-entities (these &auml; etc.) should be avoided, with the exception of those that we need due to the HTML syntax, like &lt;, &gt;, &amp; and maybe &quot; and &nbsp;. They were already mostly obsolete more than 20 years ago, but in those days we still did not automatically use UTF-8 or UTF-16, but often an 8-bit character encoding that could express only up to 256 characters, in reality around 200 due to control characters. At least these 200 could be used. That was enough for web pages in those days, and texts in German, French, Russian, Greek, Hebrew, Arabic and many other languages could well be written, as long as only one language or a few similar languages were used. For the rare occasions that required some characters that were not in this character set, it was an option to rely on these HTML-entities. The same was true for typing HTML pages on a US keyboard without any good tool support.

But now Unicode has been around for more than 25 years and more than 90% of the web pages use UTF-8.

Now some people think that these HTML-entities are somehow necessary or at least „safer“, and I still see people writing HTML code with them these days. Or tools by relatively well-known companies that produced such output not so long ago… It is a good thing to have some courage and to change something like this to a readable and natural format. Or more generally to try out whether a simpler or better solution works. Reasonable courage is good for this; too much of something good can go bad, as so often…

So, please teach your colleagues not to use these ugly HTML-entities where UTF-8 characters are the better option.

And here is a perl script that converts the HTML-entities with the exceptions mentioned above to UTF-8. In the project conversion-utils some more such scripts might be added. The script is a bit too long to be pasted inline in a code block, so it is better to find the current version on github.

Then you can do something like this:

git commit
for file in *.html ; do
    echo "$file"
    mv "$file" "${file}~entities~"
    html2utf8 < "${file}~entities~" > "$file"
    echo "/$file"
done
git diff

to convert all files in a directory. I assume that you are using Linux or at least have bash, for example via Cygwin.
There are other tools to do the same thing, I am sure. Just use anything that works for you to get away from this unreadable crap.
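
The idea of such a conversion can also be sketched in a few lines of Java. This is not the Perl script mentioned above, just an illustration under assumptions: the class name EntityToUtf8 is invented, and the table of named entities is a deliberately tiny sample, where a real tool needs the full list. Unicode escapes are used for the replacement characters only so that the sketch compiles regardless of the source file encoding.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class EntityToUtf8 {
    private EntityToUtf8() {}

    // Entities that the HTML syntax needs; these stay untouched.
    private static final Set<String> KEEP = Set.of("lt", "gt", "amp", "quot", "nbsp");

    // Tiny sample table (umlauts, sharp s, e-acute, euro sign).
    private static final Map<String, String> NAMED = Map.of(
        "auml", "\u00E4", "ouml", "\u00F6", "uuml", "\u00FC",
        "Auml", "\u00C4", "Ouml", "\u00D6", "Uuml", "\u00DC",
        "szlig", "\u00DF", "eacute", "\u00E9", "euro", "\u20AC");

    private static final Pattern ENTITY =
        Pattern.compile("&(#[0-9]{1,7}|#[xX][0-9a-fA-F]{1,6}|[A-Za-z][A-Za-z0-9]*);");

    /** Replaces known entities by the characters they stand for. */
    public static String decode(String html) {
        return ENTITY.matcher(html).replaceAll(
            match -> Matcher.quoteReplacement(replacement(match.group(1), match.group())));
    }

    private static String replacement(String name, String original) {
        if (name.startsWith("#")) {
            boolean hex = name.charAt(1) == 'x' || name.charAt(1) == 'X';
            int cp = hex ? Integer.parseInt(name.substring(2), 16)
                         : Integer.parseInt(name.substring(1), 10);
            boolean surrogate = cp >= 0xD800 && cp <= 0xDFFF;
            // Numeric forms of the syntax characters must stay escaped as well.
            boolean syntax = cp == '<' || cp == '>' || cp == '&' || cp == '"' || cp == 0xA0;
            if (cp == 0 || !Character.isValidCodePoint(cp) || surrogate || syntax) {
                return original;
            }
            return new String(Character.toChars(cp));
        }
        if (KEEP.contains(name)) {
            return original;
        }
        // Unknown names are left as they are rather than guessed.
        return NAMED.getOrDefault(name, original);
    }

    public static void main(String[] args) {
        System.out.println(decode("Gr&uuml;&szlig;e &lt;b&gt; &amp; 5 &#8364;"));
    }
}
```

Leaving unknown entities unchanged keeps the conversion safe to run repeatedly, and the git diff from the loop above shows exactly what was replaced.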

Devoxx UA and Devoxx BE 2019

In 2019 I visited Devoxx UA in Kiev and Devoxx BE in Antwerp.
Traveling was actually a little story by itself, so for now we can just assume that I magically was at the locations of DevoxxUA and DevoxxBE.

In Kiev I attended the following talks:

On Wednesday I attended the following talks in Antwerp:

On Thursday I attended the following talks in Antwerp:

On Friday I attended the following talks in Antwerp:

That’s it…
As always, a lot of these topics deserve an article in this blog. And a lot of video recordings from the conference are worth viewing.
