Phone Numbers and E-Mail Addresses

Most data that we deal with are strings or numbers or booleans and combinations of these into classes and collections. Dates can be expressed as string or number, but have enough specific logic to be seen as a third group of data. All these have interesting aspects, some of which have been discussed in this blog already.

Now phone numbers are by an naïve approach numbers or strings, but very soon we see that they have their own specific aspects. The same applies for email addresses which can be represented as strings.

Often projects go by their own „simplified“ specification of what an email address or a phone number is, how to parse, compare and render them. In the end of the day the simplification is harder to tame than the real solution, because it needs to be maintained and specified by the project team rather than being based on a proven library. And once in a while „edge cases“ occur, that cannot be ignored and that make the „home grown“ library even more complex.

Behind phone numbers and email addresses there are well defined and established standards and they are hard to understand thoroughly within the constrained time budget of a typical „business project“, because the time should be allocated to enhancing the business logic and not to reinventing the basics. Unless there is a real need to do so, of course.

Just to give an idea: When phone numbers are parsed or provided by user input, they can start with a „+“ sign or use some country specific logic to express, to which country they belong. And then the „+1“, for example, does not stand for the United States alone, but also for Canada and some smaller countries that are in some way associated with the United States or Canada. Further analysis of the number is required to know about that. The prefix for international number is often „00“, but in the United States it is „011“ and there were and are some other variants, that are still frequently used. Some people like to write something like „+49(0)431 77 88 99 11 1“ instead of „+49 431 77 88 99 11 1“. We can constrain the input to the variants we happen to think of and force the supplier of data to comply, but why bother? Why not accept legitimate formats, as long as they are correct and unambiguous?

Now for E-Mail-addresses there is the famous one page regular expression to recognize correct email addresses which is even by itself not totally complete. Find it at the bottom of the article…

Of course it includes some rarely used variants of email addresses that were once used and have not been completely abolished officially, but it is hard to draw and exact border for this.

So the general recommendation is to find a good library for working with email addresses and phone numbers. Maybe the library can even to some extent eliminate input strings that are formally complying the format, but know to be incorrect by knowing about numbering schemes world wide or about email domains or even by performing lookups.

Another strong recommendation is to store data like email addresses and phone numbers in a technical format, that is in the example of phone numbers always starting with a „+“ followed by digits only. For input any positioning of spaces is accepted, for output the library knows how to format it correctly. This allows selecting by the numbers without dealing with complex formatting, by just using the technical format in the query as well.

For Java (and thus for many JVM-languages), C++ and JavaScript there is an excellent library from Google for dealing with phone numbers. For E-Mails something like apache commons email validator is a way to go.

Keep in mind that for E-Mail addresses and phone numbers, the ultimate way of verification is to send them a link or a code that they need to enter. In the end of the day it is insufficient to rely only on formal verification without this final step.

But still issues remain for transforming data into a canonical technical format for storing them, formatting data for display etc. And there is a huge added value, if we can reliably recognize formally false entries early, when the user can still easily react to it, rather than waiting for an email/SMS/phone call being processed, which may fail when the user is no longer on our „registration site“. And we can process data which has already been verified by a third party, but still we want to parse it to recognize obvious errors.

The concrete libraries may be outdated by the time you are reading this, or they may not be applicable for the language environment that you are using, but please make an effort to find something similar.

So, please use good libraries, that are like to be found for the environment that you are using and write yourself what creates value for your project or organization. Unless your goal is really to write a better library. Better invest the time into areas where there are still no good libraries around.

And as always, you may understand email addresses and phone numbers as an example for a more general idea.

Links

E-Mail Regex

Source: https://emailregex.com/:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?: \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\ ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?: (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t] )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*) *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\]( ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(? :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(? :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<> @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(? :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:( ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(? :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n) ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<> @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)? (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?: \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t]) *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\ .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:( ?:\r\n)?[ \t])*))*)?;\s*)

Share Button

Orthodox Christmas 2019/2020

Orthodox Christmas 2019/2020 in Ukraine and probably some other countries is on 2020-01-07.


God Jul! — Feliĉan Kristnaskon! — ميلاد مجيد — Natale hilare! — Hyvää Joulua! — Срећан Божић! — Prettige Kerstdagen! — クリスマスおめでとう ; メリークリスマス — З Рiздвом Христовим! — Buon Natale! — Joyeux Noël! — С Рождеством! — Frohe Weihnachten! — ¡Feliz Navidad! — Crăciun fericit! — Merry Christmas! — καλά Χριστούγεννα! — God Jul!

This text was generated with a C# program (using Mono on Linux):

using System;
using System.Collections.Generic;
using static System.Collections.Generic.KeyValuePair;
using System.Linq;

class OrthodoxChristmas20192020 {
    private static string[] arr = new string[] {
        "Prettige Kerstdagen!",
        "God Jul!",
        "Crăciun fericit!",
        "クリスマスおめでとう ; メリークリスマス",
        "God Jul!",
        "Feliĉan Kristnaskon!",
        "Hyvää Joulua!",
        "ميلاد مجيد",
        "Срећан Божић!",
        "καλά Χριστούγεννα!",
        "З Рiздвом Христовим!",
        "Natale hilare!",
        "Buon Natale!",
        "Joyeux Noël!",
        "Frohe Weihnachten!",
        "С Рождеством!",
        "Merry Christmas!",
        "¡Feliz Navidad!"
    };

    public static void Main() {
        Random rnd = new Random();

        var shuffled = from item in arr.Select(s => new KeyValuePair<int, string>(rnd.Next(), s)) orderby item.Key select item.Value;
        int count = 0;
        foreach (string s in shuffled) {
            if (count++ > 0) {
                Console.Write(" — ");
            }
            Console.Write(s);
        }
        Console.WriteLine();
    }
}

Share Button

How to replace svn:keywords?

In the old days we used svn, cvs, rcs or other systems for source code management, that allowed enabling something like svn:keywords. This resulted in certain strings in the source code being replaced by strings containing some version information.

More often than we might think these were useful. The question „what version are we running?“ is often answered, but surprisingly often not correctly.

Now putting the version information into a comment or even better into a string that might even be logged or that might at least be extracted by using something like

strings xyz |egrep '\$Id.*\$'

allows to find out.

Now we are using git instead of svn, or at least we should be using git or plan our migration to git. There are other tools like Mercurial, that are probably just as good as git, but git is most common and every developer knows it or has to learn it anyway to stay in business.

Now git is not supporting these svn:keywords or at least not as easily, because it relies sha-checksums, which does not allow for changing file contents. There are some tricks like pre-checking and post-checkout scripts that might solve such issues, but this is kind of difficult to tame, due to the distributed characters of git including a local repo on each developers machine.

So it is better to accept that the time of this svn:keywords-stuff is over and look for something new. As an example we will consider the world of Java and JVM languages. Most use a Jenkins server to compile the software.

To create a release, even a temporary release or a release just for testing, the right way is to first label the head of the branch we are working on, then check out based on this label, compile that and upload it to the artifactory, if it is successful. Maybe rename the label or and another label. If not, maybe delete the label, depending on the processes.

Now the jar-files contain a META-INF-directory and a MANIFEST.MF. This should be the right place to put version information during such a build. More or less this can provide the same benefit as the svn:keywords, but it works with git and needs only be done in one place.

Details about how to do it will can be found out when needed.

I assume that the same approach can also be accomplished for other environments. We can even find ways that the software logs its version by changing a string in a source code file during the build process.

Share Button

Happy New Year 2020

Un an nou fericit! — Onnellista uutta vuotta! — Feliĉan novan jaron! — Καλή Χρονια! — ¡Feliz año nuevo! — С новым годом! — FELIX SIT ANNUS NOVUS — Godt nytt år! — Щасливого нового року! — Frohes neues Jahr! — Felice anno nuovo! — Bonne année! — Gott nytt år! — Срећна нова година! — عام سعيد — Gullukkig niuw jaar! — Happy new year!

This is generated with a Java 13 program using Lambdas and secure random numbers:

import java.security.SecureRandom;
import java.util.List;
import java.util.stream.Stream;
import java.util.stream.Collectors;

public class HappyNewYearJava8 {

    private static final class Element implements Comparable<Element> {
        Element(Long sortKey, String text) {
            this.sortKey = sortKey;
            this.text = text;
        }

        private Long sortKey;
        private String text;

        public String getText() {
            return text;
        }

        public int compareTo(Element e) {
            return this.sortKey.compareTo(e.sortKey);
        }
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        List<String> list = Stream.of("Frohes neues Jahr!",
                                      "Happy new year!",
                                      "Gott nytt år!",
                                      "¡Feliz año nuevo!",
                                      "Bonne année!",
                                      "FELIX SIT ANNUS NOVUS",
                                      "С новым годом!",
                                      "عام سعيد",
                                      "Felice anno nuovo!",
                                      "Godt nytt år!",
                                      "Gullukkig niuw jaar!",
                                      "Feliĉan novan jaron!",
                                      "Onnellista uutta vuotta!",
                                      "Срећна нова година!",
                                      "Un an nou fericit!",
                                      "Щасливого нового року!",
                                      "Καλή Χρονια!")
            .map(s->new Element(random.nextLong(), s))
            .sorted()
            .map(Element::getText)
            .collect(Collectors.toList());

        System.out.println(String.join(" — ", list));
    }
}

Share Button