Perl Scripts for editing

Even though we do have IDEs with quite powerful refactoring mechanisms for many languages, it is still sometimes useful, to have another automated editing mechanism.

Why would that be the case?

Some examples:

In some cases there was an SQL script to create the database tables, which had been written first. From that classes and even CRUD operations in something like JDBC or DBI can be created. Even though most Java-projects that I have seen recently prefer using Hibernate (or more generally JPA2), there are some benefits in doing just plain old JDBC. This requires some discipline in writing the SQL in a uniform way and it was sometimes even necessary to have „magical comments“ that were ignored by the DB, but used by the script. This can save a lot of work and errors.

In some cases there were large numbers of HTML files. Now a similar kind of change had to be applied to each of them. Using a Script to parse the HTML file and to apply the changes can save a lot of work and provide consistency that is often badly needed. And if the whole thing happens in the context of an agile project, then the stakeholders might want to see the outcome and come up with further ideas. No big deal, just change the transformation scripts and check it out again. I recommend thinking in terms of larger transformation steps. Then I would retain the original files and apply the script for the step until the outcome is ok. Then this can be committed to git and the next step can be worked on. If the intermediate results are not useful for anybody else, just wait with the push or better work on a branch.

Sometimes refactorings have to be done that are not easily supported by the IDE. For example a whole bunch of classes need to be moved to different packages according to some new naming rule. Just find all the classes with their old package names, then move to the new directory structure and rename imports and package declarations to the new structure. This can easily be done with a Script. Always remember to be careful if there is Reflection involved, this will most likely break all refactorings, no matter if by IDE or by script.

Also it is often useful to use scripts to analyze code and to find occurrences of certain patterns.

These things can be done with scripts in Ruby, Python, Perl or Raku. All of these are valid options. Ruby and Raku are somewhat more sophisticated languages than the other two. And Perl and Raku have the most sophisticated Regex capabilities. I would assume that Raku is the best choice, if you start from scratch, it even supports grammars out of the box, which might be a way to address such issues. Perl has it as add on libraries. Of course it is also useful to work with what you know. But sometimes it is worth learning the right tool instead of just using the golden hammer to put in screws.

So in my case the tool for this is currently Perl. It does the job, is available more or less everywhere and I know it well enough. It might be worth moving to Raku in the future.

Some things that often work for simple scripts are the following:

Try to start with normalizing the input files, this might be the first transformation step. Remove trailing spaces, replace tabs with spaces, maybe normalize idention, replace line endings by LF only without CR. In HTML replace HTML-entities for UNICODE-characters by the appropriate Unicode characters, convert everything to UTF-8 etc.
If you have data like phone numbers and dates in the file that are meaningful for your further steps, bring them to a standard format. Of course always depending on your local circumstances, but this is usually the right way to go. This might be partially done with an external tool like xmllint.

For the next steps we need to consider the issue that we have multiple lines. There are regex-variants that do not stop at line boundaries. But if we read the content line wise, it can still be a bit more work. Sometimes it is easier to just replace line feeds by some marker string that does not otherwise occur in your files (check it before!) and then apply usual regex on this longer line.

What we often need is a regex that finds the shortest and not the longest match. If we write something like /a.*b/, this will look for a sequence that starts with an „a“ and ends with the last „b“ that can be found. Often we want to end with the first „b“. This can be achieved by /a.*?b/.

Another pattern that is often useful for such scripts is to work with a state machine. If a certain pattern is discovered, we react to it depending on the state and possibly change the state. So we can apply changes to something only if it occurs in a certain context.

Share Button

JSON instead of Java Serialization: The solution?

We start recognizing that Serialization is not such a good idea.

It is cool and can really work on a wide range of objects, even including complex and cyclic reference graphs. And it was essential for some older Java frameworks like EJB and RMI, which allowed remote access to Java objects and classes.

But it is no longer the future, Oracle will soon deprecate and later remove it. And it will happen this time, even though they really keep stuff around for a long time due to compatibility requirements.

Just to recap: it opens up security discussion, it opens up hidden behavior and makes it harder to reason about code, it creates tight coupling between remote components and it can result in bugs, that only occur at runtime and cannot be discovered at compile time. In short, it is not resilient.

So we need something else. Obvious candidates are XML, YAML and JSON. XML is of course an option and is powerful enough to do many things, but often a bit too clumbsy and too much boiler plate, so we try to move away from it. YAML and JSON kind of do the same thing, but it seems that JSON is winning the race and we all need to know JSON and many of us tend to skip YAML.

So why not use JSON. It is easy, it has good libraries and we can even find databases that work with JSON.

What JSON can express very well are scalars, lists and maps and combinations of these. This is quite exactly what we have in Perl, JavaScript or Clojure as basic building blocks. These languages support object oriented programming, but for simple stuff we go with these basic building blocks. And objects can be modelled as (hash-)maps, with the attribute names as keys. Actually JSON is valid JavaScript code.

We do have to change our thinking when moving from Java Serialization to JSON. JSON does not store any serializable object but just data. Maybe that is enough and that is what we actually want. It totally works in heterogeneous environments, where we are using different programming languages or different implementations.

There are good libraries. I have tried two, Jackson and GSON which both work well, recently mostly Jackson. It is important to think of Clojure, JavaScript, Perl or something like that without objects. So we loose type information, which can be considered good or bad, but if we can arrange ourselves with it, we avoid the tight coupling. JavaBeans are expressed exactly the same as a HashMap with the attribute names as keys. We can provide the top level class when deserializing, but at the child levels it will not be able to figure that out, if it relies on runtime information.

Example Code

Here it has been tried out. Find full example code on github.

A class that contains all kinds of stuff. Not prepared for really putting in nulls, but it is just experimental code…


package net.itsky.jackson;

import java.util.List;
import java.util.Map;
import java.util.Set;

public class TestObject {
    private Long l;
    private String s;
    private Boolean b;
    private Set set;
    private List list;
    private Map map;

    public TestObject(Long l, String s, Boolean b, Set set, List list, Map map) {
        this.l = l;
        this.s = s;
        this.b = b;
        this.set = set;
        this.list = list;
        this.map = map;
    }

    public TestObject() {
        // only for framework purposes
    }

    public Long getL() {
        return l;
    }

    public String getS() {
        return s;
    }

    public Boolean getB() {
        return b;
    }

    public Set getSet() {
        return set;
    }

    public List getList() {
        return list;
    }

    public Map getMap() {
        return map;
    }

    @Override
    public String toString() {
        return getClass().getSimpleName() + "("
+                "l=" + l + " (" + l.getClass() + ") "
                + " s=\"" + s + "\" (" + s.getClass() + ") "
                + " b=" + b + " (" + b.getClass() + ") "
                + " set=" + set  + " (" + set.getClass() + ") "
                + " list=" + list + " (" + list.getClass() + ") "
                + " map=" + map + " (" + map.getClass() + "))";
    }
}

And this is used for running everything. To play around more, it should probably be moved to tests..

package net.itsky.jackson;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableSet;

import java.io.StringReader;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class App {

    public static void main(String[] args) {
        try {
            Set s1 = ImmutableSet.of(1, 2, 3);
            Set s2 = ImmutableSet.of(1, 2, 3);
            Map m1 = ImmutableMap.of("A", "abc", "B", 3L, "C", s1);
            List l1 = ImmutableList.of("i", "e", "a", "o", "u");
            TestObject t1 = new TestObject(30303L, "uv", true, s2, l1, m1);
            Map m2 = ImmutableMap.of("r", "r101", "s", 202, "t", t1);
            List l2 = ImmutableList.of("ä", "ö", "ü", "å", "ø");
            Set s3 = ImmutableSet.of("x", "y", "z");
            TestObject t2 = new TestObject(40404L, "ijk", false, s3, l2, m2);
            ObjectMapper mapper = new ObjectMapper();
            ObjectWriter writer = mapper.writerWithDefaultPrettyPrinter();
            System.out.println("t2=" + t2);
            String json = writer.writeValueAsString(t2);
            System.out.println("json=" + json);
            StringReader stringReader = new StringReader(json);
            TestObject t3 = mapper.readValue(stringReader, TestObject.class);
            System.out.println("t3=" + t3);
        } catch (Exception ex) {
            RuntimeException rex;
            if (ex instanceof RuntimeException) {
                rex = (RuntimeException) ex;
            } else {
                rex = new RuntimeException(ex);
            }
            throw rex;
        }
    }
}

And here is the output:

t2=TestObject(l=40404 (class java.lang.Long)  s="ijk" (class java.lang.String)
  b=false (class java.lang.Boolean)  
set=[x, y, z] (class com.google.common.collect.RegularImmutableSet)
  list=[ä, ö, ü, å, ø] (class com.google.common.collect.RegularImmutableList)
  map={r=r101, s=202, 
t=TestObject(l=30303 (class java.lang.Long)
  s="uv" (class java.lang.String)  b=true (class java.lang.Boolean)
  set=[1, 2, 3] (class com.google.common.collect.RegularImmutableSet)
  list=[i, e, a, o, u] (class com.google.common.collect.RegularImmutableList)
  map={A=abc, B=3, C=[1, 2, 3]} (class com.google.common.collect.RegularImmutableMap))}
 (class com.google.common.collect.RegularImmutableMap))
json={
  "l" : 40404,
  "s" : "ijk",
  "b" : false,
  "set" : [ "x", "y", "z" ],
  "list" : [ "ä", "ö", "ü", "å", "ø" ],
  "map" : {
    "r" : "r101",
    "s" : 202,
    "t" : {
      "l" : 30303,
      "s" : "uv",
      "b" : true,
      "set" : [ 1, 2, 3 ],
      "list" : [ "i", "e", "a", "o", "u" ],
      "map" : {
        "A" : "abc",
        "B" : 3,
        "C" : [ 1, 2, 3 ]
      }
    }
  }
}
t3=TestObject(l=40404 (class java.lang.Long)  s="ijk" (class java.lang.String)  
b=false (class java.lang.Boolean)  
set=[x, y, z] (class java.util.HashSet)  
list=[ä, ö, ü, å, ø] (class java.util.ArrayList)  
map={r=r101, s=202, 
t={l=30303, s=uv, b=true, 
set=[1, 2, 3], 
list=[i, e, a, o, u], 
map={A=abc, B=3, C=[1, 2, 3]}}}
  (class java.util.LinkedHashMap))

Process finished with exit code 0

So the immediate object and its immediate attributes were deserialized properly to what we provided. But everything inside went to maps, lists and scalars.

The intermediate JSON does not carry the type information at all, so this is the best that can be done.
Often it is useful what we want. If not, we need to find something else or see if we can tweak JSON to carry type information.

It will be interesting to explore other serialization protocols…

Links

Share Button

Project-local Libraries

Many projects have these „project-local“ or „company-local“ libraries, that can be used optionally or are even strongly imposed on developers. They may be called something like

  • core
  • toolkit
  • toolbox
  • base
  • platform
  • utils
  • lib
  • common
  • framework
  • baselib
  • sdk
  • tools

or even with a meaningful good name.

Generally there is nothing wrong with such libraries, if they are used correctly.

Some things should be observed, though: Such libraries are there to be useful. If that is not the case, it is better not to use them and to deprecate and discard them. Quite a lot of projects suffer from the fact that inferior local libraries are imposed on them and have to be used. The strive for „consistency“ is not bad either, but it should be kept to a level at which it is useful and not become a primary goal by itself.

So when need for a functionality arises, an existing library might cover this well enough. If it is not too hard to integrate, there is little need to write a „local library functionality“ for this. If no good library can be found or reasonably be integrated, it is a good idea to write what is needed oneself. It should be observed, that such a library should have slightly higher quality standards and very good automated testing, because it is more universal and the actual and future usage cannot be anticipated as easily as with „business code“. There are places, where it is good to impose a local library, to make sure that certain fields are validated in the same way across the software or even across the organization, for example. For data like phone numbers, dates, email addresses etc. that are commonly used in our world, it is probably possible to find good libraries that do this much better than any local library with reasonable team effort could do.

Now the world moves and we might discover better replacements for parts of the local library. It is a good idea to move on to these better replacements.

Things that can go into a local library are small pieces of business logic that need to be used everywhere. Think of an insurance company. Customers have customer numbers. They have to be entered into web application forms and they have to be checked for correctness, they have to be formatted etc. This could go into a library and we could be sure that everyone in the larger project uses the same rules for what is a valid customer number and how to format and parse them. It is a bit too hard to implement it again and again, because tiny errors sneak in, for example into the code to check for validity with some check sum, but it is too easy to put it into a service that can be called to check formal validity. Such a service will be there to check if some formally correct customer number is actually a real customer’s number, maybe with constraints on who can see this for which customer. And of course to generate new customer numbers, when needed.

It is always good to ask twice if certain functionality should rather sit in a service that can be called or in a library, of course depending also on the local architecture preferences. In cases where the functionality is useful for others in the organization, but where the exact same behavior is less important, it can also be an option to just copy the code to different teams. But this should be used with care, because it means that improvements do not easily flow back to benefit other users of the functionality.

Generally it is an observation that organizations rarely have good „local libraries“. The good libraries are found on the internet. Or only exist within the team. And the bad company libraries are often forced on those who cannot find ways to opt out. Good team libraries that are not known outside of the team can sometimes be a loss for the organization, because things are done twice or even worse are done differently where it would be good to have the same behavior.

Share Button

Perl and Scala: what can they learn from each other?

Ironically Scala at first drew my interest, because I discovered that about ten years ago there was no really good understanding of how to do a good multithreading concept for Perl 6. I thought exploring how they do it in Scala, where it was already known to be good at that time, would give a more general understanding to this issue. At this time Perl 6 (now named „Raku“) was intended to rather go without multithreading capabilities than doing them badly. In the end I got dragged into Scala and found that by itself more interesting than the original issue. And Perl 6 community eventually found good answers for providing multithreaded capabilities anyway.

So why there are technical concepts in both of these languages, that are interesting and possibly could in some way be applied to the other one, there is an interesting parallel.

Both Scala and Perl have been „cool languages“ that were really strong in an area or even in a broader range of application areas. Both of them found a competitor, that was kind of an „inferior clone“ of them. PHP in its early versions was very similar to Perl, but „simplified“ and kind of a subset of what Perl provided. At that time Perl had a real boom, because the first Web applications came up and the only reasonable way to go was CGI and of course it was done with Perl. There were some early alternatives like Cold Fusion and ASP, but they never really become main stream, at least not outside of their respective communities. Now PHP eventually took over most of Perl’s CGI and has become a major building block of our current WWW. Wikipedia and this Blog run on PHP. Perl eventually also lost its leading position as system administration scripting language to Ruby and even more to Python and some others, but it is still there and has strong string parsing capabilities and a very useful ecosystem of libraries called CPAN.

Now Scala has found Kotlin to be a similar competitor. Besides being somewhat simpler Kotlin also shines with good tooling support. It comes from the same organization as IntelliJ IDEA, which is the usual IDE for most JVM-languages for people who rely neither on Emacs nor vi. So Kotlin support in IntelliJ is always going to be a high priority. And Kotlin is officially supported by Google as programming language for Android-apps. It seems to work well, allows for more modern development than the supported Java versions and has conceptionally a lot of similarity with Swift, which is the most modern programming language supported by Apple for IOS-Apps. There have been heroic and admirable approaches to allow for Android App development using other JVM-languages, especially Scala. But they all suffer from the same set of problems. In order to avoid installing too much language specific code in the app, dynamic language features that would require a compile capability, as commonly used for Groovy or Clojure have to be avoided. And the excessive use of the languages libraries has to be avoided, because they are not on the phone already, but a copy of them has to be shipped with the app, for each App. So the storage usage is much more than for Kotlin and Java Apps. And then we see an attempt, to reduce the size of the libraries, by only including what is needed. That is necessary, but it looks too fragile to really trust it. So, for Mobile Apps, it is Kotlin. Period. And then Kotlin is already there, so why not use it on the server as well. Yes, I do believe Scala is better, but that is not what everyone thinks and it needs to be much better to justify the additional language, where App-development for Android is already happening.

Now both Perl and Scala had some problems. To some extent, they are even sharing the exact same problem. It was the possibility to write really „cool“ code that was very smart, very short and could not be read by anybody else without very much time and very much knowledge. This can be done in any language, but Perl is the number one for this and I would put Scala as number two and C++ and C as number three and four, from the languages, that I know. It is a good idea to use some coding standards that allow for clean Scala or Perl code. But please remain reasonable and do not let bureaucrats come in charge of the coding standards to create a monster that drains all creativity. Allow using powerful features, but use them in a decent and readable way.

Now in both cases, there was an effort, to write a new version of the language, that was meant to be slightly incompatible and cleaned up some of the weaknesses and brought some improvements. In case of Perl this was Perl 6. It was developed for around 20 years and came out a few years ago. Eventually it turned out too different, so it was renamed to Raku. For Scala, a new language called „Dotty“ was developed. It was decided to make this the next major version, Scala 3. Even though it is much closer to Scala 2 than Raku to Perl 5, it is still incompatible and requires an effort to rewrite code. It is already seen that large Perl 5 projects are hardly moving to Raku, so Perl 5 is there to stay and Raku is just a second language within the same community. This will probably not happen like that with Scala, and the core language team will probably at some point of time concentrate on Scala 3. But large organizations that heavily invested into Scala cannot easily migrate, simply because it needs a lot of time and money. So we will probably also see some long term coexistence of Scala 2 and Scala 3. Maybe Scala 2 will be forked by major adopters. Or it will be supported for money from Lightbend.

Share Button

Phone Numbers and E-Mail Addresses

Most data that we deal with are strings or numbers or booleans and combinations of these into classes and collections. Dates can be expressed as string or number, but have enough specific logic to be seen as a fourth group of data. All these have interesting aspects, some of which have been discussed in this blog already.

Now phone numbers are by an naïve approach numbers or strings, but very soon we see that they have their own specific aspects. The same applies for email addresses which can be represented as strings.

Often projects go by their own „simplified“ specification of what an email address or a phone number is, how to parse, compare and render them. In the end of the day the simplification is harder to tame than the real solution, because it needs to be maintained and specified by the project team rather than being based on a proven library. And once in a while „edge cases“ occur, that cannot be ignored and that make the „home grown“ library even more complex.

Behind phone numbers and email addresses there are well defined and established standards and they are hard to understand thoroughly within the constrained time budget of a typical „business project“, because the time should be allocated to enhancing the business logic and not to reinventing the basics. Unless there is a real need to do so, of course.

Just to give an idea: When phone numbers are parsed or provided by user input, they can start with a „+“ sign or use some country specific logic to express, to which country they belong. And then the „+1“, for example, does not stand for the United States alone, but also for Canada and some smaller countries that are in some way associated with the United States or Canada. Further analysis of the number is required to know about that. The prefix for international number is often „00“, but in the United States it is „011“ and there were and are some other variants, that are still frequently used. Some people like to write something like „+49(0)431 77 88 99 11 1“ instead of „+49 431 77 88 99 11 1“. We can constrain the input to the variants we happen to think of and force the supplier of data to comply, but why bother? Why not accept legitimate formats, as long as they are correct and unambiguous?

Now for E-Mail-addresses there is the famous one page regular expression to recognize correct email addresses which is even by itself not totally complete. Find it at the bottom of the article…

Of course it includes some rarely used variants of email addresses that were once used and have not been completely abolished officially, but it is hard to draw and exact border for this.

So the general recommendation is to find a good library for working with email addresses and phone numbers. Maybe the library can even to some extent eliminate input strings that are formally complying the format, but know to be incorrect by knowing about numbering schemes world wide or about email domains or even by performing lookups.

Another strong recommendation is to store data like email addresses and phone numbers in a technical format, that is in the example of phone numbers always starting with a „+“ followed by digits only. For input any positioning of spaces is accepted, for output the library knows how to format it correctly. This allows selecting by the numbers without dealing with complex formatting, by just using the technical format in the query as well.

For Java (and thus for many JVM-languages), C++ and JavaScript there is an excellent library from Google for dealing with phone numbers. For E-Mails something like apache commons email validator is a way to go.

Keep in mind that for E-Mail addresses and phone numbers, the ultimate way of verification is to send them a link or a code that they need to enter. In the end of the day it is insufficient to rely only on formal verification without this final step.

But still issues remain for transforming data into a canonical technical format for storing them, formatting data for display etc. And there is a huge added value, if we can reliably recognize formally false entries early, when the user can still easily react to it, rather than waiting for an email/SMS/phone call being processed, which may fail when the user is no longer on our „registration site“. And we can process data which has already been verified by a third party, but still we want to parse it to recognize obvious errors.

The concrete libraries may be outdated by the time you are reading this, or they may not be applicable for the language environment that you are using, but please make an effort to find something similar.

So, please use good libraries, that are like to be found for the environment that you are using and write yourself what creates value for your project or organization. Unless your goal is really to write a better library. Better invest the time into areas where there are still no good libraries around.

And as always, you may understand email addresses and phone numbers as an example for a more general idea.

Links

E-Mail Regex

Source: https://emailregex.com/:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?: \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\ ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?: (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t] )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*) *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\]( ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(? :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(? :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<> @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(? :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:( ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(? :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n) ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<> @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)? (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?: \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t]) *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\ .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:( ?:\r\n)?[ \t])*))*)?;\s*)

Share Button

Orthodox Christmas 2019/2020

Orthodox Christmas 2019/2020 in Ukraine and probably some other countries is on 2020-01-07.


God Jul! — Feliĉan Kristnaskon! — ميلاد مجيد — Natale hilare! — Hyvää Joulua! — Срећан Божић! — Prettige Kerstdagen! — クリスマスおめでとう ; メリークリスマス — З Рiздвом Христовим! — Buon Natale! — Joyeux Noël! — С Рождеством! — Frohe Weihnachten! — ¡Feliz Navidad! — Crăciun fericit! — Merry Christmas! — καλά Χριστούγεννα! — God Jul!

This text was generated with a C# program (using Mono on Linux):

using System;
using System.Collections.Generic;
using static System.Collections.Generic.KeyValuePair;
using System.Linq;

class OrthodoxChristmas20192020 {
    private static string[] arr = new string[] {
        "Prettige Kerstdagen!",
        "God Jul!",
        "Crăciun fericit!",
        "クリスマスおめでとう ; メリークリスマス",
        "God Jul!",
        "Feliĉan Kristnaskon!",
        "Hyvää Joulua!",
        "ميلاد مجيد",
        "Срећан Божић!",
        "καλά Χριστούγεννα!",
        "З Рiздвом Христовим!",
        "Natale hilare!",
        "Buon Natale!",
        "Joyeux Noël!",
        "Frohe Weihnachten!",
        "С Рождеством!",
        "Merry Christmas!",
        "¡Feliz Navidad!"
    };

    public static void Main() {
        Random rnd = new Random();

        var shuffled = from item in arr.Select(s => new KeyValuePair<int, string>(rnd.Next(), s)) orderby item.Key select item.Value;
        int count = 0;
        foreach (string s in shuffled) {
            if (count++ > 0) {
                Console.Write(" — ");
            }
            Console.Write(s);
        }
        Console.WriteLine();
    }
}

Share Button

How to replace svn:keywords?

In the old days we used svn, cvs, rcs or other systems for source code management, that allowed enabling something like svn:keywords. This resulted in certain strings in the source code being replaced by strings containing some version information.

More often than we might think these were useful. The question „what version are we running?“ is often answered, but surprisingly often not correctly.

Now putting the version information into a comment or even better into a string that might even be logged or that might at least be extracted by using something like

strings xyz |egrep '\$Id.*\$'

allows to find out.

Now we are using git instead of svn, or at least we should be using git or plan our migration to git. There are other tools like Mercurial, that are probably just as good as git, but git is most common and every developer knows it or has to learn it anyway to stay in business.

Now git is not supporting these svn:keywords or at least not as easily, because it relies sha-checksums, which does not allow for changing file contents. There are some tricks like pre-checking and post-checkout scripts that might solve such issues, but this is kind of difficult to tame, due to the distributed characters of git including a local repo on each developers machine.

So it is better to accept that the time of this svn:keywords-stuff is over and look for something new. As an example we will consider the world of Java and JVM languages. Most use a Jenkins server to compile the software.

To create a release, even a temporary release or a release just for testing, the right way is to first label the head of the branch we are working on, then check out based on this label, compile that and upload it to the artifactory, if it is successful. Maybe rename the label or and another label. If not, maybe delete the label, depending on the processes.

Now the jar-files contain a META-INF-directory and a MANIFEST.MF. This should be the right place to put version information during such a build. More or less this can provide the same benefit as the svn:keywords, but it works with git and needs only be done in one place.

Details about how to do it will can be found out when needed.

I assume that the same approach can also be accomplished for other environments. We can even find ways that the software logs its version by changing a string in a source code file during the build process.

Share Button

Happy New Year 2020

Un an nou fericit! — Onnellista uutta vuotta! — Feliĉan novan jaron! — Καλή Χρονια! — ¡Feliz año nuevo! — С новым годом! — FELIX SIT ANNUS NOVUS — Godt nytt år! — Щасливого нового року! — Frohes neues Jahr! — Felice anno nuovo! — Bonne année! — Gott nytt år! — Срећна нова година! — عام سعيد — Gullukkig niuw jaar! — Happy new year!

This is generated with a Java 13 program using Lambdas and secure random numbers:

import java.security.SecureRandom;
import java.util.List;
import java.util.stream.Stream;
import java.util.stream.Collectors;

public class HappyNewYearJava8 {

    private static final class Element implements Comparable<Element> {
        Element(Long sortKey, String text) {
            this.sortKey = sortKey;
            this.text = text;
        }

        private Long sortKey;
        private String text;

        public String getText() {
            return text;
        }

        public int compareTo(Element e) {
            return this.sortKey.compareTo(e.sortKey);
        }
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        List<String> list = Stream.of("Frohes neues Jahr!",
                                      "Happy new year!",
                                      "Gott nytt år!",
                                      "¡Feliz año nuevo!",
                                      "Bonne année!",
                                      "FELIX SIT ANNUS NOVUS",
                                      "С новым годом!",
                                      "عام سعيد",
                                      "Felice anno nuovo!",
                                      "Godt nytt år!",
                                      "Gullukkig niuw jaar!",
                                      "Feliĉan novan jaron!",
                                      "Onnellista uutta vuotta!",
                                      "Срећна нова година!",
                                      "Un an nou fericit!",
                                      "Щасливого нового року!",
                                      "Καλή Χρονια!")
            .map(s->new Element(random.nextLong(), s))
            .sorted()
            .map(Element::getText)
            .collect(Collectors.toList());

        System.out.println(String.join(" — ", list));
    }
}

Share Button

Christmas 2019

Joyeux Noël! — ميلاد مجيد — Crăciun fericit! — God Jul! — God Jul! — Natale hilare! — С Рождеством! — З Рiздвом Христовим! — Prettige Kerstdagen! — Hyvää Joulua! — クリスマスおめでとう ; メリークリスマス — καλά Χριστούγεννα! — Buon Natale! — Срећан Божић! — Frohe Weihnachten! — ¡Feliz Navidad! — Feliĉan Kristnaskon! — Merry Christmas!

This time the greetings were generated with a C program:

#include <stdio.h>
#include <stdint.h>
#include <openssl/rand.h>

#define N 18
static const uint32_t n = N;

int main(int argc, char **argv) {
  char greetings[N][60] = {
    "С Рождеством!",
    "Hyvää Joulua!",
    "καλά Χριστούγεννα!",
    "Buon Natale!",
    "Prettige Kerstdagen!",
    "З Рiздвом Христовим!",
    "Merry Christmas!",
    "Срећан Божић!",
    "God Jul!",
    "¡Feliz Navidad!",
    "ميلاد مجيد",
    "クリスマスおめでとう ; メリークリスマス",
    "Natale hilare!",
    "Joyeux Noël!",
    "God Jul!",
    "Frohe Weihnachten!",
    "Crăciun fericit!",
    "Feliĉan Kristnaskon!" };
  int32_t i, j;
  uint32_t x;
  uint32_t idx[N];
  int rtc;
  uint64_t r = 0;
  for (i = n-1; i >= 0; i--) {
    idx[i] = i;
  }
  RAND_bytes((char *) &r, sizeof(r));
  for (i = n-1; i > 0; i--) {
    j = r % i;
    r = r / i;
    x = idx[i];
    idx[i] = idx[j];
    idx[j] = x;
  }
  for (i = 0; i < n; i++) {
    if (i > 0) {
      printf(" — ");
    }
    printf("%s", greetings[idx[i]]);
  }
  printf("\n");
}

Share Button

Ranges of Dates and Times

In Software we often deal with ranges of dates and times.

Let us look at it from the perspective of an end user.

When we say something like „from 2020-03-07 to 2019-03-10“ we mean the set of all timestamps t such that

    \[\text{2019-03-07} \le d < \text{2019-03-11}\]

or more accurately:

    \[\text{2019-03-07T00:00:00}+TZ \le d < \text{2019-03-11T00:00:00}+TZ\]

Important is, that we mean to include the whole 24 hour day of 2019-03-10. Btw. please try to get used to the ISO-date even when writing normal human readable texts, it just makes sense…

Now when we are not talking about dates, but about times or instants of time, the interpretation is different.
When we say sonmething like „from 07:00 to 10:00“ or „from 2020-03-10T07:00:00+TZ to 2020-04-11T09:00:00+TZ“, we actually mean the set of all timestamps t such that

    \[givenDate\text{T07:00:00}+TZ \le t < givenDate\text{T10:00:00}+TZ\]

or

    \[\text{2020-03-10T07:00:00}+TZ \le t < \text{2020-04-11T09:00:00}+TZ,\]

respectively. It is important that we have to add one in case of date only (accuracy to one day) and we do not in case of finer grained date/time information. The question if the upper bound is included or not is not so important in our everyday life, but it proves that commonly the most useful way is not to include the upper bound. If you prefer to have all options, it is a better idea to employ an interval library, i.e. to find one or to write one. But for most cases it is enough to exclude the upper limit. This guarantees disjoint adjacent intervals which is usually what we want. I have seen people write code that adds 23:59:59.999 to a date and compares with \le instead of <, but this is an ugly hack that needs a lot of boiler plate code and a lot of time to understand. Use the exclusive upper limit, because we have it.

Now the requirement is to add one day to the upper limit to get from the human readable form of date-only ranges to something computers can work with. It is a good thing to agree on where this transformation is made. And to do it in such a way that it even behaves correctly on those dates where daylight saving starts or ends, because adding one day might actually mean „23 hours“ or „25 hours“. If we need to be really very accurate, sometimes switch seconds need to be added.

Just another issue has come up here. Local time is much harder than UTC. We need to work with local time on all kinds of user interfaces for humans, with very few exceptions like for pilots, who actually work with UTC. But local date and time is ambiguous for one hour every year and at least a bit special to handle for these two days where daylight saving starts and ends. Convert dates to UTC and work with that internally. And convert them to local date on all kinds of user interfaces, where it makes sense, including documents that are printed or provided as PDFs, for example. When we work with dates without time, we need to add one day to the upper limit and then round it to the nearest some-date\text{T00:00:00}+TZ for our timezone TZ or know when to add 23, 24 or 25 hours, respectively, which we do not want to know, but we need to use modern time libraries like the java.time.XXX stuff in Java, for example.

Working with date and time is hard. It is important to avoid making it harder than it needs to be. Here some recommendations:

  • Try to use UTC for the internal use of the software as much as possible
  • Use local date or time or date and time in all kinds of user interfaces (with few exceptions)
  • add one day to the upper limit and round it to the nearest midnight of local time exactly once in the stack
  • exclude the upper limit in date ranges
  • Use ISO-date formats even in the user interfaces, if possible

Links

Share Button