Perl 5 and Perl 6

We have now two Perls. Perl 5, which has been around for more than 20 years just as the „Perl programming language“ and Perl 6, which has been developed for more than a decade and of which now stable versions exist.

The fact, that they are both called „Perl“ is a bit misleading. They are two different and incompatible programming languages. But they share the same community. And Perl conferences are usually covering both languages.

So this rises the question about the differences or about which of the two Perls to use.

Here are some differences:

  • Perl 5 is well established and many people know it. Perl 6 has to be learned, even if it is relatively easy to learn for someone with a Perl 5 background.
  • Perl 5 runs about three times faster than Perl 6
  • Perl 6 programs are a bit shorter than Perl 5 programs
  • Perl 6 regular expressions are even better than Perl 5’s regular expressions
  • Perl 6 is more logical than Perl 5
  • Perl 6 uses by default better numerical types
  • Perl 6 makes it easier and more natural to do object oriented programming and functional programming
  • Perl 6 has come up with a useful approach for doing multithreadoing.
  • Perl 5 has so many cool libraries on CPAN, Perl 6 just a few.

Links:

Share Button

Swiss Perl Workshop 2017

I have attended the Swiss Perl Workshop.
We were a group of about 40 people, one track and some very interesting talks, including by Damian Conway.
I gave a regular talk and a lightning talk myself.
The content of my talk might go into another Blog post in the future.
The Perl programming language is still interesting, and of course it was covered in both variants: Perl 5 and Perl 6.
But many of the talks were about general issues like security and architecture and just exemplified by Perl.

The Video recording of talks was optional. Here are those that have been recorded and already uploaded: Youtube: Swiss Perl Workshop

Share Button

Shell Scripts

Shell scripts can be useful for writing small stuff like combining a few commands to pipes or doing a bit of „back ticking“. Even simple loops and if-conditions are possible. And if we want, it is almost a full programming language. A bit hard to tame, maybe, but quite a lot of stuff is possible. Those who like to know more about it may look into startup scripts of typical java software. Often a .bat and a .sh file are provided, where the right jvm is found, the classpath and the execution path and maybe some other environment are put together. In the end the .sh-file is quite a long and unreadable horror story and the .bat file is even much worse, because the cmd-language is just a lot more primitive and less capable and requires even worse hacks.

There are ways to make shell scripts more readable, which by themselves are truly admirable, but I think that route is wrong. We can learn all the Shell functionalities and understand bit by bit even more complex shell scripts, but I think for non trivial shell scripts it is time to switch to real programming languages instead. Scripting languages, of course, for example Perl, Ruby, Python or Lua. We may still execute „shell commands“, that are actually programs in /bin, /usr/bin or /usr/local/bin where they are powerful and more concise than writing purely in that programming language. But a magic for putting together a classpath is much cleaner in the Perl programming language than in pure bash (or worse cmd/bat).

This is of course another example of the Golden Hammer anti pattern. We should balance our tool box. Not add specific tools for making any minor task a bit easier on the expense of supporting one more tool, but keep a broad range of tools that in conjunction are very powerful. For example I would retire awk and sed and use either Perl or Ruby instead. We only have to keep them around because a lot of system tools that are just there still rely on them, but for a team I would deprecate awk and sed for new scripts or even for enhancing existing scripts. Bash would be ok only for small scripts, you can invent a line number or a maximum complexity, but for very short scripts I think bash is a legitimate tool.

Switch to Perl, Perl6, Ruby or… when you encounter any of the following:

  • The scripts is getting kind of long (>= 100 lines)
  • You find yourself modularizing it with functions
  • You find yourself using non trivial perl, ruby, sed or awk within the script, for example regex-stuff
  • The script need interaction
  • The scripts needs arrays, numbers or other types
  • More than one or two trivial if-statements or loop-statements are needed
  • Database access is done by the script (SQL or NoSQL)
  • String encoding becomes relevant
  • Quoting levels become an issue

This post was inspired by a similar post on the Isoblog by Kris. And the Shell Style Guide of Google is quite good especially in limiting the area where shell scripts are acceptable.

Share Button

Using non-ASCII-characters

Some of us still remember the times when it was recommended to avoid „special characters“ when writing on the computer. Some keyboards did not contain „Umlaut“-characters in Germany and we fell back to the ugly, but generally understandable way of replacing the German special characters like this: ä->ae, ö->oe, ü->ue, ß->sz or ß->ss. This was due to the lack of decent keyboards, decent entry methods, but also due to transmission methods that stripped the upper bit. It did happen in emails that they where „enhanced“ like this: ä->d, ö->v, ü->|,… So we had to know our ways and sometimes use ae oe ue ss. Similar issues applied to other languages like the Scandinavian languages, Spanish, French, Italian, Hungarian, Croatian, Slovenian, Slovak, Serbian, the Baltic languages, Esperanto,… in short to all languages that could regularly be written with the Latin alphabet but required some additional letters to be written properly.

When we wrote on paper, the requirement to write properly was more obvious, while email and other electronic communication via the internet of those could be explained as being something like short wave radio. It worked globally, but with some degradation of quality compared to other media of the time. So for example with TeX it was possible to write the German special letters (and others in a similar way) like this: ä->\“a, ö->\“o, ü->\“u, ß->\ss and later even like this ä->“a, ö->“o, ü->“u, ß->“s, which some people, including myself, even used for email and other electronic communication when the proper way was not possible. I wrote Emacs-Lisp-Software that could put my Emacs in a mode where the Umlaut keys actually produced these combination when being typed and I even figured out how to tweak an xterm window for that for the sake of using IRC where Umlaut letters did not at all work and quick online typing was the way to go, where the Umlaut-characters where automatically typed because I used 10-finger system typing words, not characters.

On the other hand TeX could be configured to process Umlaut characters properly (more or less, up to the issue of hyphenation) and I wrote Macros to do this and provided them to CTAN, the repository for TeX-related software in the internet around 1994 or so. Later a better and more generic solution become part of standard TeX and superseded this, which was a good thing. So TeX guys could type ä ö ü ß and I strongly recommended (and still recommend) to actually do so. It works, it is more user friendly and in the end the computer should adapt to the humans not vice versa.

The web could process Umlaut characters (and everything) from day one. The transfer was not an issue, it could handle binary data like images, so no stripping of high bit or so was happening and the umlaut characters just went through. For people having problems to find them on the keyboard, transcriptions like this were created: ä->ä ö->ö ü->ü ß->ß. I used them not knowing that they where not actually needed, but I relied on a Perl script to do the conversion so it was possible to actually type them properly.

Now some languages like Russian, Chinese, Greek, Japanese, Georgian, Thai, Korean use a totally different alphabet, so they had to solve this earlier, but others might know better how it was done in the early days. Probably it helped develop the technology. Even harder are languages like Arabic, Hebrew, and Farsi, that are written right to left. It is still ugly when editing a Wikipedia page and the switching between left-to-right and right-to-left occurs correctly, but magically and unexpected.

While ISO-8859-x seemed to solve the issue for most European languages and ISO-8859-1 became the de-facto standard in many areas, this was eventually a dead end, because only Unicode provided a way for hosting all live languages, which is what we eventually wanted, because even in quite closed environments excluding some combinations of languages in the same document will at some point of time prove to be a mistake. This has its issues. The most painful is that files and badly transmitted content do not have a clear information about the encoding attached to them. The same applies to Strings in some programming languages. We need to know from the context what it is. And now UTF-8 is becoming the predominant encoding, but in many areas ISO-8859-x or the weird cp1252 still prevail and when we get the encoding or the implicit conversions wrong, the Umlaut characters or whatever we have gets messed up. I would recommend to work carefully and keep IT-systems, configurations and programs well maintained and documented and to move to UTF-8 whenever possible. Falling back to ae oe ue for content is sixties- and seventies-technology.

Now I still do see an issue with names that are both technical and human readable like file names and names of attachments or variable names and function names in programming languages. While these do very often allow Umlaut characters, I would still prefer the ugly transcriptions in this area, because a mismatch of encoding becomes more annoying and ugly to handle than if it is just concerning text, where we as human readers are a bit more tolerant than computers, so tolerant that we would even read the ugly replacements of the old days.

But for content, we should insist on writing it with our alphabet. And move away from the ISO-8869-x encodings to UTF-8.

Links (this Blog):

Share Button

Collection Libraries

The standard libraries of newer programming languages usually contain so called collection libraries.

Collections can usually be Lists, Sets, Maps or specialization of these.

They cover quite a lot and we start seeing variants that are built on immutability and variants that allow mutability and as always the hybrid in Ruby, that combines these and does an irreversible transition using the freeze method.

There are some interesting collection types other than these, most often we find the Bag as fourth member in the club and then more complex and more specific collections.

What they all have in common is storing a finite number of elements in a certain structure.

Some languages like Clojure, Haskell or Perl 6 use so called lazy collections. That can mean that the members are not actually stored, but that there are methods to calculate them on demand. This allows for very interesting, expressive and beautiful programming, if used properly. Typically a Range of integers is provided as a lazy collection. But there can also be quite interesting lazy collections that are a little bit more sophisticated. Some allow random access to the nth element, like arrays or vectors or arrayLists, some only via iteration.

Interesting lazy collections could be multi-dimensional ranges. Assume we have an array of integers [n_0, n_1, n_2, ...., n_{m-1}] where even m is only known at runtime. Then it is a challenge that sometimes occurs to do a loop like this:

for (i_0 = 0\ldots n_0-1) {
for (i_1 = 0\ldots n_1-1) {
for (i_2 = 0\ldots n_2-1) {
\cdots
}
}
}

Which is kind of hard to write, because we cannot nest the loops if we do not know how deeply they need to be nested.

But if we have a multi-range collection and do something like this

Collection> mr = new MultiRange([n_0, n_1, n_2, ...., n_{m-1});
for (List li : mr) {
\cdots
}

and this beast becomes quite approachable.

A similar one, that is sometimes needed, is a lazy collection containing all the permutations of the n numbers \left\{0\ldots n-1\right\}. Again we only want to iterate over it and possibly not complete the iteration.

Another interesting idea is to perform the set operations like union, intersection and difference lazily. That means that we have a collection class Union, that implements the union of its members. Testing for membership is trivial, iteration does involve some additional structure to avoid duplicates. Intersection and difference are even easier, because they cannot produce duplicates.

What is also interesting is Sets built from intervals. Intervals can be defined in any base set {\mathrm T} (type) that supports comparisons like <, <=, ... We have

  • an open interval (a,b)=\left\{x \in {\mathrm T} : a < x < b\right\}
  • an left half-open interval (a,b]=\left\{x \in {\mathrm T} : a < x \le b\right\}
  • an right half-open interval [a,b)=\left\{x \in {\mathrm T} : a \le x < b\right\}
  • a closed interval [a,b]=\left\{x \in {\mathrm T} : a \le x \le b\right\}

Of these we can create unions and intersections and in the end can always reduce this to unions of intervals. Adjacent intervals can sometimes be merged, overlapping intervals always. If {\mathrm T} supports the concept of successors, than even closed intervals with different limits can be discovered to be adjacent, for example [1,2] and [3,4] for {\mathrm T}={\Bbb Z}. Often this cannot be assumed, for example if we are working with rational numbers with arbitrarily long integers as numerator and denominator.

So these are three concepts to get memory saving, easy to use lazy collections.

Share Button

Alpine Perl Workshop

On 2016-09-02 and 2016-09-03 I was able to visit the Alpine Perl Workshop. This was a Perl conference with around 50 participants, among them core members of the Perl community. We had mostly one track, so the documented information about the talks that were given is actually quite closely correlated to the list of talks that I have actually visited.

We had quite a diverse set of talks about technical issues but also about the role of Perl programming language in projects and in general. The speeches were in English and German…

Perl 6 is now a reality. It can be used together with Perl 5, there are ways to embed them within each other and they seem to work reasonably well. This fills some of the gaps of Perl 5, since the set of modules is by far not as complete as for Perl 5.

Perl 5 has since quite a few years established a time boxed release schedule. Each year they ship a new major release. The previous two releases are supported for bugfixes. The danger that major Linux distributions remain on older releases has been banned. Python 3 has been released in 2008 and still in 2016 Python 2.7 is what is usually used and shipped with major Linux distributions. It looks like Perl 5 is there to stay, not be replaced by Perl 6, which is a quite different language that just shares the name and the community. But the recent versions are actually adopted and the incompatible changes are so little that they do not hurt too much, usually. An advantage of Perl is the CPAN repository for libraries. It is possible to test new versions against a ton of such libraries and to find out, where it might break or even providing fixes for the library.

An interesting issue is testing of software. For continuous integration we can now find servers and they will run against a configurable set of Perl versions. But using different Linux distributions or even non-Linux-systems becomes a more elaborate issue. People willing to test new versions of Perl or of libraries on exotic hardware and OS are still welcome and often they discover a weakness that might be of interest even for the mainstream platforms in the long run.

I will leave it with this. You can find more information in the web site of the conference.

And some of the talks are on youtube already.

It was fun to go there, I learned a lot and met nice people. It would be great to be able to visit a similar event again…

Share Button

Error Messages in Web Applications

In Applications that are used by non-technical end users, which are these days very often web applications, we have to deal with the issue that an unexpected error occurs.

There are the two extremes:

We can just show a screen telling in a nice design that the application does not work and that is all. In terms of user experience that is a good approach and often the recommended way. But there is no reasonable path to recover, other than retrying.

The other extreme, which we actually see quite a lot, is this:

Stacktrace of exception shown to the user in the browser

Stack-trace of exception shown to the user in the browser

That may be ok in some rare case, during beta test or for an internal application that is only used by the development team.

This was a productive application, apparently developed in some MS-dotnet-language, most likely C# or ASP-dotnet, but the same issues are valid for PHP, Java/JSF/JEE, Ruby, Perl, Perl 6, Python, Scala, Clojure, C, C++ and whatever you like.

Imagine Google would display such information because of a bug in their search engine and there were a contact mail address or phone number and millions of people would call their call center every day with such bugs… In this case it is probably better to hide the exception from the users and probably write the software well enough that such issues do not occur too often, because too many people rely on this every day.

The other side is, that there are probably some log files and the exceptions can be found there. Now the log files can be monitored manually, which becomes a bad idea as soon as there is actually one or more full time user, because the logs become huge, but tools like grep or simple Perl scripts or tools like splunk can help to deal with this. Since applications tend to be distributed, we have to deal with the fact that the same single instance of error and even more so the same kind of error will occur in many different logs and we need to match them to make sense of this and to understand the problem.

Reading logs and especially stack traces is especially hard in framework worlds, where there are hundreds of levels in the stacktraces coming from the frameworks. Often this is were the error actually is, but even more often it is the application and anyway we are more comfortable doing an workaround rather than fixing the framework, which we could do at least if it is open source. And we should actually send a bug report with as much information as possible, but avoid interpretation on what the bug could be. This can be added as a comment to the bug report, maybe even with a hint how to fix it or a patch, but it should be kept separate.

Anyway, usually we assume the error is not in the framework and this is usually true. So it is a challenge to read this and again tools or scripts should be used to do this. It is also possible and usually necessary to find occurrences of the same kind of error. Often this is hard, because the root cause does not manifest itself and we get consequential errors much later. That is why we are IT experts, so we can find even such hidden bugs. 🙂

Now it is possible to make life a little bit easier. We can give exceptions unique IDs, that can be something like this:
eeeeee-cccc-hhhh-tttttttt-ttttttttt-nnnnnnnnn
where each block consists of digits and upper case letters.
eeeeee encodes the type of the exception, for example by skipping all lower case letters. It does not have to be perfect, just give a hint.
cccc is an error-code if this is used. If and when exceptions should include error codes is an interesting issue by itself.
hhhh is encodes the host where it originally occurred.
tttt… is the time-stamp. If we use msec since 1970 and use base 36 to encode it, it can be shorter.
nnnn… is a number from some counter.
This is just an idea. You could use UUIDs or do something along these lines, but different. Using base 36 is actually a good idea, it makes these codes shorter.

Anyway, having such an ID in each exception in the log allows more easily to find which are different log entries for exactly the same exception. And yes, they do occur and that is OK. Such a code could also be displayed on the screen of the end user if it is an application where users actually have access and contact to some support team. Then they can read it. Again, aim to make it short and unique, but don’t make the whole mechanism too fragile, otherwise we deal with finding the exceptions in the exception handling framework itself and that is not desirable.

What is important: We should actually fix bugs, when we find them. Free some time for it, write unit tests that prove the bug, fix it and make sure it does not come up again by retaining the unit test. Yes, it is work but it is worth it. If the bug justifies an immediate deployment or if we have to wait for a deployment window is another issue, but it should be fixed at least with the next deployment. If regular deployments should be done twice a year or daily is an interesting issue by itself. There should always be ways to do an „emergency deployment“ in case of a critical bug, but it is good to have strong regular mechanism so the emergency deployment can remain an exception.

Share Button

Integers in Perl 6

The language Perl 6 has been announced to be production ready by the beginning of this year. Its implementation is Rakudo, while the Perl 6 programming language itself is an abstract language definition that allows any language implementation that passes the test suite to call itself an Perl 6 implementation. The idea is not totally new, we see the Ruby language being implemented more than once (Ruby, JRuby, Rubinius, IronRuby), but we can also learn from the Ruby guys that it is a challenge to keep this up to date and eventually it is likely that one implementation will fall back or go its own way at some point of time.

Perl 6 is also called „Perl“ as part of its name, but quite different from its sister language Perl, which is sometimes called „Perl 5“ to emphasize the distinction, so it is absolutely necessary to call it „Perl 6“ or maybe „Rakudo“, but not just „Perl“.

Even though many things can be written in a similar way, a major change to Perl 5 is the way of dealing with numeric types. You can find an article describing Numeric Types in Perl [5]. So now we will see how to do the same things in Perl 6.

Dealing with numeric types in Perl 6 is neither like in Perl 5 nor like what we are used to in many other languages.

So when we just use numbers in a naïve way, we get long integers automatically:

my $f = 2_000_000_000;
my $p = 1;
loop (my Int $i = 0; $i < 10; $i++) {
    say($i, " ", $p);
    $p *= $f;
}

creates this output:

0 1
1 2000000000
2 4000000000000000000
3 8000000000000000000000000000
4 16000000000000000000000000000000000000
5 32000000000000000000000000000000000000000000000
6 64000000000000000000000000000000000000000000000000000000
7 128000000000000000000000000000000000000000000000000000000000000000
8 256000000000000000000000000000000000000000000000000000000000000000000000000
9 512000000000000000000000000000000000000000000000000000000000000000000000000000000000

This is an nice default, similar to what Ruby, Clojure and many other Lisps use, but most languages have a made a choice that is weird for application development.

Now we can also statically type this:

my Int $f = 2_000_000_000;
my Int $p = 1;
loop (my Int $i = 0; $i < 10; $i++) {
    say($i, " ", $p);
    $p *= $f;
}

and we get the exact same result:

0 1
1 2000000000
2 4000000000000000000
3 8000000000000000000000000000
4 16000000000000000000000000000000000000
5 32000000000000000000000000000000000000000000000
6 64000000000000000000000000000000000000000000000000000000
7 128000000000000000000000000000000000000000000000000000000000000000
8 256000000000000000000000000000000000000000000000000000000000000000000000000
9 512000000000000000000000000000000000000000000000000000000000000000000000000000000000

Now we can actually use low-level machine integers which do an arithmetic modulo powers of 2, usually 2^{32} or 2^{64}:

my int $f = 2_000_000_000;
my int $p = 1;
loop (my Int $i = 0; $i < 10; $i++) {
    say($i, " ", $p);
    $p *= $f;
}

and we get the same kind of results that we would get in java or C with (signed) long, if we are on a typical 64-bit environment:

0 1
1 2000000000
2 4000000000000000000
3 -106958398427234304
4 3799332742966018048
5 7229403301836488704
6 -8070450532247928832
7 0
8 0
9 0

We can try it in Java. I was lazy and changed as little as possible and the "$" is allowed as part of the variable name by the language, but of course not by the coding standards:

public class JavaInt {
    public static void main(String[] args) {
        long $f = 2_000_000_000;
        long $p = 1;
        for (int $i = 0; $i < 10; $i++) {
            System.out.println($i + " " +  $p);
            $p *= $f;
        }
    }
}

We get this output:

0 1
1 2000000000
2 4000000000000000000
3 -106958398427234304
4 3799332742966018048
5 7229403301836488704
6 -8070450532247928832
7 0
8 0
9 0

And we see, with C# we get the same result:

using System;

public class CsInt {

    public static void Main(string[] args) {
        long f = 2000000000;
        long p = 1;
        for (int i = 0; i < 10; i++) {
            Console.WriteLine(i + " " +  p);
            p *= f;
        }
    }
}

gives us:

0 1
1 2000000000
2 4000000000000000000
3 -106958398427234304
4 3799332742966018048
5 7229403301836488704
6 -8070450532247928832
7 0
8 0
9 0

If you like, you can try the same in C using signed long long (or whatever is 64 bits), and you will get the exact same result.

Now we can simulate this in Perl 6 also using Int, to understand what int is really doing to us. The idea has already been shown with Ruby before:

my Int $MODULUS = 0x10000000000000000;
my Int $LIMIT   =  0x8000000000000000;
sub mul($x, $y) {
    my Int $result = ($x * $y) % $MODULUS;
    if ($result >= $LIMIT) {
        $result -= $MODULUS;
    } elsif ($result < - $LIMIT) {
        $result += $MODULUS;
    }
    $result;
}

my Int $f = 2_000_000_000;
my Int $p = 1;
loop (my Int $i = 0; $i < 10; $i++) {
    say($i, " ", $p);
    $p = mul($p, $f);
}

and we get the same again:

0 1
1 2000000000
2 4000000000000000000
3 -106958398427234304
4 3799332742966018048
5 7229403301836488704
6 -8070450532247928832
7 0
8 0
9 0

The good thing is that the default has been chosen correctly as Int and that Int allows easily to do integer arithmetic with arbitrary precision.

Now the question is, how we actually get floating point numbers. This will be covered in another blog posting, because it is a longer story of its own interest.

Share Button

Virtual machines

We all know that Java uses a „virtual machine“ that is it simulates a non-existing hardware which is the same independent of the real hardware, thus helping to achieve the well known platform independence of Java. Btw. this is not about virtualization like VMWare, VirtualBox, Qemu, Xen, Docker and similar tools, but about byte code interpreters like the Java-VM.

We tend to believe that this is the major innovation of Java, but actually the concept of virtual machines is very old. Lisp, UCSD-Pascal, Eumel/Elan, the Perl programming language and many other systems have used this concept long before Java. The Java guys have been good in selling this and it was possible to get this really to the mainstream when Java came out. The Java guys deserve the credit for bringing this in the right time and bringing it to the main stream.

Earlier implementations where kind of cool, but the virtual machine technology and the hardware were to slow, so that they were not really attractive, at least not for high performance applications, which are now actually a domain of Java and other JVM languages. Some suggest that Java or other efficient JVM languages like Scala would run even faster than C++. While it may be true to show this in examples, and the hotspot optimization gives some theoretical evidence how optimization that takes place during run time can be better than static optimization at compile time, I do not generally trust this. I doubt that well written C-code for an application that is adequate for both C and Java will be outperformed by Java. But we have to take two more aspects into account, which tend to be considered kind of unlimited for many such comparisons to make them possible at all.

The JVM has two weaknesses in terms of performance. The start-up time is relatively long. This is addressed in those comparisons, because the claim to be fast is only maintained for long running server applications, where start-up time is not relevant. The hotspot optimization requires anyway a long running application in order to show its advantages. Another aspect that is very relevant is that Java uses a lot of memory. I do not really know why, because more high level languages like Perl or Ruby get along with less memory, but experience shows that this is true. So if we have a budget X to buy hardware and then put software written in C on it, we can just afford to buy more CPUs because we save on the memory or we can make use of the memory that the JVM would otherwise just use up to make our application faster. When we view the achievable performance with a given hardware budget, I am quite sure that well written C outperforms well written Java.

The other aspect is in favor of Java. We have implicitly assumed until now that the budget for development is unlimited. In practice that is not the case. While we fight with interesting, but time consuming low level issues in C, we already get work done in Java. A useful application in Java is usually finished faster than in C, again if it is in a domain that can reasonably be addressed with either of the two languages and if we do not get lost in the framework world. So if the Java application is good enough in terms of performance, which it often is, even for very performance critical applications, then we might be better off using Java instead of C to get the job done faster and to have time for optimization, documentation, testing, unit testing.. Yes, I am in a perfect world now, but we should always aim for that. You could argue that the same argument is valid in terms of using a more high-level language than Java, like Ruby, Perl, Perl 6, Clojure, Scala, F#,… I’ll leave this argument to other articles in the future and in the past.

What Java has really been good at is bringing the VM technology to a level that allows real world high performance server application and bringing it to the main stream.
That is already a great achievement. Interestingly there have never been serious and successful efforts to actually build the JavaVM as hardware CPU and put that as a co-processor into common PCs or servers. It would have been an issue with the upgrade to Java8, because that was an incompatible change, but other than that the JavaVM remained pretty stable. As we see the hotspot optimization is now so good that the urge for such a hardware is not so strong.

Now the JVM has been built around the Java language, which was quite legitimate, because that was the only goal in the beginning. It is even started using the command line tool java (or sometimes javaw on MS-Windows 32/64 systems). The success of Java made the JVM wide spread and efficient, so it became attractive to run other languages on it. There are more than 100 languages on the JVM. Most of them are not very relevant. A couple of them are part of the Java world, because they are or used to be specific micro languages closely related to java to achieve certain goals in the JEE-world, like the now almost obsolete JSP, JavaFX, .

Relevant languages are Scala, Clojure, JRuby, Groovy and JavaScript. I am not sure about Jython, Ceylon and Kotlin. There are interesting ideas coming up here and there like running Haskell under the name Frege on the JVM. And I would love to see a language that just adds operator overloading and provides some preprocessor to achieve this by translating for example „(+)“ in infix syntax to „.add(..)“ mainstream, to allow seriously using numeric types in Java.

Now Perl 6 started its development around 2000. They were at that time assuming that the JVM is not a good target for a dynamic language to achieve good performance. So they started developing Parrot as their own VM. The goal was to share Parrot between many dynamic languages like Ruby, Python, Scheme and Perl 6, which would have allowed inter-language inter-operation to be more easily achievable and using libraries from one of these languages in one of the others. I would not have been trivial, because I am quite sure that we would have come across issues that each language has another set of basic types, so strings and numbers would have to be converted to the strings and numbers of the library language when calling, but it would have been interesting.

In the end parrot was a very interesting project, theoretically very sound and it looked like for example the Ruby guys went for it even faster than the the Perl guys, resulting in an implementation called cardinal. But the relevant Perl 6 implementation, rakudo, eventually went for their own VM, Moar. Ruby also did itself a new better VM- Many other language, including Ruby and JavaScript also went for the JVM, at least as one implementation variant. Eventually the JVM proved to be successful even in this area. The argument to start parrot in the first place was that the JVM is not good for dynamic languages. I believe that this was true around 2000. But the JVM has vastly improved since then, even resulting in Java being a serious alternative to C for many high performance server applications. And it has been improved for dynamic languages, mostly by adding the „invoke_dynamic“-feature, that also proved to be useful for implementing Java 8 lambdas. The experience in transforming and executing dynamic languages to the JVM has grown. So in the end parrot has become kind of obsolete and seems to be maintained, but hardly used for any mainstream projects. In the end we have Perl 6 now and Parrot was an important stepping stone on this path, even if it becomes obsolete. The question of interoperability between different scripting languages remains interesting…

Share Button

Numeric types in Perl

Dealing with numeric types in Perl is not as strait-forward as in other programming languages. We can use „scalars“ out of the box, but then we get floating point numbers, more precisely what is called „double“ in most programming languages. This is kind of ok for trivial programs, but we should make a deliberate choice on what to use.

Actually the Perl programming language gives us (at least) two more choices. We can use 64-bit integers (or 32-bit on some platforms) by just adding

use integer;

somewhere in the beginning of the file. This causes Perl to work mostly with integer instead of floating point numbers, but the rules for this are not so obvious. You may read about them in the official documentation. Or find another explanation or one more.

Now we do want to control this on a more fine granular basis than the whole program. There may be legitimate programs that use both floating point and integers. This can be achieved in Perl as well. We can turn this off using:

no integer;

More likely we want to use another approach, that looks more natural and more robust most of the time. We just have to use blocks:

#!/usr/bin/perl -w                                  
                                                                                        
use strict;                                                                             
                                                                                        
my $f1 = 2_000_000_000;                                                                
my $f2 = $f1 * $f1;                                                                  
my $f3 = $f1 * $f2;                                                                  
my $f4 = $f1 * $f3;                                                                  
my $f5 = $f1 * $f4;                                                                  
                                                                                        
my @f = (1, $f1, $f2, $f3, $f4, $f5);                                              
for (my $i = 0; $i <= 5; $i++) {                                                          print($i, " ", $f[$i], "\n");                                                     }                                                                                                                                                                                 my $n2x;                                                                                {                                                                                            use integer;                                                                             my $n1 = 2_000_000_000;                                                                 my $n2 = $n1 * $n1;                                                                   my $n3 = $n1 * $n2;                                                                   my $n4 = $n1 * $n3;                                                                   my $n5 = $n1 * $n4;                                                                                                                                                            my @n = (1, $n1, $n2, $n3, $n4, $n5);                                               for (my $i = 0; $i <= 5; $i++) {                                                          print($i, " ", $n[$i], "\n");                                                     }                                                                                        $n2x = $n2;                                                                        }                                                                                                                                                                                 print "n2x=$n2x\n";                                                                                                                                                              my $g1 = 2_000_000_000;                                                                 my $g2 = $g1 * $g1;                                                                   my $g3 = $g1 * $g2;                                                                   my $g4 = $g1 * $g3;                                                                   my $g5 = $g1 * $g4;                                                                                                                                                            my @g = (1, $g1, $g2, $g3, $g4, $g5);                                               for (my $i = 0; $i <= 5; $i++) {                                                          print($i, " ", $g[$i], "\n");                                                     }                                                                                       
                                                                                 
This will output:                                                                       
                                                                                  
0 1                                                                                     
1 2000000000                                                                            
2 4000000000000000000                                                                   
3 8e+27                                                                                 
4 1.6e+37                                                                               
5 3.2e+46                                                                               
0 1                                                                                     
1 2000000000                                                                            
2 4000000000000000000                                                                   
3 -106958398427234304                                                                   
4 3799332742966018048                                                                   
5 7229403301836488704                                                                   
n2x=4000000000000000000                                                                 

So we see that the integer mode is constrained to the block. And we see that the results for 3, 4 and 5 went wrong...

So it may be a little bit tricky to do this, but we can. These integers have the same flaw as integers in many popular programming languages, because they silently overflow by taking the remainder modulo 2^{64} that lies in the interval [-2^{63}, 2^{63}-1] or modulo 2^{32} that lies in the interval [-2^{31}, 2^{31}-1]. I do not think that is really what we usually want and just hoping that our numbers remain within the safe range may go well in the 64-bit-case, but we have to be sure and explain this in a comment, when we work like this. Usually we do not want to think about this and spending a few extra bits costs less than hunting obscure bugs where everything looks so correct.

Our friend is

use bigint;

which switches to arbitrary precision integers.

#!/usr/bin/perl -w                             
                                                                                 
use strict;                                                                      
                                                                                 
my $f1 = 2_000_000_000;                                                         
my $f2 = $f1 * $f1;                                                           
my $f3 = $f1 * $f2;                                                           
my $f4 = $f1 * $f3;                                                           
my $f5 = $f1 * $f4;                                                           
                                                                                 
my @f = (1, $f1, $f2, $f3, $f4, $f5);                                       
for (my $i = 0; $i <= 5; $i++) {                                                   print($i, " ", $f[$i], "\n");                                              }                                                                                                                                                                   my $b2x;                                                                         {                                                                                     use bigint;                                                                       my $b1 = 2_000_000_000;                                                          my $b2 = $b1 * $b1;                                                            my $b3 = $b1 * $b2;                                                            my $b4 = $b1 * $b3;                                                            my $b5 = $b1 * $b4;                                                                                                                                              my @b = (1, $b1, $b2, $b3, $b4, $b5);                                        for (my $i = 0; $i <= 5; $i++) {                                                   print($i, " ", $b[$i], "\n");                                              }                                                                                 $b2x = $b2;                                                                 }                                                                                                                                                                   print "b2x=$b2x\n";                                                                                                                                                my $g1 = 2_000_000_000;                                                          my $g2 = $g1 * $g1;                                                            my $g3 = $g1 * $g2;                                                            my $g4 = $g1 * $g3;                                                            my $g5 = $g1 * $g4;                                                                                                                                              my @g = (1, $g1, $g2, $g3, $g4, $g5);                                        for (my $i = 0; $i <= 5; $i++) {                                                   print($i, " ", $g[$i], "\n");                                              }                                                                                

This gives us the output:

0 1
1 2000000000
2 4000000000000000000
3 8e+27
4 1.6e+37
5 3.2e+46
0 1
1 2000000000
2 4000000000000000000
3 8000000000000000000000000000
4 16000000000000000000000000000000000000
5 32000000000000000000000000000000000000000000000
b2x=4000000000000000000
0 1
1 2000000000
2 4000000000000000000
3 8e+27
4 1.6e+37
5 3.2e+46

So it is again constrained to the block, but it allows us to use arbitrary lengths of integers, as long as our memory is sufficient.

A less commonly used, but interesting approach is to work with rational numbers:

#!/usr/bin/perl -w                                      
                                                                                       
use strict;                                                                            
use bigrat;                                                                            
                                                                                       
my $x = 3/4;                                                                          
my $y = 4/5;                                                                          
my $z = 5/6;                                                                          
print("x=$x y=$y z=$z\n");                                                          
                                                                                       
my $sum = $x+$y+$z;                                                                
my $diff = $x - $y;                                                                 
my $prod = $x * $x * $z;                                                           
my $quot = $x / $y;                                                                 
print("sum=$sum diff=$diff prod=$prod quot=$quot\n");                              

This gives us:

x=3/4 y=4/5 z=5/6
sum=143/60 diff=-1/20 prod=15/32 quot=15/16

That is kind of cool...

There is also something like Math::BigFloat which can be used most easily by having

use bignum;

Find the documentation about "use bignum" and about Math::BigFloat...

You will find more numeric types, like Math::Decimal and Math::Complex.

While I would say that using good numeric types in Perl is not quite as easy as it should be, at least if we want to mix them, at least we have the means to use the adequate numeric types. And it is way better than in Java.

Share Button