Devoxx 2015

This year I had the pleasure of visiting the Devoxx conference in Antwerp, Belgium.

I have visited the following talks:

There is a lot to write about this and I will get back to specific interesting topics in the future…

My previous visits in 2012, 2013 (part 1), 2013 (part 2), and 2014 have their own blog articles.


Creating Unique Numbers

Many software systems rely on some kind of unique numbers. Uniqueness always raises the question of the universe in which it is required. We see different kinds of universes in the case of IP addresses. In theory they are worldwide unique. In practice we have mechanisms in place like NAT, which use certain dedicated IP ranges for an intranet and map them to publicly available addresses for internet traffic outside the intranet. So we already have two kinds of universes… This is typical.

A common case is database IDs, which are often used as primary keys in databases. I challenge this by asking whether a natural key is already available in the data, which might make this DB-internal ID unnecessary, but more often than not DB tables do have these ID columns as primary keys. They have to be unique within the DB table, which can mean across several servers, because somewhat distributed databases are common.

Other examples are the message IDs of emails. They may be quite long (a short line of text is acceptable) and they should be worldwide unique. Combining the fully qualified publicly accessible hostname, a timestamp and a counter is usually good enough. They look like 5AE.630049D.2EA050907@gmx.net or 5AE.630049D.2EA050907@ms-93424.gmx.net, where the part after the "@" stands for the mail server and the part in front is a unique ID for the message created by the mail server, looking slightly different depending on its software.
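A minimal sketch of this hostname-timestamp-counter approach in Java might look like the following; the class name and the hex formatting are my own illustrative choices, not the exact format any particular mail server uses:

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicLong;

public final class MessageIdGenerator {

    // process-local counter to distinguish ids created in the same millisecond
    private static final AtomicLong COUNTER = new AtomicLong();

    public static String nextId() throws UnknownHostException {
        String host = InetAddress.getLocalHost().getCanonicalHostName();
        long timestamp = System.currentTimeMillis();
        long count = COUNTER.incrementAndGet();
        // hex keeps the id reasonably short
        return Long.toHexString(count).toUpperCase() + "."
                + Long.toHexString(timestamp).toUpperCase() + "@" + host;
    }
}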

Often the length may not be arbitrary, and UUIDs are a good compromise here. They have 128 bits, some of which are used to specify the type of UUID. One type combines a hostname and timestamp like the message IDs. But since in some contexts the generation of the UUID should not reveal the hostname and the time, some implementations prefer random UUIDs. It can very well be argued that with good random numbers duplicates of such random UUIDs are less likely than events that would bother us much more than having a duplicate. For randomly generated UUIDs six bits are used up for expressing the version and the fact that it is a random UUID, leaving 122 bits, which gives a total of 2^{122} = 5316911983139663491615228241121378304 \approx 5.317\cdot 10^{36} different possible values. Generating billions of UUIDs for many years keeps the risk of creating duplicates acceptably low.
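In Java such random UUIDs are available directly in the standard library, so a quick look at the version bits is easy:

import java.util.UUID;

public class RandomUuidDemo {
    public static void main(String[] args) {
        UUID u = UUID.randomUUID();      // type 4 UUID from a cryptographically strong PRNG
        System.out.println(u);           // output varies, e.g. 3f2504e0-4f89-41d3-9a0c-0305e82c3301
        System.out.println(u.version()); // 4 means randomly generated
    }
}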

But the issue of the quality of the random generator and the issue of potential duplicates remain something that needs attention. So it is worth considering the path of using the host and timestamp. Now the host cannot be identified by an IP address or a fully qualified domain and host name, because these tend to be either too long or not unique enough. The MAC address used to be a good possibility, but I would not be so sure about this any more. Most server systems are virtualized these days and the MAC address is configured by software, so duplicates can occur accidentally or deliberately. Using a timestamp by itself can be a problem too, because sooner or later two IDs will be generated at the same time, within the given granularity. Machines have several processors and run several processes with several threads within each process.

So achieving the goal of a really globally unique UUID value remains a difficult question. Aiming for a more local uniqueness within an application or application landscape might be more reasonable. The number of servers may be large and may vary, but it should be possible to assign numbers to virtual or real servers, and in case there are multiple processes on the same (virtual) machine, to assign numbers to these as well. This can be used as a replacement for the host part of the UUID. If it does not use up all the bits, the rest can be filled up with random numbers.

Timestamps can be obtained easily and relatively reliably at a granularity of milliseconds (msec). The UUID timestamp allows a granularity of up to 100 nsec, which is 10’000 subdivisions of the msec. A thread-safe counter, which may reset during program start or on overflow, can be used for counting, and its positive remainder modulo 10’000 can be used instead of the 100-nsec part in conjunction with the msec.
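A minimal sketch of this combination, assuming that a counter reset on restart is acceptable:

import java.util.concurrent.atomic.AtomicLong;

public final class TimestampWithCounter {

    private final AtomicLong counter = new AtomicLong();

    // returns a value in units of 100 nsec, like the UUID timestamp,
    // with the sub-millisecond part filled by a counter modulo 10'000
    public long nextTimestamp() {
        long millis = System.currentTimeMillis();
        long sub = Math.floorMod(counter.incrementAndGet(), 10_000L);
        return millis * 10_000L + sub;
    }
}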

Often uniqueness within an application or application landscape can be achieved by using some kind of unique counter. The best choice is often the sequence of a database, which is good at this task and well tested. It is not too hard to create such functionality ourselves, but the handling of multiple processes and threads needs to be addressed. For persistence, it can be an improvement to reserve blocks of 100 or 1000 numbers and persist less often. This will result in skipping some numbers on restart, but otherwise works out well. The same idea can also be applied for a distributed unique number generator, where each instance gets ranges from some master generator and requests new ranges when they are used up.
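A sketch of the block reservation idea; the persist method is a placeholder for whatever durable storage (file, DB) is actually used:

public final class BlockedSequence {

    private static final long BLOCK_SIZE = 1000;

    private long next;
    private long limit;

    public BlockedSequence(long persistedLimit) {
        // after a restart we continue at the last persisted block limit,
        // skipping the unused rest of the previous block
        this.next = persistedLimit;
        this.limit = persistedLimit;
    }

    public synchronized long nextValue() {
        if (next >= limit) {
            limit = next + BLOCK_SIZE;
            persist(limit); // persist only once per block, not per number
        }
        return next++;
    }

    private void persist(long newLimit) {
        // placeholder: write newLimit durably before handing out numbers
    }
}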

Such unique numbers or identifiers are needed quite often. It is usually best to use something that works reliably, like the DB sequence, but it can be developed with adequate care if there is a need. Testing, and especially automated testing, is of course very important, but only sufficient if the whole implementation is conceptually sound and robust.


Changing of Keyboard Mappings with xmodmap


Introduction

When running a Linux system in its graphical mode, keyboard mappings can be changed by using xmodmap.
Each key on the keyboard has a "keycode" which can be found by looking at the output of
xmodmap -pke > current-keyboard
or by running
xev
for trying out the keys.

I am using a modified German keyboard, but of course the ideas can be adapted to any setup.

Given setting as a Basis

You can start with any keyboard, for example with the German keyboard with no dead keys, which is often useful as a starting point. I prefer to modify it a little bit. This allows me to support more languages like Swedish, Norwegian, Danish, Dutch, Spanish and Esperanto. Russian is an issue that I will address in another article. Secondly, it is a good idea to have secondary positions for symbols that sit on keys missing from some physical layouts, like "<", "|" and ">" on the German keyboard, whose key is just not present on the American keyboard. The third idea is to have two Altgr keys, because many important symbols are only accessible in conjunction with Altgr and thus easier to reach if there are two Altgr keys, just like there are two Shift keys.

Special characters for Esperanto

For Esperanto (Esperanto explained in Esperanto) the Latin alphabet with its 26 letters is needed, even though some of them are never used. And on top of that the following letters are needed as well:
ĉ Ĉ ĝ Ĝ ĵ Ĵ ĥ Ĥ ŝ Ŝ ŭ Ŭ
Unfortunately they have not reused the letters commonly used in Slavic languages and present on many international setups:
č Č ž Ž š Š
but Unicode covers it all and as long as the keyboard does not need to support Slavic or Baltic or Sami languages simultaneously with Esperanto, things should be fine.

A reasonable approach is to put these symbols on Altgr-C, Altgr-G, Altgr-J, Altgr-H, Altgr-S and Altgr-U.

This can be achieved easily:
Create a file
.xmodmap-ori
by

xmodmap -pke > .xmodmap-ori

And a script $HOME/bin/orikb:

#!/bin/sh
xmodmap $HOME/.xmodmap-ori

Look up where the letters S, G, H, J, C and U are positioned on the keyboard in the .xmodmap-ori file.
Now create a file
.xmodmap-esperanto
using the following command

egrep 'keycode *[0-9]+ *= *[SsGgCcJjHhUu] ' < .xmodmap-ori > .xmodmap-esperanto

and edit it. Leave the "keycode <number> =" part intact and change the rest to something like this:

keycode 39 = s S s S scircumflex Scircumflex
keycode 42 = g G g G gcircumflex Gcircumflex
keycode 43 = h H h H hcircumflex Hcircumflex
keycode 44 = j J j J jcircumflex Jcircumflex
keycode 54 = c C c C ccircumflex Ccircumflex
keycode 30 = u U u U ubreve Ubreve

The numbers between "keycode" and "=" could be different on your machine, but the rest should be like that.

Now create a script
$HOME/bin/eokb:

#!/bin/sh
xmodmap $HOME/.xmodmap-esperanto


Do not forget to run

chmod +x $HOME/bin/eokb $HOME/bin/orikb

for your scripts... 🙂

Now you can use eokb for enabling the Esperanto keys and orikb to return to your original setting.
The Esperanto keys will be accessible by using Altgr and the Latin letter they are derived from.

Other language specific characters

In a similar way you can have other characters
å and Å on the A,
ë and Ë on the E
ï and Ï on the I
ø and Ø on the O
æ and Æ on the Ä
ÿ and Ÿ on the Y
ñ and Ñ on the N
< on the comma, > on the period,
| on the hyphen.

This allows writing a lot of languages. See what works for you...

Remaining Issues

For Russian I have bought a physical Cyrillic keyboard and I am using it with the setup that is printed on the keys. I might write about this another time.

Generally I like to have two Altgr keys and I can very well live without a windows key. I might write about this another time as well.

By the way, this article has been written for Linux, but it works perfectly well with any other Unix-like system, as long as X11 is used; just think of AIX, HP-UX, BSD, Solaris and similar systems. But Linux is by far the most common Unix-like desktop system these days.

Here is another approach: xkbmap.


Changing the Keyboard Layout with xmodmap (Linux/X11)


Introduction

When a Linux system is running in graphical mode, the keyboard layout can be changed with xmodmap.
Each key has a so-called "keycode", which can easily be determined by printing the current layout with
xmodmap -pke > current-keyboard
or with
xev
by trying out the keys. I like to use a German keyboard layout and do so regardless of what is printed on the keys. In Switzerland a different layout with minimal deviations is common, unfortunately. Supposedly (unofficially) this was done to protect the Swiss keyboard manufacturers, forgetting that Asian manufacturers can clear this hurdle effortlessly. Supposedly (officially) it was done to create a compromise for the different language regions of Switzerland while keeping the differences minimal. The Swiss keyboard is impractical, though, because the "ß" and the capital umlauts are missing.

Given Setting as a Basis

Under Linux you can choose a German keyboard with "no dead keys", which is normally sensible, or at least a reasonable starting point. I find it practical to modify this a little, though. Firstly, more languages can be supported this way; for me the Scandinavian languages, Spanish, Russian and Esperanto are currently the most interesting. Secondly, I find it practical to additionally put the characters that sit on keys missing from some physical layouts, for example the three characters "<", "|", ">", somewhere else, so that they are still available. Thirdly, I find it nice to have two Altgr keys, because many important characters are only reachable via Altgr and are easier to grip if there is a right and a left Altgr.

Characters for Esperanto

For Esperanto (Esperanto explained in Esperanto) the usual Latin alphabet with its 26 letters is needed, some of which are not used, plus the following additional letters:
ĉ Ĉ ĝ Ĝ ĵ Ĵ ĥ Ĥ ŝ Ŝ ŭ Ŭ
Unfortunately the additional letters used in some Slavic languages,
č Č ž Ž š Š
were not reused, but as long as no Slavic or Baltic languages with Latin script are used in parallel, this is not a problem.

The most sensible solution seems to me to put these characters on Altgr-C, Altgr-G, Altgr-J, Altgr-H, Altgr-S and Altgr-U.

This is quite easy to achieve:
Create a file
.xmodmap-ori
with

xmodmap -pke > .xmodmap-ori

And a script $HOME/bin/orikb:

#!/bin/sh
xmodmap $HOME/.xmodmap-ori

Then create a file
.xmodmap-esperanto
containing the following lines:

keycode 39 = s S s S scircumflex Scircumflex
keycode 42 = g G g G gcircumflex Gcircumflex
keycode 43 = h H h H hcircumflex Hcircumflex
keycode 44 = j J j J jcircumflex Jcircumflex
keycode 54 = c C c C ccircumflex Ccircumflex
keycode 30 = u U u U ubreve Ubreve

And a script
$HOME/bin/eokb:

#!/bin/sh
xmodmap $HOME/.xmodmap-esperanto

Of course, do not forget chmod +x for the scripts… 🙂

Now eokb puts the Esperanto characters on the keyboard and orikb restores the original state.

Other language-specific special characters

In a similar way other characters can of course be put on the keyboard. For example I also have
å and Å on the A,
ë and Ë on the E,
ï and Ï on the I,
ø and Ø on the O,
æ and Æ on the Ä,
ÿ and Ÿ on the Y,
ñ and Ñ on the N,
< on the comma, > on the period,
| on the hyphen.

This makes it possible to write the Scandinavian languages, Spanish and Esperanto.
Some keyboards lack the <>| key to the left of the Y in favor of a larger Shift key. That is why I have additionally put <, > and | on Altgr plus comma, period and hyphen.

Outlook for Further Articles on the Topic

For Russian I have bought a Cyrillic keyboard and use the layout that is printed on it.

In general it bothers me that such frequent characters as []{}\ are not that easy to reach, which is why I have made Altgr available on both the right and the left…


Java: Using Enums for Singletons

The singleton pattern is not really a big deal, but it is overused, because it gives us the coolness of knowing about design patterns without having to learn too much for it… It does have its uses in some programming languages, while being more or less implicit or obsolete in others, depending on the point of view.

I have already discussed the generalization of the singleton.
Actually the enum of Java, and of languages that provide something like Java's enum in a similar way, can be used to build a singleton by just limiting it to one instance. Having been skeptical about this as an abuse of enums that might confuse people reading the code, I would now suggest embracing it as a possible and useful way to write singletons, being familiar with it when reading other people's code, and even using it when writing code.

The enum way to write a singleton is the most elegant way in Java, because it uses the least boilerplate code and includes some benefits that are usually ignored, especially when it comes to serialization, which has to be addressed in singletons quite often but is quite often ignored, resulting in multiple instances of the "singleton" floating around…

So here we go:


enum Singleton {

    // the one and only instance; created eagerly and thread safe by the JVM
    INSTANCE;

    // mutable state of the singleton
    private long count;

    private Singleton() {
        count = 0;
    }

    // synchronized, because several threads may share the instance
    public synchronized long nextValue() {
        return count++;
    }
}

Remarks:

  • For real programs it would be important to check if long is sufficient or BigInteger is needed.
  • For real programs using an AtomicLong could be a better way than using synchronized, as sketched below.
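
A minimal sketch of the AtomicLong variant mentioned in the second remark:

import java.util.concurrent.atomic.AtomicLong;

// Same singleton, but the counter is made thread safe by an
// AtomicLong instead of synchronizing the whole method.
enum AtomicSingleton {
    INSTANCE;

    private final AtomicLong count = new AtomicLong();

    public long nextValue() {
        return count.getAndIncrement();
    }
}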



Automation and its Limits

What we do in IT is programming, implementing and maintaining automatisms. Their general usefulness is not really challenged these days, but it was quite a big deal a few decades ago.
Many routine tasks can be done more efficiently and more reliably with IT, or become feasible at all…

Just think of getting money from your bank account. Banks typically offer the possibility to do that at the bank counter without extra charges or any questions. But such a trivial operation can be done at the ATM, even while the counter is closed, and I guess it is quite uncommon to go to the bank counter for that. There are more complex issues in banking that do require human interaction, and they can be addressed with a human.

Buying a railroad ticket can be as trivial as getting money from the ATM, if the software of the ticket vending machine is really good and the tariff system is somewhat transparent, understandable and simple, or at least its complexities can be covered well by the software. Most tickets are bought from the ticket vending machine, but the human at the ticket office in the railroad station is still important for more complex tickets that are not covered at all by the machine or are too complex to buy there. It is interesting to see what can work and what cannot, maybe in another article in the future…

Phone lines are more and more answered by a machine, which has been given a beautiful voice by a human speaker… "For ABC press 1, for DEF press 2, …" and so on. That works well for the trivial and most common cases. But no matter how good the software is, at some point a limit is usually reached and we do want to find a human. I once wanted to find out if there is a packaging service at some airport. It was possible to call, but there was no possibility at all, or at least not within one hour of trying, to get to a human through these menus. So in the end the question remained unanswered. For railroad tickets this can be an issue if a more complex ticket is needed in a minor railroad station, but that can be solved nicely by providing a phone number (with real humans in the call center) that can be called, and getting the ticket printed at the ticket vending machine. This is possible in Germany and quite useful, basically covering most needs, even if talking with a real person is maybe nicer than a phone call with a dying battery… In Switzerland the ticket vending machines have a VOIP connection to the call center, so you do not even need to use the phone, but I think they are constrained to what can be done on the ticket vending machine GUI, so it does not help to solve the more difficult cases… Maybe in the future as well.

A more delicate issue is nanosecond trading. Trading (they do not like to call it gambling) on the stock exchange, with foreign currencies and all kinds of derivatives, is done by algorithms automatically, using software that is faulty and using the same software in many places, more or less. And the risk management is done like that as well. This could be seen as positive, because some traders tend to lose the sense of reasonable risk, and machines eliminate the emotions that can be counterproductive in trading. But the risk takers are still there, setting up more automated trading systems… Will these systems one day create the real stock market crash by running into weird directions, because they share the same software bug or because a situation occurs that was not anticipated when the software was programmed? Is it good to take such high risks without human control? Even more delicate: algorithms decide who is a terrorist and drones are sent, of course without a court order…

An issue that is mentioned quite a lot these days is motor vehicles operated automatically. While I do question the excessive use of cars in our time, I do think that this is an improvement, because human drivers are not very good and too risky. For rail transport this is much easier to achieve, and many modern subway systems operate automatically without any known problems.

We should always ask whether a given IT solution and the automation it provides serve the humans and provide an advantage. That is the stuff we should build and operate.


MS-Windows-Encodings with CMD: Bug or Feature?


Whoever is working with MS-Windows should know these black windows with CMD running in them, even though they are not really popular. The Unix and Linux guys hate them, because they are really primitive compared to their shells. Windows guys like to work graphically, or they prefer PowerShell or bash with cygwin. Linux and Unix have the equivalent of these windows, but usually they are white. Since the colors can be configured in any way on both systems, this is of no relevance.

NT-based MS-Windows systems (NT 3.x, 4.x, 2000, XP, Vista, 7, 8, 10) have several subsystems in which programs run, for example Win64, Win32 (or Wow64 on 64-bit systems), Win16, cygwin (if installed), DOS… Because programs for the DOS subsystem are typically started in a CMD window, and because some of the DOS commands have equally named and similarly working counterparts in the CMD window, the CMD window is sometimes called a DOS window, which is just incorrect. Actually this black window comes into existence in many situations. Whenever a program is started that has input or output (stdin, stdout, stderr), a black window is provided around it, if no redirection is in place. This applies to CMD. Under Linux (and Unix) with X11 it is the other way round: you start the program that provides the window, and it automatically starts the default shell within that window, unless something else is stated.

Now I recommend an experiment. You just need an MS-Windows installation with any graphical editor like emacs, gvim, ultraedit, textpad, scite, or even notepad. And a cmd-window.

  • Please type these commands, do not use copy/paste
  • In the cmd-window cd into a directory you may write in.
  • echo "xäöüx" > filec.txt. Yes, there are ways to type these letters even with an American keyboard. 🙂
  • Open the file with a graphical editor. How do the Umlauts look?
  • Use the editor to create a second file fileg.txt in the same directory with contents like this: yäöüy.
  • view it in CMD:
  • type fileg.txt
  • How do the Umlauts look?

It is a feature or a bug that all common MS-Windows versions put the umlauts at different positions than the graphical editors. If you know how to fix this, let me know.

What has happened? In the early 80s MS-DOS came into existence. At that time standards for character encoding were not very good. Only ASCII or ISO-646-IRV existed, which was at least a big step ahead of EBCDIC. But this standardized only the lower 128 characters (7 bit) and lacked some characters for almost any language other than English. Attempts were made to put a small number of these additional letters in the positions of less important characters like "@", "[", "~", "$" etc. And software vendors started to make use of the upper 128 characters. Commodore, Atari, MS-DOS, Apple, NeXT, TeX and "any" software came up with its own specific way of doing this, often specific to a language region.

These solutions were incompatible with each other between different software systems, sometimes even between versions or language versions of the same software. Remember that at that time networks were unusual, and where they existed, they were proprietary to the operating system, with bridge solutions being extremely difficult to implement. Even floppy disks (the three-dimensional incarnations of the save button) had proprietary formats. So it did not hurt so much to have incompatible encodings.

But relatively early X11, which became the typical graphical system for Unix and later Linux, started to use standard encodings like the ISO-8859-x family, UTF-8 and UTF-16. Linux was already on ISO-8859-1 in version 0.99 in the early 90s and never tried to invent its own character encoding. Thank god for that…

Today all relevant systems have moved to the Unicode standard and standardized encodings like ISO-8859-x, UTF-8 and UTF-16. But MS-Windows has done that only partially. The graphical system is using modern encodings, or at least Cp1252, which is a decent approximation. But the text-based system with the black window, in which CMD is running, is still using encodings from the MS-DOS times more than 30 years ago, like Cp850. This results in a break within the system, which is at least very annoying when working with cygwin or CMD windows.

Those who have a lot of courage can change this in the registry. Just change the entries for OEMCP and OEMHAL in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage simultaneously. One of them is for input, the other one for output. So if you change only one, you will even get inconsistencies within the window… Sleep well with these nightmares. 🙂
Research on the internet has revealed that some have tried to change to UTF-8 (CP65001) and got a system that could not even boot as a result. Try it with a copy of a virtual system without too much risk, if you like… I have not verified this, so maybe it is just a bad rumor meant to damage a great company that has brought us this interesting zoo of encodings within the same system. But anyway, try it at your own risk.
Maybe something like chcp and chhal can work as well. I have not tried that either…

It is up to you if you consider this whole issue a bug or a feature.


Code examples in WordPress

When writing WordPress articles and including example code with <code>, the leading spaces are discarded when the article is displayed, and without indentation code looks weird and unreadable.

A way to bypass this is to transform tabs to spaces and then replace all spaces by no-break spaces (Unicode 0x00A0). Then the indentation remains intact. Direct copy/paste from the blog to an editor or IDE will not result in runnable code. But for non-trivial examples I would recommend putting the code on github and adding a link to the article. Code snippets in articles should be mostly for human readers.
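
The transformation itself is tiny; a minimal sketch in Java following exactly this recipe (a tab width of four spaces is my own assumption):

public final class IndentFixer {

    // expand tabs to four spaces, then replace all spaces
    // by no-break spaces (U+00A0) so WordPress keeps them
    public static String forWordpress(String code) {
        return code.replace("\t", "    ").replace(' ', '\u00A0');
    }
}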


Design Patterns: Singleton


The singleton pattern has the advantage of being easy to memorize.

The only really interesting aspects of it are the issue of initialization ("lazy" or "eager") and maybe the dependencies between multiple singletons.

But I would like to mention two generalizations.

A singleton exists once in the whole program. Generalizations can address this uniqueness in two ways. Either the number of instances can be increased from one to a small number, or the universe ("whole program") can be changed. What is the universe in a big software system that is running on multiple servers with many processes and components?

Changing from one to a small fixed finite number is quite a routine thing. Java developers call this enum, but other programming languages often have similar structures or at least allow building them easily by restricting the creation of objects, creating the whole set of instances statically and making them accessible. I am not talking about the enum of C, C++ and C#.
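
A small Java example of this generalization, with three instances instead of one (the class and its attribute are of course made up for illustration):

enum TrafficLight {
    RED(false), YELLOW(false), GREEN(true);

    // each of the three fixed instances carries its own state
    private final boolean allowsDriving;

    TrafficLight(boolean allowsDriving) {
        this.allowsDriving = allowsDriving;
    }

    public boolean allowsDriving() {
        return allowsDriving;
    }
}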

The other, slightly more complex generalization looks at the "universe". An application can be distributed and have several parallel processes, not only threads. This means dealing with a larger universe, which can become slightly tricky, but some examples of that are actually routine. Think of a database that exists once in the application to allow operating on the same data and keeping it consistent. Frameworks sometimes allow for a more or less classical singleton that exists once in the whole application server landscape but can be accessed more or less transparently. An interesting case that is needed quite often is a global counter. It can be done using a DB sequence, or by giving disjoint subsets of the numbers that could be generated to different processes and servers, or by using something like a UUID and praying that no collisions will occur. But an application-wide singleton might be useful in some cases. Some frameworks talk about a "bean" with "application scope". On the other hand smaller universes are even more useful. A typical example is the "session scope".

Maybe just understanding these as proper generalizations of the singleton pattern will help in making these frameworks more understandable and more useful. The investment of having started with the simplest design pattern might actually pay off. Looking into some of the other patterns could be a worthwhile task for the future… 😉


Using Collections

When Java came out about 20 years ago, it was great to have a decent and quite extensive collection library available as part of the standard setup and ready to use.

Before that, we often had to develop our own or find one of many interesting collection libraries, and when writing and using APIs it was not a good idea to rely on them as part of the API.

Since Java 2 (technical name "Java 1.2") collection interfaces have been available, and now the implementation is kind of detached, because we should use the interfaces as much as possible and keep the implementation exchangeable.

An interesting question arose in conjunction with concurrency. The early Java 1 default collections were synchronized all the way. In Java 2, non-synchronized variants were added and became the default. Synchronization can be achieved by wrapping them or by using the old collections (which do implement the interfaces as well since Java 2).
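
The wrapping is done with the synchronized wrappers from java.util.Collections, for example:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SynchronizedWrapperDemo {
    public static void main(String[] args) {
        List<String> plain = new ArrayList<>();                     // not synchronized
        List<String> synced = Collections.synchronizedList(plain); // synchronized view
        synced.add("safe to call from several threads");
        System.out.println(synced);
    }
}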

This was a performance improvement, because most of the time it is expensive and unnecessary overhead to synchronize collections. As a matter of fact special care should be used anyway to know who is accessing a collection in what way. Even if the collection itself does not get broken by simultaneous access, your application most likely is, unless you really know what you are doing. Are you?

Now it is usually a good idea to control changes of a collection. This is achieved by wrapping it with one of the Collections.unmodifiableXXX methods. The result is that calling modifying methods like add, set or put on the wrapper will cause an exception. It was a good approach, as a first shot, but not where we want to be now.

References to the inner, non-wrapped collection can still be around, so the wrapped collection can still change while being accessed. If you can easily afford it, just copy collections when taking them in or giving them out. Or go immutable all the way and wrap your own collection in an unmodifiable wrapper, if that works.
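
A small demonstration of this aliasing problem and of the defensive copy:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UnmodifiableDemo {
    public static void main(String[] args) {
        List<String> inner = new ArrayList<>();
        inner.add("a");
        List<String> wrapped = Collections.unmodifiableList(inner);
        // wrapped.add("b");   // would throw UnsupportedOperationException
        inner.add("b");        // but the inner list can still change...
        System.out.println(wrapped); // ...and the change shows through: [a, b]

        List<String> copy = new ArrayList<>(inner); // defensive copy decouples them
        inner.add("c");
        System.out.println(copy); // still [a, b]
    }
}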

What I would like to see is something along the following lines:

  • We have two kinds of collection interfaces, those that are immutable and those that are mutable.
  • The immutable should be the default.
  • We have implementations of the collections and construction facilities for the immutable collections.
  • The immutable implementation is of course the default.

I do not want to advocate going immutable-only, because that comes at a high price in terms of efficiency. The usual pattern is to still have methods that modify a collection, but these leave the original collection as it is and just create a modified copy. Usually these implementations are done in such a smart way that they share a lot of structure, which causes no pain, because they are all immutable. No matter how smart and admirable these tricks are, I strongly doubt that they can reach the performance of modifiable collections if modifications are actually used a lot, at least in a purely single-threaded environment.

Ruby has taken an interesting approach. Collections have a method freeze that can be called to make them immutable. That is adding runtime checks, which is a good match for Ruby. Java should check this at compile time, because it is so important. Having different interfaces would do that.

I recommend checking out the Guava collection library from Google. It addresses most of the issues described here, and I think it is the best bet at the moment for that purpose. There are some other collection libraries to explore. Maybe one is actually better than Guava.
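
For example, Guava's immutable collections are immutable by construction, not just unmodifiable views (this assumes the guava dependency on the classpath):

import com.google.common.collect.ImmutableList;

public class GuavaDemo {
    public static void main(String[] args) {
        // an ImmutableList copies its contents; no alias can change it later
        ImmutableList<String> list = ImmutableList.of("a", "b", "c");
        System.out.println(list);
    }
}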
