Code examples in WordPress

When writing WordPress articles and including example code with <code>, the leading spaces are discarded when the article is displayed, and without indentation code looks weird and unreadable.

A way to bypass this is to transform tabs to spaces and then replace all spaces with no-break spaces (Unicode U+00A0). Then the indentation remains intact. The drawback is that copying and pasting directly from the blog into an editor or IDE will not result in runnable code. For non-trivial examples I would therefore recommend putting the code on GitHub and adding a link to the article. Code snippets in articles should be mostly for human readers.
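The transformation described above can be sketched in a few lines of Java. The class name `CodeIndent`, the method `protect` and the tab width parameter are assumptions made for this illustration. Tabs are simply expanded to a fixed number of spaces, not aligned to tab stops.

```java
final class CodeIndent {
    // U+00A0 is the no-break space that WordPress does not strip.
    private static final char NBSP = '\u00A0';

    /** Expands each tab to tabWidth spaces, then turns every space into a no-break space. */
    static String protect(String code, int tabWidth) {
        String spaces = " ".repeat(tabWidth);
        return code.replace("\t", spaces).replace(' ', NBSP);
    }

    public static void main(String[] args) {
        String snippet = "if (x) {\n\treturn 1;\n}";
        // The printed text looks the same, but contains no plain spaces any more.
        System.out.println(protect(snippet, 4));
    }
}
```

Replacing the no-break spaces with plain spaces again is the corresponding step for anyone who copies such a snippet back into an editor.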


Design Patterns: Singleton


The Singleton pattern has the advantage of being easy to memorize.

The only really interesting aspects of it are the issue of initialization („lazy“ or „eager“) and maybe the dependencies between multiple singletons.
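To make the difference between eager and lazy initialization concrete, here is a minimal Java sketch. The class names `EagerConfig` and `LazyConfig` are made up for this example. The lazy variant uses the holder idiom, which is one of several possible ways to do it.

```java
final class EagerConfig {
    // Eager: the instance is created when the class EagerConfig is initialized.
    static final EagerConfig INSTANCE = new EagerConfig();

    private EagerConfig() { }
}

final class LazyConfig {
    private LazyConfig() { }

    // Lazy: the JVM initializes Holder only on the first call to getInstance().
    private static final class Holder {
        static final LazyConfig INSTANCE = new LazyConfig();
    }

    static LazyConfig getInstance() {
        return Holder.INSTANCE;
    }
}

final class SingletonDemo {
    public static void main(String[] args) {
        // Both accessors always hand out the same object.
        System.out.println(EagerConfig.INSTANCE == EagerConfig.INSTANCE);
        System.out.println(LazyConfig.getInstance() == LazyConfig.getInstance());
    }
}
```

Dependencies between singletons become visible here as well: if one constructor accesses another singleton, the order of class initialization starts to matter.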

But I would like to mention two generalizations.

A Singleton exists once in the whole program. Generalizations can address the uniqueness in two ways. Either the number of instances can be increased from one to a small number or the universe („whole program“) can be changed. What is the universe in a big software system that is running on multiple servers with many processes and components?

Changing from one to a small fixed finite number is quite a routine thing. Java developers call this an enum, but other programming languages often have similar structures or at least allow building them easily by restricting the creation of objects, creating the whole set of instances statically and making them accessible. I am not talking about the enum of C, C++ and C#.
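As an illustration, a Java enum behaves like a singleton with a small fixed number of instances, each of which can carry state and methods. The enum `Direction` and its method `opposite` are invented for this sketch.

```java
enum Direction {
    NORTH, EAST, SOUTH, WEST;

    /** Returns the instance two steps further around the compass. */
    Direction opposite() {
        Direction[] all = values();
        return all[(ordinal() + 2) % all.length];
    }
}

final class DirectionDemo {
    public static void main(String[] args) {
        // No further instances can be created; these four are the whole set.
        for (Direction d : Direction.values()) {
            System.out.println(d + " <-> " + d.opposite());
        }
    }
}
```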

The other, slightly more complex generalization looks at the „universe“. An application can be distributed and have several parallel processes, not only threads. This would mean dealing with a larger universe, which can become slightly tricky, but some examples of that are actually routine. Think of a database that exists once in the application to allow operating on the same data and keeping them consistent. Frameworks sometimes allow for an application-server-wide, more or less classical singleton that exists once in the whole server landscape but can be accessed more or less transparently. An interesting case that is needed quite often is a global counter. It can be done using a DB sequence. Or by giving disjoint subsets of the numbers that could be generated to different processes and servers. Or by using something like a UUID and praying that no collisions will occur. But an application-wide singleton might be useful in some cases. Some frameworks talk about a „bean“ with „application scope“. On the other hand, smaller universes are even more useful. A typical example is the „session scope“.
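The variant with disjoint subsets of numbers can be sketched as follows. The class `StripedCounter` and its parameters are assumptions for illustration: process number i out of n hands out i, i+n, i+2n and so on, so no two processes ever produce the same number. The sketch assumes that the number of processes is fixed and that each process knows its own index.

```java
import java.util.concurrent.atomic.AtomicLong;

final class StripedCounter {
    private final int processCount;
    private final AtomicLong next;

    /** processIndex must be unique per process and in the range 0 .. processCount-1. */
    StripedCounter(int processIndex, int processCount) {
        if (processIndex < 0 || processIndex >= processCount) {
            throw new IllegalArgumentException("processIndex out of range");
        }
        this.processCount = processCount;
        this.next = new AtomicLong(processIndex);
    }

    /** Returns the next number of this process's subset; thread-safe within the process. */
    long nextId() {
        return next.getAndAdd(processCount);
    }

    public static void main(String[] args) {
        StripedCounter first = new StripedCounter(0, 3);
        StripedCounter second = new StripedCounter(1, 3);
        System.out.println(first.nextId() + " " + first.nextId());   // 0 3
        System.out.println(second.nextId() + " " + second.nextId()); // 1 4
    }
}
```

The price is that the numbers are unique but not globally ordered by time of creation, which is often acceptable for identifiers.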

Maybe just understanding these as a proper generalization of the singleton pattern will help in making these frameworks more understandable and more useful. The investment of having started with the simplest design pattern might actually pay off. Looking into some of the other patterns could be a worthwhile task for the future… 😉


Using Collections

When Java came out about 20 years ago, it was great to have a decent and quite extensive collection library available as part of the standard setup and ready to use.

Before that we often had to develop our own or find one of many interesting collection libraries, and when writing and using APIs it was not a good idea to rely on them as part of the API.

In Java 2 (technical name „Java 1.2“) collection interfaces were added, and now the implementation is kind of detached, because we should use the interfaces as much as possible and keep the implementation exchangeable.

An interesting question arose in conjunction with concurrency. The early Java 1 default collections were synchronized all the way. In Java 2 non-synchronized variants were added and became the default. Synchronization can be achieved by wrapping them or by using the old collections (which implement the interfaces as well since Java 2).
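Both ways of getting a synchronized list can be shown in a short sketch. `Collections.synchronizedList` and `Vector` are part of the JDK; the class `SyncDemo` and its helper method are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Vector;

final class SyncDemo {
    /** Lets several threads add to the same list and returns the final size. */
    static int fillConcurrently(List<Integer> target, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    target.add(i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return target.size();
    }

    public static void main(String[] args) {
        // Variant 1: wrap a non-synchronized collection.
        List<Integer> wrapped = Collections.synchronizedList(new ArrayList<>());
        // Variant 2: use the old synchronized collection through the List interface.
        List<Integer> legacy = new Vector<>();
        System.out.println(fillConcurrently(wrapped, 4, 10_000)); // 40000
        System.out.println(fillConcurrently(legacy, 4, 10_000));  // 40000
    }
}
```

Passing a plain `ArrayList` to the same method is not safe and may lose elements or throw an exception. Note also that the wrapper only makes single calls atomic; iterating over the list still needs explicit synchronization.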

This was a performance improvement, because most of the time it is expensive and unnecessary overhead to synchronize collections. As a matter of fact, special care should be taken anyway to know who is accessing a collection in what way. Even if the collection itself does not get broken by simultaneous access, your application most likely is, unless you really know what you are doing. Do you?

Now it is usually a good idea to control changes of a collection. This is achieved by wrapping it with one of the Collections.unmodifiableXXX methods. The result is that trying to modify the collection through the wrapper, for example with set or put, will cause an exception. It was a good approach as a first shot, but it is not where we want to be now.

References to the inner, non-wrapped collection can still be around, so the wrapped collection can still change while being accessed. If you can easily afford it, just copy collections when taking them in or giving them out. Or go immutable all the way and wrap your own in an unmodifiable wrapper, if that works.
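The difference between an unmodifiable wrapper and a copy can be demonstrated with a few lines. `Collections.unmodifiableList` and `List.copyOf` (Java 10 and later) are JDK methods; the class `WrapperPitfall` and its two helpers are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class WrapperPitfall {
    /** Read-only view: rejects modification, but still reflects changes made to inner. */
    static List<String> view(List<String> inner) {
        return Collections.unmodifiableList(inner);
    }

    /** Independent unmodifiable copy: later changes to inner are not visible. */
    static List<String> snapshot(List<String> inner) {
        return List.copyOf(inner);
    }

    public static void main(String[] args) {
        List<String> inner = new ArrayList<>(List.of("a", "b"));
        List<String> view = view(inner);
        List<String> snapshot = snapshot(inner);

        inner.add("c"); // somebody still holds the inner list

        System.out.println(view.size());     // 3
        System.out.println(snapshot.size()); // 2
        try {
            view.add("d");
        } catch (UnsupportedOperationException e) {
            System.out.println("view rejects modification");
        }
    }
}
```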

What I would like to see is something along the following lines:

  • We have two kinds of collection interfaces, those that are immutable and those that are mutable.
  • The immutable should be the default.
  • We have implementations of the collections and construction facilities for the immutable collections.
  • The immutable implementation is of course the default.
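What such a split could look like is sketched below. All names (`ImmutableSeq`, `MutableSeq`, `ArraySeq`, `toImmutable`) are hypothetical and exist neither in the JDK nor in any particular library. The point is only that the immutable interface has no modifying methods at all, so an attempt to modify is rejected by the compiler instead of by a runtime exception.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical immutable sequence: there is nothing to call that could modify it. */
interface ImmutableSeq<T> {
    int size();
    T get(int index);
}

/** Hypothetical mutable sequence; deliberately not a subtype of ImmutableSeq. */
interface MutableSeq<T> {
    int size();
    T get(int index);
    void add(T element);
    ImmutableSeq<T> toImmutable();
}

final class ArraySeq<T> implements MutableSeq<T> {
    private final List<T> items = new ArrayList<>();

    public int size() { return items.size(); }
    public T get(int index) { return items.get(index); }
    public void add(T element) { items.add(element); }

    /** Copies the current content, so later add() calls do not leak into the result. */
    public ImmutableSeq<T> toImmutable() {
        List<T> copy = new ArrayList<>(items);
        return new ImmutableSeq<T>() {
            public int size() { return copy.size(); }
            public T get(int index) { return copy.get(index); }
        };
    }
}
```

A method that takes an `ImmutableSeq<String>` parameter can rely on the content never changing, and a call like `seq.add("x")` on it does not compile.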

I do not want to advocate going for immutable collections only, because that does come at a high price in terms of efficiency. The usual pattern is to still have methods that modify a collection, but these leave the original collection as it is and just create a modified copy. Usually these implementations have been done in such a smart way that the copies share a lot of their structure, which is harmless, because they are all immutable. No matter how smart and admirable these tricks are, I strongly doubt that they can reach the performance of modifiable collections if modifications are actually used a lot, at least in a purely single-threaded environment.
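The simplest case of such sharing is an immutable linked list, where prepending an element creates one new node and reuses the whole existing list as its tail. The class `PList` below is a made-up minimal sketch of that idea, not the implementation of any real library, which would typically use more elaborate tree structures.

```java
import java.util.NoSuchElementException;

final class PList<T> {
    private static final PList<?> EMPTY = new PList<>(null, null, 0);

    private final T head;
    private final PList<T> tail;
    private final int size;

    private PList(T head, PList<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    @SuppressWarnings("unchecked")
    static <T> PList<T> empty() {
        return (PList<T>) EMPTY;
    }

    /** Returns a new list; this list stays unchanged and becomes the shared tail. */
    PList<T> prepend(T value) {
        return new PList<>(value, this, size + 1);
    }

    T head() {
        if (size == 0) throw new NoSuchElementException("empty list");
        return head;
    }

    PList<T> tail() {
        if (size == 0) throw new NoSuchElementException("empty list");
        return tail;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        PList<String> base = PList.<String>empty().prepend("b");
        PList<String> first = base.prepend("a");
        PList<String> second = base.prepend("c");
        // Both new lists share the very same tail object.
        System.out.println(first.tail() == second.tail()); // true
    }
}
```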

Ruby has taken an interesting approach. Collections have a method freeze that can be called to make them immutable. That adds runtime checks, which is a good match for Ruby. Java should check this at compile time, because it is so important. Having different interfaces would do that.

I recommend checking out the Guava collection library from Google. It addresses most of the issues described here, and I think it is the best bet at the moment for that purpose. There are some other collection libraries to explore. Maybe one is actually better than Guava.


Indexing of Database Tables II (additional indices)

Additional indices („indexes“ in Oracle’s English) apart from the primary key are extremely important for the performance of databases.
There is some magic behind it: a complicated query with multiple joins runs slowly and magically becomes faster when the right index and the right hint are added, or it does not become faster at all, even though the index being added looked so useful. I leave this to the DBAs.

Some thoughts about indexing in general. Each index costs a little bit of storage and a little bit of performance when writing to the table, especially when inserting or when updating columns that are included in the index. This cost grows with the amount of data per row in the index. And the index is more helpful if it allows drilling down to a much smaller number of rows, not necessarily one. I would call this the selectiveness of the index.

An interesting issue arises when combining multiple indices and there is interest in selecting by any one of them or their conjunction. This leads to Z-curve based indices, an issue worth an article by itself, maybe in the future.

Generally it is desirable to only include fields in the index that contribute to selecting the data. That means that adding this column to the WHERE criteria of a select significantly reduces the number of records found. If that is not the case, the overhead of maintaining this field as part of the index and blowing up the index operations for both read and write access is probably not worth it. Sometimes a field with very little selectiveness is included in an index that is supporting a unique constraint. That may be ok if the table is small, but for larger tables, rethinking the database design should be considered. I have seen this happening when several quite different kinds of records were stored in the same table. Splitting them up into different tables would have resolved this. Other cases require different answers.

Another interesting situation is when many selects contain columns A and B in the WHERE clause. By itself column A is quite selective, and so is column B. But they are quite strongly correlated, so adding the other of the two only marginally contributes to the selectiveness of the index. In this case it should be considered which of the three options, having A, B or both in the index, is best. Try it out and discuss it with a good DBA. Yes, I have seen many people calling themselves DBAs who were not really good in their area, and I have learned what really good DBAs can do. Get one for a few days when designing a serious database…
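The effect of correlation can be made tangible with a small calculation on sample data. The sketch below measures selectiveness as the average number of rows that share one key value (rows divided by distinct keys, lower is better). This is a simplification chosen for illustration, not what a database optimizer actually computes, and the class `Selectivity` and the sample rows are made up.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class Selectivity {
    /** Average number of rows per distinct key value; lower means more selective. */
    static <R> double rowsPerKey(List<R> rows, Function<R, Object> key) {
        Set<Object> distinct = new HashSet<>();
        for (R row : rows) {
            distinct.add(key.apply(row));
        }
        return (double) rows.size() / distinct.size();
    }

    public static void main(String[] args) {
        // Column A = postal code, column B = city: B is determined by A.
        List<String[]> rows = List.of(
            new String[] {"8001", "Zurich"},
            new String[] {"8001", "Zurich"},
            new String[] {"8002", "Zurich"},
            new String[] {"3000", "Bern"},
            new String[] {"3000", "Bern"},
            new String[] {"3001", "Bern"});

        System.out.println(rowsPerKey(rows, r -> r[0]));                // A alone: 1.5
        System.out.println(rowsPerKey(rows, r -> r[1]));                // B alone: 3.0
        System.out.println(rowsPerKey(rows, r -> List.of(r[0], r[1]))); // A and B: 1.5
    }
}
```

In this sample the combined key is no more selective than A alone, so including B in the index would only add maintenance cost. Running the same kind of count against real data is a cheap first step before the discussion with the DBA.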
