Changing Keyboard Mappings with xmodmap


Introduction

When running a Linux system in its graphical mode, keyboard mappings can be changed using xmodmap.
Each key on the keyboard has a "keycode", which can be found by looking at the output of
xmodmap -pke > current-keyboard
or by running
xev
and trying out the keys.
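If you want to watch the keycodes as you type, the output of xev can be filtered for the relevant lines. A minimal sketch (the exact output format may vary between xev versions):

xev | grep keycode

Pressing a key inside the xev window then prints a line containing its keycode.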

I am using a modified German keyboard, but of course the ideas can be adapted to any setup.

A Given Setting as a Basis

You can start with any keyboard, for example the German keyboard with no dead keys, which is often useful as a starting point. I prefer to modify it a little bit. This allows me to support more languages, like Swedish, Norwegian, Danish, Dutch, Spanish and Esperanto. Russian is an issue that I will address in another article. It is also a good idea to have secondary positions for symbols that sit on keys missing from some physical layouts, like "<", "|" and ">" on the German keyboard, whose key is simply not present on the American keyboard. The third idea is to have two Altgr keys, because many important symbols are only accessible in conjunction with Altgr, and they are easier to type if there are two Altgr keys, just like there are two Shift keys.

Special characters for Esperanto

For Esperanto (Esperanto explained in Esperanto) the Latin alphabet with its 26 letters is needed, even though some of them are never used. On top of that the following letters are needed as well:
ĉ Ĉ ĝ Ĝ ĵ Ĵ ĥ Ĥ ŝ Ŝ ŭ Ŭ
Unfortunately it does not reuse the letters commonly used in Slavic languages and present in many international setups:
č Č ž Ž š Š
but Unicode covers it all, and as long as the keyboard does not need to support Slavic, Baltic or Sami languages simultaneously with Esperanto, things should be fine.

A reasonable approach is to put these symbols on Altgr-C, Altgr-G, Altgr-J, Altgr-H, Altgr-S and Altgr-U.

This can be achieved easily:
Create a file
.xmodmap-ori
by running

xmodmap -pke > .xmodmap-ori

And a script $HOME/bin/orikb:

#!/bin/sh
xmodmap $HOME/.xmodmap-ori

Look up where the letters S, G, H, J, C and U are positioned on the keyboard in the .xmodmap-ori file.
Now create a file
.xmodmap-esperanto
using the following command

egrep 'keycode *[0-9]+ *= *[SsGgCcJjHhUu] ' < .xmodmap-ori > .xmodmap-esperanto

and edit it. Leave the "keycode NN =" part intact and change the rest to something like this:

keycode 39 = s S s S scircumflex Scircumflex
keycode 42 = g G g G gcircumflex Gcircumflex
keycode 43 = h H h H hcircumflex Hcircumflex
keycode 44 = j J j J jcircumflex Jcircumflex
keycode 54 = c C c C ccircumflex Ccircumflex
keycode 30 = u U u U ubreve Ubreve

The numbers between "keycode" and "=" could be different on your machine, but the rest should look like this. With a typical XKB setup, the fifth and sixth entries in each line are the characters produced with Altgr and Altgr-Shift.
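A single line can also be tried out directly, without writing a file, because xmodmap can evaluate an expression given on the command line. A quick sketch using the keycode from the table above (adjust the number to your machine):

xmodmap -e 'keycode 39 = s S s S scircumflex Scircumflex'

After that, Altgr-S should already produce ŝ.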

Now create a script
$HOME/bin/eokb

#!/bin/sh
xmodmap $HOME/.xmodmap-esperanto

as a counterpart to the orikb script created above.

Do not forget to do the

chmod +x $HOME/bin/eokb $HOME/bin/orikb

for your scripts... 🙂

Now you can use eokb to enable the Esperanto keys and orikb to return to your original setting.
The Esperanto keys will be accessible by pressing Altgr together with the Latin letter they are derived from.

Other language-specific characters

In a similar way you can have other characters:
å and Å on the A,
ë and Ë on the E,
ï and Ï on the I,
ø and Ø on the O,
æ and Æ on the Ä,
ÿ and Ÿ on the Y,
ñ and Ñ on the N,
< on the comma, > on the period,
| on the dash.
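The corresponding lines in the xmodmap file might look like the following sketch. The keycodes are from my machine, and both they and the exact keysym names should be checked against your own xmodmap -pke output:

keycode 38 = a A a A aring Aring
keycode 26 = e E e E ediaeresis Ediaeresis
keycode 57 = n N n N ntilde Ntilde
keycode 59 = comma semicolon comma semicolon less less
keycode 60 = period colon period colon greater greater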

This allows writing a lot of languages. See what works for you…

Remaining Issues

For Russian I have bought a physical Cyrillic keyboard and I am using it with the setup that is printed on the keys. I might write about this another time.

Generally I like to have two Altgr keys, and I can very well live without a Windows key. I might write about this another time as well.
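As a rough sketch of the idea: with xmodmap, the right Windows key (often keycode 134, check with xev) can be turned into a second Altgr. This assumes a typical XKB-based X server and is not guaranteed to work everywhere:

remove mod4 = Super_R
keycode 134 = ISO_Level3_Shift

The first line releases the key from its Windows-key modifier role, the second one makes it act as the AltGr level shift.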

By the way, this article has been written for Linux, but it works perfectly well on any other Unix-like system, as long as X11 is used; just think of AIX, HP-UX, BSD, Solaris and similar systems. But Linux is by far the most common Unix-like desktop system these days.

Here is another approach:
xkbmap


Changing the Keyboard Mapping with xmodmap (Linux/X11)


Introduction

When a Linux system is running in graphical mode, the keyboard mapping can be changed using xmodmap.
Each key has a so-called "keycode", which can easily be determined by dumping the current mapping with
xmodmap -pke > current-keyboard
or by trying out the keys with
xev
I like to use a German keyboard layout and do so regardless of what is printed on the keys. In Switzerland, unfortunately, a different layout with minimal deviations is common. Supposedly (unofficially) it was introduced to protect the Swiss keyboard manufacturers, forgetting that Asian manufacturers can clear this hurdle effortlessly. Supposedly (officially) it was meant as a compromise for the different language regions of Switzerland, keeping the differences minimal. The Swiss keyboard is impractical, though, because the "ß" and the capital umlauts are missing.

A Given Setting as a Basis

Under Linux you can choose a German keyboard with "no dead keys", which is usually sensible, or at least a sensible starting point. I find it practical to modify it a little further, though. First, more languages can be supported this way; for me the Scandinavian languages, Spanish, Russian and Esperanto are currently the most interesting. Second, I find it practical to additionally put the characters that sit on keys missing from some physical layouts, e.g. the three characters "<", "|" and ">", somewhere else, so that they remain available. Third, I like having two Altgr keys, because many important characters are only reachable via Altgr and are easier to type when there is a right and a left Altgr.

Characters for Esperanto

For Esperanto (Esperanto explained in Esperanto) the usual Latin alphabet with 26 letters is needed, some of which are not used at all, plus the following additional letters:
ĉ Ĉ ĝ Ĝ ĵ Ĵ ĥ Ĥ ŝ Ŝ ŭ Ŭ
Unfortunately the additional letters used in some Slavic languages,
č Č ž Ž š Š
were not reused, but as long as Slavic or Baltic languages written in the Latin script are not used in parallel, this is not a problem.

The most sensible solution seems to me to put these characters on Altgr-C, Altgr-G, Altgr-J, Altgr-H, Altgr-S and Altgr-U.

This is quite easy to achieve:
Create a file
.xmodmap-ori
by running

xmodmap -pke > .xmodmap-ori

And a script $HOME/bin/orikb:

#!/bin/sh
xmodmap $HOME/.xmodmap-ori

Then create a file
.xmodmap-esperanto
containing the following lines:

keycode 39 = s S s S scircumflex Scircumflex
keycode 42 = g G g G gcircumflex Gcircumflex
keycode 43 = h H h H hcircumflex Hcircumflex
keycode 44 = j J j J jcircumflex Jcircumflex
keycode 54 = c C c C ccircumflex Ccircumflex
keycode 30 = u U u U ubreve Ubreve

And a script in
$HOME/bin/eokb

#!/bin/sh
xmodmap $HOME/.xmodmap-esperanto

Of course, do not forget chmod +x for the scripts… 🙂

Now you can get the Esperanto characters onto the keyboard with eokb and restore the original state with orikb.

Other language-specific characters

In a similar way, other characters can of course be put onto the keyboard as well. For example, I also have
å and Å on the A,
ë and Ë on the E,
ï and Ï on the I,
ø and Ø on the O,
æ and Æ on the Ä,
ÿ and Ÿ on the Y,
ñ and Ñ on the N,
< on the comma, > on the period,
| on the dash.

This makes it possible to write the Scandinavian languages, Spanish and Esperanto.
Some keyboards lack the <>| key to the left of the Y in favor of a larger Shift key. That is why I have additionally put <, > and | on Altgr-comma, Altgr-period and Altgr-dash.

Outlook on further articles on this topic

For Russian I have bought a Cyrillic keyboard and use the mapping that is printed on it.

In general it bothers me that characters as frequent as []{}\ are not so easy to reach, so I have made Altgr available both on the right and on the left…


Java: Using Enums for Singletons

The singleton pattern is not really a big deal, but it is overused, because it gives us the coolness of knowing about design patterns without having to learn too much for it… It does have its uses in some programming languages, while it is more or less implicit or obsolete in others, depending on the point of view.

I have already discussed the generalization of the singleton.
Actually the enum of Java, and of languages that provide something like Java's enum, can be used to build a singleton by just limiting it to one instance. Having been skeptical about this as an abuse of enums that might confuse people reading the code, I would now suggest embracing it as a possible and useful way to write singletons, being familiar with it when reading other people's code, and even using it when writing code.

The enum way to write a singleton is the most elegant way in Java, because it uses the least boilerplate code and includes some benefits that are usually ignored, especially when it comes to serialization. Serialization has to be addressed in singletons quite often, but is quite often ignored, resulting in multiple instances of the "singleton" floating around…

So here we go:


enum Singleton  {
    INSTANCE;

    private long count;

    private Singleton() {
        count = 0;
    }

    public synchronized long nextValue() {
        return count++;
    }
}

Remarks:

  • For real programs it would be important to check if long is sufficient or BigInteger is needed.
  • For real programs, using an AtomicLong could be a better way than using synchronized, as sketched below.
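A minimal sketch of that variant, assuming a long is sufficient:

import java.util.concurrent.atomic.AtomicLong;

enum Singleton {
    INSTANCE;

    // lock-free counter instead of a synchronized method
    private final AtomicLong count = new AtomicLong(0);

    public long nextValue() {
        return count.getAndIncrement();
    }
}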



Automation and its Limits

What we do in IT is programming, implementing and maintaining automation. Its general usefulness is not really challenged these days, but it was quite a big deal a few decades ago.
Many routine tasks can be done more efficiently and more reliably with IT, or become feasible at all…

Just think of getting money from your bank account. Banks typically offer the possibility to do that at the bank counter without extra charges or any questions. But such a trivial operation can be done at the ATM, even while the counter is closed, and I guess it is quite uncommon to go to the bank counter for that. There are more complex issues in banking that do require human interaction, and those can be addressed with a human.

Buying a railroad ticket can be as trivial as getting money from the ATM, if the software of the ticket vending machine is really good and the tariff system is reasonably transparent, understandable and simple, or at least its complexities can be covered well by the software. Most tickets are bought from the ticket vending machine, but the human at the ticket office in the railroad station is still important for more complex tickets that are not covered by the machine at all or that are too complex to buy there. It is interesting to see what can work and what cannot, maybe in another article in the future…

Phone lines are more and more often answered by a machine that has been given a beautiful voice by a human speaker… "For ABC press 1, for DEF press 2, …" and so on. That works well for the trivial and most common cases. But no matter how good the software is, at some point a limit is usually reached and we want to find a human. I once wanted to find out if there is a packaging service at some airport. It was possible to call, but there was no way at all, or at least not within one hour of trying, to get to a human through these menus. So in the end the question remained unanswered. For railroad tickets this can be an issue if a more complex ticket is needed in a minor railroad station, but that can be solved nicely by providing a phone number (with real humans in the call center) that can be called, and getting the ticket printed by the ticket vending machine. This is possible in Germany and quite useful, basically covering most needs, even if talking with a real person is maybe nicer than a phone call with a dying battery… In Switzerland the ticket vending machines have a VoIP connection to the call center, so you do not even need to use the phone, but I think they are constrained to what can be done in the ticket vending machine GUI, so it does not help with the more difficult cases… Maybe in the future as well.

A more delicate issue is nanosecond trading. Trading (they do not like to call it gambling) on the stock exchange, with foreign currencies and with all kinds of derivatives is done automatically by algorithms, using software that is faulty and using the same software in many places, more or less. And the risk management is done like that as well. This could be seen as positive, because some traders tend to lose the sense of reasonable risk, and machines eliminate the emotions that can be counterproductive in trading. But the risk takers are still there, setting up more automated trading systems… Will these systems one day create the real stock market crash by running into weird directions, because they share the same software bug or because a situation occurs that was not anticipated when the software was programmed? Is it good to take such high risks without human control? It gets even more delicate when algorithms decide who is a terrorist and send drones, of course without a court order…

An issue that is mentioned quite a lot these days is motor vehicles operated automatically. While I do question the excessive use of cars in our time, I do think that this is an improvement, because human drivers are not very good and too risky. For rail transport this is much easier to achieve, and many modern subway systems operate automatically without any known problems.

We should always ask the question: does this IT solution and the automation it provides serve humans and provide an advantage? That is the stuff we should build and operate.


MS-Windows-Encodings with CMD: Bug or Feature?


Whoever works with MS-Windows should know these black windows with CMD running in them, even though they are not really popular. The Unix and Linux guys hate them, because they are really primitive compared to their shells. Windows guys like to work graphically, or they prefer PowerShell or bash with Cygwin. Linux and Unix have the equivalent of these windows, but usually they are white. Since the colors can be configured in any way on both systems, this is of no relevance.

NT-based MS-Windows systems (NT 3.x, 4.x, 2000, XP, Vista, 7, 8, 10) have several subsystems in which programs run, for example Win64, Win32 (or WoW64 on 64-bit systems), Win16, Cygwin (if installed), DOS… Because programs for the DOS subsystem are typically started in a CMD window, and because some of the DOS commands have counterparts with the same name and similar behavior in the CMD window, the CMD window is sometimes called a DOS window, which is just incorrect. Actually this black window comes into existence in many situations: whenever a program that has input or output (stdin, stdout, stderr) is started and no redirection is in place, a black window is provided around it. This applies to CMD. Under Linux (and Unix) with X11 it is the other way round: you start the program that provides the window, and it automatically starts the default shell within that window, unless something else is stated.

Now I recommend an experiment. You just need an MS-Windows installation with any graphical editor like emacs, gvim, UltraEdit, TextPad, SciTE or even Notepad, and a CMD window.

  • Please type these commands, do not use copy/paste.
  • In the CMD window, cd into a directory you may write in.
  • echo "xäöüx" > filec.txt. Yes, there are ways to type these letters even with an American keyboard. 🙂
  • Open filec.txt with a graphical editor. How do the umlauts look?
  • Use the editor to create a second file fileg.txt in the same directory with contents like this: yäöüy.
  • View it in CMD:
  • type fileg.txt
  • How do the umlauts look?

It is a feature or a bug that all common MS-Windows versions put the umlauts at different code positions than the graphical editors. If you know how to fix this, let me know.

What has happened? In the early 80s MS-DOS came into existence. At that time, standards for character encoding were not very good. Only ASCII or ISO-646-IRV existed, which was at least a big step ahead of EBCDIC. But this standardized only the lower 128 characters (7 bits) and lacked characters needed by almost any language other than English. Attempts were made to put a small number of these additional letters into the positions of supposedly irrelevant characters like "@", "[", "~", "$" etc. And software vendors started to make use of the upper 128 characters. Commodore, Atari, MS-DOS, Apple, NeXT, TeX and "any" software came up with its own specific way of doing this, often specific to a language region.

These solutions were incompatible between different software systems, sometimes even between versions or language versions of the same software. Remember that at that time networks were unusual, and where they existed, they were proprietary to the operating system, with bridge solutions being extremely difficult to implement. Even floppy disks (the three-dimensional incarnations of the save button) had proprietary formats. So incompatible encodings did not hurt so much.

But relatively early, X11, which became the typical graphical system for Unix and later Linux, started to use standard encodings like the ISO-8859-x family, UTF-8 and UTF-16. Linux was already on ISO-8859-1 in version 0.99 in the early 90s and never tried to invent its own character encoding. Thank God for that…

Today all relevant systems have moved to the Unicode standard and standardized encodings like ISO-8859-x, UTF-8, UTF-16… But MS-Windows has done so only partially. The graphical system uses modern encodings, or at least Cp1252, which is a decent approximation. But the text-based system with the black window, in which CMD is running, still uses encodings from the MS-DOS times more than 30 years ago, like Cp850. This results in a break within the system, which is at least very annoying when working with Cygwin or CMD windows.

Those who have a lot of courage can change this in the registry. Just change the entries for OEMCP and OEMHAL in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage simultaneously. One of them is for input, the other one for output; so if you change only one, you will even get inconsistencies within the window… Sleep well with these nightmares. 🙂
Research on the internet has revealed that some people tried to change to UTF-8 (Cp65001) and got a system that could not even boot as a result. Try it on a copy of a virtual system without too much risk, if you like… I have not verified this, so maybe it is just a bad rumor meant to damage the great company that has brought us this interesting zoo of encodings within the same system. But anyway, try it at your own risk.
Maybe something like chcp and chhal can work as well. I have not tried that either…
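At least chcp is harmless to try out, because it only affects the current CMD session and does not touch the registry. A small sketch, assuming a TrueType console font (raster fonts may not render the new code page correctly):

rem show the active code page (e.g. 850)
chcp
rem switch the current session to Cp1252
chcp 1252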

It is up to you whether you consider this whole issue a bug or a feature.


Code examples in WordPress

When writing WordPress articles and including example code with <code>, the leading spaces are discarded when the article is displayed, and without indentation code looks weird and unreadable.

A way to bypass this is to transform tabs to spaces and then replace all leading spaces with no-break spaces (Unicode 0x00A0). Then the indentation remains intact. Copying and pasting directly from the blog into an editor or IDE will not result in runnable code, though. But for non-trivial examples I would recommend putting the code on GitHub and adding a link to the article. Code snippets in articles should be mostly for human readers.
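As a small sketch of that transformation in Java (the class and method names are made up for illustration; tabs are assumed to have been expanded to spaces beforehand):

public class NbspIndent {
    // replaces the leading spaces of each line with no-break spaces (U+00A0)
    static String indentWithNbsp(String code) {
        StringBuilder sb = new StringBuilder();
        for (String line : code.split("\n", -1)) {
            int i = 0;
            while (i < line.length() && line.charAt(i) == ' ') {
                sb.append('\u00A0');
                i++;
            }
            sb.append(line.substring(i)).append('\n');
        }
        return sb.toString();
    }
}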


Design Patterns: Singleton


This singleton pattern has the advantage of being easy to memorize.

The only really interesting aspects of it are the issue of initialization ("lazy" or "eager") and maybe the dependencies between multiple singletons.

But I would like to mention two generalizations.

A singleton exists once in the whole program. Generalizations can address this uniqueness in two ways: either the number of instances can be increased from one to a small number, or the universe ("whole program") can be changed. What is the universe in a big software system that is running on multiple servers with many processes and components?

Changing from one to a small fixed finite number is quite a routine thing. Java developers call this enum, and other programming languages often have similar structures, or at least allow building them easily by restricting the creation of objects, creating the whole set of instances statically and making them accessible; see the sketch below. I am not talking about the enum of C, C++ and C#.
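A hand-rolled sketch of this in Java, roughly what had to be written before enum existed (the class and field names are made up for illustration):

public final class Suit {
    // the complete, fixed set of instances, created statically
    public static final Suit CLUBS = new Suit("clubs");
    public static final Suit DIAMONDS = new Suit("diamonds");
    public static final Suit HEARTS = new Suit("hearts");
    public static final Suit SPADES = new Suit("spades");

    private final String name;

    // private constructor: no further instances can be created
    private Suit(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return name;
    }
}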

The other, slightly more complex generalization looks at the "universe". An application can be distributed and have several parallel processes, not only threads. This means dealing with a larger universe, which can become slightly tricky, but some examples of that are actually routine. Think of a database that exists once in the application, to allow operating on the same data and keeping it consistent. Frameworks sometimes allow for a more or less classical singleton that exists once in the whole application server landscape, but can be accessed more or less transparently. An interesting case that is needed quite often is a global counter. It can be implemented using a DB sequence, or by giving disjoint subsets of the possible numbers to different processes and servers, or by using something like a UUID and praying that no collisions will occur. But an application-wide singleton might be useful in some cases. Some frameworks talk about a "bean" with "application scope". On the other hand, smaller universes are even more useful; a typical example is the "session scope".

Maybe just understanding these as proper generalizations of the singleton pattern will help in making these frameworks more understandable and more useful. The investment of having started with the simplest design pattern might actually pay off. Looking into some of the other patterns could be a worthwhile task for the future… 😉


Using Collections

When Java came out about 20 years ago, it was great to have a decent and quite extensive collection library available as part of the standard setup, ready to use.

Before that, we often had to develop our own or find one of many interesting collection libraries, and when writing and using APIs it was not a good idea to rely on them as part of the API.

Since Java 2 (technical name "Java 1.2"), collection interfaces have been available, and the implementation is now kind of detached, because we should use the interfaces as much as possible and keep the implementation exchangeable.

An interesting question arose in conjunction with concurrency. The early Java 1 default collections were synchronized all the way. In Java 2, non-synchronized variants were added and became the default. Synchronization can still be achieved by wrapping them or by using the old collections (which also implement the interfaces since Java 2).

This was a performance improvement, because most of the time synchronizing collections is expensive and unnecessary overhead. As a matter of fact, special care should be taken anyway to know who is accessing a collection in what way. Even if the collection itself does not get broken by simultaneous access, your application most likely does, unless you really know what you are doing. Do you?

Now it is usually a good idea to control changes to a collection. This is achieved by wrapping it with one of the Collections.unmodifiableXXX methods. The result is that accessing the wrapped collection with set or put will cause an exception. This was a good approach as a first shot, but not where we want to be now.

But references to the inner, non-wrapped collection can still be around, so the collection can still change while being accessed, as the example below shows. If you can easily afford it, just copy collections when taking them in or giving them out. Or go immutable all the way and wrap your own in an unmodifiable wrapper, if that works.
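A minimal sketch of the aliasing problem, using standard java.util behavior:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class WrapDemo {
    public static void main(String[] args) {
        List<String> inner = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> wrapped = Collections.unmodifiableList(inner);
        // wrapped.add("c") would throw UnsupportedOperationException,
        // but the backing list can still change underneath the wrapper:
        inner.add("c");
        System.out.println(wrapped); // prints [a, b, c]
        // a defensive copy decouples the two:
        List<String> copy = new ArrayList<>(wrapped);
        inner.add("d");
        System.out.println(copy); // still prints [a, b, c]
    }
}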

What I would like to see is something along the following lines:

  • We have two kinds of collection interfaces, those that are immutable and those that are mutable.
  • The immutable ones should be the default.
  • We have implementations of the collections and construction facilities for the immutable collections.
  • The immutable implementation is of course the default.

I do not want to advocate immutable collections only, because that comes at a high price in terms of efficiency. The usual pattern is to still have methods that modify a collection, but these leave the original collection as it is and just create a modified copy. Usually these implementations are done in such a smart way that the copies share a lot of structure, which causes no pain, because they are all immutable. No matter how smart and admirable these tricks are, I strongly doubt that they can reach the performance of modifiable collections if modifications are actually used a lot, at least in a purely single-threaded environment.

Ruby has taken an interesting approach: collections have a method freeze that can be called to make them immutable. That adds runtime checks, which is a good match for Ruby. Java should check this at compile time, because it is so important. Having different interfaces would achieve that.

I recommend checking out the Guava collection library from Google. It addresses most of the issues described here, and I think it is the best bet at the moment for this purpose. There are some other collection libraries to explore. Maybe one is actually better than Guava.


Indexing of Database Tables II (additional indices)

Additional indices ("indexes" in Oracle's English) apart from the primary key are extremely important for the performance of databases.
There is some magic behind it, when a complicated query with multiple joins runs slowly and magically becomes faster when the right index and the right hint are added, or does not become faster at all, even though the added index looks so useful… I leave this to the DBAs.

Some thoughts about indexing in general. Each index costs a little bit of storage and a little bit of performance when writing to the table, especially when inserting or when updating columns that are included in the index. This cost grows with the amount of data per row in the index. And an index is more helpful if it allows drilling down to a much smaller number of rows, not necessarily one. I would call this the selectivity of the index.

An interesting issue arises when combining multiple indices and there is interest in selecting by any one of them or their conjunction. This leads to Z-curve based indices, an issue worth an article by itself, maybe in the future.

Generally it is desirable to only include fields in the index that contribute to selecting the data. That means that adding such a column to the WHERE criteria of a select significantly reduces the number of records found. If that is not the case, the overhead of maintaining this field as part of the index and blowing up the index operations for both read and write access is probably not worth it. Sometimes a field with very little selectivity is included in an index that supports a unique constraint. That may be OK if the table is small, but for larger tables, rethinking the database design should be considered. I have seen this happen when several quite different kinds of records were stored in the same table; splitting them up into different tables would have resolved it. Other cases require different answers.

An interesting situation arises when many selects contain both columns A and B in the WHERE clause. By itself, column A as well as column B is quite selective. But they are strongly correlated, so adding the second of the two only marginally improves the selectivity of the index. In this case it should be considered which of the three options, having A, B or both in the index, is best. Try it out and discuss it with a good DBA. Yes, I have seen many people calling themselves DBA who were not really good in their area, and I have learned what really good DBAs can do. Get one for a few days when designing a serious database…

See also Indexing of Database Tables I (Primary Keys)


PDF and PDF/A formats

The PDF format has experienced a success story on its way from a quasi-proprietary format that could only be dealt with using Adobe tools to a format that is specified and standardized and can be handled with open source tools and tools from different vendors. It has become accepted that PDF is primarily a print format and that HTML is the better choice for web content. This was not clear 15 years ago, when people coming from print layout, who considered themselves trivially capable of adding the web to their portfolio, wanted to build whole web pages using PDF instead of HTML.

The format did change over time, though, and there are always PDF files that use specific features that do not work in certain PDF viewers.

But there are requirements for maintaining documents over a long period of time. Just consider long-term contracts that have a duration of 50 to 100 years. The associated documents usually need to be retained for that duration plus ten years. The issue of storing data for such a long time and being able to read it physically is a challenge by itself, but assuming that this issue is addressed and files can still be read in 110 years, the file format should still be readable as well.

Now, companies disappear. A lot of them in 100 years, probably even big ones like Adobe, Apple, Microsoft, Oracle and others. We do not know which companies will disappear, only that it is very likely that some companies that are big now will disappear. Proprietary software may make it to another vendor when the company shuts down, to pay the salaries of the former employees for some more days. But it might eventually disappear as well. Open source software has a better chance of being available in 100 years, but that cannot be absolutely guaranteed either, unless special attention is given to that software over such a long time. And if software is not maintained, it is highly unlikely that it will run on the platforms that are common in 100 years.

So it is good to create a stable and simplified standard for long-term archiving. Software for accessing it can be written from scratch based on that specification. And the software is more likely to remain available if a continuous need for it can be seen.

The idea is a format called PDF/A, where A stands for "archive", which is an option for storing PDF files over a very long period of time. Many cool features of PDF have been removed from PDF/A, which makes it more robust and easier to use. It is also important not to rely on additional data sources, for example for passwords of encrypted PDF files or for fonts. Encryption with password protection is a bad thing, because it is quite likely that the password will be gone in 100 years. Fonts need to be included, because finding them in 100 years might not be trivial. This usually means that proprietary fonts have to be avoided, unless the licensing allows including the fonts in the PDF file and unlimited reading. Including JavaScript, video, audio or forms is also a bad idea. Video should be archived separately, and it has the same issues as PDF when it comes to long-term archiving.
