PDF and PDF/A formats

PDF is a success story: it has gone from a quasi-proprietary format that could only be handled with Adobe tools to a specified, standardized format that can be processed with open-source tools and with tools from different vendors. It has become accepted that PDF is …

Find the next entry in a sequence

On Facebook, Xing, Google+, Vk.com, LinkedIn and other social networks we often encounter a trivial question like this: 1->2 2->8 3->18 4->32 5->50 6->72 7->? There are some easy patterns: either it is some polynomial formula or some trick with the digits. But the point is that any such sequence …
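
The polynomial case can be made concrete with a small sketch. It uses Lagrange interpolation, which is one possible technique and not necessarily the one the post has in mind. The function name `lagrange_value` and the alternative seventh value 42 are assumptions chosen purely for illustration.

```python
from fractions import Fraction


def lagrange_value(points, x):
    """Evaluate at x the lowest-degree polynomial passing through all points."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                # Basis factor: equals 1 at xi and 0 at xj.
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


# The six pairs from the puzzle.
given = [(1, 2), (2, 8), (3, 18), (4, 32), (5, 50), (6, 72)]

# The lowest-degree polynomial through these pairs is 2*n^2.
natural = lagrange_value(given, 7)

# Arbitrary seventh value (an assumption for illustration): a polynomial of
# higher degree still reproduces all six given pairs exactly.
extended = given + [(7, 42)]
arbitrary = lagrange_value(extended, 7)
still_fits = all(lagrange_value(extended, n) == v for n, v in given)

print(natural, arbitrary, still_fits)
```

Exact rational arithmetic via `Fraction` avoids rounding noise, so the comparison with the given values is an exact equality check.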

Binary Data

Having discussed strings and text data to some extent, it is time to look at the other case: binary data. Usually we think of arrays of bytes or sequences of bytes stored on some medium. Why bytes? 8-bit computers are no longer so common, but the byte as a typical unit of binary data …

Indexing of long UTF-8 files or strings

The UTF-8 format has the disadvantage that the number of bytes per character or code point varies. Accessing a given position counted in characters is only possible by starting from the beginning or by providing an indexing structure. It is a good idea to find a balance between size and speed. So indexing blocks of several kilobytes …
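
A minimal sketch of such an indexing structure follows. It is one possible layout, not necessarily the one the post describes: the names `build_index` and `char_at` and the default block size are assumptions, the index stores the byte offset of every `block_chars`-th code point, and the input is assumed to be valid UTF-8 with `n` in range.

```python
def build_index(data: bytes, block_chars: int = 4096) -> list:
    """Return byte offsets of code points 0, block_chars, 2*block_chars, ..."""
    offsets = []
    count = 0
    for pos, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; any other byte starts a code point.
        if byte & 0xC0 != 0x80:
            if count % block_chars == 0:
                offsets.append(pos)
            count += 1
    return offsets


def char_at(data: bytes, offsets: list, n: int, block_chars: int = 4096) -> str:
    """Return code point number n (0-based) by scanning only within one block."""
    pos = offsets[n // block_chars]
    # Skip the remaining code points inside the block.
    for _ in range(n % block_chars):
        pos += 1
        while pos < len(data) and data[pos] & 0xC0 == 0x80:
            pos += 1
    # Find the end of the code point that starts at pos.
    end = pos + 1
    while end < len(data) and data[end] & 0xC0 == 0x80:
        end += 1
    return data[pos:end].decode("utf-8")


text = "aä€😀" * 10  # code points of 1, 2, 3 and 4 bytes
data = text.encode("utf-8")
index = build_index(data, block_chars=4)
print(char_at(data, index, 7, block_chars=4))
```

A larger block size makes the index smaller but lengthens the scan inside a block, which is the size and speed trade-off mentioned above.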

Random-Access and UTF-8

It is nice to be able to use random access files and to move efficiently to any byte position for reading or writing. This is even true for text files that have a fixed number of bytes per character, for example exactly one, exactly two or exactly four …

Development of Hardware: Parallelism

Until recently we could just rely on the fact that CPU frequencies doubled at least every year, but this stopped a couple of years ago. So we can no longer compensate for the inefficiencies of our software by just waiting for the next hardware release, which was no big deal, because software was often …

Data Quality

Very often we experience that software is not working well. Often this is a problem of the software itself, which we all know quite well. Experience shows, however, that more often the problem lies in the data on which the software operates. In short: junk in, junk out. In organizations that use software, …
