Indexing of Long UTF-8 Files or Strings

The UTF-8 format has the disadvantage that the number of bytes per code point varies. Accessing a given position counted in characters is only possible by scanning from the beginning or by consulting an index structure. Such an index should balance its size against lookup speed. So indexing blocks of several kilobyte …
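The trade-off described above can be illustrated with a minimal sketch. It assumes the index stores the byte offset of every `block_chars`-th code point; the function names, the `block_chars` parameter, and this particular layout are hypothetical and not taken from the post. A lookup jumps to the start of the right block and scans forward over at most `block_chars - 1` code points. Note that it counts code points, not grapheme clusters, and assumes the input is valid UTF-8.

```python
def build_index(data: bytes, block_chars: int = 4096) -> list[int]:
    """Return the byte offsets of code points 0, block_chars, 2*block_chars, ..."""
    offsets = []
    count = 0
    for pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a code point.
        if byte & 0xC0 != 0x80:
            if count % block_chars == 0:
                offsets.append(pos)
            count += 1
    return offsets


def byte_offset(data: bytes, index: list[int], n: int, block_chars: int = 4096) -> int:
    """Return the byte offset at which code point number n (0-based) starts."""
    if n < 0:
        raise IndexError("negative position")
    block, remaining = divmod(n, block_chars)
    if block >= len(index):
        raise IndexError("position beyond end of data")
    pos = index[block]
    # Scan forward inside the block, skipping `remaining` code points.
    while remaining > 0:
        pos += 1
        while pos < len(data) and data[pos] & 0xC0 == 0x80:
            pos += 1
        remaining -= 1
    if pos >= len(data):
        raise IndexError("position beyond end of data")
    return pos


def char_at(data: bytes, index: list[int], n: int, block_chars: int = 4096) -> str:
    """Decode and return code point number n."""
    start = byte_offset(data, index, n, block_chars)
    end = start + 1
    while end < len(data) and data[end] & 0xC0 == 0x80:
        end += 1
    return data[start:end].decode("utf-8")
```

A larger `block_chars` shrinks the index but lengthens the forward scan per lookup; a smaller value does the opposite.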


Random-Access and UTF-8

It is useful to be able to work with random-access files and to move efficiently to any byte position for reading or writing. This also holds for text files that have a fixed number of bytes per character, for example exactly one, exactly two or exactly four …


Development of Hardware: Parallelism

Until recently we could rely on CPU clock frequencies rising substantially from year to year, but that growth stopped a couple of years ago. So we can no longer compensate for the inefficiencies of our software by just waiting for the next hardware release, which was no big deal, because software was often …


Data Quality

Very often we find that software does not work well. Often this is a problem of the software itself, which we all know quite well. Experience shows, however, that more often the problem lies in the data on which the software operates. In short: junk in, junk out. In organizations that use software, …
