The PDF format has come a long way: from a quasi-proprietary format that could only be handled with Adobe tools to a specified and standardized format that can be processed with open source tools and tools from different vendors. It has become accepted that PDF is …
Category Archives: Data
Find the next entry in a sequence
On Facebook, Xing, Google+, Vk.com, LinkedIn and other social media networks we often encounter a trivial question like this: 1->2 2->8 3->18 4->32 5->50 6->72 7->? There are some easy patterns: either some polynomial formula or some trick with the digits. But the point is that any such sequence …
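As a small illustration of the "polynomial formula" reading of the puzzle above, here is a minimal sketch that extrapolates the next value with a finite-difference table. The function name `next_by_differences` is made up for this example, and the assumption that the puzzle is meant to be polynomial is exactly that, an assumption; other readings of the sequence are possible.

```python
def next_by_differences(values):
    """Extrapolate one more term, assuming the values follow a polynomial.

    Builds the table of successive differences and sums the last entry
    of every row (Newton's forward-difference extrapolation).
    """
    rows = [list(values)]
    # Keep differencing until a row is all zeros or has a single entry.
    while len(rows[-1]) > 1 and any(rows[-1]):
        prev = rows[-1]
        rows.append([b - a for a, b in zip(prev, prev[1:])])
    return sum(row[-1] for row in rows)


given = [2, 8, 18, 32, 50, 72]   # the right-hand sides for 1..6
candidate = next_by_differences(given)
print(candidate)                 # value proposed for 7 under this assumption
```

For the given values the second differences are constant, so the data fit a quadratic, here 2n², and the proposed answer for 7 follows from that.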
Binary Data
Having discussed strings and text data to some extent, it is time to look at the other case, binary data. Usually we think of arrays of bytes or sequences of bytes stored on some medium. Why bytes? 8-bit computers are no longer so common, but the byte as a typical unit of binary data …
Indexing of long UTF-8 files or strings
The UTF-8 format has the disadvantage that the encoded length of characters and code points varies. Accessing a given position counted in characters is only possible by starting from the beginning or by providing an indexing structure. It is a good idea to find a balance between size and speed. So indexing blocks of several kilobytes …
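One possible shape of such an indexing structure is sketched below. It is not taken from the post: the helper names `build_index` and `char_at` are hypothetical, and the choice to put one index entry per fixed number of code points (rather than per fixed number of bytes) is an assumption made to keep the lookup simple. The input is assumed to be valid UTF-8.

```python
def build_index(data: bytes, block: int = 4096) -> list[int]:
    """Return the byte offsets of code points 0, block, 2*block, ..."""
    offsets = []
    count = 0
    for pos, byte in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes; every other
        # byte starts a new code point.
        if byte & 0xC0 != 0x80:
            if count % block == 0:
                offsets.append(pos)
            count += 1
    return offsets


def char_at(data: bytes, offsets: list[int], i: int, block: int = 4096) -> str:
    """Return code point number i by jumping to its block and scanning."""
    pos = offsets[i // block]
    for _ in range(i % block):
        pos += 1
        while data[pos] & 0xC0 == 0x80:   # skip continuation bytes
            pos += 1
    end = pos + 1
    while end < len(data) and data[end] & 0xC0 == 0x80:
        end += 1
    return data[pos:end].decode("utf-8")


text = "aé€𝄞z" * 10
data = text.encode("utf-8")
index = build_index(data, block=4)
print(char_at(data, index, 13, block=4))
```

A larger `block` makes the index smaller but the scan inside a block longer, which is the size versus speed trade-off mentioned above.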
Random-Access and UTF-8
It is a nice thing to be able to use random access files and to have the possibility to efficiently move to any byte position for reading or writing. This is even true for text files that have a fixed number of bytes per character, for example exactly one, exactly two or exactly four …
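The fixed-width case can be sketched in a few lines. The function name `read_char_fixed_width` is invented for this example, and UTF-32 (little endian, without a byte order mark) is chosen here only as a well-known encoding with exactly four bytes per code point; the in-memory `io.BytesIO` object stands in for a real random access file.

```python
import io


def read_char_fixed_width(f, index, encoding="utf-32-le", width=4):
    """Read character number `index` from a file with fixed-width characters."""
    f.seek(index * width)          # byte position follows directly from the index
    return f.read(width).decode(encoding)


f = io.BytesIO("héllo€".encode("utf-32-le"))
print(read_char_fixed_width(f, 5))
```

With a variable-width encoding such as UTF-8 the multiplication in `seek` no longer gives the right byte position, which is why the question of random access becomes harder there.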
Development of Hardware: Parallelism
Until recently we could rely on CPU frequencies doubling at least every year, but this stopped a couple of years ago. So we can no longer compensate for the inefficiencies of our software by just waiting for the next hardware release, which was no big deal, because software was often …
Data Quality
Very often we find that software does not work well. Often this is a problem of the software itself, which we all know quite well. Experience shows, however, that more often the problem lies in the data on which the software operates. In short: junk in, junk out. In organizations that use software, …