Unicode – Karl Brodowsky's IT-Blog

How to get rid of these HTML-entities in Files

It has been written here that HTML-entities (these &auml; etc.) should be avoided with the exception of those that we need due to the HTML-syntax like &lt;, &gt;, &amp; and maybe &quot; and &nbsp;. They were already mostly obsolete more than 20 years ago, but in those days we still did not automatically use UTF-8 …
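
As a rough illustration of the idea, the following Python sketch replaces named entities with the characters they stand for while leaving the syntax-related ones alone. It is not necessarily the approach the full article takes; the function name `replace_entities` and the `KEEP` set are made up for this example, and only the HTML 4 entity names known to Python's `html.entities.name2codepoint` are handled.

```python
import re
from html.entities import name2codepoint

# Entities that the HTML syntax itself may need; these stay untouched.
# (Assumption for this sketch: exactly the five named in the text.)
KEEP = {"lt", "gt", "amp", "quot", "nbsp"}

# Matches named entities such as &auml; (numeric entities are ignored here).
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_entities(text: str) -> str:
    """Replace named HTML entities by their Unicode characters."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Leave syntax entities and unknown names exactly as they are.
        if name in KEEP or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    return ENTITY.sub(substitute, text)
```

Reading a file as text, passing it through `replace_entities` and writing it back with `encoding="utf-8"` would then yield a UTF-8 file that keeps only the entities required by the syntax.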

Unicode and C

It is a common practice in C to use arrays of char as strings. A 0 byte is used as the end marker. The whole thing was created like that in the 1970s, and at that time it was kind of cool to get away with one less language feature and to express it in terms of …

Tips and Tricks: Typing Unicode

I found this on netzwolf.info: You can enter arbitrary Unicode characters (more precisely code points) in X11 on Linux, if you know their Hex-Code:
1. Press Shift-Ctrl (keep them pressed)
2. Press also the letter u
3. Release the Ctrl-Key
4. Release the u-key
5. Keep the Shift-key pressed
6. Enter the Hex-Code of the Code point with the number of …

Some thoughts about String equality

Of course, Strings today are in some way Unicode. In this article we assume code points as the building blocks of Strings. In the Java world, for example, that means that one code point corresponds to one Java character for typical European languages, using Latin, Greek or Cyrillic alphabets including extensions …
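
To make the relation between code points and Java characters a bit more concrete, here is a small Python sketch. Python strings are sequences of code points, so `len` counts code points, while the hypothetical helper `utf16_units` counts the 16-bit units that a Java `char` array would need. The second helper, `equal_normalized`, is an assumption on my part about one possible notion of equality (comparison after NFC normalization) and is not taken from the article.

```python
import unicodedata


def utf16_units(s: str) -> int:
    """Number of UTF-16 code units, i.e. Java chars, needed for s."""
    return len(s.encode("utf-16-le")) // 2


def equal_normalized(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization (illustrative choice)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


latin = "\u00e4"          # a with diaeresis, one code point
cyrillic = "\u0416"       # Cyrillic capital Zhe, one code point
emoji = "\U0001F600"      # outside the Basic Multilingual Plane

# One code point, one UTF-16 unit for Latin and Cyrillic letters ...
assert len(latin) == 1 and utf16_units(latin) == 1
assert len(cyrillic) == 1 and utf16_units(cyrillic) == 1
# ... but one code point needs two units (a surrogate pair) here.
assert len(emoji) == 1 and utf16_units(emoji) == 2

# Precomposed and decomposed forms differ code point by code point,
# yet compare equal after normalization.
assert "\u00e4" != "a\u0308"
assert equal_normalized("\u00e4", "a\u0308")
```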

Unicode, UTF-8, UTF-16, ISO-8859-1: Why is it so difficult?

For about 20 years we have been kept busy with the change to Unicode. The good thing: We all know that Unicode, usually with UTF-8 as its representation, is the way we should express textual data. The web is mostly UTF-8 today. But it has been a painful path and it still is sometimes. Why …

GNU Emacs and Unicode

Today, text files should preferably be created and saved in Unicode. Of course one only needs English texts, so ISO-646 (ASCII) is sufficient, but a few umlauts still creep in, if only because of proper names, and so one can use ISO-8859-1 or ISO-8859-15 and have the umlauts covered as well. With practically the same effort one can use UTF-8 instead. …

Unicode, UTF-8, UTF-16, ISO-8859-1: Why is it so difficult?

For about 20 years we have been struggling with the transition to Unicode. Why is it so difficult? The biggest problem is that files reveal only to a very limited extent how their content is to be interpreted. In the end we have a few tricks that often allow us to recognize it: The file extensions work for frequent and …
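
One commonly used trick of this kind can be sketched in Python as follows. This is only an illustration under my own assumptions, not the list of tricks from the full article: the hypothetical function `guess_decode` looks for a byte order mark, then tries strict UTF-8, and finally falls back to ISO-8859-1, which accepts any byte sequence and can therefore be wrong.

```python
import codecs


def guess_decode(data: bytes) -> tuple[str, str]:
    """Guess an encoding for data and return (encoding name, decoded text).

    Heuristic only: UTF-32 and other 8-bit encodings are not considered.
    """
    # A byte order mark at the start is a fairly strong hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig", data.decode("utf-8-sig")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order.
        return "utf-16", data.decode("utf-16")
    # Text with non-ASCII characters in an 8-bit encoding is rarely
    # valid UTF-8, so a successful strict decode is good evidence.
    try:
        return "utf-8", data.decode("utf-8")
    except UnicodeDecodeError:
        # Fallback: every byte is a valid ISO-8859-1 character.
        return "iso-8859-1", data.decode("iso-8859-1")
```

For pure ASCII input all of these encodings agree, so the answer "utf-8" is harmless there; the guess only becomes uncertain in the final fallback branch.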
