UTF-8 – Karl Brodowsky's IT-Blog

How to get rid of these HTML-entities in Files

It has been written here that HTML entities (such as &auml;) should be avoided, with the exception of those that we need due to the HTML syntax, like &lt;, &gt;, &amp; and maybe &quot; and &nbsp;. They were already mostly obsolete more than 20 years ago, but in those days we still did not automatically use UTF-8 …
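As a rough illustration of the idea, here is a minimal Python sketch that replaces character references by the characters themselves while leaving the syntax-relevant ones alone. The function name `decode_entities` and the set `PROTECTED` are made up for this example, and it is not necessarily the approach the full post takes; it only relies on `html.unescape` and `re` from the standard library.

```python
import html
import re

# Characters that have to stay escaped because HTML syntax needs them
# (plus the non-breaking space, which is invisible when written literally).
PROTECTED = {"<", ">", "&", '"', "\u00a0"}

# Matches named (&auml;), decimal (&#228;) and hexadecimal (&#xE4;) references.
ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def decode_entities(text: str) -> str:
    """Replace character references by real characters, except protected ones."""

    def repl(match: re.Match) -> str:
        entity = match.group(0)
        decoded = html.unescape(entity)
        # Unknown references come back unchanged; protected ones stay escaped.
        if decoded in PROTECTED:
            return entity
        return decoded

    return ENTITY.sub(repl, text)


if __name__ == "__main__":
    print(decode_entities("Gr&uuml;&szlig;e &amp; K&#252;sse &lt;b&gt;"))
```

To apply this to a file, one would read it with the encoding it is actually in, run the function, and write the result back as UTF-8. Which encoding the input has is an assumption the caller has to get right.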

Unicode and C

It is common practice in C to use arrays of char as strings, with a 0 byte as the end marker. The whole thing was created like that in the 1970s and at that time it was kind of cool to get away with one fewer language feature and to express it in terms of …
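One consequence of the 0 end marker is worth a quick check. UTF-8 never produces a 0 byte for any character other than U+0000, while UTF-16 does so for every ASCII letter. The following Python sketch mimics C's `strlen` on both encodings. The helper `c_strlen` is a hypothetical stand-in written for this example, not the real C function, and the excerpt above does not say whether the full post makes this point.

```python
def c_strlen(buf: bytes) -> int:
    """Mimic C's strlen: count the bytes before the first 0 byte."""
    pos = buf.find(b"\x00")
    return len(buf) if pos < 0 else pos


text = "Zürich"

# UTF-8: 'ü' takes two bytes, and no byte of the encoded text is 0.
utf8 = text.encode("utf-8") + b"\x00"

# UTF-16 (little endian): 'Z' becomes 5A 00, so the scan stops after one byte.
utf16 = text.encode("utf-16-le") + b"\x00\x00"

if __name__ == "__main__":
    print(c_strlen(utf8))
    print(c_strlen(utf16))
```

So an array of char with a 0 terminator can carry UTF-8 unchanged, but the length it reports is a number of bytes, not a number of characters.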

&auml; &ouml; … in HTML

In the old days of the web, more than 20 years ago, we found a way to write German umlauts and a lot of other letters and symbols using pure ASCII. These are called "entities", btw. Many people, including myself, started writing web pages using these transcriptions, on the assumption that they were required. …

Unicode, UTF-8, UTF-16, ISO-8859-1: Why is it so difficult?

For about 20 years we have been kept busy with the change to Unicode. The good thing: we all know that Unicode, usually with UTF-8 as its representation, is the way we should express textual data. The web is mostly UTF-8 today. But it has been a painful path and it still is sometimes. Why …

GNU Emacs and Unicode

Today, text files should preferably be created and saved in Unicode. Of course one only needs English texts, so ISO-646 (ASCII) is sufficient, but a few umlauts still creep in, if only because of proper names, and so one can use ISO-8859-1 or ISO-8859-15 and have the umlauts covered as well. With practically the same effort one can use UTF-8 instead. …

Unicode, UTF-8, UTF-16, ISO-8859-1: Why is it so difficult? (German version)

For about 20 years we have been struggling with the transition to Unicode. Why is it so difficult? The biggest problem is that a file reveals only to a very limited extent how its content is to be interpreted. In the end we have a few tricks with which it can often be recognized: file extensions work for common and …
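To make the guessing problem a bit more concrete, here is a minimal Python sketch of two commonly used heuristics: look for a byte order mark, then test whether the bytes are valid UTF-8. The function name `guess_encoding` and the choice of ISO-8859-1 as the fallback are assumptions made for this example. The excerpt breaks off before listing the tricks the post actually describes, so this is not meant as a reconstruction of that list.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess a text encoding from raw bytes. A heuristic, not a proof."""
    # A byte order mark at the start is a strong hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # UTF-32 is ignored here for brevity; its little-endian BOM starts
    # with the same two bytes as the UTF-16 one.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Text in ISO-8859-1 with umlauts is almost never valid UTF-8,
    # so a successful strict decode is good evidence for UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: every byte sequence decodes as ISO-8859-1,
        # so this branch can be wrong without anyone noticing.
        return "iso-8859-1"
    return "utf-8"


if __name__ == "__main__":
    for sample in ("ä".encode("utf-8"), "ä".encode("iso-8859-1")):
        print(sample, guess_encoding(sample))
```

Pure ASCII input is reported as UTF-8, which is harmless because ASCII is a subset of it. The fallback branch is the weak spot: it cannot tell ISO-8859-1 from ISO-8859-15 or Windows-1252.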
