&auml; &ouml; … in HTML

In the old days of the web, more than 20 years ago, we found a way to write German umlauts and many other letters and symbols using pure ASCII. These are called „entities“, by the way.

Many people, including myself, started writing web pages using these transcriptions, on the assumption that they were required. In the early days of the web there were rumors that some browsers did not properly understand the character encodings that contained these letters. That was plausible: the late 80s and the 90s were the transition period in which people discovered that computers are useful outside of the United States. At least for people outside IT it was, or should have been, a natural requirement that computers understand their language, or at least can process, store and transmit texts using the proper characters of that language. For German this was not so terrible, because there were transcriptions for the special characters (ä->ae, ö->oe, ü->ue, ß->ss) that were somewhat ugly but widely understood by native German speakers. Other languages such as Russian, Greek, Arabic or the East Asian languages were in more trouble, because they consist of „special characters“ only.

Anyway, this „&auml;“ transcription for web pages is actually superior to „ae“, because the reader of the web page sees the correct „ä“. It was part of the HTML standard to support writing characters that are not on the keyboard. This was a useful feature in those days, but today we can find web pages that help us with the transliteration, or we can just look up the word with its special characters on the web in order to write it correctly. Then we can just as well copy it into our HTML code, including all the special characters.
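To see that an entity such as `&auml;` and the literal character really are the same thing to a browser, one can decode a few entities outside the browser. The following is only a small sketch using Python's standard `html` module; the sample words are my own choice and not taken from any particular page.

```python
import html

# A named entity and the two numeric forms all decode to the same character.
named = html.unescape("&auml;")
decimal = html.unescape("&#228;")
hexadecimal = html.unescape("&#xE4;")
print(named, decimal, hexadecimal)

# A pure-ASCII source text decodes to the text the reader actually sees.
ascii_source = "M&uuml;nchen, Stra&szlig;e"
visible_text = html.unescape(ascii_source)
print(visible_text)
```

The ASCII source and the visible text carry the same information; the entity form is just harder to read and to edit.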

There could be some argument about UTF-8 vs. UTF-16 vs. ISO-8859-x as possible encodings of the web page. But on the web this was never really an issue, because web pages have headers that should be present and that inform the browser about the encoding. Today I recommend using UTF-8 as the default, because it includes all the characters that we might want to use sporadically. Then the umlaut can simply be part of the HTML content directly. I converted all my web pages in the mid 90s to use umlaut letters properly where needed, without entities.
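Why the declared encoding matters can be illustrated with a few bytes. This is a rough sketch in Python, with „Käse“ as an arbitrary sample word: the same text yields different bytes in UTF-8 and ISO-8859-1, and decoding with the wrong assumption garbles the umlaut.

```python
text = "Käse"

# The same text as bytes in two encodings.
utf8_bytes = text.encode("utf-8")         # ä becomes two bytes: 0xC3 0xA4
latin1_bytes = text.encode("iso-8859-1")  # ä becomes one byte: 0xE4

# Decoding with the encoding that was actually used restores the text.
roundtrip = utf8_bytes.decode("utf-8")

# Decoding UTF-8 bytes as ISO-8859-1 is what a browser does when the
# declared encoding is wrong: every byte becomes its own character.
garbled = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes, latin1_bytes, roundtrip, garbled)
```

This is the reason the encoding information should be present: as long as it matches what is actually in the file, the literal „ä“ is perfectly safe.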

Some entities are still useful:

„&lt;“ for „<“, because „<“ is used as part of the HTML syntax
„&amp;“ for „&“ for the same reason
„&gt;“ for „>“ for consistency with „&lt;“
„&nbsp;“ for the no-break space, to make it visible, since most editors make it hard to tell the difference between a regular space and a no-break space.
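The entities listed above are the ones a program would still produce when it generates HTML from plain text. As a minimal sketch, assuming Python's standard `html.escape` and a made-up input string: only the characters with syntactic meaning are replaced, while the umlaut stays as it is. The no-break space is not handled by `html.escape`, so the sketch replaces it in a separate step.

```python
import html

# Hypothetical plain text containing HTML syntax characters and an umlaut.
raw = 'if a < b && b > c: print("ä")'

# quote=False leaves the double quotes alone; only &, < and > are replaced.
escaped = html.escape(raw, quote=False)
print(escaped)

# Make a no-break space (U+00A0) visible in the source by writing it as an entity.
distance = "10\u00a0km"
visible = html.escape(distance).replace("\u00a0", "&nbsp;")
print(visible)
```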
