MS-Windows-Encodings with CMD: Bug or Feature?

Deutsch

Whoever is working with MS-Windows, should know these black windows with CMD running in them, even though they are not really popular. The Unix and Linux guys hate them, because they are really primitive compared to their shells. Windows guys like to work graphically. Or they prefer powershell or bash with cygwin. Linux and Unix have the equivalent of these windows, but usually they are white. Being able to configure the colors on both systems in any way this is of no relevance.

NT-based MS-Windows systems (NT 3.x, 4.x, 2000, XP, Vista, 7, 8, 10) have several subsystems and programs are running in them, for example Win64, Win32 (or Wow64 on 64-bit-systems), Win16, cygwin (if installed), DOS… Because programs for the DOS subsystem are typically started in a CMD window, and because some of the DOS commands have equally named and similarly operating pendents in the CMD window, the CMD window is sometimes called DOS-window, which is just incorrect. Actually this black window comes into existence in many situations. Whenever a program is started that has input or output (stdin, stdout, stderr), a black window is provided aroudn, if no redirection is in place. This applies for CMD. Under Linux (and Unix) with X11 it is the other way round. You start the program that provides the window and it automatically starts the default shell within that window, unless something else is stated.

Now I recommend an experiment. You just need an MS-Windows installation with any graphical editor like emacs, gvim, ultraedit, textpad, scite, or even notepad. And a cmd-window.

  • Please type these commands, do not use copy/paste
  • In the cmd-window cd into a directory you may write in.
  • echo "xäöüx" > filec.txt. Yes, there are ways to type these letters even with an American keyboard. 🙂
  • Open the file with a graphical editor. How do the Umlauts look?
  • Use the editor to create a second file in the same directory with contents like this: yäöüy.
  • view it in CMD:
  • type fileg.txt
  • How do the Umlauts look?

It is a feature or bug, that all common MS-Windows versions are putting the umlauts to different positions then the graphical editors. If you know how to fix this, let me know.

What has happened? In the early 80es MS-DOS came into existence. By that time standards for character encoding were not very good. Only ASCII or ISO-646-IRV existed, which was at least a big step ahead of EBCDIC. But this standardized only the lower 128 characters (7 Bit) and lacked at some characters for almost any language other than English. It was tried to put a small number of these additional letters into the positions of irrelevant characters like „@“, „[„, „~“, „$“ etc. And software vendors started to make use of the upper 128 characters. Commodore, Atari, MS-DOS, Apple, NeXT, TeX and „any“ software came up with a specific way for that, often specific for a language region.

These solutions where incompatible with each other between different software systems, sometimes even between versions or language versions of the same software. Remember that at that time networks were unusual and when they existed, they were proprietary to the operating system with bridge solutions being extremely difficult to implement. Even formats for floppy disks (the three-dimensional incarnations of the save button) had proprietary formats. So it did not hurt so much to have incompatible encodings.

But relatively early X11 which became the typical graphical system for Unix and later Linux started to use standard encodings like the ISO-8859-x family, UTF-8 and UTF-16. Linux was already on ISO-8859-1 in version 0.99 in the early 90es and never tried to invent its own character encoding. Thank god for that….

Today all relevant systems have moved to Unicode standard and standardized encodings like ISO-8869-x, UTF-8, UTF-16… But MS-Windows has done that only partially. The graphical system is using modern encodings or at leas Cp1252, which is a decent approximation. But the text based system with the black window, in which CMD is running, is still using encodings from the MS-DOS times more than 30 years ago, like Cp850. This results in a break within the system, which is at least very annoying, when working with cygwin or CMD-windows.

Those who have a lot of courage can change this in the registry. Just change the entries for OEMCP and OEMHAL in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage simultaneously. One of them is for input, the other one for output. So if you change only one, you will even get inconsistencies within the window… Sleep well with these night mares. 🙂
Research in the internet has revealed that some have tried to change to utf-8 (CP65001) and got a system that could not even boot as a result. Try it with a copy of a virtual system without too much risk, if you like… I have not verified this, so maybe it is just bad rumors to create damage for a great company that has brought is this interesting zoo of encodings within the same system. But anyway, try it at your own risk.
Maybe something like chcp and chhal can work as well. I have not tried that either…

It is up to you if you consider this whole issue a bug or a feature.

Share Button

Beteilige dich an der Unterhaltung

3 Kommentare

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert

*