Shell Scripts

Shell scripts can be useful for writing small stuff like combining a few commands to pipes or doing a bit of „back ticking“. Even simple loops and if-conditions are possible. And if we want, it is almost a full programming language. A bit hard to tame, maybe, but quite a lot of stuff is possible. Those who like to know more about it may look into startup scripts of typical java software. Often a .bat and a .sh file are provided, where the right jvm is found, the classpath and the execution path and maybe some other environment are put together. In the end the .sh-file is quite a long and unreadable horror story and the .bat file is even much worse, because the cmd-language is just a lot more primitive and less capable and requires even worse hacks.

There are ways to make shell scripts more readable, which by themselves are truly admirable, but I think that route is wrong. We can learn all the Shell functionalities and understand bit by bit even more complex shell scripts, but I think for non trivial shell scripts it is time to switch to real programming languages instead. Scripting languages, of course, for example Perl, Ruby, Python or Lua. We may still execute „shell commands“, that are actually programs in /bin, /usr/bin or /usr/local/bin where they are powerful and more concise than writing purely in that programming language. But a magic for putting together a classpath is much cleaner in the Perl programming language than in pure bash (or worse cmd/bat).

This is of course another example of the Golden Hammer anti pattern. We should balance our tool box. Not add specific tools for making any minor task a bit easier on the expense of supporting one more tool, but keep a broad range of tools that in conjunction are very powerful. For example I would retire awk and sed and use either Perl or Ruby instead. We only have to keep them around because a lot of system tools that are just there still rely on them, but for a team I would deprecate awk and sed for new scripts or even for enhancing existing scripts. Bash would be ok only for small scripts, you can invent a line number or a maximum complexity, but for very short scripts I think bash is a legitimate tool.

Switch to Perl, Perl6, Ruby or… when you encounter any of the following:

  • The scripts is getting kind of long (>= 100 lines)
  • You find yourself modularizing it with functions
  • You find yourself using non trivial perl, ruby, sed or awk within the script, for example regex-stuff
  • The script need interaction
  • The scripts needs arrays, numbers or other types
  • More than one or two trivial if-statements or loop-statements are needed
  • Database access is done by the script (SQL or NoSQL)
  • String encoding becomes relevant
  • Quoting levels become an issue

This post was inspired by a similar post on the Isoblog by Kris. And the Shell Style Guide of Google is quite good especially in limiting the area where shell scripts are acceptable.

Share Button

tmp-directories

On all computers we have some concept of a tmp-directory. Typically it is /tmp on Linux- and Unix-systems and something like C:/TEMP plus some subdirectory in each users home directory on MS-Windows.

In terms of software development this tends to be some dark area. Programs like to create some files there, store some stuff there and then maybe remove it, maybe not. And we do not know for sure, when we can delete these files and we actually do not want to care. Linux and Unix-Systems sometimes clear their tmp-directories on reboot, while providing an additional /var/tmp-directory, that survives reboot. Sometimes the tmp-directory is deducted from shared memory, so it is kind of a RAM-disk, but usually stored in the swap partition (or swap file) of our OS. Now this cleanup on reboot does not help too much, when we want to keep our system running for a long time.

These days most computers are somehow dedicated. Either they are virtual computers that run exactly one server application or a set of closely related server applications. Or it is a mobile phone, tablet or desktop computer that is typically used by only one person. But still we should not forget that the system should allow being used by several applications and by several users. So sharing the same tmp-directory for everyone can cause some conflicts. The Unix- and Linux-family has a way of setting file permissions for the tmp-directory itself and for its entries that stop users from reading, changing or deleting each others files, but still there is some concurrency about using the namespace of this one directory, which is usually quite elegantly bypassed by each software by using smart naming or by having the OS create unique names. But I would not consider it ideal. On the other hand, sometimes we might actually want to use the tmp-directory to share something between users or between processes, where this one tmp-directory might come in handy.

The approach of having a separate tmp-directory in each home directory and in a sub directory of each server application’s installation is tempting, because it separates name spaces, allows to disallow reading the directory entries by others and does not mix totally unrelated stuff in one directory. There is a drawback to this. We usually have different storage technologies. Some are optimized for reading, maybe even avoiding redundancy, because the system can be reinstalled. Some use sporadic writing, some are strictly read-only. And some use a lot of reading and writing. Some data is transient, some can be easily restored and some data needs to be stored redundantly to be safe. Depending on that we should aim to put it on Flash disks, or on a different RAID setup of hard disks. This is getting harder with virtualization, but eventually we can get to the point where virtual computers have disks of different characteristics, that are mapped to the appropriate hardware.

So there is no real good answer to this question, but I think that a tmp-directory that is separate from the home directory, but specific to each user, would be the best approach. Will this change? Probably not so easily. But maybe in some distant future.

Share Button

MS-Windows-Encodings with CMD: Bug or Feature?

Deutsch

Whoever is working with MS-Windows, should know these black windows with CMD running in them, even though they are not really popular. The Unix and Linux guys hate them, because they are really primitive compared to their shells. Windows guys like to work graphically. Or they prefer powershell or bash with cygwin. Linux and Unix have the equivalent of these windows, but usually they are white. Being able to configure the colors on both systems in any way this is of no relevance.

NT-based MS-Windows systems (NT 3.x, 4.x, 2000, XP, Vista, 7, 8, 10) have several subsystems and programs are running in them, for example Win64, Win32 (or Wow64 on 64-bit-systems), Win16, cygwin (if installed), DOS… Because programs for the DOS subsystem are typically started in a CMD window, and because some of the DOS commands have equally named and similarly operating pendents in the CMD window, the CMD window is sometimes called DOS-window, which is just incorrect. Actually this black window comes into existence in many situations. Whenever a program is started that has input or output (stdin, stdout, stderr), a black window is provided aroudn, if no redirection is in place. This applies for CMD. Under Linux (and Unix) with X11 it is the other way round. You start the program that provides the window and it automatically starts the default shell within that window, unless something else is stated.

Now I recommend an experiment. You just need an MS-Windows installation with any graphical editor like emacs, gvim, ultraedit, textpad, scite, or even notepad. And a cmd-window.

  • Please type these commands, do not use copy/paste
  • In the cmd-window cd into a directory you may write in.
  • echo "xäöüx" > filec.txt. Yes, there are ways to type these letters even with an American keyboard. 🙂
  • Open the file with a graphical editor. How do the Umlauts look?
  • Use the editor to create a second file in the same directory with contents like this: yäöüy.
  • view it in CMD:
  • type fileg.txt
  • How do the Umlauts look?

It is a feature or bug, that all common MS-Windows versions are putting the umlauts to different positions then the graphical editors. If you know how to fix this, let me know.

What has happened? In the early 80es MS-DOS came into existence. By that time standards for character encoding were not very good. Only ASCII or ISO-646-IRV existed, which was at least a big step ahead of EBCDIC. But this standardized only the lower 128 characters (7 Bit) and lacked at some characters for almost any language other than English. It was tried to put a small number of these additional letters into the positions of irrelevant characters like „@“, „[„, „~“, „$“ etc. And software vendors started to make use of the upper 128 characters. Commodore, Atari, MS-DOS, Apple, NeXT, TeX and „any“ software came up with a specific way for that, often specific for a language region.

These solutions where incompatible with each other between different software systems, sometimes even between versions or language versions of the same software. Remember that at that time networks were unusual and when they existed, they were proprietary to the operating system with bridge solutions being extremely difficult to implement. Even formats for floppy disks (the three-dimensional incarnations of the save button) had proprietary formats. So it did not hurt so much to have incompatible encodings.

But relatively early X11 which became the typical graphical system for Unix and later Linux started to use standard encodings like the ISO-8859-x family, UTF-8 and UTF-16. Linux was already on ISO-8859-1 in version 0.99 in the early 90es and never tried to invent its own character encoding. Thank god for that….

Today all relevant systems have moved to Unicode standard and standardized encodings like ISO-8869-x, UTF-8, UTF-16… But MS-Windows has done that only partially. The graphical system is using modern encodings or at leas Cp1252, which is a decent approximation. But the text based system with the black window, in which CMD is running, is still using encodings from the MS-DOS times more than 30 years ago, like Cp850. This results in a break within the system, which is at least very annoying, when working with cygwin or CMD-windows.

Those who have a lot of courage can change this in the registry. Just change the entries for OEMCP and OEMHAL in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage simultaneously. One of them is for input, the other one for output. So if you change only one, you will even get inconsistencies within the window… Sleep well with these night mares. 🙂
Research in the internet has revealed that some have tried to change to utf-8 (CP65001) and got a system that could not even boot as a result. Try it with a copy of a virtual system without too much risk, if you like… I have not verified this, so maybe it is just bad rumors to create damage for a great company that has brought is this interesting zoo of encodings within the same system. But anyway, try it at your own risk.
Maybe something like chcp and chhal can work as well. I have not tried that either…

It is up to you if you consider this whole issue a bug or a feature.

Share Button