Running a Large Number of Servers

These days we often have to run a large number of servers, and the times where we could afford to manually log into each one to do system administration tasks are mostly over.

It turns out that there are always different approaches to deal with this. In most cases we are talking about virtual hosts, so we have a layer between via the visualization that can help us. We can have a number of master images and create virtual images from those even on demand in a matter of a minute. In case of MS-Windows it is an issue that they have some internal UUID as host-id which should be unique and which is heavily spread throughout the image, but this issue can be ignored if we do not worry about windows domains. Usually we do and I leave dealing with these issues to MS-Windows-experts.

Talking about Linux, we only need to make sure that the network interface is unique, which it is if we use hardware and do not mess around with it, but it is not necessarily if we use visualization and virtual network devices. This issue needs to be addressed, but it is well supported by common visualization tools. Another point is the host name. This is not too hard, because we only need to change it in one or two places, which can easily be done by a script. We can mount the image and do the change. Now the image can contain a start-up script that discovers on boot that it is a fresh copy and uses its host-name to retrieve further setup from some server. And we just have to maintain there which host has which setup. These can be automated to a very high extent. Then we can for example request a certain number of servers with certain software and configuration via an web interface. This creates new host-names, stores the setup with these host-names in its setup table, creates the virtual images, deploys them on any available hardware server and once they have stared they retrieve their setup from the server. We can also have master images that already contain certain predefined setups so that this second step is only needed for minor adjustments. We have to assume that these exist. Yes, this is called cloud technology.

If we keep the data somewhere else, these servers can be discarded and new ones can be created, so there is no need to do too complex stuff on them. Off course we want to run our software on them. So the day long procedure to install our software is not attractive any more. We need mechanisms that can be automated.

Running real hardware is a bit more demanding and for larger servers that might even be justifiable, because they do a lot of work for us. Quite often it is possible to actually do mechanisms quite similar to the virtual world even on real hardware. It is possible to boot the machine from an USB-stick which copies fetches an image and copies it on disk. Again only the host-name needs to be provided and then the rest can be automated. Another approach is to initially boot via the network, which is an option that most of us rarely use, but which is supported by the hardware. For running a large server farm such a hardware and bios setting can just be initially the default and from there machines can install and reconfigure themselves. In this case we probably need to use the Ethernet-address of the network device as a key to our setup table and we need to know what Ethernet addresses are in use. It is a big deal to set up such an environment, but once it is running, it is tremendously efficient. Homogenous hardware is off course essential, maybe an small number of hardware setups, but not a new model with each delivery. It is not enough that the new hardware is named the same as the old one, it needs to be able to run the same images without manual customization. It is possible to have a small number of images, but having to supply already different images for different server setups multiplying there number with the number of the hardware setups can grow out of control, if one of the numbers or both become too large.

Now we also have ways to actually access oure servers. there have been tools to run a shell just simultaneously on n hosts to do exactly the same at once. This is fine if they are exactly the same, but this is something we need to enforce or we need to accept that servers deviate. There are tools around to deal with these issues, but it is actually quite reasonable to do a script based approach. What we do is using ssh-key-exchange to make sure that we can log into the servers from some admin server without password. We can then define a subset of the set of our servers, which can be one, a couple, a large fraction, all or all with a few exceptions, for example. Then we distribute a script with scp to all the target machines in a loop. We run this on each target machine using ssh and parse the outputs to see which have been successful and which not. Here it is usually a good idea to have a farm of test servers to try this out first and then start on a small number of servers before running it on all of them.

The big bang philosophy of applying a change twice a year on the whole server landscape is not really a good idea here, because we can loose all our servers if we make a mistake and this can be hard to recover, although still have the same tools and scripts even for that, unless we really screw things up. So in these scenarios software that supports the interfaces of the previous version for its communication partners is useful, because it allows to do a smooth migration.

Just to give you a few hints: During some coffee break I suggested that Google has around a million servers. Even though there is no hard evidence for this, because this number seems to be confidential and only known to Google employees, I would say that this is a reasonable number. For sure they cannot afford a million system administrators. The whole processes needs to be very stream-lined. Or take the hosting provider where this site is running on. It is possible to have virtual web-hosts, in this case it is multiple sites running on the same virtual or physical machine sharing the same Apache instance with just different directories attached to different URL-patterns. This is available for very little money, again suggesting that they are tremendously efficient.

Share Button

Microsoft SQL Server will be available for Linux in 2017

Deutsch

Microsoft has officially announced that their database MS SQL Server will become available for Linux in 2017.

I think the time has come for this. Since the departure of Steve Ballmer Microsoft has become a little bit less religous and more pragmatic. There are good reasons to be skeptical about companies like Microsoft and Oracle, but having more competition and more choice is a good thing. Maybe the database product from Oracle is slightly better than MS SQL Server, but there are very few projects where this difference really matters. So now we have three important relational DB products: DB2, Oracle, MS-SQL-Server, PostgreSQL and MariaDB (the successor of mySQL). When starting a new project with no specific constraints about the DB I would usually look at PostgreSQL first, because it is a feature rich and powerful open source database. Since database products are usually something that cannot reasonably be changed within one software system for decades this is a good thing, because we never know what the big companies want to do in such a long time scale. If the migration to another DB product is easy, then the software does not really make use of the power and the features of the DB. And it will not be easy anyway.

There are a lot of cases where the combination of MS-SQL-Server with Linux will make a lot of sense. Since there are software systems that make use of this DB product, it gives the flexibility to run the DB on Linux servers. And maybe avoid an expensive migration to another DB product. As I already said it gives one more choice. In development environments where MS-products are commonly used, it gives one more combination. And eventually it will encourage Oracle and IBM a little bit to refrain from excessive price increases.

* MS official: Announcing SQL Server on Linux

Share Button

How to make a scanned PDF smaller (Linux)

When scanning a paper, it is possible to use a lot of parameters within xsane. The output format can be chosen also, for example PNG, JPG or PDF. The outcome may be a PDF-file that is way too big, easily more than 10 megabytes for a single page. It is quite easy to transform it to a smaller file:

convert -density 200x200 -quality 60 -compress jpeg \
big-scanned-file.pdf compressed-scanned-file.pdf

Unless you scan very often, it is easier to scan once with a relatively high resolution and then run this conversion with different values for quality and density rather than running the time consuming scan with different xsane settings.

Share Button

Chemnitzer Linux-Tage

In the German city Chemnitz the conference „Chemnitzer Linux-Tage“ (Chemnitz Linux days)
will take place from 2016-03-19 to 2016-03-20.

Links:
* Informatik aktuell (German)
* Wikipedia (German)
* Offiical page (German)

Share Button

How to create ISO Date String

It is a more and more common task that we need to have a date or maybe date with time as String.

There are two reasonable ways to do this:
* We may want the date formatted in the users Locale, whatever that is.
* We want to use a generic date format, that is for a broader audience or for usage in data exchange formats, log files etc.

The first issue is interesting, because it is not always trivial to teach the software to get the right locale and to use it properly… The mechanisms are there and they are often used correctly, but more often this is just working fine for the locale that the software developers where asked to support.

So now the question is, how do we get the ISO-date of today in different environments.

Linux/Unix-Shell (bash, tcsh, …)

date "+%F"

TeX/LaTeX


\def\dayiso{\ifcase\day \or
01\or 02\or 03\or 04\or 05\or 06\or 07\or 08\or 09\or 10\or% 1..10
11\or 12\or 13\or 14\or 15\or 16\or 17\or 18\or 19\or 20\or% 11..20
21\or 22\or 23\or 24\or 25\or 26\or 27\or 28\or 29\or 30\or% 21..30
31\fi}
\def\monthiso{\ifcase\month \or
01\or 02\or 03\or 04\or 05\or 06\or 07\or 08\or 09\or 10\or 11\or 12\fi}
\def\dateiso{\def\today{\number\year-\monthiso-\dayiso}}
\def\todayiso{\number\year-\monthiso-\dayiso}

This can go into a file isodate.sty which can then be included by \include or \input Then using \todayiso in your TeX document will use the current date. To be more precise, it is the date when TeX or LaTeX is called to process the file. This is what I use for my paper letters.

LaTeX

(From Fritz Zaucker, see his comment below):

\usepackage{isodate} % load package
\isodate % switch to ISO format
\today % print date according to current format

Oracle


SELECT TO_CHAR(SYSDATE, 'YYYY-MM-DD') FROM DUAL;

On Oracle Docs this function is documented.
It can be chosen as a default using ALTER SESSION for the whole session. Or in SQL-developer it can be configured. Then it is ok to just call

SELECT SYSDATE FROM DUAL;

Btw. Oracle allows to add numbers to dates. These are days. Use fractions of a day to add hours or minutes.

PostreSQL

(From Fritz Zaucker, see his comment):

select current_date;
—> 2016-01-08


select now();
—> 2016-01-08 14:37:55.701079+01

Emacs

In Emacs I like to have the current Date immediately:

(defun insert-current-date ()
"inserts the current date"
(interactive)
(insert
(let ((x (current-time-string)))
(concat (substring x 20 24)
"-"
(cdr (assoc (substring x 4 7)
cmode-month-alist))
"-"
(let ((y (substring x 8 9)))
(if (string= y " ") "0" y))
(substring x 9 10)))))
(global-set-key [S-f5] 'insert-current-date)

Pressing Shift-F5 will put the current date into the cursor position, mostly as if it had been typed.

Emacs (better Variant)

(From Thomas, see his comment below):

(defun insert-current-date ()
"Insert current date."
(interactive)
(insert (format-time-string "%Y-%m-%d")))

Perl

In Perl we can use a command line call

perl -e 'use POSIX qw/strftime/;print strftime("%F", localtime()), "\n"'

or to use it in larger programms

use POSIX qw/strftime/;
my $isodate_of_today = strftime("%F", localtime());

I am not sure, if this works on MS-Windows as well, but Linux-, Unix- and MacOS-X-users should see this working.

If someone has tried it on Windows, I will be interested to hear about it…
Maybe I will try it out myself…

Perl 5 (second suggestion)

(From Fritz Zaucker, see his comment below):

perl -e 'use DateTime; use 5.10.0; say DateTime->now->strftime(„%F“);‘

Perl 6

(From Fritz Zaucker, see his comment below):

say Date.today;

or

Date.today.say;

Ruby

This is even more elegant than Perl:

ruby -e 'puts Time.new.strftime("%F")'

will do it on the command line.
Or if you like to use it in your Ruby program, just use

d = Time.new
s = d.strftime("%F")

Btw. like in Oracle SQL it is possible add numbers to this. In case of Ruby, you are adding seconds.

It is slightly confusing that Ruby has two different types, Date and Time. Not quite as confusing as Java, but still…
Time is ok for this purpose.

C on Linux / Posix / Unix


#include
#include
#include

main(int argc, char **argv) {

char s[12];
time_t seconds_since_1970 = time(NULL);
struct tm local;
struct tm gmt;
localtime_r(&seconds_since_1970, &local);
gmtime_r(&seconds_since_1970, &gmt);
size_t l1 = strftime(s, 11, "%Y-%m-%d", &local);
printf("local:\t%s\n", s);
size_t l2 = strftime(s, 11, "%Y-%m-%d", &gmt);
printf("gmt:\t%s\n", s);
exit(0);
}

This speeks for itself..
But if you like to know: time() gets the seconds since 1970 as some kind of integer.
localtime_r or gmtime_r convert it into a structur, that has seconds, minutes etc as separate fields.
stftime formats it. Depending on your C it is also possible to use %F.

Scala


import java.util.Date
import java.text.SimpleDateFormat
...
val s : String = new SimpleDateFormat("YYYY-MM-dd").format(new Date())

This uses the ugly Java-7-libraries. We want to go to Java 8 or use Joda time and a wrapper for Scala.

Java 7


import java.util.Date
import java.text.SimpleDateFormat

...
String s = new SimpleDateFormat("YYYY-MM-dd").format(new Date());

Please observe that SimpleDateFormat is not thread safe. So do one of the following:
* initialize it each time with new
* make sure you run only single threaded, forever
* use EJB and have the format as instance variable in a stateless session bean
* protect it with synchronized
* protect it with locks
* make it a thread local variable

In Java 8 or Java 7 with Joda time this is better. And the toString()-method should have ISO8601 as default, but off course including the time part.

Summary

This is quite easy to achieve in many environments.
I could provide more, but maybe I leave this to you in the comments section.
What could be interesting:
* better ways for the ones that I have provided
* other databases
* other editors (vim, sublime, eclipse, idea,…)
* Office packages (Libreoffice and MS-Office)
* C#
* F#
* Clojure
* C on MS-Windows
* Perl and Ruby on MS-Windows
* Java 8
* Scala using better libraries than the Java-7-library for this
* Java using better libraries than the Java-7-library for this
* C++
* PHP
* Python
* Cobol
* JavaScript
* …
If you provide a reasonable solution I will make it part of the article with a reference…
See also Date Formats

Share Button

Changing of Keyboard Mappings with xmodmap

Deutsch

Introduction

When running a Linux system in its graphical mode, keyboard mappings can be changed by using xmodmap.
Each key on the keyboard has a „keycode“ which can be found out by looking at the output of
xmodmap -pke > current-keyboard
or by running
xev
for trying out the keys.

I am using a modified German keyboard, but off course the ideas can be adapted to any setup.

Given setting as a Basis

You can start with any keyboard, for example with the German keyboard with no dead keys, which is often useful as a starting point. I prefer to modify it a little bit. This allows me to support more languages like Swedish, Norwegian, Danish, Dutch, Spanish and Esperanto. Russian is an issue that I will address in another article. On the other hand it is a good idea to have secondary positions for some symbols that are on keys that might be missing on some physical layouts, like „<", "|" and ">“ on the German keyboard, whose key is just not present in the American keyboard. The third idea is to have two Altgr-keys, because many important symbols are just accessible in conjunction with Altgr and thus easier if there are two Altgr-keys like there are two Shift-keys.

Special characters for Esperanto

For Esperanto (Esperanto explained in Esperanto) the latin alphabet with its 26 letters is needed, even though some of them are never used. And on top of that the following letters are needed as well:
ĉ Ĉ ĝ Ĝ ĵ Ĵ ĥ Ĥ ŝ Ŝ ŭ Ŭ
Unfortunately they have not reused the letters commonly used in Slavic languages and present on many international setups:
č Č ž Ž š Š
but Unicode covers it all and as long as the keyboard does not need to support Slavic or Baltic or Sami languages simultaneously with Esperanto, things should be fine.

A reasonable approach is tu put these symbols on Altgr-C, Altgr-G, Altgr-J, Altgr-H, Altgr-S and Altgr-U.

This can be achieved easily:
Create a file
.xmodmap-ori
by

xmodmap -pke > .xmodmap-ori

.
And a script $HOME/bin/orikb:

#!/bin/sh
xmodmap $HOME/.xmodmap-ori

Look up where the letters S, G, H J C and U are positioned on the keyboard in the xmodmap-ori-file.
Now create a file
.xmodmap-esperanto
using the following command

egrep 'keycode *[0-9]+ *= *[SsGgCcJjHhUu] ' < .xmodmap-ori

and edit it. Leave the part keycode = intact and change it to something like this:

keycode 39 = s S s S scircumflex Scircumflex
keycode 42 = g G g G gcircumflex Gcircumflex
keycode 43 = h H h H hcircumflex Hcircumflex
keycode 44 = j J j J jcircumflex Jcircumflex
keycode 54 = c C c C ccircumflex Ccircumflex
keycode 30 = u U u U ubreve Ubreve

The numbers between "keycode" and "=" could be different on your machine, but the rest should be like that.

Now create two scripts
$HOME/bin/eokb

#!/bin/sh
xmodmap $HOME/.xmodmap-esperanto

and
$HOME/bin/orikb

#!/bin/sh
xmodmap $HOME/.xmodmap-ori

.

Do not forgot to do the

chmod +x $HOME/bin/eokb $HOME/bin/orikb

for your scripts... 🙂

Now you can use eokb for enabling the Esperanto keys and orikb to return to your original setting.
The Esperanto keys will be accessible by using Altgr and the Latin letter they are derived from.

Other language specific characters

In a similar way you can have other characters
å and Å on the A,
ë and Ë on the E
ï and Ï on the I
ø and Ø on the O
æ and Æ on the Ä
ÿ and Ÿ on the Y
ñ and Ñ on the N
< on the , > on the .
| on the -

This allows to write a lot of languages. See what works for you...

Remaining Issues

For Russian I have bought a physical Cyrillic keyboard and I am using it with the setup that is printed on the keys. I might write about this another time.

Generally I like to have two Altgr keys and I can very well live without a windows key. I might write about this another time as well.

Btw. this article has been written for Linux, but it works perfectly well with any other Unix-like system, as long as X11 is used, just think of Aix, HPUX, BSD, Solaris and likewise systems. But Linux is by far the most common unix like desktop system these days.

Here is another approach:
xkbmap

Share Button

Dämonisierung von Prozessen

Auf Unix- und Linux-artigen Systemen laufen immer einige sogenannte Daemon-Prozesse. Diese laufen im Hintergrund, haben also keine Verbindung mit einem Terminal.
Beim Start kann man eine sogenannte Daemonisierung verwenden. Man startet von dem interaktiv gestarteten Prozess einen Child-Prozess. Dieser hat noch Verbindung zum ersten und damit zum Terminal. Nun startet man von diesem den eigentlichen Daemon-Prozess und beendet den Child-Prozess. Nun ist die Verbindung zum Parent-Prozess gekappt und der Daemon-Prozess wird damit zum Child-Prozess des Init-Prozesses. Wenn sich der Daemon beendet, wird durch init regelmäßig wait() aufgerufen und damit verhindert, dass dieser als Zombie-Prozess noch lange im System unterwegs ist. Nun muss der Daemon-Prozess aber informiert werden, wann der Child-Prozess beendet ist.
Dies kann wie folgt erfolgen:

/* (C) IT Sky Consulting GmbH 2014
 * http://www.it-sky-consulting.com/
 * Author: Karl Brodowsky
 * Date: 2014-02-27
 * License: GPL v2 (See https://de.wikipedia.org/wiki/GNU_General_Public_License )
 */

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>

int daemonized = 0;

void signal_handler(int signo) {
  daemonized = 1;
}

int main(int argc, char *argv[]) {
  int fork_result;
  int ret_code;
  int status;
  int pid = getpid();

  /* set pgid to pid */
  setpgid(pid, pid);
  signal(SIGUSR1, signal_handler);

  /* first fork() to create child */
  fork_result = fork();
  if (fork_result == 0) {
    printf("In child\n");

    /* second fork to create grand child */
    fork_result = fork();
    if (fork_result != 0) {
      /* exit child, make grand child a daemon */
      printf("exiting child\n");
      exit(0);
    }
    
    /* in daemon (grand child) */
    pid = getpid();
    while (! daemonized) {
      pause();
    }

    printf("daemon has pid=%d pgid=%d ppid=%d\n", pid, getpgid(pid), getppid());

    /* do daemon stuff */
    sleep(30);
    printf("done with daemon\n");
    exit(0);
  }
  printf("parent waiting for child\n");
  wait(&status);
  printf("child terminated\n");
  kill(-getpid(), SIGUSR1);
  printf("parent done\n");
}

Durch das Setzen einer Prozessgruppen-ID ist es möglich, den Parent-Prozess, den Child-Prozess und den Daemonprozess auch über diese gemeinsame Gruppen-Id anzusprechen, ohne die eigentliche pid zu kennen. Negative Werte als Funktionsparameter für Funktionenen, die dort eine Prozess-Id (pid) erwarten, werden oft als pgid (Prozessgruppen-ID) interpretiert. Wenn der Child-Prozess beendet ist, wird das wait im Parent-Prozess beendet und dieser schickt ein Signal an den Daemon-Prozess, das von diesem ignoriert wird, aber zur Beendigung des Wait-Prozesses führt.

Für ein kleines Beispiel mag es noch akzeptabel sein, nach stdout zu schreiben, aber die Ausgaben eines Daemons sollten natürlich am Ende in einer Log-Datei landen. Das kann durch Ausgabeumleitung oder noch schöner durch Verwendung von syslog geschehen, ist aber ein Thema für sich.

Share Button

Systemprogrammierung

So etwas muss man ja heute selten machen, Applikationsentwicklung ist eher die Tätigkeit, mit der man sich herumschlägt, wenn man kundenspezifische „Business-Software“ entwickelt oder anpasst. Bei der Systemprogrammierung schreibt man Software, die direkt auf Betriebssystemfunktionen und Hardware zugreift oder Teile von Betriebssystemen im weiteren Sinn, also nicht unbedingt nur Teile des Linux-Kernels oder Kernelmodule, sondern auch so etwas wie ls, mv oder auch Datenbanken und Webserver.

Aber gelegentlich kommt es doch in Projekten vor, dass man solche Kenntnisse einsetzen muss und dann ist es auch gut, sie zu haben. Für ein Projekt in der Vergangenheit, als es darum ging, Serversoftware für Billingsysteme zu entwickeln, die in C für Solaris geschrieben wurde, mit anderen Komponenten kommunizieren sollte und dann auch noch Performance bringen sollte, die ausreicht, um wenigstens jeweils in 24 Stunden mindestens die Daten zu verarbeiten, die durch die Telefonate innerhalb von 24 Stunden angesammelt wurden, was sich noch alle paar Monate verdoppelte.

Auch um bei Fahrkartenautomaten einer großen Bahngesellschaft die betriebssystemnahe Funktionalität zum Betrieb und zur Wartung von Fahrkartenautomaten zur Verfügung zu stellen, war es erforderlich, sich in diesem Bereich bewegen zu können.

Zur Zeit halte ich eine Vorlesung an einer Fachhochschule (ZHAW) in Zürich über Systemprogrammierung. Deshalb wird hier vielleicht auch gelegentlich einmal der eine oder andere Artikel zu Themen aus dem Gebiet auftauchen.

Die Beispielprogramme, die ich zu dem Thema erstelle, sind als Open-Source-Software in github und unter den Bedingungen der GPL v2 für jeden Interessenten verfügbar.

Share Button

Unicode, UTF-8, UTF-16, ISO-8859-1: Why is it so difficult?

Deutsch

Since about 20 years we have been kept busy with the change to Unicode.

Why is it so difficult?

The most important problem is that it is hard to tell how the content of a file is to be interpreted. We do have some hacks that often allow recognizing this:
The suffix is helpful for common and well defined file types, for example .jpg or .png. In other cases the content of the file is analyzed and something like the following is found in the beginning of the file:

#!/usr/bin/ruby

From this it can be deduced that the file should be executed with ruby, more precisely with the ruby implementation that is found under /usr/bin/ruby. If it should be the ruby that comes first in the path, something like

#!/usr/bin/env ruby

could be used instead. When using MS-Windows, this works as well, when using cygwin and cygwin’s Ruby, but not with native Win32-Ruby or Win64-Ruby.

The next thing is quite annoying. Which encoding is used for the file? It can be a useful agreement to assume UTF-8 or ISO-8859-1, but as soon as one team member forgets to configure the editor appropriately, a mess can be expected, because files appear that mix UTF-8 and ISO-8859-1 or other encodings, leading to obscure errors that are often hard to find and hard to fix.

Maybe it was a mistake when C and Unix and libc were defined and developed to understand files just as byte sequences without any meta information about the content. In the internet mime headers have proved to be useful for email and web pages and some other content. This allows the recipient of the communication to know how to interpret the content. It would have been good to have such meta-information also for files, allowing files to be renamed to anything with any suffix without loosing the readability. But in the seventies, when Unix and C and libc where initially created, such requirements were much less obvious and it was part of the beauty to have a very simple concept of an I/O-stream universally applicable to devices, files, keyboard input and some other ways of I/O. Also MS-Windows has probably been developed in C and has inherited this flaw. It has been tried to keep MS-Windows runnable on FAT-file-systems, which made it hard to benefit from the feature of NTFS of having multiple streams in a file, so the second stream could be used for the meta information. But as a matter of fact suffixes are still used and text files are analyzed for guessing the encoding and magic bytes in the beginning of binary files are used to assume a certain type.

Off course some text formats like XML have ways of writing the encoding within the content. That requires iterating through several assumptions in order to read up to that encoding information, which is not as bad as it sounds, because usually only a few encodings have to be tried in order to find that out. It is a little bit annoying to deal with this when reading XML from a network connection and not from a file, which requires some smart caching mechanism.

This is most dangerous with UTF-8 and ISO-8859-x (x=1,2,3,….), which are easy to mix up. The lower 128 characters are the same and the vast majority of the content consists of these characters for many languages. So it is easy to combine two files with different encodings and not recognizing that until the file is already somewhat in use and has undergone several conversion attempts to „fix“ the problem. Eventually this can lead to byte sequences that are not allowed in the encoding. Since spoken languages are usually quite redundant, it usually is possible to really fix such mistakes, but it can become quite expensive for large amounts of text. For UTF-16 this is easier because files have to start with FFFE or FEFF (two bytes in hex-notation), so it is relatively reliable to tell that a file is utf-16 with a certain endianness. There is even such a magic sequence of three bytes to mark utf-8, but it is not known by many people, not supported by the majority of the software and not at all commonly used.

In the MS-Windows-world things are even more annoying because the whole system is working with modern encodings, but this black CMD-windows is still using CP-850 or CP-437, which contain the same characters as ISO-8859-1, but in different positions. So an „ä“ might be displayed as a sigma-character, for example. This incompatibility within the same system does have its disadvantages. In theory there should be ways to fix this by changing some settings in the registry, but actually almost nobody has done that and messing with the registry is not exactly risk-less.

Share Button

Virtuellen Speicher überbelegen

So etwas müßte ja offensichtlich nicht gehen, man kann ja nur das belegen, was man hat…

Ich schreibe hier mal wieder mit Blick auf die Linux-Speicherverwaltung, die ich am besten kenne, aber die Ideen stammen teilweise von früheren Unix-Systemen.

Ein System hat einen physikalischen Speicher (RAM) von einer bestimmten Größe, heute meistens ein paar Gigabyte, aber mein erstes Linux lief in den frühen 90er-Jahren auch mit 4 Megabyte irgendwie. Oft wird das noch ergänzt durch einen sogenannten Swap- oder Paging-Bereich. Das bedeutet, daß man mehr Speicher belegen kann, als da ist und der weniger genutzte Bereich wird auf die Festplatte ausgelagert. Im Falle von ausführbaren Programmen reicht es dafür oft, diese zu memory-mappen, das heißt, daß man die Stelle auf der Platte, wo das Binary steht, verwendet und von dem Programm jeweils nur die aktuell benötigten Teile in den Hauptspeicher lädt, statt das im Swap-Bereich zu duplizieren.

Schön ist es immer, wenn man diesen Swap-Bereich nicht braucht, denn dann läuft alles schneller und RAM ist ja so billig geworden. Gerade wenn man nur SSDs hat, ist das durchaus eine valable Überlegung, weil die SSDs nicht so viel billiger als RAM sind und weil die SSDs durch das besonders häufige Schreiben bei Verwendung als Swap relativ schnell aufgebraucht werden. Mit guten SSDs sollte das durchaus noch brauchbar funktionieren, weil deren Lebensdauer so gestiegen ist, aber dafür muß man schon genau schauen, daß man solche mit vielen Schreibzyklen und einer hohen Schreibgeschwindigkeit kauft und die sind dann wirklich nicht mehr so billig.

Letztlich funktoniert es aber mit magnetischen Festplatten oder mit SSD prinzipiell genauso, solange die Hardware mitmacht. Einen Teil des belegten Speichers im Swap zu haben, ist nicht unbedingt falsch, denn viele Programme belegen Speicherbereiche und benutzen die nur sehr selten. Und weil in der Zeit, wo Speicherbereiche in den Swap-Bereich ein- und ausgelagert werden, andere Threads noch weiterrechnen können, kann es durchaus auch sein, daß man mit einem Teil der häufiger gebrauchten Speicherbereiche im Swap noch eine passable Performance erreichen kann. Das ist vor allem für seltener gebrauchte oft Software vertretbar, wobei man aber den Anwendungsfall genau anschauen muß. Zum Beispiel muß eine aufwendige Jahresendverarbeitung gut geplant werden, weil sie zu einem bestimmten Zeitpunkt abgeschlossen sein soll und vielleicht besonders viele Ressourcen benötigt.

Den Swap-Bereich und den physikalischen Speicher zusammen nennt man virtuellen Speicher. Von diesem wird noch ein Teil durch das Betreibssystem belegt, so daß man den Rest zur Verfügung hat. Nun kann man unter Linux diesen virtuellen Speicher tatsächlich überbelegen. Das nennt sich „overcommit“. Dabei wird in den Kernel-Einstellungen ein Parameter entsprechend gesetzt.

Man kann abfragen, welche overcommit-Strategie eingeschaltet ist:

$ cat /proc/sys/vm/overcommit_memory
0

0 bedeutet, daß nur Speicheranforderungen, die nach einer Heuristik plausibel ist, erlaubt wird, 1 bedeutet, daß alle Speicheranforderungen grundsätzlich erlaubt werden.

$ cat /proc/sys/vm/overcommit_ratio
50

bedeutet, daß eine Überbelegung von 50%, also ein Faktor von bis zu 1.5 möglich ist.
Mit sysctl als root oder durch schreiben in diese Pseudo-Dateien kann man das einmalig ändern, mit Einträgen in /etc/sysctl.conf für die Zukunft ab dem nächsten reboot.

Ein paar Fragen, die sich aufdrängen sind:

  • Wie kann das überhaupt funktionieren?
  • Wofür braucht man das?
  • Was sind die Risiken?

Es gibt verschiedene Möglichkeiten, Speicher anzufordern, aber es reduziert sich im wesentlichen auf malloc und Konsorten, fork und exec.

malloc(), calloc(), sbrk(), realloc() und einige andere fordern einen Speicherbereich an, den ein Programm benutzen möchte. Solange da nichts reingeschrieben wurde, brauchen diese Bereiche aber nicht wirklich zur Verfügung gestellt werden. Erst wenn tatsächlich die Zugriffe darauf erfolgen, ist das nötig. Das soll von älteren Fortran-Versionen kommen. Dort hat man Matrizen und Vektoren statisch allozieren müssen, es mußte also im Programm hardcodiert sein, wie groß sie sind. Für kleinere Matrizen hat man einfach nur ein Rechteck in der linken oberen Ecke benutzt. So haben Fortran-Programme früher viel mehr Speicher angefordert, als sie wirklich brauchten und RAM war damals sehr teuer und auch der Plattenplatz für Swap war teuer. So konnte man mit entsprechender Einschätzung der Charakteristik der laufenden Programme mit overcommit das System tunen und mehr damit machen. Auch heute dürften noch viele Programme Speicher allozieren, den sie nicht brauchen, auch wenn das sich heute leichter umgehen ließe als mit Fortran66. Ein spezieller Fall sind Java-Programme, die einen recht großen, konfigurierbaren Speichereich anfordern und den dann weitgehend selber verwalten.

Der andere Fall ist das Starten von Prozessen. Ein neuer Prozess braucht Speicher. Nun wird unter Linux ein Prozess in zwei Schritten gestartet. Mit fork() wird ein laufender Prozess dupliziert. Wenn man also von einem großen Prozess aus ein fork() aufruft, braucht man potentiell sehr viel memory. Das wird aber nicht so heiß gegessen wie gekocht. Denn Linux kennt einen Mechanismus „copy on write“, das heißt, die Speicherbereiche werden von beiden Prozessen gemeinsam benutzt, bis einer der Prozesse dort etwas schreibt. Das darf der andere nicht sehen, weshalb der eine Speicherblock vorher dupliziert werden muß, so daß jeder Prozess seinen eigenen hat. fork() lädt also durchaus dazu ein, mit overcommit verwendet zu werden. Um nun einen anderen Prozess zu starten, wird exec() aufgerufen. Dann wird ein neues Programm im laufenden Prozess ausgeführt und dabei fast der gesamte Speicher des Prozesses neu initialisiert. Das kann Memory freigeben oder zusätzlich in Anspruch nehmen. Erhalten bleiben bei exec() zum Beispiel die Umgebungsvariablen und die offenen Dateien und Verbindungen.

Wenn nun aber das ganze angeforderte Memory den virtuellen Speicher überschreitet und auch tatsächlich verwendet wird? Dann tritt der out-of-memory-Killer auf den Plan und beendet einige Prozesse, bis der Speicher wieder ausreicht. Es ist also wichtig, daß der Systemadministrator seine Systeme kennt und beobachtet, wenn diese nicht krass überdimensioniert sind. So oder so führt es zu Problemen, wenn man den virtuellen Speicher vollständig ausschöpft, weil viele Programme bei scheiternden mallocs(), forks() u.s.w. auch nicht mehr korrekt funktionieren können. Damit meine ich nicht unbedingt, daß man sich jeden Tag auf allen Systemen einloggt, um zu schauen, wie es denen geht, sondern eher, daß man Skripte oder Werkzeuge verwendet, um die Daten zu erfassen und zu aggregieren.

Ich habe einmal für ein paar Monate etwa 1’000 Rechner administriert. Wenn man sich in jeden einloggen muß und dabei pro Rechner etwa 10 min verbringt, sind das 10’000 Minuten, also etwa 167 Stunden. Man wäre allein damit also einen Monat lang voll beschäftigt. Benötigt wurden weitgehende Automatisierungen, die es ermöglichten, die Aufgabe in etwa 20 Stunden pro Woche zu bewältigen. Ja, es waren eine sehr homogene Rechnerlandschaft, sonst wäre der Aufwand sicher größer gewesen.

Share Button