# Exceptions to implement Program Logic

Sometimes it is conveniant to use exceptions for implementing the regular program logic.

Assuming we want to find some data and then process them. When no data is found, this step can be skipped, because there is nothing to do. So we could program something like this:

 public Data readData(Param param) {     Data data = db.read(param);     if (data.isEmpty()) {         throw new NotFoundException("nothing found");     }     return data; }

 

public ProcessedData doWork(Param param) {     try {         Data input = readData(param);         ....         ....         ....         ProcessedData result = ....         return result;     } catch (NotFoundException nfex) {         return ProcessedData.empty();     } } 

And some other exceptions could also be handled in a similar way.

Of course some people say, that this is not good and an abuse of exceptions. But sometimes it is tempting.

So is this bad? And if so, why? Let’s find out.

This is some kind of weird obfuscation of the control flow, because throwing and catching of exceptions can be far apart and it can become quite unclear, from where in the stack which exceptions can be thrown. So there are good reasons to recommend using exceptions only for what they are meant for by their name. The Goto has never made it into Java and we are discouraged from using it in many other languages, like C. But languages like Java, C, Perl, Ruby and some others provide quite rich control flow relying neither on goto nor on exceptions by using „return“ anywhere in a function or method or subroutine, leaving loops with „break“ or „last“ or going to the next iteration with „next“ or „continue“. Perl and Java even allow to specify which of nested loops to leave with break or last. These mechanisms are very powerful and there is no urgent need to add exceptions or even gotos just to support the control flow.

Once moving to newer languages like Scala much of this is gone or at least strongly discouraged in a purely functional programming style. This makes programming Scala harder, and comes with benefits that might be worth the extra effort.

But in Java these functional purists have not become very strong yet, so using „break“, „continue“, „return“ etc. is still ok and quite powerful.

In Java there is another very major problem with exceptions. Many, if not most Java programs run in a framework or container like Spring, EJB/JEE, JBoss Fuse, for example. Now a piece of software becomes a software component, that can interact with other components through the framework. And exceptions are noticed by the framework. In many cases they have the effect that an ongoing transaction is marked as „rollback only“. So the whole processing continues normally, and when all the code from the components is finally done, the framework performs a rollback instead of a commit.

As long as exceptions are only used for handling errors or unsual situations, in which cases the rollback is probably the way to go anyway, everything is fine. But if we for example look up something and base the further processing on the outcome of this, then a NotFoundException will result in very counter intuitive behavior.

So the original rule of not abusing exceptions is actually not such a bad idea.

# Unicode and C

It is a common practice in C to use arrays of char as strings. The 0 is used as end marker.

The whole thing was created like that in the 1970s and at that time it was kind of cool to get away with one less language feature and to express it in terms of others instead. And people did not think enough about the necessity to express more than ISO 646 IRV (commonly called ASCII) as string content.

This extended out of the box to 8 bit character sets like ISO 8859-1 or KOI-8, that are identical to ISO 646 in the lower 128 characters and contain an extension in the upper 128 characters. But fortunately we have moved ahead and now Unicode with its encodings UTF-8, UTF-16 and UTF-32.

How can we deal with this in C?

UTF-8 just works out of the box, because the byte 0 is only used to encode the code point U+0000. So the null termination can be kept as it is and a lot of functionality remains valid. Some issues arise, because in UTF-8 things like finding the logical length of a string, not its memory consumption or finding the nth code point, not the nth byte, require UTF-8-logic to be applied and to parse the whole string at least from the beginning to the desired position or the usage of an indexing facility. So a lot of non-trivial string functionality of the standard library will just not be as easy as people thought it would be in the 1970s and subsequently not work as needed. Libraries for better UTF-8-support in C can be found, leaving the „native“ C strings with UTF-8 content only for usage in interfaces that require them. I have not yet explored such libraries, but it would be interesting to find out how powerful and useful they are.

At the time when Unicode came out, it seemed to be sufficient to have 16 bits per character instead of 8. Java was built on this assumption. C added a wchar_t to allow for this and just required it to be „long enough“. So Linux uses 32 bits and MS-Windows 16 bits. This is not too bad, because programming in C for MS-Windows and for Linux is anyway quite different, unless we abstract the differences into a library, which would then also include a common string definition and string handling functionality. While the Linux wchar_t is sufficient, it really wastes a lot of memory, which is often undesirable, if we go the extra effort to program in C in order to gain performance. The Windows-wchar_t is „kind of sufficient“, as are the Java-Strings, because we can really do a lot with assuming that Unicode is only 16 bit or with UTF-16 and ignoring the complexities of that, that are in principal the same as for UTF-8, but can be ignored with less disadvantages most of the time. The good news is, that wchar_t is well supported by standard library functions.

Another way is to use char16_t and char32_t, that have a clear definition of their length, but much less library support.

Probably these facilities are sufficient for software whose string handling is relatively trivial. For more ambitious string handling in terms of functionality and performance, it will be necessary to find third party libraries or to write them.

# Meaningless Whitespace in Textfiles

We use different file formats that are more or less tolerant to certain changes. Most well known is white space in text files.

In some programming languages white space (space, newline, carriage return, form feed, tabulator, vertical tab) has no meaning, as long as any whitespace is present. Examples for this are Java, Perl, Lisp or C. Whitespace, that is somehow part of String content is always significant, but white space that is used within the program can be combination of one or more of the white space characters that are in the lower 128 positions (ISO-646, often referred to as ASCII or 7bit ASCII. It is of course recommended to have a certain coding standard, which gives some guidelines of when to use newlines, if tabs or spaces are preferred (please spaces) and how to indent. But this is just about human readability and the compiler does not really care. Line numbers are a bit meaningful in compiler and runtime error messages and stack traces, so putting everything into one line would harm beyond readability, but there is a wide range of ways that are all correct and equivalent. Btw. many teams limit lines to 80 characters, which was a valid choice 30 years ago, when some terminals were only 80 characters wide and 132 character wide terminals where just coming up. But as a hard limit it is a joke today, because not many of us would be able to work with a vt100 terminal efficiently anyway. Very long lines might be harder to read, so anything around 120 or 160 might still be a reasonable idea about line lengths…

Languages like Ruby and Scala put slightly more meaning into white space, because in most cases a semicolon can be skipped if it is followed by a newline and not just horizontal white space. And Perl (Perl 5) is for sure so hard to compile that only its own implementation can properly format or even recognize which white space is part of a literal string. Special cases like having the language in a string and parsing and then executing that should be ignored here.

Now we put this program files into a source code management system, usually Git. Some teams still use legacy systems like subversion, source safe, clear case or CVS, while there are some newer systems that are probably about as powerful as git, but I never saw them in use. Git creates an MD5 hash of each file, which implies that any minor change will result in a new version, even if it is just white space. Now this does not hurt too much, if we agree on the same formatting and on the same line ending (hopefully LF only, not CR LF, even on MS-Windows). But our tooling does not make any difference between significant changes and insignificant formatting only changes. This gets worse, if users have different IDEs, which they should have, because everyone should use the IDE or editor, with which he or she is most efficient and the formal description of the preferred formatting is not shared between editors or differs slightly.

I think that each programming language should come with a command line diff tool and a command line formatting tool, that obey a standard interface for calling and can be plugged into editors and into source code management systems like git. Then the same mechanisms work for C, Java, C#, Ruby, Python, Fortran, Clojure, Perl, F#, Scala, Lua or your favorite programming language.

I can imaging two ways of working: Either we have a standard format and possibly individual formats for each developer. During „git commit“ the file is brought into the standard format before it is shown to git. Meaning less whitespace changes disappear. During checkout the file can optionally be brought into the preferred format of the developer. And yes, there are ways to deal with deliberate formatting, that for some reason should be kept verbatim and for dealing differently with comments and of course all kinds of string literals. Remember, the formatting tool comes from the same source as the compiler and fully understands the language.

The other approach leaves the formatting up to the developer and only creates a new version, when the diff tool of the language signifies that there is a relevant change.

I think that we should strive for this approach. It is no rocket science, the kind of tools were around for many decades as diff and as formatting tools, it would just be necessary to go the extra mile and create sister diff and formatting tools for the compiler (or interpreter) and to actually integrate these into build environments, IDEs, editors and git. It would save a lot of time and leave more time for solving real problems.

Is there any programming language that actually does this already?

How to handle XML? Is XML just the new binary with a bit more bloat? Can we do a generic handling of all XML or should it depend on the Schema?

# Loops with unknown nesting depth

We often encounter nested loops, like

for (i = 0; i < n; i++) {
for (j = 0; j < m; j++) {
doSomething(i, j);
}
}


This can be nested to a few more levels without too much pain, as long as we observe that the number of iterations for each level need to be multiplied to get the number of iterations for the whole thing and that total numbers of iterations beyond a few billions (, German: Milliarden, Russian Миллиарди) become unreasonable no matter how fast the doSomethings(...) is. Just looking at this example program

public class Modular {
public static void main(String[] args) {
long n = Long.parseLong(args[0]);
long t = System.currentTimeMillis();
long m = Long.parseLong(args[1]);
System.out.println("n=" + n + " t=" + t + " m=" + m);
long prod = 1;
long sum  = 0;
for (long i = 0; i < n; i++) {
long j = i % m;
sum += j;
sum %= m;
prod *= (j*j+1) % m;
prod %= m;
}
System.out.println("sum=" + sum + " prod=" + prod + " dt=" + (System.currentTimeMillis() - t));
}
}


which measures it net run time and runs 0 msec for 1000 iterations and almost three minutes for 10 billions ():

> java Modular 1000 1001 # 1'000
--> sum=1 prod=442 dt=0
> java Modular 10000 1001 # 10'000
--> sum=55 prod=520 dt=1
> java Modular 100000 1001 # 100'000
--> sum=45 prod=299 dt=7
> java Modular 1000000 1001 # 1'000'000
--> sum=0 prod=806 dt=36
> java Modular 10000000 1001 # 10'000'000
--> sum=45 prod=299 dt=344
> java Modular 100000000 1001 # 100'000'000
--> sum=946 prod=949 dt=3314
> java Modular 1000000000 1001 # 1'000'000'000
--> sum=1 prod=442 dt=34439
> java Modular 10000000000 1001 # 10'000'000'000
--> sum=55 prod=520 dt=332346


As soon as we do I/O, network access, database access or simply a bit more serious calculation, this becomes of course easily unbearably slow. But today it is cool to deal with big data and to at least call what we are doing big data, even though conventional processing on a laptop can do it in a few seconds or minutes... And there are of course ways to process way more iterations than this, but it becomes worth thinking about the system architecture, the hardware, parallel processing and of course algorithms and software stacks. But here we are in the "normal world", which can be a "normal subuniverse" of something really big, so running on one CPU and using a normal language like Perl, Java, Ruby, Scala, Clojure, F# or C.

Now sometimes we encounter situations where we want to nest loops, but the depth is unknown, something like

for ( = 0;  < ; ++) {
for ( = 0;  < ; ++) {

for ( = 0;  < ; ++) {
dosomething();
}

}
}


Now our friends from the functional world help us to understand what a loop is, because in some of these more functional languages the classical C-Style loop is either missing or at least not recommended as the everyday tool. Instead we view the set of values we iterate about as a collection and iterate through every element of the collection. This can be a bad thing, because instantiating such big collections can be a show stopper, but we don't. Out of the many features of collections we just pick the iterability, which can very well be accomplished by lazy collections. In Java we have the Iterable, Iterator, Spliterator and the Stream interfaces to express such potentially lazy collections that are just used for iterating.

So we could think of a library that provides us with support for ordinary loops, so we could write something like this:

Iterable range = new LoopRangeExcludeUpper<>(0, n);
for (Integer i : range) {
doSomething(i);
}


or even better, if we assume 0 as a lower limit is the default anyway:

Iterable range = new LoopRangeExcludeUpper<>(n);
for (Integer i : range) {
doSomething(i);
}


with the ugliness of boxing and unboxing in terms of runtime overhead, memory overhead, and additional complexity for development. In Scala, Ruby or Clojure the equivalent solution would be elegant and useful and the way to go...
I would assume, that a library who does something like LoopRangeExcludeUpper in the code example should easily be available for Java, maybe even in the standard library, or in some common public maven repository...

Now the issue of loops with unknown nesting depth can easily be addressed by writing or downloading a class like NestedLoopRange, which might have a constructor of the form NestedLoopRange(int ... ni) or NestedLoopRange(List li) or something with collections that are more efficient with primitives, for example from Apache Commons. Consider using long instead of int, which will break some compatibility with Java-collections. This should not hurt too much here and it is a good thing to reconsider the 31-bit size field of Java collections as an obstacle for future development and to address how collections can grow larger than elements, but that is just a side issue here. We broke this limit with the example iterating over 10'000'000'000 values for i already and it took only a few minutes. Of course it was just an abstract way of dealing with a lazy collection without the Java interfaces involved.

So, the code could just look like this:

Iterable range = new NestedLoopRange();
for (Tuple t : range) {
doSomething(t);
}


Btw, it is not too hard to write it in the classical way either:

        long[] n = new long[] {  };
int m1 = n.length;
int m  = m1-1; // just to have the math- matched...
long[] t = new long[m1];
for (int j = 0; j < m1; j++) {
t[j] = 0L;
}
boolean done = false;
for (int j = 0; j < m1; j++) {
if (n[j] <= 0) {
done = true;
break;
}
}
while (! done) {
doSomething(t);
done = true;
for (int j = 0; j < m1; j++) {
t[j]++;
if (t[j] < n[j]) {
done = false;
break;
}
t[j] = 0;
}
}


I have written this kind of loop several times in my life in different languages. The first time was on C64-basic when I was still in school and the last one was written in Java and shaped into a library, where appropriate collection interfaces were implemented, which remained in the project or the organization, where it had been done, but it could easily be written again, maybe in Scala, Clojure or Ruby, if it is not already there. It might even be interesting to explore, how to write it in C in a way that can be used as easily as such a library in Java or Scala. If there is interest, please let me know in the comments section, I might come back to this issue in the future...

In C it is actually quite possible to write a generic solution. I see an API like this might work:

struct nested_iteration {
/* implementation detail */
};

void init_nested_iteration(struct nested_iteration ni, size_t m1, long *n);
void dispose_nested_iteration(struct nested_iteration ni);
int nested_iteration_done(struct nested_iteration ni); // returns 0=false or 1=true
void nested_iteration_next(struct nested_iteration ni);


and it would be called like this:

struct nested_iteration ni;
int n[] = {  };
for (init_nested_iteration(ni, , n);
! nested_iteration_done(ni);
nested_iteration_next(ni)) {
...
}


So I guess, it is doable and reasonably easy to program and to use, but of course not quite as elegant as in Java 8, Clojure or Scala.
I would like to leave this as a rough idea and maybe come back with concrete examples and implementations in the future.

# How to create ISO Date String

It is a more and more common task that we need to have a date or maybe date with time as String.

There are two reasonable ways to do this:
* We may want the date formatted in the users Locale, whatever that is.
* We want to use a generic date format, that is for a broader audience or for usage in data exchange formats, log files etc.

The first issue is interesting, because it is not always trivial to teach the software to get the right locale and to use it properly… The mechanisms are there and they are often used correctly, but more often this is just working fine for the locale that the software developers where asked to support.

So now the question is, how do we get the ISO-date of today in different environments.

## Linux/Unix-Shell (bash, tcsh, …)

date "+%F"

## TeX/LaTeX

 \def\dayiso{\ifcase\day \or 01\or 02\or 03\or 04\or 05\or 06\or 07\or 08\or 09\or 10\or% 1..10 11\or 12\or 13\or 14\or 15\or 16\or 17\or 18\or 19\or 20\or% 11..20 21\or 22\or 23\or 24\or 25\or 26\or 27\or 28\or 29\or 30\or% 21..30 31\fi} \def\monthiso{\ifcase\month \or 01\or 02\or 03\or 04\or 05\or 06\or 07\or 08\or 09\or 10\or 11\or 12\fi} \def\dateiso{\def\today{\number\year-\monthiso-\dayiso}} \def\todayiso{\number\year-\monthiso-\dayiso} 
This can go into a file isodate.sty which can then be included by \include or \input Then using \todayiso in your TeX document will use the current date. To be more precise, it is the date when TeX or LaTeX is called to process the file. This is what I use for my paper letters.

## LaTeX

(From Fritz Zaucker, see his comment below):
 \usepackage{isodate} % load package \isodate % switch to ISO format \today % print date according to current format 

## Oracle

 SELECT TO_CHAR(SYSDATE, 'YYYY-MM-DD') FROM DUAL; 
On Oracle Docs this function is documented.
It can be chosen as a default using ALTER SESSION for the whole session. Or in SQL-developer it can be configured. Then it is ok to just call
 SELECT SYSDATE FROM DUAL; 

Btw. Oracle allows to add numbers to dates. These are days. Use fractions of a day to add hours or minutes.

## PostreSQL

(From Fritz Zaucker, see his comment):
 select current_date; —> 2016-01-08 
 select now(); —> 2016-01-08 14:37:55.701079+01 

## Emacs

In Emacs I like to have the current Date immediately:
 (defun insert-current-date () "inserts the current date" (interactive) (insert (let ((x (current-time-string))) (concat (substring x 20 24) "-" (cdr (assoc (substring x 4 7) cmode-month-alist)) "-" (let ((y (substring x 8 9))) (if (string= y " ") "0" y)) (substring x 9 10))))) (global-set-key [S-f5] 'insert-current-date) 
Pressing Shift-F5 will put the current date into the cursor position, mostly as if it had been typed.

## Emacs (better Variant)

(From Thomas, see his comment below):
 (defun insert-current-date () "Insert current date." (interactive) (insert (format-time-string "%Y-%m-%d"))) 

## Perl

In the Perl programming language we can use a command line call
 perl -e 'use POSIX qw/strftime/;print strftime("%F", localtime()), "\n"' 
or to use it in larger programms
 use POSIX qw/strftime/; my \$isodate_of_today = strftime("%F", localtime()); 
I am not sure, if this works on MS-Windows as well, but Linux-, Unix- and MacOS-X-users should see this working.

If someone has tried it on Windows, I will be interested to hear about it…
Maybe I will try it out myself…

## Perl 5 (second suggestion)

(From Fritz Zaucker, see his comment below):
 perl -e 'use DateTime; use 5.10.0; say DateTime->now->strftime(„%F“);‘ 

## Perl 6

(From Fritz Zaucker, see his comment below):
 say Date.today; 
or
 Date.today.say; 

## Ruby

This is even more elegant than Perl:
 ruby -e 'puts Time.new.strftime("%F")' 
will do it on the command line.
Or if you like to use it in your Ruby program, just use
 d = Time.new s = d.strftime("%F") 

Btw. like in Oracle SQL it is possible add numbers to this. In case of Ruby, you are adding seconds.

It is slightly confusing that Ruby has two different types, Date and Time. Not quite as confusing as Java, but still…
Time is ok for this purpose.

## C on Linux / Posix / Unix

 #include #include #include 

 main(int argc, char **argv) { 

 char s[12]; time_t seconds_since_1970 = time(NULL); struct tm local; struct tm gmt; localtime_r(&seconds_since_1970, &local); gmtime_r(&seconds_since_1970, &gmt); size_t l1 = strftime(s, 11, "%Y-%m-%d", &local); printf("local:\t%s\n", s); size_t l2 = strftime(s, 11, "%Y-%m-%d", &gmt); printf("gmt:\t%s\n", s); exit(0); } 
This speeks for itself..
But if you like to know: time() gets the seconds since 1970 as some kind of integer.
localtime_r or gmtime_r convert it into a structur, that has seconds, minutes etc as separate fields.
stftime formats it. Depending on your C it is also possible to use %F.

## Scala

 import java.util.Date import java.text.SimpleDateFormat ... val s : String = new SimpleDateFormat("YYYY-MM-dd").format(new Date()) 
This uses the ugly Java-7-libraries. We want to go to Java 8 or use Joda time and a wrapper for Scala.

## Java 7

 import java.util.Date import java.text.SimpleDateFormat

 

... String s = new SimpleDateFormat("YYYY-MM-dd").format(new Date()); 
Please observe that SimpleDateFormat is not thread safe. So do one of the following:
* initialize it each time with new
* make sure you run only single threaded, forever
* use EJB and have the format as instance variable in a stateless session bean
* protect it with synchronized
* protect it with locks
* make it a thread local variable

In Java 8 or Java 7 with Joda time this is better. And the toString()-method should have ISO8601 as default, but off course including the time part.

## Summary

This is quite easy to achieve in many environments.
I could provide more, but maybe I leave this to you in the comments section.
What could be interesting:
* better ways for the ones that I have provided
* other databases
* other editors (vim, sublime, eclipse, idea,…)
* Office packages (Libreoffice and MS-Office)
* C#
* F#
* Clojure
* C on MS-Windows
* Perl and Ruby on MS-Windows
* Java 8
* Scala using better libraries than the Java-7-library for this
* Java using better libraries than the Java-7-library for this
* C++
* PHP
* Python
* Cobol
* JavaScript
* …
If you provide a reasonable solution I will make it part of the article with a reference…

# Conversion of ASCII-graphics to PNG or JPG

Images are usually some obscure binary files. Their most common formats, PNG, SVG, JPEG and GIF are well documented and supported by many software tools. Libraries and APIs exist for accessing these formats, but also a phantastic free interactive software like Gimp. The compression rate that can reasonably be achieved when using these format is awesome, especially when picking the right format and the right settings. Tons of good examples can be found how to manipulate these image formats in C, Java, Scala, F#, Ruby, Perl or any other popular language, often by using language bindings for Image Magick.

There is another approach worth exploring. You can use a tool called convert to just convert an image from PNG, JPG or GIF to XPM. The other direction is also possible. Now XPM is a text format, which basically represents the image in ASCII graphics. It is by the way also valid C-code, so it can be included directly in C programms and used from there, when an image needs to be hard coded into a program. It is not generally recommended to use this format, because it is terribly inefficient because it uses no compression at all, but as intermediate format for exploring additional ways for manipulating images it is of interest.
An interesting option is to create the XPM-file using ERB in Ruby and then converting it to PNG or JPG.

# System Programming on Linux and MS-Windows

Quite honestly I admit that I really love the Posix-APIs for system programming and even some Linux specific extensions to it. I/O, Locking, Semaphores, Shared Memory, Message Queues, Signals, named and anonymous pipes, Unix Domain Sockets, TCP/IP programming, Terminal I/O, pthreads and a lot more are very powerful and fun to program. I do discover some points where I regret why they have not done it better, for example the fact that almost all system calls return a value, which is interpreted in one of the following ways:

• 0 means ok, -1 means a issue has occurred, which can be explored by calling the errno-macro.
• Values >= 0 are useful responses and -1 is indicating an error, which again requires calling errno.
• A pointer is returned. If the pointer is NULL, this indicates an error and requires calling errno. Sometimes (void *)-1 or similar return values are also special.
• pthreads-methods return 0 when successful or directly the error code otherwise.

Originally errno was a variable, which had to be replaced by some weird macro construction to allow multithreading and remain backwards compatible.
I would find it most natural if there where an exception mechanism in place like in Perl, Ruby, Java and many other languages, which would transport the error information. C cannot do this, at least not without breaking the language standard. The pthreads way looks good as well. Returning a struct containg the value actually needed and the errorcode, which is 0 if everything is OK, would also be a good approach, whenever a real return value is needed, but arguable a little bit clumpsy in case of functions returning a pointer. Maybe providing a pointer to some integer variable as argument would be the way for this case, even though I find it kind of ugly to have „return values done by a parameter“. Semaphores are a little bit clumsy to handle. And fcntl and ioctl are for sure overused instead of adding specific function for specific tasks. Reading a single character from a terminal or keyboard input without waiting for return is difficult, but at least logical.

Anyway, these issues can be dealt with and the power and elegance of the API is just great. The documentation is always available by using man pages that are installed on almost every system and by using great online resources on top of that.

So how does the win32- and win64-API look like? I mean apart from the religious questions like the lack of freedom? Most of the things can be done on the MS-Windows-APIs as well. There are some differences. First of all, all the code that uses system-APIs has to be rewritten. Very few typical POSIX-functions like open, close, read and write exist in the windows world as well to facilitate such a transition, but the general answer is like „it can be done, but the code has to be rewritten from scratch“. So programs that should run on both platforms and should do basically the same on both platforms need to encapsulate their system specific code, which might be anywhere between 20 and 50 percent of the code base, in specific files and organize their structure in such a way that the remaining half or more can actually be the same. It has been done by database products (PostgreSQL, MariaDB, Oracle, DB2), interpreters and compilers for programming languages (Ruby, Perl, Scala, Java, C#, F#, PHP), browsers (Firefox, Chrome), image processing software (gimp), office software (other than MS-office), web servers (Apache) and many others and they do achieve the goal to be doing more or less the same on both platforms.

Now how does the Win32- and Win64-API look like? Obviously the code looks very different. Unimportant, but very visible differences are that function names are mixed case and start with capital letter instead of being smaller case with underscores. Parameters and variables are mixed case starting with lower case. The C-type system is not directly used, but all types are #defined in some header file and all capital, even pointer types. Some care is needed to understand how these types work together, because it is not as self documenting as the original C types, but really no big deal to get used to. A MS-specific C-extension does allow using some kind of exceptions, if that is good or bad is hard say. Function names are generally longer and have huge parameter lists with very long parameter names. When they are outdated, because more parameters or different behavior or 64-bit support is needed, often an 64 or an Ex is added to the original name to create a new name for the replacement function, retaining the old one as it is for backwards compatibility.

Shared memory can more or less easily be replaced by memory mapped files and that is what needs to be done on MS-Windows.

The named pipe of Windows kind of unifies the message queue, the Unix domain sockets, the named pipes of Unix/Posix/Linux and even allows network communication within the local network. There have been linux specific extensions to Posix-pipes that achieve this unification, but not the network transparency, as well. Mutex and Semaphore work slightly differently, but can basically achieve the same results as Mutex and Semaphore on Posix. What is beautiful is that almost all operating system objects are accessed by so called HANDLEs which unifies many functions accessing them, but brings functions like WaitForSingleObject and WaitForMultipleObjects also some fcntl-like flavor, because it depends of course very much on the type of kernel object what waiting for it means. When being aware of this, it can be very powerful.

When looking for features that are really missing on one platform we observe immediately that MS-Windows does a mandatory locking on files by default and that such a mandatory locking does not at all exist in Posix or on Unix-like operating systems like MacOS-X, even though it does exist on Linux. Discussing this issue and how to deal with it should be worth its own article. In short, it is not as bad as it sounds, but the choice of the MS-Windows-guys to implement this feature in the way it is and to make it the default does look good.

The signals are missing on the windows side. This can be overcome by using Mutexes and Conditions to replace the communication part of signals, or to simply use HANDLEs to end a specific process instead of sending a signal, provided the permissions exist to do so.

Another painful omission is the fork. Most of the time fork is accompanied by an exec and exactly that can be doe by the CreateProcess in MS-Windows. Often we do like to share open files with the forked process and there are ways to do this, at least to some extent. But to use fork for creating a couple identical processes that run on the same code and data initialized once, which is sometimes a good idea, just does not exist on MS-Windows. It can be overcome by using threads and dealing with the issues of having to take responsibility for really separating the threads or by using multiple processes and memory mapped files for sharing that initial data structure.

The Win32- and Win64-APIs are documented quite well on some Microsoft-Webpage. I find the Linux-man pages slightly more useful, but both systems are documented in a way that it should be easy to find and use the original documentation and additional resources on the web.

Generally I would recommend all system programmers to have a look at the other world and how things work there. It helps enjoy and understand the beauty and power of both systems and probably maintain or even challenge the preference.

I have been teaching system programming for both platforms to college students and I enjoyed teaching and exploring these platforms with my students very much.

# Find the next entry in a sequence

In Facebook, Xing, Google+, Vk.com, Linkedin and other of these social media networks we are often encountered with a trivial question like this:
 1->2 2->8 3->18 4->32 5->50 6->72 7->? 
There are some easy patterns. Either it is some polynomial formula or some trick with the digits.
But the point is, that any such sequence can easily be fullfilled by a polynomial formula. That means we can put any value for 7 and make it work. Or any answer is correct. So what would probably be the real question is the most simple function to full-fill the given constraints. Simplicity can be measured in some way… If the solution is unique is unclear, but let us just look at the polynomial solution.

A function is needed that takes as parameter a list of key-value-pairs (or a hash map) and that yields a function such that the function of any of the key is the associated value.

Assuming a polynomial function in one variable we can make use of the chinese remainder theorem, which can be applied to univariate polynomials over a field as well as to integral numbers. For a polynomial p(X) we have

where is the polynomial variable and is a concrete value.

We are looking for a polynomial such that for given values we have

or in another way

which is exactly the Chinese remainder theorem.
Let

and

We can see that for all the polynomials

have the properties

or

where is the Kronecker symbol, which is 0 if the two indices differ and 1 if they are equal.
Or as congruence:

Then we can just combine this and use

This can easily be written as a Ruby function
 def fun_calc(pairs)   n = pairs.size   result = lambda do |x|     y = 0     n.times do |i|       p_i = pairs[i]       x_i = p_i[0].to_r       y_i = p_i[1].to_r       z = y_i       n.times do |j|         if (j != i)           p_j = pairs[j]           x_j = p_j[0]           z *= (x - x_j) / (x_i - x_j)         end       end       y += z     end     y   end   result end 
This takes a list of pairs as a parameter and returns the polynomial function als lambda.
It can be used like this:
 lop = [[0, 0], [1, 1], [2, 4], [3, 9], [4, 16], [5, 25], [6, 36], [7, 64]]

 f = fun_calc(lop) 

20.times do |x|   y = f.call(x)   puts sprintf("%6d -> %6d", x, y) end 
Put this together into a ruby program and add some parsing for the list of pairs or change the program each time you use it and all these „difficult“ questions „that 99.9% fail to solve“ are not just easy, but actually soluble automatically.

This is interesting for more useful applications. I assume that there will always be situations where a function is needed that meets certain exact values a certain inputs and is an interpolation or extrapolation of this.

Please observe that there are other interesting and useful ways to approach this:

• Use a „best“ approximation from a set of functions, for example polynomials with a given maximum degree
• use cubic splines, which are cubic polynomials within each section between two neighboring input values such that at the input values the two adjacent functions have the same value (, of course), the same first derivative and the same second derivative.

For highway and railroad construction other curves are used, because the splines are making an assumption on what is the -axis and what is the -axis, which does not make sense for transport facilities. They are using a curve called Clothoid.

Use Java, C, Perl, Scala, F# or the programming language of your choice to do this. You only need Closures, which are available in Java 8, F#, Scala, Perl, Ruby and any decent Lisp dialect. In Java 7 they can be done with an additional interface as anonymous inner classes. And for C it has been described in this blog how to do closures.

# How to recover the Carry Bit

As frequent readers might have observed, I like the concept of the Carry Bit as it allows for efficient implementations of long integer arithmetic, which I would like to use as default integer type for most application development. And unfortunately such facilities are not available in high level languages like C and Java. But it is possible to recover the carry bit from what is available in C or Java, with some extra cost of performance, but maybe neglectable, if the compiler does a good optimization on this. We might assume gcc on a 64-Bit-Linux. It should be possible to do similar things on other platforms.

So we add two unsigned 64-bit integers and and an incoming carry bit to a result

with

using the typical „long long“ of C. We assume that

where

and

. In the same way we assume and with the same kind of conditions for , , or , , , respectively.

Now we have

and we can see that

for some

.
And we have

where

is the carry bit.
When looking just at the highest visible bit and the carry bit, this boils down to

This leaves us with eight cases to observe for the combination of , and :

x_hy_huz_hc
00000
10010
01010
11001
00110
10101
01101
11111

Or we can check all eight cases and find that we always have

or

So the result does not depend on anymore, allowing to calculate it by temporarily casting , and to (signed long long) and using their sign.
We can express this as „use if and use if „.

An incoming carry bit does not change this, it just allows for , which is sufficient for making the previous calculations work.

In a similar way subtraction can be dealt with.

The basic operations add, adc, sub, sbb, mul, xdiv (div is not available) have been implemented in this library for C. Feel free to use it according to the license (GPL). Addition and subtraction could be implemented in a similar way in Java, with the weirdness of declaring signed longs and using them as unsigned. For multiplication and division, native code would be needed, because Java lacks 128bit-integers. So the C-implementation is cleaner.

# Shift- and Rotation Functions

Deutsch

In a comment to Carry Bit: How does it work the question about shift and rotation functions has been asked. Which exist and how they work exactly depends off course on the CPU architecture, but I will try to give a high level overview anyway.

The following aspects have to be considered:

• 8, 16, 32, 64,… Bit: How many Bits are affected by the operation?
• Shift vs. Rotate: Shift moves bits to the left or right, filling in 0 or 1 in the positions that get freed by this. The bits that get shifted out usually go to the carry bit, so the last one of these wins, if the shift is done by more than one bit. Rotation means what some English knowledge suggests: The bits shifted out are moved in on the other side.
• ROL and ROR vs RCL and RCR: Rotation through Carry (RCL or RCR) means that the carry bit participates as bit 9., 17., 33. or 65. ROL and ROR leave the carry bit alone, at least do not use it as input.
• Logical vs. Arithmetic Shift: This deals with the „sign“. For arithmetic Shift the number is considered to be signed in two’s complement. Therefore right shift does not necessarily introduce a 0 as new most significant bit, but depending on the sign introduces a 0 or a 1.

Current CPUs have barrel shifts or something like that built into the hardware, so shift and rotate operations by more than one bit position are much faster than sequences of single shifts. This technology has been around on better CPUs for decades and is not at all new.

How the commands are called exactly and possibly some details about there operations depend on the CPU architecture.

Examples (8 bit) for shifting and rotating by one bit position:

Vorher (Carrybit in Klammern)ROL
(rotate left)
RCL
(rotate left through carry)
ROR
(rotate right)
RCR
(rotate right through carry)
ASL / SHL
(arithmetic/logical shift left)
SHR
(logical shift right)
ASR
(arithmetic shift right)
00000000 (0)00000000 (0)00000000 (0)00000000 (0)00000000 (0)00000000 (0)00000000 (0)00000000 (0)
00000000 (1)00000000 (1)00000001 (0)00000000 (1)10000000 (0)00000000 (0)00000000 (0)00000000 (0)
00000001 (0)00000010 (0)00000010 (0)10000000 (0)00000000 (1)00000010 (0)00000000 (1)00000000 (1)
00000001 (1)00000010 (1)00000011 (0)10000000 (1)10000000 (1)00000010 (0)00000000 (1)00000000 (1)
10000000 (0)00000001 (0)00000000 (1)01000000 (0)01000000 (0)00000000 (1)01000000 (0)11000000 (0)
10000000 (1)00000001 (1)00000001 (1)00000001 (1)11000000 (0)00000000 (1)01000000 (0)11000000 (0)
11111111 (0)11111111 (0)11111110 (1)11111111 (0)01111111 (1)11111110 (1)01111111 (1)11111111 (1)
11111111 (1)11111111 (1)11111111 (1)11111111 (1)11111111 (1)11111110 (1)01111111 (1)11111111 (1)

These shift and rotate operations are needed to express multiplications by powers of two and division by powers of two. In C, C++, C#, Perl, Java and some other porgramming languages these operations are included and written as „<<" (for ASL or SHL), ">>“ (for SHR) and „>>>“ for ASR.