Borrow and Carry Bit for Subtraction

Similar to the usage of the carry bit when adding there are mechanisms for subtracting that allow to integrate the result of subtraction of the lower bits into the subtraction of the next higher block of bits, where necessary.

There are two ways to do this, that are trivially equivalent by a simple not operation:

  • borrow bit (also borrow flag)
  • carry bit (also carry flag)

Often CPUs use what is the carry bit for addition and interpret it as borrow bit for subtraction.

Here are the two ways:

Borrow Bit

It is assumed, that the CPU-word is n bits long, so calculations are performed modulo N=2^n. Further it is assumed, that the range 0\ldots N-1 is preferred, so all sign issues are excluded for the moment.

When subtracting two numbers x, y from each other like x-y and y > x, the provided result is

    \[x-y+N \equiv x-y \mod N\]

and the borrow bit is set (b=1), to indicate that the subtraction caused „underflow“, which had to be corrected by added N in order to get into the desired range.

In the „normal“ case where y \le x, the provided result is simply

    \[x-y\]

and the borrow bit is not set (b=0).

The next round of subtraction takes the borrow bit into account and calculates x-y-b, where the condition becomes y+b > x and the result is

    \[x-y-b+N \equiv x-y \mod N\]

or

    \[x-y-b\]

respectively. This is how some of the older readers used to do it in school on paper, but of course with N=10.

Now the typical integer arithmetic of current CPUs uses Two’s complement, which means that -y=(\mathrm{NOT}\; y)+1. Combining this with the previous results in calculating

    \[x-y = x + (\mathrm{NOT}\; y) + 1 - b \mod N\]

At this point some CPU-designers found it more natural, to use the carry bit c=1-b instead of the borrow bit b.

Carry Bit

When subtracting two numbers x, y from each other like x-y and we have y > x, the provided result is

    \[x-y+N \equiv x-y \mod N\]

and the carry bit is not set (c=0), to indicate that the subtraction caused „underflow“, which had to be corrected by added N in order to get into the desired range.

In the „normal“ case where y \le x, the provided result is simply

    \[x-y\]

and the carry bit is set (c=1).

The next round of subtraction takes the borrow bit into account and calculates x-y-1+c, where the condition becomes y+1-c > x and the result is

    \[x-y-1+c+N \equiv x-y \mod N\]

or

    \[x-y-1+c\]

respectively.

Now two’s complement with -y=(\mathrm{NOT}\; y)+1 this can be written as

    \[x-y = x + (\mathrm{NOT}\; y) + 1 - b \mod N\]

or with c=1-b

    \[x-y = x + (\mathrm{NOT}\; y) + c \mod N\]

These two ways are really equivalent and easily transformed into each other. Neither of them provides big advantages, apart from the fact that we have the unnecessary confusion, because it depends on the CPU design, which of the two variants is used.

Recovery of Borrow or Carry bit

The borrow bit is calculated and used during subtractions that are done at assembly language level, but higher languages like C or Java do provide access to this information. It is relatively easy to recover the carry bit in the case of addition based on x, y and x+y \mod N.

This possible as well for the subtraction. Quite easily the comparison between x and y or y+b could be done before the subtraction. This would work, but it is kind of inefficient, because under the hood the comparison is just a subtraction with discarding the result and keeping the flags. So the subtraction is performed twice. This might not be such a bad idea, because a compiler could recognize it or the work of subtracting twice could be neglectable compared to the logic for an „optimized“ recovery, based on some logic expression of certain bits from x, y and x-y \mod N.

Links

Share Button

Orthodox Christmas 2018/2019

Orthodox Christmas in some countries, for example in Ukraine is on 7th of January.
So to all readers, who have Christmas on 7th of January:


С Рождеством! — Hyvää Joulua! — καλά Χριστούγεννα! — Buon Natale! — Prettige Kerstdagen! — З Рiздвом Христовим! — Merry Christmas! — Срећан Божић! — God Jul! — ¡Feliz Navidad! — ميلاد مجيد — クリスマスおめでとう ; メリークリスマス — Natale hilare! — Joyeux Noël! — God Jul! — Frohe Weihnachten! — Crăciun fericit! — Feliĉan Kristnaskon!

I created the message and the random ordering using Perl and the Schwartzian transform:

#!/usr/bin/perl
use Math::Random::Secure qw(irand);
my @list = ( "Prettige Kerstdagen!",
          "God Jul!",
          "Crăciun fericit!",
          "クリスマスおめでとう ; メリークリスマス",
          "God Jul!",
          "Feliĉan Kristnaskon!",
          "Hyvää Joulua!",
          "ميلاد مجيد",
          "Срећан Божић!",
          "καλά Χριστούγεννα!",
          "З Рiздвом Христовим!",
          "Natale hilare!",
          "Buon Natale!",
          "Joyeux Noël!",
          "Frohe Weihnachten!",
          "С Рождеством!",
          "Merry Christmas!",
          "¡Feliz Navidad!" );
my @shuffled = map{ $_->[0] }
               sort {$a->[1] <=> $b->[1]}
               map { [ $_, irand() ] }
               @list;
print join(" — ", @shuffled);

Share Button

How to draw lines, circles and other curves

These ideas were developed more than 30 years without knowing that they were already known at that time…

Today the graphics cards can easily do things like this in very little time. And today’s CPUs are of course really good at multiplying. So this has lost a lot of its immediate relevance, but it is a fun topic and why not have some fun…

Let us assume we have a two dimensional coordinate system and a visible area that goes from x_{\min} to x_{\max} and y_{\min} to y_{\max}. Coordinates are discrete.

In this world we can easily measure an angle against a (directed) line parallel to the x-axis, for example up to an accuracy of 45^\circ=\frac{\pi}{4}:

  • y=0 \wedge x > 0 \implies \alpha = 0 (= 0^\circ)
  • 0 < y < x \implies 0 < \alpha < \frac{\pi}{4}(=45^\circ)
  • 0 < y = x \implies \alpha = \frac{\pi}{4}
  • 0 < x < y \implies \frac{\pi}{4} < \alpha < \frac{\pi}{2}(=90^\circ)
  • x = 0 \land y > 0\implies \alpha = \frac{\pi}{2}
  • x < 0 \land y > 0 \land |x| < |y|\implies \frac{\pi}{2} < \alpha < \frac{3\pi}{4}(=135^\circ)
  • x < 0 \land y > 0 \land -x = y\implies \alpha = \frac{3\pi}{4}(=135^\circ)

So let us assume we have a curve that is described by a polynomial function in two variables x and y, like this:

    \[f(x, y) = \sum_{j=0}^m\sum_{k=0}^n a_{j,k}x^jy^k = 0\]

We have to apply some math to understand that the curve behaves nicely in the sense that it does not behave to chaotic in scales that are below our accuracy, that it is connected etc. We might possibly scale and move it a bit by substituting something like c_1u+c_2 for x and c_3v+c_4 for y.

For example we may think of

  • line: f(x,y)=ax+by+c
  • circle: f(x, y)=x^2+y^2-r^2
  • eclipse: f(x, y)=\frac{x^2}{a^2}+\frac{y^2}{b^2}-1

We can assume our drawing is done with something like a king of chess. We need to find a starting point that is accurately on the curve or at least as accurately as possible. You could use knights or other chess figures or even fictive chess figures..

Now we have a starting point (x_0, y_0) which lies ideally exactly on the curve. We have a deviation from the curve, which is f(x_0, y_0)=d_0. So we have f(x_n, y_n)=d_n. Than we move to x_{n+1}=x_n + s and y_{n+1}=y_n+t with s, t = \{-1, 0, 1\}. Often only two or three combinations of (s, t) need to be considered. When calculating d_{n+1} from d_n for the different variants, it shows that for calculating d_{n+1}-d_n the difference becomes a polynomial with lower degree, because the highest terms cancel out. So drawing a line between two points or a circle with a given radius around a given point or an ellipse or a parabola or a hyperbola can be drawn without any multiplications… And powers of n-th powers of x can always be calculated with additions and subtractions only from the previous x-values, by using successive differences:
d_{m,1}=(x-m)^n-(x-m-1)^n
d_{m,l+1}=d_{m+1,l}-d_{m,l}
These become constant for l=n, just as the lth derivatives, so by using this triangle, successive powers can be calculated with some preparational work using just additions.
It was quite natural to program these in assembly language, even in 8-bit assembly languages that are primitive by today’s standards. And it was possible to draw such figures reasonably fast with only one MHz (yes, not GHz).

We don’t need this stuff any more. Usually the graphics card is much better than anything we can with reasonable effort program. Usually the performance is sufficient when we just program in high level languages and use standard libraries.

But occasionally situations occur where we need to think about how to get the performance we need:
Make it work,
make it right,
make it fast,
but don’t stop after the first of those.

It is important that we choose our steps wisely and use adequate methods to solve our problem. Please understand this article as a fun issue about how we could write software some decades ago, but also as an inspiration to actually look into bits and bytes when it is really helping to get the necessary performance without defeating the maintainability of the software.

Share Button

2019 — Happy New Year

Gott nytt år! — Godt nytt år! — Felice anno nuovo! — Καλή Χρονια! — Щасливого нового року! — Срећна нова година! — С новым годом! — Feliĉan novan jaron! — Bonne année! — FELIX SIT ANNUS NOVUS — Gullukkig niuw jaar! — Un an nou fericit! — Frohes neues Jahr! — Happy new year! — ¡Feliz año nuevo! — Onnellista uutta vuotta! — عام سعيد

This was created by a Java-program:

import java.util.Random;
import java.util.List;
import java.util.Arrays;
import java.util.Collections;

public class HappyNewYear {

    public static void main(String[] args) {
        List list = Arrays.asList("Frohes neues Jahr!",
                                          "Happy new year!",
                                          "Gott nytt år!", 
                                          "¡Feliz año nuevo!",
                                          "Bonne année!", 
                                          "FELIX SIT ANNUS NOVUS", 
                                          "С новым годом!",
                                          "عام سعيد",
                                          "Felice anno nuovo!",
                                          "Godt nytt år!", 
                                          "Gullukkig niuw jaar!", 
                                          "Feliĉan novan jaron!",
                                          "Onnellista uutta vuotta!",
                                          "Срећна нова година!",
                                          "Un an nou fericit!",
                                          "Щасливого нового року!", 
                                          "Καλή Χρονια!");
        Collections.shuffle(list);
        System.out.println(String.join(" — ", list));
    }
}

Share Button

Christmas 2018


Feliĉan Kristnaskon! — Frohe Weihnachten! — God Jul! — Merry Christmas! — Joyeux Noël! — クリスマスおめでとう ; メリークリスマス — Срећан Божић! — Buon Natale! — Hyvää Joulua! — З Рiздвом Христовим! — ميلاد مجيد — С Рождеством! — Crăciun fericit! — ¡Feliz Navidad! — καλά Χριστούγεννα! — Natale hilare! — God Jul! — Prettige Kerstdagen!

As I said, I am learning some Python, so let’s use it. I created the message above with this program:

#!/usr/bin/python3
import random
arr = [
    "Frohe Weihnachten!",
    "Merry Christmas!",
    "God Jul!",
    "¡Feliz Navidad!",
    "Joyeux Noël!",
    "Natale hilare!",
    "С Рождеством!",
    "ميلاد مجيد",
    "Buon Natale!",
    "God Jul!",
    "Prettige Kerstdagen!",
    "Feliĉan Kristnaskon!",
    "Hyvää Joulua!",
    "クリスマスおめでとう ; メリークリスマス",
    "Срећан Божић!",
    "Crăciun fericit!",
    "З Рiздвом Христовим!",
    "καλά Χριστούγεννα!"
]
random.shuffle(arr)
print(" — ".join(arr))
print("\n")

Share Button

Logging

Deutsch

Software often contains a logging functionality. Usually entries one or sometimes multiple lines are appended to a file, written to syslog or to stdout, from where they are redirected into a file. They are telling us something about what the software is doing. Usually we can ignore all of it, but as soon as something with „ERROR“ or worse and more visible stack traces can be found, we should investigate this. Unfortunately software is often not so good, which can be due to libraries, frameworks or our own code. Then stack traces and errors are so common that it is hard to look into or to find the ones that are really worth looking into. Or there is simply no complete process in place to watch the log files. Sometimes the error shows up much later than it actually occurred and stack traces do not really lead us to the right spot. More often than we think logging actually introduces runtime errors, that were otherwise not present. This is related to a more general concept, which is called observer effect, where logging actually changes the business logic.

It is nice that log files keep to some format. Usually they start with a time stamp in ISO-format, often to the millisecond. Please add trailing zeros to always have 3 digits after the decimal point in this case. It is preferable to use UTC, but people tend to stick to local date and time zones, including the issues that come with switching to and from daylight saving time. Usually we have several processes or threads that run simultaneously. This can result in a wild mix of logging entries. As long as even multiline entries stay together and as long as beginning and end of one multiline entry can easily be recognized, this can be dealt with. Tools like splunk or simple Perl, Ruby or Python scripts can help us to follow threads separately. We could actually have separate logs for each thread in the first place, but this is not a common practice and it might hit OS-limitations on the number of open files, if we have many threads or even thousands of actors as in Erlang or Akka. Keeping log entries together can be achieved by using an atomic write, like the write system call in Linux and other Posix systems. Another way is to queue the log entries and to have a logger thread that processes the queue.

Overall this area has become very complex and hard to tame. In the Java world there used to be log4j with a configuration file that was a simple properties file, at least in the earlier version. This was so good that other languages copied it and created some log4X. Later the config file was replaced by XML and more logging frame works were added. Of course quite a lot of them just for the purpose of abstracting from the large zoo of logging frameworks and providing a unique interface for all of them. So the result was, that there was one more to deal with.

It is a good question, how much logic for handling of log files do we really want to see in our software. Does the software have to know, into which file it should log or how to do log rotation? If a configuration determines this, but the configuration is compiled into the jar file, it does have to know… We can keep our code a bit cleaner by relying on program functionality without code, but this still keeps it as part of the software.

Log files have to please the system administrator or whoever replaced them in a pure devops shop. And in the end developers will have to be able to work with the information provided by the logs to find issues in the code or to explain what is happening, if the system administrator cannot resolve an issue by himself. Should this system administrator have to deal with a different special complex setup for the logging for each software he is running? Or should it be necessary to call for developer support to get a new version of the software with just another log setting, because the configurations are hard coded in the deployment artifacts? Interesting is also, what happens when we use PAAS, where we have application server, database etc., but the software can easily move to another server, which might result in losing the logs. Moving logs to another server or logging across the network is expensive, maybe more expensive than the rest of this infrastructure.

Is it maybe a good idea to just log to stdout, maintaining a decent format and to run the software in such a way that stdout is piped into a log manager? This can be the same for all software and there is one way to configure it. The same means not only the same for all the java programs, but actually the same for all programs in all languages that comply to a minimal standard. This could be achieved using named pipes in conjunction with any hard coded log file that the software wants to use. But this is a dangerous path unless we really know what the software is doing with its log files. Just think of what weird errors might happen if the software tries to apply log rotation to the named pipe by renaming, deleting, creating new files and so on. A common trick to stop software from logging into a place where we do not want this is to create a directory with the name of the file that the software usually uses and to write protect this directory and its parent directory for the software. Please find out how to do it in detail, depending on your environment.

What about software, that is a filter by itself, so its main functionality is to actually write useful data to stdout? Usually smaller programs and scripts work like this. Often they do not need to log and often they are well tested relyable parts of our software installation. Where are the log files of cp, ls, rm, mv, grep, sort, cat, less,…? Yes, they do tend to write to stderr, if real errors occur. Where needed, programs can turn on logging with a log file provided on the command line, which is also a quite operations friendly approach. Named pipes can help here.

And we had a good logging framework in place for many years. It was called syslog and it is still around, at least on Linux.

A last thought: We spend really a lot of effort to get well performing software, using multiple processes, threads or even clusters. And then we forget about the fact that logging might become the bottle neck.

Share Button

Carry Bit, Overflow Bit and Signed Integers

It has already been explained how the Carry Bit works for addition. Now there was interest in a comment about how it would work for negative numbers.

The point is, that the calculation of the carry bit does not have any dependency on the sign. The nature of the carry bit is that it is meant to be used for the less significant parts of the addition. So assuming we add two numbers x and y that are having k and l words, respectively. We assume that n=\max(k,l) and make sure that x and y are both n words long by just providing the necessary number of 0-words in the most significant positions. Now the addition is performed as described by starting with a carry bit of 0 and adding with carry x[0]+y[0], then x[1]+y[1] and so on up to x[n-1]+y[n-1], assuming that x[0] is the least significant word and x[n-1] the most significant word, respectively. Each addition includes the carry bit from the previous addition. Up to this point, it does not make any difference, if the numbers are signed or not.

Now for the last addition, we need to consider the question, if our result still fits in n words or if we need one more word. In the case of unsigned numbers we just look at the last carry bit. If it is 1, we just add one more word in the most significant position with the value of 1, otherwise we are already done with n words.

In case of signed integers, we should investigate what can possibly happen. The input for the last step is two signed words and possibly a carry bit from the previous addition. Assuming we have m-Bit-words, we are adding numbers between -2^{m-1} and 2^{m-1}-1 plus an optional carry bit c. If the numbers have different signs, actually an overflow cannot occur and we can be sure that the final result fits in at most n words.

If both are not-negative, the most significant bits of x[n-1] and y[n-1] are both 0. An overflow is happening, if and only if the sum x[n-1]+y[n-1]+c \ge 2^{n-1}, which means that the result „looks negative“, although both summands were not-negative. In this case another word with value 0 has to be provided for the most significant position n to express that the result is \ge 0 while maintaining its already correctly calculated result. It cannot happen that real non-zero bits are going into this new most significant word. Consequently the carry bit can never become 1 in this last addition step.

If both are negative, the most significant bits of x[n-1] and y[n-1] are both 1. An overflow is happening, if and only if the sum x[n-1]+y[n-1]+c \lt 2^{n-1}, which means that the result „looks positive or 0“, although both summands were negative. In this case another word with value 2^n-1 or -1, depending on the viewpoint, has to be prepended as new most significant word. In this case of two negative summands the carry bit is always 1.

Now typical microprocessors provide an overflow flag (called „O“ or more often „V“) to deal with this. So the final addition can be left as it is in n words, if the overflow bit is 0. If it is 1, we have to signal an overflow or we can just provided one more word. Depending on the carry flag it is 0 for C=0 or all bits 1 (2^n-1 or -1, depending on the view point) for C=1.

The overflow flag can be calculated by o := \mathrm{signbit}(x) = \mathrm{signbit}(y) \land \mathrm{signbit}(x+y\mod 2^n) \ne \mathrm{signbit}(x).
There are other ways, but they lead to the same results with approximately the same or more effort.

The following table shows the possible combinations and examples for 8-Bit arithmetic and n=1:

x<0 or x≥0y<0 or y≥ 0(x+y)%2^8 < 0 or ≥ 0Overflow BitCarry Bitadditional word neededvalue additional wordExamples (8bit)
x≥0y≥0≥000no-0+0
63+64
x≥0y≥0<010yes064+64
127+127
x≥0y<0≥000 or 1no-65+(-1)
127+(-127)
x≥0y<0<000 or 1no-7+(-8)
127+(-128)
0+(-128)
x<0y≥0≥000 or 1no--9 + 12
-1 + 127
-127+127
x<0y≥0<000 or 1no--128+127
-128+0
-1 + 0
x<0y<0≥011yes-1-64 + (-65)
-128+(-128)
x<0y<0<001no--1 + (-1)
-1 + (-127)
-64 + (-64)

If you like, you can try out examples that include the carry bit and see that the concepts still work out as described.

Links

Share Button

Christmas — Weihnachten — Рождество 2017

Bon nadal! — Priecîgus Ziemassvçtkus — З Рiздвом Христовим — Buon Natale — Bella Festas daz Nadal! — С Рождеством — Срећан Божић — καλά Χριστούγεννα — God Jul! — Feliĉan Kristnaskon — ميلاد مجيد — Feliz Navidad — Glædelig Jul — Fröhliche Weihnachten — Joyeux Noël — Hyvää Joulua! — クリスマスおめでとう ; メリークリスマス — Merry Christmas — Natale hilare — God Jul — Crăciun fericit — Prettige Kerstdagen — Nollaig Shona Dhuit!

xmas tree

Christmas Tree in Olten 2017

This message has been generated by a program again, this time I decided to use Scala:

import scala.util.Random
object XmasGreeting {
  val texts : List[String] = List( "Fröhliche Weihnachten",
    "Merry Christmas", "God Jul", "Feliz Navidad", "Joyeux Noël",
    "Natale hilare", "С Рождеством", "ميلاد مجيد", "Buon Natale", "God Jul!",
    "Prettige Kerstdagen", "Feliĉan Kristnaskon", "Hyvää Joulua!",
    "Glædelig Jul", "クリスマスおめでとう ; メリークリスマス",
    "Nollaig Shona Dhuit!", "Bon nadal!", "Срећан Божић",
    "Priecîgus Ziemassvçtkus", "Crăciun fericit", "Bella Festas daz Nadal!",
    "З Рiздвом Христовим", "καλά Χριστούγεννα")
  val shuffledTexts : List[String] = Random.shuffle(texts)
  def main(args: Array[String]) : Unit = {
    println(shuffledTexts.mkString(" — "))
  }
}

Share Button

Functional Programming: Article in another Blog

I wrote a Guest Blog Article about Functional Programming on Adesso’s Blog in German.

Share Button

XML

In the late 1990es there was a real hype about XML. Tons of standards evolved and it was a big deal to acquire sound knowledge of it.

It has been some success, because it is still around and very common almost 20 years later.

I would say that the idea of having a human readable and editable text format has mostly failed. Trivial XML can be edited manually without too much of a risk of breaking it, but then again simpler formats like JSON or even java-properties-Files or something along these lines would be sufficient and easier to deal with, unless it is the 1001st slightly different format that needs to be learned again. XML is different each time anyway, because it depends on the schema, so we have the problem on that side, but of course the general idea is well known.

For complex XML manual reading and editing becomes a nightmare, it is just so much harder to read for humans than any reasonably common programming languages of our time. It is text, but so involved that it feels like half binary. And who knows, maybe we can also edit binary files with a hex-editor. And real magicians, actually people with too much time in this case, can do so and keep the binary file correct and uncorrupted, at least for some binary formats. And they can do so in XML as well… But it is actually better to have a tool or a script to create and change non-trivial XML-configuration files.

Where XML is strong is for data exchange between systems. This is mostly transfer in space between different systems, but it can also be transfer in time, that is for storing information to be retrieved later. It gives a format that allows for some „type safety“, that is very versatile and that provides a lot of tool and script support around it. Even here we have to acknowledge that there are some drawbacks. Maintaining a XML interface involves some work for the schema files, adopting the software on the human side. It requires some CPU-overhead on the sending and mostly on the receiving side for creating and parsing XML. The libraries have been optimized but still they take a little bit of time. And then on the network size we transmit a multiple of the amount of data, if it is densely packed with tags.

But it is a format that is well understood, that works on pretty much any platform, over the network and also usually allows us to support different versions of the same interface simultaneously. For debugging it is good to have a format that is at least human readable, even if not very pleasant. Ideally the schema is defined in a way that is self documenting.

I wonder why approaches like in WML have not become more common. WML had a customized compressed format that was more friendly to low bandwidth cell phones.

XML is good for many purposes, but as always it is good to know other tools, like JSON and to decide when it is a case for XML and when not.

Some positive side effects of XML are that it helped some other standards to become more mainstream. UTF-8 was from the beginning the default encoding for XML and this is now a common standard encoding for any text. And with XML-schema it became common to encode dates within XML in the ISO-format, which helped this format in becoming generally known and commonly used for cases where one date format should work independently of the origin of the reader.

Share Button