How to draw lines, circles and other curves

These ideas were developed more than 30 years without knowing that they were already known at that time…

Today the graphics cards can easily do things like this in very little time. And today’s CPUs are of course really good at multiplying. So this has lost a lot of its immediate relevance, but it is a fun topic and why not have some fun…

Let us assume we have a two dimensional coordinate system and a visible area that goes from x_{\min} to x_{\max} and y_{\min} to y_{\max}. Coordinates are discrete.

In this world we can easily measure an angle against a (directed) line parallel to the x-axis, for example up to an accuracy of 45^\circ=\frac{\pi}{4}:

  • y=0 \wedge x > 0 \implies \alpha = 0 (= 0^\circ)
  • 0 < y < x \implies 0 < \alpha < \frac{\pi}{4}(=45^\circ)
  • 0 < y = x \implies \alpha = \frac{\pi}{4}
  • 0 < x < y \implies \frac{\pi}{4} < \alpha < \frac{\pi}{2}(=90^\circ)</li> <li>x = 0 \land y > 0\implies \alpha = \frac{\pi}{2}</li> <li>x < 0 \land y > 0 \land |x| < |y|\implies \frac{\pi}{2}< \alpha < \frac{3\pi}{4}(=135^\circ)
  • x < 0 \land y > 0 \land -x = y\implies \alpha = \frac{3\pi}{4}(=135^\circ)

So let us assume we have a curve that is described by a polynomial function in two variables x and y, like this:

    \[f(x, y) = \sum_{j=0}^m\sum_{k=0}^n a_{j,k}x^jy^k = 0\]

We have to apply some math to understand that the curve behaves nicely in the sense that it does not behave to chaotic in scales that are below our accuracy, that it is connected etc. We might possibly scale and move it a bit by substituting something like c_1u+c_2 for x and c_3v+c_4 for y.

For example we may think of

  • line: f(x,y)=ax+by+c
  • circle: f(x, y)=x^2+y^2-r^2
  • eclipse: f(x, y)=\frac{x^2}{a^2}+\frac{y^2}{b^2}-1

We can assume our drawing is done with something like a king of chess. We need to find a starting point that is accurately on the curve or at least as accurately as possible. You could use knights or other chess figures or even fictive chess figures..

Now we have a starting point (x_0, y_0) which lies ideally exactly on the curve. We have a deviation from the curve, which is f(x_0, y_0)=d_0. So we have f(x_n, y_n)=d_n. Than we move to x_{n+1}=x_n + s and y_{n+1}=y_n+t with s, t = \{-1, 0, 1\}. Often only two or three combinations of (s, t) need to be considered. When calculating d_{n+1} from d_n for the different variants, it shows that for calculating d_{n+1}-d_n the difference becomes a polynomial with lower degree, because the highest terms cancel out. So drawing a line between two points or a circle with a given radius around a given point or an ellipse or a parabola or a hyperbola can be drawn without any multiplications… And powers of n-th powers of x can always be calculated with additions and subtractions only from the previous x-values, by using successive differences:
These become constant for l=n, just as the lth derivatives, so by using this triangle, successive powers can be calculated with some preparational work using just additions.
It was quite natural to program these in assembly language, even in 8-bit assembly languages that are primitive by today’s standards. And it was possible to draw such figures reasonably fast with only one MHz (yes, not GHz).

We don’t need this stuff any more. Usually the graphics card is much better than anything we can with reasonable effort program. Usually the performance is sufficient when we just program in high level languages and use standard libraries.

But occasionally situations occur where we need to think about how to get the performance we need:
Make it work,
make it right,
make it fast,
but don’t stop after the first of those.

It is important that we choose our steps wisely and use adequate methods to solve our problem. Please understand this article as a fun issue about how we could write software some decades ago, but also as an inspiration to actually look into bits and bytes when it is really helping to get the necessary performance without defeating the maintainability of the software.

Share Button

Carry Bit, Overflow Bit and Signed Integers

It has already been explained how the Carry Bit works for addition. Now there was interest in a comment about how it would work for negative numbers.

The point is, that the calculation of the carry bit does not have any dependency on the sign. The nature of the carry bit is that it is meant to be used for the less significant parts of the addition. So assuming we add two numbers x and y that are having k and l words, respectively. We assume that n=\max(k,l) and make sure that x and y are both n words long by just providing the necessary number of 0-words in the most significant positions. Now the addition is performed as described by starting with a carry bit of 0 and adding with carry x[0]+y[0], then x[1]+y[1] and so on up to x[n-1]+y[n-1], assuming that x[0] is the least significant word and x[n-1] the most significant word, respectively. Each addition includes the carry bit from the previous addition. Up to this point, it does not make any difference, if the numbers are signed or not.

Now for the last addition, we need to consider the question, if our result still fits in n words or if we need one more word. In the case of unsigned numbers we just look at the last carry bit. If it is 1, we just add one more word in the most significant position with the value of 1, otherwise we are already done with n words.

In case of signed integers, we should investigate what can possibly happen. The input for the last step is two signed words and possibly a carry bit from the previous addition. Assuming we have m-Bit-words, we are adding numbers between -2^{m-1} and 2^{m-1}-1 plus an optional carry bit c. If the numbers have different signs, actually an overflow cannot occur and we can be sure that the final result fits in at most n words.

If both are not-negative, the most significant bits of x[n-1] and y[n-1] are both 0. An overflow is happening, if and only if the sum x[n-1]+y[n-1]+c \ge 2^{n-1}, which means that the result „looks negative“, although both summands were not-negative. In this case another word with value 0 has to be provided for the most significant position n to express that the result is \ge 0 while maintaining its already correctly calculated result. It cannot happen that real non-zero bits are going into this new most significant word. Consequently the carry bit can never become 1 in this last addition step.

If both are negative, the most significant bits of x[n-1] and y[n-1] are both 1. An overflow is happening, if and only if the sum x[n-1]+y[n-1]+c \lt 2^{n-1}, which means that the result „looks positive or 0“, although both summands were negative. In this case another word with value 2^n-1 or -1, depending on the viewpoint, has to be prepended as new most significant word. In this case of two negative summands the carry bit is always 1.

Now typical microprocessors provide an overflow flag (called „O“ or more often „V“) to deal with this. So the final addition can be left as it is in n words, if the overflow bit is 0. If it is 1, we have to signal an overflow or we can just provided one more word. Depending on the carry flag it is 0 for C=0 or all bits 1 (2^n-1 or -1, depending on the view point) for C=1.

The overflow flag can be calculated by o=\mathrm{signbit}(x) = \mathrm{signbit}(y) \land \mathrm{signbit}(x+y\mod 2^n) \ne \mathrm{signbit}(x).
There are other ways, but they lead to the same results with approximately the same or more effort.

The following table shows the possible combinations and examples for 8-Bit arithmetic and n=1:

x<0 or x≥0y<0 or y≥ 0(x+y)%2^8 < 0 or ≥ 0Overflow BitCarry Bitadditional word neededvalue additional wordExamples (8bit)
x≥0y<0≥000 or 1no-65+(-1)
x≥0y<0<000 or 1no-7+(-8)
x<0y≥0≥000 or 1no--9 + 12
-1 + 127
x<0y≥0<000 or 1no--128+127
-1 + 0
x<0y<0≥011yes-1-64 + (-65)
x<0y<0<001no--1 + (-1)
-1 + (-127)
-64 + (-64)

If you like, you can try out examples that include the carry bit and see that the concepts still work out as described.


Share Button

How to multiply long numbers

Assuming we have a multiplication of n-bit-numbers already available.
Other than typical compiled languages, which assume that the multiplication result fits into the same size as the two factors, we should take a closer look.
Unsigned n-bit-numbers as factors x and y fullfill

    \[0 \le x \le 2^n-1\]

    \[0 \le y \le 2^n-1\]

So we have for the product z=x*y

    \[0 \le z \le (2^n-1)^2 \le 2^{2n}-1\]

So we should be prepared for a 2n-bit long result. Assembly language provides for this. In Java we are kind of running out of luck, because we have the 64-bit-long as the largest possible value, so we cannot make use of the 64-bit capabilities of our CPU by getting a 128-bit-result. Even worse we do not have unsigned primitive types. For addition and subtraction it is possible to cheat by assuming that the longs are unsigned, because the addition for signed and unsigned numbers is almost the same, but for multiplication this is slightly more complex.

In C we do have better chances, but we need to go compiler and OS-specific to some extent. gcc on 64-bit platforms has a type __uint128_t and we can hope that something like this

__uint128_t mul(unsigned long long x, unsigned long long y) {
__uint128_t xx = (__uint128_t) x;
__uint128_t yy = (__uint128_t) y;
__uint128_t z = xx*yy;
return z;

is optimized by the compiler to one assembly language instruction that multiplies two unsigned 64-bit-integers into one unsigned 128-bit integer.

Multiplication of longer numbers is achieved by just assuming

    \[ x= \sum_{j=0}^{m-1}2^{nj}x_j, y= \sum_{j=0}^{m-1}2^{nj}y_j\]


    \[ \bigwedge_{j=0}^{m-1} 0\le x_j, y_j <2^n.\]

Now the product is

    \[ z = \sum_{j=0}^{m-1}\sum_{k=0}^{m-1}2^{n(j+k)}x_j y_k \]

where each x_j y_k summand needs to be split into a lower and an upper part such that

    \[x_j y_k = 2^n h(x_j y_k) + l(x_j y_k).\]

All these half products need to be added with propagating the carry bits to the next higher level.
This can be sorted out and done with a lot of additions with carry-bit, finally giving the desired result.

For reasonably short numbers this is a good approach, if it is done well, but it can be replaced by better algorithms for larger numbers.

The most obvious next step is to move to the Karatsuba algorithm.
Assuming we have two factors x=x_l + 2^m x_h and y = y_l + 2^m y_h the naïve approach of having to calculate the four products
x_h y_h, x_h y_l, x_l y_h, x_l y_l can be replaced by calculation of only three products x_h y_h, (x_l + x_h)(y_l+ y_h), x_l y_l, because what is actually needed is the sum x_h y_l+x_l y_h which can be obtained by

    \[(x_l + x_h)(y_l+ y_h)-x_l y_l-x_h y_h=x_h y_l+x_l y_h.\]

That can be iterated by subdividing the numbers subsequently into halves until a size is reached that can be handled well enough by the naïve approach.
An issue that needs to be observed is that x_l + x_h and y_l+ y_h need one bit more than their summands.

For even larger numbers the so called Toom-Cook algorithm, an extension of the idea of the Karatsuba-algorithm for more than two summands, can be useful and for numbers larger than that the Schönhage-Strassen multiplication proves to be faster. It is implemented in some libraries for long integer arithmetic, like GMP. Since around 2007 this can be improved by using the Fürer algorithm which is assymptotically faster, but I do not know of any useful implementations and for practical purposes the improvement does not seem to be very relevant.

Share Button

How to recover the Carry Bit

As frequent readers might have observed, I like the concept of the Carry Bit as it allows for efficient implementations of long integer arithmetic, which I would like to use as default integer type for most application development. And unfortunately such facilities are not available in high level languages like C and Java. But it is possible to recover the carry bit from what is available in C or Java, with some extra cost of performance, but maybe neglectable, if the compiler does a good optimization on this. We might assume gcc on a 64-Bit-Linux. It should be possible to do similar things on other platforms.

So we add two unsigned 64-bit integers x and y to a result

    \[z\equiv x+y \mod 2^{64}\]


    \[0 \le z < 2^{64}\]

using the typical „long long“ of C. We assume that



    \[x_h \in \{0,1\}\]


    \[0 \le x_l < 2^{63}\]

. In the same way we assume y=2^{63}y_h + y_l and z=2^{63}z_h + z_l with the same kind of conditions for x_h, y_h, z_h or x_l, y_l, z_l, respectively.

Now we have

    \[0 \le x_l+y_l \le 2^{64}-2\]

and we can see that

    \[x_l + y_l = 2^{63}u + z_l\]

for some

    \[u\in \{0.1\}\]

And we have

    \[x+y = 2^{64}c+z\]



is the carry bit.
When looking just at the highest visible bit and the carry bit, this boils down to

    \[2c+z_h = x_h + y_h + u\]

This leaves us with eight cases to observe for the combination of x_h, y_h and u:


Or we can check all eight cases and find that we always have

    \[c = x_h \wedge\neg z_h \vee y_h \wedge\neg z_h \vee x_h \wedge y_h \wedge z_h\]


    \[c = (x_h \vee y_h) \wedge\neg z_h \vee x_h \wedge y_h \wedge z_h.\]

So the result does not depend on u anymore, allowing to calculate it by temporarily casting x, y and z to (signed long long) and using their sign.
We can express this as „use x_h \wedge y_h if z_h=1 and use x_h \vee y_h if z_h = 0„.

An incoming carry bit d does not change this, it just allows for x_l + y_l + d < 2^{64}, which is sufficient for making the previous calculations work.

In a similar way subtraction can be dealt with.

The basic operations add, adc, sub, sbb, mul, xdiv (div is not available) have been implemented in this library for C. Feel free to use it according to the license (GPL). Addition and subtraction could be implemented in a similar way in Java, with the weirdness of declaring signed longs and using them as unsigned. For multiplication and division, native code would be needed, because Java lacks 128bit-integers. So the C-implementation is cleaner.

Share Button

Shift- and Rotation Functions


In a comment to Carry Bit: How does it work the question about shift and rotation functions has been asked. Which exist and how they work exactly depends off course on the CPU architecture, but I will try to give a high level overview anyway.

The following aspects have to be considered:

  • 8, 16, 32, 64,… Bit: How many Bits are affected by the operation?
  • Shift vs. Rotate: Shift moves bits to the left or right, filling in 0 or 1 in the positions that get freed by this. The bits that get shifted out usually go to the carry bit, so the last one of these wins, if the shift is done by more than one bit. Rotation means what some English knowledge suggests: The bits shifted out are moved in on the other side.
  • ROL and ROR vs RCL and RCR: Rotation through Carry (RCL or RCR) means that the carry bit participates as bit 9., 17., 33. or 65. ROL and ROR leave the carry bit alone, at least do not use it as input.
  • Logical vs. Arithmetic Shift: This deals with the „sign“. For arithmetic Shift the number is considered to be signed in two’s complement. Therefore right shift does not necessarily introduce a 0 as new most significant bit, but depending on the sign introduces a 0 or a 1.

Current CPUs have barrel shifts or something like that built into the hardware, so shift and rotate operations by more than one bit position are much faster than sequences of single shifts. This technology has been around on better CPUs for decades and is not at all new.

How the commands are called exactly and possibly some details about there operations depend on the CPU architecture.

Examples (8 bit) for shifting and rotating by one bit position:

Vorher (Carrybit in Klammern)ROL
(rotate left)
(rotate left through carry)
(rotate right)
(rotate right through carry)
(arithmetic/logical shift left)
(logical shift right)
(arithmetic shift right)
00000000 (0)00000000 (0)00000000 (0)00000000 (0)00000000 (0)00000000 (0)00000000 (0)00000000 (0)
00000000 (1)00000000 (1)00000001 (0)00000000 (1)10000000 (0)00000000 (0)00000000 (0)00000000 (0)
00000001 (0)00000010 (0)00000010 (0)10000000 (0)00000000 (1)00000010 (0)00000000 (1)00000000 (1)
00000001 (1)00000010 (1)00000011 (0)10000000 (1)10000000 (1)00000010 (0)00000000 (1)00000000 (1)
10000000 (0)00000001 (0)00000000 (1)01000000 (0)01000000 (0)00000000 (1)01000000 (0)11000000 (0)
10000000 (1)00000001 (1)00000001 (1)00000001 (1)11000000 (0)00000000 (1)01000000 (0)11000000 (0)
11111111 (0)11111111 (0)11111110 (1)11111111 (0)01111111 (1)11111110 (1)01111111 (1)11111111 (1)
11111111 (1)11111111 (1)11111111 (1)11111111 (1)11111111 (1)11111110 (1)01111111 (1)11111111 (1)

These shift and rotate operations are needed to express multiplications by powers of two and division by powers of two. In C, C++, C#, Perl, Java and some other porgramming languages these operations are included and written as „<<" (for ASL or SHL), ">>“ (for SHR) and „>>>“ for ASR.


Share Button

Carry Bit: How does it work?


Most of us know from elementary school how to add multi-digit numbers on paper. Usage of the carry bit is the same concept, but not for base 10, not even for base 2, but for base 256 (in the old 8-bit-days), base 65536 (in the almost as old 16-bit-days), base 4294967296 (32 bit) or base 18446744073709551616 (64 bit), whatever is the word width of the CPU. Always using powers of two is common today, but it is quite possible that this will change from bits to trits (having three possible values, -1, 0 and 1) in the far future.

I do not think that application development should be dealing with low level stuff like bits and bytes, but currently common programming languages like Java, C, C++, C# and more would not let you get away with that, you have to be aware of the bits underlying their numeric types to some extent. So it is a good idea to spend some effort on understanding this. Unfortunately all of these languages are lacking the carry bit, but it is anyway useful to understand the concept.

I have been writing software since the beginning of the 80es. The computers available to me at that time where 8-bit with a 6502- or 6510-CPU and 1 MHz clock speed. Yes, it was 1 MHz, not 1 GHz. It was possible to program them in some BASIC-dialect, but that was quite useless for many purposes because it was simply too slow. Compiled languages existed, but were too clumsy and too big to be handled properly on those computers, at least the ones that I have seen. So assembly language was the way to go. In later years I have also learned to use the 680×0 assembly language and the 80×86 assembly language, but after the mid 90es that has not happened any more. An 8-bit CPU can add two 8-bit numbers and yield an 8-bit result. For this two variants need to be distinguished, namely signed and unsigned integers. For signed numbers it is common to use 2’s complement. That means that the highest bit encodes the sign. So all numbers from 0 to 127 are positive integers as expected. 127 has the 8-bit representation 01111111. Now it would be tempting to assume that 10000000 stands for the next number, which would be +128, but it does not. Having the highest bit 1 makes this a negative number, so this is -128. Those who are familiar with modular arithmetic should find this easily understandable, it is just a matter of choosing the representatives for the residue classes. But this should not disturb you, if you have no recent experience with modular arithmetic, just accept the fact that 10000000 stands for -128. Further increments of this number make it less negative, so 10000001 stands for -127 and 11111111 for -1. For unsigned numbers, the 8 bits are used to express any number from 0 to 255.

For introducing the carry bit let us start with unsigned integral numbers. The possible values of a word are 0 to 2^n-1 where n is the word width in bits, which would be 8 in our example. Current CPUs have off course 64-bit word width, but that does not change the principle, so we stick with 8-bit to make it more readable. Just use your imagination for getting this to 32, 64, 96 or 128 bits.

So now the bit sequence 11111111 stands for 255. Using an assembly language command that is often called ADD or something similar, it is possible to add two such numbers. This addition can typically be performed by the CPU within one or two clock cycles. The sum of two 8-bit numbers is in the range from 0 through 510 (111111110 in binary), which is a little bit too much for one byte. One bit more would be sufficient to express this result. The workaround is to accept the lower 8 bits as the result, but to retain the upper ninth bit, which can be 0 or 1, in the so called carry bit or carry flag. It is possible to query it and use a different program flow depending on it, for example for handling overflows, in case successive operation cannot handle more than 8 bit. But there is also an elegant solution for adding numbers that are several bytes (or several machine words) long. From the second addition onwards a so called ADC („add with carry“) is used. The carry bit is included as third summand. This can create results from 0 to 511 (111111111 in binary). Again we are getting a carry bit. This can be continued until all bytes from both summands have been processed, just using 0 if one summand is shorter than the other one. If the carry bit is not 0, one more addition with both summand 0 and the carry bit has to be performed, yielding a result that is longer than the longer summand. This can off course also be achieved by just assuming 1, but this is really an implementation detail.

So it is possible to write a simple long integer addition in assembly language. One of the most painful design mistakes of current programming languages, especially of C is not providing convenient facilities to access the carry bit, so a lot of weird coding is used to work around this when writing a long integer arithmetic. Usually 64-bit arithemetic is used to do 32-bit calculations and the upper 32 bits are used for the carry bit. Actually, it is not that hard to recover the carry bit, but it is anyway a bit annoying.

Subtraction of long integers can be done in a quite similar way, using something like SBC („subtract with carry“) or SBB („subtract with borrow“), depending on how the carry bit is interpreted when subtracting.

For signed integer special care has to be taken for the highest bit of the highest word of each summand, which is the sign. Often a so called overflow but comes in handy, which allows to recognize if an additional machine word is needed for the result.

Within the CPU of current 64 bit hardware it could theoretically be possible to do the 64-bit addition internally bit-wise or byte-wise one step after the other. I do not really know the implementation details of ARM, Intel and AMD, but I assume that much more parallelism is used for performing such operation within one CPU cycle for all 64 bits. It is possible to use algorithms for long integer addition that make use of parallel computations and that can run much faster than what has been described here. They work for the bits and bytes within the CPU, but they can also be used for very long numbers when having a large number of CPUs, most typically in a SIMD fashion that is available on graphics devices misused for doing calculations. I might be willing to write about this, if interest is indicated by readers.

It is quite interesting to look how multiplication, division, square roots, cube roots and more are calculated (or approximated). I have a lot of experience with that so it would be possible to write about hat. In short these operations can be done quite easily on modern CPUs, because they have already quite sophisticated multiplication and division functions in the assembly language level, but I have off course been able to write such operations even for 8-bit CPUs lacking multiplication and division commands. Even that I could cover, but that would be more for nostalgic reasons. Again there are much better algorithms than the naïve ones for multiplication of very long integers.


Share Button