How to recover the Carry Bit

As frequent readers might have observed, I like the concept of the Carry Bit as it allows for efficient implementations of long integer arithmetic, which I would like to use as default integer type for most application development. And unfortunately such facilities are not available in high level languages like C and Java. But it is possible to recover the carry bit from what is available in C or Java, with some extra cost of performance, but maybe neglectable, if the compiler does a good optimization on this. We might assume gcc on a 64-Bit-Linux. It should be possible to do similar things on other platforms.

So we add two unsigned 64-bit integers x and y to a result

    \[z\equiv x+y \mod 2^{64}\]


    \[0 \le z < 2^{64}\]

using the typical „long long“ of C. We assume that



    \[x_h \in \{0,1\}\]


    \[0 \le x_l < 2^{63}\]

. In the same way we assume y=2^{63}y_h + y_l and z=2^{63}z_h + z_l with the same kind of conditions for x_h, y_h, z_h or x_l, y_l, z_l, respectively.

Now we have

    \[0 \le x_l+y_l \le 2^{64}-2\]

and we can see that

    \[x_l + y_l = 2^{63}u + z_l\]

for some

    \[u\in \{0.1\}\]

And we have

    \[x+y = 2^{64}c+z\]



is the carry bit.
When looking just at the highest visible bit and the carry bit, this boils down to

    \[2c+z_h = x_h + y_h + u\]

This leaves us with eight cases to observe for the combination of x_h, y_h and u:


Or we can check all eight cases and find that we always have

    \[c = x_h \wedge\neg z_h \vee y_h \wedge\neg z_h \vee x_h \wedge y_h \wedge z_h\]


    \[c = (x_h \vee y_h) \wedge\neg z_h \vee x_h \wedge y_h \wedge z_h.\]

So the result does not depend on u anymore, allowing to calculate it by temporarily casting x, y and z to (signed long long) and using their sign.
We can express this as „use x_h \wedge y_h if z_h=1 and use x_h \vee y_h if z_h = 0„.

An incoming carry bit d does not change this, it just allows for x_l + y_l + d < 2^{64}, which is sufficient for making the previous calculations work.

In a similar way subtraction can be dealt with.

The basic operations add, adc, sub, sbb, mul, xdiv (div is not available) have been implemented in this library for C. Feel free to use it according to the license (GPL). Addition and subtraction could be implemented in a similar way in Java, with the weirdness of declaring signed longs and using them as unsigned. For multiplication and division, native code would be needed, because Java lacks 128bit-integers. So the C-implementation is cleaner.

Share Button

Shift- and Rotation Functions


In a comment to Carry Bit: How does it work the question about shift and rotation functions has been asked. Which exist and how they work exactly depends off course on the CPU architecture, but I will try to give a high level overview anyway.

The following aspects have to be considered:

  • 8, 16, 32, 64,… Bit: How many Bits are affected by the operation?
  • Shift vs. Rotate: Shift moves bits to the left or right, filling in 0 or 1 in the positions that get freed by this. The bits that get shifted out usually go to the carry bit, so the last one of these wins, if the shift is done by more than one bit. Rotation means what some English knowledge suggests: The bits shifted out are moved in on the other side.
  • ROL and ROR vs RCL and RCR: Rotation through Carry (RCL or RCR) means that the carry bit participates as bit 9., 17., 33. or 65. ROL and ROR leave the carry bit alone, at least do not use it as input.
  • Logical vs. Arithmetic Shift: This deals with the „sign“. For arithmetic Shift the number is considered to be signed in two’s complement. Therefore right shift does not necessarily introduce a 0 as new most significant bit, but depending on the sign introduces a 0 or a 1.

Current CPUs have barrel shifts or something like that built into the hardware, so shift and rotate operations by more than one bit position are much faster than sequences of single shifts. This technology has been around on better CPUs for decades and is not at all new.

How the commands are called exactly and possibly some details about there operations depend on the CPU architecture.

Examples (8 bit) for shifting and rotating by one bit position:

Vorher (Carrybit in Klammern)ROL
(rotate left)
(rotate left through carry)
(rotate right)
(rotate right through carry)
(arithmetic/logical shift left)
(logical shift right)
(arithmetic shift right)
00000000 (0)00000000 (0)00000000 (0)00000000 (0)00000000 (0)00000000 (0)00000000 (0)00000000 (0)
00000000 (1)00000000 (1)00000001 (0)00000000 (1)10000000 (0)00000000 (0)00000000 (0)00000000 (0)
00000001 (0)00000010 (0)00000010 (0)10000000 (0)00000000 (1)00000010 (0)00000000 (1)00000000 (1)
00000001 (1)00000010 (1)00000011 (0)10000000 (1)10000000 (1)00000010 (0)00000000 (1)00000000 (1)
10000000 (0)00000001 (0)00000000 (1)01000000 (0)01000000 (0)00000000 (1)01000000 (0)11000000 (0)
10000000 (1)00000001 (1)00000001 (1)00000001 (1)11000000 (0)00000000 (1)01000000 (0)11000000 (0)
11111111 (0)11111111 (0)11111110 (1)11111111 (0)01111111 (1)11111110 (1)01111111 (1)11111111 (1)
11111111 (1)11111111 (1)11111111 (1)11111111 (1)11111111 (1)11111110 (1)01111111 (1)11111111 (1)

These shift and rotate operations are needed to express multiplications by powers of two and division by powers of two. In C, C++, C#, Perl, Java and some other porgramming languages these operations are included and written as „<<" (for ASL or SHL), ">>“ (for SHR) and „>>>“ for ASR.


Share Button

Carry Bit: How does it work?


Most of us know from elementary school how to add multi-digit numbers on paper. Usage of the carry bit is the same concept, but not for base 10, not even for base 2, but for base 256 (in the old 8-bit-days), base 65536 (in the almost as old 16-bit-days), base 4294967296 (32 bit) or base 18446744073709551616 (64 bit), whatever is the word width of the CPU. Always using powers of two is common today, but it is quite possible that this will change from bits to trits (having three possible values, -1, 0 and 1) in the far future.

I do not think that application development should be dealing with low level stuff like bits and bytes, but currently common programming languages like Java, C, C++, C# and more would not let you get away with that, you have to be aware of the bits underlying their numeric types to some extent. So it is a good idea to spend some effort on understanding this. Unfortunately all of these languages are lacking the carry bit, but it is anyway useful to understand the concept.

I have been writing software since the beginning of the 80es. The computers available to me at that time where 8-bit with a 6502- or 6510-CPU and 1 MHz clock speed. Yes, it was 1 MHz, not 1 GHz. It was possible to program them in some BASIC-dialect, but that was quite useless for many purposes because it was simply too slow. Compiled languages existed, but were too clumsy and too big to be handled properly on those computers, at least the ones that I have seen. So assembly language was the way to go. In later years I have also learned to use the 680×0 assembly language and the 80×86 assembly language, but after the mid 90es that has not happened any more. An 8-bit CPU can add two 8-bit numbers and yield an 8-bit result. For this two variants need to be distinguished, namely signed and unsigned integers. For signed numbers it is common to use 2’s complement. That means that the highest bit encodes the sign. So all numbers from 0 to 127 are positive integers as expected. 127 has the 8-bit representation 01111111. Now it would be tempting to assume that 10000000 stands for the next number, which would be +128, but it does not. Having the highest bit 1 makes this a negative number, so this is -128. Those who are familiar with modular arithmetic should find this easily understandable, it is just a matter of choosing the representatives for the residue classes. But this should not disturb you, if you have no recent experience with modular arithmetic, just accept the fact that 10000000 stands for -128. Further increments of this number make it less negative, so 10000001 stands for -127 and 11111111 for -1. For unsigned numbers, the 8 bits are used to express any number from 0 to 255.

For introducing the carry bit let us start with unsigned integral numbers. The possible values of a word are 0 to 2^n-1 where n is the word width in bits, which would be 8 in our example. Current CPUs have off course 64-bit word width, but that does not change the principle, so we stick with 8-bit to make it more readable. Just use your imagination for getting this to 32, 64, 96 or 128 bits.

So now the bit sequence 11111111 stands for 255. Using an assembly language command that is often called ADD or something similar, it is possible to add two such numbers. This addition can typically be performed by the CPU within one or two clock cycles. The sum of two 8-bit numbers is in the range from 0 through 510 (111111110 in binary), which is a little bit too much for one byte. One bit more would be sufficient to express this result. The workaround is to accept the lower 8 bits as the result, but to retain the upper ninth bit, which can be 0 or 1, in the so called carry bit or carry flag. It is possible to query it and use a different program flow depending on it, for example for handling overflows, in case successive operation cannot handle more than 8 bit. But there is also an elegant solution for adding numbers that are several bytes (or several machine words) long. From the second addition onwards a so called ADC („add with carry“) is used. The carry bit is included as third summand. This can create results from 0 to 511 (111111111 in binary). Again we are getting a carry bit. This can be continued until all bytes from both summands have been processed, just using 0 if one summand is shorter than the other one. If the carry bit is not 0, one more addition with both summand 0 and the carry bit has to be performed, yielding a result that is longer than the longer summand. This can off course also be achieved by just assuming 1, but this is really an implementation detail.

So it is possible to write a simple long integer addition in assembly language. One of the most painful design mistakes of current programming languages, especially of C is not providing convenient facilities to access the carry bit, so a lot of weird coding is used to work around this when writing a long integer arithmetic. Usually 64-bit arithemetic is used to do 32-bit calculations and the upper 32 bits are used for the carry bit. Actually, it is not that hard to recover the carry bit, but it is anyway a bit annoying.

Subtraction of long integers can be done in a quite similar way, using something like SBC („subtract with carry“) or SBB („subtract with borrow“), depending on how the carry bit is interpreted when subtracting.

For signed integer special care has to be taken for the highest bit of the highest word of each summand, which is the sign. Often a so called overflow but comes in handy, which allows to recognize if an additional machine word is needed for the result.

Within the CPU of current 64 bit hardware it could theoretically be possible to do the 64-bit addition internally bit-wise or byte-wise one step after the other. I do not really know the implementation details of ARM, Intel and AMD, but I assume that much more parallelism is used for performing such operation within one CPU cycle for all 64 bits. It is possible to use algorithms for long integer addition that make use of parallel computations and that can run much faster than what has been described here. They work for the bits and bytes within the CPU, but they can also be used for very long numbers when having a large number of CPUs, most typically in a SIMD fashion that is available on graphics devices misused for doing calculations. I might be willing to write about this, if interest is indicated by readers.

It is quite interesting to look how multiplication, division, square roots, cube roots and more are calculated (or approximated). I have a lot of experience with that so it would be possible to write about hat. In short these operations can be done quite easily on modern CPUs, because they have already quite sophisticated multiplication and division functions in the assembly language level, but I have off course been able to write such operations even for 8-bit CPUs lacking multiplication and division commands. Even that I could cover, but that would be more for nostalgic reasons. Again there are much better algorithms than the naïve ones for multiplication of very long integers.

Share Button