Borrow and Carry Bit for Subtraction

Similar to the usage of the carry bit when adding there are mechanisms for subtracting that allow to integrate the result of subtraction of the lower bits into the subtraction of the next higher block of bits, where necessary.

There are two ways to do this, that are trivially equivalent by a simple not operation:

  • borrow bit (also borrow flag)
  • carry bit (also carry flag)

Often CPUs use what is the carry bit for addition and interpret it as borrow bit for subtraction.

Here are the two ways:

Borrow Bit

It is assumed, that the CPU-word is n bits long, so calculations are performed modulo N=2^n. Further it is assumed, that the range 0\ldots N-1 is preferred, so all sign issues are excluded for the moment.

When subtracting two numbers x, y from each other like x-y and y > x, the provided result is

    \[x-y+N \equiv x-y \mod N\]

and the borrow bit is set (b=1), to indicate that the subtraction caused „underflow“, which had to be corrected by added N in order to get into the desired range.

In the „normal“ case where y \le x, the provided result is simply


and the borrow bit is not set (b=0).

The next round of subtraction takes the borrow bit into account and calculates x-y-b, where the condition becomes y+b > x and the result is

    \[x-y-b+N \equiv x-y \mod N\]



respectively. This is how some of the older readers used to do it in school on paper, but of course with N=10.

Now the typical integer arithmetic of current CPUs uses Two’s complement, which means that -y=(\mathrm{NOT}\; y)+1. Combining this with the previous results in calculating

    \[x-y = x + (\mathrm{NOT}\; y) + 1 - b \mod N\]

At this point some CPU-designers found it more natural, to use the carry bit c=1-b instead of the borrow bit b.

Carry Bit

When subtracting two numbers x, y from each other like x-y and we have y > x, the provided result is

    \[x-y+N \equiv x-y \mod N\]

and the carry bit is not set (c=0), to indicate that the subtraction caused „underflow“, which had to be corrected by added N in order to get into the desired range.

In the „normal“ case where y \le x, the provided result is simply


and the carry bit is set (c=1).

The next round of subtraction takes the borrow bit into account and calculates x-y-1+c, where the condition becomes y+1-c > x and the result is

    \[x-y-1+c+N \equiv x-y \mod N\]




Now two’s complement with -y=(\mathrm{NOT}\; y)+1 this can be written as

    \[x-y = x + (\mathrm{NOT}\; y) + 1 - b \mod N\]

or with c=1-b

    \[x-y = x + (\mathrm{NOT}\; y) + c \mod N\]

These two ways are really equivalent and easily transformed into each other. Neither of them provides big advantages, apart from the fact that we have the unnecessary confusion, because it depends on the CPU design, which of the two variants is used.

Recovery of Borrow or Carry bit

The borrow bit is calculated and used during subtractions that are done at assembly language level, but higher languages like C or Java do provide access to this information. It is relatively easy to recover the carry bit in the case of addition based on x, y and x+y \mod N.

This possible as well for the subtraction. Quite easily the comparison between x and y or y+b could be done before the subtraction. This would work, but it is kind of inefficient, because under the hood the comparison is just a subtraction with discarding the result and keeping the flags. So the subtraction is performed twice. This might not be such a bad idea, because a compiler could recognize it or the work of subtracting twice could be neglectable compared to the logic for an „optimized“ recovery, based on some logic expression of certain bits from x, y and x-y \mod N.


Share Button

Carry Bit: How does it work?


Most of us know from elementary school how to add multi-digit numbers on paper. Usage of the carry bit is the same concept, but not for base 10, not even for base 2, but for base 256 (in the old 8-bit-days), base 65536 (in the almost as old 16-bit-days), base 4294967296 (32 bit) or base 18446744073709551616 (64 bit), whatever is the word width of the CPU. Always using powers of two is common today, but it is quite possible that this will change from bits to trits (having three possible values, -1, 0 and 1) in the far future.

I do not think that application development should be dealing with low level stuff like bits and bytes, but currently common programming languages like Java, C, C++, C# and more would not let you get away with that, you have to be aware of the bits underlying their numeric types to some extent. So it is a good idea to spend some effort on understanding this. Unfortunately all of these languages are lacking the carry bit, but it is anyway useful to understand the concept.

I have been writing software since the beginning of the 80es. The computers available to me at that time where 8-bit with a 6502- or 6510-CPU and 1 MHz clock speed. Yes, it was 1 MHz, not 1 GHz. It was possible to program them in some BASIC-dialect, but that was quite useless for many purposes because it was simply too slow. Compiled languages existed, but were too clumsy and too big to be handled properly on those computers, at least the ones that I have seen. So assembly language was the way to go. In later years I have also learned to use the 680×0 assembly language and the 80×86 assembly language, but after the mid 90es that has not happened any more. An 8-bit CPU can add two 8-bit numbers and yield an 8-bit result. For this two variants need to be distinguished, namely signed and unsigned integers. For signed numbers it is common to use 2’s complement. That means that the highest bit encodes the sign. So all numbers from 0 to 127 are positive integers as expected. 127 has the 8-bit representation 01111111. Now it would be tempting to assume that 10000000 stands for the next number, which would be +128, but it does not. Having the highest bit 1 makes this a negative number, so this is -128. Those who are familiar with modular arithmetic should find this easily understandable, it is just a matter of choosing the representatives for the residue classes. But this should not disturb you, if you have no recent experience with modular arithmetic, just accept the fact that 10000000 stands for -128. Further increments of this number make it less negative, so 10000001 stands for -127 and 11111111 for -1. For unsigned numbers, the 8 bits are used to express any number from 0 to 255.

For introducing the carry bit let us start with unsigned integral numbers. The possible values of a word are 0 to 2^n-1 where n is the word width in bits, which would be 8 in our example. Current CPUs have off course 64-bit word width, but that does not change the principle, so we stick with 8-bit to make it more readable. Just use your imagination for getting this to 32, 64, 96 or 128 bits.

So now the bit sequence 11111111 stands for 255. Using an assembly language command that is often called ADD or something similar, it is possible to add two such numbers. This addition can typically be performed by the CPU within one or two clock cycles. The sum of two 8-bit numbers is in the range from 0 through 510 (111111110 in binary), which is a little bit too much for one byte. One bit more would be sufficient to express this result. The workaround is to accept the lower 8 bits as the result, but to retain the upper ninth bit, which can be 0 or 1, in the so called carry bit or carry flag. It is possible to query it and use a different program flow depending on it, for example for handling overflows, in case successive operation cannot handle more than 8 bit. But there is also an elegant solution for adding numbers that are several bytes (or several machine words) long. From the second addition onwards a so called ADC („add with carry“) is used. The carry bit is included as third summand. This can create results from 0 to 511 (111111111 in binary). Again we are getting a carry bit. This can be continued until all bytes from both summands have been processed, just using 0 if one summand is shorter than the other one. If the carry bit is not 0, one more addition with both summand 0 and the carry bit has to be performed, yielding a result that is longer than the longer summand. This can off course also be achieved by just assuming 1, but this is really an implementation detail.

So it is possible to write a simple long integer addition in assembly language. One of the most painful design mistakes of current programming languages, especially of C is not providing convenient facilities to access the carry bit, so a lot of weird coding is used to work around this when writing a long integer arithmetic. Usually 64-bit arithemetic is used to do 32-bit calculations and the upper 32 bits are used for the carry bit. Actually, it is not that hard to recover the carry bit, but it is anyway a bit annoying.

Subtraction of long integers can be done in a quite similar way, using something like SBC („subtract with carry“) or SBB („subtract with borrow“), depending on how the carry bit is interpreted when subtracting.

For signed integer special care has to be taken for the highest bit of the highest word of each summand, which is the sign. Often a so called overflow but comes in handy, which allows to recognize if an additional machine word is needed for the result.

Within the CPU of current 64 bit hardware it could theoretically be possible to do the 64-bit addition internally bit-wise or byte-wise one step after the other. I do not really know the implementation details of ARM, Intel and AMD, but I assume that much more parallelism is used for performing such operation within one CPU cycle for all 64 bits. It is possible to use algorithms for long integer addition that make use of parallel computations and that can run much faster than what has been described here. They work for the bits and bytes within the CPU, but they can also be used for very long numbers when having a large number of CPUs, most typically in a SIMD fashion that is available on graphics devices misused for doing calculations. I might be willing to write about this, if interest is indicated by readers.

It is quite interesting to look how multiplication, division, square roots, cube roots and more are calculated (or approximated). I have a lot of experience with that so it would be possible to write about hat. In short these operations can be done quite easily on modern CPUs, because they have already quite sophisticated multiplication and division functions in the assembly language level, but I have off course been able to write such operations even for 8-bit CPUs lacking multiplication and division commands. Even that I could cover, but that would be more for nostalgic reasons. Again there are much better algorithms than the naïve ones for multiplication of very long integers.


Share Button