Overflow of Integer Types


Handling of integral numbers has always been one of the basic capabilities of our computing devices. Any common programming language since the fifties provides this in some way.

There are some significant differences between the ways integral numbers are handled in different programming languages.

Dealing with overflow

There are several approaches:

1. Integral numbers can have any number of digits, as far as memory allows. A result that needs more bytes than its inputs is stored in as many bytes as it takes to represent it completely. There is no need to deal with overflow, unless the numbers are so big and so many that they eat up the available memory. Examples: Common Lisp, Ruby
2. Integral numbers have a fixed number of bits. If a calculation results in a number that cannot be expressed in that many bits, the upper bits are tacitly discarded, unless each operation is accompanied by elaborate indirect checks for overflow. Examples: C, Java (strictly speaking, C guarantees this wrap-around only for unsigned types; signed overflow is undefined behavior there)
3. Integral numbers have a fixed number of bits. If a calculation results in a number that cannot be expressed in that many bits, an exception is thrown. I do not have any examples; maybe someone can suggest one.
4. Integral numbers have a fixed number of bits. For the result of a multiplication, a type with twice as many bits as the two factors is used. For the result of an addition, a type with as many bits as the summands is used, but a carry bit is provided, too. This is the extra bit that is needed to express the result in all possible cases. By feeding this carry bit into the next addition, it is possible to add numbers of arbitrary length, as needed for solution 1. Multiplication, division and subtraction can be implemented in a similar manner. Example: Assembly language
5. No integers are available, and floating point numbers have to be used instead. Example: Lua (before version 5.3, which added an integer subtype)
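To make the difference between the first three approaches concrete, here is a minimal sketch in Python. Python's own integers follow approach 1. The helpers `wrap_add_32` and `checked_add_32` are names made up for this illustration; they only simulate 32-bit signed arithmetic and are not taken from any library.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_add_32(a: int, b: int) -> int:
    """Approach 2 (simulated): keep the low 32 bits, discard the rest."""
    low = (a + b) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed two's complement value.
    return low - 2**32 if low > INT32_MAX else low


def checked_add_32(a: int, b: int) -> int:
    """Approach 3 (simulated): raise instead of returning a wrong result."""
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit into 32 bits")
    return result


# Approach 1: the result simply grows as needed.
print(INT32_MAX + 1)              # 2147483648
# Approach 2: the upper bits are tacitly discarded.
print(wrap_add_32(INT32_MAX, 1))  # -2147483648
```

Calling `checked_add_32(INT32_MAX, 1)` raises an `OverflowError`, which is the behavior described for approach 3.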

Evaluation

Solution 1 is exactly what is needed for application development. Counting of bits, checking for overflow and such practices are just absurd during the development of business logic. This is quite a strong argument in favor of using Ruby (or Lisp) for application development.

Solution 2 is simply crap. I consider this the most serious design bug of C and Java. Rounding is something that we can accept in many cases, but rounding discards the lower bits. Here we tacitly discard the upper bits. Average software developers do not really want to deal with these issues. Sometimes the issues are simply ignored, because the bugs do not occur frequently and are not discovered during the testing phase. Throwing away the carry bit, which contains very important information for implementing solution 1 or for just recognizing overflows, is not a good idea. There are workarounds for this, but they can roughly double the runtime of certain calculation-intensive software.
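One such indirect check for addition can be sketched as follows: without a carry or overflow flag, the overflow has to be reconstructed from the sign bits of the operands and of the wrapped result. This is a simulation in Python on 32-bit patterns; the function name `add_with_indirect_check` is hypothetical, and it illustrates the kind of workaround meant here rather than any particular library's code.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def add_with_indirect_check(a: int, b: int) -> Tuple[int, bool]:
    """Add two signed 32-bit values the way approach 2 does and
    reconstruct afterwards whether the addition overflowed."""
    ua = a & MASK32            # bit patterns of the operands
    ub = b & MASK32
    ur = (ua + ub) & MASK32    # wrapped result, upper bits discarded
    # Overflow happened exactly when both operands have the same sign
    # and the sign of the result differs from it.
    overflow = ((ua ^ ur) & (ub ^ ur) & SIGN32) != 0
    # Reinterpret the result pattern as a signed value.
    result = ur - 2**32 if ur & SIGN32 else ur
    return result, overflow


print(add_with_indirect_check(2**31 - 1, 1))  # (-2147483648, True)
print(add_with_indirect_check(-1, 1))         # (0, False)
```

The extra XOR, AND and comparison have to accompany every single addition, which is where the additional runtime comes from.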

Solution 3 would be acceptable, but I consider solution 4 better. Low-level code should leave it to the developer whether an exception is thrown or whether the situation can be handled.

Solution 4 is quite acceptable and useful for low-level programming, but not for typical application development. Sometimes explicit access to bits and bytes is needed, for example to talk to external hardware or to implement the lower layers of network protocols. It can also be useful for very performance-critical calculations where it can be guaranteed that no overflows occur. The possibility to deal with a carry bit, at least optionally, cannot do any harm. Unfortunately this is currently only (or almost only?) possible in assembly language. For implementing integer arithmetic for languages like Ruby or Lisp, some implementation that builds on solution 2 and its crappy workarounds needs to be provided, but for common CPU architectures it is possible to provide an implementation in assembly language based on solution 4, which will run about twice as fast.
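How a carry bit makes addition of arbitrarily long numbers possible can be sketched in Python. This is only an illustration under stated assumptions: numbers are represented as lists of 32-bit words with the least significant word first, and the names `add_with_carry`, `add_multiword` and `to_int` are invented for this example. A real implementation would use the add-with-carry instruction of the CPU instead of computing the carry with a shift.

```python
from typing import List, Tuple

WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def add_with_carry(x: int, y: int, carry_in: int) -> Tuple[int, int]:
    """Add two words plus an incoming carry; return (word, carry_out)."""
    total = x + y + carry_in
    return total & WORD_MASK, total >> WORD_BITS


def add_multiword(a: List[int], b: List[int]) -> List[int]:
    """Add two numbers given as little-endian lists of 32-bit words."""
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else 0
        y = b[i] if i < len(b) else 0
        # The carry of each word addition is fed into the next one.
        word, carry = add_with_carry(x, y, carry)
        result.append(word)
    if carry:
        result.append(carry)  # the number grows by one word
    return result


def to_int(words: List[int]) -> int:
    """Convert a little-endian word list back to a Python integer."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))


total = add_multiword([0xFFFFFFFF, 0xFFFFFFFF], [1])
print(total)                   # [0, 0, 1]
print(to_int(total) == 2**64)  # True
```

Because the carry is never thrown away, no information is lost and no additional comparison per word is needed.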

Solution 5 is just an unnecessary restriction.

Conclusion

Many software developers think that this does not interest them or is not their problem. I recommend taking this issue seriously. The bugs caused by tacit overflow are hard to find, because they can occur anywhere and might not show up during testing. But then they occur with some weird combination of data on the production system. Who wants to find the cause of a weird bug that surfaces far away from the real cause? Only solution 1 allows the application developer to safely ignore these problems.
