Indexing of Arrays and Lists

We index arrays with integers. Lists also, at least the ones that allow random access. And sizes of collections are also integers.
This allows for 2^{31}-1=2147483647 entries in Java and typical JVM languages, because integers are actually considered to be 32bit. Actually we could think of one more entry, using indices 0..2147483647, but then we would not be able to express the size in an signed integer. This stuff is quite deeply built into the language, so it is not so easy to break out of this. And 2’000’000’000 entries are a lot and take a lot of time to process. At least it was a lot in the first few years of Java. There should have been an unsigned variant of integers, which would in this case allow for 4’000’000’000 entries, when indexing by an uint32, but that would not really solve this problem. C uses 64 bit integers for indexing of arrays on 64 bit systems.

It turns out that we would like to be able to index arrays using long instead of int. Now changing the java arrays in a way that they could be indexed by long instead of int would break a lot of compatibility. I think this is impossible, because Java claims to retain very good backward compatibility and this reliability of both the language and the JVM has been a major advantage. Now a second type of arrays, indexed by long, could be added. This would imply even more complexity for APIs like reflection, that have to deal with all cases for parameters, where it already hurts that the primitives are no objects and arrays are such half-objects. So it will be interesting, what we can find in this area in the future.

For practical use it is a bit easier. We can already be quite happy with a second set of collections, let them be called BigCollections, that have sizes that can only be expressed with long and that are indexed in cases where applicable with longs. Now it is not too hard to program a BigList by internally using an array of arrays or an array of arrays of arrays and doing some arithmetic to calculate the internal indices from the long (int64) index given in the API. Actually we can buy some performance gain when resizing happens, because this structure, if well done, allows for more efficient resizing. Based on this all kinds of big collections could be built.

Share Button

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert.