Intervals

Intervals are subsets of a universe, that are defined by upper and lower boundaries. Typically we think about real numbers, but any totally ordered universe allows the definition of intervals.

Intervals are defined by lower and upper boundaries, which can be a limiting number or unlimited, typically written as \infty for the upper bound and -\infty for the lower bound. The boundaries can be included or excluded. So the following combinations exist for a universe X:

(-\infty, \infty)=X
unlimited
(-\infty, a] = \{ x \in X : x \le a\}
half open, lower unlimited
(-\infty, a) = \{ x \in X : x < a\}
open, lower unlimited
[b, \infty) = \{ x \in X : x \ge b\}
half open, upper unlimited
(b, \infty) = \{ x \in X : x > b\}
open, upper unlimited
(c, d) = \{ x \in X : c < x < d\}
open
[c, d) = \{ x \in X : c \le x < d\}
half open
(c, d] = \{ x \in X : c < x \le d\}
half open
[c, d] = \{ x \in X : c \le x \le d\}
closed
\emptyset
it is sometimes useful to consider the empty set as an interval as well

The words „open“ and „closed“ refer to our usual topology of real numbers, but they do not necessarily retain their topological meaning when we extend the concept to our typical data types. a, b, c and d in the notation above do not have to be members of X, as long as the comparison is defined between them and all members of X. So we could for example meaningfully define for X=\mathbb{Q} the interval (\sqrt{2}, \pi) = \{ x\in\mathbb{Q} : \sqrt{2} < x < \pi\}.

As soon as we do not imply X=\mathbb{R} we always have to make this clear… And \mathbb{R} is kind of hard to really work with in software on computers with physically limited memory and CPU power.

Intervals have some relevance in software systems.

We sometimes have a business logic that actually relies on them and instead programming somehow around it, it is clearer and cleaner to actually work with intervals. For example, we can have a public transport scheduling system and we deal with certain time intervals in which different schedules apply than during the rest of the day. Or we have a system that records downtimes of servers and services and these are quite naturally expressed as intervals of some date-time datatype. It is usually healthy to consider all the cases mentioned above rather than ignoring the question if the boundary with probability zero of actually happening or having ugly interval limits like 22:59:59.999.

The other case is interval arithmetic. This means, we do floating point calculations by taking into account that we have an inaccuracy. So instead of numbers we have intervals I. When we add two intervals, we get I+J=\{x+y : x\in I \wedge y\in J\}. In the same way we can multiply and subtract and even divide, as long as we can stay clear of zero in the denominator. Or more generally we can define f(I_1, I_2, \ldots, I_n)=\{f(x_1,x_2,\ldots,x_n) : x_1\in I_1 \wedge x_2 \in I_2 \wedge\ldots\wedge x_n\in I_n\}.
It does of course require some mathematical thinking to understand, if the result is an interval again or at least something we can deal with reasonably. Actually we are usually happy with replacing the result by an interval that is possibly a superset of the real result, ideally the minimal superset that can be expressed with our boundary type.

At this point we will probably discover a desire to expand the concept of intervals in a meaningful way to complex numbers. We can do this by working with open disks like \{ z \in \mathbb{C} :  |z-z_0| < d\} or closed disks like \{ z \in \mathbb{C} :  |z-z_0| \le d\}. Or with rectangles based on two intervals I and J like \{ z = x+iy \in \mathbb{C} : x \in I \wedge y \in J \}.

These two areas are quite interesting and sometimes useful. Libraries have been written for both of them.

Often we discover, that intervals alone are not quite enough. We would like to do set operations with intervals, that is

union
A\cup B
intersection
A \cap B
set difference
A \setminus B

While the intersection works just fine, as long as we include the empty set \emptyset as an interval, unions and differences lead us to non-intervals. It turns out that interval-unions, sets that can be expressed as a union of a finite number of intervals, turn out to be a useful generalization, that is actually what we want to work with rather than with intervals. In this case we can drop the empty set as interval and just express it as the union of zero intervals.

There are some questions coming up, that are interesting to deal with:

normalization
Can we normalize interval-unions to some canonical form that allows safe and relyable comparison for equality?
adjacency
is our universe X actually discrete, so we can express all unlimited boundaries with closed boundaries?
interval lengths
Do we have a meaningful and useful way to measure the length of an interval or the total length of an interval-union, as long as they are limited? Or even for unlimited intervals?
collection interfaces
Do we want to implement a Set-interface in languages that have sets and an understanding of sets that would fit for intervals
implementation
How can we implement this ourselves?
implementation
Can we find useful implementations?

Having written a java library to support interval-unions on arbitrary Comparable types once in a project and having heard a speech about an interval library in Scala that ended up in using interval-unions in a pretty equivalent way, it might be interesting to write in the future about how to do this or what can be found in different languages to support us. For interval arithmetic some work has been done to create extensions or libraries for C and Fortran, that support this, while I was a student. So this is pretty old stuff and interesting mostly for the concepts, even if we are not going to move to Fortran because of this.

If there is interest I will write more about actual implementations and issues to address when using or writing them.

Links

Share Button

LF and CR LF in git and svn

Typically we observe that line endings of text file turn out to be „linefeed“ (LF or „\n“ or 0x0a or Ctrl-J) on Linux and Unix-like systems (including MacOSX), while they are „carriage return and linefeed“ (CR LF or „\r\n“ or 0x0d 0x0a or Ctrl-M Ctrl-J) on MS-Windows. See the little obstacles of interoperability.

This can become annoying and there is little reason for this. Most tools for editing, compiling and working with these files just understand both variants. It does become annoying when diffs are created between different files and even more so when scripts are turning out to have the „CR LF“ ending and the script interpreter given in the first line is not found, because the system tries to find one that has the otherwise invisible „CR“ in its file name. It also becomes kind of messy, when multiple CR-characters are present. CR-characters are annoying even on MS-Windows itself as soon as we use cygwin. Since in most cases the target system is a Linux system anyway and we just waste space with the unnecessary CR-character, it is actually in most cases a good idea to agree on not having these CR-characters at all in certain kinds of files.

The easiest way is to just set up git or subversion to change files from CR LF to LF on commit. So the repository only contains „clean“ files, at least from that moment onward.

This is accomplished in subversion by applying the following command on the files that we want to be kept on „LF“ only:

svn pset svn:eol LF

and then committing.

Git has a way to achieve this using the .gitattributes configuration file. So the .gitattributes file can contain something like this:

* text=auto eol=lf

Remark

Of course I recommend to use git instead of svn for new projects and to consider migrating existing projects from svn to git. But for this special aspect svn provides a slightly more powerful and more intuitive tool than git.

Links

Share Button