Geometric and Harmonic Rounding

Most of our rounding methods are based on arithmetic rounding.
Please refer to the nomenclature described in Residue Class Rounding. So we can base our rounding on a metric and the most obvious choice is the absolute value of the difference of the two parameters, d(x,y)=|x-y|. That is what everybody does and nothing else really comes to our mind when we are talking about rounding. But this approach is based on the additive structure of our field or ring containing the numbers that we want to round. Another way of looking at this is that (in case we are talking of real numbers or some subset thereof) the choice of the rounded value is mostly based on seeing on which side of a boundary between the two candidates for the rounded values the original value lies, with special handling if it is exactly equal to this boundary. Now the this boundary is the arithmetic mean between the two candidates, therefore we call this arithmetic rounding. Just to give you an example:

We look at the „middle“ between two allowed numbers, for example usually something like 2.5 when rounding a number between 2 and 3 to integers and automatically assume that above that middle we round up, below we round down and exactly on the boundary we need additional rules. This middle is the arithmetic mean of the two neighboring allowed numbers after rounding and hence we can call this arithmetic rounding. All numbers x with 2 \le x < 2.5 are rounded to two, all numbers x with 2.5 < x \le 3 are rounded to three.

Means

We assume M=\Bbb R and N=\Bbb Z, so each real number should be rounded to an integral number. Also we assume d(x,y)=|x-y|.

Now assume we have a non-integral number x \in \Bbb R and an integral number n\in \Bbb Z such n< x< n+1. For convenience we write n'=n+1. Then we round x to n if x < n+0.5 and to n' if x > n+0.5, so n+0.5 = \frac{n+n'}{2} serves as boundary. Now there are many other ways to define a mean between two number n and n'. Some of the most common are

We could also define a arithmetic-harmonic mean in a similar way, but that is just the geometric mean. It is an interesting way to approach the geometric mean of matrices, which is somewhat hard to define otherwise because of the non-commutative multiplication.

For the users of spreadsheets, like included in LibreOffice, OpenOffice or MS-Office, you can find the arithmetic mean as function average, the geometric mean as geomean and the harmonic mean as harmean.
Others means can be defined along these lines. And we can see that:

  • a(n,n')=m_1(n,n')
  • g(n,n')=m_0(n,n')
  • h(n,n')=m_{-1}(n,n')
  • q(n,n')=m_2(n,n')
  • c(n,n')=m_3(n,n')
  • \min(n,n')=m_{-\infty}(n,n')
  • \max(n,n')=m_\infty(n,n')

Here is a visual explanation of arithmetic, geometric and harmonic mean:

Trapez_Harmonicus

The line x from K to L has the harmonic mean of a and c as a length. The geometric mean would be reached by moving that line to a position that the upper and lower part have the same shape, apart from scaling. And the arithmetic mean would be reached by moving K and L to the middle points of the lines AD and BC, respectively.

The ruby-gem long-decimal supports all of these methods for calculating a mean.

Using means to define rounding methods

Just to give you an idea that you can extend this further, the geometric and harmonic mean are now considered.
As can be seen negative and zero values for n and n' are causing some trouble or require some care. Actually we can deal with this by splitting our unrounded and rounded worlds into separate parts, here we take the number < 0, the \{0\} and the numbers > 0 as three different sets and do the rounding within them:

    \[M=M_1\cup M_2\cup M_3\]

with

    \[\bigwedge_{i\ne j}M_i\cap M_j = \emptyset\]

where we define

    \[M_1=\{m\in M : m < 0\}\]

and

    \[M_2=\{0\}\]

and

    \[M_3=\{m\in M : m > 0\}.\]

In a similar way we define N_1, N_2 and N_3 and define metrics d_1, d_2, d_3 separately. Then we just look at one partition i and consider rounding to be a mapping

    \[r_i: M_i\rightarrow N_i, x\mapsto r_i(x),\]

such that d_i(x, r_i(x)) is minimal.

The obvious assumption of rounding 0 to 0 releaves us from dealing with logarithms of zero or division by zero. We would see that anyway any number between 0 and 1 would be rounded to 1 for both harmonic and geometric rounding, because the boundary is 0, if we pick the right variant of the formula for calculating the mean. Also between -1 and 0 everything is rounded to -1 in our example. Otherwise the calculation of the boundary can be made to work, because then both factors are negative, so we draw the square root of a positive product, but then we need to replace the square root by its negative counterpart.

So for geometric rounding we define that x is rounded to n if x < g(n,n'), rounded to n' if x > g(n,n') and rounding for x=g(n,n') has to be defined by additional rules. But we define g(n,n') for negative values of n and n' to be -\sqrt{nn'}. This can also be expressed in terms of a metric:

    \[d_1(x,y)=|\log(-x)-\log(-y)|\;\; (x, y < 0),\]

    \[d_2(x,y)=0 \;\;(x=y=0)\]

and

    \[d_3(x,y)=|log(x)-log(y)| \;\; (x,y > 0).\]

Harmonic rounding is definable in a similar way:
We define that x is rounded to n if x < h(n,n'), rounded to n' if x > h(n,n') and rounding for x=h(n,n') has to be defined by additional rules. This can also be expressed in terms of a metric:

    \[d_1(x,y)=|\frac{1}{x}-\frac{1}{y}|\;\; (x, y < 0),\]

    \[d_2(x,y)=0 (x=y=0)\]

and

    \[d_3(x,y)=|\frac{1}{x}-\frac{1}{y}| (x,y > 0).\]

Quadratic and Cubic rounding can be defined in a similar way. It would even be possible to go for arithmetic-geometric rounding or harmonic-geometric rounding, but that would imply having to do an approximative iterative calculation of the boundary each time it is needed, while the rounding modes described above can still be achieved with relatively simple multiplication and addition operations.

In any case it is necessary to define the rule how to round numbers that lie exactly on the boundary, unless we can prove that the boundary is irrational.

That is the case for geometric, quadratic and cubic rounding, because if we have adjacent integers n and n+1 then we see that g(n, n+1)=\sqrt{n(n+1) is irrational for n>0 because n and n+1 cannot both be squares, so there is one prime number that divides either n or n+1 with an odd multiplicity, which already ensures that the square root is irrational. For the quadratic mean we have q(n,n+1)=\sqrt{\frac{n^2+(n+1)^2}{2}}, where the numerator of the fraction is odd, so two occurs with an odd multiplicity, which makes it irrational again. The similar argument applies for the cubic mean. For the harmonic mean we have h(n, n+1)=\frac{2n(n+1)}{2n+1}, where 2n+1 can be a power of 5, in which case we have to deal with the boundary issue.

The ruby-gem long-decimal supports geometric, harmonic, quadratic and cubic rounding in addition to the regular arithmetic rounding, that is referred to as „half“.

Harmonic and geometric rounding are important for some of the mechanisms of rounding with sum.
Based on this basis these algorithms will look more natural and no longer like some weird approach that needs to be copied from some untrusted source just to get over the nightmare of having to solve that issue.

Share Button

The language issue

Since I did not get a lot of feedback, possibly because wordpress denied to accept the vote, I can now see that continuing the blog in English or German seems to be equally popular or unpopular. I have provided the articles of almost three months almost completely in both languages, but now I intend to concentrate on the English version and provide a German translation for some articles, based on how much time I have and how much necessity I see for this and how many readers the English article gets.
If I do get a lot of votes, I will reconsider this decision.

Share Button

Comment Functionality

The comment function does not seem to work after some software upgrades.
It is necessary to check write permissions on the directory /wp-content/plugins/si-captcha-for-wordpress/captcha/cache, then it will work again.
So you can write comments again, I will only reject spam.

Share Button

Kommentarfunktion

Die Kommentarfunktion geht bei jedem Upgrade von WordPress verloren.
Damit sie funktioniert, werden Schreibrechte im Verzeichnis
/wp-content/plugins/si-captcha-for-wordpress/captcha/cache
benötigt. Die muss man nach jedem Softwareupdate überprüfen.
Jetzt sollte die Kommentarfunktion wieder gehen.

Share Button

Monads in Ruby

Monads? Easy!

A Monad over X simply is a Monoid in the Category of Endofunctors of X.

Monads? Really!

  • Monads are a concept in mathematics
  • Algebra is an area of mathematics
  • Category theory is an abstraction of algebra.
  • Monads are defined in category theory
  • Remember: in category theory we talk about infinity beyond cardinality of infinite sets…
  • The „idea“ has been transplanted into something used in programming languages…
  • Just think of stars in terms of astronomy, in terms of decoration and in terms of music or movies

Monad as Design Pattern

  • Actually Monads should be viewed as a design pattern
  • That describes the level of abstraction

Motivation

  • Pure functional programmers are poor guys when it comes to state
  • State is evil
  • State is not possible or hard
  • I/O is state, actually a segment of the outer world is manipulated
  • Monads encapsulate that
  • Even if you are happy with state (poor IT theology…):
    Multilevel nil problem 🙁

Monads in Functional Programming

  • Container type
  • wrap(foo):
    • class method
    • can also be called unit
    • Btw. arrogant Haskell guys call it return
  • pass(&block)
    • instance method
    • Can also be called bind
  • Optionally more operations (mjoin, empty, +,…)

Identity Monad

class Identity
    def initialize(value)
      @value = value
    end
end

def Identity.wrap(value)
  new(value)
end

class Identity
    def pass
        yield @value
    end
end

Axioms

  • Left-identity for pass:
    Calling pass on a newly wrapped value is the same as applying the block to that value
  • Right-identity for pass:
    Calling pass with a block that only calls wrap on its parameter results in the same as its target object
  • Nesting:
    Nesting pass blocks should be equivalent to calling them in sequence.

Does our Monad survive 1st law?

Identity.wrap(foo).pass do |value|
  f(value)
end

Is equivalent to

f(foo)

That is how we defined pass.

Does our Monad survive 2nd law?

bar.pass do |value|
  Identity.wrap(value)
end

Is equivalent to

bar

That is how we defined pass.

Does our Monad survive 3rd law?

bar.pass do |value_a|
    f(value_a)
end.pass do |value_b|
    g(value_b)
end

Is equivalent to

bar.pass do |value_a|
  f(value_a).pass do |value_b|
    g(valueb)
  end
end

That is how we defined pass.

Array as Monad

def Array.wrap(value)
    [ value ]
end

class Array
    def mjoin
        inject([]) { |combined, arr| combined + arr }
    end

    def pass(&block)
        map(&block).mjoin
    end
end

Does it work?

Have a look yourself…

More Monad Operations

Empty:

def Array::empty
    []
end

+:

Already there

Other Examples

  • Option (Some, None)

Links & Sources

This article is based on a speach given in the Ruby on Rails user group Switzerland.

Share Button

Swiss Perl Workshop 2015 in Olten

Please reserve the date in your calendar:
The Swiss Perl Workshop will take place on the 28th and 29th of August 2015 in Olten. The location can easily be reached by public transport, being only 5 min walking distance from the railroad station with 500 trains per day.
So if you are using the Perl programming language as part of your work or for private interest, this event is for you. More information will follow as soon as the web page is online.

Share Button

Rounding with sum

Deutsch

Often we have to round some numbers in such a way that the sum is correct afterwards. The most obvious case is maybe with percentage values, where a sum other than 100% is confusing for readers. But we may even have cases where a calculation becomes incorrect if the some constraint is violated. Anyway it can be useful to do some more advanced stuff behind the scenes to fulfill such a constraint. Now we could hope for the rounding operation to be an Endomorphism of the additive group, that means \bigwedge_{x,y\in M} r(x+y)=r(x)+r(y). At first glance it makes sense, error often compensate each other to make this true, but we can find counter examples for our usual rounding methods. Just think of the numbers x=1.4, y=2.3, z=3.3. We have x+y+z=7. Now we us a typical rounding method and we get r(x)=1 \wedge r(y)=2 \wedge r(z)=3 \Rightarrow r(x)+r(y)+r(z)=6 \ne 7. So it is no endomorphism, one counterexample is enough.

Is this any kind of new concept? Obviously we find printed media or web pages that are showing percentage values in such a way that they cheated for the partial values to have the anticipated sum. But it shows that such things happen officially every four years. We have events that are called „elections“ and parties or lists can be voted for. The parliament usually has a fixed number of members and if we disregard specialties that make the whole game more complex in some countries, in order to keep it accessible to logic, it is quite the same thing. Now parliament membership should be distributed according to the percentages of the popular votes. If we have n seats in the parliament, we have to round the exact percentages (as fractions of long integers) to multiples of \frac{100}{n} in such a way that the sum is correct. There are may established procedures for achieving this and depending on the procedure the result may differ a little bit. Did I mention the word „cheating“ above? No, we do not talk about cheating by deliberately counting wrong, off course nobody ever wanted to do something like that, but about ways to enforce an end result that uses up exactly the given number of parliament seats. Which procedure is considered to be the fairest changes almost every other year a little bit. Interestingly an equivalent issue occurs when distributing legislative seats to region. Even in the United States, where congressmen are elected by getting the plurality of votes in their district, this is applied to deciding how many congressmen will represent each state, in this case with the constraint that each state should have at least one congress seat (plus two senators).

If we do not need to write an election software, the question is often less delicate. For example a text should contain the sum of some rounded values in a reasonable accurate way. The majority of the readers will be happy. The very critical reader who knows anyway what I am writing here, will be happy as well, if it has been done well enough with inevitable cheating in the last digits. It should be observed that this seemingly trivial issues is quite involved, for example see apportionment paradox

As described in residue class rounding, we define sets M and N \subseteq M with a metric d: M\times M \rightarrow {\Bbb R_+}, (m, n) \mapsto d(m,n) \ge 0 and want to deal with maps r : M \rightarrow N having d(x, r(x)) kept small and possibly some other constraints. But since the correction („cheating“) should be part of the mapping, we are here rather looking at a mapping
s: M^n \rightarrow N^n, (m_1,\ldots,m_n) \mapsto (n_1,\ldots n_n) with \sum_{i=1}^n m_i = \sum_{i=1}^n n_i and for all i the value of d(m_i,n_i) is „small“. We could say, that the sum has to be rounded as well, but often it is already rounded. Think about the details of your special case when they occur and read the concepts… Purely rounding up or down will not work unless we have an exact edge case. We could desire to minimize error squares, which has the advantage that larger deviations have a larger effect, encouraging equal distribution of the corrections, for example we want
\sum_{i=1}^n (d(m_i, n_i))^2 minimal. It is possible to find examples that are non-unique. In this case a rule can be defined or random numbers can be used to resolve.
Another approach would be to minimize relative errors, like
\sum_{i=1: m_i \ne 0}^n (\frac{d(m_i,n_i)}{d(m_i,m_i)})^2, skipping summand with m_i=0 because they do not create any issues if we have 0\in N. Usually we do want to round 0 to 0.
A good alternative is to just look at the Apportionment methods for parliaments and the underlying algorithms and adjust them for the required usage:

Often our requirements may be different and especially lower in terms of the quality of the rounding than for parliament apportionment, but it is a good idea to think twice about this prior to just doing something naïve without thinking.

Question: how are the apportionment methods behaving in terms of absolute and relative error square optimization?

If there is serious interest, I might provide an implementation in some real programming language.

Often we encounter problems similar to this that can be deduced to this. Who has ever thought that parliament apportionment is actually an act of rounding?

Links

Here are some links to interesting pages and papers about algorithms and other topics related to this issue.

Share Button

Rundung mit Summe

English

Häufig muss man einige Zahlen auf einmal runden, aber die Summe soll hinterher „aufgehen“. Das kann man „summenerhaltendes Runden“ nennen, wobei unter Umständen auch die Summe gerundet wird. Der häufigste Fall sind Prozentwerte und wenn die gerundeten Zahlen nicht zusammen 100% ergeben, versteht das niemand. Da kann es schon besser sein, etwas schwieriger verständliche Dinge hinter den Kulissen zu tun. Kurz gesagt ist unsere Frage, ob die Rundung ein Endomorphismus der additiven Gruppe ist. Das bedeutet, dass \bigwedge_{x,y\in M} r(x+y)=r(x)+r(y). Auf den resten Blick sieht das vernünftig aus und die Fehler heben sich ja oft irgendwie weg. Aber wir können für eine gängige Rundungsmethode ein Gegenbeispiel finden: x=1.4, y=2.3, z=3.3. Wir haben x+y+z=7. Jetzt wird das gerundet und ergibt r(x)=1 \wedge r(y)=2 \wedge r(z)=3 \bigrightarrow r(x)+r(y)+r(z)=6 \ne 7. Ein Gegenbeispiel reicht und es ist kein Endomorphismus.

Ist das ein ganz neues Konzept? Es wird ja offensichtlich in fast allen Druckerzeugnissen, die irgendwie Prozentwerte so angeben, für die Gesamtsumme „gemogelt“. Aber es stellt sich heraus, dass das auch hochoffiziell alle vier Jahre passiert. Wir haben irgendwelche Ereignisse, die sich „Wahlen“ nennen und da kann man Parteien oder Listen für ein Parlament eine Stimme geben. Das Parlament hat meistens eine vorgegebene Anzahl von Sitzen, wenn wir mal Spezialitäten wie 5%-Sperrklausel und Überhangmandate in der deutschen Wahlpraxis vergessen, um es noch halbwegs für Logik zugänglich zu halten. Man stelle sich nur vor, was passiert, wenn 40 Parteien antreten und je 2.5% erhalten, um zu sehen, dass das System nicht korrekt ist, im Sinne einer korrekten Software. Nun sollte man meinen, die Parlamentssitze werden einfach im Verhältnis der Stimmen aufgeteilt. Wenn man n Parlamentssitze hat, werden die Stimmenanteile in Prozent auf Vielfache von \frac{100}{n} gerundet und zwar so, dass es am Schluss aufgeht. Es gibt viele Verfahren, um das zu bewerkstelligen und je nachdem, welches Verfahren man auswählt, kann bei denselben Stimmenzahlen die Sitzverteilung geringfügig variieren. Habe ich das Wort „Mogeln“ oben erwähnt? Es muss wohl Zufall sein. Welches Verfahren gerade am gerechtesten ist, wechselt also von Jahr zu Jahr immer wieder ein bisschen.

Wenn wir nicht gerade eine Wahlauswertungssoftware schreiben, ist in den meisten Fällen diese Frage weniger aufgeladen. Es soll in einem Text eine Summe von mehreren gerundeten Werten eingehalten werden und die Rundung soll „einigermaßen vernünftig“ erfolgen. Der oberflächliche Leser wird zufrieden sein, der etwas kritischere auch, weil die Summe stimmt und der Leser, der sich richtig Gedanken macht wird es auch verstehen, dass man da mogeln musste und dass man das einigermaßen sinnvoll getan hat. Man beachte, dass diese Thematik einige Überraschungen birgt, z.B. das Alabama-Paradoxon und das Wählerzuwachsparadoxon.

Wie in Restklassenrundung beschrieben, gehen wir von Mengen M und N \subseteq M mit einer Metrik d: M\times M \rightarrow {\Bbb R_+}, (m, n) \mapsto d(m,n) \ge 0 aus und wollen Abbildungen r : M \rightarrow N betrachten, bei denen d(x, r(x)) klein bleibt und eventuell ein paar weitere Bedingungen erfüllt sein sollen. Da aber die Korrektur schon Teil der Abbildung sein soll, interessiert uns hier eigentlich eher so etwas wie eine Abbildung
s: M^n \rightarrow N^n, (m_1,\ldots,m_n) \mapsto (n_1,\ldots n_n) mit \sum_{i=1}^n m_i = \sum_{i=1}^n n_i und für alle i ist d(m_i,n_i) „klein“. Reine Auf- oder Abrundung kann also nicht die Lösung sein, da das nie aufgeht. Wir könnten die Fehlerquadrate minimieren, was den Vorteil hat, das große absolute Abweichungen sich stärker auswirken, so dass eine gleichmäßige Verteilung der Korrektur besser wegkommt, also z.B.
\sum_{i=1}^n (d(m_i, n_i))^2 ist minimal. Es lassen sich Beispiele konstruieren, wo das nicht eindeutig ist. In dem Fall kann man entweder eine Regel festlegen oder mit Zufallszahlen ermitteln welche der passenden Möglichkeiten zum Zuge kommt. Man kann auch versuchen, die relativen Fehler zu minimieren, also so etwas wie
\sum_{i=1: m_i \ne 0}^n (\frac{d(m_i,n_i)}{d(m_i,m_i)})^2, wobei man die Summanden mit m_i=0 weglassen muss, was nicht stört, wenn 0\in N ist, man also sowieso 0 auf 0 runden will.
Eine Alternative ist, sich einfach an den Sitzzuteilungsverfahren für Parlamente zu orientieren und deren Algorithmen sinngemäß für die hier beschriebene Aufgabe anzupassen:

Oft sind die Anforderungen an die „Qualität“ der Rundung niedriger als bei der Zuteilung von Parlamentssitzen, aber es lohnt sich schon, sich hierüber einige Gedanken zu machen statt einfach irgendwie mit einem naiven Ansatz zu runden.

Frage: Wie verhalten sich die oben aufgeführten Sitzzuteilungsverfahren zur Minimierung der absoluten und relativen Fehlerquadrate?

Bei genügendem Interesse kann ich einmal eine Implementierung für so eine Rundung hinzufügen.

Gelegentlich begegnen uns auch Probleme, die dem hier geschilderten sehr ähnlich sind oder die sich darauf zurückführen lassen. Wer hätte gedacht, dass die Zuteilung von Parlamentssitzen eigentlich eine Rundung ist?

Share Button

Random-Access and UTF-8

Deutsch

It is a nice thing to be able to use random access files and to have the possibility to efficiently move to any byte position for reading or writing.

This is even true for text files that have a fixed number of bytes per character, for example exactly one, exactly two or exactly four bytes per character. Maybe we should prefer the term „code point“ here.

Now the new standard for text files is UTF-8. In many languages each character is usually just one byte. But when moving to an arbitrary byte position this may be in the middle of a byte sequence comprising just one character (code point) and not at the beginning of a character. How should that be handled without reading the whole file up to that position?

It is not that bad, because UTF-8 is self synchronizing. It can be seen if a particular byte is the first byte of a byte sequence or a successive one. First bytes start with 0 or 11, successive bytes start with 10. So by moving forward or backward a little bit that can be handled. So going to a rough position is quite possible, when knowledge of the average number of bytes per character for that language is around.
But when we do not want to go to a rough character position or to an almost exact byte position, but to an exact character position, which is usually the requirement that we have, then things get hard.

We either need to read the file from the beginning to be sure or we need to use an indexing structure which helps starting in the middle and only completely reading a small section of the file. This can be done with extra effort as long as data is only appended to the end, but never overwritten in the middle of the file.

But dealing with UTF-8 and random access is much harder than with bytes. Indexing structures need to be maintained in memory when accessing the file or even as part of the file.

Share Button

2015

Frohes neues Jahr!
Happy New Year
Gott nytt år!
Godt Nyttår!
¡Próspero año nuevo!
Bonne année!
FELIX SIT ANNUS NOVUS
Felice Anno Nuovo!
Весёлого нового года!
السنة الجديدة المبتهجة
Een gelukkig nieuwjaar!

Share Button