The language issue

Since I did not get a lot of feedback, possibly because wordpress denied to accept the vote, I can now see that continuing the blog in English or German seems to be equally popular or unpopular. I have provided the articles of almost three months almost completely in both languages, but now I intend to concentrate on the English version and provide a German translation for some articles, based on how much time I have and how much necessity I see for this and how many readers the English article gets.
If I do get a lot of votes, I will reconsider this decision.

Share Button

Comment Functionality

The comment function does not seem to work after some software upgrades.
It is necessary to check write permissions on the directory /wp-content/plugins/si-captcha-for-wordpress/captcha/cache, then it will work again.
So you can write comments again, I will only reject spam.

Share Button

Kommentarfunktion

Die Kommentarfunktion geht bei jedem Upgrade von WordPress verloren.
Damit sie funktioniert, werden Schreibrechte im Verzeichnis
/wp-content/plugins/si-captcha-for-wordpress/captcha/cache
benötigt. Die muss man nach jedem Softwareupdate überprüfen.
Jetzt sollte die Kommentarfunktion wieder gehen.

Share Button

Monads in Ruby

Monads? Easy!

A Monad over X simply is a Monoid in the Category of Endofunctors of X.

Monads? Really!

  • Monads are a concept in mathematics
  • Algebra is an area of mathematics
  • Category theory is an abstraction of algebra.
  • Monads are defined in category theory
  • Remember: in category theory we talk about infinity beyond cardinality of infinite sets…
  • The „idea“ has been transplanted into something used in programming languages…
  • Just think of stars in terms of astronomy, in terms of decoration and in terms of music or movies

Monad as Design Pattern

  • Actually Monads should be viewed as a design pattern
  • That describes the level of abstraction

Motivation

  • Pure functional programmers are poor guys when it comes to state
  • State is evil
  • State is not possible or hard
  • I/O is state, actually a segment of the outer world is manipulated
  • Monads encapsulate that
  • Even if you are happy with state (poor IT theology…):
    Multilevel nil problem 🙁

Monads in Functional Programming

  • Container type
  • wrap(foo):
    • class method
    • can also be called unit
    • Btw. arrogant Haskell guys call it return
  • pass(&block)
    • instance method
    • Can also be called bind
  • Optionally more operations (mjoin, empty, +,…)

Identity Monad

class Identity
    def initialize(value)
      @value = value
    end
end

def Identity.wrap(value)
  new(value)
end

class Identity
    def pass
        yield @value
    end
end

Axioms

  • Left-identity for pass:
    Calling pass on a newly wrapped value is the same as applying the block to that value
  • Right-identity for pass:
    Calling pass with a block that only calls wrap on its parameter results in the same as its target object
  • Nesting:
    Nesting pass blocks should be equivalent to calling them in sequence.

Does our Monad survive 1st law?

Identity.wrap(foo).pass do |value|
  f(value)
end

Is equivalent to

f(foo)

That is how we defined pass.

Does our Monad survive 2nd law?

bar.pass do |value|
  Identity.wrap(value)
end

Is equivalent to

bar

That is how we defined pass.

Does our Monad survive 3rd law?

bar.pass do |value_a|
    f(value_a)
end.pass do |value_b|
    g(value_b)
end

Is equivalent to

bar.pass do |value_a|
  f(value_a).pass do |value_b|
    g(valueb)
  end
end

That is how we defined pass.

Array as Monad

def Array.wrap(value)
    [ value ]
end

class Array
    def mjoin
        inject([]) { |combined, arr| combined + arr }
    end

    def pass(&block)
        map(&block).mjoin
    end
end

Does it work?

Have a look yourself…

More Monad Operations

Empty:

def Array::empty
    []
end

+:

Already there

Other Examples

  • Option (Some, None)

Links & Sources

This article is based on a speach given in the Ruby on Rails user group Switzerland.

Share Button

Swiss Perl Workshop 2015 in Olten

Please reserve the date in your calendar:
The Swiss Perl Workshop will take place on the 28th and 29th of August 2015 in Olten. The location can easily be reached by public transport, being only 5 min walking distance from the railroad station with 500 trains per day.
So if you are using the Perl programming language as part of your work or for private interest, this event is for you. More information will follow as soon as the web page is online.

Share Button

Rounding with sum

Deutsch

Often we have to round some numbers in such a way that the sum is correct afterwards. The most obvious case is maybe with percentage values, where a sum other than 100% is confusing for readers. But we may even have cases where a calculation becomes incorrect if the some constraint is violated. Anyway it can be useful to do some more advanced stuff behind the scenes to fulfill such a constraint. Now we could hope for the rounding operation to be an Endomorphism of the additive group, that means \bigwedge_{x,y\in M} r(x+y)=r(x)+r(y). At first glance it makes sense, error often compensate each other to make this true, but we can find counter examples for our usual rounding methods. Just think of the numbers x=1.4, y=2.3, z=3.3. We have x+y+z=7. Now we us a typical rounding method and we get r(x)=1 \wedge r(y)=2 \wedge r(z)=3 \Rightarrow r(x)+r(y)+r(z)=6 \ne 7. So it is no endomorphism, one counterexample is enough.

Is this any kind of new concept? Obviously we find printed media or web pages that are showing percentage values in such a way that they cheated for the partial values to have the anticipated sum. But it shows that such things happen officially every four years. We have events that are called „elections“ and parties or lists can be voted for. The parliament usually has a fixed number of members and if we disregard specialties that make the whole game more complex in some countries, in order to keep it accessible to logic, it is quite the same thing. Now parliament membership should be distributed according to the percentages of the popular votes. If we have n seats in the parliament, we have to round the exact percentages (as fractions of long integers) to multiples of \frac{100}{n} in such a way that the sum is correct. There are may established procedures for achieving this and depending on the procedure the result may differ a little bit. Did I mention the word „cheating“ above? No, we do not talk about cheating by deliberately counting wrong, off course nobody ever wanted to do something like that, but about ways to enforce an end result that uses up exactly the given number of parliament seats. Which procedure is considered to be the fairest changes almost every other year a little bit. Interestingly an equivalent issue occurs when distributing legislative seats to region. Even in the United States, where congressmen are elected by getting the plurality of votes in their district, this is applied to deciding how many congressmen will represent each state, in this case with the constraint that each state should have at least one congress seat (plus two senators).

If we do not need to write an election software, the question is often less delicate. For example a text should contain the sum of some rounded values in a reasonable accurate way. The majority of the readers will be happy. The very critical reader who knows anyway what I am writing here, will be happy as well, if it has been done well enough with inevitable cheating in the last digits. It should be observed that this seemingly trivial issues is quite involved, for example see apportionment paradox

As described in residue class rounding, we define sets M and N \subseteq M with a metric d: M\times M \rightarrow {\Bbb R_+}, (m, n) \mapsto d(m,n) \ge 0 and want to deal with maps r : M \rightarrow N having d(x, r(x)) kept small and possibly some other constraints. But since the correction („cheating“) should be part of the mapping, we are here rather looking at a mapping
s: M^n \rightarrow N^n, (m_1,\ldots,m_n) \mapsto (n_1,\ldots n_n) with \sum_{i=1}^n m_i = \sum_{i=1}^n n_i and for all i the value of d(m_i,n_i) is „small“. We could say, that the sum has to be rounded as well, but often it is already rounded. Think about the details of your special case when they occur and read the concepts… Purely rounding up or down will not work unless we have an exact edge case. We could desire to minimize error squares, which has the advantage that larger deviations have a larger effect, encouraging equal distribution of the corrections, for example we want
\sum_{i=1}^n (d(m_i, n_i))^2 minimal. It is possible to find examples that are non-unique. In this case a rule can be defined or random numbers can be used to resolve.
Another approach would be to minimize relative errors, like
\sum_{i=1: m_i \ne 0}^n (\frac{d(m_i,n_i)}{d(m_i,m_i)})^2, skipping summand with m_i=0 because they do not create any issues if we have 0\in N. Usually we do want to round 0 to 0.
A good alternative is to just look at the Apportionment methods for parliaments and the underlying algorithms and adjust them for the required usage:

Often our requirements may be different and especially lower in terms of the quality of the rounding than for parliament apportionment, but it is a good idea to think twice about this prior to just doing something naïve without thinking.

Question: how are the apportionment methods behaving in terms of absolute and relative error square optimization?

If there is serious interest, I might provide an implementation in some real programming language.

Often we encounter problems similar to this that can be deduced to this. Who has ever thought that parliament apportionment is actually an act of rounding?

Links

Here are some links to interesting pages and papers about algorithms and other topics related to this issue.

Share Button

Rundung mit Summe

English

Häufig muss man einige Zahlen auf einmal runden, aber die Summe soll hinterher „aufgehen“. Das kann man „summenerhaltendes Runden“ nennen, wobei unter Umständen auch die Summe gerundet wird. Der häufigste Fall sind Prozentwerte und wenn die gerundeten Zahlen nicht zusammen 100% ergeben, versteht das niemand. Da kann es schon besser sein, etwas schwieriger verständliche Dinge hinter den Kulissen zu tun. Kurz gesagt ist unsere Frage, ob die Rundung ein Endomorphismus der additiven Gruppe ist. Das bedeutet, dass \bigwedge_{x,y\in M} r(x+y)=r(x)+r(y). Auf den resten Blick sieht das vernünftig aus und die Fehler heben sich ja oft irgendwie weg. Aber wir können für eine gängige Rundungsmethode ein Gegenbeispiel finden: x=1.4, y=2.3, z=3.3. Wir haben x+y+z=7. Jetzt wird das gerundet und ergibt r(x)=1 \wedge r(y)=2 \wedge r(z)=3 \bigrightarrow r(x)+r(y)+r(z)=6 \ne 7. Ein Gegenbeispiel reicht und es ist kein Endomorphismus.

Ist das ein ganz neues Konzept? Es wird ja offensichtlich in fast allen Druckerzeugnissen, die irgendwie Prozentwerte so angeben, für die Gesamtsumme „gemogelt“. Aber es stellt sich heraus, dass das auch hochoffiziell alle vier Jahre passiert. Wir haben irgendwelche Ereignisse, die sich „Wahlen“ nennen und da kann man Parteien oder Listen für ein Parlament eine Stimme geben. Das Parlament hat meistens eine vorgegebene Anzahl von Sitzen, wenn wir mal Spezialitäten wie 5%-Sperrklausel und Überhangmandate in der deutschen Wahlpraxis vergessen, um es noch halbwegs für Logik zugänglich zu halten. Man stelle sich nur vor, was passiert, wenn 40 Parteien antreten und je 2.5% erhalten, um zu sehen, dass das System nicht korrekt ist, im Sinne einer korrekten Software. Nun sollte man meinen, die Parlamentssitze werden einfach im Verhältnis der Stimmen aufgeteilt. Wenn man n Parlamentssitze hat, werden die Stimmenanteile in Prozent auf Vielfache von \frac{100}{n} gerundet und zwar so, dass es am Schluss aufgeht. Es gibt viele Verfahren, um das zu bewerkstelligen und je nachdem, welches Verfahren man auswählt, kann bei denselben Stimmenzahlen die Sitzverteilung geringfügig variieren. Habe ich das Wort „Mogeln“ oben erwähnt? Es muss wohl Zufall sein. Welches Verfahren gerade am gerechtesten ist, wechselt also von Jahr zu Jahr immer wieder ein bisschen.

Wenn wir nicht gerade eine Wahlauswertungssoftware schreiben, ist in den meisten Fällen diese Frage weniger aufgeladen. Es soll in einem Text eine Summe von mehreren gerundeten Werten eingehalten werden und die Rundung soll „einigermaßen vernünftig“ erfolgen. Der oberflächliche Leser wird zufrieden sein, der etwas kritischere auch, weil die Summe stimmt und der Leser, der sich richtig Gedanken macht wird es auch verstehen, dass man da mogeln musste und dass man das einigermaßen sinnvoll getan hat. Man beachte, dass diese Thematik einige Überraschungen birgt, z.B. das Alabama-Paradoxon und das Wählerzuwachsparadoxon.

Wie in Restklassenrundung beschrieben, gehen wir von Mengen M und N \subseteq M mit einer Metrik d: M\times M \rightarrow {\Bbb R_+}, (m, n) \mapsto d(m,n) \ge 0 aus und wollen Abbildungen r : M \rightarrow N betrachten, bei denen d(x, r(x)) klein bleibt und eventuell ein paar weitere Bedingungen erfüllt sein sollen. Da aber die Korrektur schon Teil der Abbildung sein soll, interessiert uns hier eigentlich eher so etwas wie eine Abbildung
s: M^n \rightarrow N^n, (m_1,\ldots,m_n) \mapsto (n_1,\ldots n_n) mit \sum_{i=1}^n m_i = \sum_{i=1}^n n_i und für alle i ist d(m_i,n_i) „klein“. Reine Auf- oder Abrundung kann also nicht die Lösung sein, da das nie aufgeht. Wir könnten die Fehlerquadrate minimieren, was den Vorteil hat, das große absolute Abweichungen sich stärker auswirken, so dass eine gleichmäßige Verteilung der Korrektur besser wegkommt, also z.B.
\sum_{i=1}^n (d(m_i, n_i))^2 ist minimal. Es lassen sich Beispiele konstruieren, wo das nicht eindeutig ist. In dem Fall kann man entweder eine Regel festlegen oder mit Zufallszahlen ermitteln welche der passenden Möglichkeiten zum Zuge kommt. Man kann auch versuchen, die relativen Fehler zu minimieren, also so etwas wie
\sum_{i=1: m_i \ne 0}^n (\frac{d(m_i,n_i)}{d(m_i,m_i)})^2, wobei man die Summanden mit m_i=0 weglassen muss, was nicht stört, wenn 0\in N ist, man also sowieso 0 auf 0 runden will.
Eine Alternative ist, sich einfach an den Sitzzuteilungsverfahren für Parlamente zu orientieren und deren Algorithmen sinngemäß für die hier beschriebene Aufgabe anzupassen:

Oft sind die Anforderungen an die „Qualität“ der Rundung niedriger als bei der Zuteilung von Parlamentssitzen, aber es lohnt sich schon, sich hierüber einige Gedanken zu machen statt einfach irgendwie mit einem naiven Ansatz zu runden.

Frage: Wie verhalten sich die oben aufgeführten Sitzzuteilungsverfahren zur Minimierung der absoluten und relativen Fehlerquadrate?

Bei genügendem Interesse kann ich einmal eine Implementierung für so eine Rundung hinzufügen.

Gelegentlich begegnen uns auch Probleme, die dem hier geschilderten sehr ähnlich sind oder die sich darauf zurückführen lassen. Wer hätte gedacht, dass die Zuteilung von Parlamentssitzen eigentlich eine Rundung ist?

Share Button

Random-Access and UTF-8

Deutsch

It is a nice thing to be able to use random access files and to have the possibility to efficiently move to any byte position for reading or writing.

This is even true for text files that have a fixed number of bytes per character, for example exactly one, exactly two or exactly four bytes per character. Maybe we should prefer the term „code point“ here.

Now the new standard for text files is UTF-8. In many languages each character is usually just one byte. But when moving to an arbitrary byte position this may be in the middle of a byte sequence comprising just one character (code point) and not at the beginning of a character. How should that be handled without reading the whole file up to that position?

It is not that bad, because UTF-8 is self synchronizing. It can be seen if a particular byte is the first byte of a byte sequence or a successive one. First bytes start with 0 or 11, successive bytes start with 10. So by moving forward or backward a little bit that can be handled. So going to a rough position is quite possible, when knowledge of the average number of bytes per character for that language is around.
But when we do not want to go to a rough character position or to an almost exact byte position, but to an exact character position, which is usually the requirement that we have, then things get hard.

We either need to read the file from the beginning to be sure or we need to use an indexing structure which helps starting in the middle and only completely reading a small section of the file. This can be done with extra effort as long as data is only appended to the end, but never overwritten in the middle of the file.

But dealing with UTF-8 and random access is much harder than with bytes. Indexing structures need to be maintained in memory when accessing the file or even as part of the file.

Share Button