PI-Day

When using the arguably stupid American date format, today's date looks like „03/14/15“ instead of 2015-03-14, and 3, 1, 4, 1, 5 are the first few digits of \pi.

Now we all know how to calculate \pi: we just use the Taylor series of \arcsin or \arctan. Unfortunately it does not converge well when used naïvely, but there are tricks to speed up convergence by evaluating \arctan(x) or \arcsin(x) for some x with 0 < x < \frac{1}{2}, which yields a fraction of \pi that can easily be multiplied to obtain \pi itself. This approach is quite reasonable and understandable with a relatively basic mathematical background, if we allow for some inaccuracy and intuition when trying to understand Taylor's theorem. But we can actually do better. \pi is somewhat weird, being not only irrational but even non-algebraic, and also in some other aspects when we investigate it.
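As a small illustration of this classical approach, here is a sketch using Machin's formula \pi = 16\arctan(\frac{1}{5})-4\arctan(\frac{1}{239}), with the \arctan Taylor series evaluated in exact Rational arithmetic; the function names and the number of terms are just chosen for this example, and this is not how long-decimal does it:

# arctan(x) = sum over k of (-1)^k * x^(2k+1) / (2k+1), truncated after a fixed number of terms
def arctan_taylor(x, terms)
  sum = Rational(0)
  terms.times do |k|
    sum += (-1)**k * x**(2 * k + 1) / (2 * k + 1)
  end
  sum
end

# Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
def pi_machin(terms = 30)
  16 * arctan_taylor(Rational(1, 5), terms) - 4 * arctan_taylor(Rational(1, 239), terms)
end

puts pi_machin.to_f   # => 3.141592653589793, i.e. correct within double precision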

So here are some cool approaches to calculating \pi that are efficient and beautiful, but require a bit more advanced math or trust in those who provided it (which should be ok for most people, but is a no-go for mathematicians):

AGM

Use the arithmetic-geometric mean, also briefly mentioned in Geometric and Harmonic Rounding:

The arithmetic-geometric mean is included in long-decimal, and long-decimal uses the AGM method for calculating \pi.
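To give an idea of why the AGM is so attractive here, this is a sketch of the Gauss-Legendre (Brent-Salamin) iteration, one well known AGM-based algorithm, in plain double-precision floats; it roughly doubles the number of correct digits per iteration. long-decimal of course works with arbitrary precision internally, and this sketch does not claim to show its exact variant:

def pi_agm(iterations = 4)
  a = 1.0
  b = 1.0 / Math.sqrt(2)
  t = 0.25
  pow2 = 1.0
  iterations.times do
    a_next = (a + b) / 2          # arithmetic mean
    b = Math.sqrt(a * b)          # geometric mean
    t -= pow2 * (a - a_next)**2
    pow2 *= 2
    a = a_next
  end
  (a + b)**2 / (4 * t)
end

puts pi_agm   # close to 3.141592653589793, limited only by double precision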

Elliptic Integrals

Elliptic integrals are somewhat weird, because they are easy to write down, but not really easy to solve. Numerical approximations to them can also help to find the value of \pi to some desired precision. Since elliptic functions, elliptic curves and elliptic integrals have fascinated mathematicians (and, at least in the case of elliptic curves over finite fields, also computer scientists) a lot, this must be cool.

Other

If you just want to calculate 10000 digits on your computer (should be Linux or something with Ruby preinstalled), just type:

gem install long-decimal
ruby -e 'require "long-decimal";puts(LongMath.pi(10000))'

That should give you some bedtime reading for \pi-day. Enjoy it.
And please, do not memorize \pi to 10000 digits. It is possible, but there is stuff that is more worthwhile to memorize.

And please, do use a decent date format, even if you are American.

Why I still like Ruby

Some years ago Ruby in conjunction with Rails was absolutely hyped. In the Rails User Group in Zürich we had meetups with 30 people every two weeks. The biweekly meetings still take place, but often we are just five to ten people now. Is the great time of Ruby over or is it just a temporary decline? Is Perl 6 now taking over, since it is ready?

Perl 6 is cool and could challenge Ruby, and it is getting closer and closer to production readiness. This will happen next year. This is what we will say next year as well. But Perl 6 is, or will be, very cool, stay tuned… What about Perl 5? At least Allison Randall still likes Perl (5). But it has other areas of strength than Ruby, so I will keep this opinionated writing focused on Ruby.

But the hype is right now in the area of JavaScript.

There are approaches to use server side JavaScript, which are interesting, because JavaScript needs to be dealt with on the client anyway and then it is tempting to have the same language on both sides.

There are also approaches to move functionality back from the server to the client and to make the server less relevant; at least the client's share of the development effort grows while the server's share shrinks.

There are cool frameworks for client development in JavaScript, and in conjunction with HTML5 a lot can be achieved with these frameworks.

On the other hand we are exploring NoSQL databases like MongoDB, which uses JavaScript with some extension library as its query language.

The general hype is towards functional programming, and voilà, JavaScript is functional. It is the most functional of the languages that have been established for a long time; it brought the functional paradigm to every developer and even to the more sophisticated web designer, under the radar and long before we knew how cool functional programming is. On the other hand, Ruby (and by the way Perl and especially Perl 6) allows programming according to the functional paradigm as well, if used appropriately. But in JavaScript it is slightly more natural.

Then we have more contenders. Java is not dying any time soon and it is still strong with its web frameworks, JavaServer Faces and a million others, which are actually not so bad.

And the idea of Rails, or at least its name, has been taken up by the Groovy guys to develop Grails, which integrates nicely with a Java enterprise backend.

And the more functional languages like F#, Scala, Haskell, Erlang and Elixir (and some others for those who do not find Haskell theoretical enough) are around and gaining or retaining some popularity. Will Scala support the Erlang VM as an alternative backend, like JavaScript and the JVM are supported today (and dotnet-CLI in the past)? And Scala has a promising web framework with Play, which does web services right, which is what is actually needed for the modern client-side JavaScript frameworks, just as an example.

PHP was always the ugly baby: why did they implement a second Perl based on the subset of Perl they understood? Well, I am not advocating PHP at all, but there are some pretty decent PHP applications that work really well under high load, like Wikipedia.

The unequal twin of Ruby, Python, is doing pretty well too, with a web framework called Django. I have not investigated it at all, but it seems to exist.

So, where are Ruby and Rails? Ruby is a beautiful language that has done many things better than any of the languages mentioned here:

It is easy to learn.

It has a nice, complete and consistent concept of object orientation.

It allows all the functional programming, most of it in a pretty natural way.

It has decent numerical types by default: Integers grow to arbitrary length, Rationals are included and Complex numbers are included. JavaScript does not even have integers; it just has one numerical type, double-precision floating point. (See the short demo after this list.)

It allows extending, even operator overloading for new numerical types.

And it has a useful and easily accessible web framework in Rails, and actually some contenders as alternative web frameworks, like Camping, Sinatra and others.

And the development effort and the amount of code needed for a certain functionality are still lower than in Java or C# or COBOL.
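Just as a tiny illustration of the numeric types mentioned above, an irb-style sketch:

puts 2**100                            # => 1267650600228229401496703205376 (Integers grow as needed)
puts Rational(1, 3) + Rational(1, 6)   # => 1/2
puts Complex(1, 2) * Complex(3, 4)     # => -5+10i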

I do believe that web applications that are mostly server based and use just HTML or minimal JavaScript still have their place and will be rediscovered as useful for many application areas.

To conclude, I think that Ruby did not make it to become the replacement for Java that some saw in it and that seemed due according to how other mainstream languages have been replaced and moved to legacy in the past. But I do believe that it will play an important role for some years to come, beyond the current JavaScript hype. Of course it will be challenged again in the future and maybe something will replace the mainstream and make Ruby on Rails just another legacy area for web applications. But I do not even have an idea what that will be. I do not think it will be JavaScript. Perl 6, Scala, F# or Clojure do have the technical potential to do so, but I do not see that happening in the near future. Maybe something new will come up? Or something old? Just remember, Ruby is about as old as Java.

We will see.

Read more in this discussion on Quora.

For those who are interested, I actually teach Ruby and it is possible to arrange trainings for two to five days, depending on the previous knowledge of the participants and the goals.

Indexing of long utf-8 files or strings

The UTF-8 format has the disadvantage that the length of characters and code points varies. Accessing a given position, counted in characters, is only possible by scanning from the beginning or by providing an indexing structure. It is a good idea to find a balance between size and speed. So indexing blocks of a few kilobytes and scanning through them to reach the exact desired position should achieve both goals. The ideas described here can also be applied to compressing a long string or file by compressing these multi-kilobyte blocks separately. In a way UTF-8 itself is a compression scheme, allowing many of the four byte long code points to be stored in one, two or three bytes. Let us just assume we have characters and bytes; it could equally be arbitrary-length numbers and bytes. And let us talk about compressed and uncompressed blocks.

Grouping and indexing can be done by putting approximately a certain number of bytes into each compressed block, probably the largest number of whole characters that fit into the block. Now blocks can start at multiples of their maximum size, and for each block we keep meta information about its position in the chain of blocks, its uncompressed size and the position of its first character in the totally uncompressed file or string. If the indexing structure is kept in memory only, this is quite easy to achieve. If it is stored in the file, one of several approaches might be chosen: add a separate indexing file to each „compressed“ text file, add the meta information at the beginning of the file, thus limiting the number of blocks at creation time, or add the meta information to the end of the file. Or, in some file systems, put it into a second stream, which seems to be the cleanest approach, but is only applicable for some file systems that are not mainstream in the Unix and Linux world.
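To make the in-memory variant concrete, here is a small sketch of such an index for a UTF-8 string held in memory; the block size and the function names are made up for this example:

BLOCK_SIZE = 4096   # characters per block, an arbitrary choice for this sketch

# Record the byte offset of every BLOCK_SIZE-th character.
# A byte starts a character unless it is a UTF-8 continuation byte (10xxxxxx).
def build_index(utf8_string)
  index = []
  char_count = 0
  utf8_string.each_byte.with_index do |byte, byte_pos|
    next if (byte & 0xC0) == 0x80          # skip continuation bytes
    index << byte_pos if char_count % BLOCK_SIZE == 0
    char_count += 1
  end
  index
end

# Random access read: jump to the block via the index, then scan at most one block linearly.
def char_at(utf8_string, index, char_pos)
  block = char_pos / BLOCK_SIZE
  byte_offset = index[block]
  # 4 * BLOCK_SIZE bytes are always enough to cover one block of characters
  utf8_string.byteslice(byte_offset, 4 * BLOCK_SIZE)[char_pos % BLOCK_SIZE]
end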

The other way around, the uncompressed data can be grouped into blocks and then stored in a dense way. Then the meta information needs to contain the address of the first byte of each block. Both approaches seem reasonable.

It should be observed that random access reading is possible with this approach, but random access writing is doomed to fail, so it cannot easily be supported, unless the blocks can be stored anywhere and the meta information contains everything: current block size in bytes, current block size in characters, current location of the first byte, position of the block in the chain, number of the first character. When inserting characters, all meta information needs to be updated and the block in which the change happens needs to be rewritten completely, but there could be ways to achieve this.

The question arises whether this has been standardized in some way. I have not heard of such a standard, but I am observing that dealing with UTF-8 is often a mess or that this issue is just neglected. Databases have of course answered these kinds of questions quite well in their way, but I think there could be a useful standard for compressed random access text files, maybe even supported by the libc or by the POSIX standard.

If a typical compression is applied, it is possible to keep a common dictionary for frequently occurring blocks and their encoding in one place of the file and to store only the encoded data in the actual blocks, avoiding duplication. That would require a third area for storing data.

Geometric and Harmonic Rounding

Most of our rounding methods are based on arithmetic rounding.
Please refer to the nomenclature described in Residue Class Rounding. We can base our rounding on a metric, and the most obvious choice is the absolute value of the difference of the two parameters, d(x,y)=|x-y|. That is what everybody does, and nothing else really comes to mind when we are talking about rounding. But this approach is based on the additive structure of the field or ring containing the numbers that we want to round. Another way of looking at this is that (in case we are talking about real numbers or some subset thereof) the choice of the rounded value is mostly based on seeing on which side of a boundary between the two candidates for the rounded value the original value lies, with special handling if it is exactly equal to this boundary. Now this boundary is the arithmetic mean of the two candidates, therefore we call this arithmetic rounding. Just to give you an example:

We look at the „middle“ between two allowed numbers, for example usually something like 2.5 when rounding a number between 2 and 3 to integers, and automatically assume that above that middle we round up, below it we round down, and exactly on the boundary we need additional rules. This middle is the arithmetic mean of the two neighboring allowed numbers, hence we can call this arithmetic rounding. All numbers x with 2 \le x < 2.5 are rounded to two, all numbers x with 2.5 < x \le 3 are rounded to three.

Means

We assume M=\Bbb R and N=\Bbb Z, so each real number should be rounded to an integral number. Also we assume d(x,y)=|x-y|.

Now assume we have a non-integral number x \in \Bbb R and an integral number n\in \Bbb Z such that n < x < n+1. For convenience we write n'=n+1. Then we round x to n if x < n+0.5 and to n' if x > n+0.5, so n+0.5 = \frac{n+n'}{2} serves as the boundary. Now there are many other ways to define a mean of two numbers n and n'. Some of the most common are the arithmetic mean a(n,n')=\frac{n+n'}{2}, the geometric mean g(n,n')=\sqrt{nn'}, the harmonic mean h(n,n')=\frac{2nn'}{n+n'}, the quadratic mean q(n,n')=\sqrt{\frac{n^2+n'^2}{2}} and the cubic mean c(n,n')=\sqrt[3]{\frac{n^3+n'^3}{2}}, which are all special cases of the power mean m_p(n,n')=\left(\frac{n^p+n'^p}{2}\right)^{\frac{1}{p}}, with m_0 defined as the limit for p \to 0, which turns out to be the geometric mean.

We could also define an arithmetic-harmonic mean in a similar way to the arithmetic-geometric mean, but that is just the geometric mean. It is an interesting way to approach the geometric mean of matrices, which is otherwise somewhat hard to define because of the non-commutative multiplication.

For the users of spreadsheets, as included in LibreOffice, OpenOffice or MS Office, the arithmetic mean can be found as the function AVERAGE, the geometric mean as GEOMEAN and the harmonic mean as HARMEAN.
Other means can be defined along these lines. And we can see that (a small code sketch of m_p follows the list below):

  • a(n,n')=m_1(n,n')
  • g(n,n')=m_0(n,n')
  • h(n,n')=m_{-1}(n,n')
  • q(n,n')=m_2(n,n')
  • c(n,n')=m_3(n,n')
  • \min(n,n')=m_{-\infty}(n,n')
  • \max(n,n')=m_\infty(n,n')
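A minimal sketch of this power mean m_p in plain floats (the function name is made up for this example, and only positive arguments are meant):

def power_mean(p, x, y)
  x, y = x.to_f, y.to_f
  return Math.sqrt(x * y) if p == 0     # m_0 is the limit case: the geometric mean
  ((x**p + y**p) / 2)**(1.0 / p)
end

power_mean(1, 2, 3)    # => 2.5      (arithmetic mean)
power_mean(0, 2, 3)    # ≈ 2.4495    (geometric mean)
power_mean(-1, 2, 3)   # ≈ 2.4       (harmonic mean)
power_mean(2, 2, 3)    # ≈ 2.5495    (quadratic mean)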

Here is a visual explanation of arithmetic, geometric and harmonic mean:

[Image: Trapez_Harmonicus]

The line x from K to L has the harmonic mean of a and c as its length. The geometric mean would be reached by moving that line to a position where the upper and the lower part have the same shape, apart from scaling. And the arithmetic mean would be reached by moving K and L to the middle points of the lines AD and BC, respectively.

The ruby-gem long-decimal supports all of these methods for calculating a mean.

Using means to define rounding methods

Just to give you an idea that this can be extended further, the geometric and the harmonic mean are now considered.
As can be seen, negative and zero values for n and n' cause some trouble or require some care. We can deal with this by splitting our unrounded and rounded worlds into separate parts; here we take the numbers < 0, the set \{0\} and the numbers > 0 as three different sets and do the rounding within them:

    \[M=M_1\cup M_2\cup M_3\]

with

    \[\bigwedge_{i\ne j}M_i\cap M_j = \emptyset\]

where we define

    \[M_1=\{m\in M : m < 0\}\]

and

    \[M_2=\{0\}\]

and

    \[M_3=\{m\in M : m > 0\}.\]

In a similar way we define N_1, N_2 and N_3 and define metrics d_1, d_2, d_3 separately. Then we just look at one partition i and consider rounding to be a mapping

    \[r_i: M_i\rightarrow N_i, x\mapsto r_i(x),\]

such that d_i(x, r_i(x)) is minimal.

The obvious convention of rounding 0 to 0 relieves us from dealing with logarithms of zero or division by zero. We see that any number between 0 and 1 is rounded to 1 for both harmonic and geometric rounding, because the boundary is 0, if we pick the right variant of the formula for calculating the mean. Likewise, between -1 and 0 everything is rounded to -1 in our example. Otherwise the calculation of the boundary can be made to work, because then both factors are negative, so we take the square root of a positive product, but we need to replace the square root by its negative counterpart.

So for geometric rounding we define that x is rounded to n if x < g(n,n'), rounded to n' if x > g(n,n'), and rounding for x=g(n,n') has to be defined by additional rules. But we define g(n,n') for negative values of n and n' to be -\sqrt{nn'}. This can also be expressed in terms of a metric:

    \[d_1(x,y)=|\log(-x)-\log(-y)|\;\; (x, y < 0),\]

    \[d_2(x,y)=0 \;\;(x=y=0)\]

and

    \[d_3(x,y)=|\log(x)-\log(y)| \;\; (x,y > 0).\]
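A minimal sketch of geometric rounding for non-negative values along these lines (the function name is made up, and ties simply round up here):

def round_geometric(x)
  raise ArgumentError, 'this sketch only handles x >= 0' if x < 0
  return 0 if x == 0
  n = x.floor
  return 1 if n == 0                    # boundary g(0, 1) = 0, so all of (0, 1) rounds to 1
  boundary = Math.sqrt(n * (n + 1))     # geometric mean of the two neighboring integers
  x < boundary ? n : n + 1
end

round_geometric(2.44)   # => 2  (the boundary is sqrt(6), approximately 2.449)
round_geometric(2.46)   # => 3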

Harmonic rounding is definable in a similar way:
We define that x is rounded to n if x < h(n,n'), rounded to n' if x > h(n,n') and rounding for x=h(n,n') has to be defined by additional rules. This can also be expressed in terms of a metric:

    \[d_1(x,y)=|\frac{1}{x}-\frac{1}{y}|\;\; (x, y < 0),\]

    \[d_2(x,y)=0 \;\; (x=y=0)\]

and

    \[d_3(x,y)=|\frac{1}{x}-\frac{1}{y}| \;\; (x,y > 0).\]

Quadratic and cubic rounding can be defined in a similar way. It would even be possible to go for arithmetic-geometric rounding or harmonic-geometric rounding, but that would imply an approximative iterative calculation of the boundary each time it is needed, while the rounding modes described above can still be achieved with relatively simple multiplication and addition operations.

In any case it is necessary to define the rule how to round numbers that lie exactly on the boundary, unless we can prove that the boundary is irrational.

That is the case for geometric, quadratic and cubic rounding, because if we have adjacent integers n and n+1, we see that g(n, n+1)=\sqrt{n(n+1)} is irrational for n>0: n and n+1 cannot both be squares, so there is a prime number that divides either n or n+1 with an odd multiplicity, which already ensures that the square root is irrational. For the quadratic mean we have q(n,n+1)=\sqrt{\frac{n^2+(n+1)^2}{2}}, where the numerator of the fraction is odd, so two occurs with an odd multiplicity, which makes it irrational again. A similar argument applies for the cubic mean. For the harmonic mean we have h(n, n+1)=\frac{2n(n+1)}{2n+1}, which is rational; when rounding numbers with a finite decimal representation the boundary can only be hit if 2n+1 is a power of 5, in which case we have to deal with the boundary issue.

The ruby-gem long-decimal supports geometric, harmonic, quadratic and cubic rounding in addition to the regular arithmetic rounding, that is referred to as „half“.

Harmonic and geometric rounding are important for some of the mechanisms of rounding with sum.
On this basis those algorithms will look more natural and no longer like some weird recipe that needs to be copied from some untrusted source just to get over the nightmare of having to solve that issue.

The language issue

Since I did not get a lot of feedback, possibly because WordPress refused to accept the votes, I can now see that continuing the blog in English or German seems to be equally popular or unpopular. I have provided the articles of almost three months almost completely in both languages, but now I intend to concentrate on the English version and provide a German translation only for some articles, based on how much time I have, how much necessity I see for this and how many readers the English article gets.
If I do get a lot of votes, I will reconsider this decision.

Comment Functionality

The comment function does not seem to work after some software upgrades.
It is necessary to check the write permissions on the directory /wp-content/plugins/si-captcha-for-wordpress/captcha/cache; then it will work again.
So you can write comments again; I will only reject spam.

Comment Function

The comment function gets lost with every WordPress upgrade.
For it to work, write permissions are needed in the directory
/wp-content/plugins/si-captcha-for-wordpress/captcha/cache.
This has to be checked after every software update.
Now the comment function should work again.

Monads in Ruby

Monads? Easy!

A Monad over X simply is a Monoid in the Category of Endofunctors of X.

Monads? Really!

  • Monads are a concept in mathematics
  • Algebra is an area of mathematics
  • Category theory is an abstraction of algebra.
  • Monads are defined in category theory
  • Remember: in category theory we talk about infinity beyond cardinality of infinite sets…
  • The „idea“ has been transplanted into something used in programming languages…
  • Just think of stars in terms of astronomy, in terms of decoration and in terms of music or movies

Monad as Design Pattern

  • Actually Monads should be viewed as a design pattern
  • That describes the level of abstraction

Motivation

  • Pure functional programmers are poor guys when it comes to state
  • State is evil
  • State is not possible, or at least hard to handle
  • I/O is state, actually a segment of the outer world is manipulated
  • Monads encapsulate that
  • Even if you are happy with state (poor IT theology…):
    Multilevel nil problem 🙁

Monads in Functional Programming

  • Container type
  • wrap(foo):
    • class method
    • can also be called unit
    • Btw. arrogant Haskell guys call it return
  • pass(&block)
    • instance method
    • Can also be called bind
  • Optionally more operations (mjoin, empty, +,…)

Identity Monad

class Identity
  def initialize(value)
    @value = value
  end
end

# wrap (also called unit): put a plain value into the monad
def Identity.wrap(value)
  new(value)
end

class Identity
  # pass (also called bind): hand the wrapped value to the block
  def pass
    yield @value
  end
end

Axioms

  • Left-identity for pass:
    Calling pass on a newly wrapped value is the same as applying the block to that value
  • Right-identity for pass:
    Calling pass with a block that only calls wrap on its parameter results in the same as its target object
  • Nesting:
    Nesting pass blocks should be equivalent to calling them in sequence.

Does our Monad survive 1st law?

Identity.wrap(foo).pass do |value|
  f(value)
end

Is equivalent to

f(foo)

That is how we defined pass.

Does our Monad survive 2nd law?

bar.pass do |value|
  Identity.wrap(value)
end

Is equivalent to

bar

That is how we defined pass.

Does our Monad survive 3rd law?

bar.pass do |value_a|
    f(value_a)
end.pass do |value_b|
    g(value_b)
end

Is equivalent to

bar.pass do |value_a|
  f(value_a).pass do |value_b|
    g(value_b)
  end
end

That is how we defined pass.

Array as Monad

def Array.wrap(value)
  [ value ]
end

class Array
  # flatten one level: [[1], [2, 3]] becomes [1, 2, 3]
  def mjoin
    inject([]) { |combined, arr| combined + arr }
  end

  # map the block over the elements and flatten one level
  def pass(&block)
    map(&block).mjoin
  end
end

Does it work?

Have a look yourself…
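For example, the three laws can be checked in irb with the Array monad above (f and g are made-up example functions):

f = ->(x) { [x, x + 1] }
g = ->(x) { [x * 10] }

Array.wrap(3).pass { |v| f.call(v) }                   # => [3, 4], same as f.call(3)
[3, 4].pass { |v| Array.wrap(v) }                      # => [3, 4], the original array again
[3, 4].pass { |v| f.call(v) }.pass { |v| g.call(v) }   # => [30, 40, 40, 50]
[3, 4].pass { |v| f.call(v).pass { |w| g.call(w) } }   # => [30, 40, 40, 50], the same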

More Monad Operations

Empty:

def Array::empty
    []
end

+:

Already there

Other Examples

  • Option (Some, None), see the sketch below
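Just to give an idea, an Option monad could be sketched in the same style like this (the class and method names are only chosen for illustration):

class Option
  def self.wrap(value)
    value.nil? ? None.new : Some.new(value)
  end
end

class Some < Option
  def initialize(value)
    @value = value
  end

  def pass
    yield @value          # the block is expected to return an Option again
  end
end

class None < Option
  def pass
    self                  # nothing to pass on, stay None
  end
end

# Chained calls stop at the first nil instead of raising NoMethodError:
Option.wrap(" pi ").pass { |s| Option.wrap(s.strip) }.pass { |s| Option.wrap(s.upcase) }
# => a Some wrapping "PI"
Option.wrap(nil).pass { |s| Option.wrap(s.strip) }.pass { |s| Option.wrap(s.upcase) }
# => a None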

Links & Sources

This article is based on a talk given at the Ruby on Rails User Group Switzerland.

Swiss Perl Workshop 2015 in Olten

Please reserve the date in your calendar:
The Swiss Perl Workshop will take place on the 28th and 29th of August 2015 in Olten. The location can easily be reached by public transport, being only a 5 minute walk from the railroad station, which is served by 500 trains per day.
So if you are using the Perl programming language as part of your work or for private interest, this event is for you. More information will follow as soon as the web page is online.

Rounding with sum

Often we have to round some numbers in such a way that the sum is correct afterwards. The most obvious case is maybe with percentage values, where a sum other than 100% is confusing for readers. But we may even have cases where a calculation becomes incorrect if some constraint is violated. Anyway, it can be useful to do some more advanced stuff behind the scenes to fulfill such a constraint. Now we could hope for the rounding operation to be an endomorphism of the additive group, that means \bigwedge_{x,y\in M} r(x+y)=r(x)+r(y). At first glance it makes sense, errors often compensate each other to make this true, but we can find counterexamples for our usual rounding methods. Just think of the numbers x=1.4, y=2.3, z=3.3. We have x+y+z=7. Now we use a typical rounding method and we get r(x)=1 \wedge r(y)=2 \wedge r(z)=3 \Rightarrow r(x)+r(y)+r(z)=6 \ne 7. So it is no endomorphism; one counterexample is enough.

Is this any kind of new concept? Obviously we find printed media or web pages that show percentage values in such a way that they cheated for the partial values to add up to the anticipated sum. But it shows that such things happen officially every four years. We have events that are called „elections“ and parties or lists can be voted for. The parliament usually has a fixed number of members, and if we disregard specialties that make the whole game more complex in some countries, in order to keep it accessible to logic, it is quite the same thing. Parliament seats should be distributed according to the percentages of the popular votes. If we have n seats in the parliament, we have to round the exact percentages (as fractions of long integers) to multiples of \frac{100}{n} in such a way that the sum is correct. There are many established procedures for achieving this, and depending on the procedure the result may differ a little bit. Did I mention the word „cheating“ above? No, we are not talking about cheating by deliberately counting wrong, of course nobody ever wanted to do something like that, but about ways to enforce an end result that uses up exactly the given number of parliament seats. Which procedure is considered to be the fairest changes a little bit almost every other year. Interestingly an equivalent issue occurs when distributing legislative seats to regions. Even in the United States, where congressmen are elected by getting the plurality of votes in their district, this is applied to deciding how many congressmen will represent each state, in this case with the constraint that each state should have at least one congressional seat (plus two senators).

If we do not need to write election software, the question is often less delicate. For example a text should contain the sum of some rounded values in a reasonably accurate way. The majority of the readers will be happy. The very critical reader, who knows anyway what I am writing here, will be happy as well, if it has been done well enough with the inevitable cheating in the last digits. It should be observed that this seemingly trivial issue is quite involved, for example see the apportionment paradox.

As described in residue class rounding, we define sets M and N \subseteq M with a metric d: M\times M \rightarrow {\Bbb R_+}, (m, n) \mapsto d(m,n) \ge 0 and want to deal with maps r : M \rightarrow N having d(x, r(x)) kept small and possibly some other constraints. But since the correction („cheating“) should be part of the mapping, we are here rather looking at a mapping
s: M^n \rightarrow N^n, (m_1,\ldots,m_n) \mapsto (n_1,\ldots,n_n) with \sum_{i=1}^n m_i = \sum_{i=1}^n n_i and for all i the value of d(m_i,n_i) „small“. We could say that the sum has to be rounded as well, but often it is already rounded. Think about the details of your special case when they occur and read the concepts… Purely rounding up or down will not work unless we hit an exact edge case. We could desire to minimize error squares, which has the advantage that larger deviations have a larger effect, encouraging an equal distribution of the corrections; for example we want
\sum_{i=1}^n (d(m_i, n_i))^2 to be minimal. It is possible to find examples where the minimum is not unique. In this case a rule can be defined or random numbers can be used to resolve the tie.
Another approach would be to minimize relative errors, like
\sum_{i=1: m_i \ne 0}^n \left(\frac{d(m_i,n_i)}{d(m_i,0)}\right)^2, skipping summands with m_i=0 because they do not create any issues if we have 0\in N. Usually we do want to round 0 to 0.
A good alternative is to look at the apportionment methods used for parliaments, such as the largest remainder method (Hare-Niemeyer) or the highest averages methods (D'Hondt, Sainte-Laguë), and the underlying algorithms, and to adjust them for the required usage.
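As one concrete example, here is a small sketch of the largest remainder method, used to round a list of values so that the rounded values add up to a given total (the function name is chosen just for this example):

def round_with_sum(values, target)
  floors = values.map(&:floor)
  missing = target - floors.sum
  # hand the missing units to the entries with the largest fractional parts
  order = values.each_index.sort_by { |i| -(values[i] - floors[i]) }
  order.first(missing).each { |i| floors[i] += 1 }
  floors
end

round_with_sum([45.6, 31.2, 23.2], 100)   # => [46, 31, 23]
round_with_sum([1.4, 2.3, 3.3], 7)        # => [2, 2, 3], cheating the example from above into a correct sum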

Often our requirements may be different from and especially lower than those for parliament apportionment in terms of the quality of the rounding, but it is a good idea to think twice about this before just doing something naïve without thinking.

Question: how do the apportionment methods behave in terms of absolute and relative error square optimization?

If there is serious interest, I might provide an implementation in some real programming language.

We often encounter problems similar to this one or problems that can be reduced to it. Who would have thought that parliament apportionment is actually an act of rounding?

Links

Here are some links to interesting pages and papers about algorithms and other topics related to this issue.
