How to calculate Square Roots and Cubic Roots

The functions sqrt and sometimes even cbrt are commonly available, but it is nice to see how they can be calculated.

There are several approaches, but the most popular ones are Newton’s method and an algorithmic formulation of how roots are taken manually, for those old enough to have still learned it in school. Measurements that I did many years ago showed that the Newton approximation is slower, but it would be worthwhile to do newer measurements.

So we have an equation y = x^2 or y=x^3 and want to find x, or a well-defined approximation of x, when we know y. Mathematically speaking we assume that y is constant and we want to find an x for which f(x)=x^2-y=0 or g(x)=x^3-y=0. If we guess such an x and then draw the tangent to the curve of the function at the point (x, f(x)) or (x, g(x)), then the intersection point of the tangent with the x-axis can be used as the next approximation. This method converges in the case of these two functions (and some others) and is reasonably fast. Now the tangent has the linear equation

    \[\frac{y-y_0}{x-x_0}=f'(x_0)\]

where y_0=f(x_0) and f'(x)=\frac{df(x)}{dx} is the derivative of f(x). We want to solve this equation for y=0 and thus we get

    \[x-x_0 = -\frac{f(x_0)}{f'(x_0)}\]

and thus

    \[x=x_0-\frac{f(x_0)}{f'(x_0)}\]

As an iteration rule

    \[\bigwedge_{i=0}^\infty x_{i+1}=x_i-\frac{f(x_i)}{f'(x_i)}\]

In case of the square root we can just start with an estimate obtained by shifting the number to the right by half of its bit length, while avoiding zero, which is important because of the division. Then we get for an appropriate n

    \[x_0 = 2^{-n}y\]

    \[x_{i+1}=x_i-\frac{x_i^2-y}{2x_i}=\frac{x_i+y/x_i}{2}\]

The last form is quite intuitive, even without calculus. As mentioned, this converges usefully fast, and there is plenty of math around to describe the behavior, speed, precision and convergence of the calculations performed in this algorithm. Written in Ruby just for integers, this is quite simple. Convergence is simply discovered by the fact that the result does not change any more or starts oscillating between two values, which is sufficient for the purpose of benchmarking:

def sqrt_newton(x)
  if (x == 0) then
    return 0
  end
  # initial estimate: shift x to the right by half of its bit length, but keep it >= 1
  y0 = x
  u0 = x
  while (u0 > 0) do
    y0 >>= 1
    u0 >>= 2
  end
  y0 = [1, y0].max
  yi = y0
  yi_minus_1 = -1
  loop do
    yi_plus_1 = (yi + x/yi) >> 1
    # stop when the iteration becomes stationary or oscillates between two values
    if (yi_minus_1 == yi_plus_1) then
      return [yi, yi_minus_1].min
    elsif (yi == yi_plus_1) then
      return yi
    end
    yi_minus_1 = yi
    yi = yi_plus_1
  end
end

The Newton algorithm tends to oscillate between two approximations, so the termination criterion takes into account y_{i-1}, y_i and y_{i+1} and uses the lower of the two oscillating values. This results in calculating the largest integer y such that y^2\le x and (y+1)^2 > x.
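
For illustration, a few calls of this method; the values follow directly from the code above:

sqrt_newton(0)          # => 0
sqrt_newton(15)         # => 3  (a case where the iteration oscillates between 4 and 3)
sqrt_newton(1_000_000)  # => 1000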

For the cube root we get, for an appropriate n,

    \[x_0 = 2^{-n}y\]

    \[x_{i+1}=x_i-\frac{x_i^3-y}{3x_i^2}=\frac{2x_i+y/x_i^2}{3}\]

Again this is a useful approach, and there is math around on when to stop for a desired precision; read Wikipedia for the convergence issues.
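
Just as a sketch, and not part of the library code discussed below, the Newton iteration for an integer cube root could look like this in Ruby. Note that the simple oscillation check used for the square root is not sufficient here, because the integer cube root iteration can cycle through more than two values, so this sketch starts above the root and stops as soon as the iterate no longer decreases:

def cbrt_newton(x)
  return 0 if x == 0
  # initial estimate: a power of two that is guaranteed to be >= the cube root of x
  y = 1 << ((x.bit_length + 2) / 3)
  loop do
    y_next = (2 * y + x / (y * y)) / 3
    # as long as y is above the integer cube root, the iterate decreases strictly
    return y if y_next >= y
    y = y_next
  end
end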

There is another approach that people used to learn when doing calculations on paper was more important than it is today. For the decimal system it works like this:

   1. 7  3  2  0  5
  -------------------------
√  3.00 00 00 00 00
   1                          1 = 20*0*1 + 1^2
   -
   2 00
   1 89                     189 = 20*1*7 + 7^2
   ----
     11 00
     10 29                 1029 = 20*17*3 + 3^2
     -----
        71 00
        69 24              6924 = 20*173*2 + 2^2
        -----
         1 76 00
               0              0 = 20*1732*0 + 0^2
         -------
         1 76 00 00
         1 73 20 25     1732025 = 20*17320*5 + 5^2
         ----------
            2 79 75
(source: Wikipedia)
We group the digits to the left and to the right of the decimal point in groups of two. The highest possible square of an integral number that is less than or equal to the leftmost group (3 in the example above, giving 1) is used for the first digit of the result. This square is subtracted and the next group is appended (giving 200 in the example). Assuming that y_n is the result calculated so far and x_n is what we have after the subtraction and the appending of the next group, we search for a digit z_n such that u_n = 20\cdot y_n\cdot z_n + z_n^2 \le x_n. z_n is chosen in such a way that it yields the maximum possible u_n which is still \le x_n. Subtracting u_n from x_n and appending the next group allows for the next iteration.
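
As an illustration of the hand method, here is a small sketch in Ruby that performs exactly these steps in base 10 for non-negative integers (the name sqrt_decimal is just chosen for this example):

def sqrt_decimal(x)
  s = x.to_s
  s = "0" + s if s.length.odd?
  groups = s.scan(/../).map(&:to_i)       # digits grouped in pairs, as in the scheme above
  y = 0                                   # result calculated so far (y_n)
  r = 0                                   # current remainder (x_n)
  groups.each do |g|
    r = r * 100 + g                       # append the next group
    z = 9
    z -= 1 while 20 * y * z + z * z > r   # largest digit z with 20*y*z + z^2 <= r
    r -= 20 * y * z + z * z
    y = 10 * y + z
  end
  y
end

sqrt_decimal(3)          # => 1
sqrt_decimal(152399025)  # => 12345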

Now this can be turned into an algorithm. The first approach is to just switch from the decimal system to the binary system. Then in each iteration step we only have to deal with the possible digits 0 and 1, which greatly simplifies the algorithm. Here is a simple Ruby program that does this:

def split_to_words(x, word_len)
  bit_pattern = (1 << word_len) - 1
  words = []
  while (x != 0 || words.length == 0) do
    w = x & bit_pattern
    x = x >> word_len
    words.unshift(w)
  end
  words
end

def sqrt_bin(x)
  if (x == 0) then
    return 0
  end
  xwords = split_to_words(x, 2)
  xi = xwords[0] - 1
  yi = 1
  1.upto(xwords.length-1) do |i|
    xi = (xi << 2) + xwords[i]
    d0 = (yi << 2) + 1
    r  = xi - d0
    b  = 0
    if (r >= 0) then
      b  = 1
      xi = r
    end
    yi = (yi << 1) + b
  end
  return yi
end

It seems that the two solutions yield the same results, but the sqrt_newton outperforms sqrt_bin by a factor of two.
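
Such a comparison can be done with Ruby’s Benchmark module, for example like this (just a sketch; the factor will of course depend on the input sizes and on the machine):

require 'benchmark'

numbers = Array.new(1000) { rand(10**40) }
Benchmark.bm(12) do |bm|
  bm.report("sqrt_newton") { numbers.each { |x| sqrt_newton(x) } }
  bm.report("sqrt_bin")    { numbers.each { |x| sqrt_bin(x) } }
end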

Now we should reconsider whether base 2 is really the best choice. Actually we can use any power of 2 as a base and work with it efficiently. Apart from the initial step, which is done using an extended version of sqrt_bin, the digits of the subsequent steps are estimated by division, and neighboring values are tried to get the exact result. This makes use of the fact that the equation we need to solve is u_n = 2\cdot b\cdot y_n\cdot z_n + z_n^2 \le x_n with the maximum z_n fulfilling it, where b is the base we are working in, which was 10 or 2 above and can now be a power of 2. As soon as y_n\cdot b has a certain size, the influence of z_n^2 becomes less relevant. We can consider the maximum possible value for z_n, which is b-1, and thus solve 2\cdot b\cdot y_n\cdot z_n\le x_n and 2\cdot b\cdot y_n\cdot z_n\le x_n-(b-1)^2, each for the maximum z_n fulfilling the inequality. This can be calculated by simple division. If the range between the two solutions is small enough, each value in the range can be tried to find the accurate solution of 2\cdot b\cdot y_n\cdot z_n + z_n^2 \le x_n, and this is more efficient than working just bitwise.

This method, sqrt_word, seems to outperform sqrt_newton for longer numbers, for example around 60 decimal digits with word_length=16. So the most promising approach seems to be to optimize the implementation and parameters of sqrt_word. The issue of termination, which had to be addressed explicitly in the Newton implementation, is inherently taken care of in this implementation. For more serious analysis it would be interesting to implement the algorithms in C or even in assembly language. So this is the final result for square roots, with some checks added:

def check_is_nonneg_int(x, name)
  raise TypeError, "#{name}=#{x.inspect} must be Integer" unless (x.kind_of? Integer) && x >= 0
end

def check_word_len(word_len, name="word_len")
  unless ((word_len.kind_of? Integer) && word_len > 0 && word_len <= 1024)
    raise TypeError, "#{name} must be a positive number <= 1024"
  end
end

def split_to_words(x, word_len)
  check_is_nonneg_int(x, "x")
  check_word_len(word_len)
  bit_pattern = (1 << word_len) - 1
  words = []
  while (x != 0 || words.length == 0) do
    w = x & bit_pattern
    x = x >> word_len
    words.unshift(w)
  end
  words
end

def sqrt_bin(x)
  yy = sqrt_bin_with_remainder(x)
  yy[0]
end

def sqrt_bin_with_remainder(x)
  check_is_nonneg_int(x, "x")
  if (x == 0) then
    return [0, 0]
  end

  xwords = split_to_words(x, 2)
  xi = xwords[0] - 1
  yi = 1

  1.upto(xwords.length-1) do |i|
    xi = (xi << 2) + xwords[i]
    d0 = (yi << 2) + 1
    r  = xi - d0
    b  = 0
    if (r >= 0) then
      b  = 1
      xi = r
    end
    yi = (yi << 1) + b
  end
  return [yi, xi]
end
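
For illustration, the function returns the integer square root together with the remainder x - y^2; the values follow from the code above:

sqrt_bin_with_remainder(3)          # => [1, 2]
sqrt_bin_with_remainder(152399025)  # => [12345, 0], since 12345^2 = 152399025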

def sqrt_word(x, n = 16)
  check_is_nonneg_int(x, "x")
  check_is_nonneg_int(n, "n")

  n2 = n << 1
  n1 = n+1
  check_word_len(n2, "2*n")
  if (x == 0) then
    return 0
  end

  xwords = split_to_words(x, n2)
  if (xwords.length == 1) then
    return sqrt_bin(xwords[0])
  end

  xi = (xwords[0] << n2) + xwords[1]
  a  = sqrt_bin_with_remainder(xi)
  yi = a[0]
  if (xwords.length <= 2) then
    return yi
  end

  xi = a[1]
  2.upto(xwords.length-1) do |i|
    xi = (xi << n2) + xwords[i]
    d0 = (yi << n1)
    q  = (xi / d0).to_i
    j  = 10
    was_negative = false
    while (true) do
      d = d0 + q
      r = xi - (q * d)
      break if (0 <= r && (r < d || was_negative))
      if (r < 0) then
        was_negative = true
        q = q-1
      else
        q = q+1
      end
      j -= 1
      if (j <= 0) then
        break
      end
    end
    xi = r
    yi = (yi << n) + q
  end
  return yi
end

def sqrt_newton(x)
  check_is_nonneg_int(x, "x")
  if (x == 0) then
    return 0
  end
  y0 = x
  u0 = x
  while (u0 > 0) do
    y0 >>= 1
    u0 >>= 2
  end
  y0 = [1, y0].max
  yi = y0
  yi_minus_1 = -1
  loop do
    yi_plus_1 = (yi + x/yi) >> 1
    if (yi_minus_1 == yi_plus_1) then
      return [yi, yi_minus_1].min
    elsif (yi == yi_plus_1) then
      return yi
    end
    yi_minus_1 = yi
    yi = yi_plus_1
  end
end
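
To gain some confidence that the implementations agree, a quick check like the following can be used; it compares the three functions on random inputs and raises if they ever disagree (an illustrative snippet, not part of the library):

10.times do
  x = rand(10**40)
  a = sqrt_newton(x)
  b = sqrt_bin(x)
  c = sqrt_word(x, 16)
  raise "mismatch for x=#{x}: #{a} #{b} #{c}" unless a == b && b == c
end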

This is the approach that has been built into the LongDecimal library, ignoring Newton. The examples have been added to GitHub.

The algorithms can be extended to cube roots or any higher roots. In this case, the nth root of \sum_{j=0}^{m}a_j b^{n(m-j)} is calculated by starting with the maximal integral number z_0 with z_0^n \le a_0 and then subsequently finding numbers z_j fulfilling an inequality of the form \sum_{k=1}^n \binom{n}{k}(b y_j)^{n-k}z_j^k \le x_j. This is always easy to handle for base two, by just testing the two possible digits. For higher bases and n=3 it involves solving a quadratic equation, once the numbers are high enough to neglect the term z_j^3. For n=4 it is possible to just take the square root of the square root. For higher values of n and bases other than 2 it becomes really difficult to tame this algorithm, so I intend to constrain myself to square roots and cube roots. I have not explored whether it is useful to calculate the cube root with a higher base than 2 and which approach provides the best performance for cube roots. Even the square root calculation can possibly be tuned a bit. Maybe this will be addressed in another article.
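
For base two and n=3 the condition from the paragraph above reduces to testing whether 12y_j^2+6y_j+1 \le x_j, because appending a binary digit 1 turns y_j into 2y_j+1 and the cube grows from (2y_j)^3 to (2y_j+1)^3 = (2y_j)^3 + 12y_j^2 + 6y_j + 1. A bitwise cube root along the lines of sqrt_bin could therefore look like this (a sketch only, reusing split_to_words and check_is_nonneg_int from above; it is not part of the library code):

def cbrt_bin(x)
  check_is_nonneg_int(x, "x")
  if (x == 0) then
    return 0
  end
  xwords = split_to_words(x, 3)
  xi = xwords[0] - 1   # the leading group is >= 1, so the first result bit is 1
  yi = 1
  1.upto(xwords.length-1) do |i|
    xi = (xi << 3) + xwords[i]
    # the two candidates are 2*yi and 2*yi+1; their cubes differ by 12*yi^2 + 6*yi + 1
    d0 = 12 * yi * yi + 6 * yi + 1
    b  = 0
    if (xi >= d0) then
      b  = 1
      xi -= d0
    end
    yi = (yi << 1) + b
  end
  return yi
end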


LongDecimal

Disclaimer: This article is an occasion where you might need some of the presumably useless mathematics that you might have learned in school and university. If this bothers you, maybe you should wait for the next article in about two weeks time.

LongDecimal is a library that I have provided for Ruby. It is available as a Ruby gem. It was originally intended to provide for Ruby something like Java’s BigDecimal. Ruby does have a BigDecimal class, but it is not really the same. For writing finance applications such a class is useful, so I wrote one that covers what Java’s BigDecimal has. It ended up having a lot more, but we will get to that later.

So the general idea is that we do math with a subset of the rational numbers (\mathbb{Q}), namely \mathbb{D} = \{ \frac{x}{10^n} : x \in \mathbb{Z} \wedge n \in \mathbb{N}_0\}. This is not quite the whole truth, because the n actually carries information that we care about, so we would actually define

    \[\mathbb{D} = \{ (\frac{x}{10^n}, n) : x \in \mathbb{Z} \wedge n \in \mathbb{N}_0\}.\]

So we actually want to allow the numerator x to be a multiple of 10, and we use n to express the precision, i.e. how many digits after the decimal point are explicitly part of our number. Having more digits after the decimal point expresses more precision.

Now we try to use mathematical operations +, - and \cdot on \mathbb{D}. It turns out that we have three different cases. The ring operations can be defined without problems, even though \mathbb{D} is not quite a ring, as we will see. But it is good enough for most purposes.

  • (\frac{x}{10^n}, n) + (\frac{y}{10^m}, m) = (\frac{x}{10^n} + \frac{y}{10^m}, \max(n, m))
  • (\frac{x}{10^n}, n) - (\frac{y}{10^m}, m) = (\frac{x}{10^n} - \frac{y}{10^m}, \max(n, m))
  • (\frac{x}{10^n}, n) \cdot (\frac{y}{10^m}, m) = (\frac{xy}{10^{n+m}}, m+n)
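
As a minimal sketch of these three rules (the class DecimalPair and its interface are made up for illustration; this is not the actual LongDecimal API), the pair arithmetic could be written like this:

class DecimalPair
  # represents the pair (unscaled / 10**scale, scale)
  attr_reader :unscaled, :scale

  def initialize(unscaled, scale)
    @unscaled = unscaled
    @scale = scale
  end

  def +(other)
    s = [scale, other.scale].max
    DecimalPair.new(rescale(s) + other.rescale(s), s)
  end

  def -(other)
    s = [scale, other.scale].max
    DecimalPair.new(rescale(s) - other.rescale(s), s)
  end

  def *(other)
    DecimalPair.new(unscaled * other.unscaled, scale + other.scale)
  end

  protected

  # express the unscaled value with the (larger) scale s
  def rescale(s)
    unscaled * 10**(s - scale)
  end
end

DecimalPair.new(250, 2) + DecimalPair.new(3, 1)   # 2.50 + 0.3 = 2.80, i.e. (280, 2)
DecimalPair.new(250, 2) * DecimalPair.new(3, 1)   # 2.50 * 0.3 = 0.750, i.e. (750, 3)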

Addition and subtraction actually lose information about precision if n\ne m, because we might have an input with lower precision and in the end pretend to have a result of the higher precision. But not losing numerical information is considered more important, and implicit rounding should be avoided at all costs, at least for the basic operations.

\mathbb{D} is not a ring, but it is a semiring. The zero is not unique; we seem to have many zeros (0, n). This is not really the problem, because only (0, 0) would act as an additive neutral element anyway. What we lack are additive inverse elements. If we have an element (x, n) with n>0, there is no element (y, m) such that (x,n)+(y,m)=(x+y, \max(n,m))=(0, 0). The distributivity required for a semiring can be seen easily:

  • ((\frac{x}{10^n}, n) + (\frac{y}{10^m}, m))\cdot (\frac{z}{10^l}, l) = ((\frac{x}{10^n}+\frac{y}{10^m})\cdot\frac{z}{10^l}, l+\max(m,n))
  • (\frac{x}{10^n}, n)\cdot (\frac{z}{10^l}, l) + (\frac{y}{10^m}, m)\cdot (\frac{z}{10^l}, l) = (\frac{x}{10^n}\cdot\frac{z}{10^l}+\frac{y}{10^m}\cdot\frac{z}{10^l}, \max(l+m,l+n))

But since we do computer programming and not math and only use math as a tool to help us, it is kind of OK, that it is only a semiring and not a ring, as long as we know it.

Division is a special case, because it is not always possible to express the exact numerical value of the quotient in \mathbb{D}, for example 3.0/7.0 = \frac{3}{7}, where the denominator is not a power of ten. To do such operations, a rule on how to round needs to be provided. This is cumbersome, because it blows up our formulas, so we define a set \mathbb{E}=\{(r, n) : r \in\mathbb{Q} \wedge n \in\mathbb{N}_0\}. Now the quotient of two elements of \mathbb{D} is a member of \mathbb{E}. And we have the rules

  • (\frac{x}{10^n}, n) / (\frac{y}{10^m}, m) = (\frac{x}{10^n} / \frac{y}{10^m}, p(n, m))
  • (r, n) + (s, m) = (r+s, \max(n, m))
  • (r, n) - (s, m) = (r-s, \max(n, m))
  • (r, n) \cdot (s, m) = (rs, n+m)
  • (r, n) / (s, m) = (\frac{r}{s}, q(n,m))

where p and q somehow try to estimate how precise the result of the division might be. The basic idea is to do the whole calculation that includes the division and round the result to the desired number of decimal places after the point and with the rounding mode desired.

Now the power is a hard one. Arbitrary powers can of course be defined and are supported, but most of the time, the exponent is actually an integer. These cases can be defined nicely. For exponents m\ge 0 we actually get a result in \mathbb{D} and for negative exponents m < 0 we get results in \mathbb{E}:

  • \bigwedge_{m\ge 0}:(\frac{x}{10^n}, n)^m = (\frac{x^m}{10^{mn}}, mn)
  • \bigwedge_{m < 0}:(\frac{x}{10^n}, n)^m = (\frac{x^m}{10^{mn}}, mn)

For non-integral exponents, the calculation of powers falls back to Ruby’s built-in power and transforms the elements of \mathbb{D} and \mathbb{E} involved into rational numbers. These are of limited use, but they are provided, work and can be used when needed. There is a more general power function that has additional parameters for the desired rounding mode and number of digits after the decimal point. While this library goes long ways to achieve decent accuracy and speed, there are certainly possible input parameters that will result in extremely long calculation times or results that are much less accurate than claimed. Such examples are „hard“ to find and should not harm the practical usefulness of the library too much. Similar libraries in the Java world like BigDecimal do not even try to calculate powers with arbitrary exponents, and the Ruby builtin library BigDecimal (which is something slightly different) does have its issues when calculating arbitrary powers.

Rounding functions are there to convert a numerical type that is at least viewable as a subset of \mathbb{R} to \mathbb{D}. The actual rounding has to be implemented, but it has been done for \mathbb{D}, \mathbb{E} and the built in types of Ruby except for Complex (\mathbb{C}). For complex numbers, the real and the imaginary part are rounded and stuffed into a new complex number.

Rounding needs two pieces of information: the desired precision (number of decimal places after the decimal point) and the rounding mode. There are different methods for rounding, but they all follow the same basic rules. A special case is round_to_allowed_remainders, which does a residue class rounding.

There are many rounding modes. Rounding can be towards 0, away from 0, towards infinity or towards negative infinity. This boils down to cutting off all but the first n digits after the decimal point (or appending zeros) and possibly adjusting the result by one if the cut-off part contained anything but zeros. Other rounding modes take the mean between the two adjacent result candidates and decide by that which one to take, requiring an extra rule for the case that the value that needs to be rounded happens to be exactly on the border.
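
A small sketch of a few of these modes applied to a Rational, independent of LongDecimal’s actual interface (the function name and the mode symbols are made up for illustration):

def round_to_scale(r, n, mode)
  scaled = r * 10**n                   # shift so that only rounding to an integer is left
  lower  = scaled.floor
  upper  = scaled.ceil
  rounded =
    case mode
    when :floor   then lower                         # towards negative infinity
    when :ceiling then upper                         # towards positive infinity
    when :down    then scaled >= 0 ? lower : upper   # towards zero
    when :up      then scaled >= 0 ? upper : lower   # away from zero
    when :half_up                                    # nearest, ties rounded away from zero
      frac = scaled - lower
      if frac > Rational(1, 2)
        upper
      elsif frac < Rational(1, 2)
        lower
      else
        scaled >= 0 ? upper : lower
      end
    else
      raise ArgumentError, "unknown rounding mode #{mode}"
    end
  Rational(rounded, 10**n)
end

round_to_scale(Rational(3, 7), 4, :half_up)   # => (2143/5000), i.e. 0.4286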

Generalized powers and all functions that return something irrational, like square roots, cube roots, exponential functions, logarithms and in the future also trigonometric functions, need to be calculated with the number of digits required and a rounding mode. Currently square roots (sqrt) and cube roots (cbrt) are calculated accurately according to these rounding parameters. For the transcendental functions (logarithms, exponential functions, power, trigonometric functions) minor deviations from the mathematically accurate result are still possible. Since the major usage of the library is expected to deal with the basic operations only, this is considered acceptable. To really work with the transcendental functions, using interval arithmetic in conjunction with LongDecimal would anyway be a better way, so the necessary guarantee would be to provide a result that is close, but guaranteed to be lower or equal than the real mathematical result, and one that is guaranteed to be greater or equal. Progress in this area is not going to happen very soon, unless someone volunteers to help with this or to sponsor the development.

Also it might be interesting to port this library to other languages, even to Java, because it has become much more sophisticated than Java’s BigDecimal library. Again this is unlikely to happen too soon without any help.

The current priority is to keep this library working with recent Ruby versions and to add the missing trigonometric functions.

Install it with
gem install long-decimal
and then use it in your code with
require "long-decimal"

A remark for people who are mathematically inclined: The definition of the natural numbers \mathbb{N} is not totally universal. Sometimes we have \mathbb{N} = \{0, 1, 2, 3, 4,\ldots\} and sometimes we have \mathbb{N} = \{1, 2, 3, 4,\ldots\}. To avoid this, I am using \mathbb{N}_0 = \{0, 1, 2, 3, 4,\ldots\}, even though the index _0 is kind of ugly. I agree with Dijkstra that we should prefer to include the 0 in the natural numbers.
Another remark for mathematically interested readers: If we were defining \mathbb{D}=\{ \frac{x}{10^n} : x \in \mathbb{Z} \wedge n \in \mathbb{N}_0\}, we would actually have a ring. If we now replaced 10 with a prime number p, we would approach the realm of p-adic numbers (\mathbb{Q}_p). This is well worth supporting by a library as well, but it is quite a different story and of course only of interest to a small group who actually knows p-adic numbers and works with them.


TruffleRuby

The language Ruby is one of the most beautiful languages. A lot of things can be done, it has a good level of abstraction, it has chosen some very good defaults, it has provided some great ideas that I have not discovered in any other language that I know well, and it provides a lot of flexibility. But I could no longer recommend it for projects that might require good performance. I won’t go into the issue of static vs. dynamic typing here. Ruby is following the dynamic typing path, and if you think that is a bad idea at all, then it will never become your favorite. But this is an issue with pros and cons. The big disadvantage of Ruby is that it is not very good in terms of performance. The single-threaded performance of many comparable languages like Java, Python, Scala, Clojure, C, C#, F# and some others is somewhat better. And it gets worse when we want to use multiple cores, because Ruby does not run them simultaneously, but uses a global lock which ensures that only one thread at a time is running. Or, in case of JRuby, it just crashes or yields wrong results for certain multithreaded programs that we could write.

One approach is to go for immutability as a default, which allows quite painless multithreading. Scala and Clojure follow this route, for example. It is hard to write good code with this constraint or to make good use of very local mutability without leaking it outside, but under these conditions multiple threads are just working fine without deadlocks, crashes or falsified results. Another approach is to just copy structures and leave its own copy to each thread. There are ways to do a lot on this path, but the copying costs a lot of memory and performance and it is not always a gain.

Now Ruby relies heavily on mutable structures for strings and collections. It is not reasonable to go for a total paradigm change in this respect. But there are some ways to get good, safe and fast operations on these collections and strings without breaking this. One idea is to work with chunks of collections or strings. For strings, the string that we are working with is described as a concatenation of such chunks. Many operations can be done by just concatenating multiple strings together and possibly replacing one of them with a copy that is made as needed. This structure is called a Rope. A similar approach can be applied to collections. Then a smart locking mechanism is applied to the shorter strings or collections when needed, but many operations can avoid locking or block much less of the structure.
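
To illustrate the idea with a toy sketch of the concept (this is not TruffleRuby’s implementation), a rope represents a string as a tree of concatenations, so that concatenation itself copies nothing and the flat string is only materialized when it is really needed:

class RopeLeaf
  def initialize(s)
    @s = s.dup.freeze            # an immutable chunk of character data
  end

  def length
    @s.length
  end

  def +(other)
    RopeNode.new(self, other)
  end

  def to_s
    @s
  end
end

class RopeNode
  def initialize(left, right)
    @left = left
    @right = right
    @length = left.length + right.length
  end

  attr_reader :length

  def +(other)
    RopeNode.new(self, other)    # concatenation allocates one node, no copying of characters
  end

  def to_s
    @left.to_s + @right.to_s     # flatten lazily, only when the whole string is needed
  end
end

r = RopeLeaf.new("Hello, ") + RopeLeaf.new("world") + RopeLeaf.new("!")
r.to_s   # => "Hello, world!"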

Also the compiler can analyze the program, simplify it to a great extent and compile it to the JVM, which in conjunction with HotSpot optimizations makes it run really fast. Now this TruffleRuby is much faster than other Rubies, by a factor of about 10. It uses GraalVM and it actually supports a lot of C extensions for libraries through the feature of GraalVM that they can eventually be compiled to the JVM. It does not work if extensions rely on implementation details of the Ruby structures in C, and it often does not work for C extensions that go to low-level OS functionality. The current version of TruffleRuby is not really ready to use in conjunction with Ruby on Rails, which is kind of a no-go, because Ruby is usually used in conjunction with Rails. My impression is that it will be possible to use it with Rails in a year or two.

Hearing of this in a talk by Benoit Daloze in the local Rails user group in Zürich was a great and positive surprise. Ruby gets interesting again.
