Laziness

In conservative circles, laziness has a slightly negative connotation, but in IT it should be seen as something positive, if we understand it correctly. If we get our job done more efficiently and with less work, this kind of laziness is a good thing. So let’s be lazy by becoming more efficient. For now, just remember that laziness is something good.

Laziness in software refers to performing an operation only when its result is needed. Imagine we store a value that is the result of some calculation in a data structure. We only have read access to this value; in the Java world it is exposed through a “getter”. Under certain circumstances it is functionally equivalent to store the function that calculates the value instead and to call it whenever the getter is called. Usually this is combined with memoization, so the value is retained once it has been calculated.

This whole idea breaks down if the function that calculates the value has side effects, is influenced by side effects, or generally does not yield the same result for the same inputs whenever it is called. To implement such a lazy value, the parameters of the function have to be retained as well, either by storing them in the structure or by wrapping the function and its parameters into a parameter-less function that includes them. These parameters have to be immutable for this approach to work.
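To make this more concrete, here is a minimal sketch of such a memoized lazy value in Java. The class `Lazy`, the demo class `LazyDemo` and the function `twice` are hypothetical names chosen for this illustration, they are not part of the JDK. The call counter exists only to make the evaluation visible; a real function used this way should be free of side effects.

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of the JDK): a memoized lazy value.
final class Lazy<T> {
    private Supplier<T> supplier;      // parameter-less function wrapping the calculation
    private T value;                   // retained result (memoization)
    private boolean evaluated = false;

    Lazy(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    // The "getter": calculates on the first call, returns the retained value afterwards.
    synchronized T get() {
        if (!evaluated) {
            value = supplier.get();
            evaluated = true;
            supplier = null;           // the captured parameters are no longer needed
        }
        return value;
    }
}

class LazyDemo {
    // Counts calls for demonstration purposes only.
    static int calls = 0;

    static String twice(String s) {
        calls++;
        return s + s;
    }

    public static void main(String[] args) {
        final String s = "ab";                        // immutable parameter captured by the lambda
        Lazy<String> t = new Lazy<>(() -> twice(s));
        System.out.println(calls);                    // 0: nothing has been calculated yet
        System.out.println(t.get());                  // abab
        System.out.println(t.get());                  // abab, taken from the retained value
        System.out.println(calls);                    // 1: the function ran exactly once
    }
}
```

If `get()` is never called, `twice` is never executed at all.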

Now the question is: what do we gain from this kind of laziness?
At first glance we gain nothing, because the calculation has to be done anyway and we are just postponing it, so the cost is merely shifted to a later point. But this is not always true. It is quite possible that the value is never needed, in which case calculating it in advance would have been wasteful.

Many of us have seen both approaches in textbooks for Java or C++, where the singleton pattern is explained. The typical implementation looks like this:

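public class S {

    private static S instance = null;

    // Lazy initialization: the instance is created on the first call.
    public static synchronized S getInstance() {
        if (instance == null) {
            instance = new S();
        }
        return instance;
    }

    // Private, so that no second instance can be created from outside.
    private S() {
        // ...
    }

    // ...
}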

This “synchronized” is a bit of a pain, because every call has to acquire the lock, but it is necessary to ensure that the instance is created only once. Smart people came up with the “double-checked locking” pattern, so it looked like this:
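public class S {

    // volatile is required for double-checked locking to be correct (Java 5 and later).
    private static volatile S instance = null;

    public static S getInstance() {
        // First check without locking: the common case after initialization.
        if (instance == null) {
            synchronized (S.class) {
                // Second check: another thread may have initialized it in the meantime.
                if (instance == null) {
                    instance = new S();
                }
            }
        }
        return instance;
    }

    private S() {
        // ...
    }

    // ...
}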

It is kind of cute, because it takes into account that another thread may already have initialized the instance after the first check and before the lock is acquired. It works from Java 5 onward, provided that the field instance is declared volatile; without volatile, or with older Java versions, it can fail, because another thread may see a reference to an object that is not yet fully constructed. This is due to the Java memory model. Anyway, it is kind of scary, because we observe that something that really looks correct may not work. Or even worse, it will work fine during testing and fail miserably in production when we don’t expect it. But it would be so easy with eager initialization:
public class S {

    // Eager initialization: the instance is created when the class is initialized.
    private static final S instance = new S();

    public static S getInstance() {
        return instance;
    }

    private S() {
        // ...
    }

    // ...
}
Or even use an enum to implement the singleton.
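An enum-based singleton could look roughly like the following sketch. The method `next()` and the demo class `EnumSingletonDemo` are made up for this illustration. The JVM creates the single enum constant during class initialization, which is thread-safe, so no explicit locking is needed to create the instance.

```java
// Sketch of an enum-based singleton; the name S mirrors the class used above.
enum S {
    INSTANCE;

    private int counter = 0;

    // Hypothetical example state, shared by all users of the singleton.
    synchronized int next() {
        return ++counter;
    }
}

class EnumSingletonDemo {
    public static void main(String[] args) {
        S a = S.INSTANCE;
        S b = S.INSTANCE;
        System.out.println(a == b);   // true: both refer to the same instance
        System.out.println(a.next()); // 1
        System.out.println(b.next()); // 2: the state is shared
    }
}
```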

Other languages like Scala support this kind of laziness in a more natural way.
case class S(s: String) {
  lazy val t: String = s + s
  def tt(): String = t
}
In this case t would be calculated when tt is called the first time.

In this example it does not help a lot, because the calculation of t is so cheap. But imagine we wanted to create a huge collection and then process its elements until we reach a certain goal. Eagerly initializing the collection would blow up our program, but we may be able to iterate through it or work with the first $n$ elements, where $n$ is significantly smaller than the size of the collection would be. Just think of it as an iterate-only collection or as a collection that is too big to be kept in memory completely.
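The same idea can be sketched in Java with streams, which are evaluated lazily. The class and method names below are invented for this illustration, and the task (finding the first few squares divisible by a given number) is just an arbitrary goal. The range conceptually covers $10^{12}$ elements, but only as many as are needed get produced.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class LazyStreamDemo {
    // Returns the first n squares that are divisible by divisor.
    static List<Long> firstSquaresDivisibleBy(long divisor, int n) {
        return LongStream.range(0, 1_000_000_000_000L) // describes the range, materializes nothing
                .map(x -> x * x)
                .filter(x -> x % divisor == 0)
                .limit(n)                              // stops pulling elements once n are found
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstSquaresDivisibleBy(7, 5)); // [0, 49, 196, 441, 784]
    }
}
```

Only the elements 0 to 28 of the range are ever looked at here; the rest of the range is never generated.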

Just do this in Clojure:
(def r (range 1000000000000N))
This defines a collection with $10^{12}$ elements, which would most likely exceed our memory. But it is accepted, and we can work with it as long as we only deal with elements near the beginning. It could know its size, but it does not, so
(count r)
will fail or take forever. Or maybe work in some future version of Clojure…

So laziness allows us to write more elegant, expressive programs that are still efficient. Of course, all of this can be written by hand without such help, but it is a good kind of laziness to rely on programming languages, libraries or tools that allow us to do it in such an elegant way.

Now we can observe that Hibernate and JPA use some kind of laziness. They have to, because objects tend to be connected, and fetching one would require fetching a really big bunch of others. Unfortunately this laziness is simply not correct, because we do have side effects that influence the outcome. That is what databases are… So data may change, transactions may have been committed or rolled back, or the session may already be closed. We get a “LazyInitializationException” when we try to access some data. And when we try to adopt a beautiful programming style that mimics functional patterns in Java with non-static inner classes (prior to Java 8) in conjunction with Hibernate, it is bound to fail miserably unless we take great care about this issue.
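The underlying problem can be illustrated without any database. The following sketch does not use the Hibernate or JPA API; a plain `HashMap` stands in for a table, and all names are hypothetical. It only shows that a deferred read of mutable state can give a different answer than an eager read would have given.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class StaleLazyDemo {
    // Stand-in for a database table: mutable, shared state.
    static final Map<Integer, String> table = new HashMap<>();

    public static void main(String[] args) {
        table.put(1, "Alice");

        String eagerName = table.get(1);                // read immediately
        Supplier<String> lazyName = () -> table.get(1); // read deferred until get() is called

        table.put(1, "Bob");                            // the "row" changes in the meantime

        System.out.println(eagerName);                  // Alice
        System.out.println(lazyName.get());             // Bob: depends on when it is evaluated

        table.clear();                                  // the data is no longer available
        System.out.println(lazyName.get());             // null
    }
}
```

Because the deferred function depends on state that changes over time, delaying its evaluation is no longer functionally equivalent to evaluating it up front.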

While we are always moving on thin ice with the side-effect-dependent laziness of Hibernate and JPA, laziness is really powerful in functional languages like Scala, Clojure or Haskell, if we go the extra mile to ensure that the function neither has side effects nor depends on them.
