Meaningless Whitespace in Textfiles

We use different file formats that are more or less tolerant to certain changes. Most well known is white space in text files.

In some programming languages white space (space, newline, carriage return, form feed, tabulator, vertical tab) has no meaning, as long as any whitespace is present. Examples for this are Java, Perl, Lisp or C. Whitespace, that is somehow part of String content is always significant, but white space that is used within the program can be combination of one or more of the white space characters that are in the lower 128 positions (ISO-646, often referred to as ASCII or 7bit ASCII. It is of course recommended to have a certain coding standard, which gives some guidelines of when to use newlines, if tabs or spaces are preferred (please spaces) and how to indent. But this is just about human readability and the compiler does not really care. Line numbers are a bit meaningful in compiler and runtime error messages and stack traces, so putting everything into one line would harm beyond readability, but there is a wide range of ways that are all correct and equivalent. Btw. many teams limit lines to 80 characters, which was a valid choice 30 years ago, when some terminals were only 80 characters wide and 132 character wide terminals where just coming up. But as a hard limit it is a joke today, because not many of us would be able to work with a vt100 terminal efficiently anyway. Very long lines might be harder to read, so anything around 120 or 160 might still be a reasonable idea about line lengths…

Languages like Ruby and Scala put slightly more meaning into white space, because in most cases a semicolon can be skipped if it is followed by a newline and not just horizontal white space. And Perl (Perl 5) is for sure so hard to compile that only its own implementation can properly format or even recognize which white space is part of a literal string. Special cases like having the language in a string and parsing and then executing that should be ignored here.

Now we put this program files into a source code management system, usually Git. Some teams still use legacy systems like subversion, source safe, clear case or CVS, while there are some newer systems that are probably about as powerful as git, but I never saw them in use. Git creates an MD5 hash of each file, which implies that any minor change will result in a new version, even if it is just white space. Now this does not hurt too much, if we agree on the same formatting and on the same line ending (hopefully LF only, not CR LF, even on MS-Windows). But our tooling does not make any difference between significant changes and insignificant formatting only changes. This gets worse, if users have different IDEs, which they should have, because everyone should use the IDE or editor, with which he or she is most efficient and the formal description of the preferred formatting is not shared between editors or differs slightly.

I think that each programming language should come with a command line diff tool and a command line formatting tool, that obey a standard interface for calling and can be plugged into editors and into source code management systems like git. Then the same mechanisms work for C, Java, C#, Ruby, Python, Fortran, Clojure, Perl, F#, Scala, Lua or your favorite programming language.

I can imaging two ways of working: Either we have a standard format and possibly individual formats for each developer. During „git commit“ the file is brought into the standard format before it is shown to git. Meaning less whitespace changes disappear. During checkout the file can optionally be brought into the preferred format of the developer. And yes, there are ways to deal with deliberate formatting, that for some reason should be kept verbatim and for dealing differently with comments and of course all kinds of string literals. Remember, the formatting tool comes from the same source as the compiler and fully understands the language.

The other approach leaves the formatting up to the developer and only creates a new version, when the diff tool of the language signifies that there is a relevant change.

I think that we should strive for this approach. It is no rocket science, the kind of tools were around for many decades as diff and as formatting tools, it would just be necessary to go the extra mile and create sister diff and formatting tools for the compiler (or interpreter) and to actually integrate these into build environments, IDEs, editors and git. It would save a lot of time and leave more time for solving real problems.

Is there any programming language that actually does this already?

How to handle XML? Is XML just the new binary with a bit more bloat? Can we do a generic handling of all XML or should it depend on the Schema?

Share Button

Loops with unknown nesting depth

We often encounter nested loops, like

for (i = 0; i < n; i++) {
    for (j = 0; j < m; j++) {
        doSomething(i, j);
    }
}

This can be nested to a few more levels without too much pain, as long as we observe that the number of iterations for each level need to be multiplied to get the number of iterations for the whole thing and that total numbers of iterations beyond a few billions (10^9, German: Milliarden, Russian Миллиарди) become unreasonable no matter how fast the doSomethings(...) is. Just looking at this example program

public class Modular {
    public static void main(String[] args) {
        long n = Long.parseLong(args[0]);
        long t = System.currentTimeMillis();
        long m = Long.parseLong(args[1]);
        System.out.println("n=" + n + " t=" + t + " m=" + m);
        long prod = 1;
        long sum  = 0;
        for (long i = 0; i < n; i++) {
            long j = i % m;
            sum += j;
            sum %= m;
            prod *= (j*j+1) % m;
            prod %= m;
        }
        System.out.println("sum=" + sum + " prod=" + prod + " dt=" + (System.currentTimeMillis() - t));
    }
}

which measures it net run time and runs 0 msec for 1000 iterations and almost three minutes for 10 billions (10^{10}):

> java Modular 1000 1001 # 1'000
--> sum=1 prod=442 dt=0
> java Modular 10000 1001 # 10'000
--> sum=55 prod=520 dt=1
> java Modular 100000 1001 # 100'000
--> sum=45 prod=299 dt=7
> java Modular 1000000 1001 # 1'000'000
--> sum=0 prod=806 dt=36
> java Modular 10000000 1001 # 10'000'000
--> sum=45 prod=299 dt=344
> java Modular 100000000 1001 # 100'000'000
--> sum=946 prod=949 dt=3314
> java Modular 1000000000 1001 # 1'000'000'000
--> sum=1 prod=442 dt=34439
> java Modular 10000000000 1001 # 10'000'000'000
--> sum=55 prod=520 dt=332346

As soon as we do I/O, network access, database access or simply a bit more serious calculation, this becomes of course easily unbearably slow. But today it is cool to deal with big data and to at least call what we are doing big data, even though conventional processing on a laptop can do it in a few seconds or minutes... And there are of course ways to process way more iterations than this, but it becomes worth thinking about the system architecture, the hardware, parallel processing and of course algorithms and software stacks. But here we are in the "normal world", which can be a "normal subuniverse" of something really big, so running on one CPU and using a normal language like Perl, Java, Ruby, Scala, Clojure, F# or C.

Now sometimes we encounter situations where we want to nest loops, but the depth is unknown, something like

for (i_0 = 0; i_0 < n_0; i_0++) {
  for (i_1 = 0; i_1 < n_1; i_1++) {
    \cdots
      for (i_m = 0; i_m < n_m; i_m++) {
        dosomething(i_0, i_1,\ldots, i_m);
      }
    \cdots
  }
}

Now our friends from the functional world help us to understand what a loop is, because in some of these more functional languages the classical C-Style loop is either missing or at least not recommended as the everyday tool. Instead we view the set of values we iterate about as a collection and iterate through every element of the collection. This can be a bad thing, because instantiating such big collections can be a show stopper, but we don't. Out of the many features of collections we just pick the iterability, which can very well be accomplished by lazy collections. In Java we have the Iterable, Iterator, Spliterator and the Stream interfaces to express such potentially lazy collections that are just used for iterating.

So we could think of a library that provides us with support for ordinary loops, so we could write something like this:

Iterable range = new LoopRangeExcludeUpper<>(0, n);
for (Integer i : range) {
    doSomething(i);
}

or even better, if we assume 0 as a lower limit is the default anyway:

Iterable range = new LoopRangeExcludeUpper<>(n);
for (Integer i : range) {
    doSomething(i);
}

with the ugliness of boxing and unboxing in terms of runtime overhead, memory overhead, and additional complexity for development. In Scala, Ruby or Clojure the equivalent solution would be elegant and useful and the way to go...
I would assume, that a library who does something like LoopRangeExcludeUpper in the code example should easily be available for Java, maybe even in the standard library, or in some common public maven repository...

Now the issue of loops with unknown nesting depth can easily be addressed by writing or downloading a class like NestedLoopRange, which might have a constructor of the form NestedLoopRange(int ... ni) or NestedLoopRange(List li) or something with collections that are more efficient with primitives, for example from Apache Commons. Consider using long instead of int, which will break some compatibility with Java-collections. This should not hurt too much here and it is a good thing to reconsider the 31-bit size field of Java collections as an obstacle for future development and to address how collections can grow larger than 2^{31}-1 elements, but that is just a side issue here. We broke this limit with the example iterating over 10'000'000'000 values for i already and it took only a few minutes. Of course it was just an abstract way of dealing with a lazy collection without the Java interfaces involved.

So, the code could just look like this:

Iterable range = new NestedLoopRange(n_0, n_1, \ldots, n_m);
for (Tuple t : range) {
    doSomething(t);
}

Btw, it is not too hard to write it in the classical way either:

        long[] n = new long[] { n_0, n_1, \ldots, n_m };
        int m1 = n.length;
        int m  = m1-1; // just to have the math-m matched...
        long[] t = new long[m1];
        for (int j = 0; j < m1; j++) {
            t[j] = 0L;
        }
        boolean done = false;
        for (int j = 0; j < m1; j++) {
            if (n[j] <= 0) {
                done = true;
                break;
            }
        }
        while (! done) {
            doSomething(t);
            done = true;
            for (int j = 0; j < m1; j++) {
                t[j]++;
                if (t[j] < n[j]) {
                    done = false;
                    break;
                }
                t[j] = 0;
            }
        }

I have written this kind of loop several times in my life in different languages. The first time was on C64-basic when I was still in school and the last one was written in Java and shaped into a library, where appropriate collection interfaces were implemented, which remained in the project or the organization, where it had been done, but it could easily be written again, maybe in Scala, Clojure or Ruby, if it is not already there. It might even be interesting to explore, how to write it in C in a way that can be used as easily as such a library in Java or Scala. If there is interest, please let me know in the comments section, I might come back to this issue in the future...

In C it is actually quite possible to write a generic solution. I see an API like this might work:

struct nested_iteration {
  /* implementation detail */
};

void init_nested_iteration(struct nested_iteration ni, size_t m1, long *n);
void dispose_nested_iteration(struct nested_iteration ni);
int nested_iteration_done(struct nested_iteration ni); // returns 0=false or 1=true
void nested_iteration_next(struct nested_iteration ni);

and it would be called like this:

struct nested_iteration ni;
int n[] = { n_0, n_1, \ldots, n_m };
for (init_nested_iteration(ni, m+1, n); 
     ! nested_iteration_done(ni); 
     nested_iteration_next(ni)) {
...
}

So I guess, it is doable and reasonably easy to program and to use, but of course not quite as elegant as in Java 8, Clojure or Scala.
I would like to leave this as a rough idea and maybe come back with concrete examples and implementations in the future.

Links

Share Button

Scala Days 2018

In 2018 I visited Scala Days in Berlin with this schedule. Other than in previous years I missed the opening keynote, which is traditionally on the evening before the main conference starts, because this is not so well compatible with my choice of traveling with a night train, especially because I did not want to miss more than two days of my current project.

I might provide Links to Videos of the talks, if they become available, including the interesting keynote that I have missed.

So I did visit the following talks on the first full day:

And on the second full day:

Links

Share Button

ScalaUA 2018

About a week ago I visited Scala UA in Kiev.

This was the schedule.

It was a great conference once again, as it was already in 2017 and I really enjoyed everything, including the food, which was great again… 🙂

I listened to the following talks on the first day:

And I gave this talk:

On the second day I listened to the following talks:

  • Advanced Patterns in Asynchronous Programming, Michael Arenzon & Assaf Ronen
  • Akka: Actors Design And Communication Techniques, Alex Zvolinskiy
  • Monad Stacks or: How I Learned to Stop Worrying and Love the Free Monad, and other stories, Harry Laoulakos
  • Scala on Wire: How event streams help us build Android apps, Maciej Gorywoda
  • Purely functional microservices with http4s and doobie, Jasper van Zandbeek
  • Tame Your Data with Reactive Streams and Monix, Jacek Kunicki
  • Future and issues of the Scala Ecosystem, Panel Discussion
  • Roll your own Event Sourcing, Lina Krutulytė-Kriščiūnė

And I gave this talk:

As always there was a lot of inspiration coming from the talks and a lot worth exploring in future posts. So there will be Scala posts once in a while, as in the past…

Links

  • Scala Days 2018
  • Scala UA 2017
  • Scala UA (official page)
  • NoSQL
  • Scala
  • Share Button

Christmas — Weihnachten — Рождество 2017

Bon nadal! — Priecîgus Ziemassvçtkus — З Рiздвом Христовим — Buon Natale — Bella Festas daz Nadal! — С Рождеством — Срећан Божић — καλά Χριστούγεννα — God Jul! — Feliĉan Kristnaskon — ميلاد مجيد — Feliz Navidad — Glædelig Jul — Fröhliche Weihnachten — Joyeux Noël — Hyvää Joulua! — クリスマスおめでとう ; メリークリスマス — Merry Christmas — Natale hilare — God Jul — Crăciun fericit — Prettige Kerstdagen — Nollaig Shona Dhuit!

xmas tree

Christmas Tree in Olten 2017

This message has been generated by a program again, this time I decided to use Scala:

import scala.util.Random
object XmasGreeting {
  val texts : List[String] = List( "Fröhliche Weihnachten",
    "Merry Christmas", "God Jul", "Feliz Navidad", "Joyeux Noël",
    "Natale hilare", "С Рождеством", "ميلاد مجيد", "Buon Natale", "God Jul!",
    "Prettige Kerstdagen", "Feliĉan Kristnaskon", "Hyvää Joulua!",
    "Glædelig Jul", "クリスマスおめでとう ; メリークリスマス",
    "Nollaig Shona Dhuit!", "Bon nadal!", "Срећан Божић",
    "Priecîgus Ziemassvçtkus", "Crăciun fericit", "Bella Festas daz Nadal!",
    "З Рiздвом Христовим", "καλά Χριστούγεννα")
  val shuffledTexts : List[String] = Random.shuffle(texts)
  def main(args: Array[String]) : Unit = {
    println(shuffledTexts.mkString(" — "))
  }
}

Share Button

Scala Exchange 2017

I have visited Scala Exchange („#ScalaX“) in London on 2017-12-14 and 2017-12-15. It was great, better than 2015 in my opinion. In 2016 I missed Scala Exchange in favor of Clojure Exchange.

This time there were really many talks about category theory and of course its application to Scala. Spark, Big Data and Slick were less heavily covered this time. Lightbend (former Typesafe), the company behind Scala, did show some presence, but less than in other years. But 800 attendees are a number by itself and some talks about category theory were really great.

While I have always had a hard time accepting why we need this „Über-Mathematics“ like category theory for such a finite task as programming, I start seeing its point and usefulness. While functors and categories provide a meta layer that is actually accessible in Scala there are actually quite rich theories that can even be useful when constrained to a less infinite universe. This helps understanding things in Java. I will leave details to another post. Or forget about it until we have the next Scala conference.

So the talks that I visited were:

  • Keynote: The Maths Behind Types [Bartosz Milewski]
  • Free Monad or Tagless Final? How Not to Commit to a Monad Too Early [Adam Warski]
  • A Pragmatic Introduction to Category Theory [Daniela Sfregola]
  • Keynote: Architectural patterns in Building Modular Domain Models [Debasish Ghosh]
  • Automatic Parallelisation and Batching of Scala Code [James Belsey and Gjeta Gjyshinca]
  • The Path to Generic Endpoints Using Shapeless [Maria-Livia Chiorean]
  • Lightning talk – Optic Algebras: Beyond Immutable Data Structures [Jesus Lopez Gonzalez]
  • Lightning Talk – Exploring Phantom Types: Compile-Time Checking of Resource Patterns [Joey Capper]
  • Lightning Talk – Leave Jala Behind: Better Exception Handling in Just 15 Mins [Netta Doron]
  • Keynote: The Magic Behind Spark [Holden Karau]
  • A Practical Introduction to Reactive Streams with Monix [Jacek Kunicki]
  • Building Scalable, Back Pressured Services with Akka [Christopher Batey]
  • Deep Learning data pipeline with TensorFlow, Apache Beam and Scio [Vincent Van Steenbergen]
  • Serialization Protocols in Scala: a Shootout [Christian Uhl]
  • Don’t Call Me Frontend Framework! A Quick Ride on Akka.Js [Andrea Peruffo]
  • Keynote: Composing Programs [Rúnar Bjarnason]
Share Button

Devoxx 2017

I was lucky to get a chance to visit Devoxx in Antwerp the sixth time in a row. As always there were interesting talks to listen to. Some issues that were visible across different talks:

Java 9 is now out and Java 8 will soon go into the first steps of deprecation. The step of moving to Java 9 is probably the hardest in the history of Java. There were features in the past that brought very significant changes, but they were usually kind of optional, so adoption could be avoided or delayed. Java 9 brings the module system and a new level of abstraction in that classes of modules can be made public to other modules selectively or globally. Otherwise they can be by themselves declared as public, but only be visible within the module. This actually applies to classes of the standard library, that were always declared as being private, but that could not be efficiently hidden away from external usage. Now they suddenly do not work any more and a lot of software has some difficulty and needs to be adjusted to avoid these internal classes. Beyond that a lot of talks were about Java 9, for example also covering somewhat convenient methods for writing constant collections in code. Future releases will follow a path that is somewhat similar to that of Perl 5. Releases will be created roughly every half year and will include whatever is ready for inclusion at that time. Some releases will be supported for a longer time than others.

In the arena of non-Java JVM-languages the big winner seems to be Kotlin, while Groovy, Clojure, JRuby and Ceylon where not visible at the conference. Scala has retained its position as an important JVM language besides Java at this conference. The rise of Kotlin may be explained by the fact that Idea (IntelliJ) has become much more important as IDE than Eclipse and Netbeans, which already brings Kotlin onto every JVM-language-developer’s desktop. And Google has moved from Eclipse to Idea as recommended and supported IDE for Android-development and is now officially supporting Kotlin besides Java as language for Android-development. There were heroic efforts to do development in Scala, Clojure, Groovy for Android without support from Google, which is quite possible, but having to deploy the libraries with each app instead of having them already on the phone is a big disadvantage. The second largest mobile OS has added support for Swift as an alternative to Objective C and Swift and Kotlin are different languages, but they are sufficiently similar in terms of concepts and possibilities to ease development of Apps targeting the two most important mobile system platforms in mixed teams at least a bit. And Kotlin gives developers many of the cool and interesting features of Scala, while remaining a bit easier to learn and to understand, because some of the most difficult parts of Scala are left out. Anyway, Scala is not yet heavily challenged by Kotlin and remains important and I think that Clojure and JRuby and Groovy retain their importance and live in somewhat differenct niches than Scala and Kotlin. I would think that they are just a bit too small to be present on each Devoxx. Or it was just random effects about how much news there was about the languages and what kind of speeches had been proposed for them. On the other hand, I would assume that Ceylon has become a dead end, because it came out at the same time as Kotlin and tries to cover the same niche. It is hard to stay important in the same niche with a strong winner.

Then there was of course security security security… Even more important than in the past.

And a lot more…

I listened to the following talks:
Wednesday, 2017-11-08

Thursday, 2017-11-09

Friday, 2017-11-10

Links:

Previous years:

Btw. I always came to Devoxx by bicycle, at least part of the way…

Share Button

ScalaUA 2017

About a month ago I visted the conference ScalaUA in Kiev.

This was the schedule.

It was a great conference and I really enjoyed everything, including the food, which is quite unusual for an IT-conference.. 🙂

I listened to the following talks:
First day:

  • Kappa Architecture, Juantomás García Molina
  • 50 shades of Scala Compiler, Krzysztof Romanowski
  • Functional programming techniques in real world microservices, András Papp
  • Scala Refactoring: The Good the Bad and the Ugly, Matthias Langer
  • ScalaMeta and the Future of Scala, Alexander Nemish
  • ScalaMeta semantics API, Eugene Burmako

I gave these talks:

  • Some thoughts about immutability, exemplified by sorting large amounts of data
  • Lightning talK: Rounding

Day 2:

  • Mastering Optics in Scala with Monocle, Shimi Bandiel
  • Demystifying type-class derivation in Shapeless, Yurii Ostapchuk
  • Reactive Programming in the Browser with Scala.js and Rx, Luka Jacobowitz
  • Don’t call me frontend framework! A quick ride on Akka.Js, Andrea Peruffo
  • Flawors of streaming, Ruslan Shevchenko
  • Rewriting Engine for Process Algebra, Anatolii Kmetiuk

Find recording of all the talks here:
https://www.scalaua.com/speakers-speeches-at-scalaua2017/

Share Button

Scala Days 2016

I have visited Scala Days in Berlin 2016-06-15 to 2016-06-17. A little remark on the format might be of interest. The conference is scheduled for 3 days. On the first day, there is only one speech, the first keynote, some time in the late afternoon. During Scala Days 2015 the rest of the day was put into use by organizing a Scala training session, where volunteers could teach Scala to other volunteers who wanted to learn it. But I think two or three sessions on the first day would be better and would still allow starting in the late afternoon with the first keynote. The venue and of course Berlin were great and I enjoyed the whole event.

The talks that I visited were:

Wednesday 2016-06-15

  • First keynote: Scala’s Road Ahead by Martin Odersky about the future of Scala. Very interesting ideas for future versions that are currently explored in dotty.

Thursday 2016-06-16

Friday 2016-06-17

Summary

The whole event was great, I got a lot of inspiration and met great people. Looking forward to the next event.

I might write more on some topics, where I consider it interesting, but for the moment this summary should be sufficient.

Links

Share Button

Slick

Almost every serious programming language has to deal with database access, if not out of love, then at least out of practical necessity.

The theoretical background of a functional programming language is somewhat hostile to this, because pure functional langauges tend to dislike state and a database has the exact purpose to preserve state for a long time. Treating the database as somewhat external and describing its access with monads will at least please theoretical purists to some extent.

In case of Scala things are not so strict, it is allowed to leave the purely functional path where adequate. But there are reasons to follow it and to leave it only in well defined, well known, well understood and well contained exceptions… So the database itself may be acceptable.

Another intellectual and practical challenge is the fact that modern Scala architectures like to be reactive, like to uncouple things, become asynchronous. This is possible with databases, but it is very unusual for database drivers to support this. And it is actually quite contrary to the philosophy of database transactions. It can be addressed with future database drivers, but I do not expect this to be easy to use properly in conjunction with transactions.

It is worthwhile to think about
immutability and databases.
For now we can assume that the database is there and can be used.

So the question is, why not integrate existing and established persistence frameworks like JPA, Hibernate, Eclipslink and other OR-mapping-based systems into Scala, with Scala language binding?

Actually I am quite happy they did not go this path. Hibernate is broken and buggy, but the generic concept behind it is broken in my opinion. I do not think that it is theoretically correct, especially if we should program with waterproof transactions and actually defeat them by JPA-caching without even being aware of it? But for practical purposes, I actually doubt that taming JPA for non trivial applications is really easier than good old JDBC.

Anyway, Scala took the approach of building something in the lines of JDBC, but making it nicer. It is called Slick, currently Slick 3. Some links:
* Slick
* Slick 3.1.1
* Documentation
* Github
* Assynchronous DB access
* Recording of the Slick Workshop at ScalaX

What Slick basically does is building queries using Scala language. It does pack the results into Scala structures. Important is, that the DB is not hidden. So we just do the same as with JDBC, but in a more advanced way.

This article is inspired by my visit to Scala X.

Share Button