UTF-16 Strings in Java


Strings in Java and many other JVM languages consist of Unicode content and are encoded as UTF-16. It was fantastic to consider Unicode already when Java was introduced in the 1990s and to make it the only reasonable way to use strings, so there is no temptation to start with a „US-ASCII“ version of a piece of software that never really gets enhanced to deal properly with non-English content and data. It also avoids having to deal with many kinds of string encodings within the software. For outside storage in databases and files and for communication with other processes and over the network these issues of course remain, but Java strings are always encoded as UTF-16; we can be sure of that. Most common languages can make use of these strings easily, and handling common letter-based languages from Europe and western Asia is quite straightforward. Rendering may still be a challenge, but that is another issue. A major drawback of this approach is that more memory is used. This usually does not hurt too much: memory is not so expensive, and a couple of strings, even in UTF-16, will not be too big. With 64-bit Java, which should be used these days, the memory limitations of the JVM are not relevant any more; it can basically use as much memory as provided.

But some applications do hit the memory limits. And since usually most of the data we are dealing with is ultimately strings, or combinations of relatively long strings with some reference pointers, some integer numbers and more strings, we can say that in typical memory-intensive applications strings actually consume a large fraction of the memory. This leads to the temptation of using or even writing a string library that uses UTF-8 or some other more condensed internal format, while still being able to express any Unicode content. This is possible, and I have done it. Unfortunately it is very painful, because strings are quite deeply integrated into the language, and explicit conversions need to be added in many places to deal with this. But it is possible and can save a lot of memory. In my case we were able to abandon this approach, because other optimizations, which were less painful, proved to be sufficient.
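To illustrate the idea, a minimal sketch of such a string class could look like the following (the class and its methods are made up for illustration; the painful part is converting at every boundary to code that expects java.lang.String):

import java.nio.charset.StandardCharsets;

// minimal sketch of a memory-saving string wrapper
public final class Utf8String {
    private final byte[] bytes; // content encoded as UTF-8 instead of UTF-16

    public Utf8String(String s) {
        this.bytes = s.getBytes(StandardCharsets.UTF_8);
    }

    public int byteLength() {
        return bytes.length;
    }

    // the conversion back happens at every boundary to APIs expecting String
    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}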

An interesting idea is to compress strings. If they are long enough, algorithms like gzip work on a single string. As with UTF-8, selectively accessing parts of the string becomes expensive, because it can only be achieved by parsing the string from the beginning or by adding indexing structures. We just do not know which byte to go to for accessing the n-th character, even with UTF-8. In reality we often do not have long strings, but rather many relatively short strings, and these do not compress well by themselves. If we know our data and have a rough idea about the content of our complete set of strings, a custom compression algorithm can be generated. This allows good results even for relatively short strings, as long as they are within the „language“ that we are prepared for. This is more or less similar to the step from UTF-16 to UTF-8, because we replace common byte sequences by shorter byte sequences, and less common sequences may even get replaced by something longer. There is no gain in UTF-8 if we have mostly strings in non-Latin alphabets. Even Cyrillic or Greek, which are alphabets similar to the Latin alphabet, will end up needing two bytes for each letter, which is not at all better than UTF-16. For other alphabets it will even become worse, because three or four bytes are needed for one symbol that could easily be expressed with two bytes in UTF-16. But if we know our data well enough, the approach with the specific compression will work fine. The „dictionary“ for the compression needs to be stored only once, maybe hard-coded in the software, and not in each string. It might be of interest to consider building the dictionary dynamically at run-time, like it is done with gzip, but keeping it in a common place for all strings and thus sharing it. The custom strings that I used were actually using a hard-coded compression algorithm generated from a large amount of typical data. It worked fine, but was just too clumsy to use, because Java is not prepared to replace String with something else without really messing around in the standard run-time libraries, which I would neither recommend nor want.
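As a sketch of the shared-dictionary idea: java.util.zip supports preset dictionaries directly, so the approach can be tried without writing a compressor from scratch. The dictionary content below is of course hypothetical; in reality it would be generated from a large amount of typical data:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class SharedDictionaryStrings {
    // stored only once and shared by all strings (hypothetical content)
    private static final byte[] DICTIONARY =
            "customer order invoice address street phone".getBytes(StandardCharsets.UTF_8);

    public static byte[] compress(String s) {
        byte[] input = s.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater();
        deflater.setDictionary(DICTIONARY);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[input.length + 64]; // generous for short strings
        int length = deflater.deflate(buffer);
        deflater.end();
        return Arrays.copyOf(buffer, length);
    }

    public static String decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] buffer = new byte[4096];
        int length = inflater.inflate(buffer); // returns 0 until the dictionary is set
        if (length == 0 && inflater.needsDictionary()) {
            inflater.setDictionary(DICTIONARY); // the same shared dictionary
            length = inflater.inflate(buffer);
        }
        inflater.end();
        return new String(buffer, 0, length, StandardCharsets.UTF_8);
    }
}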

It is important to consider the following issues:

  1. Is the memory consumption of the strings really a problem?
  2. Are there easier optimizations that solve the problem?
  3. Can it just be solved by adding more hardware? Yes, this is a legitimate question.
  4. Are there solutions for the problem on the internet or even within the current organization?
  5. A new String class is so fundamental that excellent testing is absolutely mandatory. The unit tests should be very extensive and complete.

Laziness

In conservative circles, laziness has a slightly negative connotation, but in IT it should be seen as positive, if we understand it right. If we as humans get our job done in a more efficient way, by working less, this laziness is a good thing. So let’s be lazy by becoming more efficient. For now, just remember that laziness is something good.

Laziness in software refers to performing a certain operation only when needed. Imagine we store in a data structure a value that is the result of some calculation. We only have read access to this value; speaking in the Java world, it is accessed through a „getter“. Under certain circumstances it is functionally equivalent to store instead the function that calculates the value and to call it whenever the getter is called. Usually this is combined with the memoize pattern, so the value is retained once it has been calculated. This whole idea breaks down if the function that calculates the value has side effects, is influenced by side effects or generally does not yield the same result on the same inputs whenever it is called. For implementing such a lazy value the parameters for the function have to be retained as well, either by storing them in the structure too or by wrapping the function and the parameters into a parameter-less function that includes them. Obviously these parameters have to be immutable for this approach to work.
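In Java this could be sketched roughly like the following class (hand-rolled for illustration; Scala’s lazy val, shown further below, does the same with language support):

import java.util.function.Supplier;

// minimal sketch of a lazy, memoizing value; assumes the supplier is pure
public final class Lazy<T> {
    private Supplier<T> supplier; // wraps the function and its (immutable) parameters
    private T value;
    private boolean evaluated = false;

    public Lazy(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    public synchronized T get() {
        if (!evaluated) {
            value = supplier.get(); // calculated at most once (memoize pattern)
            supplier = null;        // let the captured parameters be garbage collected
            evaluated = true;
        }
        return value;
    }
}

A value would then be created as new Lazy<>(() -> expensiveCalculation(x)), and the calculation happens on the first call of get().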

Now the question is, what do we gain from this kind of laziness?
At first glance we gain nothing, because the calculation has to be done anyway and we are just postponing it, so the eventual calculation of whatever we are doing will be just as expensive. Actually this might not be true: it is quite possible that the value is never needed, so it would be wasteful to calculate it in advance.

Actually many of us have seen both approaches in text books for Java or C++, when the singleton pattern is explained. The typical implementation looks like this:


public class S {
    private static S instance = null;

    public static synchronized S getInstance() {
        if (instance == null) {
            instance = new S();
        }
        return instance;
    }

    private S() { // private, so that no further instances can be created
        // ...
    }

    //...
}

This „synchronized“ is a bit of a pain, but necessary to ensure that the instance is created only once. Smart people came up with „double-checked locking“, so it looked like this (note that the instance field needs to be volatile for this to work at all):

public class S {
    private static volatile S instance = null; // volatile is essential for double-checked locking

    public static S getInstance() {
        if (instance == null) {
            synchronized (S.class) {
                if (instance == null) {
                    instance = new S();
                }
            }
        }
        return instance;
    }

    private S() {
        // ...
    }

    //...
}

It is kind of cute, because it takes into account that after the first check and before acquiring the synchronized lock some other thread may already have initialized the instance. With the volatile field it is supposed to work in newer Java versions (from Java 5 on), but it is for sure going to fail in older versions. This is due to the memory model. Anyway it is kind of scary, because we observe that something that really looks correct won’t work anyway. Or even worse, it will work fine during testing and fail miserably in production when we don’t expect it. But it would be so easy with eager initialization:

public class S {
    private static S instance = new S();

    public static S getInstance() {
        return instance;
    }

    private S() {
        // ...
    }

    //...
}

Or we can even use an enum to implement the singleton.
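With an enum it might look like this; the JVM guarantees that INSTANCE is created exactly once:

public enum S {
    INSTANCE;

    // ...
}

Here the initialization is eager as well, and thread safety comes for free.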

Other languages like Scala support this kind of laziness in a more natural way.

case class S(s: String) {
  lazy val t: String = s + s
  def tt(): String = t
}

In this case t would be calculated when tt is called the first time.

In this example it does not help a lot, because the calculation of t is so cheap. But imagine we wanted to create a huge collection and then start doing something with the elements until we reach a certain goal. Eagerly initializing the collection would blow up our program, but we may be able to iterate through it or work with the first n elements, where n is significantly smaller than the size of the collection would be. Just think of it as an iterate-only collection or as a collection that is too big to keep in memory completely.

Just do this in Clojure:

(def r (range 1000000000000N))

It defines a collection with 10^{12} elements, which would most likely exceed our memory. But it is accepted, and we can do stuff with it as long as we are dealing only with elements near the beginning. It could know its size, but it does not, so

(count r)

will fail or take forever. Or maybe work in some future version of Clojure…
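For comparison, Java 8 streams offer the same kind of laziness; a small sketch:

import java.util.stream.Stream;

public class LazySequence {
    public static void main(String[] args) {
        // conceptually infinite, but only the consumed prefix is ever calculated
        Stream.iterate(0L, n -> n + 1)
              .filter(n -> n % 7 == 0)
              .limit(5)
              .forEach(System.out::println); // prints 0 7 14 21 28
    }
}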

So laziness allows us to write more elegant, expressive programs that have good efficiency. Of course this can all be written by ourselves without such help, but it is a good laziness to rely on programming languages, libraries or tools that allow us to do it in such an elegant way.

Now we can find out that Hibernate and JPA use some kind of laziness. They have to, because objects tend to be connected, and fetching one would require fetching a really big bunch. Unfortunately this laziness is simply not correct, because we do have side effects that influence the outcome. That is what databases are about… Data may change, transactions may have been committed or rolled back or whatever. We get a „LazyInitializationException“ when we try to access some data. And when we try to adopt a beautiful programming style that mimics functional patterns in Java with non-static inner classes (prior to Java 8), in conjunction with Hibernate it is bound to fail miserably unless we take absolute care about this issue.

While we are always moving on thin ice with this side-effect-dependent laziness of Hibernate and JPA, laziness is really powerful in functional languages like Scala, Clojure or Haskell, if we go the extra mile to ensure that the function does not have side effects and does not depend on or get influenced by side effects.


Clojure

Functional programming languages have become a bit of a hype.

But the ideas are not really so new.
The first languages beyond assembly language that have maintained some relevance up to today were FORTRAN, COBOL and Lisp. Indirectly also Algol, because it inspired pretty much every modern mainstream programming language in some way through some intermediate ancestors. The early Algol variants themselves have disappeared.

It can be argued whether the early Lisp dialects were really functional languages, but they did support some functional patterns and made them popular, at least in certain communities. I would say that popular scripting languages like Perl, Ruby, Lua and especially JavaScript brought them to the mainstream.

Lisp has always remained in its niche. But the question arose of creating a new Lisp that follows the functional paradigm more strictly and is somewhat more modern, cleaner and simpler than the traditional Lisps. It was done, and it is called Clojure.

So anybody who has never used any Lisp will at first be lost, because it is a jungle of parentheses „((((())))()()()(…)“ with some minor stuff in between…
Actually that is an issue when we move from today’s common languages to Clojure. But it is not that bad. The infix notation is more familiar to us, but it has its benefits to use one simple syntax for almost everything.

An expression that consists of a function call is written like this: (function-name param1 param2 param3...). +, -, *,… are just functions like anything else, so if we want to write 3\cdot4 + 5\cdot6 we just write (+ (* 3 4) (* 5 6)).

In the early days of calculators it was easier to build something that works with a notation called „RPN“ (reverse Polish notation), so there we would write 3 ENTER 4 * 5 ENTER 6 * +, which is similar to the Lisp way, but just the other way round.

It is easy to add a different number of values:
* (+) -> 0
* (+ 7) -> 7
* (+ 1 2 3 4 5 6 7) -> 28

In Clojure functions are just normal values like numbers, vectors, lists,… that can be passed around. It is good programming practice to rely on this where it helps. And with more experience it will be helpful more often.

Immutability is king. Most of the default structures of Clojure are immutable. They can be passed around without the fear that they might change once they have been constructed. This helps with multithreading.

Clojure provides lists, vectors, sets and hash maps, plus sorted variants of the latter two. These can be written easily:
* List: (list 1 2 3) -> (1 2 3) (entries are evaluated in this case)
* List: '(1 2 3) -> (1 2 3) (entries are not evaluated in this case)
* Vector: [1 2 3] (entries are evaluated in this case)
* Set: #{1 2 3} (entries are evaluated in this case)
* Map: {1 2, 3 4} (entries are evaluated in this case; key-value pairs are usually grouped with a comma, which is whitespace for Clojure)

All of these are immutable. So methods that „change“ collections always create a copy that contains the changes. The copy can be done lazily or share data with the original.

Actually I can teach Clojure in a course of two to five days duration, depending on the experience of the participants and the goals they want to achieve.

There is much more to write about Clojure…


ScalaX 2015

I was able to arrange a visit to Scala Exchange 2015 in London, #ScalaX for short.

I visited the following events:

What do +, - and * with Integer do?

When using integers in C, Java or Scala, we often use what is called int.

It is presented to us as the default.

And it is extremely fast.

Ruby uses by default arbitrary length integers.

But what do +, - and * mean?

We can rebuild them in Ruby, artificially restricting the integers to what we have in other programming languages as int:


MODULUS = 0x100000000;
LIMIT   =  0x80000000;

def normalize(x)
  r = x % MODULUS;
  if (r < -LIMIT) then
    return r + MODULUS;
  elsif (r >= LIMIT)
    return r - MODULUS;
  else
    return r;
  end
end

def intPlus(x, y)
  normalize(x+y);
end

def intMinus(x, y)
  normalize(x-y);
end

def intTimes(x, y)
  normalize(x*y);
end

x = 0x7fffffff;
y = intPlus(x, x);
z = intPlus(x, x);
puts("x=#{x} y=#{y} z=#{z}");

What is the outcome?

Exactly what you get when doing 32-bit ints in C, Java, Scala or C#:

x=2147483647 y=-2 z=-2

int is always calculated modulo a power of two, usually 2^{32}. That is the
x % MODULUS in normalize(). The rest of the function is just for normalizing the result to the range -2^{31} \ldots 2^{31}-1.

So we silently get this kind of result when an overflow situation occurs, without any notice.
The overflow is not trivial to discover, but it can be done.
For addition I have described how to do it.
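As a side note, since Java 8 the standard library can detect the overflow for us with Math.addExact and friends, which throw instead of silently wrapping around:

public class OverflowDemo {
    public static void main(String[] args) {
        int x = 0x7fffffff; // 2147483647, the largest int
        System.out.println(x + x); // -2, the overflow happens silently

        // Math.addExact (similarly subtractExact, multiplyExact) throws instead:
        try {
            Math.addExact(x, x);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}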

See also Overflow of Integer Types, an earlier post about these issues…

I gave a talk that included this content adapted for Scala at Scala Exchange 2015.


Devoxx 2015

This year I had the pleasure of visiting the Devoxx conference in Antwerp in Belgium.

I have visited the following talks:

There is a lot to write about this and I will get back to specific interesting topics in the future…

My previous visits in 2012, 2013 (part 1), 2013 (part 2), and 2014 have their own blog articles.


Using Collections

When Java came out about 20 years ago, it was great to have a decent and quite extensive collection library available as part of the standard setup and ready to use.

Before that we often had to develop our own or find one of many interesting collection libraries and when writing and using APIs it was not a good idea to rely on them as part of the API.

Since Java 2 (technical name „Java 1.2“) collection interfaces have been added, and the implementation is now kind of detached, because we should use the interfaces as much as possible and keep the implementation exchangeable.

An interesting question arose in conjunction with concurrency. The early Java 1 default collections were synchronized all the way. In Java 2, non-synchronized variants were added and became the default. Synchronization can be achieved by wrapping them or by using the old collections (which do implement the interfaces as well since Java 2).

This was a performance improvement, because most of the time it is expensive and unnecessary overhead to synchronize collections. As a matter of fact, special care should be taken anyway to know who is accessing a collection in what way. Even if the collection itself does not get broken by simultaneous access, your application most likely is, unless you really know what you are doing. Do you?

Now it is usually a good idea to control changes of a collection. This is achieved by wrapping it with one of the Collections.unmodifiableXXX methods. The result is that accessing the wrapped collection with set or put will cause an exception. It was a good approach, as a first shot, but not where we want to be now.

References to the inner, non-wrapped collection can still be around, so it can still change while being accessed. If you can easily afford it, just copy collections when taking them in or giving them out. Or go immutable all the way and wrap your own in an unmodifiable wrapper, if that works.
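A small sketch of this combination of copying on the way in and wrapping (class and method names made up for illustration):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Team {
    private final List<String> members;

    // copy on the way in, so the caller cannot change our state afterwards
    public Team(List<String> members) {
        this.members = Collections.unmodifiableList(new ArrayList<>(members));
    }

    // the wrapper throws UnsupportedOperationException on add, set, remove, ...
    public List<String> getMembers() {
        return members;
    }
}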

What I would like to see is something along the following lines:

  • We have two kinds of collection interfaces, those that are immutable and those that are mutable.
  • The immutable should be the default.
  • We have implementations of the collections and construction facilities for the immutable collections.
  • The immutable implementation is of course the default.

I do not want to advocate going immutable-collections-only, because that does come at a high price in terms of efficiency. The usual pattern is to still have methods that „modify“ a collection, but these leave the original collection as it is and just create a modified copy. Usually these implementations are done in such a smart way that they share a lot of structure, which is no pain, because they are all immutable. No matter how smart and admirable these tricks are, I strongly doubt that they can reach the performance of modifiable collections if modifications are actually used a lot, at least in a purely single-threaded environment.

Ruby has taken an interesting approach: collections have a method freeze that can be called to make them immutable. That means runtime checks, which is a good match for Ruby. Java should check this at compile time, because it is so important; having different interfaces would do that.

I recommend checking out the Guava collection library from Google. It addresses most of the issues described here, and I think it is the best bet at the moment for that purpose. There are some other collection libraries to explore; maybe one is actually better than Guava.
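With Guava the pattern might look roughly like this; an ImmutableList is copied once on construction and immutable all the way, so no hidden reference can change it afterwards:

import com.google.common.collect.ImmutableList;

public class GuavaExample {
    public static void main(String[] args) {
        ImmutableList<String> list = ImmutableList.of("a", "b", "c");
        // copyOf recognizes immutable inputs and can skip the copy:
        ImmutableList<String> copy = ImmutableList.copyOf(list);
        System.out.println(list == copy); // true
    }
}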


Residue Class Rounding


If you do not know what residue classes are, just read on, it will be explained to the extent needed later on.

The concept of rounding seems familiar, but let us try to grasp it a little bit more systematically.

Let us assume a set M of numbers and a subset N \subseteq M of it, whose elements can be represented in the programming or software environment we have in mind. We want to use a metric d : M \times M \rightarrow \Bbb R_+, (x, y) \mapsto r=d(x,y) \ge 0. Usually we have three requirements for d:

  • Identity of indiscernibles: d(x,y) = 0 \iff x = y
  • Symmetry: d(x,y)=d(y,x)
  • Triangle inequality: d(x,z) \le d(x,y) + d(y,z)

Usually the numbers that we are dealing with can be considered to be within the world of real or complex numbers, so we can assume that M \subseteq \Bbb C, often even M \subseteq \Bbb R or, if we are honest, M \subseteq \Bbb Q. Then we are usually working with d(x,y) = |x-y|, which is kind of implicitly clear. If you do not know complex numbers, just think of real or even rational numbers, which is the most common case anyway. Of course the concepts for rounding of p-adic numbers are really interesting and beautiful, but since I do not want to explain p-adic numbers here, I will not extend on this issue.

What is our intuitive understanding of rounding?
Maybe just a map
r: M \rightarrow N, x \mapsto r(x),
that is chosen with certain constraints for each x in such a way that d(x, r(x)) is minimal.
We need the constraints to enforce uniqueness if multiple values y\in N with minimal d(x,y) exist. The classical case of rounding to integral numbers has to answer the question of how to round 0.5. If N is a subset of the real numbers, which is „usually“ the case, we have an ordering. We can choose between the following constraints:

ROUND_UP
|r(x)|\ge |x|, e.g. r(0.4)=1 and r(0.5)=1 and r(-0.5)=-1
ROUND_DOWN
|r(x)|\le |x|, e.g. r(0.6)=0 and r(0.5)=0 and r(-0.5)=0
ROUND_CEILING
r(x) \ge x, e.g. r(0.4)=1 and r(0.5)=1 and r(-0.5)=0
ROUND_FLOOR
r(x) \le x, e.g. r(0.6)=0 and r(0.5)=0 and r(-0.5)=-1
ROUND_HALF_UP
Minimize d(r(x),x), but if multiple optimal values exist for r(x), pick the one furthest away from 0, for example r(0.4)=0 and r(0.6)=1 and r(0.5)=1 and r(-0.5)=-1
ROUND_HALF_DOWN
Minimize d(r(x),x), but if multiple optimal values exist for r(x), pick the one closest to 0, for example r(0.4)=0 and r(0.6)=1 and r(0.5)=0 and r(-0.5)=0
ROUND_HALF_CEILING
Minimize d(r(x),x), but if multiple optimal values exist for r(x), pick the largest, for example r(0.4)=0 and r(0.6)=1 and r(0.5)=1 and r(-0.5)=0
ROUND_HALF_FLOOR
Minimize d(r(x),x), but if multiple optimal values exist for r(x), pick the smallest, for example r(0.4)=0 and r(0.6)=1 and r(0.5)=0 and r(-0.5)=-1
ROUND_HALF_EVEN
Minimize d(r(x),x), but if multiple optimal values exist for r(x), pick the one with an even last digit. Please observe that this constraint is useful in the classical case, but it cannot be generalized. For example: r(0.4)=0 and r(0.6)=1 and r(0.5)=0 and r(-0.5)=0 and r(1.5)=2
ROUND_UNNECESSARY
This constraint does not work in the mathematical sense (or only with ugly abuse of mathematical notation), but programmatically we can do it: we try r(x)=x and throw an exception if x \notin N.

Usually we are thinking in the decimal system (even though our computers prefer something else, but the computer should serve us and not vice versa…). So we pick a power of ten 10^n with n \in \Bbb N_0, i.e. n \ge 0. Now we simply define
N = \{ x \in M : 10^n x \in \Bbb Z\},
which means that N contains all numbers of M with a maximum of n digits after the decimal point. This rounding works quite well with something like LongDecimal in Ruby or BigDecimal in Scala or Java, but BigDecimal offers fewer rounding modes than LongDecimal for Ruby.
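In Java this kind of rounding can be expressed with setScale on BigDecimal; a small example:

import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundingModes {
    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("2.125");
        System.out.println(x.setScale(2, RoundingMode.HALF_UP));   // 2.13
        System.out.println(x.setScale(2, RoundingMode.HALF_EVEN)); // 2.12
        System.out.println(x.setScale(2, RoundingMode.FLOOR));     // 2.12
    }
}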

Now we look at residue class rounding. We assume such a power of ten 10^n. Then we need an integral number m \ge 2 and a set of numbers R \subseteq \{0,\ldots,m-1\}; we call them residues. Now we define N = \{ x \in M : 10^n x \in {\Bbb Z} \wedge \bigvee_{r \in R} 10^n x \equiv r \mod{m}\}.
That means that if we fill the number with zeros to the given number of places after the decimal point, remove the decimal point and perform a division with remainder of this number by m, the remainder lies in R. In this case ROUND_HALF_EVEN can become difficult to implement and ambiguous, so we might need to sacrifice it. But even worse, we have to deal with the case that 0 is not in R and provide rules for how to round 0. The candidates have self-explanatory names:

  • ZERO_ROUND_TO_PLUS
  • ZERO_ROUND_TO_MINUS
  • ZERO_ROUND_TO_CLOSEST_PREFER_PLUS
  • ZERO_ROUND_TO_CLOSEST_PREFER_MINUS
  • ZERO_ROUND_UNNECESSARY

An important application of this is m=10 and R=\{0, 5\}. This is used in Switzerland and probably other currency regions for rounding money amounts to multiples of 5 Rappen (0.05 CHF). This has already been described in „Rounding of Money Amounts“. If we have m=10 and R=\{0,1,2,3,4,5,6,7,8,9\}, we have the usual non-CHF rounding case. Maybe there are even cases for m=10 and R=\{0,2,4,6,8\}. But the concept is more general, if you like to use it.
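A sketch in Java for the Swiss case, reducing the residue class rounding to ordinary rounding by scaling with the step 0.05:

import java.math.BigDecimal;
import java.math.RoundingMode;

public class RappenRounding {
    private static final BigDecimal STEP = new BigDecimal("0.05");

    // round to the nearest multiple of 0.05 (m=10, R={0,5}, with ROUND_HALF_UP)
    public static BigDecimal roundToRappen(BigDecimal amount) {
        return amount.divide(STEP, 0, RoundingMode.HALF_UP).multiply(STEP);
    }

    public static void main(String[] args) {
        System.out.println(roundToRappen(new BigDecimal("17.42"))); // 17.40
        System.out.println(roundToRappen(new BigDecimal("17.43"))); // 17.45
    }
}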


ScalaX 2014


I have been visiting the conference Scala eXchange (#ScalaX), organized by Skillsmatter in London.

Here is some information about the talks that I have attended and the highlights:

The Binary Compatibility Challenge

Martin Odersky

Examples can be constructed for each of the four combinations of binary and source code compatibility.
We wish more of both. The talk is mostly about binary compatibility.
For Scala the conflict is innovation vs. compatibility. Java chose one extreme, retaining compatibility at any cost, which is considered more like a constraint than a restriction by the Java developers. It is easier for them because they control the JVM, which is again close to the Java language. Clojure, Ruby, JRuby, Perl, Python, PHP and some others have fewer problems with this issue because the software is distributed as source code and compiled when run, just in time.
Reproducible builds are hard to achieve in Scala, just think of the many build tools like ivy, gradle, maven, sbt, ant, make (yes, I have seen it),…
The idea is to split the around 30 compilation steps of Scala into two groups. The first group could yield an intermediate format after around 10 internal compilation steps, which might be stored as a tree of the program in a clever binary format. This could be a good compromise and address some of the issues, if kept stable. Programs are more likely to be combinable compatibly with this format than with binary or source only. It would also save time during compilation, which is a desirable improvement for Scala.

REST on Akka: Connect to the world

Mathias Doenitz

Akka-http is the spray 2.0 or successor of spray. It follows the lines of spray, but improves on some of the shortcomings.
It should be used for reactive streams in Akka and is important enough to be part of core Akka.
TCP-flow-control can be used to implement „back pressure“.

Bootstrapping the Scala.js Ecosystem

Haoyi Li

Scala shall be compiled to JavaScript as a second target instead of the JVM. The target is the browser, not so much server-side JavaScript, where the JVM is available and better.
Advantage for applications: some code can be used on both sides, for example HTML tag generation, validation etc. This is more elegant than using two languages. Also, Scala might be considered a more sane language for the browser than JavaScript, although JavaScript is not such a bad language; it suffers, like PHP and VBA, from being used by non-developers who come from the web design side and try a little JavaScript as part of their work, tripping into each of the pitfalls that we developers already went through some time ago when we had our first experience.
Libraries prove to be hard. They are huge and it is hard to transfer them. Optimization is needed to deal with this, like for Android development.
Reflection is not available on scala.js. Many things do not work because of that, but enough things work to make it useful.
Serialization is another challenge, because many frameworks rely on reflection, but there seems to be a solution for that.
Integer types are a little bit crappy: JS only has double, which can do 53-bit integers. Long has to be built from two doubles.

Introduction to Lambda Calculus

Maciek Makowski

Very theoretical talk. Lambda calculus is pure math, or even more theoretical, pure theoretical computer science, but with some thinking it can be made a complete programming language. It can be used for dealing with issues like computability. Many nice proofs are possible. The theoretical essence of functional programming languages is there. Some key words: „Church-Rosser theorem“, „programming with lambda calculus“, „numbers as lambda expressions“ (Church encoding), „Y combinator“, „fixed point combinator“, „lambda cube“, „fourth dimension for subtypes“, ….
Very small language, great for proofs, not relevant or applicable for practical purposes.

State of the Typelevel

Lars Hupel

Typelevel is inspired by Haskell. Libraries by them are scalaz, shapeless, spire, scalaz-stream, monocle and more.
We should strive to get correct programs and optimize where the need arises.
The JVM integers are not good; think of the silent overflow. Floats (float and double) are good for numerical mathematicians and scientists with knowledge in this area, who can deal with error propagation and numerical inaccuracy.
Of course numbers can be seen as functions, like this:
x=f(\cdot) with \bigwedge_{n \in \Bbb N, n>0} \frac{f(n)-1}{n} < x < \frac{f(n)+1}{n}. Equality of real numbers cannot be decided in finite time.
What is a „costate comonad coalgebra“? Monocle provides „lenses“ and similar stuff known from Haskell…
Good binary serialization formats are rare in the JVM world.
How should the fear of scalaz and monads be overcome? Remember: „A monad is a monoid in the category of endofunctors. So what is the problem?“, as could be read on Bodil Stokke’s T-shirt.

Slick: Bringing Scala’s Powerful Features to Your Database Access

Rebecca Grenier

Slick is a library that generates and executes SQL queries. The conceptual weaknesses of JPA and Hibernate are avoided.
It has drivers for the five major DBs (PostgreSQL, MySQL/MariaDB, Oracle, MS SQL Server and DB2) and some minor DBs, but it is not free for the three commercial DBs.
Inner and outer joins are possible and can be written in a decent way.
With database dictionaries Slick is now even able to generate code. Which I have done before, btw., a lot, using Perl scripts running on the DDL SQL script. But this is better, of course…

Upcoming in Slick 2.2

Jan Christopher Vogt

Monads haunt us everywhere, even here. Time to learn what they are. I will be giving a talk in the Ruby-on-Rails user group in Zürich, which will force me to learn it.
Here they come: monadic sessions….
Sessions in conjunction with futures are the best guarantee for all kinds of nightmares, because the SQL is sometimes executed when the session is no longer valid or the transaction is already over. When dealing with transactional databases, a lot of cool programming patterns become harder. Just think of the cool Java guys who execute stuff by letting an EJB method return an instance of an inner class, with the DB session implicitly included, and calling a method which via JPA indirectly and implicitly does DB access long after the EJB method is over. Have fun debugging this stuff. But we know about it and address it here.
At least Slick is theoretically correct, unlike JPA, which I conjecture to be theoretically incorrect, apart from the shortcomings of the concrete implementations.
Several statements can be combined with andThen or a for-comprehension. Be careful: by default this uses separate sessions and transactions, with very funny effects. But threads and sessions are expensive and must not be withheld during non-SQL activities by default. Reactiveness is so important. Futures and thread pools look promising, but this fails miserably when blocking operations are involved, which occupy our threads for nothing.
We would like to have asynchronous SQL access, which can be done on DB and OS level, but JDBC cannot do it. So we have to work around this on top of JDBC. Apart from using a reasonably low number of additional threads this approach seems to be viable.
Statically type checked SQL becomes possible in the future.

No more Regular Expressions

Phil Wills

I love the regexes of Perl. Really. So why make the effort to give up something so cool, even in non-Perl languages?
It is not as bad as it sounds. We retain regular expressions as a concept, just do not call them that (for marketing reasons, I assume) and write them differently. Writing them as strings between // is very natural in Perl, but it breaks the flow of the language in Scala. A typical programmatic Scala-like approach is more natural and more type-safe. And more robust in case of errors. org.parboiled2 can be used. Capture is unfortunately positional, unlike in newer Perl regexes, where captures can be named. But it hurts less here.

Scala eXchange – Q&A Panel

Jon Pretty, Kingsley Davies, Lars Hupel, Martin Odersky, and Miles Sabin

Interesting discussions…

Why Scala is Taking Over the Big Data World

Dean Wampler

‚“Hadoop“ is the „EJB“ of our time.‘
MapReduce is conceptually already functional programming. So why use Java and not Scala?
Some keywords: „Scalding“, „Storm“, „Summingbird“, „Spark“.
Scala can be more performant than Python, which is so popular in this area, but migration has to be done carefully.

Case classes a la carte with shapeless, now!

Miles Sabin

Example: a tree structure with backlinks. Hard to do with strict immutability. Shapeless helps.

Reactive Programming with Algebra

André Van Delft

Algebra can be fun and useful for programming. Algebraic interpretations were introduced.
The page is subscript-lang.org.
Algebra of communicating processes: it is very powerful and can even be applied to other targets, for example the operation of railroad systems.
Every program that deals with input is in its way a parser, so ideas from yacc and bison apply to them.

High Performance Linear Algebra in Scala

Sam Halliday

Linear algebra has been addressed extremely well already, so the wheel should not be reinvented.
TL;DR: netlib-java and Breeze.
Example for usage of that stuff: Kalman Filter.
netlib has a reference implementation in Fortran, a well defined interface and a reliable set of automatic tests. How do we take this into the Scala world?
Fortran with a C wrapper for JNI (cblas).
Or compile Fortran to Java. Really.
Or alternate implementations of the same test suite in C.
High performance is not only about speed and memory alone, but about both under the given requirements concerning correctness, precision and stability.
Hardware is very interesting. The CPU manufacturers are talking with the netlib team.
Can GPUs be used? Of course, but some difficulties are involved, for example the transfer of data.
FPGAs? Maybe soon? Or something like a GPU, without graphics, operating on the normal RAM?
We will see such stuff working in the future.

An invitation to functional programming

Rúnar Bjarnason

Referential transparency, something like compatibility with the memoize pattern.
Pure functions…
Parallelization…
Comprehensiveness… the all-time promise; will it be kept this time?
Functional programming is not good for fighting in the trenches of real-life projects, but for getting out of the trenches. This should be our dream. Make love not war, get rid of trenches…

Building a Secure Distributed Social Web using Scala & Scala-JS

Henry Story

SPARQL is like SQL for the „semantic web“.
Developed with support from Oracle.
We can have relativity of truth while retaining the absolute truth. The talk was quite philosophical, in a good way.
Graphs can be isomorphic, but have a different level of trust, depending on where the copy lies.
Linked data protocol becomes useful.
How are spam and abuse avoided? WebID?
We are not dealing with „Big Data“ but with massively parallel and distributed „small data“.

TableDiff – a library for showing the difference between the data in 2 tables

Sue Carter

What is the right semantics for a diff?
What do we want to see? How are numbers and strings compared when they occur in fields?
Leave out timestamps that obviously differ but do not carry much information.

Evolving Identifiers and Total Maps

Patrick Premont

The idea is to have a map where get always returns something non-null. Smart type concepts prevent compilation of a get call that would not return something.
Very interesting idea, but I find it more useful as a theoretical concept than for practical purposes. The overhead seems to be big.

Summary

Overall it was a great conference.


Devoxx 2014 in Belgium

In 2014 I visited the Devoxx conference in Antwerp in Belgium.

Here are some notes about it:

What is Devoxx?

  • Devoxx is a conference organized by the Belgian Java User Group.
  • Belgium is trilingual (French, Flemish and German), but the conference is 100% in English.
  • The location is a huge cinema complex, which guarantees great sound, comfortable seats and excellent projectors. It is cool.
  • 8 tracks, overflow for keynotes
  • Well organized (at least this year), more fun than other conferences…
  • sister conferences:
    • Devoxx FR
    • Devoxx PL
    • Devoxx UK
    • Voxxed (Berlin, Ticino,….)

Topics & main Sponsors

  • Java / Oracle
  • Android / Oracle
  • Startups, Business, IT in enterprises / ING-Bank
  • Java-Server, JBoss, Deployment / Redhat
  • JVM-languages
  • Web
  • SW-Architecture
  • Security
  • Whatever roughly fits into these lines and is considered worth being talked about by the speaker and the organizers…

These are some of the talks that I have attended:

Scala and Java8

  • Many conceptual features of Scala have become available in Java 8 with lambdas.
  • Problem: different implementations and interoperability of lambdas between Java and Scala.
  • Further development of Scala will make the lambdas of Scala and Java interoperable.

Monads

  • Concept from category theory. (5% of mathematicians do algebra, 5% of algebraists do category theory, but this very abstract and very theoretical piece of math suddenly becomes interesting for functional programming. Of course our functional programming world lacks the degree of infiniteness that justifies the theory at all, but the concepts can be applied anyway.)
  • Monoid (+, *, concat,…)
  • Functor
  • Monad

  • A monad is a triple (T, \eta, \mu)
  • Example: List with a functor F: (A\rightarrow B)\rightarrow (List[A]\rightarrow List[B]); \mu is flatten: List[List[A]]\rightarrow List[A]; \eta: A\rightarrow List[A] (see the sketch below)
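In Java terms, the functor, \eta and \mu for lists could be sketched like this (a hand-written illustration; method names made up):

import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ListMonad {
    // functor: lifts f: A -> B to List<A> -> List<B>
    static <A, B> List<B> map(List<A> list, Function<A, B> f) {
        return list.stream().map(f).collect(Collectors.toList());
    }

    // eta: A -> List<A>
    static <A> List<A> unit(A a) {
        return Collections.singletonList(a);
    }

    // mu: List<List<A>> -> List<A>
    static <A> List<A> flatten(List<List<A>> lists) {
        return lists.stream().flatMap(List::stream).collect(Collectors.toList());
    }
}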

Probability & Decisions

  • Example: Software for automatic steering of house infrastructure
  • Heuristics and probability theory
  • False positives / false negatives: what hurts? (usually both)
  • Very good explanation of probability theory and its use

Clojure

  • Clojure is another JVM-language
  • It is a Lisp dialect, recognizable by its source having an abundance of opening and closing parentheses: (+ (* 3 4) (* 5 6))…
  • Strong support for functional programming.
  • Dynamically typed (for us: just think of everything being declared as „Object“ and implicit casts being performed prior to method calls).
  • After Java itself, Scala, Groovy and JavaScript, it appears to me to be the fifth most common JVM language.

MapReduce

  • „No one at Google uses MapReduce anymore“
  • Google has replaced it with more general and more performance-oriented concepts and implementations.
  • Optimized: save steps, combine them etc.
  • Can be used as cloud service (Cloud Dataflow)

Key Note ING

  • ING considers itself to be an „open bank“
  • Not that the money is lying around openly for burglars to play with, but they claim to be open to new ideas.
  • Mobile app is the typical interface to the bank.
  • IT has a lot of influence („IT driven business“)
  • Feasibility from the IT side is considered important
  • Agile processes (Scrum) vs. enterprise IT
  • IT has slowly moved to these agile processes.
  • „Enterprise IT is what does not work“

Material Design

  • GUI-Design with Android and Web Material Design
  • Visual ideas available for both platforms
  • Polymer design for Web

SW-Architecture with Spring

  • Spring 4.1
  • „works with WebSphere“
  • DRY
  • Lambdas from Java 8 can simplify many APIs out of the box by just replacing one-method anonymous inner classes.
  • Generic Messaging Interface (wasn’t JMS that already???)
  • Caching, be careful when testing, but can be disabled.
  • Test on start.spring.io
  • Spring works well with Java, and also with Groovy, which comes from the same shop as Spring. The combination with Scala is „experimental“.

Lambda_behave

  • High-Level testing-Framework
  • Uses Java8-Features (Lambda etc.)
  • Description in natural language.
  • Failure-messages are human readable
  • Like Cucumber…
  • The source of randomness can be configured. This is very important for Monte Carlo testing, simulations and the like.

Builtin Types of Scala and Java

  • In Java we find „primitive types“ (long, int, byte, char, short, double,…)
  • Problems with arithmetic with int, long & Co: overflow happens unnoticed
  • With float and double: rounding errors
  • With BigInteger, BigDecimal, Complex, Rational: error-prone, clumsy and unreadable syntax.
  • In Scala we can write a=b*c+d*e even for newly defined numerical types.
  • Remark: Oracle-Java-guys seem to consider the idea of operator overloading for numerical types more interesting than before, as long as it is not used for multiplying exceptions with collections and the like.
  • Spire library

Future of Java (9, 10, …)

Part I

  • Q&A-meeting (questions asked via twitter)
  • Numerical types are an issue. That primitive types behave as they do and are kind of the default won’t change.
  • Generics and type erasure (where is the problem?)
  • Jigsaw vs. jars vs. OSGi: still open how that will fit together, but the jar is here to stay.
  • Jigsaw repository: could well be co-located with Maven Central. Oracle does not work in such a way that hosting this directly at Oracle is likely to happen if third-party software is there as well.

Part II

  • Benchmarking with Java is hard because of hot spot optimization
  • JMH is a good tool
  • New ideas are always hard to introduce because of the requirement of remaining compatible with old versions.
  • Java might get a „repl“ some day, like irb for Ruby…

Part III

  • Collection literals (promised once for Java 7!!!) did not make it into Java 8, unlikely for Java 9
  • With upcoming value types this might be more reasonable to find a clean way for doing that.
  • For Set and List something like
    new TreeSet(Arrays.asList(m1, m2, m3,…., mn))
    works already now
  • For maps something like a pair would be useful. Tuples should come and they should be based on value types. The rest remains as an exercise for the reader.

Part IV

  • Tail recursion can now be optimized in an upcoming version.
  • Because of the security manager, which analyzed stack traces, this was impossible for a long time. (weird!!!)
  • C and Lisp have been doing this for decades now…
  • Statement: generics are hard, but having understood them once, they become easy. So keep trying….
  • Covariance and contravariance (solved wrongly for arrays)

Part V

  • Arrays 2.0: indexing with long could become an issue. Some steps towards lists, but with array syntax (studies and papers only).
  • Lists have two extreme implementations: ArrayList and LinkedList. We would love to see more „sophisticated“ lists, maybe some hybrid of both.
  • Checked exceptions: a critical issue; they were a problem with generics and lambdas. And way too many exceptions are checked; just think of what close() methods can throw, which should not be checked.

Semantic source code analysis

  • Useful for high level testing tools
  • Static and dynamic analysis
  • Dataflow analysis: unchecked data from outside, think of SQL-injection, but also CSS, HTML, JavaScript, JVM languages and byte code

Functional ideas in Java

Functional:

  • Functions or methods are „first class citizens“
  • Higher-order functions (C could do that already)
  • Closures
  • Immutability (a function always returns the same result)
  • „Lazy“ constructions are possible though
  • For big structures we always have the question of immutability vs. performance
  • But functional is much more thread-friendly

50 new things in Java8

Part I

  • Lambda (see below)
  • Streams (see below)
  • Default implementations in interfaces
  • Date/Time (like Joda Time)
  • Optional (better than null)
  • Libraries can work with lambda
  • Parallel (use with care and only when needed and useful)

Part II

  • String.join()
  • Something like „find“ in Unix/Linux
  • Writing comparators is much easier
  • Maps of Maps, Maps of Collections easier
  • Sorting is better: quicksort instead of mergesort, can be parallelized

Groovy for Android

  • Problem with JVM languages other than Java: library has to be included in each app. 🙁
  • Solution: jar optimization tool helps
  • Second problem: dynamic languages have to be compiled on demand on the device
  • Solution: „static“ programming, dynamic features possible but not performing well

Lambdas

  • Lambdas are anonymous functions
  • Idea given: interface XY with one method uvw()
  • Instead of

    XY xy = new XY() {
        public long uvw(long x) { return x*x; }
    };

    we can now write

    XY xy = x -> x*x;
  • Shorter, more readable, easier to maintain; the interface becomes superfluous in many cases.
  • Closure means that (effectively) final variables from the surrounding context can be included.
  • Instance methods can also be seen as closures; they include the instance in a closure-like way.

Streams

  • Streams are somewhere in the vicinity of Collection, Iterable, Iterator and the like, but something new.
  • They have methods that allow a function to be applied on all elements
  • Elegant for programming stuff like
    • Sum
    • Product
    • Maximum
    • Minimum
    • First / last / any element with certain property
    • all elements with a certain property
    • all transformed Elements…
  • It turns out to be a wise decision to make it different from Iterable, Iterator, Collection,… and rather provide wrapping capabilities where needed.
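A few of these in action, as a small sketch:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamExamples {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6);

        int sum = numbers.stream().mapToInt(Integer::intValue).sum();
        int max = numbers.stream().max(Integer::compare).get();
        boolean allPositive = numbers.stream().allMatch(n -> n > 0);
        List<Integer> squares = numbers.stream()
                                       .map(n -> n * n)
                                       .collect(Collectors.toList());

        System.out.println(sum + " " + max + " " + allPositive + " " + squares);
        // prints: 31 9 true [9, 1, 16, 1, 25, 81, 4, 36]
    }
}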

Links
