Cassandra DB

The large zoo of NoSQL databases needs to be considered thoroughly.

While the major transactional SQL databases (Oracle, PostgreSQL, MS-SQL-Server, DB2 and to a limited extent MariaDB/MySQL) can be used more or less interchangeably, if we are talking about the beginning of the project, the NoSQL databases have totally different features and must be chosen wisely and adequately. While the typical pattern of many NoSQL databases is trading speed for complex transactions, this is by no means compulsory. There are SQL databases like Teradata, that have reduced support for transactions, but feature a full set of SQL for queries. This can be used for data warehousing, where the data changing operations take place in a more controlled manor, because the data is already known and present elsewhere, but queries are the important thing. On the other hand there are NoSQL-databases like Neo4j that support full ACID-transactions, but organize the data in a totally different way that might be more adequate than a relational database for many purposes. This shows the other motivation to go for NoSQL-databases. While the relational model is somewhat complete and there are always ways to express whatever is desired with it, it may not be a natural or efficient or convenient way to model certain data.

Some databases, like MongoDB allow additional indexing and thus efficient queries without full table scans and additional uniqueness constraints, which is a must have feature for a good transactional SQL database, but not so much for NoSQL databases in general.

Some NoSQL-databases are typically run in memory. While there are some SQL databases that run in memory as well, like H2, this kind of defeats the idea of ACID, so it is not really a full transactional SQL database anyway, at least not in this mode.

SQL databases are well known to many people and can do many jobs quite well. If the data model can be expressed well in the SQL database and the performance is sufficient, this may be a good way to go. The powerful query and data manipulation language is usually extremely helpful and usually you get only a shadow of what a powerful SQL dialect like PostgreSQL or Oracle can do when you move to a NoSQL database.

Now NoSQL databases are somewhat scary. The data is „not safe“ in them, because there are no transactions. But what about the transactional relational SQL database? If we use it through some implicit or explicit caching layers, we get rid of this feature without being aware of it. JPA or hibernate tend to do implicit caching and can even use file system persistency outside of the database as a „second level cache“. I think this is just so broken, but it is used and good applications are built with it. But the feature of the good old transactional SQL database that serves as argument to use it has silently been thrown into the garbage can in this project.

So we should think if and where we need serious transactions and then really write our software in a way that does not defeat them by caching, preferably by avoiding JPA and Hibernate. Or we should pick our database solution for the purpose based on the modelling features and performance features.

Cassandra comes in as one of the NoSQL products with very good write performance and scalability. The query language kind of looks like SQL, but it is a tiny subset. If we can live with that subset and it is relevant to get more performance than we would get from a PostgreSQL or Oracle, this might be a good way to go. An important observation is that tables in Cassandra do have primary keys. But additional unique keys cannot be defined and instead lookup tables. This works like this:

CREATE TABLE IF NOT EXISTS „A“ (
„X“ text,
„Y“ text,
„Z“ text,
„T“ text,
PRIMARY KEY(„X“)
);

CREATE TABLE IF NOT EXISTS „A_LOOKUP_BY_Y“ (
„Y“ text,
„X“ text,
PRIMARY KEY(„Y“)
);

Lookups by X can be done directly in A. Lookups by Y have to be done in two steps. First the lookup table is used to determine the value for X and the table A is used for the actual lookup of the data.

There are ways to do interesting and useful things that seem to be missing and it is usually worth investigating how they can be done. When knowing how the different tools that are available can be used efficiently and properly, it may be a good moment to actually decide what tool to use. But a serious comparison should compare well done solutions of both technologies and not just a junior’s first day program in one technology against a professional solution done by a senior.

It is always important to have a look at the different NoSQL databases to decide which ones are going to be used in the team or in the organization. Cassandra can be a useful database for this approach. Often it is good to also have a SQL database in this set of supported databases. It is easier to model for example accounts, booking and payments with them, while a other data may be better stored in adequate NoSQL databases.

By the way, while major transactional SQL databases may be somewhat interchangable in the beginning of a project, it is quite a pain to change them later or to support multiple databases. Also there are of course licensing issues, teams are needed to operate the database, databases may run better on certain operating systems and there may be some additional features that might make them more or less desirable. But the basic functionality differs less than in the case of NoSQL databases.

Share Button

Perl 5 and Perl 6

We have now two Perls. Perl 5, which has been around for more than 20 years just as the „Perl programming language“ and Perl 6, which has been developed for more than a decade and of which now stable versions exist.

The fact, that they are both called „Perl“ is a bit misleading. They are two different and incompatible programming languages. But they share the same community. And Perl conferences are usually covering both languages.

So this rises the question about the differences or about which of the two Perls to use.

Here are some differences:

  • Perl 5 is well established and many people know it. Perl 6 has to be learned, even if it is relatively easy to learn for someone with a Perl 5 background.
  • Perl 5 runs about three times faster than Perl 6
  • Perl 6 programs are a bit shorter than Perl 5 programs
  • Perl 6 regular expressions are even better than Perl 5’s regular expressions
  • Perl 6 is more logical than Perl 5
  • Perl 6 uses by default better numerical types
  • Perl 6 makes it easier and more natural to do object oriented programming and functional programming
  • Perl 6 has come up with a useful approach for doing multithreadoing.
  • Perl 5 has so many cool libraries on CPAN, Perl 6 just a few.

Links:

Share Button

Conference Talks

Referring to the Swiss Perl Workshop, it is now time to collect all the conference talks that I have given so far and that have been uploaded as video.

Share Button

Swiss Perl Workshop 2017

I have attended the Swiss Perl Workshop.
We were a group of about 40 people, one track and some very interesting talks, including by Damian Conway.
I gave a regular talk and a lightning talk myself.
The content of my talk might go into another Blog post in the future.
The Perl programming language is still interesting, and of course it was covered in both variants: Perl 5 and Perl 6.
But many of the talks were about general issues like security and architecture and just exemplified by Perl.

The Video recording of talks was optional. Here are those that have been recorded and already uploaded: Youtube: Swiss Perl Workshop

Share Button