Almost every non trivial application uses in some way a database.

For many years this has been anyway Oracle, DB2 or MS-SQL-Server, depending mostly on the habits and on the religious orientation of the organization that developed or ran the application. These days all three are available for Linux and MS-Windows. DB2 is also available z/OS. The „home-platforms“ of these three are probably Linux, z/OS and MS-Windows, respectively (2018).

We saw Teradata as an alternative to DB2 and Oracle for data warehouses. They run on huge amounts of data, but are really invisible to most of us. Maybe the data warehouse is the old „big data“, before the invention of the term.

We saw a big Hype about NoSQL databases and some interesting DB products from this group that could successfully establish themselves.

We saw MySQL (and its fork MariaDB) mostly for database-installations that had somewhat lower requirements on the DB-product in terms of features or in terms of performance. Actually Wikipedia runs on MySQL or MariaDB and that is quite a big installation with heavy user load, but it is mostly about reading.

PostgreSQL was often positioned „somewhere between Oracle and MySQL“.

PostgreSQL 10 just came out. The most important new features where replication on a per table basis, better partitioning of large tables and better support for clustering.

I have worked with all of the database technologies listed here and even giving trainings for MongoDB, Oracle and PostgreSQL.

So where is PostgreSQL positioned really in this landscape?

It is a good database product for a large and growing class of applications. I find it slightly more pleasant than the other four SQL databases mentioned here to work with, because the SQL-implementation and its extensions are powerful, clean and behave more or less as expected. Some minor positive points are the default usage of ISO-date-format, the distinction between Null and empty string and on the other hand that most stuff that works in Oracle at SQL level can easily be transferred to PostgreSQL. The psql-shell works like typical linux shells in terms of command line editing and history. So a lot of minor details are just pleasant or as they should be.

Comparing to the three groups of contenders:


NoSQL databases kind of leave the mainstream of transactional relational SQL-databases and provide us either some interesting special features or promise us performance gains or support of huge data base sizes. The price for this is that we loose a extremely mature, clever and powerful query language, which SQL is. I would go for NoSQL products, if the additional feature of this NoSQL-DB-product cannot be reasonably be duplicated in PostgreSQL or other SQL-DBs and if it is really useful for the job. I would go for a NoSQL-DB-product, if the required data sizes and performance cannot reasonably be achieve using an SQL-product like PostgreSQL, good tuning of hardware, OS, database and application logic, but can actually be achieved with the NoSQL-product. These applications exist and it is important to pick the right NoSQL-DB for the project. It should be observed that PostgreSQL has a lot of features beyond of what normal SQL-databases have and looking into this area might be useful… A typical strength of some NoSQL-databases (like CassandraDB and MongoDB) is that a powerful replication is kind of trivial to set up, while it is a really big story for typical transactional SQL databases… This is due to the transactional feature which adds complexity and difficulty and a performance penalty to some kinds of replications…


I do not count that MySQL belongs to Oracle, because MariaDB is an independent fork outside of Oracle and can be used instead of MySQL.
I do think that MySQL does not have quite the level of PostgreSQL in terms of features and cleanness. So we can get PostgreSQL for the same price as MySQL or MariaDB. So why not go for the better product? Even if MariaDB perfectly fits today, the application will grow and it will at some point prove useful to be based on PostgreSQL. I came across the issue of nested transactions some years ago. They were easily supported by PostgreSQL, but not at all by MariaDB. Issues like that can come up more likely this way than the other way around.

Oracle, DB2, MS-SQL-Server

Especially Oracle makes many long term loyal customers run away due to there pricing and licensing practices. While it is extremely hard to change the database of a non trivial database based application, at least new applications in many organizations are discouraged from using Oracle, unless they can make a point why they really need it. MS SQL-Server might absorb some of these, especially since they are now available on Linux servers. But what Oracle does now might very well be the policy of Microsoft or IBM in a few years, so it makes perfect sense to have a serious look at PostgreSQL. A reasonably well tuned PostgreSQL will work pretty much as good as a reasonably well tuned Oracle, DB2 or MS-SQL-Server. Features that are missing now are being added with new releases. Some interesting features make it just a bit more pleasant to use than for example Oracle. It just feals more modern and more Linux-like.

Btw. there were some more contenders in the space of commercial transactional SQL-databases like Adabas D, SyBase and Informix. While the database products Adabas D and SyBase have been bought by SAP 1997 and the whole Sybase company in 2010 in two more or less unsuccessful attempts to have their own database and not having to use their competitors product as database, but they seem to have some success in using HANA now. Informix has been bought by IBM and is still offered as alternative to DB2. I would say that they have lost their relevance.


So I do recommend to seriously consider PostgreSQL as a DB product. It is currently my favorite in this space, but there is no univeral tool that fits for everything.

Some random aspects to keep in mind when moving from Oracle to PostgreSQL are mentioned here…

Types CLOB and BLOB do not exist. They can mostly be replaced by types TEXT and BYTEA, but it is not exactly the same. The type TEXT, which is a somewhat unlimited variable length string can easily be used for columns where we would try to use VARCHAR2 in Oracle, which gives us the advantage that we do not have to worry about defining a maximum length or exceeding the 4k limit that Oracle imposes on VARCHAR2.

Empty Strings are not the same as NULL in PostgreSQL, they are in Oracle.

PostgreSQL has a boolean type. Please use it and get rid of the workaround using CHAR, VARCHAR2 or NUMBER as replacement.

Oracle only had one kind of transaction isolation that was really well supported and I think this is still the way to go. It is an excellent choice and is very close to „repeatable read“, while PostgreSQL uses by default „read committed“, but it can be brought to use „repeatable read“. Please keep this in mind to avoid very unpleasant surprises and use the transaction isolation level appropriately.

The structuring of PostgreSQL consists of DB-instances, usually only one on a virtual or physical server, which somewhat resembles what is a database in Oracle. Within a DB-instance, it is possible to define a number of database without much pain. This was totally not the case with Oracle in earlier years and it was best practice to rely no schemas, but now we can easily afford to put more virtual servers each running Oracle (or PostgreSQL), if the licensing does not prohibit it in the case of Oracle. And since Oracle 12 there is the concept of the virtual database which splits a Oracle database into sub databases, somewhat behaving like separate database without the overhead of DB instances. It seems to be quite equivalent to what PostgreSQL does, apart from the naming and many details about how to set it up and how to use it. Schema and User are more separate concepts in PostgreSQL, a Schema can be defined totally independently of Users, but there is a way to define Schema names that match the user names to support this way of working. So we can do pretty well what we want, but the details how to work it out are quite different.

Each database has its programming language to write triggers, stored procedures and the like. They seem to be somewhat similar between different DB-products (we are talking about MS-SQL-Server, Oracle, PostgreSQL and DB2), but different enough that we need to rewrite triggers and stored procedures from scratch. This is not as painful as it used to be, since the approach of accessing DB tables for read access only via views and for write access only via stored procedures seems to have lost some popularity. Having written a lot of the business logic PL/SQL the pain of migrating to another DB product is really enormous, while a business logic in Java, Scala, C, C++, Perl, Ruby, C# or Clojure can be ported more easily to different OS and different DB. But it is no way for free.

One remark for development: Some teams like to use in memory databases for development and then trust that deployment on PostgreSQL or Oracle or whatever will more or less work. I strongly recommend not to follow this route. It is totally not trivial to support one more DB product or usually a second DB product and it is quite easy to setup a virtual OS with the DB product that is being used and with test data. PostgreSQL, Oracle, MS-SQL-Server, MongoDB and whatever you like can be configured to use more Memory and perform pretty much like these in memory DBs, if we set them up for development and are willing to risk data loss. This is no problem, because the image can be trivially copied from the master image when needed. Yes, a really good network and SSDs of sufficient size, speed and quality are needed for working efficiently like this and it is possible and worthwhile to have that.

I can give training about PostgreSQL and MongoDB and about SQL in different dialects. Find contact information here.

And please: comments, corrections and additional information are always welcome…


Share Button

The magic trailing space

When comparing string, of course spaces count as well and they should count. To ignore them, we can normalize strings. Typical white space normalization includes the following (Perl regular expressions):

  • /[ \t]+/ /g replace any sequence of tabs and spaces used to separate content by one space.
  • /\r\n/\n/g replace carriage return + linefeed by linefeed only.
  • /\s+$// remove trailing whitespace.
  • /^\s+// remove leading whitespace.

More or less it is often useful to do something like this when comparing strings that originally come from outside sources and are not normalized, but only „the content“ counts. There can be more sophisticated rules, to deal with no-break-space, with control characters, with trailing spaces at the end of each line or only at the end of the whole thing or replacing multiple empty lines by just one empty line. Just the general idea is to think about the right normalization.

In some cases, like long numbers, spaces or other symbols are used to group digits. These should also be removed. Sometimes more specific rules apply, like for phone numbers, web sites, email addresses etc. that need to be done specifically for this type, hopefully using an adequate library.

More often than not we see that web sites do not do this properly. Quite often an information has to be entered and it is not normalized prior to further processing. So credit card numbers or IBAN numbers are rejected because of spaces or anything because of trailing spaces, of course with an error message that does not give us a hint about what was the problem.

For serious application there needs to be a serious processing step for data coming from outside anyway, for security reasons. Even though SQL injection should not work due to sound SQL-placeholder usage, it is a good practice to check the data anyway and reject it early and with a meaningful message. Should I trust the security of a site that cannot deal with spaces in a credit card number for giving them my card number? I am not sure.

It is about time that UI developers get into the habit of doing the proper processing, normalization and checks for user input. Beware that any security relevant checks need to be done on the server or on the server as well.

Share Button