Devoxx UA 2020 (talks)

I watched the conference online and picked the following talks:

And on the second day:


How procurement can create value for IT projects

We know this: in many IT projects we need to make use of services, software and hardware that need to be bought.

Actually, it often makes a huge difference what kind of deals are made and how efficiently the projects can work on that basis.

I will briefly tell a few stories and, through them, a bigger story.

A common pattern is the „preferred supplier“. It is nice to be a preferred supplier, and in the phase when the partner is chosen and the contracts are made, companies often offer their best people and services to show how good it will be later to have them as preferred supplier. Then, when the deal has been fixed, they send the juniors for the same hourly rate and make a lot of profit. Or the price has been pushed so low that only juniors can be sent. This can be a problem in the long run, because it might get really difficult to get enough really good people in order to make progress with strategic long-term projects rather than just maintaining the daily business.

Another interesting pattern can occur when the preferred supplier has a very strong presence in a project. They are getting some money from the hourly rates, and in order to make a profit, the salaries have to be significantly lower than that. For this to still make sense for the employees, the supplier can provide non-monetary incentives, like some kind of career steps. Being in a powerful position in the project, the preferred supplier can influence who is in the project and who gets the more responsible positions. So there is a temptation to push out people who are not from their company and to hand these attractive positions not to the person who would be the best choice for the customer, but to those whom they want to give an incentive. This is at the expense of the customer. So at the end of the day it is usually good to rely on multiple providers for „external“ people. There are serious companies who behave professionally and correctly even when they have become a „preferred supplier“, but this is not always the case.

When the preferred supplier is providing software, for example a database, it may be possible to get a really good deal for five years. Then, after five years, the deal needs to be extended and magically becomes more expensive. Especially if the company knows it is in the position of a „preferred supplier“. And when this issue is discovered, maybe even a year in advance, it is already too late to migrate. So the expensive software needs to be used for another four years, until it is again too late… And being from a big, impressive company does not necessarily make the software good. Counterexamples exist.

In the case of databases I have seen companies that follow a strategy of multiple databases and that require a good reason for using the more expensive solutions. And magically the position of the buyer becomes much stronger when the deal needs to be extended for another five years. Maybe the overly expensive database will even be kicked out at some point. And yes, this expensive database has some really cool and pretty unique features. Unfortunately they only come in some enterprise edition that would be much more expensive than the regular one, while open source databases provide decent, if less sophisticated, variants of these enterprise features for a price that is less than the base version of the expensive product. But since the DB product cannot easily be changed, it is important to make a wise choice and to consider different options, including the more expensive ones, when starting a project. And to pick what makes sense for the specific needs.

Some interesting observations were made when a preferred supplier made a really tempting offer for operating all the servers of a larger company. The annual price was really low, much cheaper than doing it with their own team. Now suddenly the need arose to store some really large amount of log files. I am talking about what could have been stored on a few USB disks, bought in the supermarket for a few hundred Euros. But of course this was forbidden, because the servers had to be run by the preferred supplier, and even putting Linux on a few PCs that were no longer needed and attaching a few cheap disks was ruled out by the cheap overall contract. This cheap solution would have been absolutely sufficient for the purpose. Instead, the disk space could be bought from the supplier. Or, more precisely, rented. It was not a few hundred Euros, but a few hundred thousand Euros a year. Yeah, they needed to make some money somehow… A few hundred Euros once every few years, or maybe even a few thousand Euros every year, would have been totally acceptable for the project to pay. But there are limits. You cannot do certain things under such conditions. The deal kills important possibilities for the IT people. I am not going to write how this was resolved…

Another story, with a really cheap preferred supplier: they actually ran an important database for a stunningly low fixed base price, and on top of that it was paid per query. So what did the team do? They designed the software in such a way that it used an Oracle database as a cache for the pay-per-query DB2 database. So the same query had to be sent only once to DB2 as long as the data did not change. And when the data changed, the Oracle database just had to be cleaned out. Since this happened only a few times a year, this technically stupid architecture really saved a lot of money every year. Big money.
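
A minimal sketch of that pattern. The names and the in-memory map are my own illustration; in the story the cache was an Oracle database rather than memory, but the logic is the same:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the read-through cache from the story: answer repeated queries
// from a cheap cache and only hit the pay-per-query database on a miss.
public class ReadThroughCache {
    private final Map<String, List<String[]>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<String[]>> payPerQueryDb;

    public ReadThroughCache(Function<String, List<String[]>> payPerQueryDb) {
        this.payPerQueryDb = payPerQueryDb;
    }

    public List<String[]> query(String sql) {
        // computeIfAbsent calls the expensive database only on a cache miss
        return cache.computeIfAbsent(sql, payPerQueryDb);
    }

    // Called on the rare occasions (a few times a year) when the data changes.
    public void invalidateAll() {
        cache.clear();
    }
}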

Yet another example: the management had already bought ClearCase licenses. They were really expensive and the money was already gone. Now the setup that was used for ClearCase, and that was allowed by the licensing, was not really optimized for part of the team working remotely. To do that efficiently would have required a much more expensive license that no one wanted to pay for. So synchronizing the software took like 30 to 45 minutes every day. And one team member had to work full time just to maintain ClearCase. There were some other pains, like crashes when files contained only linefeeds instead of carriage-return-linefeed, and some other annoying details that I do not really remember. Just for the record, some of these issues have been fixed in later releases… And ClearCase had a lot of really interesting features that were not used at all. The seriously useful features can all be found in git now, in a contemporary way, of course. But not in those days, when there was neither git nor Subversion. So some tests were performed, and it looked like the free software CVS (which really sucks big time when compared with contemporary systems like git) would have worked much, much better for this concrete project. But ClearCase had to be used, because it was so expensive and the money had already been paid.

So at the end of the day, when procurement makes good deals, this can create a lot of value for the project and allow for efficient and innovative work and for solutions that make sense technically, instead of finding tricks for bypassing the worst parts of the contract.

So a good procurement team and good communication with the technical staff who know what is needed for their work is a big plus for everybody, for the project and for the company.


Just run it twice

Often we use some kind of „clustered“ environment to run our software.

This promises higher performance and better availability.

And the frameworks seem to suggest that it is just a matter of starting it twice and it will magically work correctly.

There is nothing wrong with investing some thought in this issue. Things can actually go quite wrong otherwise…

So some questions to think about:

Where is the data? Does each instance of the service have its own set of data? Or do they share the data? Or is there some kind of synchronization mechanism between the copies of the data? Or is some data shared and some kept as a copy in each instance of the service?

Do we gain anything in terms of performance, or is the additional power of the second instance eaten up by the overhead of synchronizing data? And if data is only stored once, does this become the bottleneck?

Then there is the issue of sending requests to the right service instance. Usually it is a good idea to use something like „sticky sessions“ to keep a whole session, or a collection of related requests, on one instance. Even if the protocol is „stateless“ and „restful“.
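
A toy sketch of one way to implement such affinity, mapping the session ID onto the instance list by hashing. Real load balancers typically use cookies and add health checks and rebalancing; this is just to illustrate the idea:

import java.util.List;

// Toy sketch of "sticky" routing: the same session ID always maps to the
// same instance, as long as the instance list does not change.
public class StickyRouter {
    private final List<String> instanceUrls;

    public StickyRouter(List<String> instanceUrls) {
        this.instanceUrls = List.copyOf(instanceUrls);
    }

    public String instanceFor(String sessionId) {
        int idx = Math.floorMod(sessionId.hashCode(), instanceUrls.size());
        return instanceUrls.get(idx);
    }
}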

Is there some magic caching that happens automatically, for example in persistence frameworks like Hibernate? What does this mean when running two instances? Do we really understand what is happening? Or do we trust that Hibernate does it correctly anyway? I would not trust Hibernate (or any other JPA implementation) on this issue.

What about transactions? If storage is not centralized, we might need to do distributed transactions. What does that mean?

Now messaging can become fun. Modern microservice architectures favor asynchronous over synchronous communication, where it can be applied. That means some kind of messaging or transmission of „events“ or whatever it is called. Now events can be subscribed to. Do we want them to be processed exactly once, at least once, or by every instance? How do we make sure it happens as we need it? Especially the „exactly once“ case is tricky, but of course it can be done; see the sketch below.
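
One common way to approximate „exactly once“ on top of „at least once“ delivery is an idempotent consumer. A sketch, assuming a hypothetical table PROCESSED_MESSAGES(MESSAGE_ID VARCHAR PRIMARY KEY) and plain JDBC:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch of an idempotent consumer: the insert of the message ID and the
// business logic run in one transaction, so a redelivered message fails on
// the primary key and is skipped.
public class IdempotentConsumer {
    public void onMessage(Connection con, String messageId, Runnable businessLogic)
            throws SQLException {
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO PROCESSED_MESSAGES (MESSAGE_ID) VALUES (?)")) {
            ps.setString(1, messageId);
            ps.executeUpdate();  // fails on duplicate -> already processed
            businessLogic.run(); // same transaction as the insert
            con.commit();
        } catch (SQLException duplicate) {
            // duplicate delivery: skip silently (a real implementation
            // would check the vendor's duplicate-key error code)
            con.rollback();
        }
    }
}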

How do we handle tasks that run once in a certain period of time, like cronjobs in Linux? Do we need to enforce that they run exactly once, or is it OK to run them on each instance? If so, is it OK to run them at the same time?
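
For the cronjob-like case, one simple approach is to let every instance try to claim the run atomically in a shared table, so that only one instance wins. A sketch, assuming a hypothetical table JOB_RUNS(JOB_NAME VARCHAR PRIMARY KEY, LAST_RUN TIMESTAMP):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Sketch: every instance runs the scheduler, but only the instance that
// wins the atomic UPDATE actually executes the job.
public class ClusterSafeJob {
    public boolean tryRun(Connection con, String jobName, Runnable job)
            throws SQLException {
        Instant cutoff = Instant.now().minus(1, ChronoUnit.DAYS); // the period
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE JOB_RUNS SET LAST_RUN = ? WHERE JOB_NAME = ? AND LAST_RUN < ?")) {
            ps.setTimestamp(1, Timestamp.from(Instant.now()));
            ps.setString(2, jobName);
            ps.setTimestamp(3, Timestamp.from(cutoff));
            if (ps.executeUpdate() == 1) { // we won the claim for this period
                job.run();
                return true;
            }
            return false; // another instance already ran it in this period
        }
    }
}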

Do we run the service multiple times on the productive system, but only a single instance on the test and development systems?

Running the service twice or more is of course something we need to do quite often, and it will become more common. But it is not necessarily easy. Some thinking needs to be done, some questions need to be asked, and we need to find answers to them. Consider this a starting point for your own thinking process for your specific application landscape. Get the knowledge, if it is not yet in the team, by learning or by involving specialists who have experience…


Pagination of Database Query Results

This article is highly inspired by the blog post We need tool support for keyset pagination.
Please consider reading the original first and then my interpretation and additional thoughts about this idea.

We have a typical database-backed web application. It could be a rich client, it could be a NoSQL database. Whatever; for the moment a transactional SQL database and a web application will be used as the most common and best understood example.

We construct a query whose result set is so large that we do not want to transfer, render and display it all at once. Think of Google, where you might have millions of hits and only see the first 10 or 100 or so. Then you can navigate through the result set, usually by going to the next pages successively.

Nice web applications tell you how many records there are exactly. Then they calculate how to split this up into a large number of pages and allow you to go to an arbitrary page, or at least to some of the pages (last, next 10 pages), immediately.

There are several problems with this. First of all, the set you are basing this on changes with time. This could be fixed by creating some temporary snapshot in memory, in the database or in some kind of DB cache and keeping that around for some time. That is expensive and usually not worth the price, so we live with some misbehavior.

The second problem is the count. Counting actually requires a full scan of the result set; there is no decent shortcut. Now it would usually be good enough to estimate the size of the result set, and that is exactly what Google does. Sometimes the estimate is ridiculously bad, but we have gotten used to it. That is much better than using the real count, because it speeds up the application by a great factor and improves the user experience overall. Of course it is a bit more challenging to program an estimation, and it is only worth it for big result sets. We could still jump to any of the next 10 pages by just increasing the size of the current query by a factor of 11 and showing the required segment. And to jump to the last page: since we sort the result by something, we can just get the first N results sorted backwards and then wrap that in a query that sorts them forward. So dropping the count can help performance without too much loss of functionality.

Now the next problem: having navigated through a lot of pages, the usual way of doing a query and then constraining it with some mechanism like „rownum“, „row_number“ or „limit/offset“ to the subset we want to display always requires producing the result set from the beginning up to our page. Usually that does not matter, because we navigate through the application manually, which means we are never too many pages from the beginning. But it can become an issue if the application is heavily used.

Now we sort by something. Usually not the ID, but some business-relevant key or a combination of keys. They do not have to be unique, but we can always add the ID (or more generally the primary key) as the last sort key to make the combination unique. Then the next page can be found by just adding a condition based on our sort key. Assuming we want to show N records, the query might be

SELECT * FROM TABX 
  WHERE cond1 AND cond2 
  ORDER BY A, B, C ASC 
  LIMIT n;

Now the last record of the result has A=a1, B=b1, C=c1.
Then the next page can be obtained by

SELECT * FROM TABX 
  WHERE cond1 AND cond2 
   AND (A>a1 OR A=a1 AND B>b1 OR A=a1 AND B=b1 AND C>c1) 
  ORDER BY A, B, C ASC 
  LIMIT n;

Skipping three pages and getting the fourth of the following pages can be done like this:

SELECT * FROM TABX 
  WHERE cond1 AND cond2 
   AND (A>a1 OR A=a1 AND B>b1 OR A=a1 AND B=b1 AND C>c1) 
  ORDER BY A, B, C ASC
  LIMIT n OFFSET 3*n;

We do use (or simulate) OFFSET here, but in a more intelligent way, because we rebase our query on what we already know and thus do not force the database to work harder than necessary.
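
In application code the same idea could look roughly like this. This is a JDBC sketch; TABX, A, B and C are taken from the example above and assumed numeric here, the fixed conditions cond1/cond2 are left out, and LIMIT is used as in PostgreSQL:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// JDBC sketch of keyset pagination on TABX, ordered by (A, B, C).
// (a1, b1, c1) are the values of the last row of the previous page.
public class KeysetPagination {
    public List<Object[]> nextPage(Connection con, long a1, long b1, long c1, int n)
            throws SQLException {
        String sql = "SELECT * FROM TABX"
                + " WHERE (A > ? OR A = ? AND B > ? OR A = ? AND B = ? AND C > ?)"
                + " ORDER BY A, B, C ASC"
                + " LIMIT ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, a1);
            ps.setLong(2, a1);
            ps.setLong(3, b1);
            ps.setLong(4, a1);
            ps.setLong(5, b1);
            ps.setLong(6, c1);
            ps.setInt(7, n);
            List<Object[]> page = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                int cols = rs.getMetaData().getColumnCount();
                while (rs.next()) {
                    Object[] row = new Object[cols];
                    for (int i = 0; i < cols; i++) {
                        row[i] = rs.getObject(i + 1);
                    }
                    page.add(row);
                }
            }
            // remember A, B, C of the last row as the key for the next page
            return page;
        }
    }
}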

We also get a bit more consistency, because new records inserted at the beginning will not be seen, but at least they will not disturb the succession of our result set. Maybe that is worth more than the performance gain, because we get it with a reasonable effort. Caches, snapshots or even keeping the result in „session memory“ are not really serious approaches to this issue; they will just sooner or later blow up our application in production, at the worst possible moment.


UUIDs revisited

UUIDs have proven useful in many circumstances.
We have basically two main variants:

  • The UUID is calculated as a combination of the Ethernet MAC address, the timestamp and a counter.
  • The UUID is calculated using a good random number generator.

While variant 1 provides good uniqueness, there are some issues with it. Today we mostly use virtualized servers, which means that the MAC address comes from a configuration file and is no longer guaranteed to be world-wide unique. And we give away some information with the UUID that we do not necessarily want to give away.

Variant 2 can be proven to have an acceptably low risk of collisions, but this is only true when using really good random number generators, which cannot always be guaranteed. Also it introduces an uncertainty in an area where we do not need it. We need to worry about this uniqueness, at least a little bit, which is unnecessary.

So the question is if we can rethink variant 1.

Assume that our software runs on our server farm. There may be a few hundred or thousand or even millions of virtual or physical servers. Now the organization does have a way to uniquely identify its servers. Of course we only need to consider the servers that are relevant for the application. Maybe an ID for the service instance instead of the server is even better. We may assume a numerical ID, or have a table to map IP addresses and the like to such an ID. Some thinking is still required on how to do this. Digits that we do not need can be filled with random numbers.

Putting this ID in place of the MAC address solves the issue of the configurable MAC address.

The next problem, that timestamps can be abused to find out something that should not be found out, can be resolved by running the timestamp part, or even the ID part (including a random number), through a symmetric encryption or simply some bijective function that is kept secret.
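
A sketch of what such a customized generator could look like in Java. The bit layout is my own assumption for illustration, and the result deliberately does not follow the RFC 4122 version fields:

import java.security.SecureRandom;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a "local standard" UUID: instance ID instead of the MAC address.
// Assumed layout: 48 bits timestamp + 16 bits counter in the high half,
// 16 bits instance ID + 48 bits random in the low half.
public final class LocalUuidGenerator {
    private static final SecureRandom RANDOM = new SecureRandom();
    private final AtomicInteger counter = new AtomicInteger();
    private final int instanceId; // unique per service instance, e.g. from config

    public LocalUuidGenerator(int instanceId) {
        this.instanceId = instanceId & 0xFFFF;
    }

    public UUID next() {
        long hi = (System.currentTimeMillis() << 16)
                | (counter.getAndIncrement() & 0xFFFFL);
        long lo = ((long) instanceId << 48)
                | (RANDOM.nextLong() & 0xFFFF_FFFF_FFFFL);
        // To hide the timestamp, hi could additionally be run through a
        // secret bijective function (e.g. a 64-bit block cipher).
        return new UUID(hi, lo);
    }
}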

In many circumstances there is nothing wrong with customizing the UUID-generation to some „local“ standard, if this is well understood and carefully implemented.


Cisactions

New research has analyzed the concept of transactions from a very theoretical point of view. An interesting result of this research is the concept of cisactions, which are in some way the opposite of transactions. A duality between cisactions and transactions has been proven. This means that in principle every application that is based on transactions can also be written with cisactions. But there are some challenges:

  1. Currently there is no database that supports cisactions.
  2. Cisactions are even harder to understand than transactions.
  3. The whole program has to be written again in a totally different way.
  4. Even the business logic has to be redefined in a totally different way, but eventually the same results can be achieved.
  5. The paradigm change from transactions to cisactions is much harder than the change from object-oriented to functional programming.
  6. Mixing cisactions and transactions in the same program is almost impossible.

But we will see applications that use cisactions properly in the future. And they will perform about 1000 times faster than programs using transactions.

The interesting question is:

When will the first cisactional databases become available? How will they work? Which programming languages will support cisactions? Or will we rather have to invent totally new programming languages for proper support of cisactions?

A lot of questions still have to be answered, and this is still theoretical research. But it is so promising for high-performance usage that we absolutely must expect this to become an important way of solving real high-performance development tasks.


Not all projects are on ideal paths II (Tim Finnerty)

This is another story of a project that did not go as well as it could have while I was there. From unsuccessful projects we can learn a lot, so there will be stories like this once in a while. The first one was about Tom Rocket.

Tim Finnerty

It was the time when all the cool companies tried to introduce Java. And some of the new Java projects failed, causing those companies to go back to C, which in turn scared other companies away from taking this step. But some companies did not get scared. They embraced the new Java fashion at a time when it was still not clear whether this was a good idea. What could possibly go wrong?

Well, in those days the experienced guys did not want to move to Java. It was slow, it was unreliable, not mature… Maybe for applets, maybe more generally for GUIs to get rid of VisualBasic, but Java on the server was considered a bad joke. For the server, real people used real C, of course on Unix or maybe Linux, which was not really such a bad idea in those days. But there were the young people, or the ones who had stayed young. They often had just finished their education and firmly believed that by just using technology „xyz“ everything would become great. Xyz can be a development method (spiral model in those days, agile today), an architecture (microservices), a paradigm („OO“, „functional“), a framework („enterprise edition“), a tool or a programming language (yeah, Java). Often this first enthusiasm ends in disaster: the money has been spent, the developers are leaving and the software is further away from being useful than anybody would like to admit. In lucky cases there is still some money left to do it right. Maybe even to do it right in Java.

That is where we come in. I do not really know the earlier history, but according to the management it was a total disaster. Now Tim Finnerty (real name known to the author) was the new technical lead, team architect or whatever this role is called. He embraced the new technology, but promised not to overstretch it. A bit of old school. Sounds good, because it is exactly what managers want to hear. No more risky experiments; this time it needed to become a success.

So Tim Finnerty defined how we had to work. He knew Java, he knew databases and especially Oracle, he knew the web, he even knew Perl. And he knew OO. Better than anybody else, so we were doing the real thing. Great head start. And everybody had to program according to Tim's rules.

Of course we were using Java Enterprise Edition. That meant that we were programming against some WebLogic application server that was hard to install, hard to run, required a few minutes of startup time for each minor change of the software we were writing, and forced us into a very archaic and primitive programming model. But it was cool and it was the future, which unfortunately proved to be true. Even though it has at least become usable by now. So far nothing to blame on Tim, because this was the stack that everybody used.

Now to the OO and the database. Each database table represented a „Business Object“, with a name like XyzBO. So most of the time we wrote a class XyzBO plus a few more to fulfill the greed and need for boilerplate code of the old J2EE world. XyzBO was an enterprise Java bean; a stateless session bean, to be accurate. Which meant that we wrote methods of this EJB, which were basically procedures of the pre-OO world. But within them we could of course use the whole OO toolset. Which we did. So the class to represent any data from the database was actually called Data. It was a minor subset of the standard Perl data structures, which meant that Data was a list of hash maps, which could behave just like a hash map if it had only one entry. Database queries returned Data, or of course null if nothing was found. Nobody would ever want to use an empty collection. Pretty much the opposite of what we are now doing the hard way by introducing Optional or Option to avoid the null. But it was easy to just write
if (data == null) { data = new Data(); }
at least for the ones who knew this trick.
So data resembled the content of the database or of the query result, with the column names as keys and the values as objects. Working with these was really easy: just know the attribute name accurately, get the value from data, see if it is null, and if not, cast it to the real type, and voilà…
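
For contrast, a small sketch of the direction mentioned above: returning an empty collection or an Optional instead of null. The names are made up for illustration:

import java.util.List;
import java.util.Optional;

// Sketch of the contrast: instead of returning null when nothing is found,
// modern APIs return an empty collection or an Optional.
interface CustomerRepository {
    // never null: an empty list simply means "nothing found"
    List<String> findNamesByCity(String city);

    // Optional makes "may be absent" explicit in the signature
    Optional<String> findNameById(long id);
}

class RepositoryUser {
    static void use(CustomerRepository repo) {
        // no "if (data == null) { data = new Data(); }" needed anywhere
        for (String name : repo.findNamesByCity("Zürich")) {
            System.out.println(name);
        }
        System.out.println(repo.findNameById(42L).orElse("unknown"));
    }
}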

The database was designed according to Tim's advice; he had to review every table. It was mandatory to have, as the unique key and as the first column, a string of around 700 characters, which was called HANDLE. Each table had a business primary key, which always consisted of several columns. Because the system allowed multiple instances of the software to run on the same database, there was always one column called „SITE“ in this logical primary key. But there were no primary keys defined in the database; the unique HANDLE was enough. It contained the name of the BO, like XyzBO, followed by a colon, followed by the concatenation of all the logical primary key values, separated by commas. Dates had to be converted to strings using a local US format, not ISO, of course. All foreign keys were defined using HANDLE. In the end more than half of the DB space was wasted on this stupidity. But each single handle value started with an XyzBO:, to remind us that we were programming OO.

And now the booleans. It was forbidden to use the boolean type of Java. All booleans were strings containing the words „true“ and „false“. This went on like that all the way to the web interface.

At that time web frameworks did not yet exist, or were at least unknown. So the way to go was to write JSPs, which contained the kind-of-dynamic web pages, and to write servlets to control the flow and access the EJBs. Now according to Tim it was too hard to learn servlets as well, so it was forbidden to use them; instead a „connection JSP“ had to be written, which did not display anything, but only contained a <% in the beginning, a %> in the end and the code in between.

A lot of small and larger stupidities were forced on the team. Most people were new to Java and to OO and did not even realize that there was anything wrong, apart from the fact that it was kind of hard to get stuff done. Some stupidities were due to the fact that the early J2EE really sucked, but mostly it was Tim, who forced everyone down to his level. This story happened a long time ago. Tim has already retired. I would say he is one of the guys who retire as a Junior.

There is (almost) nothing wrong with stupidity. And there is (almost) nothing wrong with arrogance. But the combination really sucks. Especially if it is taken seriously by the manager, or has to be taken seriously by the team.


Devoxx Kiev 2018

At the end of 2018 the number of conferences was kind of high. A great highlight is the Devoxx BE in Antwerp. But it now has five partner conferences, in London, Paris, Krakow, Morocco and Kiev. So I decided to have a look at the one in Kiev.

How was it in comparison to the one in Belgium? What was better in Kiev: the food was way better, the drinks on the first evening (whisky and long drinks vs. Belgian beer) might be considered better, and there were more people engaged in helping the organizers…
What was better in Belgium: there were still a few more talks. While the location in Kiev was really great, in Belgium the rooms were way better suited for providing a projection visible to everybody and for doing a video recording that did not disturb the audience.
The quality of the talks was mostly great in both locations. In Kiev they gamified the event a bit more…

Generally there was a wide range of topics and the talks were sorted into the following thematic groups:

  • Methodology & Culture
  • JVM Languages
  • Server Side
  • Architecture & Security
  • Mobile & IoT
  • Machine Learning & AI
  • Big Data & Data Mining
  • Cloud, Containers & Infrastructure
  • Modern Web & UX

See the schedule for the distribution…

I attended on Friday:

I attended on Saturday:

A lot to learn.


Some thoughts about String equality

Strings today are of course in some way Unicode. In this article we assume code points as the building blocks of strings. In the Java world that means one code point is made up of one Java character for the typical European languages using the Latin, Greek or Cyrillic alphabets, including the extensions that support all languages typically written in these alphabets. But when moving to Asian languages, a code point can also consist of two Java characters, and there are strings that are illegal from the Unicode perspective, because they contain characters that should be combined in a way that cannot properly be combined. So here we assume that strings consist of sequences of bytes or two-byte characters or whatever encoding properly expresses a sequence of code points. There are many interesting issues when dealing with some Asian languages that we will not cover here today.

Now there are a lot of possibilities to create strings that look the same but are actually different. We are not talking about „0“ and „O“ or „1“, „l“ and „I“, which might look similar in some fonts, but should not, because we actually depend on their distinctness, even on their visual distinctness. Unfortunately we have the bad habit of using traditional typewriter fonts that make it hard to distinguish them for source code, where it would be so crucial. But for today we just assume that we always look hard enough to solve this issue.

The classical example of what looks the same is whitespace. We have the ordinary space (U+0020) and the no-break space (U+00A0), which are meant to look exactly the same, but to expose slightly different behavior. There are tons of possibilities to create exactly the same look with different combinations of whitespace. But this is kind of a special case, because whitespace often carries little semantic information, and we want to disregard it to some extent when comparing strings. Typical examples are stripping leading and trailing whitespace of the string, or of the lines contained within it, and replacing tabulators with the equivalent number of spaces. Or even replacing any amount of adjacent whitespace within a line by a single space. Again, different whitespace code points might require different rules, so it is good to be careful not to put in too much logic, and better to rely on a library that applies exactly the same rules in equivalent situations.
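
As a sketch, such a normalization could look like this in Java, assuming we want the aggressive variant that collapses all adjacent whitespace:

// One possible normalization before comparing strings, along the lines
// sketched above. (?U) turns on UNICODE_CHARACTER_CLASS, so \s also
// matches Unicode whitespace such as the no-break space U+00A0.
public class WhitespaceNormalizer {
    public static String normalize(String s) {
        return s.replaceAll("(?U)\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String a = "Hello\u00A0 \tWorld ";
        String b = "Hello World";
        System.out.println(normalize(a).equals(normalize(b))); // true
    }
}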

Another example that we might actually know is that certain characters look the same or almost the same in the Cyrillic, Greek and Latin alphabets. I try to give an idea of the meaning of the Greek and Cyrillic characters, but it depends on the language, the dialect and even the word, the word form or the actual occurrence of the letter in the word…

Latin | Cyrillic | Greek | meaning of Cyrillic letter | meaning of Greek letter
A | А | Α | like Latin | like Latin
B | В | Β | like Latin V | Beta (like V in new Greek)
C | С | – | like Latin S | –
E | Е | Ε | like Latin | Epsilon (like Latin E)
– | Г | Γ | like Latin G | Gamma (like Latin G)
H | Н | Η | like Latin N | Eta (like Latin I in new Greek)
J | Ј | – | Serbian Ј, like German J | –
K | К | Κ | like Latin | Kappa (like Latin K)
M | М | Μ | like Latin | Mu (like Latin M)
N | – | Ν | – | Nu (like Latin N)
O | О | Ο | like Latin | Omikron (like Latin O)
P | Р | Ρ | like Latin R | Rho (like Latin R)
– | П | Π | like Latin P | Pi (like Latin P)
T | Т | Τ | like Latin | Tau (like Latin T)
– | Ф | Φ | like Latin F | Phi (like Latin F)
X | Х | Χ | like German CH | Chi (like German CH)
Y | У | Υ | like Latin U | Upsilon (like Latin U)
Z | – | Ζ | – | Zeta (like German Z)
I | І | Ι | Ukrainian I | Iota (like Latin I)

In this case we usually want the characters to look the same or at least very similar, because that is how to correctly display them, but we do want them to be different when comparing strings.

While these examples are kind of obvious, there is another one that we tend to ignore, but that will eventually catch us. There are so-called combining characters, which should actually be named „combining code points“, but here we go. That means we can put them after a letter and they will combine to form a letter with diacritical marks. A typical example is the letter „U“ combined with the combining diaeresis (two dots, U+0308) to form an „Ü“, which looks the same as the „Ü“ composed of one code point (U+00DC). It is meant to look the same, and it also has the same meaning, at least for most purposes. What we see is the glyph. We see the difference when we prefix each code point with a minus or a space: „Ü“ -> „-U-̈“ or „ U ̈“, while the one-code-point variant is transformed like this: „Ü“ -> „-Ü“ or „ Ü“, as we would expect.

While this way of expressing the glyph with two code points is not very well known and thus not very common, we actually see it already today when we look at Wikipedia articles. In some languages, where the pronunciation is ambiguous, it can be made clear by putting an accent mark on one vowel, as for example Кириллица, where the term at the beginning of the article is written like this: „Кири́ллица“. Since accent marks are unfortunately not used in normal Cyrillic writing, it comes in handy that the combining accent also works with Cyrillic letters. When putting minus signs between the code points it looks like this: „К-и-р-и-́-л-л-и-ц-а“, or with spaces like this: „К и р и ́ л л и ц а“. So strings that we encounter in our programs will contain these combining characters in the future. While we could prohibit them, it is better to embrace this, and it is actually not too hard if we use decent libraries. Java has the Normalizer class in its built-in library, which can convert to one or the other convention of expressing such glyphs and thus allows comparison in the way that we actually mean, as the sketch below shows.
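
A small demonstration of this with the Normalizer class:

import java.text.Normalizer;

// "Ü" as one code point vs. "U" + combining diaeresis: equal only after
// normalizing both to the same form (here NFC, the composed form).
public class NormalizeDemo {
    public static void main(String[] args) {
        String composed  = "\u00DC";   // Ü as a single code point
        String combining = "U\u0308";  // U followed by combining diaeresis
        System.out.println(composed.equals(combining)); // false

        String a = Normalizer.normalize(composed,  Normalizer.Form.NFC);
        String b = Normalizer.normalize(combining, Normalizer.Form.NFC);
        System.out.println(a.equals(b)); // true
    }
}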

Unfortunately issues like semantic lengths of strings or semantic positions become even harder than they already are after moving from characters to code points. And we can be sure that Unicode has still more to offer to complicate things, if we dig deeper. The typical answer that we get on most web sites that talk about these issues is something like: „The length of strings and positions within strings are surprisingly irrelevant to most programs.“

At the end of the day, jobs that were trivial in the past are now becoming a big deal, and we need to learn to think of comparison, length, position, regular expressions, sorting and all kinds of string functionality with bytes, characters, code points and glyphs in mind.

What can our current libraries already do for us, what are we missing in them, considering different programming languages, databases, text files and network transmission?

Links


SQL: DROP, CREATE,… How to avoid errors?

In SQL we often want to create or drop an object (TABLE, VIEW, SEQUENCE, INDEX, SYNONYM, DATABASE, USER, SCHEMA, …).

Now we do not know if this object exists or not. We would like to write scripts that work in either case.

So something like

CREATE TABLE IF NOT EXISTS XYZ (...);

or

DROP TABLE IF EXISTS XYZ;
CREATE TABLE XYZ (...);

would be desirable.

This seems to be quite hard, depending on the database.

Some databases support the „IF EXISTS“ or even the slightly less useful „IF NOT EXISTS“ pattern for some of the CREATE and DROP statements:

COMMAND | PostgreSQL | MS-SQL-Server | Oracle | DB2 | MySQL/MariaDB
DROP DATABASE IF EXISTS | x | x | - | - | x
DROP FUNCTION IF EXISTS | x | x | - | x | x
DROP INDEX IF EXISTS | x | x | - | x | x
DROP MATERIALIZED VIEW IF EXISTS | x | materialized view not found in documentation | - | materialized view not found in documentation | no materialized views supported
DROP ROLE IF EXISTS | x | x | - | x | x
DROP SCHEMA IF EXISTS | x | x | no DROP SCHEMA, Oracle uses the user to express the concept of a schema | ? (no DROP SCHEMA found in documentation) | schema not supported
DROP SEQUENCE IF EXISTS | x | x | - | x | x
DROP TABLE IF EXISTS | x | x | - | x | x
DROP TABLESPACE IF EXISTS | x | not found in documentation | - | ? (no DROP TABLESPACE found in documentation) | TABLESPACE not supported
DROP TRIGGER IF EXISTS | x | x | - | x | x
DROP USER IF EXISTS | x | x | - | - | x
DROP VIEW IF EXISTS | x | x | - | x | x
CREATE OR REPLACE DATABASE | - | - | - | - | x
CREATE DATABASE IF NOT EXISTS | - | - | - | x | x
CREATE FUNCTION IF NOT EXISTS | - | - | - | x | -
CREATE OR REPLACE FUNCTION | x | x (CREATE OR ALTER FUNCTION) | x | - | x
CREATE OR REPLACE INDEX | - | - | - | - | x
CREATE INDEX IF NOT EXISTS | x | - | - | x | x
CREATE MATERIALIZED VIEW IF NOT EXISTS | x | materialized view not found in documentation | - | materialized view not found in documentation | no materialized views supported
CREATE OR REPLACE ROLE | - | - | - | - | x
CREATE ROLE IF NOT EXISTS | - | - | - | x | x
CREATE SCHEMA IF NOT EXISTS | x | - | Oracle ties the schema closely to a user, so CREATE SCHEMA has a slightly different meaning | Oracle ties the schema closely to a user, so CREATE SCHEMA has a slightly different meaning | schema not supported
CREATE OR REPLACE SEQUENCE | - | - | - | - | x
CREATE SEQUENCE IF NOT EXISTS | x | - | - | x | x
CREATE OR REPLACE TABLE | - | - | - | - | x
CREATE TABLE IF NOT EXISTS | x | - | - | x | x
CREATE TABLESPACE IF NOT EXISTS | - | not found in documentation | - | TABLESPACE apparently not supported | TABLESPACE not supported
CREATE OR REPLACE TRIGGER | - | x (CREATE OR ALTER TRIGGER) | x | - | x
CREATE TRIGGER IF NOT EXISTS | - | - | - | x | x
CREATE OR REPLACE USER | - | - | - | - | x
CREATE USER IF NOT EXISTS | - | - | - | - | x
CREATE OR REPLACE VIEW | x | x (CREATE OR ALTER VIEW) | x | - | x
CREATE VIEW IF NOT EXISTS | - | - | - | x | x

Usually we can happily ignore the missing „IF (NOT) EXISTS“ conditions and just DROP the object and then create it. If we ignore the error message, this works.

But increasingly we want to manage the database with tools that run scripts automatically and cannot distinguish as well as a DBA between „good error messages“ and „bad error messages“.

So we should find our ways around this.

The trick is to use the procedural extension language of SQL (like PL/SQL for Oracle, because in the case of Oracle we need it most often). We write a block and find out, with a select against the data dictionary, whether the object we are creating or dropping exists. Then we put the select result into a variable and put an IF condition around our CREATE or DROP statement. This approach can also be useful for dropping all objects of a certain type that fulfill some SQL-query-accessible condition.

DECLARE
   CNT INT;
BEGIN
   SELECT COUNT(*) INTO CNT 
     FROM USER_TABLES 
     WHERE TABLE_NAME = 'MY_TABLE';
   IF CNT >= 1 THEN
      EXECUTE IMMEDIATE 'DROP TABLE MY_TABLE';
   END IF;
END;
/

Or to catch the kind of exception that we want to ignore (example for Oracle):

BEGIN
   EXECUTE IMMEDIATE 'DROP TABLE MY_TABLE';
EXCEPTION
   WHEN OTHERS THEN
      IF SQLCODE != -942 THEN
         RAISE;
      END IF;
END;
/

It is a bit tricky to find the reference pages for the SQL dialects of the different DB products. I used these:

MariaDB is a fork of MySQL that was created when MySQL came to Oracle. I recommend it as the successor of MySQL if you are into MySQL-like databases. And I strongly recommend looking into PostgreSQL if you intend to use a free/open-source database, because it is the same price and simply has richer features. Generally all five databases are good and have their areas of strength and their weaknesses. I won't go deeper into this in this article…
