Word Documents vs. Wiki

It was this typical clash of cultures when a real IT guy, scientist, mathematician or engineer joined a company and suddenly an office package had to be used. It could have been any of them, think LibreOffice or MS-Office, but there were some others, that have meanwhile lost relevance. We liked to use text files or HTML or LaTeX or even Literate Programming, part of which has conceptionally made it into our time as JavaDoc and similar concepts for most modern languages. The idea is to keep source code and documentation together and in sync, to work with a similar technology stack for software and documentation and to be able to use typical Linux/Unix-tools like grep, Perl, Python or Ruby on the document files to search, to create them, to transform them, to parse them etc. Do not use awk and sed for these purpose any more, they have been superseded by scripting languages.

So now all tools for automatizing things and for working efficiently or at least comfortably where gone, maybe the office even forced us to use such a non-developer-system as MS-Windows for developing software that should eventually run on Linux– or Unix-Servers. Working together on the same file was a nightmare, in spite of the existence of some software that claims to support this. And yes, I know that these office packages contain scripting languages, like VBA or LibreOffice Basic that might not be our favorite, but could eventually be powerful if we took the effort and learned them. But actually the people who learned this were mostly on the „business side“ and sometimes they were faster getting a functionality up and running than we professionals. Up and running as long as their C:-drive was up and running…

So now after the anger is gone or hopefully at least a bit reduced, it is good to actually take a look. The „classical“ office suite contained a word processor, a spread sheet and a presentation program, other components like a database have become optional and I see them rarely in use.

I think that there is no serious doubt about wanting to have a spread sheet. And a presentation program is mostly useful as well, even though there are serious alternatives to be seen like doing presentations in modern HTML. The arguments against the word processor of typical office suites have proven to somewhat survive through all the years. While the word processor is useful for exchanging documents with people who are heavily working with them and cannot easily be taught otherwise, it has pretty much disappeared from our daily work. Since around 2008 the main documentation tool in the overwhelming majority of projects that I have seen was a private Wiki, like MediaWiki or Confluence or Redmine, for example.

This allowed for easier collaboration, it was a web application that worked with any modern browser and any OS and it was a bit more natural to use and somewhat easier to read and write and search. The Wikis do have a text format, that can be processed in situations where we want to generate documentation from code or code from documentation. And it could be more easily controlled than Word-documents, that were mailed around and changed.

Document management systems or enterprise content management systems allow us to do a lot of stuff even with „classical“ office files. I know that the more advanced systems are very powerful. But I have rarely seen it in the context of software development and architecture projects. So I am not going to write too much about them, but leave this area to the experts of this field.

Generally I find it much more pleasant to work with Wikis than it was before with Office-Documents, so I would consider this a positive development.

Share Button

SQL: DROP, CREATE,… How to avoid errors?

In SQL we often want to create or drop an object (TABLE, VIEW, SEQUENCE, INDEX, SYNONYM, DATABASE, USER, SCHEMA,….)

Now we do not know if this thing exists or not. We would like to write scripts, that work in either case.

So something like

CREATE TABLE IF NOT EXISTS XYZ (...);

or

DROP TABLE IF EXISTS XYZ;
CREATE TABLE XYZ (...);

would be desirable.

This seems to be quite hard, depending on the database.

Some databases support the „IF EXISTS“ or even the slightly less useful „IF NOT EXISTS“ pattern for some of the CREATE and DROP statements:

COMMANDPostgreSQLMS-SQL-ServerOracleDB2MySQL/MariaDB
DROP DATABASE IF EXISTSxx--x
DROP FUNCTION IF EXISTS xx-xx
DROP INDEX IF EXISTSxx-xx
DROP MATERIALIZED VIEW IF EXISTSxMaterialized view not found in documentation-Materialized view not found in documentationno materialized views supported
DROP ROLE IF EXISTSxx-xx
DROP SCHEMA IF EXISTSxxno DROP SCHEMA, Oracle uses User to express the concept of a Schema? (no DROP SCHEMA found in documentation)Schema not supported.
DROP SEQUENCE IF EXISTSxx-xx
DROP TABLE IF EXISTSxx-xx
DROP TABLESPACE IF EXISTSxNot found in documentation-? (no DROP TABLESPACE found in documentation)TABLESPACE not supported
DROP TRIGGER IF EXISTSxx-xx
DROP USER IF EXISTSxx--x
DROP VIEW IF EXISTSxx-xx
CREATE OR REPLACE DATABASE----x
CREATE DATABASE IF NOT EXISTS---xx
CREATE FUNCTION IF NOT EXISTS---x-
CREATE OR REPLACE FUNCTIONxx (CREATE OR ALTER FUNCTION)x-x
CREATE OR REPLACE INDEX----x
CREATE INDEX IF NOT EXISTSx--xx
CREATE MATERIALIZED VIEW IF NOT EXISTSxMaterialized view not found in documentation-Materialized view not found in documentationno materialized views supported
CREATE OR REPLACE ROLE----x
CREATE ROLE IF NOT EXISTS---xx
CREATE SCHEMA IF NOT EXISTSx-Oracle ties the schema closely to a user. So slightly different meaning of CREATE SCHEMA...Oracle ties the schema closely to a user. So slightly different meaning of CREATE SCHEMA...Schema not supported.
CREATE OR REPLACE SEQUENCE----x
CREATE SEQUENCE IF NOT EXISTSx--xx
CREATE OR REPLACE TABLE----x
CREATE TABLE IF NOT EXISTSx--xx
CREATE TABLESPACE IF NOT EXISTS-Not found in documentation-TABLESPACE apparently not supportedTABLESPACE not supported
CREATE OR REPLACE TRIGGER-x (CREATE OR ALTER TRIGGER)x-x
CREATE TRIGGER IF NOT EXISTS---xx
CREATE OR REPLACE USER----x
CREATE USER IF NOT EXISTS----x
CREATE OR REPLACE VIEWxx (CREATE OR ALTER VIEW)x-x
CREATE VIEW IF NOT EXISTS---xx

Usually we can happily ignore the missing „IF (NOT) EXISTS“ conditions and just DROP the object and then create it. By ignoring the error message this works.

But increasingly we want to manage the database with tools that run scripts automatically and cannot distinguish as well as a DBA between „good error messages“ and „bad error messages“.

So we should find our ways around this.

The trick is to use the procedural extension language of SQL (like PL/SQL for Oracle, because in case of Oracle we need it most often). Then we write a block and find by a select in the data dictionary if the object we are creating or dropping exists. Then we put the select result into a variable and put an IF-condition around our CREATE or DROP statement. This approach can also be useful for dropping all objects of a certain type that fullfill some SQL-query-accessible condition.

DECLARE
   CNT INT;
BEGIN
   SELECT COUNT(*) INTO CNT 
     FROM USER_TABLES 
     WHERE TABLE_NAME = 'MY_TABLE';
   IF CNT >= 1 THEN
      EXECUTE IMMEDIATE 'DROP TABLE MY_TABLE';
   END IF;
END;
/

Or to catch the kind of exception that we want to ignore (example for Oracle):

BEGIN
   EXECUTE IMMEDIATE 'DROP TABLE MY_TABLE';
EXCEPTION
   WHEN OTHERS THEN
      IF SQLCODE != -942 THEN
         RAISE;
      END IF;
END;
/

It is a bit tricky to find the reference pages for the SQL dialects of the different DB products. I used these:

MariaDB is a fork of mySQL that was created, when mySQL came to Oracle. I recommend it as the successor of mySQL, if you are into mySQL-like databases. And I strongly recommend looking into PostgreSQL, if you intend to use a free/opensource database, because it is the same price and just has richer features. Generally all five databases are good and have their areas of strength and their weaknesses. I won’t go deeper into this in this article…

Share Button

Serialization

Deutsch

Serialization allows us to store objects in a lossless way or to transfer them over the network. This could be done before, but it was necessary to program the serialization mechanism, which was a lot of work. Of course, they were not yet called objects in those days…

Java suddenly had such a serialization for (almost) all object without any additional programming effort. This does not mean that this automatic serialization did not exist before, but it was made popular with Java, because frameworks started to heavily rely on it. To use it ourselves, we just had to use ObjectOutputStream and ObjectInputStream and Objects could be stored and written across the network or just be cloned. It was even able to handle circular references, which most serialization mechanisms cannot do. The idea was not really new, as other languages had something like this already before, but nobody became aware of it.

But there are some drawbacks, that were discovered when it was already too late and that should at least be mentioned.

  • Marking with a Serializiable-Interface is conceptionally quite a bad solution, because it assumes that serialization never gets lost by deriving classes, which is just not true. An Unserializable interface would have been a much better solution, if not almost ideal solution for this, because trivial objects are always serializable and they loose this when something non-serializable is added. Then again, how about collections… Today possibly some annotation could also be helpful.
  • This serialVersionUID creates a lot of pain. Should we change it, whenever the interface changes? We talk about the implicit serialization interface, not about an explicit interface that we can easily see. Should we trust automatic mechanisms? In any case issues with incompatible versions remain that are not really solved well and cannot even easily be solved well.
  • Serialization introduces an additional invisible constructor.
  • Serialization undermines the idea of private and protected, because suddenly private and protected member attributes become part of the interface
  • Funny effects happen with serializable non-static inner classes, because there serialized version bakes in the containing outer object. Yes it has to…
  • The object indentity gets lost, when an object is serialized. It is easy to create several copies of the same object.
  • Sometimes it is still necessary to manually write serialization code, for example for singletons. It is easy to forget this, because everything seems to work just fine automatically.
  • Java’s Serialization is quite slow.
  • The format is binary and cannot easily be read. A pluggable serialization format that could allow more human readable data files like JSON, XML,… would have been better…
  • Serialization creates a temptation to use this format for communication, which again forces a tight coupling that might not be necessary otherweise.
  • Serialization creates a temptation to use it as a storage format instead of mature database technologies. Very bad: the second level cache of hibernate…

There were some advantages in having this serialization in the past and for some purposes it kind of works. But it is important to question this and to consider other, more solid approaches, even if they require slightly more work. Generally it is today considered one of the larges fallacies of Java to introduce this serialization mechanism in this way. There are now better ways to do serialization, that require a bit more work, but avoid some of the terrible short comings of the native Java-Serialization.

For the serialVersionUID there are several approaches that can work. A statical method, that extracts from an „$Id$“-string that is managed by svn, can be a way. It will avoid compatibility between even slightly different versions, which is probably the best we can get. With git it is a bit harder, but it can be done as well.

Usually it is the best choice by far to leave serialVersionUID empty and rely on Java’s automatic mechanisms. They are not perfect, but better than 99% of the manually badly maintained serialVersionUIDs. If you want to manage your serialVersionUIDs yourself, there needs to be a checklist on what to do to release a new version of a file, a library or a whole software system. This is usually sick, because it creates a lot of work, even more errors and should really be done only with good reasons, good discipline and a very good concept. If you like anyway to use serialVersionUID or if you are forced to do so by the project, here is a script to create them randomly:

#!/usr/bin/perl
use bigint;
use Math::Random::Secure qw(irand);
my $r = (irand() << 32) + irand();
printf "%20d\n", $r;

This is still better than using the IDE-generated value and keeping it forever or starting with a 1 or 0 and keeping it forever, because updating this serialVersionUID is not really on our agenda. And it shouldn't be.

Share Button