Water pressure CPU

All kinds of technologies are being investigated as possible successors of our decades-old silicon chip technology:

  • quantum computing
  • using light instead of electricity
  • using other semiconductors than silicon

But it is funny that the most obvious approach has not been investigated until recently.
Now it has been leaked that companies in east Asia are following this approach. The CPUs and all the wires in their computers are just very precise pipes that carry water. Water pressure is used to transmit information, and they have discovered nano-structures that do for water pressure what transistors do for electricity, but are much faster and need less space. Since one water molecule weighs only about 2.9915\cdot10^{-26} kg, it is possible to transport a great deal of information with very small amounts of water. Even better, the molecule has so many ways to carry vibrations, which they learned about from homeopathy and homeopathic dilutions, which still carry the vibration of the original substance without containing any molecules of it.

In addition, the issue of CPU heating has totally disappeared. Obviously water cooling is inherently included in the construction.


Usability: ATMs

It is interesting how difficult it can be to use a simple device such as an ATM („bankomat“ in most languages).

Sometimes it is just annoying, sometimes it is really hard to get the machine working properly at all…

So what are the typical use cases? I would say that 90% of the time the intention is to withdraw some money from the main account attached to the card being used. There are some interesting special tasks like looking up the balance or changing the PIN code of the card. And maybe banks are creative and provide even more functionality. Users of the ATM are mostly locals who are customers of the bank that runs the ATM, because banks usually provide the best conditions for their own customers. But there are also foreigners, who do not understand the local language and who want to use a card from somewhere else, which, amazingly, sometimes works.

The first question is in which language to start. We usually agree with our bank on some language, or the bank imposes its local language on us and we hopefully understand it. So why does the ATM ask for the language as its first question instead of finding it out from the card? There should be a button to change the language, but the machine should be smart enough to figure out our preferred language: the card should provide a preferred language and possibly some alternatives as a fallback. Also, at least for the most common operations, a picture could help users understand the text of the button.

The reality is that we find ATMs that work, for example, in Ukrainian and won’t easily change the language. There is not too much danger in trying several options, but it is annoying for the person who wants to use the ATM and for the ones waiting in line.

A step that some ATMs provide is to show what the ATM should look like and ask us to confirm that the machine has not obviously been manipulated. This may be justified for security reasons, and it is not too annoying if we are already in the right language and have not yet entered the PIN. Once the PIN has been entered, it is already too late. Maybe even once the card has been inserted.

Now the PIN code. Usually PIN codes have four digits, but Swiss cards allow six digits, which is a bit better and still reasonable to memorize, because for at least twenty years it has been possible to change the PIN to some memorizable favorite (not „000000“, please). I have once seen a ticket vending machine that could only handle four-digit PIN codes, but I have never seen an ATM that did not accept six digits. And I have seen ATMs that immediately assume that the number is complete with the sixth digit and thus give no chance to correct the last digit. Even worse, the language choice was attached to the „confirm“ button, which therefore could not be used with six digits.

The next step should be to actually withdraw money, with the obvious button, in the right language or at least a secondary language choice, and as the largest button or with a picture. In reality it is the third of seven buttons or so, labeled only with text in a language we do not know. And with each wrong choice the PIN has to be entered again. How convenient for the people in the line, should one of them ever want to use a card that is not his own… Yes, Ukrainians are very honest, but the same applies to any country and to any kind of people who happen to be standing in line.

Now the amount. It is a business decision whether the maximum amount we can withdraw is 3 EUR or 3000 EUR or something in between. It should be the minimum of what this ATM allows and what our bank allows for this card at this moment. And yes, it would be kind of cool if we could find out about the limit of the ATM before spending too much time with it, especially if each transaction costs about as much as the maximum that the ATM will give us. A good way to enter the amount is just to type it on the keypad. It is obvious and should just work, of course with some way to confirm that this is the amount we mean, but the „green“ key of the keypad should just work.

It is really easy to think of good usability for ATMs, but it has not always been done. And it should not be too hard to program them properly.

Writing good software for ticket vending machines is much harder, because there are far more options and it is usually hard to match the travel plan of the customer with the tariff system of the transport network.


Daylight Saving

In many countries of Europe we have to readjust our watches and clocks today, unless they do it automatically.

It is interesting that dealing with this has always been a great challenge for software engineers, and a very high two-digit percentage of software that in some way or other deals with time does not handle it correctly. Operating systems and standard software do a very good job on this, but specialized software that is written today very often does not properly handle the switch. It usually does not matter much, because it just results in effects like having rare phone calls charged for an hour too much, to give an example. And really critical software is properly tested for this, I hope.

But the fraction of software developers who are able to deal with this properly is well below 100%, and even the fraction who are at least able to admit that they do not know and are prepared to listen to somebody who does is below 100%. And daylight saving is really a very minor, invisible side issue for most software projects. They have a task to perform and usually they do that task well enough… So we will continue to develop software that does not really handle daylight saving properly.
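
As a small illustration of the kind of error meant here, a sketch in Scala using java.time (available since Java 8); the zone and the 2018 transition date are just one example:

    import java.time.{Duration, ZoneId, ZonedDateTime}

    object DstDemo extends App {
      // Central Europe switched to DST on 2018-03-25: 02:00 jumped to 03:00.
      val zone   = ZoneId.of("Europe/Zurich")
      val before = ZonedDateTime.of(2018, 3, 25, 1, 30, 0, 0, zone)
      val after  = ZonedDateTime.of(2018, 3, 25, 3, 30, 0, 0, zone)

      // Naive wall-clock subtraction suggests two hours; only one hour
      // really elapsed. A billing system subtracting wall-clock times
      // would charge an hour too much.
      println(Duration.between(before, after)) // prints PT1H
    }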

One more reason to stop this changing of clocks twice a year, especially since the energy savings that were once mentioned as its advantage do not seem to be significant.


Hidden CPUs

How many CPUs does your computer have?

If we go way back, we will discover that ancillary CPUs were in our computers a long time ago. The floppy disk drive of the C64 had a CPU very similar to the one in the computer itself, but with very little memory, and it was hard, though not impossible, to make use of it. I never really tried. PC keyboards had CPUs too; it was said that a Z80 or 8080 or something like that was built into them. I never bothered to find out.

So this concept is not at all new, but was already used 35 years ago. The question is whether our computers still have such hidden CPUs. This seems to be the case, and it is easy to search for „hidden CPUs“ or „secret CPUs“. It would be extremely strange to expect anything different. They do not provide computing power for us; they just run and manage hardware that appears to be plain hardware from the point of view of the main CPU that we can program. So why not just consider this as hardware and ignore the „secret“ or „hidden“ CPUs, seeing them as an implementation detail of the hardware? That is a very legitimate approach and, to be honest, what we do most of the time.

The issue is more delicate now, because these hidden CPUs can access the internet, even when the computer is turned off or seems to be offline. There are tools to analyze the network traffic and to detect this. But we should start to become aware of this invisible world, which is potentially as dangerous as visible malware. And this applies to all kinds of devices, especially cell phones, tablets, routers, TV sets and all „things“ that have their own CPU power and network access…


When to use Scala and Ruby

There are many interesting languages that have their sweet spots and of course a larger set of languages than just two should be considered for new projects.

But Ruby and Scala are both very interesting languages that did not just pick up and sell concepts that were already known, but brought them to a new level and to new beauty. Interestingly, both were started by a single person and eventually became community projects.

There are some differences to observe.

Ruby is mostly a dynamic language, which means that it is easier and more natural to change the program at runtime. This is not necessarily a bad thing, and different Lisp variants including today’s Clojure have successfully used and perfected this kind of capability for many decades. Consequently more things happen at runtime; in particular, dynamic typing is used, which means that types only exist at runtime.

Scala is mostly a static language, which means that all program structures have to be created at compile time. But this has been brought to perfection in the sense that a lot of things that are typically available only in dynamic languages can be done. The type system is static, and it is in this sense more consistent and more rigorous than the type system of Java, where we sometimes encounter areas that cannot reasonably be covered by generics and fall back to the old flavor of untyped collections. This does not happen too often, but the static typing of Scala goes further.

In general this gives more flexibility to Ruby and makes it somewhat harder to tame the ways of doing similar things statically in Scala. But the compile-time type system of course helps to match things, to find a certain portion of errors, and even to make the program more self-explanatory without relying on comments. It is hard to support Scala properly in IDEs, but the most common IDEs have achieved this to a very useful level.

This should not be overvalued, because there are enough errors that cannot be detected by just using common types. It is possible to define more specific types which include tight constraints and thus perform really tight checking of certain errors at compile time, but the built-in types and the types from common libraries are too convenient and the time effort for this is too high, so it does not seem to be the usual practice; a sketch of the idea follows below. In any case it is a recommended practice to achieve a good test coverage of non-trivial functionality with automated tests. These implicitly cover the type errors that the compiler detects in Scala, but of course only to the level of the test coverage.

Ruby has less overhead to compile and run. We just write the program and run it, while we need a somewhat time-intensive compile step for Scala. If tests are included, it does not make so much of a difference, because running the tests or preceding them with a compile job is a minor difference.
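
Here is a minimal sketch of such a constraining type; Percentage and everything around it is made up for illustration:

    // A wrapper type that cannot be confused with a plain Int anywhere
    // in the program (compile-time check) and that enforces its value
    // range at construction time (runtime check).
    final case class Percentage(value: Int) {
      require(value >= 0 && value <= 100, s"not a percentage: $value")
    }

    object TypedDemo extends App {
      def discount(p: Percentage): Unit = println(s"discount of ${p.value}%")

      discount(Percentage(30)) // fine
      // discount(30)          // does not compile: Int is not a Percentage
    }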

An interesting feature of Ruby is called „monkey patching“. This means that it is possible to change methods of an existing class or even of a single object. It can be extremely powerful, but it should be used with care, because it changes the behavior of the class in the whole program and can break libraries. Usually this is not such a bad thing, because it is not used for changing existing methods, but for adding new ones. So it causes problems only when two conflicting monkey patches occur in different libraries, but for big programs with many libraries there is some risk in this area. Scala achieves something similar with „implicit conversions“: a conversion rule is implicitly in scope, and when a method is called on an object whose type does not have it, the adequate conversion is applied prior to the method call. This works at compile time. Most of the time it is effectively quite similar to monkey patching, but it is a bit harder to tame, because writing and providing implicit conversions is more work and harder to understand than writing monkey patches. On the other hand, Scala avoids the risks of Ruby’s monkey patching.
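
A minimal sketch of the Scala side; the shout method and its behavior are invented for the example:

    object StringExtensions {
      // An implicit class is the idiomatic shorthand for an implicit
      // conversion that adds methods to an existing type.
      implicit class RichShout(val s: String) extends AnyVal {
        def shout: String = s.toUpperCase + "!"
      }
    }

    object ImplicitDemo extends App {
      import StringExtensions._
      // String has no shout method; the compiler inserts the conversion.
      println("hello".shout) // prints HELLO!
    }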

An increasingly important issue is making use of multiple CPU cores. Scala, and especially Scala in combination with Akka, is very strong in this area. It supports a reasonably powerful and tamable programming model for using multiple threads. The C or Java SE way is very powerful, but it is quite difficult to avoid shooting oneself in the foot, and even worse, there is a high likelihood that such errors show up only in production, in times of heavy load, while all testing seemed to go well. Raw threads are the way to go in some cases, but they require a lot of care, a lot of thinking and a team of skillful developers. There are more developers who think they belong to this group than are actually able to do this well. Of course Scala already filters out some less skilled developers, but still I think its approach with Akka is more sound.

Ruby, on the other hand, has very little support for multithreading and cannot as easily make use of multiple cores by using threads. While the language itself does support the creation of threads, for many years the major implementation had very little support for this, in the sense that multiple threads were never actually running at the same time. This propagated into the libraries, so this will probably never become a strength of Ruby. The way to go is to actually start multiple processes. This is not so bad, because the overhead of processes in Ruby is much lower than in JVM languages. Still, this is an important area and Scala wins this point.
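
As a small taste of the Scala side, a sketch using plain standard-library Futures (Akka’s actor model goes well beyond this); the numbers are arbitrary:

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    object ParallelSum extends App {
      // Split the work into chunks; each Future may be scheduled on its
      // own thread of the global pool, so the chunks can run on
      // multiple cores.
      val chunks   = (1 to 1000000).grouped(100000).toSeq
      val partials = chunks.map(chunk => Future(chunk.map(_.toLong).sum))

      val total = Future.sequence(partials).map(_.sum)
      println(Await.result(total, 10.seconds)) // prints 500000500000
    }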

Concerning web GUIs, Ruby has Rails, which is really a powerful and well-established way to do this. Scala provides Play, which is in a way a lot of concepts from Rails and similar frameworks transferred to Scala. It is ok to use it, but Rails is much more mature and more mainstream. So I would give this point to Ruby. Rails includes Active Record, about which I do have doubts, but that is really not a necessary component of a pure web GUI, rather a backend functionality…

So in the end I would recommend using Scala and Akka for the solution if it is anticipated that a high throughput will be needed. For smaller solutions I would favor Ruby, because it is a bit faster and easier to get things done with it.

For larger applications a multi-tier architecture could be a reasonable choice, which opens up combinations. The backend can be done with Scala. If server-side rendering is chosen, Ruby and Rails with REST calls to the backend can be used. Or a single-page application written in JavaScript or some language compiling to JavaScript, again with REST calls to the backend.


Carry Bit, Overflow Bit and Signed Integers

It has already been explained how the carry bit works for addition. Now a comment raised the question of how it works for negative numbers.

The point is that the calculation of the carry bit does not depend on the sign in any way. The nature of the carry bit is that it is meant to be used for the less significant parts of the addition. So assume we add two numbers x and y that have k and l words, respectively. We set n=\max(k,l) and make sure that x and y are both n words long by providing the necessary number of 0-words in the most significant positions. Now the addition is performed as described, by starting with a carry bit of 0 and adding with carry x[0]+y[0], then x[1]+y[1] and so on up to x[n-1]+y[n-1], assuming that x[0] is the least significant word and x[n-1] the most significant word, respectively. Each addition includes the carry bit from the previous addition. Up to this point it does not make any difference whether the numbers are signed or not.
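
A sketch of this word-by-word addition with carry in Scala, using 32-bit words; the representation as least-significant-first arrays is just a choice for the example:

    object MultiWordAdd extends App {
      // x and y are numbers given as arrays of 32-bit words, least
      // significant word first. Returns the n-word sum and the final
      // carry bit.
      def add(x: Array[Int], y: Array[Int]): (Array[Int], Int) = {
        val n      = math.max(x.length, y.length)
        val result = new Array[Int](n)
        var carry  = 0L
        for (i <- 0 until n) {
          // Missing words count as 0; mask to treat words as unsigned.
          val xi  = if (i < x.length) x(i) & 0xffffffffL else 0L
          val yi  = if (i < y.length) y(i) & 0xffffffffL else 0L
          val sum = xi + yi + carry
          result(i) = sum.toInt // low 32 bits
          carry = sum >>> 32    // carry bit for the next word
        }
        (result, carry.toInt)
      }

      val (words, carryOut) = add(Array(0xffffffff), Array(1))
      println((words.toList, carryOut)) // (List(0),1): the sum overflows one word
    }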

Now for the last addition we need to consider the question of whether our result still fits in n words or if we need one more word. In the case of unsigned numbers we just look at the last carry bit. If it is 1, we add one more word in the most significant position with the value of 1; otherwise we are already done with n words.

In the case of signed integers, we should investigate what can possibly happen. The input for the last step is two signed words and possibly a carry bit from the previous addition. Assuming we have m-bit words, we are adding numbers between -2^{m-1} and 2^{m-1}-1 plus an optional carry bit c. If the numbers have different signs, an overflow cannot occur and we can be sure that the final result fits in at most n words.

If both are non-negative, the most significant bits of x[n-1] and y[n-1] are both 0. An overflow happens if and only if the sum x[n-1]+y[n-1]+c \ge 2^{m-1}, which means that the result „looks negative“, although both summands were non-negative. In this case another word with value 0 has to be provided in the most significant position n to express that the result is \ge 0 while keeping the already correctly calculated lower words. It cannot happen that real non-zero bits go into this new most significant word; consequently the carry bit can never become 1 in this last addition step.

If both are negative, the most significant bits of x[n-1] and y[n-1] are both 1. An overflow happens if and only if (x[n-1]+y[n-1]+c) \bmod 2^m \lt 2^{m-1}, which means that the result „looks positive or 0“, although both summands were negative. In this case another word with value 2^m-1 or -1, depending on the viewpoint, has to be prepended as the new most significant word. In this case of two negative summands the carry bit is always 1.

Typical microprocessors provide an overflow flag (called „O“ or more often „V“) to deal with this. So the final addition can be left as it is in n words if the overflow flag is 0. If it is 1, we have to signal an overflow or we can just provide one more word. Depending on the carry flag, that word is 0 for C=0 or all bits 1 (2^m-1 or -1, depending on the viewpoint) for C=1.

The overflow flag can be calculated as o := (\mathrm{signbit}(x) = \mathrm{signbit}(y)) \land (\mathrm{signbit}((x+y) \bmod 2^m) \ne \mathrm{signbit}(x)).
There are other ways, but they lead to the same results with approximately the same or more effort.

The following table shows the possible combinations, with examples for 8-bit arithmetic and n=1:

| x   | y   | (x+y) mod 2^8 | Overflow Bit | Carry Bit | additional word needed | value of additional word | Examples (8 bit)             |
|-----|-----|---------------|--------------|-----------|------------------------|--------------------------|------------------------------|
| x≥0 | y≥0 | ≥0            | 0            | 0         | no                     | –                        | 0+0, 63+64                   |
| x≥0 | y≥0 | <0            | 1            | 0         | yes                    | 0                        | 64+64, 127+127               |
| x≥0 | y<0 | ≥0            | 0            | 0 or 1    | no                     | –                        | 65+(-1), 127+(-127)          |
| x≥0 | y<0 | <0            | 0            | 0 or 1    | no                     | –                        | 7+(-8), 127+(-128), 0+(-128) |
| x<0 | y≥0 | ≥0            | 0            | 0 or 1    | no                     | –                        | -9+12, -1+127, -127+127      |
| x<0 | y≥0 | <0            | 0            | 0 or 1    | no                     | –                        | -128+127, -128+0, -1+0       |
| x<0 | y<0 | ≥0            | 1            | 1         | yes                    | -1                       | -64+(-65), -128+(-128)       |
| x<0 | y<0 | <0            | 0            | 1         | no                     | –                        | -1+(-1), -1+(-127), -64+(-64)|

If you like, you can try out examples that include the carry bit and see that the concepts still work out as described.
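
For trying this out, here is a small sketch in Scala that computes the sum, carry flag and overflow flag for one 8-bit word addition with an incoming carry, mirroring the formulas above:

    object AddWithFlags extends App {
      // Adds the 8-bit words x and y (given as Ints in 0..255 or as
      // signed values -128..127) plus an incoming carry, and returns
      // the 8-bit result together with the carry and overflow flags.
      def add8(x: Int, y: Int, cIn: Int): (Int, Int, Int) = {
        val raw   = (x & 0xff) + (y & 0xff) + cIn
        val sum   = raw & 0xff
        val carry = (raw >> 8) & 1
        // overflow: the summands have the same sign bit, but the
        // result's sign bit differs from it
        val overflow =
          if (((x ^ y) & 0x80) == 0 && ((x ^ sum) & 0x80) != 0) 1 else 0
        (sum, carry, overflow)
      }

      println(add8(64, 64, 0)) // (128,0,1): 64+64 overflows, "looks negative"
      println(add8(-1, -1, 0)) // (254,1,0): -1+(-1) = -2, carry set, no overflow
    }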


Source Code of Apple-iOS leaked

It seems that parts of the source code of Apple’s iOS 9 have leaked via GitHub. They might have been removed from there by the time you read this, but they will probably be passed around the internet anyway.

Some sources say that this is a risk to security. It might be, but in the end cryptography specialists tend to consider the availability of the source code an advantage for security, because it can be analyzed by everyone, vulnerabilities can be found and published, and of course they can be corrected more easily when the source is available to everyone. Hiding the source code is a kind of „security by obscurity“, which is not really a strong mechanism; security should be based on verifiably secure mechanisms, as successfully applied by Linux and other open source operating systems. But this advantage might not fully apply if the sources are just passed around in somewhat closed circles and are not easily available to the general public.

This does not make iOS open source, because the licenses that Apple imposes on their software are still valid, and to my understanding they do not make this part of the system open source, which would mean much more than just being able to read the source code of a certain version that might already be outdated. Please observe that if the source code you might find on GitHub is really coming from Apple, their original license applies, not the one mentioned on GitHub.

Putting jailbreaking somewhere near security breaches is wrong, because it is an action done by the owner of the device with his or her own device at his or her own risk. It should be everyone’s right to do so, and there should be nothing wrong with making it easier. I know, we are not living in a perfect world…

So please relax. If Apple has done a good job, there will not be too many bad exploits, and if they are still doing a good job, they will quickly fix any exploits that show up. And if you like to have an open source system, you should still consider using something else.


Java Properties Files and UTF-8

Java uses a nice pragmatic file format for simple configuration tasks and for internationalization of applications. It is called the Java properties file or simply „.properties file“. It contains simple key-value pairs. For most configuration tasks this is useful and easy to read and edit. Nested configurations can be expressed by simply using dots („.“) as part of the key. This was already introduced in Java 1.0. For internationalization there is a simple way to create properties files with almost the same name, but with a language code just before the .properties-suffix. The concept is called a „resource bundle“. Whenever a language-specific string is needed, the program just knows a unique key and performs a lookup.

The unpleasant part of this is that these files are, in the style of the 1990s, encoded in ISO-8859-1, which covers only a few languages of western, central and northern Europe. For other languages, as a workaround, a \u followed by the four-digit hex code can be used to express a UTF-16 code unit, but this is not in any way readable or easy to edit. Usually we want to use UTF-8, or in some cases real UTF-16, without this \u-hack.

A way to deal with this is the native2ascii converter, which can convert UTF-8 or UTF-16 to the format of properties files. By using some .uproperties files, which are UTF-8, and converting them to .properties files with native2ascii as part of the build process, this can be addressed. It is still a hack, but properly done it should not hurt too much, apart from the work it takes to get it working. I would strongly recommend making sure the converted and unconverted files never get mixed up. This is extremely important, because a mix-up is not easily detected in the case of UTF-8 with typical central European content, but it creates the ugly errors that we are used to seeing, like „sch�ner Zeichensalat“ instead of „schöner Zeichensalat“. And we only discover it when the files are already quite messed up, because at least in German the umlaut characters are only a small fraction of the text, though still annoying if messed up. So I would recommend another suffix to make this clear.
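
By the way, for plain configuration files (not resource bundles) the conversion step can already be avoided in pre-Java-9 code, because Properties.load has accepted a Reader with an explicit encoding since Java 6. A sketch in Scala; the file name and key are made up:

    import java.io.{FileInputStream, InputStreamReader}
    import java.nio.charset.StandardCharsets
    import java.util.Properties

    object Utf8Props extends App {
      val props = new Properties()
      // Supplying our own Reader bypasses the ISO-8859-1 default that
      // applies when loading from a raw InputStream.
      val in = new InputStreamReader(
        new FileInputStream("messages.uproperties"), StandardCharsets.UTF_8)
      try props.load(in) finally in.close()

      println(props.getProperty("greeting"))
    }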

The bad thing is that most JVM languages have been kind of „lazy“ (which is usually a good thing) and have used Java’s infrastructure for this, thus inheriting the problem from Java.

Another way to deal with this is to use XML files, which are by default in UTF-8 and which can be configured to be UTF-16. With some development work or a search for existing implementations, there should be ways to do the internationalization like this.

Typically some process needs to be added, because translators are often non-IT people who use some tool that displays the texts in the original language and accepts the translations. For good translations the translator should actually use the software to see the context, but this is another topic for the future. Possibly there needs to be some conversion from the data provided by the translator into XML, .uproperties, .properties or whatever is used. This should be automated by scripts or even by the build process, and new translations should be merged properly with existing ones.

Anyway, Java 9 will be helpful with this issue. Finally, properties files that are used as resource bundles for internationalization can be UTF-8.


PostgreSQL

Almost every non-trivial application uses a database in some way.

For many years this meant Oracle, DB2 or MS SQL Server, depending mostly on the habits and on the religious orientation of the organization that developed or ran the application. These days all three are available for Linux and MS Windows; DB2 is also available on z/OS. The „home platforms“ of these three are probably Linux, z/OS and MS Windows, respectively (2018).

We saw Teradata as an alternative to DB2 and Oracle for data warehouses. These run on huge amounts of data, but are really invisible to most of us. Maybe the data warehouse is the old „big data“, before the term was invented.

We saw a big hype about NoSQL databases, and some interesting DB products from this group could successfully establish themselves.

We saw MySQL (and its fork MariaDB) mostly for database installations that had somewhat lower requirements on the DB product in terms of features or performance. Actually, Wikipedia runs on MySQL or MariaDB, and that is quite a big installation with heavy user load, but it is mostly about reading.

PostgreSQL was often positioned „somewhere between Oracle and MySQL“.

PostgreSQL 10 just came out. The most important new features were replication on a per-table basis, better partitioning of large tables and better support for clustering.

I have worked with all of the database technologies listed here and have even given trainings on MongoDB, Oracle and PostgreSQL.

So where is PostgreSQL positioned really in this landscape?

It is a good database product for a large and growing class of applications. I find it slightly more pleasant to work with than the other four SQL databases mentioned here, because the SQL implementation and its extensions are powerful, clean and behave more or less as expected. Some minor positive points are the default usage of the ISO date format and the distinction between NULL and the empty string; on the other hand, most stuff that works in Oracle at the SQL level can easily be transferred to PostgreSQL. The psql shell works like typical Linux shells in terms of command line editing and history. So a lot of minor details are just pleasant or as they should be.

Comparing to the three groups of contenders:

NoSQL

NoSQL databases leave the mainstream of transactional relational SQL databases and provide us either with some interesting special features or with the promise of performance gains or support for huge database sizes. The price for this is that we lose an extremely mature, clever and powerful query language, which SQL is. I would go for a NoSQL product if the additional feature of this NoSQL DB product cannot reasonably be duplicated in PostgreSQL or other SQL DBs and if it is really useful for the job. I would also go for a NoSQL DB product if the required data sizes and performance cannot reasonably be achieved using an SQL product like PostgreSQL with good tuning of hardware, OS, database and application logic, but can actually be achieved with the NoSQL product. These applications exist, and it is important to pick the right NoSQL DB for the project. It should be observed that PostgreSQL has a lot of features beyond what normal SQL databases have, and looking into this area might be useful… A typical strength of some NoSQL databases (like Cassandra and MongoDB) is that a powerful replication is almost trivial to set up, while it is a really big story for typical transactional SQL databases… This is due to the transactional feature, which adds complexity, difficulty and a performance penalty to some kinds of replication…

MariaDB/MySQL

I do not count MySQL as belonging to Oracle here, because MariaDB is an independent fork outside of Oracle and can be used instead of MySQL. I do think that MySQL does not quite have the level of PostgreSQL in terms of features and cleanness. So we can get PostgreSQL for the same price as MySQL or MariaDB. Why not go for the better product? Even if MariaDB fits perfectly today, the application will grow, and at some point it will prove useful to be based on PostgreSQL. I came across the issue of nested transactions some years ago: they were easily supported by PostgreSQL, but not at all by MariaDB. Issues like that are more likely to come up in this direction than the other way around.

Oracle, DB2, MS-SQL-Server

Especially Oracle makes many long-term loyal customers run away due to their pricing and licensing practices. While it is extremely hard to change the database of a non-trivial database-based application, at least new applications in many organizations are discouraged from using Oracle, unless they can make a point of why they really need it. MS SQL Server might absorb some of these, especially since it is now available on Linux servers. But what Oracle does now might very well be the policy of Microsoft or IBM in a few years, so it makes perfect sense to have a serious look at PostgreSQL. A reasonably well-tuned PostgreSQL will work pretty much as well as a reasonably well-tuned Oracle, DB2 or MS SQL Server. Features that are missing now are being added with new releases. Some interesting features make it just a bit more pleasant to use than, for example, Oracle. It just feels more modern and more Linux-like.

By the way, there were some more contenders in the space of commercial transactional SQL databases, like Adabas D, Sybase and Informix. SAP bought the database product Adabas D in 1997 and the whole Sybase company in 2010, in two more or less unsuccessful attempts to have their own database and not have to use a competitor’s product, but they seem to have some success with HANA now. Informix has been bought by IBM and is still offered as an alternative to DB2. I would say that these products have lost their relevance.

PostgreSQL

So I do recommend seriously considering PostgreSQL as a DB product. It is currently my favorite in this space, but there is no universal tool that fits everything.

Some random aspects to keep in mind when moving from Oracle to PostgreSQL are mentioned here…

The types CLOB and BLOB do not exist. They can mostly be replaced by the types TEXT and BYTEA, but it is not exactly the same. The type TEXT, which is an essentially unlimited variable-length string, can easily be used for columns where we would try to use VARCHAR2 in Oracle, which gives us the advantage that we do not have to worry about defining a maximum length or exceeding the 4k limit that Oracle imposes on VARCHAR2.

Empty strings are not the same as NULL in PostgreSQL; they are in Oracle.

PostgreSQL has a boolean type. Please use it and get rid of the workarounds using CHAR, VARCHAR2 or NUMBER as a replacement.

Oracle only had one kind of transaction isolation that was really well supported, and I think this is still the way to go there. It is an excellent choice and very close to „repeatable read“, while PostgreSQL uses „read committed“ by default, but can be brought to use „repeatable read“. Please keep this in mind and set the transaction isolation level appropriately to avoid very unpleasant surprises.
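
For JDBC-based applications this is one call per connection; a sketch in Scala, assuming the PostgreSQL JDBC driver is on the classpath and with URL and credentials as placeholders:

    import java.sql.{Connection, DriverManager}

    object IsolationDemo extends App {
      // URL, user and password are placeholders for a real configuration.
      val conn: Connection = DriverManager.getConnection(
        "jdbc:postgresql://localhost:5432/mydb", "myuser", "secret")

      conn.setAutoCommit(false)
      // PostgreSQL defaults to READ COMMITTED; request REPEATABLE READ
      // explicitly to get behavior closer to Oracle's snapshot isolation.
      conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ)

      // ... queries and updates within the transaction ...

      conn.commit()
      conn.close()
    }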

The structuring of PostgreSQL consists of DB instances, usually only one on a virtual or physical server, which somewhat resembles what a database is in Oracle. Within a DB instance it is possible to define a number of databases without much pain. This was totally not the case with Oracle in earlier years, when it was best practice to rely on schemas, but now we can easily afford to run more virtual servers, each running Oracle (or PostgreSQL), if the licensing does not prohibit it in the case of Oracle. And since Oracle 12 there is the concept of the pluggable database, which splits an Oracle database into sub-databases that behave somewhat like separate databases without the overhead of DB instances. It seems to be quite equivalent to what PostgreSQL does, apart from the naming and many details about how to set it up and how to use it. Schema and user are more separate concepts in PostgreSQL: a schema can be defined totally independently of users, but there is a way to define schema names that match the user names to support that way of working. So we can do pretty much what we want, but the details of how to work it out are quite different.

Each database has its own programming language for writing triggers, stored procedures and the like. They seem to be somewhat similar between the different DB products (we are talking about MS SQL Server, Oracle, PostgreSQL and DB2), but different enough that triggers and stored procedures need to be rewritten from scratch. This is not as painful as it used to be, since the approach of accessing DB tables for read access only via views and for write access only via stored procedures seems to have lost some popularity. Having written a lot of the business logic in PL/SQL, the pain of migrating to another DB product is really enormous, while business logic in Java, Scala, C, C++, Perl, Ruby, C# or Clojure can be ported more easily to a different OS and a different DB. But it is in no way free.

One remark for development: some teams like to use in-memory databases for development and then trust that deployment on PostgreSQL or Oracle or whatever will more or less work. I strongly recommend not to follow this route. It is not at all trivial to support one more DB product, or usually a second DB product, and it is quite easy to set up a virtual OS with the DB product that is actually being used and with test data. PostgreSQL, Oracle, MS SQL Server, MongoDB and whatever you like can be configured to use more memory and perform pretty much like these in-memory DBs, if we set them up for development and are willing to risk data loss. This is no problem, because the image can be trivially copied from the master image when needed. Yes, a really good network and SSDs of sufficient size, speed and quality are needed to work efficiently like this, but it is possible and worthwhile to have that.

I can give training about PostgreSQL and MongoDB and about SQL in different dialects. Find contact information here.

And please: comments, corrections and additional information are always welcome…


The magic trailing space

When comparing strings, of course spaces count as well, and they should count. To ignore them, we can normalize the strings. Typical whitespace normalization includes the following (Perl regular expressions; a Scala equivalent is sketched after the list):

  • s/[ \t]+/ /g — replace any sequence of tabs and spaces used to separate content by one space.
  • s/\r\n/\n/g — replace carriage return + linefeed by linefeed only.
  • s/\s+$// — remove trailing whitespace.
  • s/^\s+// — remove leading whitespace.
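
A minimal sketch of these steps in Scala, using the equivalent Java regular expressions:

    object Normalize extends App {
      // Applies the four normalization steps listed above.
      def normalize(s: String): String =
        s.replaceAll("\r\n", "\n")   // CRLF -> LF
          .replaceAll("[ \t]+", " ") // collapse runs of tabs and spaces
          .replaceAll("\\s+$", "")   // remove trailing whitespace
          .replaceAll("^\\s+", "")   // remove leading whitespace

      println(normalize("  hello \t world \r\n")) // prints "hello world"
    }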

It is often useful to do something like this when comparing strings that originally come from outside sources and are not normalized, where only „the content“ counts. There can be more sophisticated rules to deal with no-break spaces, with control characters, with trailing spaces at the end of each line or only at the end of the whole thing, or with replacing multiple empty lines by just one empty line. The general idea is to think about the right normalization.

In some cases, like long numbers, spaces or other symbols are used to group digits. These should also be removed. Sometimes more specific rules apply, for example for phone numbers, web sites or email addresses, which need to be handled specifically for the type in question, hopefully using an adequate library.

More often than not, we see that web sites do not do this properly. Quite often a piece of information has to be entered and it is not normalized prior to further processing. So credit card numbers or IBAN numbers are rejected because of grouping spaces, or any input because of a trailing space, of course with an error message that gives no hint about what the problem was.

For serious applications there needs to be a serious processing step for data coming from outside anyway, for security reasons. Even though SQL injection should not work due to sound usage of SQL placeholders, it is good practice to check the data anyway and to reject it early and with a meaningful message. Should I trust a site that cannot deal with spaces in a credit card number with the security of my card number? I am not sure.

It is about time that UI developers get into the habit of doing proper processing, normalization and checks of user input. Beware that any security-relevant checks need to be done on the server, or at least on the server as well.
