Test Driven Development
We all know that how good test driven development is and that we should move in that direction.
How much coverage
There are some serious obstacles. Most of all, we have some obligation to actually finish software and the resources are usually kind of limited. If they were not limited by money and time constraints, they would hit the limit of efficient team sizes and organizational structures.
We can just look at a simple application that does „CRUD“ operations. Ideally we start with a known data set and reset the database to exactly this content before starting the tests, maybe even before each single test… If we have a huge and well managed server farm to run the test, maybe possible. For the „read“-methods we need to write some tests that succeed in reading, probably performing a few reads with a single read method to cover different outcomes of the successful read or different parameter combinations. Then there are unsuccessful reads that just do not find anything and return null or an empty result collection or even those that fail with an exception. It is of interest to check the maximum and minimum allowed values, if there are such limits. So we end up writing five to a few dozen test methods for a single read method. And this is the simple case. For delete and update we should create our own data in the beginning of the test. Probably there are dependencies and constraints in conjunction with other data, so it is necessary to cover these also. Create and update actually need a variety of at least two values for each of he most simple attributes of the created data object, to deal with not null. Usually we have more constraints on attributes, concerning lengths, value range and some kind of compatibility with other data. So there will be up to around ten tests for each attribute of the created or updated entity and we have successful and unsuccessful operations that we expect. So we will end up writing hundreds or even thousands of unit test methods just to obtain the most basic coverage for a relatively simple „CRUD“-application. Writing many similar tests is not so difficult and it would be interesting to explore ways to cut down on the repetitive work involved by using less verbose languages for writing the tests, creating them partially with scripts or simply writing very powerful helper methods in the test class that just get called with slightly different parameters to do all the tests. It will anyway be a lot of work, to write the tests. I think 60% of the time for the unit tests and 40% for the actual code is a reasonable number for a relatively fair coverage of most of the code.
In practice we should really prioritize our unit testing efforts, because spending 2.5 times as much time for the whole thing as for the code itself is simply not always possible. On the other hand, the time we save in the long run with good testing is even more than ha we spend, if we do the unit test development well.
But there are some aspects to think of:
- Which parts of the application are fairly stable?
- Which parts of the application are used and relied on heavily by other parts of the application?
- Which parts of the application are used a lot by end users?
- Which parts of the application are high risk because they have more inner complexity?
- Which parts of the application actually showed errors? Fix the errors by writing a test to expose them first.
- Which parts of the application are high risk in terms of reputation, money loss or data loss if they go wrong?
- Which parts of the application are undergoing internal changes, while retaining the API?
- Which parts of the application are migrated to another platform, OS, DB, architecture …, while retaining the API?
It is good to focus primarily on areas based on these questions and to do reduced testing for areas that are less critical.
The first question is quite delicate, because it exposes some contradiction we need to cope with. We should be agile, change the application easily when requirements are understood better or the architecture is understood better. But with tests, even this effort multiplies by 2.5 or whatever we have to update the unit tests. Or even worse, it leads to disabling the unit tests or to the loss of agility. In areas that change quickly it may be better to write the complete set of tests by the time they have become relatively stable.
The next issue is the database. Typical organizations like to provide one DB instance and schema for the whole development team, because the database instances and schemata are seen as expensive resources. They are hard to maintain and for various reasons it is often difficult to install a local database on each of the developers machine. If it is Oracle, DB2 or MS-SOL-Server, some know-how is needed to install it and maybe even some constraints are there in terms of the OS. MariaDB and PostgreSQL seam to be somewhat easier to install, there are less license issues involved, but still even that is an effort. This can be overcome by virtualization. An image with the DB-setup can be developed once and than copied to each team member. There are interesting and good ways to do something like this. So it is becoming less of an issue, but still it is very unusual to have that. Now there are two ways out of this. One way is to use another DB product for development and production. This is somewhat dangerous, because databases are so different, that today’s common abstractions do not hide the differences and we also might pay a high price in terms of performance if we do not use DB-specific features. So it requires extra development effort to support both DB types. And it is very important to run tests against the DB that is used in production anyway on a regular basis. It may be helpful to move part of the tests to such a similar-but-not-equal local environment. The regular development DB is unfortunately often shared between many developers. Now if tests run simultaneously from different development machines against the same DB, they will usually inter and some tests will probably fail just because of that. Not all the time, but sporadically. It can be avoided, by some team organization and some kind of reservation of the DB, but that is painful, so we just run the tests and if they fail assume it is someone else testing at the same time causing the failure. It is possible to write the tests in such a way that they can withstand this, but this is a lot of extra effort, compared to the effort of using a virtual image with a working DB instance it is not justifiable.
So what we should aim for is a dedicated DB schema for each developer. Ideally it should be of the DB software product used for production. It can be locally, on some DB-server or as a virtual image.