From an ideological point of view, tests that involve the database may be unit tests are may be integration tests. For many applications the functional logic is quite trivial, so it is quite pointless to write „real unit tests“ that work without the database. Or it may be useful.
But there is a category of tests, that we should usually write, that involve the database and the database access layer and possibly the service layer on top of that. If layers are separated in that way. Or if not, they still test the service that is heavily relying on a database. I do not care too much if these tests are called unit tests or integration tests, as long as these terms are used consistently within the project or within the organization. If they are called integration tests, than achieving a good coverage of these integration tests is important.
Now some people promote using an in-memory-database for this. So in the beginning of the tests, this database is initialized and filled with empty tables and on top of that with some test data. This costs almost no time, because the DB is in memory.
While this is a valid way to go, I do not recommend it. First of all, different database products are always behaving different. And usually it is a bad idea to support more than one, because it significantly adds to the complexity of the software or degrades the performance. So there is this database product, that is used for production. For example PostgreSQL or Oracle or MS-SQL-Server or DB2. That is, what needs to work. That means the tests need to run against this database product, at least sometimes. This may be slow, so there is temptation to use H2 instead, but that means, writing tests in such a way that they support both db products. And enhancing the code to support at least to a minimal extent the H2 database as well. This is additional work and additional complexity to maintain.
What I recommend is: Use the same database product as for production, but tune it specifically for testing. For example: You can run it in a RAM disk. Or you can configure the database in such a way that transactions do not insist on the „D“ of „ACID“. This does not break functionality, but causes data loss in case of a crash, which is something we can live with when doing automated tests. PostgreSQL allows this optimization for sure, possibly others as well.
Now for setting up the database installation, possibly with all tables, we can use several techniques that allow to have a database instance with all table structures and some test data on demand. Just a few ideas, that might work:
- Have a directory with everything in place and copy it into the ram disk and start it from there
- Just have a few instances ready to use, dispose them after tests and already prepare them immediately after that for the next test
- Have a docker image that is copied
- Have a docker image that is immutable in terms of its disk image and use a RAM disk for all variable data. Much like Knoppix
- Do it with VMWare or VirtualBox instead…
In the end of the day, there are many valid ways. And in some way or other, a few master images have to be maintained and all tests start from this master image. How it is done, if it is done on classical servers, on virtual servers, on a private cloud, on a public cloud or on the developers machine, does not really matter, if it just works efficiently.
If there is no existing efficient infrastructure, why not start with docker images that contain the database?
We need to leave the old traditional thinking that the database instance is expensive and that there are few of them on the server…
We can use the database product that is used for production environments even for development and testing and have separated and for testing fresh images. We should demand the right thing and work hard to get it instead of spending too much time on non-solutions that work only 80%. It can be done right. And laziness is a virtue and doing it right in this case is exactly serving this virtue.