tmp-directories

On all computers we have some concept of a tmp-directory. Typically it is /tmp on Linux- and Unix-systems and something like C:/TEMP plus some subdirectory in each users home directory on MS-Windows.

In terms of software development this tends to be some dark area. Programs like to create some files there, store some stuff there and then maybe remove it, maybe not. And we do not know for sure, when we can delete these files and we actually do not want to care. Linux and Unix-Systems sometimes clear their tmp-directories on reboot, while providing an additional /var/tmp-directory, that survives reboot. Sometimes the tmp-directory is deducted from shared memory, so it is kind of a RAM-disk, but usually stored in the swap partition (or swap file) of our OS. Now this cleanup on reboot does not help too much, when we want to keep our system running for a long time.

These days most computers are somehow dedicated. Either they are virtual computers that run exactly one server application or a set of closely related server applications. Or it is a mobile phone, tablet or desktop computer that is typically used by only one person. But still we should not forget that the system should allow being used by several applications and by several users. So sharing the same tmp-directory for everyone can cause some conflicts. The Unix- and Linux-family has a way of setting file permissions for the tmp-directory itself and for its entries that stop users from reading, changing or deleting each others files, but still there is some concurrency about using the namespace of this one directory, which is usually quite elegantly bypassed by each software by using smart naming or by having the OS create unique names. But I would not consider it ideal. On the other hand, sometimes we might actually want to use the tmp-directory to share something between users or between processes, where this one tmp-directory might come in handy.

The approach of having a separate tmp-directory in each home directory and in a sub directory of each server application’s installation is tempting, because it separates name spaces, allows to disallow reading the directory entries by others and does not mix totally unrelated stuff in one directory. There is a drawback to this. We usually have different storage technologies. Some are optimized for reading, maybe even avoiding redundancy, because the system can be reinstalled. Some use sporadic writing, some are strictly read-only. And some use a lot of reading and writing. Some data is transient, some can be easily restored and some data needs to be stored redundantly to be safe. Depending on that we should aim to put it on Flash disks, or on a different RAID setup of hard disks. This is getting harder with virtualization, but eventually we can get to the point where virtual computers have disks of different characteristics, that are mapped to the appropriate hardware.

So there is no real good answer to this question, but I think that a tmp-directory that is separate from the home directory, but specific to each user, would be the best approach. Will this change? Probably not so easily. But maybe in some distant future.

Share Button

Powerful API Functions or Specific API Functions?

When designing APIs we should confront ourselves with the question what they should look like, what they should contain and what not. This is not mostly a question about development effort, but about creating a good API that can be used and save us development effort elsewhere.

There are always simple answers, but in the end we should balance certain partially contradicting desires to create something great.

One aspect will be discussed here. Some of us know functions in the libc of certain systems that we use to program in C. Favorite candidates are ioctl and fcntl. These functions include a wide range of functionality and actually do quite different things depending on the parameters. Primarily there is one parameter that selects the function. And then depending on this parameter there are several additional parameters, whose meaning totally depends on the first parameter.

I truly admire the libc and the Posix-API, because of what it can do and how it is accomplished and how clever the concepts are. But putting loosely related stuff into one catch-all-function and using a parameter that selects which function to actually execute is just wrong and it has been wrong even in the days when it was created. Now there is possibly some argument in favor of this design, because these functions are system calls, which are special, because they go immediately into the OS-kernel. Depending on the implementation of the OS there might be limits of the total number of system calls that the OS can support and it might be hard to change the interface between OS and libc too often, so a flexible system call comes in handy. In the concrete example, it is impossible to change it directly, because the POSIX-API has been standardized and this is one of the few standards that has remained relatively stable for 25 years and still offers great functionality. Linux, which strictly follows this standard, is by far the most widespread operating system today, especially on servers, mobile devices (Android) and devices that we perceive as just hardware like network routers, firewalls, … It is too valuable that programs written for the POSIX-API and of course using the defined functionality run on newer Linuxes.

But there is a lesson to learn for our own APIs. We should avoid putting too many different things into one API-function. I do not think that many of us will try to write an universal API-function like ioctl, but more subtle examples are quite common.

A typical pattern is this:

findPerson(name, email, phone_number)

We can provide a name, a phone number or an email address or a combination and then search for entries that match all of the entries that we had provided. This is still quite clear, but now we could also provide a list of phone_numbers, a list of email addresses etc…

Independent of the actual preference, it should be considered, that this are 7 functions. We can include or exclude any of the parameters, but the case that all are null is probably not supported. Or it is the eighth case that finds everything.

When we are talking about 1, 2, 3, or maybe 4 parameters, it is still possible. to create API-functions for all the combinations, like

findPersonByName(name)
findPersonByEmail(email)
findPersonByPhoneNumber(pone_number)
findPersonByNameAndEmail(name, email)
...
findPersonByNameAndEmailAndPhoneNumber(name, email, phone_number)

This will be clearer. When writing exhaustive automatic Tests, which will probably be „integration tests“, not „unit tests“, they have to be written against these seven variants anyway, no matter if it is one or seven functions. The implementation might also internally use „if“s or do the equivalent at query level by doing something like

SELECT * FROM PERSON P
WHERE
(:name IS NULL OR P.NAME = :name)
AND (:email IS NULL OR P.EMAIL = :email)
AND (:phone_number IS NULL OR P.PHONE_NUMBER = :phone_number);

which has actually eight paths, that need to be covered by tests, including the case where all three parameters are null, if that is not blocked by application code.

This also shows the limits of the classical approach, when the multitude of queries gets really complex. That might require a more generic approach, which is actually quite well exemplified by SQL or its embedded forms like JDBC. For typical IT projects, I would give the recommendation, not to go there and develop such a generic query DSL as part of the project. This usually leads to disaster, because the skills for designing a good language or a good generic framework are usually not available in the team and if we talk about budget, quality and schedule, it will usually blow anyway. So the reasonable approaches are either to use an existing well proven solution for the generic API or to just find out, what functionalities are actually needed and to provide them.

Some examples show the opposite, like Ruby on Rails, which was developed as part of a project effort. Another example is a relatively big company that developed a framework quite similar to Spring itself, before Spring was available. But these successes cannot easily be duplicated in our projects.

Share Button