We have professional profiling tools and we should use them. They provide really useful and extensive information.
So why bother with „poor man’s profiling“ at all?
About ten years ago, running profilers was more or less restricted to very small examples and to computers with a huge amount of memory. Today every development machine has that much memory. But we can usually only run such profiling on development and test systems.
Sometimes it is possible to copy data from the production system to the test system and to simulate quite realistically what is happening in production. Often, however, this is still far away from what really happens on the production system, and we usually only learn about a problem after it has already occurred.
What we usually do have are log files. Each entry has a timestamp and can somehow be connected to the line of code where it was created. We always know the class or the file or something similar from which the log statement was issued, and since there is only a limited number of log statements, they can often be recognized. Sometimes it is a good idea to actively help with finding where a message was logged, for example by putting the source code line number, some unique number, or a short random string into the source code for each log statement. Then we can find several pieces of information in each log entry (a parsing sketch follows the list below):
- The timestamp (please use ISO format! And with at least millisecond precision)
- The source code location (line number, class name, random string, relative position in the file, a meaningful description like entering/leaving a method, etc.)
- The thread and/or process
- The payload information of the log entry
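As a sketch of how such entries can be picked apart, here is a small Python snippet (the same could of course be written in Perl or Ruby). The log line format and the field names in the regular expression are just assumptions and have to be adapted to the actual log format:

```python
import re
from datetime import datetime

# Assumed log line format (adjust the regex to the real format):
# 2024-05-03T14:21:07.123 [worker-7] OrderService.java:211 begin commit: UPDATE orders ...
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})"
    r"\s+\[(?P<thread>[^\]]+)\]"
    r"\s+(?P<location>\S+)"      # class/file plus line number, or a unique marker string
    r"\s+(?P<payload>.*)$"
)

def parse_line(line):
    """Split one log line into timestamp, thread, source location and payload."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None  # lines that do not match (stack traces, multi-line payloads) are skipped
    return {
        "timestamp": datetime.strptime(m.group("timestamp"), "%Y-%m-%dT%H:%M:%S.%f"),
        "thread": m.group("thread"),
        "location": m.group("location"),
        "payload": m.group("payload"),
    }
```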
Now the log statements are analyzed per thread. The first step is to look at which pairs of source code locations produce successive log statements from the same thread, ignoring all logging from other threads and processes. For each such pair we can also estimate the time spent going from the first to the second location. Aggregating this by adding up the timestamp deltas for each pair over all threads and the whole log file will already point to hotspots where time is spent. Sometimes it is even interesting to look at sequences of three log statements (again from the same thread). This was useful when DB commits, i.e. one such pair like „begin commit“ to „commit done“, used up around 50% of the time, if not more. That observation was not very useful without knowing which statement was being committed, which could be found by looking at the log entry written by the same thread just before the commit began.
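A minimal sketch of this aggregation step, again in Python and assuming the parsed entries from the snippet above (the function and field names are my own, not part of any tool):

```python
from collections import defaultdict

def aggregate_pairs(entries):
    """Sum up the time spent between successive log statements of the same thread,
    grouped by the pair (previous location, current location)."""
    last_by_thread = {}                            # thread -> last entry seen for that thread
    total_by_pair = defaultdict(lambda: [0.0, 0])  # (loc_a, loc_b) -> [total seconds, count]
    for entry in entries:                          # entries in the order they appear in the log file
        prev = last_by_thread.get(entry["thread"])
        if prev is not None:
            pair = (prev["location"], entry["location"])
            delta = (entry["timestamp"] - prev["timestamp"]).total_seconds()
            total_by_pair[pair][0] += delta
            total_by_pair[pair][1] += 1
        last_by_thread[entry["thread"]] = entry
    return total_by_pair

def report(total_by_pair, top=10):
    """Print the pairs with the largest accumulated time, i.e. the hotspots."""
    ranked = sorted(total_by_pair.items(), key=lambda kv: kv[1][0], reverse=True)
    for (loc_a, loc_b), (seconds, count) in ranked[:top]:
        print(f"{seconds:10.3f}s  {count:6d}x  {loc_a} -> {loc_b}")
```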
Such an analysis can be done with scripts, for example in Perl, Ruby or Python; I use Perl for this kind of task. Maybe log analysis tools provide this out of the box or with a little bit of tweaking. A convention for what log entries look like, so that the location in the program that created the entry can be recognized reliably, is very helpful. Sometimes we actually want to see with which kind of data the problem occurred, which means looking into the payload as well.
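For the commit example mentioned above, a hypothetical sketch of how the payload just before a slow commit could be extracted. The marker strings „begin commit“ and „commit done“ and the threshold are assumptions about what the log actually contains:

```python
from collections import defaultdict, deque

def statements_before_slow_commits(entries, threshold_seconds=1.0):
    """For each slow 'begin commit' -> 'commit done' pair, report the payload of the
    log entry the same thread wrote just before the commit began, which usually
    names the statement that was committed."""
    last_two = defaultdict(lambda: deque(maxlen=2))  # thread -> its last two log entries
    slow = []
    for entry in entries:
        prior = last_two[entry["thread"]]
        if len(prior) == 2 and "commit done" in entry["payload"]:
            before_commit, begin = prior            # older entry first, then the 'begin commit' entry
            if "begin commit" in begin["payload"]:
                delta = (entry["timestamp"] - begin["timestamp"]).total_seconds()
                if delta >= threshold_seconds:
                    slow.append((delta, before_commit["payload"]))
        prior.append(entry)
    return sorted(slow, reverse=True)               # slowest commits first, with the preceding payload
```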