How to replace svn:keywords?

In the old days we used svn, CVS, RCS or other source code management systems that allowed enabling something like svn:keywords. This resulted in certain strings in the source code being replaced by strings containing version information.

These were useful more often than we might think. The question „what version are we running?“ gets answered often, but surprisingly often incorrectly.

Putting the version information into a comment, or even better into a string that can be logged or at least extracted from the binary with something like

strings xyz | grep -E '\$Id.*\$'

allows us to find out.

Now we are using git instead of svn, or at least we should be using git or planning our migration to git. There are other tools like Mercurial that are probably just as good as git, but git is the most common one, and every developer knows it or has to learn it anyway to stay in business.

Git does not support these svn:keywords, or at least not as easily, because it relies on SHA checksums of the content, which does not allow file contents to be changed on the fly. There are some tricks like pre-commit and post-checkout scripts that might solve such issues, but this is difficult to tame due to the distributed character of git, which includes a local repository on each developer's machine.

So it is better to accept that the time of this svn:keywords stuff is over and look for something new. As an example we will consider the world of Java and JVM languages. Most teams use a Jenkins server to build the software.

To create a release, even a temporary release or one just for testing, the right way is to first label (tag) the head of the branch we are working on, then check out based on this label, compile that and, if it is successful, upload the result to the artifact repository (such as Artifactory). After that we may rename the label or add another label. If the build fails, we may delete the label, depending on our processes.

The jar files contain a META-INF directory with a MANIFEST.MF file. This should be the right place to put version information during such a build. More or less this can provide the same benefit as the svn:keywords, but it works with git and needs to be done in only one place.

Details about how to do it can be found out when needed.
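To answer „what version are we running?“ from a jar file, the manifest can simply be read back. The following is a minimal sketch in Python, not a definitive tool. It assumes that the build writes the version into the standard `Implementation-Version` attribute of the manifest; `read_manifest` and `build_demo_jar` are hypothetical helper names, and the demo jar is created in memory only to make the example self-contained.

```python
import io
import zipfile


def read_manifest(jar_bytes: bytes) -> dict:
    """Return the main-section attributes of META-INF/MANIFEST.MF as a dict."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")

    attributes = {}
    last_key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and last_key is not None:
            # continuation line: long manifest values are wrapped
            # and continued on lines starting with a single space
            attributes[last_key] += line[1:]
            continue
        key, _, value = line.partition(": ")
        attributes[key] = value
        last_key = key
    return attributes


def build_demo_jar(version: str) -> bytes:
    """Create a tiny in-memory jar whose manifest carries the given version.

    Assumption: a real build would write this attribute via its build tool.
    """
    manifest = (
        "Manifest-Version: 1.0\r\n"
        f"Implementation-Version: {version}\r\n"
        "\r\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", manifest)
    return buffer.getvalue()


if __name__ == "__main__":
    jar = build_demo_jar("1.4.2")
    print(read_manifest(jar).get("Implementation-Version"))
```

Since a jar is just a zip archive, the same check works on any deployed artifact without starting the application.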

I assume that the same approach can also be applied in other environments. We can even find ways to make the software log its version by changing a string in a source code file during the build process.
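Changing a string in a source file during the build could look roughly like the following sketch. It assumes a hypothetical constant named `VERSION` in a Java source file and a version string that the build server already knows, for example from the label it has just checked out; `stamp_version` is an illustrative name, not part of any existing tool.

```python
import re

# Hypothetical placeholder line in a source file, e.g. in Java:
#     public static final String VERSION = "dev";
VERSION_PATTERN = re.compile(r'(String\s+VERSION\s*=\s*")[^"]*(")')


def stamp_version(source: str, version: str) -> str:
    """Replace the placeholder value of the VERSION constant with the given version."""
    stamped, count = VERSION_PATTERN.subn(
        # a function is used as replacement so that backslashes
        # in the version string are not interpreted as escapes
        lambda m: m.group(1) + version + m.group(2),
        source,
    )
    if count != 1:
        raise ValueError(f"expected exactly one VERSION constant, found {count}")
    return stamped


if __name__ == "__main__":
    java_source = (
        "class BuildInfo {\n"
        '    public static final String VERSION = "dev";\n'
        "}\n"
    )
    print(stamp_version(java_source, "v1.4.2"))
```

The stamped file should only exist in the build workspace and never be committed, otherwise we are back to the problem that the checksums change.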


LF and CR LF in git and svn

Typically we observe that line endings of text files turn out to be „linefeed“ (LF or „\n“ or 0x0a or Ctrl-J) on Linux and Unix-like systems (including MacOSX), while they are „carriage return and linefeed“ (CR LF or „\r\n“ or 0x0d 0x0a or Ctrl-M Ctrl-J) on MS-Windows. See the little obstacles of interoperability.

This can become annoying, and there is little reason for it. Most tools for editing, compiling and working with these files understand both variants. It does become annoying when diffs are created between files, and even more so when a script ends up with „CR LF“ line endings and the interpreter given in its first line is not found, because the system looks for one that has the otherwise invisible „CR“ at the end of its file name. It also becomes messy when multiple CR characters are present. CR characters are annoying even on MS-Windows itself as soon as we use Cygwin. Since in most cases the target system is a Linux system anyway and the unnecessary CR character just wastes space, it is usually a good idea to agree on not having CR characters at all in certain kinds of files.

The easiest way is to just set up git or subversion to change files from CR LF to LF on commit. So the repository only contains „clean“ files, at least from that moment onward.
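What such a conversion does to the file content can be sketched in a few lines of Python. This is only an illustration of the idea, not the implementation used by git or subversion; `looks_binary` is a deliberately crude, hypothetical heuristic, and the sketch also removes repeated CR characters before an LF, which covers the messy case of multiple CRs.

```python
import re


def looks_binary(data: bytes) -> bool:
    """Crude heuristic (an assumption): a NUL byte near the start means binary."""
    return b"\x00" in data[:8000]


def normalize_line_endings(data: bytes) -> bytes:
    """Convert CR LF (and CR CR LF ...) to LF; leave binary content unchanged."""
    if looks_binary(data):
        return data
    return re.sub(rb"\r+\n", b"\n", data)


if __name__ == "__main__":
    script = b"#!/bin/sh\r\necho hello\r\n"
    # After normalization the first line names /bin/sh, not "/bin/sh\r".
    print(normalize_line_endings(script))
```

A CR that is not followed by an LF is left alone, since it might be intended content.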

This is accomplished in subversion by applying the following command on the files that we want to be kept on „LF“ only:

svn propset svn:eol-style LF FILE...

and then committing.

Git has a way to achieve this using the .gitattributes configuration file. So the .gitattributes file can contain something like this:

* text=auto eol=lf

Remark

Of course I recommend using git instead of svn for new projects and considering a migration of existing projects from svn to git. But for this special aspect svn provides a slightly more powerful and more intuitive tool than git.



Meaningless Whitespace in Textfiles

We use different file formats that are more or less tolerant of certain changes. The best-known example is whitespace in text files.

In some programming languages whitespace (space, newline, carriage return, form feed, tabulator, vertical tab) has no meaning beyond separating tokens: as long as some whitespace is present where it is required, it does not matter which. Examples are Java, Perl, Lisp and C. Whitespace that is part of string content is always significant, but whitespace between tokens can be any combination of one or more of the whitespace characters from the lower 128 positions (ISO 646, often referred to as ASCII or 7-bit ASCII). It is of course recommended to have a coding standard that gives guidelines on when to use newlines, whether tabs or spaces are preferred (spaces, please) and how to indent. But this is just about human readability; the compiler does not really care. Line numbers do carry some meaning in compiler and runtime error messages and stack traces, so putting everything into one line would do harm beyond readability, but there is a wide range of layouts that are all correct and equivalent.

By the way, many teams limit lines to 80 characters, which was a valid choice 30 years ago, when some terminals were only 80 characters wide and 132-character terminals were just coming up. As a hard limit it is a joke today, because not many of us would be able to work efficiently on a VT100 terminal anyway. Very long lines can be harder to read, so something around 120 or 160 might still be a reasonable line length.

Languages like Ruby and Scala put slightly more meaning into whitespace, because in most cases a semicolon can be omitted if it is followed by a newline and not just horizontal whitespace. And Perl (Perl 5) is so hard to parse that only its own implementation can properly format it or even recognize which whitespace is part of a literal string. Special cases like having program code in a string, then parsing and executing it, should be ignored here.

Now we put these program files into a source code management system, usually git. Some teams still use legacy systems like Subversion, SourceSafe, ClearCase or CVS, and there are some newer systems that are probably about as powerful as git, but I never saw them in use. Git computes a SHA-1 hash of each file's content, which implies that any minor change results in a new version, even if it is just whitespace. This does not hurt too much if we agree on the same formatting and on the same line ending (hopefully LF only, not CR LF, even on MS-Windows). But our tooling does not distinguish between significant changes and insignificant formatting-only changes. This gets worse if users have different IDEs, which they should have, because everyone should use the IDE or editor with which he or she is most efficient, and the formal description of the preferred formatting is not shared between editors or differs slightly.

I think that each programming language should come with a command line diff tool and a command line formatting tool, that obey a standard interface for calling and can be plugged into editors and into source code management systems like git. Then the same mechanisms work for C, Java, C#, Ruby, Python, Fortran, Clojure, Perl, F#, Scala, Lua or your favorite programming language.

I can imagine two ways of working. In the first, we have a standard format and possibly individual formats for each developer. During „git commit“ the file is brought into the standard format before it is shown to git, so meaningless whitespace changes disappear. During checkout the file can optionally be brought into the preferred format of the developer. And yes, there are ways to deal with deliberate formatting that for some reason should be kept verbatim, and to deal differently with comments and of course all kinds of string literals. Remember, the formatting tool comes from the same source as the compiler and fully understands the language.

The other approach leaves the formatting up to the developer and only creates a new version when the diff tool of the language indicates that there is a relevant change.
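The core of such a language-aware diff is a check whether two versions of a file mean the same to the compiler. Here is a minimal sketch of that check, using Python as the example language only because its standard library exposes its own parser; `same_program` is a hypothetical name. Note the limitations: this sketch treats comment changes as irrelevant too, and it only answers yes or no instead of producing a readable diff.

```python
import ast


def same_program(old_source: str, new_source: str) -> bool:
    """Return True if both sources parse to the same syntax tree.

    ast.dump() omits line and column information by default, so pure
    layout changes (and comment changes) do not show up in the comparison.
    """
    return ast.dump(ast.parse(old_source)) == ast.dump(ast.parse(new_source))


if __name__ == "__main__":
    before = "x = [1,2,\n     3]\nprint( x )\n"
    reformatted = "x = [1, 2, 3]\n\nprint(x)  # show it\n"
    changed = "x = [1, 2, 4]\nprint(x)\n"

    print(same_program(before, reformatted))  # layout only
    print(same_program(before, changed))      # a literal changed
```

Whitespace inside string literals still counts as a change here, because it ends up in the syntax tree as part of the literal's value.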

I think that we should strive for this approach. It is no rocket science; tools of this kind have been around for many decades as diff and formatting tools. It would just be necessary to go the extra mile and create sister diff and formatting tools for the compiler (or interpreter) and to actually integrate them into build environments, IDEs, editors and git. It would save a lot of time and leave more time for solving real problems.

Is there any programming language that actually does this already?

How to handle XML? Is XML just the new binary with a bit more bloat? Can we do a generic handling of all XML or should it depend on the Schema?
