Perl Scripts for editing

Even though we do have IDEs with quite powerful refactoring mechanisms for many languages, it is still sometimes useful, to have another automated editing mechanism.

Why would that be the case?

Some examples:

In some cases there was an SQL script to create the database tables, which had been written first. From that classes and even CRUD operations in something like JDBC or DBI can be created. Even though most Java-projects that I have seen recently prefer using Hibernate (or more generally JPA2), there are some benefits in doing just plain old JDBC. This requires some discipline in writing the SQL in a uniform way and it was sometimes even necessary to have „magical comments“ that were ignored by the DB, but used by the script. This can save a lot of work and errors.

In some cases there were large numbers of HTML files. Now a similar kind of change had to be applied to each of them. Using a Script to parse the HTML file and to apply the changes can save a lot of work and provide consistency that is often badly needed. And if the whole thing happens in the context of an agile project, then the stakeholders might want to see the outcome and come up with further ideas. No big deal, just change the transformation scripts and check it out again. I recommend thinking in terms of larger transformation steps. Then I would retain the original files and apply the script for the step until the outcome is ok. Then this can be committed to git and the next step can be worked on. If the intermediate results are not useful for anybody else, just wait with the push or better work on a branch.

Sometimes refactorings have to be done that are not easily supported by the IDE. For example a whole bunch of classes need to be moved to different packages according to some new naming rule. Just find all the classes with their old package names, then move to the new directory structure and rename imports and package declarations to the new structure. This can easily be done with a Script. Always remember to be careful if there is Reflection involved, this will most likely break all refactorings, no matter if by IDE or by script.

Also it is often useful to use scripts to analyze code and to find occurrences of certain patterns.

These things can be done with scripts in Ruby, Python, Perl or Raku. All of these are valid options. Ruby and Raku are somewhat more sophisticated languages than the other two. And Perl and Raku have the most sophisticated Regex capabilities. I would assume that Raku is the best choice, if you start from scratch, it even supports grammars out of the box, which might be a way to address such issues. Perl has it as add on libraries. Of course it is also useful to work with what you know. But sometimes it is worth learning the right tool instead of just using the golden hammer to put in screws.

So in my case the tool for this is currently Perl. It does the job, is available more or less everywhere and I know it well enough. It might be worth moving to Raku in the future.

Some things that often work for simple scripts are the following:

Try to start with normalizing the input files, this might be the first transformation step. Remove trailing spaces, replace tabs with spaces, maybe normalize idention, replace line endings by LF only without CR. In HTML replace HTML-entities for UNICODE-characters by the appropriate Unicode characters, convert everything to UTF-8 etc.
If you have data like phone numbers and dates in the file that are meaningful for your further steps, bring them to a standard format. Of course always depending on your local circumstances, but this is usually the right way to go. This might be partially done with an external tool like xmllint.

For the next steps we need to consider the issue that we have multiple lines. There are regex-variants that do not stop at line boundaries. But if we read the content line wise, it can still be a bit more work. Sometimes it is easier to just replace line feeds by some marker string that does not otherwise occur in your files (check it before!) and then apply usual regex on this longer line.

What we often need is a regex that finds the shortest and not the longest match. If we write something like /a.*b/, this will look for a sequence that starts with an „a“ and ends with the last „b“ that can be found. Often we want to end with the first „b“. This can be achieved by /a.*?b/.

Another pattern that is often useful for such scripts is to work with a state machine. If a certain pattern is discovered, we react to it depending on the state and possibly change the state. So we can apply changes to something only if it occurs in a certain context.

Share Button

Perl and Scala: what can they learn from each other?

Ironically Scala at first drew my interest, because I discovered that about ten years ago there was no really good understanding of how to do a good multithreading concept for Perl 6. I thought exploring how they do it in Scala, where it was already known to be good at that time, would give a more general understanding to this issue. At this time Perl 6 (now named „Raku“) was intended to rather go without multithreading capabilities than doing them badly. In the end I got dragged into Scala and found that by itself more interesting than the original issue. And Perl 6 community eventually found good answers for providing multithreaded capabilities anyway.

So why there are technical concepts in both of these languages, that are interesting and possibly could in some way be applied to the other one, there is an interesting parallel.

Both Scala and Perl have been „cool languages“ that were really strong in an area or even in a broader range of application areas. Both of them found a competitor, that was kind of an „inferior clone“ of them. PHP in its early versions was very similar to Perl, but „simplified“ and kind of a subset of what Perl provided. At that time Perl had a real boom, because the first Web applications came up and the only reasonable way to go was CGI and of course it was done with Perl. There were some early alternatives like Cold Fusion and ASP, but they never really become main stream, at least not outside of their respective communities. Now PHP eventually took over most of Perl’s CGI and has become a major building block of our current WWW. Wikipedia and this Blog run on PHP. Perl eventually also lost its leading position as system administration scripting language to Ruby and even more to Python and some others, but it is still there and has strong string parsing capabilities and a very useful ecosystem of libraries called CPAN.

Now Scala has found Kotlin to be a similar competitor. Besides being somewhat simpler Kotlin also shines with good tooling support. It comes from the same organization as IntelliJ IDEA, which is the usual IDE for most JVM-languages for people who rely neither on Emacs nor vi. So Kotlin support in IntelliJ is always going to be a high priority. And Kotlin is officially supported by Google as programming language for Android-apps. It seems to work well, allows for more modern development than the supported Java versions and has conceptionally a lot of similarity with Swift, which is the most modern programming language supported by Apple for IOS-Apps. There have been heroic and admirable approaches to allow for Android App development using other JVM-languages, especially Scala. But they all suffer from the same set of problems. In order to avoid installing too much language specific code in the app, dynamic language features that would require a compile capability, as commonly used for Groovy or Clojure have to be avoided. And the excessive use of the languages libraries has to be avoided, because they are not on the phone already, but a copy of them has to be shipped with the app, for each App. So the storage usage is much more than for Kotlin and Java Apps. And then we see an attempt, to reduce the size of the libraries, by only including what is needed. That is necessary, but it looks too fragile to really trust it. So, for Mobile Apps, it is Kotlin. Period. And then Kotlin is already there, so why not use it on the server as well. Yes, I do believe Scala is better, but that is not what everyone thinks and it needs to be much better to justify the additional language, where App-development for Android is already happening.

Now both Perl and Scala had some problems. To some extent, they are even sharing the exact same problem. It was the possibility to write really „cool“ code that was very smart, very short and could not be read by anybody else without very much time and very much knowledge. This can be done in any language, but Perl is the number one for this and I would put Scala as number two and C++ and C as number three and four, from the languages, that I know. It is a good idea to use some coding standards that allow for clean Scala or Perl code. But please remain reasonable and do not let bureaucrats come in charge of the coding standards to create a monster that drains all creativity. Allow using powerful features, but use them in a decent and readable way.

Now in both cases, there was an effort, to write a new version of the language, that was meant to be slightly incompatible and cleaned up some of the weaknesses and brought some improvements. In case of Perl this was Perl 6. It was developed for around 20 years and came out a few years ago. Eventually it turned out too different, so it was renamed to Raku. For Scala, a new language called „Dotty“ was developed. It was decided to make this the next major version, Scala 3. Even though it is much closer to Scala 2 than Raku to Perl 5, it is still incompatible and requires an effort to rewrite code. It is already seen that large Perl 5 projects are hardly moving to Raku, so Perl 5 is there to stay and Raku is just a second language within the same community. This will probably not happen like that with Scala, and the core language team will probably at some point of time concentrate on Scala 3. But large organizations that heavily invested into Scala cannot easily migrate, simply because it needs a lot of time and money. So we will probably also see some long term coexistence of Scala 2 and Scala 3. Maybe Scala 2 will be forked by major adopters. Or it will be supported for money from Lightbend.

Share Button