How to rename files according to a pattern

We often encounter situations where a large number of files should be copied, renamed, moved or processed in some similar way.
This can be done on the Linux command line, and it should be possible in almost the same way on the Unix-like command lines (for example Cygwin) of newer MS-Windows or of MacOS-X.

Now people routinely do that and they have developed several ways of doing it, which are all valid and useful.

I will show how I do things like that. It works and it is not the only way to do it.

So in the simplest case, all files in a directory ending in ‚.a‘ should be renamed to end in ‚.b‘.

What I do is:


ls *.a \
|perl -p -e 'chomp;$x = $_;s/\.a$/.b/;$y = $_; s/.+/mv "$x" "$y"\n/;' \
|egrep '^mv '\
|sh

You can run it without the last |sh, to check if it really does what you want.

So I use the files as input to a short perl script and create shell commands. It would be possible to do this actually in Perl itself, without piping it into a shell:


ls *.b \
|perl -n -e 'chomp;$x = $_;s/\.b$/.c/;$y=$_;rename $x, $y;'

You could also read the directory from Perl, which is quite easy, but for just quickly doing stuff I prefer getting the input from some ls.

To go into subdirectories, you can use find:


find . -name '*.c' -type f -print \
| perl -n -e 'chomp;$x = $_;s/\.c$/.d/;$y=$_;rename $x, $y;'

You can also rename all the files that contain a certain string:

find . -name '*.html' -type f -print \
|xargs egrep -l form \
|perl -n -e 'chomp; $x=$_;s/\.html$/.form/;$y=$_;rename $x, $y;'

So you can combine with all kinds of shell commands and do really a lot of things in one line.

Of course you can use Raku, Ruby, Python or your favorite scripting language instead, as long as it allows some simple pattern matching and an efficient implicit iteration over the lines.
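
As a minimal sketch of the same idea in Python: the function name rename_suffix and its dry_run flag are made up for this illustration and do not come from any existing tool. It walks a directory tree like the find example and renames files by suffix. With dry_run=True it only reports what it would do, much like leaving off the final |sh.

```python
import os
import tempfile
from pathlib import Path


def rename_suffix(root, old, new, dry_run=True):
    """Rename every regular file under root whose name ends in old to end in new.

    Returns the list of (source, target) pairs; nothing is renamed when dry_run is True.
    """
    pairs = []
    # sorted() collects all matches first, so renaming does not disturb the walk.
    for path in sorted(Path(root).rglob("*" + old)):
        if not path.is_file():
            continue
        target = path.with_name(path.name[: -len(old)] + new)
        pairs.append((path, target))
        if not dry_run:
            os.rename(path, target)
    return pairs


# Small demonstration in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "x.a").write_text("x")
    (Path(tmp) / "sub" / "y z.a").write_text("y")
    planned = rename_suffix(tmp, ".a", ".b", dry_run=True)
    done = rename_suffix(tmp, ".a", ".b", dry_run=False)
    remaining = sorted(p.name for p in Path(tmp).rglob("*") if p.is_file())
    print(remaining)
```

Since the file names are never passed through a shell, names containing spaces need no special quoting here.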

For such simple tasks there are also ways to do it directly in the shell, like this:

for f in *.d ; do mv "$f" "$(basename "$f" .d).e"; done

And you can always use sed, possibly in conjunction with awk instead of perl for such simple tasks.

Another approach is to just pipe the file names into a text editor that is powerful enough and create a one-time script using powerful editing commands.
On Linux and Unix servers we almost always use vi, even people like me, who prefer Emacs on their own computer:

ls *.e > tmpscript
vi tmpscript

and then in vi


:%s/\(.*\)\(\.e\)$/mv \1\2 \1.f/
ZZ

and then

sh tmpscript
rm tmpscript

So, there are many ways to achieve this goal and they are flexible and powerful enough to do really a lot more than just such simple pattern renaming.

If you work in a team and put these things into scripts, it might be necessary to follow a team policy about which scripting languages are preferred and which patterns are preferred. And you need to know the stuff that you write yourself, but also the stuff that your colleagues write.

Please, do not do

mv *.a *.b

It won’t work for good reasons.
On Linux and Unix systems the shell (usually bash) expands the glob expressions (the stuff with the stars) into a list of strings and then starts mv with these strings as parameters. So mv is called with a list of file names ending in .a and .b (or, if no file matches *.b, with the unexpanded string *.b as the last parameter) and cannot have any idea what was intended. When called with more than two parameters, the last one needs to be a directory into which to move the other files, so usually mv will just refuse to work.
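
To see what mv actually receives, the expansion can be imitated. The following is only a rough sketch: the helper expand_like_shell is invented for this illustration and ignores quoting, hidden files and the other details of real shell globbing. It does follow the default bash behaviour that a pattern without any match is passed on unchanged.

```python
import fnmatch
import os
import tempfile


def expand_like_shell(words, cwd):
    """Imitate default bash globbing: a pattern without matches stays as typed."""
    names = sorted(os.listdir(cwd))
    argv = []
    for word in words:
        matches = fnmatch.filter(names, word)
        argv.extend(matches if matches else [word])
    return argv


# A directory with two .a files and no .b files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("one.a", "two.a"):
        open(os.path.join(tmp, name), "w").close()
    argv = expand_like_shell(["mv", "*.a", "*.b"], tmp)
    print(argv)
```

In this situation mv is started with three parameters after its own name, and the last one is not a directory.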


GIMP

In spite of working mostly on server software and server setup using powerful non-graphical command line tools and scripting languages, it is sometimes fun to work with something very graphical. I have talked about Clojure Art, which is fun, creates interesting visual results and helps getting into the fantastic language Clojure. But more than twenty years ago I discovered GIMP, which is the main image editor on Linux computers. I keep hearing that Photoshop is even a bit better, but it does not work on my computers, so I do not really care too much about it.

To be clear, I am not a professional image editing specialist, I just do it a bit for fun and without the claim of putting in all the knowledge about colors and their visual appearance, the functionality of gimp and image editing in general… I am just experimenting and finding out what looks interesting or good to me and how to work efficiently. Actually it brings together my three interests, programming, photography and bicycle touring, the combination of the latter two being the major source of my input material.

Now you start working with layers and with tools that increase or decrease the brightness, contrast and saturation of an image or of the layer being worked on. Now I would like to explore how certain functions can be brought into this, either by putting them into my fork of GIMP, into a plugin, into a script within GIMP, or into a standalone script or program.

Some things that I find interesting and would like to explore: additional functions for merging layers. Such a function takes as input n pixels (for example n=2) from the same position in different layers and produces another pixel. These are the twenty or thirty layer modes that describe how a layer is seen on top of the next lower visible layer. So two images of the same size or two layers could be merged. It could be done as nicely as in GIMP or just destructively to make things easier. If it is worth anything to anybody else, making it work like the current modes might be a noble goal. But for the moment it is more interesting what to do. A very logical thing to do is taking just the average of the two layers. So for RGB it could be the arithmetic mean of the r, g and b values of the n layers (or images) belonging to the same x-y-position. Now what would alpha values mean? I would think that they weight the average, and the new alpha value could be the average of the input alpha values. We could also use geometric, quadratic and cubic means and, with some care concerning the 0, even harmonic means. Very funny effects could be created by combining these byte values with functions like xor.
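
A minimal sketch of two such merge functions, written as plain Python on tuples rather than as GIMP code. The names merge_mean and merge_xor and the handling of fully transparent input are assumptions made for this illustration.

```python
def merge_mean(pixels):
    """Merge RGBA pixels (channels 0..255) from the same position in n layers.

    Colour channels are averaged with the alpha values as weights;
    the new alpha is the plain average of the input alphas.
    """
    total_alpha = sum(p[3] for p in pixels)
    if total_alpha == 0:
        # All inputs fully transparent: colour is undefined, use transparent black.
        return (0, 0, 0, 0)
    rgb = tuple(
        round(sum(p[c] * p[3] for p in pixels) / total_alpha) for c in range(3)
    )
    return rgb + (round(total_alpha / len(pixels)),)


def merge_xor(pixels):
    """Combine the byte values of RGB pixels channel by channel with xor."""
    result = (0, 0, 0)
    for p in pixels:
        result = tuple(r ^ v for r, v in zip(result, p))
    return result


mean_pixel = merge_mean([(100, 0, 200, 255), (200, 100, 0, 255)])
xor_pixel = merge_xor([(255, 0, 15), (15, 0, 255)])
print(mean_pixel, xor_pixel)
```

Other means would only change the expression inside merge_mean; a fully transparent layer does not influence the colour at all, because its weight is zero.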

When working with any functions, it is always annoying that the r-g-b-values are always between 0 and 255 (or something like that). So this can be changed to real numbers, by doing something like

    \[s = \tan(\frac{(r-127.5)\pi}{256})\]

or to the non-negative real numbers by doing something like

    \[s = \tan(\frac{r\pi}{512})\]

Then some functions can be applied to these double values and in the end the inverse function will just be applied and result in a rounded and limited integral value from 0 to 255.
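
The two mappings and their inverses can be sketched as follows. The function names are invented for this illustration; the inverses follow from solving the formulas above for r, followed by rounding and limiting to 0 to 255.

```python
import math


def to_real(r):
    """Map a byte value 0..255 to the real numbers."""
    return math.tan((r - 127.5) * math.pi / 256)


def from_real(s):
    """Inverse of to_real, rounded and limited to 0..255."""
    r = 256 * math.atan(s) / math.pi + 127.5
    return min(255, max(0, round(r)))


def to_nonneg(r):
    """Map a byte value 0..255 to the non-negative real numbers."""
    return math.tan(r * math.pi / 512)


def from_nonneg(s):
    """Inverse of to_nonneg, rounded and limited to 0..255."""
    r = 512 * math.atan(s) / math.pi
    return min(255, max(0, round(r)))


# Example: shift a mid-grey value by 1.0 in the real domain and map it back.
shifted = from_real(to_real(128) + 1.0)
print(shifted)
```

Whatever is done to the real value in between, the result always ends up as a valid byte value again.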

Do we need this? I do not know. But I think it would be fun to be able to play around with some functions on one pixel, a vicinity of pixels, the same pixel position from different layers and the like.

The nice thing is: we can see the result and like it or throw it away. Which function is correct or useful can be discussed and disagreed on. That is more fun than formally proven correctness, assuming of course that the functions themselves are implemented correctly.

I have not looked into the source code of plugins to tell what can reasonably be done without too much effort. But if someone reading this has some ideas, this would be interesting to hear about.

And finally we have one more advantage of GIMP, because it is open source and it is possible to make changes to it.


How procurement can create value for IT projects

We know this: in many IT projects we need to make use of services, software and hardware that need to be bought.

Actually it often makes a huge difference what kind of deals are made and how efficiently the projects can work on this basis.

I will just briefly tell a few stories and tell a bigger story by that.

A common pattern is a „preferred supplier“. It is nice to be a preferred supplier, and in the phase when the partner is chosen and the contracts are made, companies often offer their best people and services to show how good it will be later to have them as preferred supplier. And then, when the deal has been fixed, they send the juniors for the same hourly rate and make a lot of profit. Or the price has been made so low that only the juniors can be sent. This can be a problem in the long run, because it might get really difficult to get enough really good people in order to progress with strategic long-term projects and not just maintenance of the daily business.

Another interesting pattern can occur when the preferred supplier is strongly represented in a project with their employees. They are getting money from the hourly rates, and in order to make a profit the salaries should be significantly lower than this. In order for this to make sense for the employees, they can provide non-monetary incentives, like some kind of career steps. Being in a powerful position in the project, the preferred supplier has some possibilities to choose who is in the project and who gets the more responsible positions. So there is a temptation to kick out people who are not from their company and to give these attractive positions not to the person who would be the best choice for the customer, but to those to whom they want to give an incentive. This is at the expense of their customer. So at the end of the day it is usually good to rely on multiple providers for „external“ people. There are serious companies who behave professionally and correctly even when they have become a „preferred supplier“, but this is not always the case.

When the preferred supplier is providing software, for example a database, it may be possible to get a really good deal for five years. Then in five years the deal needs to be extended and magically becomes more expensive, especially if the company knows that it is in the position of a „preferred supplier“. And when this issue is discovered, maybe even a year before, it is already too late to migrate. And then the expensive software needs to be used for another four years until it is again too late… And being from a big, impressive company does not necessarily make the software good. Counterexamples exist. In the case of databases I have seen companies that follow a strategy of multiple databases and that require a good reason for using the more expensive solutions. And magically the position of the buyer becomes much stronger when the deal needs to be extended for another five years. Maybe the overly expensive database will even be kicked out at some time. And yes, this expensive database has some really cool and pretty unique features. Unfortunately they only come in some enterprise edition that would be much more expensive still than the regular one, while open source databases provide decent, but less sophisticated variants of these enterprise features for a price that is less than the base version of the expensive database product. But since the DB product cannot easily be changed, it is important to make a wise choice and to consider different options, including the more expensive ones, when starting a project, and to pick what makes sense for the specific needs.

Some interesting observations were made when a preferred supplier made a really tempting offer for operating all the servers of a larger company. The annual price was really low, much cheaper than doing it with their own team. Then suddenly the need arose to store a really large amount of log files. I am talking about what could be stored on a few USB disks that could have been bought in the supermarket for a few hundred Euros. But of course this was forbidden, because the servers had to be run by the preferred supplier, and even putting Linux on a few PCs that were no longer needed and attaching a few cheap disks was ruled out by the cheap overall contract, although this cheap solution would have been absolutely sufficient for the purpose. Now the disk space could be bought from this supplier. Or more precisely rented. It was not a few hundred Euros, but a few hundred thousand Euros a year. Yeah, they needed to make some money somehow… A few hundred Euros once every few years, or maybe even a few thousand Euros every year, would have been totally acceptable for the project to pay. But there are limits. You cannot do certain things under such conditions. The deal kills important possibilities of the IT people. I am not going to write how this was resolved…

Another story, with a really cheap preferred supplier: They actually ran an important database for a stunningly low fixed base price. And on top of that it was paid per query. So what did they do? They designed the software in such a way that it used an Oracle database as a cache for the pay-per-query DB2 database. So the same query had to be made only once to DB2 as long as the data did not change. And when the data changed, the Oracle database just had to be cleaned up. Since this happened only a few times a year, this technically stupid architecture saved really a lot of money every year. Big money.

Yet another example: The management had already bought clearcase licenses. They were really expensive and the money was already gone. Now the setup that was used for clearcase and that was allowed by the licensing was not really optimized for part of the team working remotely. To do that efficiently would have required a much more expensive license that no one wanted to pay for. So every day synchronizing the software took like 30 to 45 minutes. And one team member had to work full time to maintain clearcase. There were some other pains, like it crashed when files contained only linefeeds instead of carriage return-linefeed and some other annoying details that I do not really remember. Just for the record, some of these issues have been fixed in later releases… And clearcase had a lot of really interesting features that were not at all used. The seriously useful features can all be found in git now, in a contemporary way, of course. But not in those days, when there was still neither git nor subversion. So some tests were performed and it looked like the free software CVS (which really sucks big time when compared with contemporary systems like git) would have worked much much better for the concrete project. But clearcase had to be used because it was so expensive and the money had already been paid.

So at the end of the day, when procurement makes good deals, this can create a lot of value for the project and allow for efficient and innovative work and for solutions that make sense technically, instead of finding tricks to bypass the worst parts of the contract.

So a good procurement team and good communication with the technical staff, who know what is needed for their work, is a big plus for everybody, for the project and for the company.


Unix and Linux

When Linux appeared in the first half of the 1990’s, I used to hear a lot: „yes, this is a nice thing, but it is not a real Unix“.

So why was there a different name, even though it was behaving almost the same? It was Posix, but Unix was a trademark, that could not be applied to Linux.

Unix was very important in the 90’s and in the 2000’s, but now it has lost almost all of its relevance.

Newer systems almost always use Linux, and systems that still run Unix (Solaris or AIX) are usually considered something that needs to be migrated to Linux sooner or later.

There is nothing wrong with Unix. It brought us great concepts and these concepts are relevant today. And inventing and standardizing these concepts was a good thing and a success story. But all the good stuff can now be found in Linux, and since the progress is happening there, it has surpassed the Unixes. Of course it was a factor that HP announced in the late 90’s or so that they saw no future for their HP-UX. They revoked this, but it had already damaged that system. Oracle bought Sun, and that weakened Solaris. Many other Unix variants lost their relevance long ago or are still lively and good niche systems, like BSD.

The success story behind this is that standards that work across companies have been established and allow systems to work together and to behave similarly.


How bad can a bad IT be for a company?

Just a funny story that happened some years ago…

I wanted to buy some lamps in a store somewhere 100 km away from where I lived.

So I went to the shop, ordered them and bought something else already.

Then I went there again when the lamps were supposed to be there. I had ordered six lamps, but actually wanted to buy one more. I had already paid a bit of money when ordering…
It was a bad day. They told me that the lamps were probably there, but they were not able to process the purchase or even find them because the IT was not running. I was kind of upset about wasting so much time and money to get there, and so they paid for my train ticket.

The next time I went there, it took a long time until it was my turn to pay. And then it really took an hour. Six lamps ordered, plus one more, minus the sales tax from the previous sale, minus what I had already paid, plus some other stuff that I had actually bought during this visit. So many numbers all had to be added together with the right sign… After about an hour and many false attempts they got it right. It took an hour from the time when it was my turn to the time I had actually successfully paid, and my credit card did work correctly… Then I got a piece of paper and had to go to another entrance of the building, quite far away from where I was. There they had a different understanding of how many lamps I should get than what I thought I had paid for. So I went back to the lady where I had paid and asked her to come with me to help me get the right number of lamps. She did not want to help me in that way, but I got her to write a note on the piece of paper that she had given me, stating that I was entitled to seven lamps, and she signed it. Then, after having spent at least two hours, I was able to go home with seven lamps and whatever else I had bought.

Now the question is, what is wrong here?

Obviously the IT did not work too well… It did not work at all on the second visit and it did not help getting the job done during the third visit.

But was that really the problem? Or just the symptom?

My impression is that the top management of the company was really bad. The processes were bad. And they were not able to find good employees, to train them and to motivate them. And then the IT was showing the same standard as the rest of the company.

Fixing the IT would not fix the problem. The business has to be fixed, the processes, the management, the employees need to be trained well, selected well and most of all motivated to work well… Then, when there are decent processes, it is a good time to improve the IT to support these processes instead of retaining the bad processes by implementing an IT before understanding the business well enough.


Just run it twice

Often we use some kind of „clustered“ environment to run our software.

This promises higher performance and better availability.

And the frameworks seem to suggest that it is just a matter of starting it twice and it will magically work correctly.

There is nothing wrong with investing some thought in this issue. It can actually go quite wrong otherwise…

So some questions to think about:

Where are the data? Does each service have its own set of data? Or do they share the data? Or is there some kind of synchronization mechanism between the copies of data? Or is some data shared and some data as copy for each instance of the service?

Do we gain anything in terms of performance or is the additional power of the second instance eaten up by the overhead of synchronizing data? Or if data is only stored once, does this become the bottleneck?

Then there is an issue with sending the requests to the right service. Usually it is a good idea to use something like „sticky sessions“ to keep a whole session or collections of related requests on one instance. Even if the protocol is „stateless“ and „restful“.

Is there some magic caching that happens automatically, for example in persistence frameworks like Hibernate? What does this mean when running two instances? Do we really understand what is happening? Or do we trust that hibernate does it correctly anyway? I would not trust hibernate (or any other JPA implementation) on this issue.

What about transactions? If storage is not centralized, we might need to do distributed transactions. What does that mean?

Now messaging can become fun. Modern microservice architectures favor asynchronous communication over synchronous communication, where it can be applied. That means some kind of messaging or transmission of „events“ or whatever it is called. Now events can be subscribed. Do we want them to be processed exactly once, at least once or by every instance? How do we make sure it is happening as we need it? Especially the „exactly once“-case is tricky, but of course it can be done.
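
One common way to approach the „exactly once“-case is to accept at-least-once delivery and to let every instance record the id of an event in a shared store before processing it, so that duplicates are skipped. The following is only a sketch under assumptions: an in-memory SQLite database stands in for a store shared by all instances, and the table name processed_events and the function handle_once are invented for this illustration.

```python
import sqlite3


def make_store():
    """Create the stand-in for a store that all instances would share."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    return conn


def handle_once(conn, event_id, handler):
    """Run handler for event_id unless the event has already been claimed.

    Returns True if the handler ran, False if the event was a duplicate.
    """
    try:
        with conn:  # commits on success, rolls back if the handler raises
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
            handler(event_id)
    except sqlite3.IntegrityError:
        # The primary key already exists: some instance handled this event.
        return False
    return True


store = make_store()
seen = []
results = [handle_once(store, eid, seen.append) for eid in ["e1", "e2", "e1"]]
print(results, seen)
```

This only shows the deduplication part. The side effects of the handler still need to happen in the same transaction or be idempotent themselves, otherwise a crash between the two steps can lose or repeat work.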

How do we handle tasks that run like once in a certain period of time, like cronjobs in Linux? Do we need to enforce that they run exactly once or is it ok to run them on each instance? If so, is it ok to run them at the same time?

Do we run the service multiple times on the productive system, but only a single instance on the test and development systems?

Running the service twice or more times is of course something we need to do quite often and it will become more common. But it is not necessarily easy. Some thinking needs to be done. Some questions need to be asked. And we need to find answers to them. Consider this as a starting point for your own thinking processes for your specific application landscape. Get the knowledge, if it is not yet in the team, by learning or by involving specialists who have experience…


How far to go with internationalization?

I had some experience with mobile apps in Sweden. Since I know Swedish, I did not bother to find out if they support other languages. They were potentially important and useful, but implicitly required a domicile in Sweden.

More concretely, the apps that I was interested in using were „Dalatrafik“ and „Swish“.

Swish is a mobile app for payment. I think that is the way to go and the future. Credit cards are just not quite as good, in many ways. I use Twint a lot in Switzerland and I like it very much. You either have NFC with the phone and the payment device or there is a QR-code to scan. But I think Swish goes much further, because the entry barrier for becoming a receiver of a payment is much lower. It can be used for payments between individuals or at flea markets, where a credit card terminal is not an option or too expensive. Now Sweden is generally more advanced in terms of cashless payment than other European countries. Cash is very rarely required, and surprisingly often only credit card, debit card and Swish work and cash is not accepted. Swish can be used to pay to a number. For example, a camp ground can be paid when the reception is closed, by just sending the amount according to the price list to the Swish number. So I installed Swish. And then the big disappointment: a Swedish bank account is needed to back it. Maybe it is eventually worth having one, but ad hoc it is not an option, so Swish only works for residents of Sweden or for frequent visitors who go the extra mile to open a Swedish bank account as a non-resident. This is difficult even in Switzerland. I have not tried it in Sweden.

Dalatrafik provides information about public transport connections within the Swedish region Dalarna. This really works well. But it also allows buying tickets for public transport. In times of Corona this is actually the only possible way, because the driver does not sell tickets. In larger towns there are a few shops that sell tickets as well, but that is neither convenient nor helpful in most cases. The app has two payment modes: Swish and credit card. Swish is covered above. Unlike in almost all other cases, it turns out that this app only works with Swedish credit cards. So the way to go by bus as a tourist is to just try to buy a ticket and, if somebody checks it, to say that it did not work… Why not accept foreign credit cards?

So please think a bit more about who your user base is and about ways of including non-residents or other „non obvious“ groups of users that are actually important. Starting with something that works and omitting the parts that are really difficult to achieve is good. But it is worth exploring how to make it work for non-residents, especially for apps that are so crucial. Dalatrafik could obviously allow foreign credit cards, as almost everybody in the world does. And Swish could come up with a way to add a foreign bank account or a foreign credit card to back the payments. We will see what they will come up with in the future. On the other hand it is amazing that almost all credit card terminals and ATMs in the world support 6-digit PIN codes for credit cards and debit cards, although they seem to exist only in Switzerland. A 6-digit code that you may choose yourself is much better than a 4-digit code that is imposed on you by the bank.

An example of a funny constraint that has eventually been removed is that users of iPhones and iPads needed a Windows computer (or Apple computer) somewhere to run their devices, because updates were only possible together with a computer of either of these two types. So Linux users could not buy i-things. Nor could people who did not have any computer at all and just wanted to use the i-device as their computer. Then Apple came up with the possibility to do the update just with the device itself, without needing any computer whatsoever. This is what all my Android devices do and always did, and what makes sense.


Some thoughts about Usability

Just from the user’s perspective, some thoughts and experiences concerning usability…

I have to pay my phone bill every month. It is one bill for phone, internet, and everything, but it needs to be paid. The phone is mostly used for the company, so the bill is paid by my company and I need a proper invoice and the bits of information for how to pay it via e-banking.

Now I get an email every month telling me they want money and I want to pay it. That is not too hard, because the email already contains the information and it is also relatively easy to find on the web site.

But then I need the invoice. When I go to the web site, there are different worlds, like private, small business, big business. It is necessary to log into the right one, and if for historical reasons an account exists for „private“, that is a dead end.

Then it is finally possible to find the money that needs to be paid. But then again, the obvious path leads to the payment information without the link to the PDF of the invoice. It is a dead end.

The web site does have a beautiful design and provides a lot of information and services.

But why don’t they make it really a matter of seconds to pay my bill and print the invoice? That is the only thing I really want and need to do every month. And it is a bad experience each time, because I forget how to do it after a month and it is not easy and obvious enough. The invoice could be attached to the email or there could be a link that leads directly to the invoice. Plus another link for whatever else they want to show me, if they like… I can easily ignore that second link most of the time.


Available for new projects

I am available for new projects from the beginning of September 2020.

On the website of my company IT Sky Consulting you can see what I offer…


Combining multiple scans

When images are scanned multiple times, maybe there is a way to construct an image that is better than any of the scans from them.

In this case it is assumed, that one scan has a higher resolution, but another scan got the colors better.

It has already been found out, which two scans belong to the same original.

Also one of them has already been rotated to the desired position.

The rotation can easily be found. Since images are roughly rectangular and width is roughly 1.5 times the height, simply two rotations have to be tried. Images are compared pixelwise and the one that has less deviation is probably the right one.
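
A minimal sketch of this comparison, under the assumptions that the two candidate rotations are 0 and 180 degrees and that both images have already been scaled to the same size and reduced to grayscale lists of rows. The function names are invented for this illustration.

```python
def rotate_180(img):
    """Rotate an image given as a list of rows by 180 degrees."""
    return [row[::-1] for row in img[::-1]]


def deviation(a, b):
    """Sum of squared differences between two equally sized grayscale images."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def best_orientation(reference, candidate):
    """Return the candidate as-is or rotated by 180 degrees, whichever deviates less."""
    rotated = rotate_180(candidate)
    if deviation(reference, rotated) < deviation(reference, candidate):
        return rotated
    return candidate


# Tiny made-up example: the scan is the reference upside down, with slight noise.
reference = [[0, 10, 20], [30, 40, 50]]
scan = [[51, 39, 31], [19, 11, 1]]
oriented = best_orientation(reference, scan)
print(oriented)
```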

Now the orientation is already the same. And when scaled to the same size, the images should be exactly the same, apart from inaccuracies. But that is not the case. They are scaled slightly differently and they are shifted a bit.

Now the shift and scaling can be expressed by four numbers as x=a*u+b and y=c*v+d, where (x,y) and (u,v) are coordinates in the two images. It could be a matrix if rotation were also involved, and this would work the same way, but professional scans have shift and scaling problems and far fewer rotation problems.

So to estimate a, b, c and d, it is possible to go through all the pixels (u,v) of one image. Now the pixels (x,y) = (u+du, v+dv) are compared for combinations of small du and dv. The result is a sum of squares of differences s in a range between 0 and 3*65536. Low values are good: they mean that the coordinates (u,v,x,y) are a good match and need to be recorded with a high weight. This is achieved by weighting them with (3*65536-s)^2. Now it is just two linear regressions to calculate a, b, c and d. Of course, the points near the borders need special care, but it is possible to omit a bit near the border to avoid this issue.
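
One of the two regressions can be sketched as a weighted least-squares fit of x = a*u + b. The function names and the sample numbers are made up for this illustration; the weight is the (3*65536-s)^2 from the text. The same fit applied to the pairs (v, y) gives c and d.

```python
def match_weight(s, s_max=3 * 65536):
    """Weight from the text: a small colour difference s gives a large weight."""
    return (s_max - s) ** 2


def weighted_fit(us, xs, weights):
    """Weighted least-squares fit of x = a*u + b; returns (a, b)."""
    sw = sum(weights)
    swu = sum(w * u for w, u in zip(weights, us))
    swx = sum(w * x for w, x in zip(weights, xs))
    swuu = sum(w * u * u for w, u in zip(weights, us))
    swux = sum(w * u * x for w, u, x in zip(weights, us, xs))
    a = (sw * swux - swu * swx) / (sw * swuu - swu * swu)
    b = (swx - a * swu) / sw
    return a, b


# Made-up matches: four perfect ones on x = 1.02*u + 5 and one bad match
# whose colour difference is maximal, so that its weight is zero.
us = [0, 100, 200, 300, 150]
xs = [5.0, 107.0, 209.0, 311.0, 999.0]
ss = [0, 0, 0, 0, 3 * 65536]
a, b = weighted_fit(us, xs, [match_weight(s) for s in ss])
print(a, b)
```

The bad match does not disturb the result, which is the point of the weighting.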

Now the points (u,v) of one image are iterated and the approximate match (x,y) on the other image is found. Now for (r,g,b) there can be a function to transform them. To start, it can be linear again, so this will be three linear regressions for r, g, and b. Or it could involve a matrix multiplied to the input vector. Or some function, then a matrix multiplication then vector addition and then the inverse function. The result has to be constrained to RGB-values between 0 and 255.
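
The simplest variant, one linear function per channel, might look like the following sketch. The function names and the sample pixels are invented for this illustration, and the fit is an ordinary unweighted least-squares regression that assumes the source values of a channel are not all identical.

```python
def fit_channel(src, dst):
    """Ordinary least-squares fit dst = gain*src + offset for one colour channel."""
    n = len(src)
    mean_s = sum(src) / n
    mean_d = sum(dst) / n
    var = sum((s - mean_s) ** 2 for s in src)
    cov = sum((s - mean_s) * (d - mean_d) for s, d in zip(src, dst))
    gain = cov / var
    return gain, mean_d - gain * mean_s


def fit_colours(src_pixels, dst_pixels):
    """Fit one (gain, offset) pair per channel from matched RGB pixels."""
    return [
        fit_channel([p[c] for p in src_pixels], [q[c] for q in dst_pixels])
        for c in range(3)
    ]


def apply_colours(pixel, params):
    """Apply the fitted transform and constrain the result to 0..255."""
    return tuple(
        min(255, max(0, round(gain * v + offset)))
        for v, (gain, offset) in zip(pixel, params)
    )


# Made-up matched pixels: the second scan has r*1.1+5, g*0.9 and b-10.
src = [(10, 20, 30), (100, 120, 140), (200, 220, 240)]
dst = [(16, 18, 20), (115, 108, 130), (225, 198, 230)]
params = fit_colours(src, dst)
print(apply_colours((100, 120, 140), params), apply_colours((250, 0, 5), params))
```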

Now in the end, for each pixel in one image there is a function that changes the colors more towards what the other image has. This can be applied to the full resolution image. A weighted average of the original values and the calculated values can also be used to find the best results. Then this can be applied to the whole series.

A few experiments with the function and a bit of research on good functions for this might be done to improve the result. At the end of the day there is a solution that is good enough and works. Or maybe a few variants and then some manual work to pick the best.
