Goodware: Russian BOT-Networks

The FSB in Russia has an increasing demand for computing power and storage. On the other hand it is getting more difficult to import computers or parts into Russia due to the sanctions.

So now an innovative Russian IT work group has developed a software stack that allows reliable storage and computing on botnets.

Some challenges:

  • It is possible that a whole botnet is taken down and all the data is lost
  • Since the confidential data is distributed across millions or billions of nodes all over the world, good encryption is needed all the way
  • Transfer of data needs to be obfuscated

The work group came up with solutions. They develop malware that installs the software on millions of computers all over the world. It is actually called „goodware“, because it serves a good purpose. This goodware is constantly changed and updated in order to create multiple independent botnets.

The crucial data is encrypted. Advanced Russian technology allows calculations to be performed on encrypted data, so the real content is never revealed on the node.

The FSB-IT-team creates virtual networks, again based on botnets of goodware, with many hops that change rapidly to transfer data and goodware to and from the nodes and to control them.

Data is stored and processed redundantly. Since there is a huge surplus of computing power and storage volume, it is possible to store data with such high multiplicity that recovery is possible even if multiple complete goodware-botnets are taken down.

A positive side effect is that the FSB learns what is going on on the devices where they have positioned their goodware.

So let’s buy more and better computers and help the FSB to get more computing power.


Ukraine

To all my readers:
I am way too upset about the invasion of Ukraine by Russian fascists and the atrocities they are committing there. At the moment I am not able to write useful articles about IT topics.

This is something from mainstream Russian TV by a super star who gets medals of honor from Putin:

I hope that this nightmare can be finished soon and we can invest some of our energy into building positive things. I will come back…


Test systems

Typical software development environments have several systems running the software. Usually each developer runs it on their own machine, and there is a continuous integration server. Versions of the software that succeed there go to an artifact repository and are installed immediately, during the night or manually on a system that is called „staging“, „development“, „dev“, „dev-test“ or something like that. Then the software goes to a system called „test“, where the test team can work with it. The testers only get versions that are actually of interest to them, so it makes sense to separate this system from „dev“. When the tests are successful, the software might go to production, but usually there is yet another system, with whatever name, that is supposed to be identical to production and is used for final release tests before the software actually goes to production.

This kind of works, as long as the software is done by one team. Now we observe cases where many teams work on the software. Each team develops its part of the software and kind of needs a stable version of everything else. This is especially true for external remote teams.

Now we live in a time where virtual servers are the normal way to work. So getting a new set of servers is no longer an issue of buying hardware, but just a matter of running a few scripts. Good organizations can set up whole systems automatically in a matter of minutes. So it should be possible to provide, within reason, a few more test systems when needed and to discard them again when done. Abusing an existing system for different purposes rarely works out smoothly.

So it is a good idea to select a technology that allows setting up a system or a whole landscape automatically. This can be virtual servers, Docker containers or even physical hardware. It is a lot of work to set this up, but then it becomes easier to add just one more test system to run special tests or to give other teams a stable release to develop against.


File Permissions

Everyone who uses Linux (or some Unix) knows the normal file permissions that are administered with chmod, at least to some extent.

It is worth noting what the permissions of a file and of the underlying directory mean.

In order to read the file, one needs read permission for the file and also execute (search) permission for the directory in which the file is located and for all its parent directories. Read permission for a directory is only needed to list its contents.

Having write permission for the file means being able to change its contents, which could mean deleting all content and leaving a zero byte file.

But the right to delete a file requires write access to its directory, because deleting changes the contents of the directory. The same applies to creating files. rm additionally checks whether you have write access to the file and asks for confirmation if you do not, unless you use rm -f. This check is done by rm itself, not by the system.
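The following is a minimal Ruby sketch to illustrate that deleting depends on the directory and not on the file. It assumes a typical Linux system and a temporary directory that is writable for the current user. The method names try_delete and demo_delete_read_only_file are made up for this example.

```ruby
require 'tmpdir'

# Returns true if the file could be deleted, false if the system refused.
def try_delete(path)
  File.delete(path)
  true
rescue Errno::EACCES, Errno::EPERM
  false
end

# Creates a read-only file in a writable temporary directory and tries to
# delete it. Returns [deleted?, file_still_exists?].
def demo_delete_read_only_file
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'protected.txt')
    File.write(path, "some content\n")
    File.chmod(0o444, path) # the file itself is now read-only
    # The directory is still writable, so removing the entry is allowed.
    deleted = try_delete(path)
    [deleted, File.exist?(path)]
  end
end

p demo_delete_read_only_file
```

Unlike rm, File.delete does not ask for confirmation, so the read-only file simply disappears as long as the directory is writable.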

Now there is another bit that comes in: ACLs (access control lists), which can override the settings of the normal permissions and, for example, give specific users read or write access. They are read using getfacl and changed using setfacl. I think it would be desirable that ls had an option to show these ACL rights as well, but it does not yet have that, at least not the version that I am using. What ls at least does: it shows a ‚+‘ after the normal permissions string, which indicates that ACLs are actually in use.

So, if you do file operations and observe something works or does not work, which cannot be explained by the normal file permissions, check the ACL.

Now things can get even a bit more hairy if you mount remote directories from a file server, because accessing them requires that both your computer and the file server allow it, and so does the network, including firewalls.

Yet another bit can come in, it is called „SELinux“. Linux supports security modules, and if a security module is present, system calls are checked against it and may end up with „permission denied“. AppArmor is an example, and the best-known security module is SELinux. It allows fine-grained policies describing what is allowed and what is not. In the end we get „permission denied“ although there is no visible obstacle. You can find out whether SELinux is active by running
sestatus
and
getenforce
Now on production systems it may not be allowed to turn off security features just for the fun of it, because they should protect from certain attack scenarios and we must expect that these attacks happen exactly when we turn them off for a minute to check something. But of course we do have test systems and there we can try out things. If after
setenforce 0
the same command works and after
setenforce 1
it does not work, then SELinux is stopping it. This means you have to fix your SELinux settings or the file attributes that are observed by SELinux.


Certificates

We have been using the term „certificates“ for a long time for files that are used to encrypt or sign data or to prove identity.

Now „certificates“ have become something that we use in our daily life to show that we are vaccinated or tested against Covid-19.

Because they are provided as QR codes and people are used to thinking of QR codes as a way to encode a URL in a machine-readable way, many people think that the certificates are actually stored on a server and just retrieved from there. This in turn causes a lot of resistance against the certificates and the requirement to have them and to show them. These people suggest using the „yellow book“ instead, which has worked just fine for providing proof of vaccinations.

But things have changed. Today a lot depends on being vaccinated, and some people are really scared of the vaccinations. For example, some internet stars spread the rumor that people will die a few years after the vaccination. How do they know, less than that number of years after the first person has been vaccinated? I guess it is their secret. But today there are millions of people who are really willing to pay good money for a falsified yellow vaccination book, and it is not really that hard to forge one. So at the end of the day, the yellow book cannot be trusted any more.

Now, what is it with the certificate?

Just think of a text file that contains the relevant information. JSON, XML or something like that.

  • Who?
  • valid until?
  • what (test/vaccination/…)?
  • count of vaccinations (1/1, 2/2, 3/3…)
  • type of vaccine (mRNA…)?
  • Product (Moderna/BioNTech/Sputnik V/…)?
  • Producer?
  • Country (last) of vaccination?
  • Date of (last) vaccination?
  • Serial number of shot

More or less something like this…

Now this file is digitally signed by a universally trusted entity, like the ministry of health. The person who performs the vaccination creates this file and transmits it to the ministry of health, where it is cryptographically signed. Then it is transmitted back as a binary file. Its integrity can be verified with the signature, using the public key of the ministry, but it cannot be changed or created without the private key of the ministry, which needs to be kept safe in the ministry. A lot of really important things rely on this public key cryptography, so it should be OK for this purpose as well.
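To illustrate the principle, here is a small Ruby sketch using the openssl library from the standard distribution. It is only a simplified model under assumptions: the field names, the JSON encoding, the RSA key and the method names sign_certificate and certificate_valid? are invented for this example, and the real Covid certificates use their own encoding and signature format.

```ruby
require 'json'
require 'openssl'

# Signs the certificate data with the private key of the issuing authority.
def sign_certificate(fields, private_key)
  data = JSON.generate(fields)
  signature = private_key.sign(OpenSSL::Digest.new('SHA256'), data)
  { data: data, signature: signature }
end

# Anyone who has the public key can check that the data is unchanged.
def certificate_valid?(certificate, public_key)
  public_key.verify(OpenSSL::Digest.new('SHA256'),
                    certificate[:signature], certificate[:data])
end

# Hypothetical key pair; in reality the private key never leaves the ministry.
ministry_key = OpenSSL::PKey::RSA.new(2048)
ministry_public_key = ministry_key.public_key

fields = { 'who' => 'Jane Doe', 'what' => 'vaccination', 'valid_until' => '2022-12-31' }
certificate = sign_certificate(fields, ministry_key)

# A forger changes the name but cannot produce a matching signature.
forged = certificate.merge(data: certificate[:data].sub('Jane', 'John'))

puts certificate_valid?(certificate, ministry_public_key) # prints true
puts certificate_valid?(forged, ministry_public_key)      # prints false
```

The checking app only needs the public key, so no connection to a server holding the certificates is required for verification.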

Now this binary file is just encoded as a QR code. Not the URL of the file, but the actual contents. When we show the certificate and the person checking it is serious about it, they use an app that can read the QR code, decode it to show the contents of the original text file in a nice format and ensure that it is exactly what has been signed by the ministry.

Now the ministry promises that it does not store a „backup“ of the certificate that it signed, once it is done. So the data is deleted shortly after having been signed. We can believe this or not. If we assume that they store data about our vaccinations and tests etc., then yes, they do that. But they do not need the certificate for that. They can get the information anyway without creating a certificate for us.

So I guess it is a pragmatic approach to use these certificates as a relatively fraud safe method to prove that one has been vaccinated or tested or gone through a Covid-infection. The yellow books should not be accepted any more and actually they are less and less. And of course the certificate with the QR-code on the phone or on a piece of paper is only better than the yellow book, if they really verify it with an app and if they check some ID as well.

This technology has a lot of interesting applications that we might make use of in the time after Covid-19.

Tickets for railroad, flight, theater, cinema, museum,… can be personalized and done in a way that is hard to forge. We actually see this already today. Now there could be a digitalized passport on the phone as well, which is signed by the issuing authority. For many purposes that could work as well as the paper passport. Now think that this passport and the ticket can be matched, i.e. one trusted person checks that the passport, the ticket and the person travelling match, and from then on the combined certificate containing the passport and the ticket can be shown. This would make things easier, more efficient and more reliable than having to show ticket and passport again and again.


2022 Happy New Year

¡Próspero año nuevo! — うれしい新しい年 — Shnorhavor nor tari! — Selamat tahun baru! — Честита нова година! — Gelukkig nieuwjaar! — สวัสดีปีใหม่ — Lokkich nijjier! — عام سعيد — Среќна нова година! — Un an nou fericit! — Happy New Year! — Срећна нова година! — Feliz año nuevo! — С новым годом! — Srechno novo leto! — Sugeng warsa enggal! — Feliĉan novan jaron! — Próspero ano novo! — Godt Nyttår! — 새해 복 많이 받으세요 — Boldog új évet! — Godt nytår! — Frohes neues Jahr! — Laimingų naujųjų metų! — Felice Anno Nuovo! — Gleðilegt nýtt ár! — Hääd uut aastat! — 新年好 — Sala we ya nû pîroz be! — Sretna nova godina! — Szczęśliwego nowego roku! — السنة الجديدة المبتهجة — Gott nýggjár! — Cung chúc tân xuân! — Щасливого нового року! — Naya barsa ko hardik shuvakamana! — Feliz ano novo! — Bun di bun an! — Akemashite omedetô! — Een gelukkig nieuwjaar! — Laimīgu Jauno gadu! — FELIX SIT ANNUS NOVUS! — Bonne année! — Gott nytt år! — Yeni yılınız kutlu olsun! — سال نو مبارک — Šťastný nový rok! — Καλή Χρονια! — Onnellista uutta vuotta! — Весёлого нового года! — Laimīgu jauno gadu! — Ath bhliain faoi mhaise! — Subho nababarsho! — Nav varsh ki subhkamna!

 

This was created by a Ruby-program:

#!/usr/bin/ruby
# encoding: utf-8
# coding: utf-8

require 'securerandom'

texts = [ 'Akemashite omedetô!', 'Ath bhliain faoi mhaise!', 'Boldog új évet!', 'Bonne année!', 'Bun di bun an!', 'Cung chúc tân xuân!', 'Een gelukkig nieuwjaar!', 'FELIX SIT ANNUS NOVUS!', 'Felice Anno Nuovo!', 'Feliz ano novo!', 'Feliz año nuevo!', 'Feliĉan novan jaron!', 'Frohes neues Jahr!', 'Gelukkig nieuwjaar!', 'Gleðilegt nýtt ár!', 'Godt Nyttår!', 'Godt nytår!', 'Gott nytt år!', 'Gott nýggjár!', 'Happy New Year!', 'Hääd uut aastat!', 'Laimingų naujųjų metų!', 'Laimīgu Jauno gadu!', 'Laimīgu jauno gadu!', 'Lokkich nijjier!', 'Nav varsh ki subhkamna!', 'Naya barsa ko hardik shuvakamana!', 'Onnellista uutta vuotta!', 'Próspero ano novo!', 'Sala we ya nû pîroz be!', 'Selamat tahun baru!', 'Shnorhavor nor tari!', 'Srechno novo leto!', 'Sretna nova godina!', 'Subho nababarsho!', 'Sugeng warsa enggal!', 'Szczęśliwego nowego roku!', 'Un an nou fericit!', 'Yeni yılınız kutlu olsun!', '¡Próspero año nuevo!', 'Šťastný nový rok!', 'Καλή Χρονια!', 'Весёлого нового года!', 'С новым годом!', 'Срећна нова година!', 'Среќна нова година!', 'Честита нова година!', 'Щасливого нового року!', 'السنة الجديدة المبتهجة', 'سال نو مبارک', 'عام سعيد', 'สวัสดีปีใหม่', 'うれしい新しい年', '新年好', '새해 복 많이 받으세요' ];

shuffled_texts = texts.shuffle(random: SecureRandom);
str = shuffled_texts.join(' — ');
puts(str);

replace no-break-spaces by spaces…


Christmas 2021

С Рождеством − З Рiздвом Христовим − Prettige Kerstdagen − Feliz Natal − क्रिसमस मंगलमय हो − Nollaig Shona Dhuit! − Buon Natale − Sretan božić − Natale hilare − Честита Коледа − Veselé Vánoce − Bella Festas daz Nadal − Kellemes Karácsonyi Ünnepeket − Crăciun fericit − Zalig Kerstfeest − Wesołych Świąt Bożego Narodzenia − 圣诞快乐 − Vesele bozicne praznike − Glædelig Jul − کريسمس مبارک − クリスマスおめでとう ; メリークリスマス − Gledhilig jól − God Jul − Fröhliche Weihnachten − Feliz Navidad − Su Šventom Kalėdom − Hyvää Joulua − Gleðileg jól − 즐거운 성탄, 성탄 축하 − Срећан Божић − Priecîgus Ziemassvçtkus − Bon nadal − God Jul! − Merry Christmas − καλά Χριστούγεννα − Mutlu Noeller − Selamat Hari Natal − ميلاد مجيد − Vesele Vianoce − Gëzuar Krishtlindjet − Häid jõule − Feliĉan Kristnaskon − Joyeux Noël

xmas tree

This was generated with a Perl Script:


#!/usr/bin/perl

use utf8;
binmode STDOUT, ':utf8';

my $SEP = ' − ';

my @texts = ( 'Bella Festas daz Nadal', 'Bon nadal', 'Buon Natale', 'Crăciun fericit', 'Feliz Natal', 'Feliz Navidad', 'Feliĉan Kristnaskon', 'Fröhliche Weihnachten', 'Gledhilig jól', 'Gleðileg jól', 'Glædelig Jul', 'God Jul!', 'God Jul', 'Gëzuar Krishtlindjet', 'Hyvää Joulua', 'Häid jõule', 'Joyeux Noël', 'Kellemes Karácsonyi Ünnepeket', 'Merry Christmas', 'Mutlu Noeller', 'Natale hilare', 'Nollaig Shona Dhuit!', 'Prettige Kerstdagen', 'Priecîgus Ziemassvçtkus', 'Selamat Hari Natal', 'Sretan božić', 'Su Šventom Kalėdom', 'Vesele Vianoce', 'Vesele bozicne praznike', 'Veselé Vánoce', 'Wesołych Świąt Bożego Narodzenia', 'Zalig Kerstfeest', 'καλά Χριστούγεννα', 'З Рiздвом Христовим', 'С Рождеством', 'Срећан Божић', 'Честита Коледа', 'ميلاد مجيد', 'کريسمس مبارک', 'क्रिसमस मंगलमय हो', 'クリスマスおめでとう ; メリークリスマス', '圣诞快乐', '즐거운 성탄, 성탄 축하');

my $n = @texts;
my $m = $n;
for (my $i = 0; $i < $n; $i++) {
    my $idx = rand($m);
    my $text = splice(@texts, $idx, 1);
    $m = @texts;
    if ($i > 0) {
        print $SEP;
    }
    print $text;
}
print "\n";

Please replace NBSP by normal spaces…


Pseudo Data Structure for Strava and Komoot

Strava and Komoot can be used to plan bicycle trips and to record them. Both tools can be used for both purposes, but it seems that Strava is better for the recording and Komoot better for the planning. Btw. it works also for walking or running, for example. But I will stick with the bicycle as the main example…

Now the normal process is to plan a trip and then cycle it and it gets recorded. If you use wandrer.earth also and connect it to Strava, you will see a map of the aggregated routes that you have cycled and coverage of areas in % and stuff like that. Recording can also be done by a smart speedometer, which is again connected to Strava…

Now there are trips that have been performed in the past, where recording was not yet in place, but you want to have them in the collection as well.

In this case the process is slightly different. You have already cycled the route long ago and now plan it in Komoot to exactly match what you cycled. This can be difficult, because roads may no longer exist or may be closed to bicycles, or Komoot just thinks so even though it is not true. Or they have been closed to bicycles after you cycled them. In all these cases Komoot will refuse the actual route. There are two approaches to solve this. Either you put „off grid“ intermediate points and mark the actual route. Or, if the road is still present, you can plan the route with Google My Maps, export it as a kmz file, convert kmz to gpx (using one of the sites that offer such a conversion or a program) and then import the gpx to Komoot and still change the parts that Komoot thinks are allowed.

Then it will be possible to export the route as gpx, use https://gotoes.org/strava to import it to Strava and provide some additional information, mostly the actual date and speed. Or the rough date and speed, if you do not know them exactly and do not find that important. When in doubt, please enter a lower speed, so you do not become a cheater for speed records of certain sections. And make sure you use the right unit for speed and use „bike“ as elevation speed calculation parameter…

Now it turns out that there will be hundreds or thousands of trips. There will be multiday trips, which you might want to handle as separate trips that kind of belong together. Or because of ferries or reuse of sections already planned or complexity you might even want to split the day into several parts.

Now both Komoot and Strava allow giving names to trips.

For multiday trips, I have numbered them and 3 digits are still sufficient. So their name in both Komoot and Strava is something like
R () {/}
where
is the 3 digit zero padded number of the trip,
is a capital letter indicating the part of the year (think of something like seasons) and the 4-digit year.
is clear
can be the actual date in ISO format or, if it is not known, something like D1, D2, D3…
or for longer trips D01, D02, D03 or even D001, D002,… is a number for the section within the day, if there is more than one

Now if a route segment is used multiple times, in Komoot it will have a different name than in Strava, something like this:
S
where is often the highway number and the start and end point, if that works as a complete description.
is random, unique and consists of 6 characters [0-9A-Z].

If the Strava segment of a multiday trip is based on such a multiply used segment from komoot, it is named like this:
R () {/} S

For one day trips, that consist of only one segment that is specific to it, the description looks like this (in Strava and Komoot):
T {} {} {}
where
is random, unique and consists of 6 characters [0-9A-Z].
optionally describes how often the trip has been done (with no multiplicity meaning „once“), it can be a number or a „many“ or a number preceded by „~“.
is the iso date of the trip, if known (only applicable if multiplicity is not used)

For one day trips, that consist of only one segment that is multiply used, the description looks like this (in Strava and Komoot):
T {} {} {} S
This case is quite unlikely to occur…

For one day trips, that consist of several segments, the description looks like this:
T {} {} / {S}
can be the actual date if known or just „D“ for „any date“ if unknown. is used to number the segments from 1 to n
If it is based on a multiply used segment in Komoot, the S is included, otherwise the route has the same name in Komoot.

To support hiking, running or other activities, I suggest just adding this „means of transport“ as a keyword to the descriptions, making bicycle the default if that is most commonly used.

So using some pseudo data structure in the description fields helps to keep the trips searchable even when their number gets too large to page through…
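The random six-character identifiers from [0-9A-Z] used in these names could, for example, be generated with a few lines of Ruby. This is just a sketch: the method names id_alphabet and new_trip_id are made up, and uniqueness is only checked against the set of identifiers handed in.

```ruby
require 'securerandom'
require 'set'

# The allowed characters: digits and capital letters.
def id_alphabet
  ('0'..'9').to_a + ('A'..'Z').to_a
end

# Returns a new 6-character identifier that is not yet in used_ids
# and records it there.
def new_trip_id(used_ids)
  alphabet = id_alphabet
  loop do
    candidate = Array.new(6) { alphabet[SecureRandom.random_number(alphabet.size)] }.join
    # Set#add? returns nil if the candidate was already present.
    return candidate if used_ids.add?(candidate)
  end
end

used = Set.new
3.times { puts new_trip_id(used) }
```

With 36 possible characters per position there are more than two billion identifiers, so collisions are rare, but the check costs almost nothing.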

What we can learn from this: Whenever we have some amount of data in some software, it is good to think of how to organize it and how to be able to find and connect the data. Even if the tool is quite dumb in this aspect.

The primary example of such a (less dumb) software is of course the database, where we actually do exactly this in very sophisticated ways.


Some Thoughts about Geo-Positioning

Today every phone and tablet and a lot of other devices have capabilities to detect the current location and this can be used for many things… Even some speedometers for bicycles rely on geo-positioning only or use a rotation count of the wheel just as additional, optional source of information.

In the 1990s I worked in a company that provided management systems for public transport, which included radio transmission, schedule tracking, information display for passengers, information for drivers etc. An important ingredient was knowing where the vehicles were, or even better, when they passed a certain point. This seems to be mathematically equivalent, but when the vehicle moves at different speeds, stops at traffic lights and bus stops (or tram stops or subway stops) then the time of passing at a location cannot easily be found from the location at a certain time without loss of accuracy.

To understand things well: At that time GPS was already there, but it was deliberately made inaccurate for everyone who was not the US military. This could be improved by putting an additional GPS device at a fixed, well-known position and by applying the deviation found by this device to the measured positions of the mobile GPS devices. But some more sources were used.

The vehicles usually moved along predefined routes. These were measured with a special measuring vehicle, which required someone with excellent knowledge of the bus routes to participate in measuring the distances between the bus stops and for the whole line. In this case there were formal bus stops, which were stored in the software systems with scheduled times for each run and of course exact distances. And in between there were informal stops, which were marked with some sign and recognized by the drivers, with times being somewhere between the scheduled times at the surrounding formal stops. Distances could vary, because different routes for the same line could be planned depending on the time of day. But for the specific vehicle on a specific run, in principle its exact route, the distances between formal stops and the arrival and departure times at the formal stops should be well defined internally, up to the second, even though published schedules only show minutes.

So, this adds another useful source of information, because the vehicle itself also measures the distance, maybe a bit less accurately than the measuring vehicle, but hopefully accurately enough to be useful. That is kind of the opposite of what current bicycle speedometers do: it uses the real measured distance and speed to learn about the position rather than the position to learn about speed and distance. Of course, bus drivers are humans, they make mistakes, and when a new routing has been implemented they might miss a turn or so. Not very often, but it can happen. Also a driver might make a little detour and stop at a coffee shop to get a cup of coffee (of course take-away). But 99.5% of the time the distance measuring should help.
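The correction with a fixed reference device mentioned above can be sketched in a few lines of Ruby. This is a simplified illustration and not the code of the system described here: positions are assumed to be [east, north] pairs in metres in a local planar coordinate system, and the method names gps_error and corrected_position are made up.

```ruby
# Deviation of the reference receiver: what it measures minus where it really is.
def gps_error(reference_true, reference_measured)
  [reference_measured[0] - reference_true[0],
   reference_measured[1] - reference_true[1]]
end

# Applies the same deviation to a mobile receiver, assuming that nearby
# receivers see roughly the same error at the same time.
def corrected_position(mobile_measured, error)
  [mobile_measured[0] - error[0], mobile_measured[1] - error[1]]
end

error = gps_error([1000.0, 2000.0], [1030.0, 1985.0])
p error                                         # prints [30.0, -15.0]
p corrected_position([5030.0, 7985.0], error)   # prints [5000.0, 8000.0]
```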

Yet another source of information are the stops and the door opening events, especially in conjunction with scheduled travel times and distances. Of course, the door can in rare cases open even without a stop. And more often, informal stops can be skipped if nobody wants to get off or on. And stops can occur because of traffic jams and because of traffic lights. But even about these there can be some knowledge.

The most reliable source of information are beacons. These are devices that are positioned at certain fixed points where a lot of vehicles pass by. Ideally in such a way that on each route at least once a beacon is passed. And they provide the more interesting information, when the vehicle passes them, not where the vehicle is at a certain time. Unfortunately these beacons were expensive, so they could only be used in small numbers as an additional source of information.

Today the concept of finding one’s position is quite mature. But we should not forget that companies like Samsung or Google can spend millions on developing hardware and software to do this really well. For a small company which is fighting with software delivery schedules, different client requirements, legacy software, and of course very limited budgets, doing good geo-positioning by combining many insufficient sources of information to get a decent accuracy was quite a challenge. A lot of „obvious“ ideas to improve this were moved to some far future releases that never actually happened. For example, the system could learn distances and travel times between stops by observing the system for a few weeks and even include informal stops. And different sources of information could be combined in sophisticated ways, not by just using the best information or some kind of average or so… I am sure the developers did a good job.
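One textbook way of combining several uncertain sources, shown here only as an illustration and not as what that system did, is to weight each estimate by the inverse of its variance. The Ruby sketch below assumes that each source reports a position along the route in metres together with a standard deviation, and that the errors are independent. The method name fuse_estimates and the numbers are made up.

```ruby
# Each estimate is [position_m, standard_deviation_m] along the route.
# Returns [combined_position_m, combined_standard_deviation_m].
def fuse_estimates(estimates)
  # More accurate sources (small sigma) get a larger weight.
  weights = estimates.map { |_, sigma| 1.0 / (sigma * sigma) }
  total = weights.sum
  position = estimates.zip(weights).sum { |(pos, _), w| pos * w } / total
  [position, Math.sqrt(1.0 / total)]
end

gps      = [1200.0, 100.0] # inaccurate satellite position
odometer = [1250.0, 50.0]  # distance measured by the vehicle
position, sigma = fuse_estimates([gps, odometer])
puts format('%.1f m, +/- %.1f m', position, sigma)
```

With these numbers the result is about 1240 m, closer to the more accurate odometer value, and the combined uncertainty is smaller than that of either source alone.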

So what happened: Today we have several sets of satellites for geo-positioning. Galileo was launched to get more accuracy and to be independent of the deliberately inaccurate American GPS. And the Americans stopped making GPS inaccurate and even removed the feature from newer satellites. At least they say that they did. There is a Russian, a Chinese and a Japanese system and some more. So current phones and speedometers can and do use all of these simultaneously to get fantastic accuracy.

It is said that Google observes WiFi names and uses them as an additional source of information. What could also be used are the antennas of the mobile network. At a certain location a certain number of antennas can be seen, and the strength of their signals can be measured and used in conjunction with knowledge of obstacles, orientation of antennas and their transmission power to improve location finding. I do not know if this is done and if it would be of any help for improving accuracy. What could also be used are sensors in the devices that observe acceleration. And of course, beacons can now be any kind of object that is sufficiently unique, is at a fixed position and can be recognized by a camera with some image recognition software. In the case of bicycle speedometers or public transport vehicles, using the wheel rotation as a measurement for the distance could still be useful as an additional source of information, especially in tunnels, where the satellites are not helpful.

When it comes to the third dimension, that is elevations, mountains or just human-built structures, the satellite positioning can be used even for that. But the air pressure is an additional source of information, especially in conjunction with weather data and with data about the location of elevations and artificial structures. Then it can even help to determine the position, unless the device is located in a flying object.

It can be seen that this is an interesting area and even though the accuracy is already very good today, some further improvements would still be useful.


Automatic editing

For changing file contents, we often use editors.

But sometimes it is a good idea to do this from a script and do a change to many files at once or do the same kind of changes often.

The classic approach to this is using sed, which is made exactly for this purpose.

But most scripting languages, for example Perl, can do similar work to sed. Since Perl was developed with the idea of replacing awk, sed and some shell scripting while being a complete programming language, it claims to do the same job better than sed. Ruby, Python, Raku and some others can be used as well.

Basically this can be done in typical bash style work, so the bash script uses specific tools to perform small tasks and together an end result is achieved.

I am quite familiar with Perl, so for this kind of replace-in-file I am using Perl, just in a way that would, with some minor changes, also work with the other four languages mentioned or probably some others. Of course, for more advanced stuff there might be areas where, for example, Perl or Raku work better.

One thing needs to be thought about:

These „scripted search and replace“ operations primarily work like pipes, so with something like

perl -p -e 's/ä/ae/g;s/ö/oe/g;' < infile > outfile

or the equivalent in sed, Python, Raku or Ruby. In the end it is probably desired that the outfile just replaces the infile, but

perl -p -e 's/ä/ae/g;s/ö/oe/g;' < infile > infile

won’t work, because the shell truncates infile before perl gets a chance to read it.

So a universally applicable approach is to just mv the modified file over the original afterwards, like

perl -p -e 's/ä/ae/g;s/ö/oe/g;' < infile > outfile \
&& mv outfile infile

Sometimes it is desirable to retain the original file to be able to check if the transformation did what it was meant to do and to be able to go back…

mv infile infile~
perl -p -e 's/ä/ae/g;s/ö/oe/g;' < infile~ > infile

And then the files ending with ~ need to be cleaned up when everything is done.

Since this is the most common way to work, there is a shortcut (at least for Perl):

perl -p -i~ -e 's/ä/ae/g;s/ö/oe/g;' infile

If no backup is needed, this can be done as

perl -p -i -e 's/ä/ae/g;s/ö/oe/g;' infile
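For comparison, the same backup-and-replace pattern can be written as a small Ruby script. This is only a sketch: the method name replace_in_file and its parameters are made up, it reads the whole file into memory and it assumes UTF-8 encoded files.

```ruby
require 'fileutils'

# Renames path to path + backup_suffix and writes the transformed
# content to path. Returns the name of the backup file.
def replace_in_file(path, replacements, backup_suffix: '~')
  backup = path + backup_suffix
  FileUtils.mv(path, backup) # keep the original, similar to perl -i~
  text = File.read(backup, encoding: 'UTF-8')
  replacements.each { |from, to| text = text.gsub(from, to) }
  File.write(path, text, encoding: 'UTF-8')
  backup
end

# When run as a script, apply the replacements to all files given as arguments.
if __FILE__ == $PROGRAM_NAME
  ARGV.each { |file| replace_in_file(file, { 'ä' => 'ae', 'ö' => 'oe' }) }
end
```

Because the original is renamed before the new file is written, reader and writer never work on the same file, which avoids the truncation problem of the naive redirection.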

For these simple examples, using sed, Ruby, Perl, Python or Raku is more a question of what syntax we know by heart. And of course, Perl, Python and sed are most likely installed on almost any Linux server…

Sometimes we want to do more complex things, for example depend on a context or on a state and do different substitutions depending on that. Or aggregate information in variables and insert it at some point. For more complex things I really prefer real programming languages like Perl or Ruby, but a lot can be done by using just bash, sed and awk, if you know them really well.

A funny thing is that there is a simple editor ed, which can actually be used from the command line to apply editing functions to a file.

I remember that in some project there were a lot of template files that were used to create emails, for example. It was quite a good idea to change them with Perl scripts (and retain the originals). Then I could show the outcome to the stakeholders and change the scripts until it was OK. This works for a couple of hundred files without problems, but of course it works better if they are kept clean, with no unnecessary differences sneaking in that cause automatic editing to fail or create different results on some of the files. Testing remains important, of course.

Better learn to do at least some basic automated editing than doing tedious manual editing of a couple of hundred files, possibly having to do it more than once.
