Friday, February 27, 2009

Using regular expressions in Vim

Today I was faced with the task of replacing lots of words in a script file. The problem was that I could not simply replace them by doing a "search-and-replace" (:%s/old_word/new_word/g) because they should only be replaced on some of the lines. For example, the script says

OUTSTANDING
COMMAND Record SetPriority priorityRecording_INI
COMMAND Play SetPriority priorityRecording_INI
COMMAND Record SetDestinationSampleRateL Record_A_006_INI
COMMAND Record SetDestinationBitRateL Record_A_005_INI

My task was to replace priorityRecording_INI with priorityPlayback_INI on every line that contains COMMAND Play. In this case I could have done it by hand, but a regular expression is perfect for the task!

First we need to use the "search-and-replace" command like we stated above:

:%s/exp1/exp2/g

In more detail: s is the substitute command, and the "%" in front of it is a range that tells Vim to apply the command to every line in the file. Each match of "exp1" is replaced by "exp2", and the "g" flag at the end makes it replace every occurrence on a line instead of only the first one.
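To make the role of "%" and "g" concrete, here is a minimal sketch of the same idea outside Vim, using Python's re module. The helper name substitute and the sample lines are made up for illustration; this is only a rough analogue of the Vim command, not how Vim implements it.

```python
import re

def substitute(lines, pattern, replacement, every_occurrence=True):
    """Rough analogue of :%s/pattern/replacement/ with or without the g flag."""
    # In re.sub, count=0 means "no limit"; count=1 stops after the first match.
    count = 0 if every_occurrence else 1
    # Looping over all lines plays the role of the "%" range.
    return [re.sub(pattern, replacement, line, count=count) for line in lines]

lines = ["old old", "keep", "old"]
with_g = substitute(lines, "old", "new")
without_g = substitute(lines, "old", "new", every_occurrence=False)

print(with_g)     # ['new new', 'keep', 'new']
print(without_g)  # ['new old', 'keep', 'new']
```

Without the "g" flag only the first match on each line changes, which is why the second "old" on the first line survives in the second result.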

To boil the problem down, we first focus on what sets the line apart from the others. Here it is the word "Play". We search for "Play" and don't care about what follows, at least up until the thing we want to replace. This means we search for

/Play.*priority

since priority is the last word we want to keep. The problem here is that the line contains the word priority twice (in SetPriority and in priorityRecording_INI), but we can tell them apart because the second one has a TAB in front of it (it could be spaces as well, but I won't go into that flame war). To enter a literal TAB character on Vim's command line, press ctrl+v followed by TAB; it is displayed as ^I. Writing \t in the pattern matches a TAB as well.

The command

/Play.*^Ipriority

highlights everything from Play to the end of the second priority. A case-sensitive search would also have told the two apart, but I have case sensitivity disabled by default, so this was the smoother solution.
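As a quick check of what this search covers, the sketch below runs the equivalent pattern in Python. It assumes, as described above, that a single TAB separates SetPriority from the value; the ^I is written as \t, and re.IGNORECASE stands in for a Vim setup with case sensitivity disabled.

```python
import re

# Assumption: a TAB sits in front of the value, as the post describes.
play_line = "COMMAND Play SetPriority\tpriorityRecording_INI"
record_line = "COMMAND Record SetPriority\tpriorityRecording_INI"

# Python counterpart of /Play.*^Ipriority
pattern = re.compile(r"Play.*\tpriority", re.IGNORECASE)

play_match = pattern.search(play_line)
record_match = pattern.search(record_line)

matched_text = play_match.group(0)
print(repr(matched_text))  # 'Play SetPriority\tpriority'
print(record_match)        # None, because the line has no "Play"
```

The match stops right after the TAB-prefixed priority, which is exactly the part we want to keep in front of the replacement.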

To keep certain parts of the match and replace others, we wrap the parts to keep in \( and \), like \(exp3\). The substitute command remembers what each group matched so that we can put it back into the replacement. Everything matched outside of the \(\) groups, wildcards included, is dropped unless the replacement supplies new text for it. The groups are referred to in the replacement as \1, \2, \3 and so on, numbered in the order they appear in the search expression.

So first we define the "/Play.*^Ipriority" as the first thing we want to keep, and since we also want to keep the "_INI" in the end, we define that as \2. The end result is rather confusing, but it does the job!

:%s/\(Play.*^Ipriority\).*\(_INI\)/\1Playback\2/g

And the result is exactly what I wanted!

COMMAND Record SetPriority priorityRecording_INI
COMMAND Play SetPriority priorityPlayback_INI
COMMAND Record SetDestinationSampleRateL Record_A_006_INI
COMMAND Record SetDestinationBitRateL Record_A_005_INI
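The whole substitution can be replayed outside Vim as a sanity check. The following is a sketch in Python under a few assumptions: a TAB in front of each value, case-insensitive matching as in my Vim setup, and Python's group syntax ( ) in place of Vim's \( \). The variable names are only illustrative.

```python
import re

# Sample lines from the post; the TAB before each value is an assumption.
script = [
    "OUTSTANDING",
    "COMMAND Record SetPriority\tpriorityRecording_INI",
    "COMMAND Play SetPriority\tpriorityRecording_INI",
    "COMMAND Record SetDestinationSampleRateL\tRecord_A_006_INI",
    "COMMAND Record SetDestinationBitRateL\tRecord_A_005_INI",
]

# Group 1 keeps "Play ... <TAB>priority", group 2 keeps "_INI";
# the .* between them ("Recording") is what gets thrown away.
pattern = re.compile(r"(Play.*\tpriority).*(_INI)", re.IGNORECASE)

# \1 and \2 put the kept groups back around the new word.
result = [pattern.sub(r"\1Playback\2", line) for line in script]

for line in result:
    print(line)
```

Only the COMMAND Play line changes; every line without "Play" passes through untouched.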

Imagine doing this by hand for a file with 2000 rows; a regular expression lets us automate that kind of boring manual labour. You can read more about regular expressions in Vim's help pages (:help :s or :help pattern) or at http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/, a great cheat-sheet.