Linux All-In-One For Dummies

The following Linux example using sed includes sample lines of a colon-delimited employee database that has five fields: unique id number, name, department, phone number, and address.

1218:Kris Cottrell:Marketing:219.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:219.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:21974 Unix Way
1221:Anne Heltzel:Finance:219.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:219.555.5555:984 Bash Lane

This database has been in existence since the beginning of the company and has grown to include everyone who now works, or has ever worked, for the company. A number of proprietary scripts read from the database, and the company cannot afford to be without it. The problem is that the telephone company has changed the 219 prefix to 260, so all entries in the database need to be changed.

This is precisely the task for which sed was created. As opposed to standard (interactive) editors, a stream editor works its way through a file and makes changes based on the rules it is given. The rule in this case is to change 219 to 260. It’s not quite that simple, however, because if you use the command

sed 's/219/260/'

the result is not completely what you want (changes are in bold):

1218:Kris Cottrell:Marketing:<b>260</b>.555.5555:123 Main Street
1<b>260</b>:Nate Eichhorn:Sales:219.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:<b>260</b>74 Unix Way
1221:Anne Heltzel:Finance:<b>260</b>.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:<b>260</b>.555.5555:984 Bash Lane

The changes in the first, fourth, and fifth lines are correct. But in the second line, the first occurrence of 219 appears in the employee id number rather than in the phone number and was changed to 260. If you wanted to change more than the very first occurrence in a line, you could slap a g (for global) into the command:

sed 's/219/260/g'

That is not what you want to do in this case, however, because the employee id number should not change. Similarly, in the third line, a change was made to the address because it contains the value that is being searched for; no change should have been made because the employee does not have the 219 telephone prefix.
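To see the difference the g flag makes, here is a minimal sketch that runs one record from the sample database through both commands. The variable names are made up for illustration.

```shell
# One record from the sample database; 219 appears in the id,
# the phone number, and the street number.
record='1219:Nate Eichhorn:Sales:219.555.5555:1219 Locust Avenue'

# Without g: only the first 219 on the line (the one in the id) is replaced.
first_only=$(printf '%s\n' "$record" | sed 's/219/260/')

# With g: every 219 on the line is replaced, including id and address.
all_matches=$(printf '%s\n' "$record" | sed 's/219/260/g')

printf '%s\n' "$first_only" "$all_matches"
```

Neither result is acceptable here: the first leaves the phone number alone, and the second also rewrites the id and the address.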

The first rule of using sed is to identify what makes the location of the string you are looking for unique. If the telephone prefix were encased in parentheses, it would be much easier to isolate. In this database, though, that is not the case; the task becomes a bit more complicated.
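As a rough illustration of why parentheses would help, the sketch below assumes a hypothetical record whose phone field is written as (219) 555-5555. That format is not used in the sample database; it is invented here only to show the idea.

```shell
# Hypothetical record: the phone field is written as (219) 555-5555,
# which is NOT how the sample database stores it.
record='1219:Nate Eichhorn:Sales:(219) 555-5555:1219 Locust Avenue'

# In sed's default (basic) regular expressions, plain parentheses match
# themselves, so the id 1219 and the street number 1219 are left alone.
paren_result=$(printf '%s\n' "$record" | sed 's/(219)/(260)/')

printf '%s\n' "$paren_result"
```

Because only the phone prefix is wrapped in parentheses, the search string is unique on the line and nothing else is touched.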

If you said that the telephone prefix must appear at the beginning of the field (denoted by a colon), the result would be much closer to what you want:

sed 's/:219/:260/'

Again, bolding has been added to show the changes:

1218:Kris Cottrell:Marketing:<b>260</b>.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:<b>260</b>.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:<b>260</b>74 Unix Way
1221:Anne Heltzel:Finance:<b>260</b>.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:<b>260</b>.555.5555:984 Bash Lane

The accuracy has increased, but there is still the problem of the third line. Because the colon helped to identify the start of the string, it may be tempting to turn to the period to identify the end:

sed 's/:219./:260./'

But the result still isn’t what was hoped for (note the third line):

1218:Kris Cottrell:Marketing:<b>260</b>.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:<b>260</b>.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:<b>260</b>.4 Unix Way
1221:Anne Heltzel:Finance:<b>260</b>.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:<b>260</b>.555.5555:984 Bash Lane

In a sed search pattern, the period has the special meaning of any single character, so a match is found whether the 219 is followed by a period, a 7, or any other single character. Whatever the character, it is replaced with a period. The replacement side of things isn’t the problem; the search needs to be tweaked.
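A quick way to convince yourself of this behavior is to run the same command against a few short test strings. The strings below are made up for this sketch; only the second one is taken from the sample data.

```shell
# Three test strings: 219 followed by a period, by a 7, and by a letter.
# The unescaped period in the search pattern matches all three.
dot_result=$(printf '%s\n' ':219.555' ':21974' ':219x' | sed 's/:219./:260./')

printf '%s\n' "$dot_result"
```

In each case, the character after 219 is consumed by the match and a period is written in its place, which is how 21974 turns into 260.4.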

By putting a backslash (\) in front of the period, you can override its special meaning and specify that you are indeed looking for a period and not any single character:

sed 's/:219\./:260./'

The result becomes:

1218:Kris Cottrell:Marketing:<b>260</b>.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:<b>260</b>.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:21974 Unix Way
1221:Anne Heltzel:Finance:<b>260</b>.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:<b>260</b>.555.5555:984 Bash Lane

And the mission is accomplished.
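Putting it all together, the following sketch runs the final command against a copy of the five sample records. The file names employees.txt and employees.new and the temporary directory are assumptions made for this example; the article does not say where the real database lives. Because sed writes its result to standard output and leaves the input file alone, the output is redirected to a new file that you can inspect before replacing the original.

```shell
# Hypothetical working directory and file names, used only for this sketch.
workdir=$(mktemp -d)
cat > "$workdir/employees.txt" <<'EOF'
1218:Kris Cottrell:Marketing:219.555.5555:123 Main Street
1219:Nate Eichhorn:Sales:219.555.5555:1219 Locust Avenue
1220:Joe Gunn:Payables:317.555.5555:21974 Unix Way
1221:Anne Heltzel:Finance:219.555.5555:652 Linux Road
1222:John Kuzmic:Human Resources:219.555.5555:984 Bash Lane
EOF

# Escaped period: match a colon, 219, and a literal period only.
sed 's/:219\./:260./' "$workdir/employees.txt" > "$workdir/employees.new"

# Count the records that now carry the 260 prefix.
updated=$(grep -c ':260\.' "$workdir/employees.new")

# Confirm the record with 21974 in its address came through unchanged.
untouched=$(grep '^1220:' "$workdir/employees.new")

echo "records updated: $updated"
echo "$untouched"

# Remove the temporary files created for this sketch.
rm -rf "$workdir"
```

Checking the new file first is a cautious habit worth keeping when other scripts depend on the data.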
