Sed by example, Part 1

Another sed example
Here's an example of how to use sed to remove the first line of the /etc/services file from our output stream:
$ sed -e '1d' /etc/services | more
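The following is a minimal sketch of the same '1d' idea that doesn't depend on what your /etc/services happens to contain. The three input lines are made up for illustration, and a POSIX shell is assumed:

```shell
#!/bin/sh
# Three made-up lines stand in for /etc/services.
# '1d' deletes line 1 from the output stream; every other line passes through.
out=$(printf '# header comment\nftp 21/tcp\nssh 22/tcp\n' | sed -e '1d')
printf '%s\n' "$out"
```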



1.1                  xml/htdocs/doc/en/articles/l-sed2.xml

file : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed2.xml?rev=1.1&content-type=text/x-cvsweb-markup&cvsroot=gentoo
plain: http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/articles/l-sed2.xml?rev=1.1&content-type=text/plain&cvsroot=gentoo

Index: l-sed2.xml
===================================================================




 
Sed by example, Part 2


  Daniel Robbins


  Łukasz Damentko



Sed is a very powerful and compact text stream editor. In this article, the
second in the series, Daniel shows you how to use sed to perform string
substitution; create larger sed scripts; and use sed's append, insert, and
change line commands.




1.0
2005-07-15


How to further take advantage of the UNIX text editor

Substitution!



The original version of this article was published on IBM developerWorks, and is
property of Westtech Information Services. This document is an updated version
of the original article, and contains various improvements made by the Gentoo
Linux Documentation team.




Let's look at one of sed's most useful commands, the substitution command.
Using it, we can replace a particular string or matched regular expression with
another string. Here's an example of the most basic use of this command:


$ sed -e 's/foo/bar/' myfile.txt



The above command will output the contents of myfile.txt to stdout, with the
first occurrence of 'foo' (if any) on each line replaced with the string 'bar'.
Please note that I said first occurrence on each line, though this is normally
not what you want. Normally, when I do a string replacement, I want to perform
it globally. That is, I want to replace all occurrences on every line, as
follows:


$ sed -e 's/foo/bar/g' myfile.txt



The additional 'g' option after the last slash tells sed to perform a global
replace.
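Here is a small sketch, using a made-up input line, that shows the difference between the two forms side by side:

```shell
#!/bin/sh
line='foo foo foo'

# Without 'g', only the first match on each line is replaced.
first=$(printf '%s\n' "$line" | sed -e 's/foo/bar/')
# With 'g', every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed -e 's/foo/bar/g')

echo "$first"   # bar foo foo
echo "$all"     # bar bar bar
```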



Here are a few other things you should know about the s/// substitution
command. First, s/// is a command and only a command; none of the examples
above specifies an address. Like any other sed command, s/// can be preceded
by an address to control which lines it is applied to, as follows:


$ sed -e '1,10s/enchantment/entrapment/g' myfile2.txt



The above example will cause all occurrences of the phrase 'enchantment' to be
replaced with the phrase 'entrapment', but only on lines one through ten,
inclusive.
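To check the line-number address on something concrete, this sketch (assuming the 'seq' utility is available) builds twelve made-up numbered lines and applies the same '1,10' substitution:

```shell
#!/bin/sh
# Build 12 made-up lines: "1 enchantment" through "12 enchantment".
input=$(seq 12 | sed -e 's/$/ enchantment/')

# The address '1,10' limits the substitution to lines 1 through 10, inclusive.
out=$(printf '%s\n' "$input" | sed -e '1,10s/enchantment/entrapment/g')
printf '%s\n' "$out"
```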


$ sed -e '/^$/,/^END/s/hills/mountains/g' myfile3.txt



This example will swap 'hills' for 'mountains', but only on blocks of text
beginning with a blank line, and ending with a line beginning with the three
characters 'END', inclusive.
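The regular-expression range can be tried the same way. In this sketch the sample text is invented. Note that the range includes the blank line that opens it and the 'END' line that closes it, and that a range with no closing 'END' line simply continues to the end of the input:

```shell
#!/bin/sh
# A blank line opens the range; the next line starting with END closes it.
out=$(printf 'hills before\n\nhills inside\nEND hills\nhills after\n' |
      sed -e '/^$/,/^END/s/hills/mountains/g')

# No END line after the blank line: the range runs to the end of the input.
open=$(printf 'hills\n\nhills\nhills\n' | sed -e '/^$/,/^END/s/hills/mountains/g')

printf '%s\n' "$out"
printf '%s\n' "$open"
```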



Another nice thing about the s/// command is that we have a lot of
options when it comes to those / separators. If we're performing string
substitution and the regular expression or replacement string has a lot of
slashes in it, we can change the separator by specifying a different character
after the 's'. For example, this will replace all occurrences of
/usr/local with /usr:


$ sed -e 's:/usr/local:/usr:g' mylist.txt



In this example, we're using the colon as a separator. If you ever need to
specify the separator character in the regular expression, put a backslash
before it.
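As a quick sketch with made-up input, the commands below show the escaped-slash form, the colon-separator form, and an escaped colon used while ':' is the separator:

```shell
#!/bin/sh
paths='/usr/local/bin /usr/local/lib'

# With '/' as the separator, every '/' in the pattern and replacement needs a backslash.
escaped=$(printf '%s\n' "$paths" | sed -e 's/\/usr\/local/\/usr/g')
# With ':' as the separator, the slashes need no escaping at all.
colon=$(printf '%s\n' "$paths" | sed -e 's:/usr/local:/usr:g')
# With ':' as the separator, a literal ':' must be written as '\:'.
port=$(printf 'localhost:8080\n' | sed -e 's:host\:8080:host\:80:')

echo "$escaped"   # /usr/bin /usr/lib
echo "$colon"     # /usr/bin /usr/lib
echo "$port"      # localhost:80
```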





Regexp snafus



Up until now, we've only performed simple string substitution. While this is
handy, we can also match a regular expression. For example, the following sed
command will match a phrase beginning with '<' and ending with '>', and
containing any number of characters in between. This phrase will be deleted
(replaced with an empty string):


$ sed -e 's/<.*>//g' myfile.html



This is a good first attempt at a sed script that will remove HTML tags from a
file, but it won't work well, due to a regular expression quirk. The reason?
When sed tries to match the regular expression on a line, it finds the longest
match on the line. This wasn't an issue in my previous sed article, because we
were using the d and p commands, which would delete or print the
entire line anyway. But when we use the s/// command, it definitely makes
a big difference, because the entire portion that the regular expression matches
will be replaced with the target string, or in this case, deleted. This means
that the above example will turn the following line:


<b>This</b> is what <b>I</b> meant.



Into this:


meant.



Rather than this, which is what we wanted to do:


This is what I meant.



Fortunately, there is an easy way to fix this. Instead of typing in a regular
expression that says "a '<' character followed by any number of characters, and
ending with a '>' character", we just need to type in a regexp that says "a
'<' character followed by any number of non-'>' characters, and ending
with a '>' character". This will have the effect of matching the shortest
possible match, rather than the longest possible one. The new command looks like
this:


$ sed -e 's/<[^>]*>//g' myfile.html



In the above example, the '[^>]' specifies a "non-'>'" character, and the '*'
after it completes this expression to mean "zero or more non-'>' characters".
Test this command on a few sample HTML files, pipe the output to more, and
review the results.
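If you don't have HTML files handy, this sketch runs both commands on the sample line from above and captures the results for comparison:

```shell
#!/bin/sh
html='<b>This</b> is what <b>I</b> meant.'

# Greedy: '.*' runs to the LAST '>' on the line, swallowing the text in between.
greedy=$(printf '%s\n' "$html" | sed -e 's/<.*>//g')
# Negated class: '[^>]*' cannot cross a '>', so each match stops at the first '>'.
tags_only=$(printf '%s\n' "$html" | sed -e 's/<[^>]*>//g')

printf '[%s]\n' "$greedy"      # [ meant.]
printf '[%s]\n' "$tags_only"   # [This is what I meant.]
```

The greedy version actually leaves a leading space (" meant."), since the space after the last "</b>" lies outside the match.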





More character matching



The '[ ]' regular expression syntax has a few more options. To specify a range
of characters, you can use a '-' as long as it isn't in the first or last
position, as follows:
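The bracket expressions in this sketch are illustrative choices, not taken from the original text. It assumes the C locale, where a range like 'a-x' covers the ASCII letters a through x; in other locales, range ordering can differ:

```shell
#!/bin/sh
# Use the C locale so that 'a-x' means the ASCII letters a through x.
LC_ALL=C
export LC_ALL

# '[a-x]*' matches a run of letters from 'a' to 'x'; here it strips the leading "abc".
range=$(printf 'abc123xyz\n' | sed -e 's/[a-x]*//')
# A '-' placed first inside the brackets is a literal hyphen, not a range.
hyphen=$(printf 'a-b-c\n' | sed -e 's/[-b]//g')

echo "$range"    # 123xyz
echo "$hyphen"   # ac
```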








Sed by example, Part 3


  Daniel Robbins


  Łukasz Damentko



Sed is a very powerful and compact text stream editor. In this article, the
third and final in the series, Daniel puts sed to practical use with several
handy scripts, then shows how to convert a .QIF file from Intuit's Quicken into
a nicely formatted text file.




1.0
2005-07-16


Taking it to the next level: Data crunching, sed style

Muscular sed



The original version of this article was published on IBM developerWorks, and is
property of Westtech Information Services. This document is an updated version
of the original article, and contains various improvements made by the Gentoo
Linux Documentation team.



In my second sed article, I
offered examples that demonstrated how sed works, but very few of these examples
actually did anything particularly useful. In this final sed article, it's time
to change that pattern and put sed to good use. I'll show you several excellent
examples that not only demonstrate the power of sed, but also do some really
neat (and handy) things. For example, in the second half of the article, I'll
show you how I designed a sed script that converts a .QIF file from Intuit's
Quicken financial program into a nicely formatted text file. Before doing that,
we'll take a look at some less complicated yet useful sed scripts.





Text translation



Our first practical script converts UNIX-style text to DOS/Windows format. As
you probably know, DOS/Windows-based text files have a CR (carriage return) and
LF (line feed) at the end of each line, while UNIX text has only a line feed.
There may be times when you need to move some UNIX text to a Windows system, and
this script will perform the necessary format conversion for you.


$ sed -e 's/$/\r/' myunix.txt > mydos.txt



In this script, the '$' regular expression will match the end of the line, and
the '\r' tells sed to insert a carriage return right before it. Insert a
carriage return before a line feed, and presto, a CR/LF ends each line. Please
note that the '\r' will be replaced with a CR only when using GNU sed 3.02.80 or
later. If you haven't installed GNU sed 3.02.80 yet, see my first sed article for instructions on
how to do this.
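To confirm the conversion really produced CR/LF endings, this sketch (assuming GNU sed plus the common 'mktemp', 'cmp', and 'od' utilities) compares the output byte for byte against a hand-built DOS-format sample. The two input lines are made up:

```shell
#!/bin/sh
tmp=$(mktemp)
expected=$(mktemp)

# GNU sed turns '\r' in the replacement into a carriage return.
printf 'line one\nline two\n' | sed -e 's/$/\r/' > "$tmp"
printf 'line one\r\nline two\r\n' > "$expected"

# cmp -s is silent and succeeds only if the files are byte-for-byte identical.
if cmp -s "$tmp" "$expected"; then result=same; else result=different; fi
echo "$result"

# od -c shows each line now ending in \r \n.
od -c "$tmp"
rm -f "$tmp" "$expected"
```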



I can't tell you how many times I've downloaded some example script or C code,
only to find that it's in DOS/Windows format. While many programs don't mind
DOS/Windows format CR/LF text files, several programs definitely do -- the most
notable being bash, which chokes as soon as it encounters a carriage return. The
following sed invocation will convert DOS/Windows format text to trusty UNIX
format:


$ sed -e 's/.$//' mydos.txt > myunix.txt



The way this script works is simple: our substitution regular expression matches
the last character on the line, which happens to be a carriage return. We
replace it with nothing, causing it to be deleted from the output entirely. If
you use this script and notice that the last character of every line of the
output has been deleted, you've specified a text file that's already in UNIX
format. No need for that!
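If your sed is GNU sed, a safer variant is to match the carriage return itself rather than any last character, so a file that is already in UNIX format passes through unchanged. The sketch below contrasts the two on made-up input (the '\r' in the regular expression is a GNU extension):

```shell
#!/bin/sh
# Remove a carriage return only when it is the last character on the line.
dos=$(printf 'ab\r\ncd\r\n' | sed -e 's/\r$//')
# The same command leaves UNIX-format text alone.
unix=$(printf 'ab\ncd\n' | sed -e 's/\r$//')
# The original 's/.$//' deletes a real character when there is no CR.
naive=$(printf 'ab\ncd\n' | sed -e 's/.$//')

printf '%s\n' "$dos" "$unix" "$naive"
```

Where GNU sed isn't available, tr -d '\r' is another option, though it deletes every carriage return in the file, not just those at line ends.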





Reversing lines



Here's another handy little script. This one will reverse lines in a file,
similar to the "tac" command that's included with most Linux distributions. The
name "tac" may be a bit misleading, because "tac" doesn't reverse the position
of characters on the line (left and right), but rather the position of lines in
the file (up and down). Tacing the following file:


foo
bar
oni



...produces the following output:


oni
bar
foo



We can do the same thing with the following sed script:


$ sed -e '1!G;h;$!d' forward.txt > backward.txt



You'll find this sed script useful if you're logged in to a FreeBSD system,
which doesn't happen to have a "tac" command. Handy as the script is, it's also
a good idea to know why it does what it does. Let's dissect it.
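Here is a quick sketch that runs the script on the three-line example above and, where 'tac' is installed, checks that both agree:

```shell
#!/bin/sh
# The same three-line input used in the text above.
out=$(printf 'foo\nbar\noni\n' | sed -e '1!G;h;$!d')
printf '%s\n' "$out"

# Where tac exists (e.g. GNU coreutils), it should produce identical output.
if command -v tac >/dev/null 2>&1; then
  [ "$out" = "$(printf 'foo\nbar\noni\n' | tac)" ] && echo "matches tac"
fi
```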





Reversal explained



First, this script contains three separate sed commands, separated by
semicolons: '1!G', 'h' and '$!d'. Now, it's time to get a good understanding of
the addresses used for the first and third commands. If the first command were
'1G', the 'G' command would be applied only to the first line. However, there is
an additional '!' character -- this '!' character negates the address, meaning
that the 'G' command will apply to all but the first line. For the '$!d'
command, we have a similar situation. If the command were '$d', it would apply
the 'd' command to only the last line in the file (the '$' address is a simple
way of specifying the last line). However, with the '!', '$!d' will apply the
'd' command to all but the last line. Now, all we need to do is understand what
the commands themselves do.
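The effect of '!' is easy to see with a few one-liners. This sketch assumes 'seq' is available to generate five numbered lines:

```shell
#!/bin/sh
# '$!d' deletes every line EXCEPT the last.
last_only=$(seq 5 | sed -e '$!d')
# '1!d' deletes every line except the first.
first_only=$(seq 5 | sed -e '1!d')
# Without '!', '1d' deletes just the first line.
no_first=$(seq 5 | sed -e '1d')

echo "$last_only"    # 5
echo "$first_only"   # 1
echo "$no_first"     # 2 3 4 5, one per line
```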



When we execute our line reversal script on the text file above, the first
command that gets executed is 'h', because '1!G' skips the first line. This
command tells sed to copy the contents of the pattern space (the buffer that
holds the current line being worked on) to the hold space (a temporary buffer).
Then, the 'd' command is executed, which deletes "foo" from the pattern space,
so it doesn't get printed after all the commands are executed for this line.



Now, line two. After "bar" is read into the pattern space, the 'G' command is
executed, which appends a newline followed by the contents of the hold space
("foo") to the pattern space ("bar"), resulting in "bar\nfoo" in our pattern
space. The 'h' command puts this back in the hold space for safekeeping, and 'd'
deletes the line from the pattern space so that it isn't printed.



For the last "oni" line, the same steps are repeated, except that the contents
of the pattern space aren't deleted (due to the '$!' before the 'd'), and the
contents of the pattern space (three lines) are printed to stdout.
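To watch the hold space grow line by line, this sketch (GNU sed assumed) replaces the final '$!d' with the 'l' command under '-n', so each output line shows the pattern space right after 'h' has copied it. 'l' prints embedded newlines as '\n' and marks the end of the buffer with '$':

```shell
#!/bin/sh
# -n turns off automatic printing.
# 1!G  appends a newline plus the hold space to every line except the first.
# h    copies the result into the hold space.
# l    prints the pattern space in unambiguous form.
trace=$(printf 'foo\nbar\noni\n' | sed -n -e '1!G;h;l')
printf '%s\n' "$trace"
```

The last line of the trace is exactly what the original script prints, one output line per '\n'.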



Now, it's time to do some powerful data conversion with sed.





sed QIF magic


