Escapes - sed, a stream editor

3.9 GNU Extensions for Escapes in Regular Expressions

Until this chapter, we have only encountered escapes of the form ‘\^’, which tell sed not to interpret the circumflex as a special character, but rather to take it literally. For example, ‘\*’ matches a single asterisk rather than zero or more backslashes.
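
As a small sketch of this first kind of escape, the commands below assume a POSIX shell with sed on the PATH. The variable names (star, caret, rep) are only illustrative.

```shell
# '\*' is a literal asterisk, so only the '*' in the input is replaced.
star=$(printf 'a*b\n' | sed 's/\*/STAR/')
echo "$star"     # aSTARb

# '\^' is a literal circumflex rather than a start-of-line anchor.
caret=$(printf 'x^2\n' | sed 's/\^/**/')
echo "$caret"    # x**2

# For contrast, an unescaped '*' is a repetition operator:
# 'a*' matches the run "aa" at the start of the line.
rep=$(printf 'aab\n' | sed 's/a*/X/')
echo "$rep"      # Xb
```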

This chapter introduces another kind of escape¹: one applied to a character or sequence of characters that is ordinarily taken literally, and that sed replaces with a special character. This provides a way of encoding non-printable characters in patterns in a visible manner. There is no restriction on the appearance of non-printing characters in a sed script, but when a script is being prepared in the shell or by text editing, it is usually easier to use one of the following escape sequences than the binary character it represents:

\a: Produces or matches a BEL character, that is, an “alert” (ASCII 7).
\f: Produces or matches a form feed (ASCII 12).
\n: Produces or matches a newline (ASCII 10).
\r: Produces or matches a carriage return (ASCII 13).
\t: Produces or matches a horizontal tab (ASCII 9).
\v: Produces or matches a so-called “vertical tab” (ASCII 11).
\cx: Produces or matches Control-x, where x is any character. The precise effect of ‘\cx’ is as follows: if x is a lower case letter, it is converted to upper case. Then bit 6 of the character (hex 40) is inverted. Thus ‘\cz’ becomes hex 1A, but ‘\c{’ becomes hex 3B, while ‘\c;’ becomes hex 7B.
\dxxx: Produces or matches a character whose decimal ASCII value is xxx.
\oxxx: Produces or matches a character whose octal ASCII value is xxx.
\xHH: Produces or matches a character whose hexadecimal ASCII value is HH.
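
The escapes listed above can be tried out with a few one-liners. This is only a sketch: it assumes GNU sed (the escapes other than \n are GNU extensions) and a POSIX shell, and the variable names are illustrative. The last two lines do not call sed at all; they just redo the \cx arithmetic (flip the hex 40 bit of the upper-cased character) with shell arithmetic.

```shell
# Matching: each escape stands for one character in the regular expression.
tab=$(printf 'a\tb\n' | sed 's/\t/<TAB>/')      # a<TAB>b
hex=$(printf 'A\n' | sed 's/\x41/hex/')         # hex  ('A' is hex 41)
oct=$(printf 'A\n' | sed 's/\o101/oct/')        # oct  ('A' is octal 101)
dec=$(printf 'A\n' | sed 's/\d065/dec/')        # dec  ('A' is decimal 65)
ctl=$(printf 'x\032y\n' | sed 's/\cz/<SUB>/')   # x<SUB>y  (Control-Z is hex 1A)

# Producing: the same escapes may appear in the replacement text.
out=$(printf 'a b\n' | sed 's/ /\t/')           # "a", a tab, "b"

# The \cx rule by hand: upper-case the letter, then invert bit hex 40.
cz=$(printf '%02X' $(( 0x5A ^ 0x40 )))          # 'Z' is hex 5A -> 1A
cb=$(printf '%02X' $(( 0x7B ^ 0x40 )))          # '{' is hex 7B -> 3B

printf '%s\n' "$tab" "$hex" "$oct" "$dec" "$ctl" "$cz" "$cb"
```

The examples deliberately avoid codes for characters that are special in regular expressions, such as ‘^’ or ‘*’.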

‘\b’ (backspace) was omitted because of the conflict with the existing “word boundary” meaning.

Other escapes match a particular character class and are valid only in regular expressions:

\w: Matches any “word” character. A “word” character is any letter or digit or the underscore character.
\W: Matches any “non-word” character.
\b: Matches a word boundary; that is, it matches if the character to the left is a “word” character and the character to the right is a “non-word” character, or vice versa.
\B: Matches everywhere but on a word boundary; that is, it matches if the character to the left and the character to the right are either both “word” characters or both “non-word” characters.
\`: Matches only at the start of pattern space. This is different from ^ in multi-line mode.
\': Matches only at the end of pattern space. This is different from $ in multi-line mode.
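
The following sketch exercises the escapes in this second list. It assumes GNU sed and a POSIX shell; the sample inputs and variable names are made up for illustration. The last two commands use N to put two lines into the pattern space and the M flag to turn on multi-line mode, so that ‘$’ and ‘\'’ behave differently. That script is double-quoted only because ‘\'’ contains a single quote.

```shell
# \w and \W: word characters are letters, digits and the underscore.
w=$(printf 'foo_1, bar!\n' | sed 's/\w/x/g')             # xxxxx, xxx!
nw=$(printf 'foo_1, bar!\n' | sed 's/\W/-/g')            # foo_1--bar-

# \b: replace "cat" only where it stands as a whole word.
b=$(printf 'cat concat cat\n' | sed 's/\bcat\b/dog/g')   # dog concat dog

# \B: replace "cat" only where it does NOT start at a word boundary.
nb=$(printf 'cat concat\n' | sed 's/\Bcat/CAT/g')        # cat conCAT

# Multi-line mode: '$' matches at the end of the first line ...
eol=$(printf 'one\ntwo\n' | sed 'N; s/$/ </M')           # "one <" then "two"

# ... while \' matches only at the very end of the pattern space.
eos=$(printf 'one\ntwo\n' | sed "N; s/\\'/ </M")         # "one" then "two <"

printf '%s\n' "$w" "$nw" "$b" "$nb" "$eol" "$eos"
```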

Footnotes

[1] All the escapes introduced here are GNU extensions, with the exception of \n. In basic regular expression mode, setting POSIXLY_CORRECT disables them inside bracket expressions.