Email bug reports to bonzini@gnu.org. Be sure to include the word “sed” somewhere in the Subject: field. Also, please include the output of ‘sed --version’ in the body of your report if at all possible.
Please do not send a bug report like this:
     while building frobme-1.3.4
     $ configure
     error--> sed: file sedscr line 1: Unknown option to 's'
If GNU sed doesn't configure your favorite package, take a few extra minutes to identify the specific problem and make a stand-alone test case. Unlike with other programs such as C compilers, making such test cases for sed is quite simple.
A stand-alone test case includes all the data necessary to perform the test, and the specific invocation of sed that causes the problem. The smaller a stand-alone test case is, the better. A test case should not involve something as far removed from sed as “try to configure frobme-1.3.4”. Yes, that is in principle enough information to look for the bug, but that is not a very practical prospect.
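As a rough illustration of what such a report could contain, the sketch below puts the input data and the exact sed invocation in one pipeline. The sample text and the s command are placeholders invented for this example; substitute whatever actually triggers the problem.

```shell
# Hypothetical stand-alone test case: the input and the sed invocation
# are both spelled out, so nothing outside this snippet is needed.
actual=$(printf 'hello world\n' | sed 's/world/sed/')
expected='hello sed'

# Include the version line in the report when the option is available.
echo "sed version: $(sed --version 2>/dev/null | head -n 1)"
echo "expected: $expected"
echo "actual:   $actual"
```

A report built this way can be pasted into a shell and rerun as is.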
Here are a few commonly reported bugs that are not bugs.
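N command on the last line
Most versions of sed exit without printing anything when the N command is issued on the last line of a file; GNU sed prints pattern space before exiting unless of course the -n command switch has been specified. This choice is by design.
For example, the behavior of
     sed N foo bar
would depend on whether foo has an even or an odd number of lines. Or, when writing a script to read the next few lines following a pattern match, traditional implementations of sed would force you to write something like
     /foo/{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N }
instead of just
     /foo/{ N;N;N;N;N;N;N;N;N; }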
In any case, the simplest workaround is to use $d;N in scripts that rely on the traditional behavior, or to set the POSIXLY_CORRECT variable to a non-empty value.
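The difference can be seen on a three-line input, where the second N has no next line to read. This is a small sketch; the helper function name is made up for the example, and the output of the plain N variant is left unasserted because it depends on the sed implementation and on POSIXLY_CORRECT.

```shell
# Odd number of input lines, so N is eventually issued on the last line.
three_lines() { printf '1\n2\n3\n'; }

# $!N skips N on the last line, so line 3 is still printed.
keep_last=$(three_lines | sed '$!N')

# $d;N deletes the last line before N can run: the traditional result.
drop_last=$(three_lines | sed '$d;N')

# Plain N: the result depends on the implementation and the environment.
plain=$(three_lines | sed N)

echo "with \$!N:"; echo "$keep_last"
echo "with \$d;N:"; echo "$drop_last"
echo "with N:"; echo "$plain"
```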
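Regex syntax clashes (problems with backslashes)
sed uses the POSIX basic regular expression syntax. According to the standard, the meaning of some escape sequences is undefined in this syntax; notable in the case of sed are \|, \+, \?, \`, \', \<, \>, \b, \B, \w, and \W.
As in all GNU programs that use POSIX basic regular expressions, sed interprets these escape sequences as special characters. So, x\+ matches one or more occurrences of ‘x’. abc\|def matches either ‘abc’ or ‘def’.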
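A quick way to check how a given sed treats these two sequences is sketched below. It assumes GNU sed; other implementations may treat \+ and \| differently, which is exactly the clash described here.

```shell
# Assumes GNU sed: \+ and \| are special in its basic regular expressions.

# x\+ matches the whole run of x characters.
plus=$(printf 'axxxb\n' | sed 's/x\+/-/')

# abc\|def matches either word.
alt=$(printf 'abc def ghi\n' | sed 's/abc\|def/X/g')

echo "$plus"
echo "$alt"
```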
This syntax may cause problems when running scripts written for other seds. Some sed programs have been written with the assumption that \| and \+ match the literal characters | and +. Such scripts must be modified by removing the spurious backslashes if they are to be used with modern implementations of sed, like GNU sed.
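For instance, a script that wants the literal characters can simply leave them unescaped, since an unescaped + or | has no special meaning in a basic regular expression. The input string here is made up for the illustration.

```shell
# Unescaped + and | are ordinary characters in basic regular expressions.
literal=$(printf 'a+b|c\n' | sed 's/+/ plus /; s/|/ or /')
echo "$literal"
```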
On the other hand, some scripts use s|abc\|def||g to remove occurrences of either abc or def. While this worked until sed 4.0.x, newer versions interpret this as removing the string abc|def. This is again undefined behavior according to POSIX, and this interpretation is arguably more robust: older seds, for example, required that the regex matcher parse \/ as / in the common case of escaping a slash, which is again undefined behavior; the new behavior avoids this, and this is good because the regex matcher is only partially under our control.
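The two readings can be compared side by side. This sketch assumes a GNU sed newer than 4.0.x; the sample input is invented so that both the literal string abc|def and the separate words occur in it.

```shell
# Assumes a GNU sed newer than 4.0.x.
sample() { printf 'abc|def,abc,def\n'; }

# With | as the delimiter, \| is a literal bar: only "abc|def" is removed.
literal_bar=$(sample | sed 's|abc\|def||g')

# With / as the delimiter, \| is alternation: every abc and def is removed.
either=$(sample | sed 's/abc\|def//g')

echo "$literal_bar"
echo "$either"
```

Choosing a delimiter that does not occur in the regular expression avoids the ambiguity altogether.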
In addition, this version of sed supports several escape characters (some of which are multi-character) to insert non-printable characters in scripts (\a, \c, \d, \o, \r, \t, \v, \x). These can cause similar problems with scripts written for other seds.
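As a small example of one of these escapes, the sketch below replaces a space with a tab. It assumes GNU sed; a sed without this extension may insert a literal t instead.

```shell
# Assumes GNU sed: \t in the replacement stands for a tab character.
tabbed=$(printf 'key value\n' | sed 's/ /\t/')
printf '%s\n' "$tabbed"
```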
-i clobbers read-only files
In short, ‘sed -i’ will let you delete the contents of a read-only file, and in general the -i option lets you clobber protected files. This is not a bug, but rather a consequence of how the Unix filesystem works.
The permissions on a file say what can happen to the data in that file, while the permissions on a directory say what can happen to the list of files in that directory. ‘sed -i’ never opens for writing a file that is already on disk. Rather, it works on a temporary file that is finally renamed to the original name: if you rename or delete files, you're actually modifying the contents of the directory, so the operation depends on the permissions of the directory, not of the file. For this same reason, sed does not let you use -i on a writeable file in a read-only directory, and will break hard or symbolic links when -i is used on such a file.
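The effect is easy to reproduce in a scratch directory. The following is a sketch that assumes GNU sed (other seds spell the -i option differently); the directory and file names are made up for the demonstration.

```shell
# Assumes GNU sed. Everything happens inside a throwaway directory.
demo=$(mktemp -d)
printf 'old\n' > "$demo/data"
ln "$demo/data" "$demo/link"      # second hard link to the same file
chmod 444 "$demo/data"            # read-only file, writable directory

# Succeeds: sed writes a temporary file and renames it over "data".
sed -i 's/old/new/' "$demo/data"

data_after=$(cat "$demo/data")    # the renamed temporary file
link_after=$(cat "$demo/link")    # still the original contents: link broken
echo "data: $data_after"
echo "link: $link_after"
rm -rf "$demo"
```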
0a does not work (gives an error)
There is no line 0. 0 is a special address that is only used to treat addresses like 0,/RE/ as active when the script starts: if you write 1,/abc/d and the first line includes the word ‘abc’, then that match would be ignored because address ranges must span at least two lines (barring the end of the file); but what you probably wanted is to delete every line up to the first one including ‘abc’, and this is obtained with 0,/abc/d.
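The two address forms can be compared on an input whose first line already matches. This sketch assumes GNU sed, since the 0,/RE/ form is a GNU extension; the helper name and sample lines are invented.

```shell
# Assumes GNU sed for the 0,/RE/ address form.
sample() { printf 'abc\nx\nabc\ny\n'; }

# 1,/abc/ ignores the match on line 1 and runs on to the match on line 3.
from_one=$(sample | sed '1,/abc/d')

# 0,/abc/ may end on line 1 itself, so only that line is deleted.
from_zero=$(sample | sed '0,/abc/d')

echo "1,/abc/d leaves:"; echo "$from_one"
echo "0,/abc/d leaves:"; echo "$from_zero"
```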
[a-z] is case insensitive
You are encountering problems with locales. POSIX mandates that [a-z] uses the current locale's collation order; in C parlance, that means using strcoll(3) instead of strcmp(3). Some locales have a case-insensitive collation order, others don't.
Another problem is that [a-z] tries to use collation symbols. This only happens if you are on the GNU system, using GNU libc's regular expression matcher instead of compiling the one supplied with GNU sed. In a Danish locale, for example, the regular expression ^[a-z]$ matches the string ‘aa’, because this is a single collating symbol that comes after ‘a’ and before ‘b’; ‘ll’ behaves similarly in Spanish locales, or ‘ij’ in Dutch locales.
To work around these problems, which may cause bugs in shell scripts, set the LC_COLLATE and LC_CTYPE environment variables to ‘C’.
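In a shell script the workaround might look like the sketch below. One assumption worth stating: if LC_ALL is set in the environment it takes precedence over the two variables, so the example clears it inside a subshell first.

```shell
# LC_ALL, if set, overrides LC_COLLATE and LC_CTYPE, so clear it here.
# The command substitution runs in a subshell; the caller is unaffected.
lower_only=$(
  unset LC_ALL
  printf 'a\nB\nz\nZ\n' | LC_COLLATE=C LC_CTYPE=C sed -n '/^[a-z]$/p'
)
echo "$lower_only"
```

With the ‘C’ locale in effect, only the lowercase lines are selected.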
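s/.*// does not clear pattern space
This happens if your input stream includes invalid multibyte sequences. POSIX mandates that such sequences are not matched by ‘.’, so that ‘s/.*//’ will not clear pattern space as you would expect.
To work around these problems, which may cause bugs in shell scripts, set the LC_COLLATE and LC_CTYPE environment variables to ‘C’.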
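The sketch below applies that workaround to a line containing the byte \377, which is never valid in UTF-8. The second s command is only there to make the empty pattern space visible; the marker text EMPTY is invented for the example, and LC_ALL is cleared because it would otherwise override the two variables.

```shell
# In the C locale every byte is a single character, so . matches \377 too.
cleared=$(
  unset LC_ALL
  printf 'a\377b\n' | LC_COLLATE=C LC_CTYPE=C sed 's/.*//; s/^$/EMPTY/'
)
echo "$cleared"
```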