The small set of tools you can expect to find on any machine can still include some limitations you should be aware of.
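In Awk programs, don't leave white space before the opening parenthesis in a call to a user-defined function. Posix does not allow this, and GNU Awk rejects it:
$ gawk 'function die () { print "Aaaaarg!" }
        BEGIN { die () }'
gawk: cmd. line:2:         BEGIN { die () }
gawk: cmd. line:2:                      ^ parse error
$ gawk 'function die () { print "Aaaaarg!" }
        BEGIN { die() }'
Aaaaarg!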
If you want your program to be deterministic, don't depend on the order in which ‘for (i in arr)’ visits the elements of an array; it differs between implementations:
$ cat for.awk
END {
  arr["foo"] = 1
  arr["bar"] = 1
  for (i in arr)
    print i
}
$ gawk -f for.awk </dev/null
foo
bar
$ nawk -f for.awk </dev/null
bar
foo
Some Awk implementations, such as HP-UX 11.0's native one, mishandle anchors:
$ echo xfoo | $AWK '/foo|^bar/ { print }'
$ echo bar | $AWK '/foo|^bar/ { print }'
bar
$ echo xfoo | $AWK '/^bar|foo/ { print }'
xfoo
$ echo bar | $AWK '/^bar|foo/ { print }'
bar
Either do not depend on such patterns (i.e., use ‘/^(.*foo|bar)/’), or use a simple test to reject such implementations.
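A minimal sketch of the "simple test" option, assuming $AWK names the Awk under consideration (it defaults to awk here) and using a hypothetical variable awk_anchors_ok: it feeds ‘xfoo’ through the pattern that the HP-UX 11.0 transcript above mishandles and records whether the line comes back.

```shell
# Probe $AWK for the anchor-in-alternation bug shown above.
AWK=${AWK-awk}
if test "`echo xfoo | $AWK '/foo|^bar/ { print }'`" = xfoo; then
  awk_anchors_ok=yes
else
  # A broken Awk prints nothing here, as in the HP-UX 11.0 transcript.
  awk_anchors_ok=no
fi
echo "anchors handled correctly by $AWK: $awk_anchors_ok"
```

A configure script would typically reject the implementation, or keep searching for another Awk, when the result is ‘no’.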
AIX version 5.2 has an arbitrary limit of 399 on the length of regular expressions and literal strings in an Awk program.
Traditional Awk implementations derived from Unix version 7, such as Solaris /bin/awk, have many limitations and do not conform to Posix. Nowadays AC_PROG_AWK (see Particular Programs) finds you an Awk that doesn't have these problems, but if for some reason you prefer not to use AC_PROG_AWK, you may need to address them.
Traditional Awk does not support multidimensional arrays or user-defined functions.
Traditional Awk does not support the -v option. You can use assignments after the program instead, e.g., $AWK '{print v $1}' v=x; however, don't forget that such assignments are not evaluated until they are encountered (e.g., after any BEGIN action).
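The timing caveat can be seen directly. This sketch (AWK defaults to awk; result is a hypothetical variable name) prints v once from BEGIN and once from a main action, with the assignment given as an operand after the program text.

```shell
# The operand v=x is processed after BEGIN has run but before
# standard input is read, so only the main action sees v set.
AWK=${AWK-awk}
result=`echo 1 | $AWK 'BEGIN { printf "[%s]", v } { print v $1 }' v=x`
echo "$result"
```

Here BEGIN prints empty brackets, while the main action prints ‘x1’, so initialization that must happen inside BEGIN cannot rely on this technique.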
Traditional Awk does not support the keywords delete or do.
Traditional Awk does not support the expressions a?b:c, !a, a^b, or a^=b.
Traditional Awk does not support the predefined CONVFMT
variable.
Traditional Awk supports only the predefined functions exp, int, length, log, split, sprintf, sqrt, and substr.
Traditional Awk getline
is not at all compatible with Posix;
avoid it.
Traditional Awk split
supports only two arguments.
Traditional Awk has a limit of 99 fields in a record. You may be able to circumvent this problem by using split.
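For illustration, here is the two-argument form of split that traditional Awk accepts: it splits on FS (white space by default) and returns the number of pieces, so the code can index an array instead of naming fields like $100. This only shows the idiom; whether it actually avoids the field limit on a particular traditional Awk is an assumption this sketch does not check, and last is a hypothetical variable name.

```shell
# Two-argument split(): pieces go into f[1..n], n is the count.
AWK=${AWK-awk}
last=`echo 'a b c' | $AWK '{ n = split($0, f); print n, f[n] }'`
echo "$last"
```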
Not all C compilers accept -c and -o together on the same command line; AC_PROG_CC_C_O checks for this.
When a compilation such as ‘cc -o foo foo.c’ fails, some compilers (such as cds on Reliant Unix) leave a foo.o.
HP-UX cc doesn't accept .S files to preprocess and assemble. ‘cc -c foo.S’ appears to succeed, but in fact does nothing.
The name of the default executable produced by ‘cc foo.c’ is not portable: a.out is the usual Posix convention, but some compilers use other names.
The C compiler's traditional name is cc, but other names like
gcc are common. Posix 1003.1-2001 specifies the
name c99, but older Posix editions specified
c89 and anyway these standard names are rarely used in
practice. Typically the C compiler is invoked from makefiles that use
‘$(CC)’, so the value of the ‘CC’ make variable selects the
compiler name.
Some cp implementations (e.g., BSD/OS 4.2) do not allow trailing slashes at the end of nonexistent destination directories. To avoid this problem, omit the trailing slashes. For example, use ‘cp -R source /tmp/newdir’ rather than ‘cp -R source /tmp/newdir/’ if /tmp/newdir does not exist.
The ancient SunOS 4 cp does not support -f, although its mv does.
Traditionally, file timestamps had 1-second resolution, and ‘cp -p’ copied the timestamps exactly. However, many modern file systems have timestamps with 1-nanosecond resolution. Unfortunately, ‘cp -p’ implementations truncate timestamps when copying files, so this can result in the destination file appearing to be older than the source. The exact amount of truncation depends on the resolution of the system calls that cp uses; traditionally this was utime, which has 1-second resolution, but some newer cp implementations use utimes, which has 1-microsecond resolution. These newer implementations include GNU Core Utilities 5.0.91 or later, and Solaris 8 (sparc) patch 109933-02 or later. Unfortunately, as of January 2006 there is still no system call to set timestamps to the full nanosecond resolution.
Bob Proulx notes that ‘cp -p’ always tries to copy ownerships. But whether it actually does copy ownerships or not is a system dependent policy decision implemented by the kernel. If the kernel allows it then it happens. If the kernel does not allow it then it does not happen. It is not something cp itself has control over.
In Unix System V any user can chown files to any other user, and System V also has a non-sticky /tmp. That probably derives from the heritage of System V in a business environment without hostile users. BSD changed this to be a more secure model where only root can chown files and a sticky /tmp is used. That undoubtedly derives from the heritage of BSD in a campus environment.
GNU/Linux and Solaris by default follow BSD, but
can be configured to allow a System V style chown. On the
other hand, HP-UX follows System V, but can
be configured to use the modern security model and disallow
chown. Since it is an administrator-configurable parameter
you can't use the name of the kernel as an indicator of the behavior.
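Some versions of date do not recognize certain ‘%’ directives and, instead of complaining, pass them through unchanged. For example, on Tru64 (OSF1):
$ uname -a
OSF1 medusa.sis.pasteur.fr V5.1 732 alpha
$ date "+%s"
%s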
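A sketch of one defensive pattern, assuming a script that would like the %s conversion shown above: capture the output and accept it only if it is purely numeric, since a date that passes the directive through prints ‘%s’ and still succeeds. The variable name now is hypothetical.

```shell
# Ask for seconds since the Epoch, then validate the result ourselves
# instead of trusting date's exit status.
now=`date +%s 2>/dev/null`
case $now in
  '' | *[!0-9]*) now= ;;  # failed or passed through: treat as unknown
esac
if test -n "$now"; then
  echo "seconds since the Epoch: $now"
else
  echo "date +%s is not supported here" >&2
fi
```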
Some diff implementations, such as Tru64's, fail when comparing to /dev/null. Use an empty file instead.
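A minimal sketch of the empty-file workaround; the file names empty.tmp and file.tmp are hypothetical scratch files in the current directory. Only diff's exit status is used: 0 when the files are identical, 1 when they differ.

```shell
: > empty.tmp                    # create (or truncate) an empty file
printf 'hello\n' > file.tmp      # sample file to compare
if diff empty.tmp file.tmp >/dev/null 2>&1; then
  same=yes
else
  same=no
fi
rm -f empty.tmp file.tmp
echo "file.tmp is empty: $same"
```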
Not all hosts have a working dirname; use AS_DIRNAME (see Programming in M4sh) instead. For example:
dir=`dirname "$file"`       # This is not portable.
dir=`AS_DIRNAME(["$file"])` # This is more portable.
Some hosts do not support grep -E, the Posix replacement for egrep. Also, some traditional implementations do not work on long input lines. To work around these problems, invoke AC_PROG_EGREP and then use $EGREP.
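AC_PROG_EGREP is the recommended route. Outside of configure, a rough approximation of the choice it makes might look like this; treat it as a hypothetical sketch rather than the macro's actual code, and note that it does not test the long-line problem.

```shell
# Prefer Posix 'grep -E' if it understands alternation; else use egrep.
if echo a | (grep -E '(a|b)') >/dev/null 2>&1; then
  EGREP='grep -E'
else
  EGREP=egrep
fi
# $EGREP is left unquoted on purpose so that 'grep -E' splits into two words.
if echo foo | $EGREP '^f(o|x)o$' >/dev/null; then
  echo "using: $EGREP"
fi
```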
Portable extended regular expressions should use ‘\’ only to escape characters in the string ‘$()*+.?[\^{|’. For example, ‘\}’ is not portable, even though it typically matches ‘}’.
The empty alternative is not portable. Use ‘?’ instead. For instance with Digital Unix v5.0:
> printf "foo\n|foo\n" | $EGREP '^(|foo|bar)$'
|foo
> printf "bar\nbar|\n" | $EGREP '^(foo|bar|)$'
bar|
> printf "foo\nfoo|\n|bar\nbar\n" | $EGREP '^(foo||bar)$'
foo
|bar
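The ‘?’ rewrite can be sketched as follows. EGREP is assumed to come from AC_PROG_EGREP; here it merely defaults to grep -E, and matches is a hypothetical variable name. Making the whole group optional matches an empty line, ‘foo’, or ‘bar’, and nothing else.

```shell
EGREP=${EGREP-'grep -E'}   # assumption: normally set by AC_PROG_EGREP
# Instead of '^(|foo|bar)$', make the whole group optional with '?'.
matches=`printf 'foo\n\n|foo\nbar|\n' | $EGREP '^(foo|bar)?$' | wc -l`
echo "matching lines: $matches"
```

Only ‘foo’ and the empty line match; ‘|foo’ and ‘bar|’, which the broken implementation accepted, do not.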
$EGREP also suffers the limitations of grep.
In expr, don't use the keywords length, substr, match, and index.
Do not rely on the result of ‘|’ when both of its operands are the empty string, as in:
expr '' \| ''
Posix 1003.2-1992 returns the empty string for this case, but traditional Unix returns ‘0’ (Solaris is one such example). In Posix 1003.1-2001, the specification was changed to match traditional Unix's behavior (which is bizarre, but it's too late to fix this). Please note that the same problem does arise when the empty string results from a computation, as in:
expr bar : foo \| foo : bar
Avoid this portability problem by avoiding the empty string.
Portable expr regular expressions should not begin with ‘^’. Patterns are automatically anchored so leading ‘^’ is not needed anyway.
The Posix standard is ambiguous as to whether ‘expr 'a' : '\(b\)'’ outputs ‘0’ or the empty string. In practice, it outputs the empty string on most platforms, but portable scripts should not assume this. For instance, the QNX 4.25 native expr returns ‘0’.
One might think that a way to get a uniform behavior would be to use the empty string as a default value:
expr a : '\(b\)' \| ''
Unfortunately this behaves exactly as the original expression; see the discussion of ‘|’ above for more information.
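Following the advice to avoid the empty string, one sketch is to use a non-empty sentinel as the default instead of ''. The word none is a hypothetical choice: whether the failed match yields '' or '0', both count as null or zero for ‘|’, so every platform returns the sentinel.

```shell
# A failed \( \) match: '' on most hosts, '0' on QNX 4.25; either way '|'
# falls through to the non-empty sentinel.
val=`expr a : '\(b\)' \| none`
# A successful match is returned unchanged.
val2=`expr b : '\(b\)' \| none`
echo "$val $val2"
```

One caveat of this design: a legitimate match of ‘0’ is also replaced by the sentinel, so pick patterns where that cannot happen.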
Ancient expr implementations (e.g., SunOS 4 expr and Solaris 8 /usr/ucb/expr) have a silly length limit that causes expr to fail if the matched substring is longer than 120 bytes. In this case, you might want to fall back on ‘echo|sed’ if expr fails. Nowadays this is of practical importance only for the rare installer who mistakenly puts /usr/ucb before /usr/bin in PATH.
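A sketch of the ‘echo|sed’ fallback, using a hypothetical --prefix=DIR style argument. The leading X is the usual guard against operands that begin with ‘-’ or look like expr keywords; the sed command strips the same prefix that the expr pattern skips.

```shell
arg='--prefix=/usr/local'   # hypothetical input
# Try expr first; if it fails (e.g., because of a length limit), use sed.
val=`expr "X$arg" : 'X[^=]*=\(.*\)'` ||
  val=`echo "X$arg" | sed 's/^X[^=]*=//'`
echo "$val"
```

If the extracted value is empty, expr also exits with status 1 and sed runs, but it produces the same empty result, so the two paths agree.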
On Mac OS X 10.4, expr mishandles the pattern ‘[^-]’ in some cases. For example, the command
expr Xpowerpc-apple-darwin8.1.0 : 'X[^-]*-[^-]*-\(.*\)'
outputs ‘apple-darwin8.1.0’ rather than the correct ‘darwin8.1.0’. This particular case can be worked around by substituting ‘[^--]’ for ‘[^-]’.
Don't leave, there is some more!
The QNX 4.25 expr, in addition to preferring ‘0’ to the empty string, has a funny behavior in its exit status: it's always 1 when parentheses are used!
$ val=`expr 'a' : 'a'`; echo "$?: $val"
0: 1
$ val=`expr 'a' : 'b'`; echo "$?: $val"
1: 0

$ val=`expr 'a' : '\(a\)'`; echo "$?: $val"
1: a
$ val=`expr 'a' : '\(b\)'`; echo "$?: $val"
1: 0
In practice this can be a big problem if you catch failures of expr with some other method (such as sed), since you may get the result twice. For instance
$ expr 'a' : '\(a\)' || echo 'a' | sed 's/^\(a\)$/\1/'
outputs ‘a’ on most hosts, but ‘aa’ on QNX 4.25. A simple workaround consists of testing expr and using a variable set to expr or to false according to the result.
Tru64 expr incorrectly treats the result as a number, if it can be interpreted that way:
$ expr 00001 : '.*\(...\)'
1
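A sketch of the workaround suggested for QNX, which also screens out the Tru64 numeric-result bug: probe expr once and set a variable to expr or false. Configure scripts generated by Autoconf use a variable named as_expr for this purpose; the code below is an illustration, not a copy of Autoconf's. When as_expr is false, every $as_expr invocation fails, so the sed fallback runs exactly once and the result is never doubled. The input str is hypothetical.

```shell
# Trust expr only if \( \) yields a normal exit status (QNX 4.25 fails)
# and the result is not reinterpreted as a number (Tru64 fails).
if expr a : '\(a\)' >/dev/null 2>&1 &&
   test "X`expr 00001 : '.*\(...\)'`" = X001; then
  as_expr=expr
else
  as_expr=false
fi

str=Xabc   # hypothetical input, already prefixed with X
val=`$as_expr "$str" : 'X\(.*\)'` ||
  val=`echo "$str" | sed 's/^X//'`
echo "$val"
```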
Some hosts do not support grep -F, the Posix replacement for fgrep. Also, some traditional implementations do not work on long input lines. To work around these problems, invoke AC_PROG_FGREP and then use $FGREP.
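By analogy with the egrep case, a hypothetical approximation of the choice behind $FGREP, followed by a check that shows why fixed-string matching matters: ‘.’ and ‘*’ are ordinary characters, so ‘axbbc’, which the regular expression a.b*c would match, is not selected. This is a sketch, not AC_PROG_FGREP's actual code, and count is a hypothetical variable name.

```shell
# Prefer Posix 'grep -F' when it works; otherwise fall back on fgrep.
if echo 'ab*c' | (grep -F 'ab*c') >/dev/null 2>&1; then
  FGREP='grep -F'
else
  FGREP=fgrep
fi
# Only the first line contains the literal string 'a.b*c'.
count=`printf 'a.b*c\naxbbc\n' | $FGREP 'a.b*c' | wc -l`
echo "literal matches: $count"
```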
In find -exec arguments, the replacement of ‘{}’ is guaranteed only if the argument is exactly {}, not if it's only a part of an argument. For instance, on Digital Unix (DU), HP-UX 10.20, and HP-UX 11:
$ touch foo
$ find . -name foo -exec echo "{}-{}" \;
{}-{}
while GNU find reports ‘./foo-./foo’.
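One portable sketch keeps {} as a whole argument and lets a shell build the compound string: with sh -c, the word after the script becomes $0 and the pathname becomes $1. The scratch directory findtest.tmp is hypothetical.

```shell
rm -rf findtest.tmp
mkdir findtest.tmp && touch findtest.tmp/foo
# '{}' is a complete argument here, so every find must replace it.
out=`cd findtest.tmp && find . -name foo -exec sh -c 'echo "$1-$1"' sh {} \;`
rm -rf findtest.tmp
echo "$out"
```

The cost is one extra shell process per file found, which is usually acceptable in configure-time scripts.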
Some of the grep options required by Posix are not portable in practice.
Don't use ‘grep -q’ to suppress output, because many grep implementations (e.g., Solaris) do not support -q.
Don't use ‘grep -s’ to suppress output either, because Posix says -s does not suppress output, only some error messages; also, the -s option of traditional grep behaved like -q does in most modern implementations. Instead, redirect the standard output and standard error (in case the file doesn't exist) of grep to /dev/null, and check the exit status of grep to determine whether it found a match.
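A minimal sketch of the recommended replacement for ‘grep -q’; the file grepdemo.tmp and the variable found are hypothetical. grep exits with status 0 when a line matches, 1 when none does, and greater than 1 on errors; this sketch treats an error the same as no match.

```shell
printf 'alpha\nbeta\n' > grepdemo.tmp   # hypothetical input file
# Discard output and diagnostics; the exit status carries the answer.
if grep beta grepdemo.tmp >/dev/null 2>&1; then
  found=yes
else
  found=no
fi
rm -f grepdemo.tmp
echo "found: $found"
```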
Some traditional grep implementations do not work on long input lines. On AIX the default grep silently truncates long lines on the input before matching.
Also, many implementations do not support multiple regexps with -e: they either reject -e entirely (e.g., Solaris) or honor only the last pattern (e.g., IRIX 6.5 and NeXT). To work around these problems, invoke AC_PROG_GREP and then use $GREP.
Another possible workaround for the multiple -e problem is to separate the patterns by newlines, for example:
grep 'foo
bar' in.txt
except that this fails with traditional grep implementations and with OpenBSD 3.8 grep.
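If neither workaround is acceptable, another option, swapping in sed rather than grep, is to branch on each pattern to a shared print command. This sketch uses the patterns foo and bar from the example above with inline sample input; it prints each matching line once, in input order, and follows the sed portability rules described later in this section (short labels, one space after b, no comments inside the script). The label name match is hypothetical.

```shell
# Lines matching foo or bar jump to :match and are printed; all others
# are deleted without output (-n suppresses automatic printing).
matched=`printf 'foo\nbaz\nbar\n' | sed -n '
/foo/b match
/bar/b match
d
:match
p
'`
echo "$matched"
```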
Traditional grep implementations (e.g., Solaris) do not support the -E or -F options. To work around these problems, invoke AC_PROG_EGREP and then use $EGREP, and similarly for AC_PROG_FGREP and $FGREP. Even if you are willing to require support for Posix grep, your script should not use both -E and -F, since Posix does not allow this combination.
Portable grep regular expressions should use ‘\’ only to
escape characters in the string ‘$()*.0123456789[\^{}’. For example,
alternation, ‘\|’, is common but Posix does not require its
support in basic regular expressions, so it should be avoided in
portable scripts. Solaris grep does not support it.
Similarly, ‘\+’ and ‘\?’ should be avoided.
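Some join implementations have bugs when the second operand is standard input and standard input is a pipe, as in:
cat >file <<'EOF'
1 x
2 y
EOF
cat file | join file -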
Use ‘join - file’ instead.
For versions of DJGPP before 2.04,
ln emulates symbolic links
to executables by generating a stub that in turn calls the real
program. This feature also works with nonexistent files like in the
Posix spec. So ‘ln -s file link’ generates link.exe,
which attempts to call file.exe if run. But this feature only
works for executables, so ‘cp -p’ is used instead for these
systems. DJGPP versions 2.04 and later have full support
for symbolic links.
On ancient hosts, ‘ls foo’ sent the diagnostic ‘foo not found’
to standard output if foo did not exist. Hence a shell command
like ‘sources=`ls *.c 2>/dev/null`’ did not always work, since it
was equivalent to ‘sources='*.c not found'’ in the absence of
‘.c’ files. This is no longer a practical problem, since current
ls implementations send diagnostics to standard error.
Rather than relying on ‘mkdir -p file-name’, use AS_MKDIR_P(file-name) (see Programming in M4sh) or AC_PROG_MKDIR_P (see Particular Programs).
Posix does not clearly specify whether ‘mkdir -p foo’ should succeed when foo is a symbolic link to an already-existing directory. The GNU Core Utilities 5.1.0 mkdir succeeds, but Solaris mkdir fails.
Traditional mkdir -p implementations suffer from race conditions. For example, if you invoke ‘mkdir -p a/b’ and ‘mkdir -p a/c’ at the same time, both processes might detect that a is missing, one might create a, then the other might try to create a and fail with a ‘File exists’ diagnostic. The GNU Core Utilities (‘fileutils’ version 4.1), FreeBSD 5.0, NetBSD 2.0.2, and OpenBSD 2.4 are known to be race-free when two processes invoke mkdir -p simultaneously, but earlier versions are vulnerable. Solaris mkdir is still vulnerable as of Solaris 10, and other traditional Unix systems are probably vulnerable too. This possible race is harmful in parallel builds when several Make rules call mkdir -p to construct directories. You may use ‘install-sh -d’ as a safe replacement, provided this script is recent enough; the copy shipped with Autoconf 2.60 and Automake 1.10 is OK, but copies from older versions are vulnerable.
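AS_MKDIR_P and a recent install-sh -d remain the recommended tools. To illustrate the idea that makes them race-tolerant, here is a hypothetical helper, mkdir_p, that creates each component in turn and treats a failed mkdir as harmless when the directory exists afterward, i.e., when another process won the race. It assumes the path contains no glob characters such as ‘*’ or ‘?’.

```shell
# Hypothetical helper: create $1 and missing parents without 'mkdir -p'.
mkdir_p () {
  mkp_dir=
  case $1 in
    /*) mkp_dir=/ ;;
  esac
  # Split the path on '/' into the positional parameters.
  mkp_save_ifs=$IFS
  IFS=/
  set x $1
  shift
  IFS=$mkp_save_ifs
  for mkp_comp
  do
    test -n "$mkp_comp" || continue   # skip empty components from '//'
    mkp_dir=$mkp_dir$mkp_comp
    # Losing the race is fine, as long as the directory exists afterward.
    test -d "$mkp_dir" || mkdir "$mkp_dir" 2>/dev/null || test -d "$mkp_dir" ||
      return 1
    mkp_dir=$mkp_dir/
  done
  return 0
}

mkdir_p build.tmp/x/y && echo "created build.tmp/x/y"
```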
Here is sample code to create a new temporary directory safely:
# Create a temporary directory $tmp in $TMPDIR (default /tmp).
# Use mktemp if possible; otherwise fall back on mkdir,
# with $RANDOM to make collisions less likely.
: ${TMPDIR=/tmp}
{
  tmp=`
    (umask 077 && mktemp -d "$TMPDIR/fooXXXXXX") 2>/dev/null
  ` &&
  test -n "$tmp" && test -d "$tmp"
} || {
  tmp=$TMPDIR/foo$$-$RANDOM
  (umask 077 && mkdir "$tmp")
} || exit $?
Moving individual files between file systems is portable (it was in Unix version 6), but it is not always atomic: when doing ‘mv new existing’, there's a critical section where neither the old nor the new version of existing actually exists.
On some systems moving files from /tmp can sometimes cause undesirable (but perfectly valid) warnings, even if you created these files. This is because /tmp belongs to a group that ordinary users are not members of, and files created in /tmp inherit the group of /tmp. When the file is copied, mv issues a diagnostic without failing:
$ touch /tmp/foo
$ mv /tmp/foo .
error-->mv: ./foo: set owner/group (was: 100/0): Operation not permitted
$ echo $?
0
$ ls foo
foo
This annoying behavior conforms to Posix, unfortunately.
Moving directories across mount points is not portable; use cp and rm instead.
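A minimal sketch of the cp and rm replacement; olddir.tmp and newdir.tmp are hypothetical stand-ins for a tree and its destination on another file system. The destination is written without a trailing slash, which some cp implementations reject for nonexistent directories, and the source is removed only if the copy succeeded.

```shell
rm -rf olddir.tmp newdir.tmp
mkdir olddir.tmp && echo data > olddir.tmp/file
# Copy first; delete the original only when cp reports success.
cp -R olddir.tmp newdir.tmp && rm -rf olddir.tmp
```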
Moving/Deleting open files isn't portable. The following can't be done on DOS variants:
exec > foo
mv foo bar
nor can
exec > foo
rm -f foo
This problem no longer exists in Mac OS X 10.4.3.
Avoid empty patterns within parentheses (i.e., ‘\(\)’). Posix does not require support for empty patterns, and Unicos 9 sed rejects them.
Unicos 9 sed loops endlessly on patterns like ‘.*\n.*’.
Sed scripts should not use branch labels longer than 8 characters and should not contain comments. HP-UX sed has a limit of 99 commands (not counting ‘:’ commands) and 48 labels, which cannot be circumvented by using more than one script file. It can execute up to 19 reads with the ‘r’ command per cycle. Solaris /usr/ucb/sed rejects usages that exceed a limit of about 6000 bytes for the internal representation of commands.
Avoid redundant ‘;’, as some sed implementations, such as NetBSD 1.4.2's, incorrectly try to interpret the second ‘;’ as a command:
$ echo a | sed 's/x/x/;;s/x/x/'
sed: 1: "s/x/x/;;s/x/x/": invalid command code ;
Input should not have unreasonably long lines, since some sed implementations have an input buffer limited to 4000 bytes.
Portable sed regular expressions should use ‘\’ only to escape characters in the string ‘$()*.0123456789[\^n{}’. For example, alternation, ‘\|’, is common but Posix does not require its support, so it should be avoided in portable scripts. Solaris sed does not support alternation; e.g., ‘sed '/a\|b/d'’ deletes only lines that contain the literal string ‘a|b’. Similarly, ‘\+’ and ‘\?’ should be avoided.
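A minimal sketch of avoiding ‘\|’ in sed: give each alternative its own command. The variable name kept is hypothetical.

```shell
# Portable replacement for sed '/a\|b/d': one d command per alternative.
kept=`printf 'a\nb\nc\na|b\n' | sed -e '/a/d' -e '/b/d'`
echo "$kept"
```

Every line containing ‘a’ or ‘b’ is deleted, including ‘a|b’, leaving only ‘c’.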
Anchors (‘^’ and ‘$’) inside groups are not portable.
Nested parenthesization in patterns (e.g., ‘\(\(a*\)b*\)’) is quite portable to current hosts, but was not supported by some ancient sed implementations like SVR3.
Some sed implementations, e.g., Solaris, restrict the special role of the asterisk to one-character regular expressions. This may lead to unexpected behavior:
$ echo '1*23*4' | /usr/bin/sed 's/\(.\)*/x/g'
x2x4
$ echo '1*23*4' | /usr/xpg4/bin/sed 's/\(.\)*/x/g'
x
The -e option is portable. Some people prefer to use it:
sed -e 'command-1' \
    -e 'command-2'
as opposed to the equivalent:
sed '
  command-1
  command-2
'
The following usage is sometimes equivalent:
sed 'command-1;command-2'
but Posix says that this use of a semicolon has undefined effect if command-1's verb is ‘{’, ‘a’, ‘b’, ‘c’, ‘i’, ‘r’, ‘t’, ‘w’, ‘:’, or ‘#’, so you should use semicolon only with simple scripts that do not use these verbs.
Commands inside { } brackets are further restricted. Posix says that they cannot be preceded by addresses, ‘!’, or ‘;’, and that each command must be followed immediately by a newline, without any intervening blanks or semicolons. The closing bracket must be alone on a line, other than white space preceding or following it.
Contrary to yet another urban legend, you may portably use ‘&’ in the replacement part of the s command to mean “what was matched”. All descendants of Unix version 7 sed (at least; we don't have first-hand experience with older sed implementations) have supported it.
Posix requires that you must not have any white space between ‘!’ and the following command. It is OK to have blanks between the address and the ‘!’. For instance, on Solaris:
$ echo "foo" | sed -n '/bar/ ! p'
error-->Unrecognized command: /bar/ ! p
$ echo "foo" | sed -n '/bar/! p'
error-->Unrecognized command: /bar/! p
$ echo "foo" | sed -n '/bar/ !p'
foo
Posix also says that you should not combine ‘!’ and ‘;’. If you use ‘!’, it is best to put it on a command that is delimited by newlines rather than ‘;’.
Also note that Posix requires that the ‘b’, ‘t’, ‘r’, and
‘w’ commands be followed by exactly one space before their argument.
On the other hand, no white space is allowed between ‘:’ and the
subsequent label name.
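To make the rules above concrete, here is a small script written to follow them: ‘{’ and ‘}’ on lines of their own, one command per line inside the braces with no addresses, exactly one space after ‘b’, no space between ‘:’ and the label, and no comments inside the script. The label done and the variable out are hypothetical.

```shell
# Lines containing foo are upcased and printed as is; others get a '!'.
out=`printf 'foo\nbar\n' | sed -n '
/foo/{
s/foo/FOO/
b done
}
s/$/!/
:done
p
'`
echo "$out"
```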
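Some old sed implementations “forget” to reset their ‘t’ flag when starting a new cycle. For instance, if you run the following sed script (the comments only label the commands and are not part of the script):
s/keep me/kept/g  # a
t end             # b
s/.*/deleted/g    # c
:end              # d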
on
delete me         # 1
delete me         # 2
keep me           # 3
delete me         # 4
you get
deleted
delete me
kept
deleted
instead of
deleted
deleted
kept
deleted
Why? When processing line 1, (c) matches, therefore sets the ‘t’ flag, and the output is produced. When processing line 2, the ‘t’ flag is still set (this is the bug). Command (a) fails to match, but sed is not supposed to clear the ‘t’ flag when a substitution fails. Command (b) sees that the flag is set, therefore it clears it, and jumps to (d), hence you get ‘delete me’ instead of ‘deleted’. When processing line 3, ‘t’ is clear, (a) matches, so the flag is set, hence (b) clears the flag and jumps. Finally, since the flag is clear, line 4 is processed properly.
There are two things one should remember about ‘t’ in sed. Firstly, always remember that ‘t’ jumps if some substitution succeeded, not only the immediately preceding substitution. Therefore, always use a fake ‘t clear’ followed by a ‘:clear’ on the next line, to reset the ‘t’ flag where needed.
Secondly, you cannot rely on sed to clear the flag at each new cycle.
One portable implementation of the script above is:
t clear
:clear
s/keep me/kept/g
t end
s/.*/deleted/g
:end
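To check the portable script against the four sample lines, it can be run as follows (result is a hypothetical variable name). The leading ‘t clear’ jumps to the very next line, so its only effect is to reset the flag at the start of each cycle, whatever the sed implementation does on its own.

```shell
result=`printf 'delete me\ndelete me\nkeep me\ndelete me\n' | sed 't clear
:clear
s/keep me/kept/g
t end
s/.*/deleted/g
:end'`
echo "$result"
```

The expected output is the correct ‘deleted’, ‘deleted’, ‘kept’, ‘deleted’ sequence shown above.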
If you specify the desired timestamp (e.g., with the -r option), touch typically uses the utime or utimes system call, which can result in the same kind of timestamp truncation problems that ‘cp -p’ has.
On ancient BSD systems, touch or any command that results in an empty file does not update the timestamps, so use a command like echo as a workaround. Also, GNU touch 3.16r (and presumably all before that) fails to work on SunOS 4.1.3 when the empty file is on an NFS-mounted 4.2 volume. However, these problems are no longer of practical concern.