GNU software that deals with regular expressions provides a number of additional regexp operators. These operators are described in this section and are specific to gawk; they are not available in other awk implementations. Most of the additional operators deal with word matching. For our purposes, a word is a sequence of one or more letters, digits, or underscores (`_'):
\w
[[:alnum:]_]
.
\W
[^[:alnum:]_]
.
\<
/\<away/
matches `away' but not
`stowaway'.
\>
/stow\>/
matches `stow' but not `stowaway'.
\y
\B
/\Brat\B/
matches `crate' but it does not match `dirty rat'.
`\B' is essentially the opposite of `\y'.
There are two other operators that work on buffers. In Emacs, a buffer is, naturally, an Emacs buffer. For other programs, gawk's regexp library routines consider the entire string to match as the buffer. The operators are:
\`
\'
Because `^' and `$' always work in terms of the beginning and end of strings, these operators don't add any new capabilities for awk. They are provided for compatibility with other GNU software.
In other GNU software, the word-boundary operator is `\b'. However, that conflicts with the awk language's definition of `\b' as backspace, so gawk uses a different letter. An alternative method would have been to require two backslashes in the GNU operators, but this was deemed too confusing. The current method of using `\y' for the GNU `\b' appears to be the lesser of two evils.
The various command-line options (see Options) control how gawk interprets characters in regexps:
--posix
--traditional
[[:alnum:]]
, etc.).
Characters described by octal and hexadecimal escape sequences are
treated literally, even if they represent regexp metacharacters.
--re-interval