Backslash Escapes

Sometimes you will want a regexp to match characters like * or $
exactly. For example, to check whether a particular string represents
a menu entry from an Info node, it would be useful to match it against
a regexp like ^* [^:]*::. However, this won't work; because the
asterisk is a metacharacter, it won't match the * at the beginning of
the string. In this case, we want to make the first asterisk un-magic.

You can do this by preceding the metacharacter with a backslash
character \. (This is also called quoting the metacharacter, and is
known as a backslash escape.) When Guile sees a backslash in a regular
expression, it considers the following glyph to be an ordinary
character, no matter what special meaning it would ordinarily have.
Therefore, we can make the above example work by changing the regexp to
^\* [^:]*::. The \* sequence tells the regular expression engine to
match only a single asterisk in the target string.

Since the backslash is itself a metacharacter, you may force a regexp to
match a backslash in the target string by preceding the backslash with
itself. For example, to find variable references in a TeX program,
you might want to find occurrences of the string \let\ followed by any
number of alphabetic characters. The regular expression
\\let\\[A-Za-z]* would do this: the double backslashes in the regexp
each match a single backslash in the target string.

Scheme Procedure: regexp-quote str
    Quote each special character found in str with a backslash, and
    return the resulting string.

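As a brief illustration of regexp-quote, the following sketch assumes that the (ice-9 regex) module is loaded, since that is where Guile provides regexp-quote, string-match and match:substring. The sample strings are invented for this example. It deliberately does not show the exact text that regexp-quote returns; it only checks that the quoted pattern matches its input literally.

```scheme
(use-modules (ice-9 regex))

;; Unquoted, the dot is a metacharacter that matches any character,
;; so the pattern 3.14 also matches the string 3x14.
(define unquoted-hit (string-match "3.14" "3x14"))

;; After quoting, the dot only matches a literal dot.
(define quoted-pattern (regexp-quote "3.14"))
(define quoted-miss (string-match quoted-pattern "3x14"))
(define quoted-hit  (string-match quoted-pattern "pi is 3.14"))

(display (if unquoted-hit "match" "no match")) (newline)
(display (if quoted-miss "match" "no match")) (newline)
(display (match:substring quoted-hit)) (newline)
```

This is mainly useful when part of a pattern comes from data, such as a file name or user input, that should be matched as-is.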
Very important: Using backslash escapes in Guile source code
(as in Emacs Lisp or C) can be tricky, because the backslash character
has special meaning for the Guile reader. For example, if Guile
encounters the character sequence \n in the middle of a string while
processing Scheme code, it replaces those characters with a newline
character. Similarly, the character sequence \t is replaced by a
horizontal tab. Several of these escape sequences are processed by the
Guile reader before your code is executed. Unrecognized escape
sequences are ignored: if the characters \* appear in a string, they
will be translated to the single character *.

This translation is obviously undesirable for regular expressions, since
we want to be able to include backslashes in a string in order to
escape regexp metacharacters. Therefore, to make sure that a backslash
is preserved in a string in your Guile program, you must use two
consecutive backslashes:
(define Info-menu-entry-pattern (make-regexp "^\\* [^:]*"))
The string in this example is preprocessed by the Guile reader before
any code is executed. The resulting argument to make-regexp is the
string ^\* [^:]*, which is what we really want.
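
The effect of the reader on the doubled backslash can be checked directly. The sketch below is only an illustration: the names pattern-source, entry-pattern, menu-line and plain-line are hypothetical, and the sample lines are not taken from a real Info file. It assumes (ice-9 regex) is loaded for match:substring.

```scheme
(use-modules (ice-9 regex))

;; In the source text the pattern contains two backslashes; the string
;; the reader hands to make-regexp contains only one.
(define pattern-source "^\\* [^:]*")
(define entry-pattern (make-regexp pattern-source))

;; Hypothetical sample lines.
(define menu-line "* Emacs:: The extensible editor")
(define plain-line "Emacs is an editor")

(define menu-match (regexp-exec entry-pattern menu-line))

;; Nine characters: ^ \ * space [ ^ : ] *
(display (string-length pattern-source)) (newline)   ; prints 9
(display (match:substring menu-match)) (newline)     ; prints * Emacs
;; regexp-exec returns #f when the line does not start with "* ".
(display (regexp-exec entry-pattern plain-line)) (newline)   ; prints #f
```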
This also means that in order to write a regular expression that matches
a single backslash character, the regular expression string in the
source code must include four backslashes. Each consecutive pair
of backslashes gets translated by the Guile reader to a single
backslash, and the resulting double-backslash is interpreted by the
regexp engine as matching a single backslash character. Hence:
(define tex-variable-pattern (make-regexp "\\\\let\\\\[A-Za-z]*"))
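
To see the two levels of escaping at work, the following sketch builds a pattern from the regular expression given earlier, \\let\\[A-Za-z]*, and applies it to a made-up line of TeX. The names tex-pattern-source, tex-pattern and tex-line are hypothetical, and (ice-9 regex) is assumed to be loaded for match:substring.

```scheme
(use-modules (ice-9 regex))

;; Four backslashes in the source become two in the string, and the
;; regexp engine reads those two as one literal backslash.
(define tex-pattern-source "\\\\let\\\\[A-Za-z]*")
(define tex-pattern (make-regexp tex-pattern-source))

;; The target text as it would appear in a TeX file: \let\foo=\bar
;; Its backslashes must be doubled in Scheme source as well.
(define tex-line "\\let\\foo=\\bar")

(define tex-match (regexp-exec tex-pattern tex-line))

(display tex-pattern-source) (newline)           ; prints \\let\\[A-Za-z]*
(display (match:substring tex-match)) (newline)  ; prints \let\foo
```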
The reason for the unwieldiness of this syntax is historical. Both regular expression pattern matchers and Unix string processing systems have traditionally used backslashes with the special meanings described above. The POSIX regular expression specification and ANSI C standard both require these semantics. Attempting to abandon either convention would cause other kinds of compatibility problems, possibly more severe ones. Therefore, without extending the Scheme reader to support strings with different quoting conventions (an ungainly and confusing extension when implemented in other languages), we must adhere to this cumbersome escape syntax.