21.5.3 Backslash Escapes

Sometimes you will want a regexp to match characters like * or $ exactly. For example, to check whether a particular string represents a menu entry from an Info node, it would be useful to match it against a regexp like ^* [^:]*::. However, this won't work; because the asterisk is a metacharacter, it won't match the * at the beginning of the string. In this case, we want to make the first asterisk un-magic.

You can do this by preceding the metacharacter with a backslash character \. (This is called quoting the metacharacter, and the resulting sequence is known as a backslash escape.) When Guile sees a backslash in a regular expression, it treats the following character as an ordinary character, no matter what special meaning it would ordinarily have. Therefore, we can make the above example work by changing the regexp to ^\* [^:]*::. The \* sequence tells the regular expression engine to match a single literal asterisk in the target string.

Since the backslash is itself a metacharacter, you may force a regexp to match a backslash in the target string by preceding the backslash with itself. For example, to find variable references in a TeX program, you might want to find occurrences of the string \let\ followed by any number of alphabetic characters. The regular expression \\let\\[A-Za-z]* would do this: the double backslashes in the regexp each match a single backslash in the target string.

Scheme Procedure: regexp-quote str
Quote each special character found in str with a backslash, and return the resulting string.
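
As a rough illustration of regexp-quote, the sketch below builds a pattern from a literal string that contains metacharacters. It assumes the (ice-9 regex) module, which provides regexp-quote, string-match, and match:substring; the sample strings and the names literal, quoted, and m are made up for this example.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical literal text containing several metacharacters.
(define literal "* Price: $5.00")

;; Quote the metacharacters so the text is matched literally.
(define quoted (regexp-quote literal))

;; Search for the quoted text inside a longer (hypothetical) string.
(define m (string-match quoted "Menu: * Price: $5.00 each"))

;; Print the part of the target string that matched.
(display (match:substring m))
(newline)
```

Quoting this way is mainly useful when the text to search for comes from user input or another program, so its metacharacters cannot be escaped by hand.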

Very important: Using backslash escapes in Guile source code (as in Emacs Lisp or C) can be tricky, because the backslash character has special meaning for the Guile reader. For example, if Guile encounters the character sequence \n in the middle of a string while processing Scheme code, it replaces those characters with a newline character. Similarly, the character sequence \t is replaced by a horizontal tab. The Guile reader processes several such escape sequences before your code is executed. In an unrecognized escape sequence the backslash is simply dropped: if the characters \* appear in a string, they will be translated to the single character *.
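
A small sketch of the reader's behavior, using only core string procedures, may make this concrete. The variable names are hypothetical; the point is that the string the program sees can be shorter than the text typed between the quotes.

```scheme
;; "\n" is read as a single character: a newline.
(define with-newline "a\nb")

;; "\\n" is read as two characters: a backslash followed by the letter n.
(define with-backslash "a\\nb")

(display (string-length with-newline))   ; 3
(newline)
(display (string-length with-backslash)) ; 4
(newline)
```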

This translation is obviously undesirable for regular expressions, since we want to be able to include backslashes in a string in order to escape regexp metacharacters. Therefore, to make sure that a backslash is preserved in a string in your Guile program, you must use two consecutive backslashes:

(define Info-menu-entry-pattern (make-regexp "^\\* [^:]*"))

The string in this example is preprocessed by the Guile reader before any code is executed. The resulting argument to make-regexp is the string ^\* [^:]*, which is what we really want.
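
To see the effect of the doubled backslash, the following sketch checks the length of the string the reader produces and then tries the compiled pattern on two sample lines. It assumes the (ice-9 regex) module for match:substring; the sample lines and the names pattern-string, entry, and non-entry are invented for illustration.

```scheme
(use-modules (ice-9 regex))

;; The source text "^\\* [^:]*" is read as the 9-character string ^\* [^:]*
(define pattern-string "^\\* [^:]*")
(define Info-menu-entry-pattern (make-regexp pattern-string))

;; Hypothetical sample lines: one menu entry, one ordinary line.
(define entry
  (regexp-exec Info-menu-entry-pattern "* Backslash Escapes::"))
(define non-entry
  (regexp-exec Info-menu-entry-pattern "Backslash Escapes::"))

(display (string-length pattern-string)) ; 9
(newline)
(display (match:substring entry))        ; * Backslash Escapes
(newline)
(display non-entry)                      ; #f
(newline)
```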

This also means that in order to write a regular expression that matches a single backslash character, the regular expression string in the source code must include four backslashes. Each consecutive pair of backslashes gets translated by the Guile reader to a single backslash, and the resulting double-backslash is interpreted by the regexp engine as matching a single backslash character. Hence:

(define tex-variable-pattern (make-regexp "\\\\let\\\\[A-Za-z]*"))
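
The sketch below tries the four-backslash form on a sample line. It compiles the regexp \\let\\[A-Za-z]* described earlier under the hypothetical name tex-let-pattern, and assumes the (ice-9 regex) module for match:substring; the TeX fragment is invented for this example.

```scheme
(use-modules (ice-9 regex))

;; Four backslashes in the source become two in the string,
;; which the regexp engine reads as one literal backslash.
(define tex-let-pattern (make-regexp "\\\\let\\\\[A-Za-z]*"))

;; Hypothetical TeX fragment; each "\\" here is one backslash in the string.
(define tex-line "\\def\\x{1} \\let\\foo=\\bar")

(define m (regexp-exec tex-let-pattern tex-line))

(display (match:substring m)) ; \let\foo
(newline)
```

Note that the target string needs the same care as the pattern: every backslash it is meant to contain must also be doubled in the source.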

The reason for the unwieldiness of this syntax is historical. Both regular expression pattern matchers and Unix string processing systems have traditionally used backslashes with the special meanings described above. The POSIX regular expression specification and ANSI C standard both require these semantics. Attempting to abandon either convention would cause other kinds of compatibility problems, possibly more severe ones. Therefore, without extending the Scheme reader to support strings with different quoting conventions (an ungainly and confusing extension when implemented in other languages), we must adhere to this cumbersome escape syntax.