

AWK Patterns

This section dissects the built-in awk patterns one by one to show how they work.

%%PATTERN  "<awk><nul>" "<nul>" "" "<awk><sor>" "<sor>"
%%PATTERN  "<awk><sor>" "<wht>" "" "<awk><sor>" "<nul>"
%%PATTERN  "<awk><sor>" "<nul>" "" "<awk><sof>" "<sof>"
%%PATTERN  "<awk><sof>" "<del>" "" "<awk><sor>" "<eof>"
%%PATTERN  "<awk><sof>" "<any>" "" "<awk><sof>" "<fld>"
%%PATTERN  "<awk><sof>" "<nul>" "" "<awk><sor>" "<eof>"
%%PATTERN  "<awk><sor>" "<new>" "" "<awk><nul>" "<eor>"

The first pattern matches in the default initial mode. It matches on any character in the input stream without consuming it, switches to start of record mode, and returns the start of record token.

In start of record mode there are three possible matching patterns: whitespace, the <nul> character and the newline character. Whitespace is ignored (so that multiple spaces and tabs in the input are not interpreted as multiple fields). The <nul> pattern flags the start of a field and sets the mode to start of field mode; since it is defined after the pattern for whitespace, it only matches when the input is not whitespace. The newline pattern (which should actually also be defined above the <nul> pattern) sets the mode back to the initial mode and flags the end of the record.

The remaining patterns only match in start of field mode. The three possible matches are the delimiter (normally whitespace), any other character except newline, and <nul> again. The order in which these are declared is important. A delimiter character always matches first; it flags the end of the field and sets the mode back to start of record mode (in preparation for another field or the end of the record). Any other character except newline matches the next pattern, which leaves the mode unchanged and flags that the character is to be appended to the field. Finally, on a newline the <nul> pattern matches, flags the end of the field and puts the mode back to start of record mode.

Note that matching the newline itself here (rather than <nul>) would not work: the newline signals the end of the record, but it would be consumed in flagging the end of the field, and the end of the record would then never be flagged. Using <nul> returns end of field on the newline but leaves the newline on the input stream, so that in start of record mode it can be matched by the last pattern definition to indicate the end of the record. This is necessary because the last field usually has no delimiter after it.
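
The walk-through above can be tried out with a small simulator. The following Python sketch is not the tool's own engine; it is a hypothetical model that reads each %%PATTERN line as (current mode, match, next mode, token), assumes patterns are tried top to bottom within a mode, assumes <wht> and <del> both mean space or tab, and places the newline pattern above the <nul> pattern in start of record mode, as the text suggests it should be. The names RULES, tokenize and records are made up for illustration.

```python
# Hypothetical simulator for the seven awk patterns; not the tool's real engine.
SPACE = " \t"  # assumption: <wht> and <del> are both space or tab

# (mode, matcher, next_mode, token), tried top to bottom within a mode.
# A matcher of None stands for <nul>: it matches without consuming input.
RULES = [
    ("nul", None, "sor", "<sor>"),
    ("sor", lambda c: c in SPACE, "sor", None),      # <wht>: skipped, no token
    ("sor", lambda c: c == "\n", "nul", "<eor>"),    # <new>, placed above <nul>
    ("sor", None, "sof", "<sof>"),
    ("sof", lambda c: c in SPACE, "sor", "<eof>"),   # <del> ends the field
    ("sof", lambda c: c != "\n", "sof", "<fld>"),    # <any> except newline
    ("sof", None, "sor", "<eof>"),                   # newline left on the input
]


def tokenize(text):
    """Return (token, char) pairs; char is only set for <fld> tokens."""
    mode, pos, out = "nul", 0, []
    while pos < len(text):
        ch = text[pos]
        for rule_mode, matcher, next_mode, token in RULES:
            if rule_mode != mode:
                continue
            if matcher is None or matcher(ch):
                if matcher is not None:
                    pos += 1  # only real character matches consume input
                if token is not None:
                    out.append((token, ch if token == "<fld>" else ""))
                mode = next_mode
                break
    return out


def records(text):
    """Assemble the token stream into a list of records (lists of fields)."""
    recs, fields, current = [], [], []
    for token, ch in tokenize(text):
        if token == "<sor>":
            fields = []
        elif token == "<sof>":
            current = []
        elif token == "<fld>":
            current.append(ch)
        elif token == "<eof>":
            fields.append("".join(current))
        elif token == "<eor>":
            recs.append(fields)
    return recs


print(records("one  two\tthree\nfour\n"))
# [['one', 'two', 'three'], ['four']]
```

In this model the last field of each line is closed by the <nul> rule in start of field mode, and the newline it leaves behind is then consumed by the newline rule in start of record mode. The sketch expects every record, including the last, to end with a newline.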

