Author: Laurence D. Finston.
This copyright notice applies to the text and source code of this web site, and the graphics that appear on it. The software described in this text has its own copyright notice and license, which can be found in the distribution itself.
Copyright (C) 2005, 2006 The Free Software Foundation
Permission is granted to copy, distribute, and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of this license is included in the file COPYING.TXT
Last updated: April 29, 2006
Top |
Parser |
Introduction to the Grammar |
Declarations |
Assignments |
Contact |
The GNU 3DLDF language is implemented by means of a parser. This is actually a C++ function called yyparse(), which is generated by a package called GNU Bison from a parser input file.
Parser input files for Bison have a special form. They consist largely of rules written in a special syntax with associated actions, written in C or C++. Bison reads such an input file and generates a file C or C++ code with the definition of yyparse().
Bison expects a single input file. In the case of 3DLDF, this file is
generated by ctangle from a large set of CWEB files. These are
the files in the directory 3DLDF-1.2.0.0/CWEB/
with names of the pattern p*.w
, where *
stands
for any sequence of characters. Names that fit this pattern are, for
example, parser.w, pbsndecl.w, pldfdcl.w, etc.
There are a couple of steps involved, but when ctangle and Bison are finished, there's a file of C++ called parser.c++ and two header files called parser.h and parser.h++. The header files are included in some of the other source files, and parser.c++ is compiled and linked with the other object files to create the executable 3dldf.
Bison also generates the file parser.output as a by-product of its operation. It's not needed for building the package, but I include it in the distribution as a convenience. It contains information about the parser function, including a list of the grammar rules in Backus-Naur form. It is an important source of information for anybody who wants to learn to use the interactive version of GNU 3DLDF, since I have not yet rewritten the manual. See Introduction to the Grammar for more information.
If you want to start learning about the source code of the parser, the place to start is parser.w. This file includes all of the other source files for the parser.
The file parser.output may seem intimidating at first glance, but it's not that hard to learn to interpret. And once you do, the grammar of the 3DLDF language will be an open book to you.
Please note that parser.output changes with every change to the grammar, and this happens frequently. So the examples on this page may not correspond exactly to what's in the most recent version of parser.output. However, I will mostly be talking about general principles here, so it shouldn't matter.
parser.output starts out with a copyright notice and a license.
After that, you'll find a line with the text
Terminals which are not used
followed by a long list of words
in capital letters. Another long list of lines beginning with the
word State
follows. Finally, you get to a line containing only
the word Grammar
. We've arrived!
The text in this section is a numbered list of the grammar rules of
the 3DLDF language in a special form called Backus-Naur
form. Rule 0 is automatically generated by Bison. Rule 1 is the
so-called start symbol of the language. It looks like this:
1 program: statement_list END
This means that in 3DLDF, a program is a statement list
followed by the token END.
In Bison, a token, also known as a
terminal symbol or just terminal for short,
is a basic unit of a language. They are conventionally written in
capital letters. Other parts of speech
are called
non-terminal symbols or just non-terminals.
They are built up out of tokens in the course of parsing input.
The tokens come from the scanner, which is a function
called yyscan(), but that's another story.
In Rule 1, program and statement_list are written in
lowercase letters, so they are non-terminal symbols. In addition,
there is a colon with program on the left-hand side
(LHS) and statement_list END on the right-hand side (RHS).
The colon can be read as is
, i.e.,
program is statement_list followed by END
.
Another way to read this is
statement_list END reduces to program
.
This is really what yyparse() is doing: It tries to match the
sequence of tokens it receives from the scanner to rules. When it does,
it goes into a particular state. These states pile up on a
stack until they can be reduced.
Then, they can begin to pile up again. Ultimately,
a valid input will reduce to the state corresponding to Rule 1, i.e.,
the rule for the start symbol, program.
The next two rules look like this:
2 statement_list: /* empty */ 3 | statement_list statement
empty, or a statement_list followed by a statement.
/*and
*/enclose a comment, so Rule 2 really has nothing at all on its right-hand side. The
|symbol means
or.
Rules 2 and 3 could be restated as follows:
2 statement_list: /* empty */ 3 statement_list: statement_list statement
Rules 2 and 3 are an example of recursion. Let's say yyparse()
starts parsing an input and it gets a statement (whatever that is).
It can now try to reduce to the state corresponding to Rule 3.
All we have is a statement, but it matches, because nothing
is a valid statement_list.
Now we scan another statement. We can match Rule 3 again, because we've got a statement_list from before, followed by our current statement. So, in effect, we're appending statements onto a statement_list.
To stop, all we have to do is get the scanner to pass an END token
to the parser. This is easy. The string end
, written as a separate word
in the input, following a statement, will do the trick.
In fact, this works for most tokens: There's usually a character or string
that corresponds or maps to them.
For example, these are the rules for statement:
4 statement: SEMI_COLON 5 | END_INPUT 6 | equation SEMI_COLON 7 | declaration SEMI_COLON 8 | assignment SEMI_COLON 9 | operation_assignment SEMI_COLON 10 | macro_definition 11 | command SEMI_COLON 12 | conditional 13 | loop 14 | macro_call 15 | let_statement SEMI_COLON
;and END_INPUT maps to the string
end_input. In fact, END_INPUT also maps to
endinput, so
end_inputand
endinputcan be used interchangeably. In other words, they are synonyms.
The exception is tokens used for declaring objects of types defined in
the 3DLDF language. For example, when an input contains the line
point p;
the string point
doesn't map to the token POINT,
which does exist, but rather to the token POINT_DECLARATOR.
The same applies to boolean
, transform
, path
, and
all of the other types defined in the 3DLDF language.
So what strings do map to POINT, PATH, etc.?
Well, none at all. When one is needed, the parser passes it to the scanner,
which passes it back to the parser. I call this
faking a token
, although such tokens are no less real than
ones created by scanning input. The reason for this is the separation
between the rules and the actions. Information about the type of
a variable can be found in an action, but it's not possible to
influence the course of parsing except by means of the stream of tokens
passed to the parsing function. Faking
tokens is a means
of doing this.
Most kinds of statements will require their own chapter. However, a few of them are so simple that I can explain them here.
4 statement: SEMI_COLON
This is the magic rule that makes it possible to have extra
semi-colons between statements.
5 statement: END_INPUT
When the parser receives an END_INPUT
token, it stops
reading from the current input stream. This rule is actually a bit more complicated
than it appears at first glance. The behavior of 3DLDF depends on what the
current input source is and whether it's using multiple threads to read input.
I will explain about this elsewhere.
6 statement: equation SEMI_COLON
Equations à la METAFONT do not yet exist in 3DLDF, so this rule is
currently non-functional.
NEXT CHAPTER: | Declarations |