|
|
The algorithm yacc uses to go from the specification to the C code for the parser is complex and is not discussed here. The parser itself is relatively simple and an understanding of how it works, while not essential, makes treatment of error recovery and ambiguities easier.
The parser produced by yacc consists of a finite-state machine with a state stack. The parser is also capable of reading and remembering the next input token, called the ``look-ahead token''. The current state is always the one on the top of the stack. The states of the finite-state machine are given small integer labels. In addition to the state stack, a parallel stack, the value stack, holds the values returned by the lexical analyzer and the actions. Initially, the machine is in state 0 (that is, the state stack contains only state 0) and no look-ahead token has been read.
The machine has only five actions available: shift, reduce, goto, accept, and error. The goto is always performed as a component of a reduce action. The parser operates in the following manner:
The shift action is the most common one the parser takes. A shift occurs whenever a token is recognized. Whenever a shift action is taken, there is always a look-ahead token. A shift is represented in y.output as follows (assume that the current state is 56):
LOOP shift 34This says that in state 56, if the look-ahead token is LOOP, then the current state (56) is pushed down on the stack, and state 34 becomes the current state, that is, it is pushed on the stack. In addition, the look-ahead token is cleared, and the variable yylval is pushed on to the value stack.
A reduce action takes place when the parser determines that all the items on the right-hand side of some grammar rule have been seen. In a reduce action, all the states that were put on the stack while the right-hand side of the rule was being recognized are popped from the stack. Then a new state is put on the stack, based on the symbol to the left of the rule, the state that is currently at the top of the stack, and sometimes the look-ahead token. Suppose the following rule is being reduced:
A : x y z ;The reduce action depends on the symbol to the left of the colon and the number of symbols on the right-hand side (in this case, three). This reduction first involves popping three states off the top of the stack. (In general, the number of states popped equals the number of symbols on the right side of the rule.) In effect, these states were the ones put on the stack while recognizing x, y, and z and no longer serve any useful purpose. After these states have been popped, the state that is on top of the stack is the one the parser was in before it recognized any of the symbols on the right side of the rule. Next, something similar to a shift of 'A' using this uncovered state is performed. A new state is obtained and pushed onto the stack, and parsing continues. This action is called a goto. There are differences between a goto and an ordinary shift of a token. In particular, the look-ahead token is cleared by a shift but is not affected by a goto. Sometimes, but not usually, it is necessary for the parser to refer to the look-ahead token to decide whether or not to reduce.
In effect, the reduce action turns back the clock in the parse, popping the states off the stack so that the stack has the same contents as before the symbols on the right side of the rule were recognized. The parser then treats the symbol on the left side of the rule as if it were an input token and performs an action accordingly. Note that if the rule has an empty right-hand side, no states are popped off the stack.
The reduce actions are associated with individual grammar rules. In the y.output file, these rules are given small integer numbers, which could lead to some confusion. The following action refers to ``grammar rule'' 18:
. reduce 18This action refers to state 34:
LOOP shift 34In any case, the state, which is uncovered when symbols are popped off the stack on a reduce, will contain an entry such as the following:
A goto 20If the left side of the current rule consists of the symbol ``A'', this action causes state 20 to be pushed onto the stack.
The reduce action is also important in the treatment of user-supplied actions and values. When a rule is reduced, the code supplied with the rule is executed before the stack is adjusted. When a shift takes place, the external variable yylval is copied onto the value stack. A reduction takes place after the action code associated with the rule is carried out. When the goto action is done, the external variable yyval is copied onto the value stack. The pseudo-variables $1, $2, and so on, refer to the value stack.
The other two parser actions are conceptually much simpler. The accept action indicates that the entire input has been seen and that it matches the specification. This action appears only when the look-ahead token is the end-marker, and indicates that the parser has done its job.
The occurrence of accept can be simulated in an action by use of the macro YYACCEPT. The YYACCEPT macro causes yyparse() to return the value 0, indicating a successful parse. Here is an example of its use:
quest : wealth | love | holygrail { YYACCEPT; } ;
The error action, on the other hand, represents a place where the parser can no longer continue parsing according to the specification. The input tokens it has seen (together with the look-ahead token) cannot be followed by anything that would result in a legal input. The parser reports an error and attempts to recover the situation and resume parsing. Error recovery is discussed later.
The error parser action can be simulated in a action code by use of the macro YYERROR. YYERROR causes the parser to behave as if the current input symbol had been a syntax error. The function yyerror() is called and error recovery takes place. Here is an example of such usage:
seq: first second third { if ($1<$2 || $2<$3) { printf("Values out of order!\n"); YYERROR; } }
Consider the following yacc specification:
%token DING DONG DELL %% rhyme : sound place ; sound : DING DONG ; place : DELL ;The y.output file corresponding to the preceding grammar (with some statistics stripped off the end) is:
state 0 $accept : \_rhyme $endThe actions for each state are specified, with a description of the parsing rules processed in each state. The underscore (_) character in a rule separates the symbols that have been seen from those that are expected to follow. The following input can be used to track the operations of the parser:DING shift 3 . error
rhyme goto 1 sound goto 2
state 1 $accept : rhyme\_$end
$end accept . error
state 2 rhyme : sound\_place
DELL shift 5 . error
place goto 4
state 3 sound : DING\_DONG
DONG shift 6 . error
state 4 rhyme : sound place\_ (1)
. reduce 1
state 5 place : DELL\_ (3)
. reduce 3
state 6 sound : DING DONG\_ (2)
. reduce 2
DING DONG DELLThe input is processed in the following steps:
sound : DING DONGIn the reduction, two states, 6 and 3, are popped off the stack, uncovering state 0.
sound goto 2This causes state 2 to be pushed onto the stack and become the current state.
place : DELLThis rule has one symbol on the right-hand side, so one state, 5, is popped off, and state 2 is uncovered.
rhyme : sound placeThere are two symbols on the right, so the top two states are popped off, uncovering state 0.