%tree
directive.
Nodes in the AST are blessed in the production
name.
A rhs can be
named using the %name IDENTIFIER directive.
For each rhs name a
class/package with name IDENTIFIER is created.
Symbolic tokens (like NUM
PRINT or VAR)
are considered by default semantic tokens.
String literals
(like '+', '/', etc.)
are - unless explictly
declared using the semantic token directive -
considered syntactic tokens.
When building the AST syntactic tokens do not yield
new nodes.
Semantic tokens however have their own. Thus
when feed with input b=2*a
the generated parser
produces the following AST1:
~/LEyapp/examples/ParsingStringsAndTrees$ perl -wd infix2pir.pl # start the debugger Loading DB routines from perl5db.pl version 1.31 Editor support available. Enter h or `h h' for help, or `man perldebug' for more help. main::(infix2pir.pl:48): my $filename = shift; DB<1> l 48,55 # let us see the source code ... 48==> my $filename = shift; 49: my $parser = Infix->new(); 50: $parser->slurp_file($filename); 51 52: my $t = $parser->YYParse(); 53 54 # Machine independent optimizations 55: $t->s(our @algebra); DB<2> c 55 # continue until the abstract syntax tree (AST) is built b = 2 * a # <- user input main::(infix2pir.pl:55): $t->s(our @algebra); DB<3> p $t->str # show us the AST line_3(EXPS(sts_5(ASSIGN(VAR(TERMINAL[b]),TIMES(NUM(TERMINAL[2]),VAR(TERMINAL[a]))))))Nodes of the AST are hashes that can be decorated with new keys/attributes. The only reserved field is
children which is a reference to the
array of children.
Nodes named TERMINAL are built from the
tokens provided by the lexical analyzer.
The couple ($token, $attribute) returned by the lexical analyzer
is stored under the keys token and attr.
TERMINAL nodes also have the attribute children which is
set to an anonymous empty list.
Observe the absence of TERMINAL nodes corresponding to
tokens '=' and '*'.
If we change the status of '*' and '='
to semantic using the %semantic token directive:
1 %semantic token '*' '=' 2 %right '=' 3 .... etc.we get a - concrete - syntax tree:
EXPS(
ASSIGN(
VAR(TERMINAL[b]),
TERMINAL[=],
TIMES(
NUM(TERMINAL[2]),
TERMINAL[*],
VAR(TERMINAL[a])
) # TIMES
) # ASSIGN
)
Let us now consider the input 2*(a+1).
The parser yields the tree:
EXPS(
TIMES(
NUM(
TERMINAL[2]),
exp_14(
PLUS(
VAR(TERMINAL[a]),
NUM(TERMINAL[1]))
) # PLUS
) # TIMES
)
Two features are noticeable: the parenthesis rule exp: '(' exp ')'
had no name
and got automatically one: exp_14. The name of a rhs by
default results from concatenating the left hand side of the rule
with the ordinal number of the rule2.
The second is that node exp_14 is useless and can be suppressed.
The %tree directive can be accompanied of the %bypass
clause. A %tree bypass produces an automatic bypass of any
node with only one child at tree-construction-time.
A bypass operation consists in returning the only child
of the node being visited to the father of the node and re-typing (re-blessing)
the node in the name of the production3.
Changing the line %tree by %tree bypass
in file Infix.eyp
we get a more suitable
AST for input 2*(a+1):
EXPS(TIMES(NUM[2],PLUS(VAR[a],NUM[1])))
The node exp_14 has disapeared in this version
since the bypass operation applies to the rhs
of the rule exp: '(' exp ')':
Tokens '(' and ')' are syntactic tokens
and therefore at tree construction time
only one child is left. Observe also the absence
of TERMINAL nodes. Bypass clearly applies
to rules exp: NUM and exp: VAR since
they have only one element on their rhs. Therefore the
TERMINAL node is re-blessed as NUM and
VAR respectively.
A consequence of the global scope application of %tree bypass
is that undesired bypasses may occur. Consider the tree
rendered for input -a*2:
EXPS(TIMES(NEG,NUM))
What happened? The bypass is applied to the rhs
'-' exp. Though the rhs has two symbols, token '-' is
a syntactic token and at tree-construction-time
only exp is left. The bypass
operation applies when building this node.
This undesired bypass can be avoided applying
the no bypass directive to the
production:
exp : %no bypass NEG
'-' exp %prec NEG
Now the AST for -a*2 is correct:
EXPS(TIMES(NEG(VAR),NUM))
Eyapp provides operators +, * and ?
for the creation of lists and optionals as in:
line: sts <EXPS + ';'>which states that a line is made of a non empty list of
EXPS separated by semicolons.
By default the class name for such list is _PLUS_LIST.
The %name directive can be used to modify
the default name:
line: sts <%name EXPS + ';'>
Explicit actions can be specified by the programmer.
They are managed as anonymous subroutines
that receive as arguments the attributes of the symbols
in the rule and are executed each time a reduction
by that rule occurs. When running under the %tree directive
this provides a mechanism to influence the shape of the AST.
Observe however that the grammar in the example is clean of actions:
Parse::Eyapp
allowed us to produce a suitable AST without writing
any explicit actions.
Procesadores de Lenguajes 2010-01-31