>But if you took
>your nodes from a pool of nodes in an array rather than by malloc(),
>you'd be able to use the array index rather than the assigned temp,
>saving another word in every node.
What happens when something is reassigned and what was once a common
sub-expression no longer can be considered one? Would you have to take a
new node from the pool? If you have the temps numbered in the nodes, you
could just renumber the node in question and any nodes that depend on it
(similar to what InvalidateDependencies() in my CSE demo does.)
----
Bart
--- In compilers101@yahoogroups.com, Bart <trzynadl@u...> wrote:
> Actually, if I assigned temporary variables to nodes as they were
generated
> (MakeNode()), I could do the reordering automatically based on
only a few
> rules:
> - Node and Node: Node w/ lesser temporary var, node w/
higher temp. var
By assigning a temp you're effectively just numbering the node. The
temp doesn't have to actually be used as a temp. But if you took
your nodes from a pool of nodes in an array rather than by malloc(),
you'd be able to use the array index rather than the assigned temp,
saving another word in every node. Your nodes are already larger
than I'm comfortable with.
However even with a malloc scheme the simplest thing to do is just
compare the addresses of the nodes! It's sufficient that the
ordering is merely consistent; it doesn't have to be meaningful too.
--- In compilers101@yahoogroups.com, Bart <trzynadl@u...> wrote:
> >If I did the reordering while generating triples, it would work
out, but
> >this is done while creating the nodes.
>
> Actually, if I assigned temporary variables to nodes as they were
generated
> (MakeNode()), I could do the reordering automatically based on
only a few
> rules:
>
> - Number and number: Smaller number, bigger number
> - Variable and variable: Var with lesser ASCII value, var
with higher
> ASCII value
> - Variable and number: Number, Variable
> - Node and Node: Node w/ lesser temporary var, node w/
higher temp. var
> - Node and var/number: Var/number, node
Yep, that's what I wasted 200 lines trying to say :-)
Here's another one:
Here's some code that might plausibly be generated by a static
recompilation of some Z80 code...
cCycles -= 4;
m_regE = m_regH;
cCycles -= 4;
m_regE = m_regL;
cCycles -= 7;
m_regE = MemReadByte(m_regHL);
cCycles -= 4;
m_regE = m_regA;
cCycles -= 4;
m_regH = m_regA;
(Let's just pretend that these were written as cCycles = cCycles -
const; it's just syntactic sugar; the IR will be the same.)
The optimisation question here is whether you can replace all
the "cCycles -= const" statements with a single cCycles -= 23;
This lets a static binary translation be cycle accurate without the
overhead that regular emulators impose.
The potential difficulty of this optimisation is that you may
never get two nodes which are both constants:
eg cCycles = (((cCycles - 4) - 3) -7)
=
/ \
cCycles -
/ \
- 7
/ \
- 3
/ \
cCycles 4
>If I did the reordering while generating triples, it would work out, but
>this is done while creating the nodes.
Actually, if I assigned temporary variables to nodes as they were generated
(MakeNode()), I could do the reordering automatically based on only a few
rules:
- Number and number: Smaller number, bigger number
- Variable and variable: Var with lesser ASCII value, var with higher
ASCII value
- Variable and number: Number, Variable
- Node and Node: Node w/ lesser temporary var, node w/ higher temp. var
- Node and var/number: Var/number, node
----
Bart
>It was at this point that I realised what you meant above by
>scanning 'multiple times until nothing can be done further'...
>you're doing a top-down tree walk to find nodes containing an
>operand and two constant values, right? Which you then replace but
>you're at the wrong level of recursion now to be able to substitute
>this new constant in its parent expression, eg 1 + (2 * 3). What
>you want to do is propogate it back up and fold the parent too,
>right? And what you're suggesting is to just tree-walk again until
>you stop finding leaf nodes with constant expressions, right?
No, that's not what I meant.
Constant folding is performed in one walk of the tree.
static void FoldConstants(DAG **pnode)
{
DAG *new, *node;
int i, new_num;
node = *pnode;
if (node == NULL)
return;
for (i = 0; i < node->num_children; i++)
FoldConstants(&(node->child[i]));
switch (node->op)
{
/*
* Binary arithmetic operators
*/
case '+':
case '-':
case '*':
case '/':
... snipped: new_num is the result of the calculation ...
new = MakeDataNode('n', new_num);
if (new != NULL) // allocation was successful
*pnode = new;
}
break;
}
}
The children are folded before the current node is. FoldConstants(dag) is
only called once in main().
When I mention multiple passes, I mean on the IR. It's just a list of
operations, so for each operation, you try to fold the constants and repeat
until there's nothing left to fold.
I suppose you could do this while actually generating the IR and then each
time a temporary variable is assigned, you could check to see if the
assignment is redundant.
>but the best way to do this
>is to fold constants bottom-up as you add them to the array of
>triples, and simply ignore the tree structure. Constant folding
>should therefore be done in MakeNode.
That's what I mean by on-the-fly :)
Perhaps constant folding does belong in the parser, but what about CSE?
A normal tree could be built (not a DAG), and then as the IR is being
emitted (or after it has been emitted), common sub-expressions could be
identified by scanning the IR.
>When I realised you were not keeping the sort of array structure I'd
>suggested, I realised that is also probably why you have these T<n>
>variables even in cases where they're sometimes redundant. I think
>you're using these variables as tags to identify your triples,
>rather than using the index of the cell in the array. Let's go back
>over some examples:
>
>i = (2 * x) + (y * 3);
>j = (3 * y) + (x * 2);
>k = 1 + 2 * 3;
>
>Instead of T<n> being a temporary, I'm going to use it here as a
>label or index ID of the cell where the triple is held:
>
>T0: * 2 x
>T1: * 3 y
>T2: + @T0 @T1
>T3: = i @T2
>T4: + @T1 @T0
>T5: = j @T4
>T6: = k 7
For simplicity, I simply printed the code out as it was generated. None of
the triples were stored anywhere.
I'd implement triples similarly to how you described if I really wanted to
make use of them. I was only generating the code on the fly so I didn't
have to bother with adding a whole nother data structure to the demo.
>As for why you weren't able to spot that T0 + T1 were not the same
>as T1 + T0, you said "the source reordering code doesn't peek down
>the tree." It doesn't need to. I don't understand why
>your "ReorderSources" procedure is recursive; all it needs to do is
>look at the two operands themselves, not their children.
>You know how to re-order a pair of <var + const> to a canonical
>order, so what's harder about re-ordering a pair of <var + var>
The problem is in reordering <op + op>
Going back to these expression: (3 * x) + (2 * y) and (y * 2) + (x * 3)
The precedence rules (left to right) used to establish the canonical order
is something like this:
=, +, -, *, /, n, v
When both children are 'n' or both are 'v', they are reordered so that the
lesser number is first.
But when you've got 2 children that are both *, nothing occurs.
If I did the reordering while generating triples, it would work out, but
this is done while creating the nodes.
----
Bart
--- In compilers101@yahoogroups.com, Bart <trzynadl@u...> wrote:
> This input:
>
> i = (2 * x) + (y * 3);
> j = (3 * y) + (x * 2);
> k = 1 + 2 * 3;
>
> Generates:
>
> T0 = 2 * x
> T1 = 3 * y
> T2 = T0 + T1
> i = T2
> T3 = T1 + T0
> j = T3
> k = 7
>
> You'll notice that the assignment to T3 performs a redundant
addition. It
> would have been more efficient to assign T2 to T3, but the reason
it
> doesn't happen is that the source reordering code doesn't peek
down the tree.
>
> Since these are all quads, I think all of this reordering and
redundant
> calculation elimination could be performed by scanning the IR
output
> multiple times until nothing can be done further,
Whoa there! Doing anything "multiple times until nothing can be
done further" in general is a bad idea, especially if there's an
algorithm that can do it in one pass. I didn't realise what you
meant here at first but on second reading I think I know what you're
referring to - more on this below...
> rather than generating all of this on the fly.
But in a way I think you're right in as much as I think I've just
realised a misapprehension I was having about your code. (been too
busy with Christmas preparations to study your code, so I had been
making some assumptions about how you were doing things which I
think were incorrect).
First of all let's clear up this "on the fly" issue. You may have
taken me too literally in that area. When I was saying that by
using triples like this you could get CSE on the fly, I was
referring specifically to the ability to match triples as you add
them to the data structure. If you do this using a hash table each
operation is constant-cost and the complexity is N in terms of N
triples, although if you linearly scan a linked list as you're doing
it's N^2, which still isn't too bad for small N.
However if you build up a tree first, and then do recursive calls
down branches to compare subtrees, which is what I think your are
suggesting, it becomes an N^2logN algorithm at best and N^3logN at
worst if you're doing what I think you're doing below... (as far as
I remember; it's been a long time since I've done complexity
calculations). (Constant folding by scanning down a tree
recursively is also a bad idea - see later)
It was at this point that I realised what you meant above by
scanning 'multiple times until nothing can be done further'...
you're doing a top-down tree walk to find nodes containing an
operand and two constant values, right? Which you then replace but
you're at the wrong level of recursion now to be able to substitute
this new constant in its parent expression, eg 1 + (2 * 3). What
you want to do is propogate it back up and fold the parent too,
right? And what you're suggesting is to just tree-walk again until
you stop finding leaf nodes with constant expressions, right? If
so, this is really really wrong! A mild improvement is to do the
constant folding *after* the recursive call, so that you propogate
constant expressions back up the tree; but the best way to do this
is to fold constants bottom-up as you add them to the array of
triples, and simply ignore the tree structure. Constant folding
should therefore be done in MakeNode. [I added this paragraph last
so I do repeat myself a little, later on, but it's too late in the
evening to now go back and re-edit all this so please bear with me
if it doesn't read linearly]
Anyway, the basic CSE recognition of similar triples (OK, and maybe
also constant folding) was the only part that I meant to suggest was
done on the fly. Most of the other optimisations are done on the
array of triples once it is built, but if possible by scanning them
linearly or at least in a way such that no more than N cells are
ever looked at for any operation (eg as in the reachability analysis
to determine dead code).
When I realised you were not keeping the sort of array structure I'd
suggested, I realised that is also probably why you have these T<n>
variables even in cases where they're sometimes redundant. I think
you're using these variables as tags to identify your triples,
rather than using the index of the cell in the array. Let's go back
over some examples:
i = (2 * x) + (y * 3);
j = (3 * y) + (x * 2);
k = 1 + 2 * 3;
Instead of T<n> being a temporary, I'm going to use it here as a
label or index ID of the cell where the triple is held:
T0: * 2 x
T1: * 3 y
T2: + @T0 @T1
T3: = i @T2
T4: + @T1 @T0
T5: = j @T4
T6: = k 7
This is close enough to what you were doing but I hope that it
should be more obvious if you look at it this way that you do not
need to declare your temporary variables until code generation time.
As for why you weren't able to spot that T0 + T1 were not the same
as T1 + T0, you said "the source reordering code doesn't peek down
the tree." It doesn't need to. I don't understand why
your "ReorderSources" procedure is recursive; all it needs to do is
look at the two operands themselves, not their children.
You know how to re-order a pair of <var + const> to a canonical
order, so what's harder about re-ordering a pair of <var + var> -
all you need to do is sort them by the index numbers of the two
operands. You don't need to peek down the tree at all. T0 < T1.
easy!
As far as I can see, MakeNode is already doing the reordering, then
looking for a duplicated node, and if found it throws away the new
info and returns the previous node - so why doesn't this work for T0
* T1???
(MakeNode is also the obvious place to do constant folding.
Constants should definitely be folded bottom-up, not top-down, for
least complexity cost.)
T0: * 2 x
T1: * 3 y
T2: + @T0 @T1
T3: = i @T2
//T4: + @T0 @T1 matches T2, so any time from now
// on that you would have referred to T4 insert T2
// instead. Increment the use count on the T2 triple.
T5: = j @T2
T6: = k 7
So what I was saying earlier, about identifying the variables which
have to be written back to (in this case i, j, k) and then doing a
tree-walk to tag the cells as reachable, then outputting the array
of triples in order... that is all to be done in a post processing
stage acting on the final array of triples, *not* on the fly as
you're building it. Sorry if I misled you somewhere.
>Not meaning to derail your original project, but this would be just
>as interesting as a real straight-to-binary compiler. The parsing work
>is the same, the high-level optimisation is the same, only the generation
>of machine code is missing.
It does sound interesting, but I'd like to forge on with my normal compiler
because I'd really like to get something working that can generate machine
(assembly) code that actually runs.
The CSE stuff I've been submitting isn't actually part of my compiler
project. It started out as an experiment to see if I could construct n-ary
trees using yacc. I think I'm going to go ahead and use these sorts of
n-ary trees with recursion, despite the fact that recursion really isn't
suitable for a good compiler implementation, IMHO. How do real compilers
like GCC get around recursion? I'm aware of how recursion can be eliminated
in tree traversals using stacks, but this isn't practical if my goal is to
have easily underdtandable, clean, and maintainble code.
For some kinds of trees, I _think_ it might be possible to use a linked
list. For example:
a
/ \
b c
/ \ \
d e f
Or, if the formatting doesn't show up: (a (b (d,e)), (c f))
The linked list would first look like this:
a
Then it would be expanded:
b->c
Again:
d->e->f
I don't think this is a general-case solution, though.
These are issues I'll no doubt have to grapple with once the grammar is
complete (give me a couple of months or more to finish this, I move at a
snail's pace :))
But perhaps once I finish the front-end of my compiler, it will be suitable
for a C-to-C optimizer, if you wish to do it (or if I want to work on it
later.)
I'll upload the sources to my compiler as soon as I make enough progress in
the parser to generate ASTs for simple programs.
>after those statements have all been seen, you can safely discard the
>"a = b * c;" entirely. I thought your CSE optimiser did this, but I see
>it doesn't;
If I ever get the front end of my compiler done, I'm going to start all
over on CSE and other optimizations by trying to work on the IR itself,
rather than generating it all on the fly. I believe it might actually be
simpler that way, because whenever I analyze this stuff in my head, it
seems much simpler to work in passes on the intermediate code output.
>Which reminds me, does your parser handle this?
>
>a = (b = x * y) + (c = p * q);
>
>All your examples were of the form a = b = c = <expression> so I wasn't
>sure if you allowed assignments inside an expression.
It's allowed, but the CSE isn't very good when you throw in a lot of this
stuff. I remember doing some tests that produced less-than spectacular code.
>Here's another trick to add to your repertoire: remembering that one
>variable shadows another; i.e. in the above after b and c have been
>assigned, you can use either one for the other, such as a = b + b
>rather than a = b + c, which can then be translated to a = b << 1,
>an operation which on some architectures (I think ARM for one?) can be
>done on the fly with an indexing mode while doing the load.
Yeah, but all of this looks like it would be done as a post-pass. There's
only so much you can do on-the-fly, it seems. One of the reasons I want to
write a C compiler is so I can play around with these kinds of
optimizations, it looks like fun, and it would even be applicable to
dynamic recompilers.
----
Bart
Hello,
I've added constant folding to the CSE demo as well as source reordering
(which reorders the sources to + and * operators according to a fixed set
of rules to improve CSE.)
This input:
i = (2 * x) + (y * 3);
j = (3 * y) + (x * 2);
j = 1 + 2 * 3;
Generates:
T0 = 2 * x
T1 = 3 * y
T2 = T0 + T1
i = T2
T3 = T1 + T0
j = T3
k = 7
You'll notice that the assignment to T3 performs a redundant addition. It
would have been more efficient to assign T2 to T3, but the reason it
doesn't happen is that the source reordering code doesn't peek down the tree.
Since these are all quads, I think all of this reordering and redundant
calculation elimination could be performed by scanning the IR output
multiple times until nothing can be done further, rather than generating
all of this on the fly.
----------
/*
* cse.y
*
* Example of a simple expression parser with common sub-expression
* elimination (CSE) which builds DAGs with n-ary trees and generates simple
* code. Constant folding is performed.
*
* The primary problem with building n-ary trees lies with compound
* statements, where the number of children cannot be determined until all of
* them have been processed.
*
* For nodes which have a fixed number of children (such as arithmetic operat-
* ors with 2 children), there is no problem, but for compound statements, the
* solution is to have a tree built for each statement and then appended to
* a list.
*
* Each list element is a struct which contains 2 fields: the tree, and a
* "next" pointer. When a node is to be created, it counts the number of
* elements in the CHILD_LIST, allocates that many DAG * pointers, and sets
* each pointer to a member of the list.
*/
%{
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
/*****************************************************************************
Data Structures
Definitions for data structures related to the DAGs.
*****************************************************************************/
/*
* DAG
*
* A dag node.
*/
typedef struct dag_struct DAG;
struct dag_struct
{
int op; // '+', '-', '*', '/', '=', 's', 'n', or 'v'
/*
* Data:
*
* This union serves two purposes:
*
* 1. For 'n' (number) and 'v' (variable) nodes, the appropriate
* field is set to the data that the lexer provided.
* 2. During code generation, the "num" member is set to a scratch
* variable name (0 means T0, 1 is T1, etc.) if the node's op is
* NOT 'n' or 'v', otherwise, "num" is a constant number if op
* == 'n' and "var" is a variable if op == 'v'.
*
* For nodes other than 'n' and 'v', data.num is set to -1 by default in
* order to signify to the code generator that no scratch register has
* yet been assigned (-1 is invalid for scratch registers.)
*/
union
{
int num; // number (if op == 'n')
char var; // variable (if op == 'v')
} data;
/*
* Children:
*
* Nodes can have an arbitrary number of children. The child[] array is an
* array of pointers to child DAG nodes and only has as many elements as
* the num_children member specifies.
*/
int num_children; // number of children
DAG **child;
/*
* Global list:
*
* A linear list of all the nodes is maintained by chaining them all
* together with this pointer. This helps with CSE.
*/
DAG *prev, *next; // previous and next nodes
};
/*
* CHILD_LIST
*
* Sometimes, the number of children for a given DAG node cannot be
* determined until after all of the children have been created. In this
* grammar, this situation only occurs with lists of statements.
*
* The problem is resolved by creating a list of CHILD_LIST structures, each
* of which contains a DAG and a pointer to the next structure. As the parser
* parses a series of statements in a statement list, a new DAG is added to
* the list for each statement encountered.
*
* When all is done, the parent node of all of these children is created, the
* list is counted, and enough pointers are allocated to hold all of the
* children. Then, each DAG in the list is plucked one at a time and attached
* to the parent node's child[] array.
*/
typedef struct child_list_struct CHILD_LIST;
struct child_list_struct
{
DAG *node; // the child node itself
CHILD_LIST *next;
};
/*****************************************************************************
Global Data
The DAG and global list of nodes are defined here.
*****************************************************************************/
/*
* DAG
*/
static DAG *dag; // DAG (NULL if no non-null statements in program)
static DAG *node_list; // global list of nodes (newest first)
/*****************************************************************************
Function Prototypes
These functions are used by the parser.
*****************************************************************************/
static DAG *MakeArbitraryNode(int, CHILD_LIST *);
static DAG *MakeNode(int, int, DAG *, DAG *);
static DAG *MakeDataNode(int, int);
static int yylex();
static void yyerror(const char *);
%}
/*****************************************************************************
Token Definitions
YYSTYPE and the tokens are defined here. Tokens are only defined for numbers
and variable names. The lexer will return ASCII characters for everything
else which helps make the grammar a little more readable.
*****************************************************************************/
%union
{
int num; // number
char var; // variable
DAG *node;
/*
* A list of child DAGs. The 2 pointer system is for fixed-time insertion
* of new elements to the end of the list (each time one is inserted, the
* "last" pointer is updated, but "first" remains unchanged.)
*/
struct
{
CHILD_LIST *first, *last; // pointers to first and last children
} children;
}
%token <num> NUMBER
%token <var> VARIABLE
%type <children> stmt_list
%type <node> stmt
%type <node> expr
%type <node> assgn_expr
%type <node> factor
%type <node> element
%%
/*****************************************************************************
Grammar
Ye olde grammar.
The language is really quite simple. It's just a series of statements
terminated by semicolons. Multiple statements can appear between brackets,
but there are no scope rules, so these brackets do nothing more than serve
as a demonstration for how to implement compound statements using n-ary
tree structures.
There are 52 possible variable names: a-z and A-Z. They are always available
and cannot be declared or undeclared.
A single statement may be an assignment statement:
a = 3;
b = a;
c = d = e = 4 * 3;
Or an arithmetic expression using +, -, *, and / operators:
1 + 2 * 3 / a;
The above example clearly does nothing since there is no assignment, but it's
perfectly valid.
*****************************************************************************/
/*
* file
*
* A file is simply a list of statements. stmt_list is set to a list of
* DAGs. These are attached to a statement list node.
*/
file: stmt_list
{
if ($1.first != NULL) // if there were any non-null statements
dag = MakeArbitraryNode('s', $1.first);
else
dag = NULL;
}
| // empty
;
/*
* stmt_list
*
* Takes advantage of left recursion in bottom-up parsing to allow an
* arbitrary number of statements to be repeated in a list. Consult Holub's
* book to learn about how this sort of left recursion works.
*
* When a stmt is on the stack, the first production is reduced to stmt_list.
* Then, another stmt will make "stmt_list stmt" which again reduces to
* stmt_list, etc. So, for each stmt_list, the "stmt_list->stmt"
* production gets reduced once (and it happens first.)
*
* A CHILD_LIST is created and each time the second production is reduced, the
* action executed adds the DAG associated with stmt to stmt_list and
* attaches it to $$.
*/
stmt_list:
stmt // this is the first kind of stmt_list production to be
// reduced because of how left recursion works
{
$$.first = $$.last = NULL;
if ($1 != NULL) // don't corrupt child list w/ NULLs
{
$$.first = malloc(sizeof(CHILD_LIST));
$$.last = $$.first;
$$.first->node = $1;
$$.first->next = NULL;
}
else
$$.first = $$.last = NULL;
}
|
stmt_list stmt
{
$$ = $1; // keep the list alive by passing it to $$
if ($2 != NULL) // don't corrupt child list with NULLs
{
if ($$.last == NULL) // nothing has been allocated!
{
$$.first = malloc(sizeof(CHILD_LIST));
$$.last = $$.first;
$$.first->node = $2;
$$.first->next = NULL;
}
else // allocate to end of list
{
($$.last)->next = malloc(sizeof(CHILD_LIST));
($$.last)->next->node = $2;
($$.last)->next->next = NULL;
$$.last = ($$.last)->next;
}
}
}
;
/*
* stmt
*
* A single statement, which can be an expression, an assignment, or a
* compound statement (just another statement list.) C-style null-statements
* are perfectly acceptable and don't generate any DAG nodes.
*/
stmt: expr ';' { $$ = $1; }
| ';' { $$ = NULL; }
| '{' stmt_list '}'
{
if ($2.first == NULL) // no actual statements
$$ = NULL;
else
$$ = MakeArbitraryNode('s', $2.first); // make a 's'
(statement list) node and attach all the children
}
| '{' '}' { $$ = NULL; }
| assgn_expr ';' { $$ = $1; }
;
/*
* assgn_expr
*
* Assignment expression. The entire expression itself takes the value of the
* newly-modified variable, thus chaining is possible:
*
* a = b = c = 3 * 4;
*
* Note that the variable node is created first, and then the operator node
* is created and overwrites $$.
*/
assgn_expr: VARIABLE '=' assgn_expr
{
$$ = MakeDataNode('v', $1);
$$ = MakeNode(2, '=', $$, $3);
}
| VARIABLE '=' expr
{
$$ = MakeDataNode('v', $1);
$$ = MakeNode(2, '=', $$, $3);
}
;
/*
* expr
*
* The lowest precedence expression rules (+ and -) are handled here.
*/
expr: expr '+' factor { $$ = MakeNode(2, '+', $1, $3); }
| expr '-' factor { $$ = MakeNode(2, '-', $1, $3); }
| factor { $$ = $1; }
;
/*
* factor
*
* Multiplication and division are higher precedence than +/- (expr.)
*/
factor: factor '*' element { $$ = MakeNode(2, '*', $1, $3); }
| factor '/' element { $$ = MakeNode(2, '/', $1, $3); }
| element { $$ = $1; }
;
/*
* element
*
* A basic element -- a number, variable, or parenthetical expression.
*/
element: '(' expr ')' { $$ = $2; }
| '(' assgn_expr ')' { $$ = $2; }
| NUMBER { $$ = MakeDataNode('n', $1); }
| VARIABLE { $$ = MakeDataNode('v', $1); }
;
%%
/*****************************************************************************
Source Reordering
Sources are reordered according to a set of guidelines. When dealing with
operations where order is unimportant, a predictable ordering is helpful
because it improves common sub-expression elimination.
For example, if we have 5+3, we may choose to reorder the sources so that
the smaller number is always the first source: 3+5. This way, if 3+5 and 5+3
appear in the input source code, they will be recognized as being identical.
MakeNode() calls ReorderSources() when a + or * op is detected. This code
relies on the fact that only MadeNode() handles binary arithmetic operators.
*****************************************************************************/
/*
* AssignPrecedence():
*
* Replaces the operator ID character with a number which indicates its
* left-to-right precedence. =, +, -, *, /, n, v, s correspond to 0, 1, 2, 3,
* 4, 5, 6, 7. This Function is used only by ReorderSources().
*/
static void AssignPrecedence(int *op)
{
switch (*op)
{
case '=': *op = 0; break;
case '+': *op = 1; break;
case '-': *op = 2; break;
case '*': *op = 3; break;
case '/': *op = 4; break;
case 'n': *op = 5; break;
case 'v': *op = 6; break;
case 's': *op = 7; break;
}
}
/*
* ReorderSources():
*
* Sources are reordered when order is mathematically irrelevant according to
* this set of rules:
*
* - Lesser numbers come first.
* - Variables with a letter of lesser ASCII value come first.
* - Numbers come before operators and variables.
* - Variables come before operators.
* - Left-to-right precedence of operators is: =, +, -, *, /.
* - No re-ordering for 's' nodes -- it's totally unnecessary!
*/
static void ReorderSources(DAG **a, DAG **b)
{
DAG *tmp;
int i, op1, op2;
if (*a == NULL || *b == NULL)
return;
if ((*a)->num_children == 2)
ReorderSources(&((*a)->child[0]), &((*a)->child[1]));
if ((*b)->num_children == 2)
ReorderSources(&((*b)->child[0]), &((*b)->child[1]));
if ((*a)->op == 'n' && (*b)->op == 'n')
{
/*
* Reorder so that child[0] is the lesser number
*/
if ((*a)->data.num > (*b)->data.num)
{
tmp = (*b);
(*b) = (*a);
(*a) = tmp;
}
}
else if ((*a)->op == 'v' && (*b)->op == 'v')
{
/*
* Reorder so that child[0] is the variable with the lesser ASCII
* value
*/
if ((*a)->data.var > (*b)->data.var)
{
tmp = (*b);
(*b) = (*a);
(*a) = tmp;
}
}
else
{
/*
* Left-to-right precedence among operators is: =, +, -, *, /
*/
op1 = (*a)->op;
op2 = (*b)->op;
AssignPrecedence(&op1); // assign a numerical value
AssignPrecedence(&op2);
if (op1 > op2)
{
tmp = (*b);
(*b) = (*a);
(*a) = tmp;
}
}
}
/*****************************************************************************
DAG Functions
Functions for making DAG nodes and attaching children to them.
Nodes other than 'n' and 'v' have data.num set to -1 to indicate that no
scratch register has been assigned to the node by the code generator yet
(see the code generator code and the DAG structure definition for more info.)
*****************************************************************************/
/*
* RemoveNode():
*
* Removes a node from the global list and frees the memory associated with
* it.
*/
static void RemoveNode(DAG *node)
{
int i;
for (i = 0; i < node->num_children; i++)
RemoveNode(node->child[i]);
/*
* Unlink
*/
if (node->prev == NULL)
node_list = node->next;
else
node->prev->next = node->next;
if (node->next != NULL)
node->next->prev = node;
/*
* Free memory
*/
if (node->num_children)
free(node->child);
free(node);
}
/*
* MakeArbitraryNode():
*
* Creates a node which can have an arbitrary amount of children.
*
* Attaches all of the DAGs in the CHILD_LIST as children to the node (first
* argument.) This is done by counting the number of DAGs in the list,
* allocating that many child pointers, and then taking each element in the
* list and putting it in the newly-allocated child[] array.
*
* Next, the result is checked against all of the other DAG nodes to see if
* there is already one like this, and if so, this one is discarded and the
* existant one is returned.
*
* NOTE: When searching for a matching node in the global node list, the
* "data" member is not checked because it is assumed that it is unused in
* this sort of node.
*/
static DAG *MakeArbitraryNode(int op, CHILD_LIST *cl)
{
CHILD_LIST *l;
DAG *node, *n;
int num_children, i, all_equal;
/*
* Count number of children by iterating through list
*/
for (l = cl, num_children = 0; l != NULL; l = l->next, num_children++)
;
/*
* Make the node
*/
if ((node = calloc(1, sizeof(DAG))) == NULL)
return NULL;
node->op = op;
node->data.num = -1;
/*
* Allocate space for their pointers and then copy them in
*/
node->num_children = num_children;
node->child = malloc(sizeof(DAG *) * num_children);
for (i = 0, l = cl; i < num_children; i++, l = l->next)
node->child[i] = l->node;
/*
* Try to see if a matching node already exists
*/
for (n = node_list; n != NULL; n = n->next)
{
if (n->op == node->op && n->num_children == node->num_children)
{
all_equal = 1; // will be cleared if not all children equal
for (i = 0; i < n->num_children; i++)
{
if (n->child[i] != node->child[i])
{
all_equal = 0;
break;
}
}
if (all_equal)
{
free(node); // destroy current node
return n;
}
}
}
/*
* No match found, insert this node in the global list and return it
*/
node->next = node_list;
node->prev = NULL;
if (node->next != NULL)
node->next->prev = node;
node_list = node;
return node;
}
/*
* MakeNode():
*
* Makes a DAG node with the specified number of children (0 is acceptable.)
* The number of children determines how big child[] will be. If the number
* of children is 1 or 2, they are passed as parameters a and b.
*
* NOTE: When searching for a matching node in the global node list, the
* "data" member is not checked because it is assumed that it is unused in
* this sort of node.
*
* MakeNode() is not intended for number ('n') and variable ('v') nodes, use
* MakeDataNode() for that purpose.
*/
static DAG *MakeNode(int num_children, int op, DAG *a, DAG *b)
{
DAG *node;
if (op == '+' || op == '*')
ReorderSources(&a, &b); // because binary nodes are all made here
/*
* Try to see if a matching node exists
*/
for (node = node_list; node != NULL; node = node->next)
{
if (node->op != op) // can't be the same node!
continue;
if (node->num_children != num_children)
continue;
/*
* Ugly and not optimal... I know, but it's easy to understand :)
*/
if (num_children == 0)
return node; // must be a match!
else if (num_children == 1)
{
if (node->child[0] == a)
return node;
}
else if (num_children == 2)
{
if (node->child[0] == a && node->child[1] == b)
return node;
}
}
/*
* Otherwise, allocate a new one
*/
if ((node = calloc(1, sizeof(DAG))) == NULL)
return NULL;
node->next = node_list;
node->prev = NULL;
if (node->next != NULL)
node->next->prev = node;
node_list = node;
node->op = op;
node->data.num = -1;
node->num_children = num_children;
if (num_children)
{
node->child = malloc(sizeof(DAG *) * num_children);
node->child[0] = a;
if (num_children == 2)
node->child[1] = b;
}
return node;
}
/*
* MakeDataNode():
*
* Virtually the same as the above MakeNode() function, except it is intended
* for use with 'n' and 'v' nodes only.
*/
static DAG *MakeDataNode(int op, int data)
{
DAG *node;
for (node = node_list; node != NULL; node = node->next)
{
/*
* Only check op and fields... Children are irrelevant for 'n' and 'v'
*/
if (node->op == op)
{
if (op == 'n')
{
if (node->data.num == data)
return node;
}
else // 'v'
{
if (node->data.var == (char) data)
return node;
}
}
}
if ((node = calloc(1, sizeof(DAG))) == NULL)
return NULL;
node->next = node_list;
node->prev = NULL;
if (node->next != NULL)
node->next->prev = node;
node_list = node;
node->op = op;
if (op == 'n')
node->data.num = data;
else if (op == 'v')
node->data.var = data;
else
node->data.num = -1; // to indicate this isn't being used (codegen)
node->num_children = 0;
return node;
}
/*****************************************************************************
Constant Folding
Arithmetic operations with 2 constant children can be evaluated and collapsed
into a single constant.
Careful attention has to be payed to the fact that nodes are not unique. If
some number is used more than once, only one actual node is present (because
this is how DAGs work.) Therefore, when constants are folded, a whole new
node has to be created (or, if the new node is identical to an existing one,
the existing one can be used.) Nodes cannot actually be modified by having
constants folded into them.
*****************************************************************************/
/*
* FoldConstants():
*
* Recursively evaluates all constant operations.
*/
static void FoldConstants(DAG **pnode)
{
DAG *new, *node;
int i, new_num;
node = *pnode;
if (node == NULL)
return;
for (i = 0; i < node->num_children; i++)
FoldConstants(&(node->child[i]));
switch (node->op)
{
/*
* Binary arithmetic operators
*/
case '+':
case '-':
case '*':
case '/':
if (node->child[0]->op == 'n' && node->child[1]->op == 'n')
{
switch (node->op)
{
case '+':
new_num = node->child[0]->data.num + node->child[1]->data.num;
break;
case '-':
new_num = node->child[0]->data.num - node->child[1]->data.num;
break;
case '*':
new_num = node->child[0]->data.num * node->child[1]->data.num;
break;
case '/':
new_num = node->child[0]->data.num / node->child[1]->data.num;
break;
}
new = MakeDataNode('n', new_num);
if (new != NULL) // allocation was successful
*pnode = new;
// node->op = 'n';
// RemoveNode(node->child[0]);
// RemoveNode(node->child[1]);
// node->num_children = 0;
// free(node->child);
}
break;
}
}
/*****************************************************************************
Code Generator
Code is output in a simplistic quadruple format (op, dest, src, src) with a
scratch variable assigned for each operation. This means that a lot of
scratch variables are generated that could be eliminated, but that would be
fairly trivial as a post-step.
CSE (common sub-expression elimination) is performed during code generation.
It's really quite simple and most of the work was done in constructing the
DAGs themselves. When a scratch variable is found to be assigned to a given
node (which happens once the code generator touches it the first time), it is
re-used.
If a sub-expression is no longer valid (when another statement changes the
contents of a variable used as an operand, for instance), the scratch
variable is disassociated with the node (data.num is set to -1.)
The code generator assumes (especially in InvalidateDependencies()) that
all nodes other than 'n', 'v', and '=' can only be represented by a temporary
variable (whose number is in data.num), Tx.
*****************************************************************************/
static int current_scratch = 0;
/*
* NewScratch():
*
* Returns a new scratch variable.
*/
static int NewScratch()
{
return current_scratch++;
}
/*
* PrintOperand():
*
* Prints the current node as an operand. It assumes that it has already been
* consumed by GenerateCode(). The rules for operand printing are:
*
* - If op == 'n', print data.num.
* - If op == 'v', print data.var.
* - If op == '=', print data.var.
* - Otherwise, data.num contains the number of a scratch variable (Tx)
* that must be printed.
*/
static void PrintOperand(DAG *node)
{
switch (node->op)
{
case 'n':
printf("%d", node->data.num);
break;
case 'v':
case '=':
putchar(node->data.var);
break;
default:
printf("T%d", node->data.num);
break;
}
}
/*
* InvalidateDependencies():
*
* Invalidates all dependencies on the variable which holds this node. Scans
* the global node list to find nodes that have a child which is the current
* node. Recursively eliminates dependencies on dependencies, etc.
*/
static void InvalidateDependencies(DAG *node)
{
DAG *n;
int i;
for (n = node_list; n != NULL; n = n->next)
{
for (i = 0; i < n->num_children; i++)
{
if (n->child[i] == node)
{
InvalidateDependencies(n);
/*
* '=' and 'v' both use data.var to store an actual (a-z,A-Z)
* user-defined variable which must never be invalidated.
*/
if (n->op != '=' && n->op != 'v') // these 2 will always have
an actual (a-z,A-Z) variable
n->data.num = -1;
break;
}
}
}
}
/*
* GenerateCode():
*
* Generates code from a DAG and prints it to stdout.
*/
static void GenerateCode(DAG *node)
{
int i;
if (node == NULL)
return;
switch (node->op)
{
case 's':
/*
* List of statements
*/
for (i = 0; i < node->num_children; i++)
GenerateCode(node->child[i]);
break;
case '=':
/*
* Assignment
*/
GenerateCode(node->child[1]); // child[0] guaranteed to be a var
node->data.var = node->child[0]->data.var; // this node becomes
// equivalent to the var
printf("%c = ", node->child[0]->data.var);
PrintOperand(node->child[1]);
printf("\n");
InvalidateDependencies(node->child[0]);
break;
case '+':
case '-':
case '*':
case '/':
/*
* Binary arithmetic operators
*/
if (node->data.num != -1) // scratch register already assigned?
break; // do nothing (parent node will treat this
// as an operand by re-using the scratch
// register)
GenerateCode(node->child[0]);
GenerateCode(node->child[1]);
node->data.num = NewScratch();
printf("T%d = ", node->data.num);
PrintOperand(node->child[0]);
printf(" %c ", node->op);
PrintOperand(node->child[1]);
printf("\n");
break;
case 'n':
case 'v':
break; // do nothing
}
}
/*****************************************************************************
Lexical Analyzer
A simple lexical analyzer which recognizes tokens from file input.
*****************************************************************************/
static FILE *fp;
static int yylex()
{
int i, num;
if (feof(fp))
return 0;
i = fgetc(fp);
while (i == ' ' || i == '\t' || i == '\r' || i == '\n')
i = fgetc(fp);
if (isdigit(i))
{
num = i - '0';
while (1)
{
i = fgetc(fp);
if (isdigit(i))
{
num *= 10;
num += (i - '0');
}
else
{
ungetc(i, fp);
yylval.num = num;
return NUMBER;
}
}
}
else if (isalpha(i))
{
yylval.var = i;
return VARIABLE;
}
return i;
}
/*****************************************************************************
Main
main(), yyerror(), and friends.
*****************************************************************************/
static void PrintDAG(DAG *node, int indent)
{
int i;
if (node == NULL)
return;
for (i = 0; i < indent * 2; i++)
printf(" ");
if (node->op == 'n')
printf("%d\n", node->data.num);
else if (node->op == 'v')
printf("%c\n", node->data.var);
else
{
printf("%c\n", node->op);
for (i = 0; i < node->num_children; i++)
PrintDAG(node->child[i], indent + 1);
}
}
static void yyerror(const char *msg)
{
fprintf(stderr, "%s\n", msg);
}
int main(int argc, char **argv)
{
if (argc <= 1)
{
puts("usage: cse <file>");
exit(0);
}
else
{
if ((fp = fopen(argv[1], "r")) == NULL)
{
fprintf(stderr, "error: unable to open %s\n", argv[1]);
exit(1);
}
}
node_list = NULL;
if (yyparse())
exit(0);
puts("DAG:\n"
"----\n");
PrintDAG(dag, 0);
puts("\nDAG: (Constants Folded)\n"
"-----------------------\n");
FoldConstants(&dag);
PrintDAG(dag, 0);
puts("\nCode:\n"
"-----\n");
GenerateCode(dag);
return 0;
}
----------
----
Bart
[Non-text portions of this message have been removed]
>I'd like to know where I could find some lecture about parsing,
I'll have to ask this question: How much do you want to learn?
Do you want to spend time writing parsers that won't be used in your
project (for practice and educational purposes), or do you want a working
parser with as little hassle as possible?
If you need a parser quickly and don't care much about the details, you can
use yacc (the GNU version is called bison.) My CSE examples are all yacc
grammars. It can take some time getting used to how yacc works, but once
you start getting the hang of it, it's very handy, especially when you want
your parser to generate a relatively simple tree structure or intermediate
code.
For an actual compiler, using yacc can be kind of a pain (as I'm finding
out), but it's still quicker than writing a top-down parser by hand. For a
game scripting engine, yacc would work great. It is kind of ugly and seems
a bit inflexible in terms of its interface at first, but it's not too bad.
Here's some online documentation for bison:
http://dinosaur.compilertools.net/bison/index.html
It's probably not enough to learn from, but it might prove to be a useful
reference. I'd recommend the O'Reilly book "lex and yacc" to learn about
yacc. It's not the greatest book on the topic (it seems to dive in very
quickly), but with some persistance and experimentation, you should be
fine. The key is always to try to do things on your own first, and then
look at how the book did it if you get stuck.
If you're willing to spend some time writing parsers, I recommend getting a
hold of the "dragon book": "Compilers: Principles, Techniques, and Tools"
by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. I wouldn't recommend
buying it -- borrow it if you can. I found the chapters on top-down parsing
to be really good (you'll have no problem implementing a simple expression
parser by hand); chapters 1 and 2 specifically.
Another book I've borrowed is "Compiler Design in C" by Allen Holub. It's
not the greatest book in the world, but I thought his explanations of
top-down parsing were good. He tries to get you to develop an automated
parser generator (similar to yacc), but I skipped over that part -- it was
uninteresting to me at the time. I also skipped anything related to
automatic lexer generation. What I did do was implement a top-down parser
by hand and a type of top-down parser called a "push-down automaton." Don't
buy this book, borrow it first if you can.
Of course, you keep looking around the internet for some good material. I'm
sure there's something out there. If you're willing to invest some time,
look up "top-down parsing" and try to implement a simple expression grammar
that can parse mathematical expressions. Start with addition and
subtraction first, then add division and multiplication, then add
parenthesis and unary +/-, etc.
I hope this helps... The hardest part is always getting started :)
----
Bart
In Bart's code to do CSE, we haven't yet considered the ramifications of
calling functions with side-effects (actually not just functions -
memory-mapped device registers can cause the same issues with simple
variables)
Sometimes this affects all languages and other times it is a language-dependent
issue. For instance:
int succ(void) {
static int next = -1;
return(++next);
}
This affects any language:
a = succ() << succ();
The next one is language dependent; some languages allow parameters to be
evaluated in any order. This is to give the compiler writer the flexibility
to stack them in either order, depending on the architecture. I honestly
don't remember if C mandates left to right evaluation or not, but I think
you'd be ill advised to write a compiler that didn't evaluate left to right
nowadays.
b = functioncall(succ(), succ());
There are various ways of handling this, but what they have in common is
that the calls with side effects must be evaluated in the given order.
So...
T0 = succ()
T1 = succ()
a = T0 << T1
Actually that's not complex enough to be a good example; let me try another:
a = (b = succ() * 3) + (c = 4 * succ());
T0 = CALL succ
b = T0 * 3
T1 = CALL succ
c = T1 * 4
a = b + c
There is nothing wrong with generating the above like so:
T0 = CALL succ
T1 = CALL succ
b = T0 * 3
c = T1 * 4
a = b + c
so you could if you wanted bring all the side-effected calls to the
front, assign them to temporaries, then optimise the rest of the
expression. This would make something like the following rather stupid
code work correctly:
a = (succ() * 0) * (succ() * 1);
T0 = CALL succ();
T1 = CALL succ();
T2 = T0 * 0;
T3 = T1 * 1;
a = T2 * T3;
=>
T0 = CALL succ();
T1 = CALL succ();
T2 = 0;
T3 = T1;
a = 0 * T3;
=>
T0 = CALL succ();
T1 = CALL succ();
a = 0;
I.e. you still make the calls, get the side effects, but throw away the
rest of the evaluation as it is not needed.
However that's just a notional way of looking at it; in practise you'll
probably intersperse the calls in the natural order they were made.
To do this, you need to have a flag in each triple saying that it has
side effects, and be sure to generate code for each triple which has that
flag set, even if your final optimising pass would have otherwise eliminated
that triple. Or more precisely, when you do the tree-walk to mark each
triple as reachable or otherwise, don't just start from the assignments
to variables; add to the list of starting points each triple which is marked
as having side-effects; so
a = succ();
b = 3;
a = b;
will generate this code:
succ();
b = 3;
a = 3;
(Aside: whether you generate a = b or a = 3 is really a machine-dependent
choice. Is a "move quick" of a short constant faster than a load from
store, or than a transfer from another register"? These are choices
that can be parameterised in a generic optimiser and chosen to match
the target architecture.)
> a = b * c;
> x = p * q;
> a = w * z;
>
> after those statements have all been seen, you can safely discard the
> "a = b * c;" entirely. I thought your CSE optimiser did this, but I see
> it doesn't; however I'm 100% sure the framework you have in place makes
> it an easy addition. Just note whether the last access to any variable
> was read or write, and if you're doing a write and the last access
> was a write, remove the last access from the DAG. Do it naively
> and you'll be left with generating code for "b * c" even though
> it's not stored; remove the "b * c" naively and you'll be left with
> anything which generated them in turn; but be too aggressive and you
> may accidentally remove some intermediate result that is needed later;
> for instance:
>
> w = 2;
> z = 3;
> b = (w = 4) * (z = 5);
> a = b * z;
> a = w * z;
I didn't actually explain how to do this the 'non-naive' way...
It's pretty simple actually; there are two obvious ways - they're
equivalent I guess:
Method A) Have a tag bit in each triple to say whether it is reachable or
not, and once you have built up your array of triples and are ready to
code generate, start with each of the assignments which you wish to execute
(i.e. the final assignments to a, b, z, and w) and do a tree-walk starting
at those assignment triples, flagging everything you reach as 'reachable'.
then just output everything in your triples array in order, but don't output
it if the reachable flag is not set.
Method B) Generate code for the final assignments, outputting each triple
on the fly as you do so. Mark it as having been output, so that you don't
output the same triple twice if it's used in two expressions.
They're almost identical methods; the only difference is that the first
preserves the original ordering of triples and the second may re-order
them if you're not careful, which can be a problem if you have side-effects.
However a new node type to implement sequential evaluation would take
care of that, by making the sequential structure explicit; eg
a = b + c;
w = x + y;
=> (trying a text-based diagram since the trees haven't worked so well on
the web interface)
sequential(assign(a, add(b, c)), assign(w, add(x, y)))
There's a third method which is more complicated and I can never see
a reason for using so I don't see any point in describing it, but it
involves use-counts on cells rather than just a 'printed' or 'reachable'
flag.
G
--- In compilers101@yahoogroups.com, Graham Toal <gtoal@g...> wrote:
> other posters can help; all I can say is to use Google and the
compiler
> resources FAQ and just read everything you can find.
Search for "Catalog of Free Compilers and Interpreters"
> My name is Marcelo, I'm from Brazil, and I'm currently working on 2
> projects: one of them is a game, the other one is a compiler.
> I'd like to know where I could find some lecture about parsing,
> because I'd like to use it for both my compiler (which will
> generate .NET IL code) and for my game scripting engine.
>
> Thanks for any help
>
> PS: If this is not the place to post this kind of message, please let
> me know.
Welcome Marcelo!
That's a perfectly on-topic question, although personally I can't
recommend any specific document on parsing as being definitely worth
a read, as I haven't looked at those for some time. Maybe some of our
other posters can help; all I can say is to use Google and the compiler
resources FAQ and just read everything you can find.
What I can say however is this:
there is a lot of theoretical computer science behind all sorts of
fancy parsers, and a whole lot of books on the subject, and a depressingly
large number of them are no bloody help whatsoever!
Really it boils down to "do I use a top-down parser, or a bottom up
parser" and "do I use a parser generator or write it ad-hoc"?
There's no fixed answer to either of these questions; it depends on
the language you are writing for, your skill, and whether you're
doing it to learn and/or have fun, or just to get a product out in
a hurry and all you care is that it works.
As part of another project, I recently helped get running an compiler
that was written in the 60's, with a very ad-hoc parser, and running
on a modern PC it was the fastest compiler I've ever seen, so there
are times when old-fashioned hand-crafted ad-hoc code can be appropriate.
G
> How does Imp77 compare to C?
Well, it's roughly at the same level; it would be just as easy to
convert from C to Imp as from Imp to C; it's a closer mapping than
say Pascal to C.
> C as a language doesn't actually define any I/O constructs and can be
> Is the situation with Imp77 similar?
"Yes and no". There is a default header file akin to stdio.h
which is mandated, but you can write a compiler with no special built-in
library knowlege and just make that header file reference ordinary
externals. But if you do build in the knowlege of the intrinisic
procedures, you can optimise the heck out of them.
For instance indirection via a memory address is done with what on
the surface appears to be a procedure:
i = integer(address)
is equivalent to C's
i = *address;
Actually this is what in Imp is called a "map" and can be used on
the LHS of an assignment, ie
integer(address) = i;
so when I generated C for that, a naive iplementation using actual
external procedures would have to be:
* ((int *) integer(address)) = i;
where "integer" was a function returning the address of its parameter.
If we allow the compiler knowlege of the intrinsic library, it merely
generates: *address = i; instead.
> And is the language sufficiently close enough to C that it could share the
> same IR that a C front end would emit?
Well if you can generate C from it then by definition that must be
true, but given the amount of mangling that went into generating C,
I think if I had generated GCC's internal representation I would
be months behind and riddled with bugs. plain ascii C is much easier
to debug.
However other compilers have translated from their language to C
and gone back after their project was finished and tied it in more
tightly to GCC's native code generation. It can be done, but I'd
rather do it after I have a fully working version. And honestly
the benefits of changing at that point ar few - all you really
do is get a miniscule performance increase in exchange for no longer
being able to use alternative compilers such as ARM's SDK or CodeWarrior.
G
I'll move this over here...
> >Unfortunately none of the compilers I've used have eliminated
> >redundant loads and stores; however Bart has just finished writing
> >a source to source optimiser which I hope to steal and adapt for
> >this project so that I can eliminate those within basic blocks.
>
> But it doesn't really eliminate redundant loads/stores at all. At least not
> the kind I'm thinking of. The code generator simply generates ugly
> load/stores to temporary variables (which, granted, could easily be
> optimized away by some post-analysis.)
>
> Let me know if it helps you in your project, I'd be interested to hear if
> it works out.
I kind of slipped that under the door but I'd been meaning to talk to
you about this anyway...
It occurred to me, after playing around with your CSE demo yesterday,
that a good project to do, which to my knowlege has *never* been done
by anyone, would be a source to source optimiser for C, which did as
many of the compiler-style optimisations as were possible at source
level. This could then be used as a pre-pass by any other C compiler
which didn't have as much of the fancy high-level stuff but which
did a reasonable job of the machine-code-level optimisations (eg
register choice).
Not meaning to derail your original project, but this would be just
as interesting as a real straight-to-binary compiler. The parsing work
is the same, the high-level optimisation is the same, only the generation
of machine code is missing.
The only source to source optimisers which exist that I know of are
as a side-effect of work in parallelizing computation; they do things
like decompiling loops so that the task can be done by an array processor,
so they're not really useful at all for anything other than parallel work.
But I digress: as to loads/stores; I may have used the terms wrongly as
I wasn't talking about the register level; rather this:
a = b * c;
x = p * q;
a = w * z;
after those statements have all been seen, you can safely discard the
"a = b * c;" entirely. I thought your CSE optimiser did this, but I see
it doesn't; however I'm 100% sure the framework you have in place makes
it an easy addition. Just note whether the last access to any variable
was read or write, and if you're doing a write and the last access
was a write, remove the last access from the DAG. Do it naively
and you'll be left with generating code for "b * c" even though
it's not stored; remove the "b * c" naively and you'll be left with
anything which generated them in turn; but be too aggressive and you
may accidentally remove some intermediate result that is needed later;
for instance:
w = 2;
z = 3;
b = (w = 4) * (z = 5);
a = b * z;
a = w * z;
Which reminds me, does your parser handle this?
a = (b = x * y) + (c = p * q);
All your examples were of the form a = b = c = <expression> so I wasn't
sure if you allowed assignments inside an expression.
If your DAG structure is well designed, the related optimisations
should come for almost free once the statement is parsed; if they don't,
look at your data structures again and ask why not.
You can use as example like this to test if the optimisations work:
a = (b = x * y) + (c = x * y);
Wait a minute, I guess I can find out for myself...
... two minutes later ...
I just tried it on your program and got this:
T0 = x * y
b = T0
c = T0
T1 = b + c
a = T1
Damn, you're good at this!
By the way, you were worried about the use of extra temporaries by
your code. true, it would be nicer above to generate:
b = x * y;
c = b;
a = b + c;
*however* what you have should generate the same code on most compilers
if you declare T0, T1 etc with C's "register" qualifier.
Here's another trick to add to your repertoire: remembering that one
variable shadows another; i.e. in the above after b and c have been
assigned, you can use either one for the other, such as a = b + b
rather than a = b + c, which can then be translated to a = b << 1,
an operation which on some architectures (I think ARM for one?) can be
done on the fly with an indexing mode while doing the load.
(It's just syntactic sugar, but the whole expression above could be
rewritten as: a = (b = c = x * y) << 1; - that would be quite a
challenge to generate!)
G
Hi,
My name is Marcelo, I'm from Brazil, and I'm currently working on 2
projects: one of them is a game, the other one is a compiler.
I'd like to know where I could find some lecture about parsing,
because I'd like to use it for both my compiler (which will
generate .NET IL code) and for my game scripting engine.
Thanks for any help
PS: If this is not the place to post this kind of message, please let
me know.
Marcelo
> > What your target will be?
> Prob. x86, although I was toying with the idea of outputting .NET ILR,
> as while I know *some* (ie: little) of the x86 instruction set, I would
> not be able to make the most out of the CPU with my limited knowledge.
.NET and the java bytecode would both give you portability, though
going to a real machine might be more fun, in the sense that you get
real-machine problems to solve, not some dumbed-down virtual machine
that's deliberately trying to make life as easy for you as possible!
There's a sense of accomplishment that comes from seeing your code
run on real hardware.
Do you have any video game consoles by any chance? Or (pre-PC era)
home micros? You don't have to design for current hardware or big
iron. For instance I have the following, and if I were writing a
compiler for the fun of it, I'd pick one of these as my target:
Acorn Archimedes (ARM); bbc micro (6502); Vectrex console (6809);
Gameboy Advance (ARM); GP32 (ARM); Virtual Boy (ARM); Palm Pilot (68000)
... any of those would be great fun to do!
Using an emulator is a quick developent tool for many of these,
and gives you superior access to debugging techniques that you wouldn't
have on real hardware (eg transparent single stepping, tracing).
Imho going to x86 binary if you don't already know it is probably
the fastest way to put you off compilers for life. :-)
> > Are you more interested in parsing or code generation?
> Code generation.
So have you considered taking an off-the-shelf parser and just
doing the code generation? Parsing is such a big task, and it's
never easy to debug two separate projects at the same time. Have
a look at C-Tree (ftp.kagi.com:/flisakow/ctree_XXX.tar.gz) as a
possible starting point.
G
>Prob. x86, although I was toying with the idea of outputting .NET ILR,
>as while I know *some* (ie: little) of the x86 instruction set, I would
>not be able to make the most out of the CPU with my limited knowledge.
If you know some other processor architecture well, perhaps you could
generate code for it? X86 is definitely the most useful if you're running
an X86 machine, though ;)
Another option is something like MIPS or Alpha. You can get access to Alpha
machines through Compaq's "Test Drive" program, which gives you shell
accounts to a variety of Alpha boxes, no questions asked. You wouldn't want
to develop on them, since it's not backed up, but I reckon they'd be good
for assembling compiler output and testing it.
Coming from an X86 background, the biggest nuisance when working with RISC
architectures is the extremely limited constant encodings. Dealing with
identifiers feels so much more natural on X86 because you treat them as a
plain constant:
mov eax, foo
And if you need to access the data:
mov eax, [foo]
Whereas on a typical RISC processor, you have to index into some table
(called a "GOT" -- global offset table) just to get the address. A register
will always be set to the base of the GOT, and the compiler (or perhaps the
linker gets involved, considering the GOT contains identifiers from all the
modules) has to keep track of which offset in the GOT contains the address
of the identifier we want.
ld r0,0x100(r30) ; r30 is the GOT ptr
ld r0,(r0)
The assemblers usually provide mnemonics (macros) which abstract this, so
all you see is:
ld r0,(foo)
The expansion is handled by the assembler.
It's completely necessary that it be done this way, and I'm not trying to
badmouth RISC or anything, but it's just something I find to be a pain ;)
Another thing I don't like is the assemblers. I hate AT&T style syntax and
the way registers are named. I'd prefer a syntax where R0 is "R0", not
"$0". Then there's the issue of certain registers having names to
illustrate their purpose (FP, SP, GP, etc.) which is nice sometimes, but
can be a pain if you prefer just seeing R0...R31.
Maybe when I get around to having the compiler generate assembly output,
I'll make it very flexible so different assemblers can be targeted with a
minimal amount of effort. That way, if I decide to generate RISC code, I
can output usable assembly code when I want to run a program and "ideal",
readable code when I want to analyze the output comfortably.
>> Are you more interested in parsing or code generation?
>Code generation.
Please keep us informed on the status of your compiler.
I can't wait to get past the parsing phase and into actual optimization and
code generation, but it looks like I'll be writing the parser for quite a
while (it's slow going and it's kind of hard for me to modularize the C
grammar so that I can work on specific features at a time.)
I'd like to know how you handle your parser (yacc? Hand-written top down?)
and how you handle your implementation of types, symbols/scope, and the
tree representation (if you plan on using one.)
At the very least, it should make for some good discussion that we (and
others) can learn from.
>As an aside: I noticed that Bart has made some posts to comp.compilers -
>anyone else on this list read that group?
Since I have to use Google Groups and because comp.compilers is moderated,
it takes days for messages and replies to appear. That's part of the reason
I started posting some of my questions to dynarec -- less of a delay and
there were people on the list (like Graham) who are experienced in the field.
----
Bart
>> I'm working on a language called "Imp77" which is like Algol60 in
>> some ways, and directly descended from Brooker's Atlas Autocode.
>> I'm working from an existing parser which generates a machine
>> independent intermediate code, and I'm generating low-level C from
>> that. This is not quite the same thing as an Imp77 to C translator,
>> because the C is not meant to be seen by humans in my scheme.
How does Imp77 compare to C?
C as a language doesn't actually define any I/O constructs and can be
viewed as a language which does nothing more than manipulate memory. The C
standard library isn't an actual part of the language. I plan on taking
advantage of this with my compiler. I don't want to bother implementing my
own clib -- if I need clib functions, I'll make some headers with the
appropriate function and data prototypes and link against the GNU clib, or
something like that.
I don't even plan on writing a preprocessor -- GCC or the CPP that comes
with LCC will suffice.
Is the situation with Imp77 similar?
And is the language sufficiently close enough to C that it could share the
same IR that a C front end would emit?
----
Bart
On Mon, 2002-12-23 at 16:27, Graham Toal wrote:
> So what choices have you already made, Evan?
Yeah - maybe I forgot to mention a few things.
> Have you decided which language you're going to compile?
I am currently thinking a small subset of C at this stage. I am going
to scrap what I started (absolute rubbish) and start again.
> How much of it to implement?
No floating point stuff, probably only chars and ints. This is all
speculation however.
> What your target will be?
Prob. x86, although I was toying with the idea of outputting .NET ILR,
as while I know *some* (ie: little) of the x86 instruction set, I would
not be able to make the most out of the CPU with my limited knowledge.
> Are you more interested in parsing or code generation?
Code generation.
> As you know, Bart is tackling C and is interested in both parsing
> and optimisation. We haven't discussed the target code yet I don't
> think.
Yeah - I have been following is project quietly. Stalking if you will.
> I'm working on a language called "Imp77" which is like Algol60 in
> some ways, and directly descended from Brooker's Atlas Autocode.
> I'm working from an existing parser which generates a machine
> independent intermediate code, and I'm generating low-level C from
> that. This is not quite the same thing as an Imp77 to C translator,
> because the C is not meant to be seen by humans in my scheme.
I read about that in your other post - sounds quite interesting.
> <snip>
As an aside: I noticed that Bart has made some posts to comp.compilers -
anyone else on this list read that group?
--- In compilers101@yahoogroups.com, Evan Clarke <pbclarke@a...>
wrote:
> I would like to take this moment to introduce myself (to those who
were
> not part of dynarec.com's mailing list - or those that missed the
three
> posts I made to it ;-) ) - I am from the Dynarec.com mailing list.
>
> I am also working on a simple compiler project much like Bart's -
So what choices have you already made, Evan? Have you decided which
language you're going to compile? How much of it to implement? What
your target will be? Are you more interested in parsing or code
generation?
As you know, Bart is tackling C and is interested in both parsing
and optimisation. We haven't discussed the target code yet I don't
think.
I'm working on a language called "Imp77" which is like Algol60 in
some ways, and directly descended from Brooker's Atlas Autocode.
I'm working from an existing parser which generates a machine
independent intermediate code, and I'm generating low-level C from
that. This is not quite the same thing as an Imp77 to C translator,
because the C is not meant to be seen by humans in my scheme.
I'm writing mine in C; I have a friend writing an Imp77 compiler
which will generate native binary for the X86 which he's writing
in Imp - the story of how he bootstrapped that is quite interesting:
http://imp.nb-info.co.uk/
G
>Grrr. Yahoo stores it properly and it even looks OK when you look at
>it in the moderators window or the preview window, but when you read
>it in the group via the web as I generally do, it's all messed up.
Yeah, and I just realized that my alternate example also relies on spacing
;) I'll try to stick to a LISP-like form when posting this kind of stuff to
make it web-readable.
>Nope, I just haven't worked out yet how to turn the moderation off!
Just go to Management -> Messages (in the Group Settings table) -> "Posting
and Archives" (Edit).
----
Bart
--- In compilers101@yahoogroups.com, Bart <trzynadl@u...> wrote:
> i = a[j * 2];
>
> =
> / \
> i array_access
> / \
> a *
> / \
> j 2
>
> If the formatting doesn't come out right in your email:
Grrr. Yahoo stores it properly and it even looks OK when you look at
it in the moderators window or the preview window, but when you read
it in the group via the web as I generally do, it's all messed up.
So I'll go answer this one separately from email instead, where it
usually looks OK.
> BTW, is there a reason for messages on the list being moderated? It
seems
> there's not much use for it considering the list is still so low on
traffic.
Nope, I just haven't worked out yet how to turn the moderation off!
It came as on by default when I selected the option that let me
moderate sign-ups to the list, which was all I really wanted to do,
to discourage spammers.
G
>It would be good if the members of this list could post a description on
>what project they are working on currently, along with how far along
>they are - it will help others to know what everyone is doing.
I'm writing a simple C compiler (a subset of the language, actually.)
Currently, the lexer is pretty much complete and I'm now working on the
yacc grammar. It can partially parse declarators, currently (just the type
and storage specifiers and qualifiers) for a symbol and construct a type
definition for it. It still doesn't construct trees or anything (I haven't
worked out how I'm going to do it just yet, but I'll have to decide soon.)
The groundwork for the symbol table is there (symbols can be stored and
looked up, scope is handled, etc.) No fancy hash tables yet, just a simple
scan (hashing can be added later since it won't change the symbol table
interface.)
A major goal of the project is to keep the code as simple, clean, and
understandable as possible.
----
Bart
>a = 3
>b = a
>T0 = 2 * 3
>e = T0
>d = e
>c = d
>T1 = T0 / a
>T2 = 1 + T1
>
>It dang ol' works! I'm impressed.
Not only that, but it should also work in those situations where a variable
is modified:
a = c * d;
b = c * d;
c = 2;
e = c * d;
c*d is only a common sub-expression for the assignments to a and b, but it
is recalculated when assigned to e.
>Why don't you try the canonical ordering of nodes now,
>so that 3 * 2 and 2 * 3 can be caught.
I was going to add that and constant folding in but I figured I'd just post
what I have so far in order to keep it as simple and focused on CSE as
possible.
This should be a fairly trivial addition to make and I'll upload it when
it's done.
>I don't know if you're doing this already, but you can treat array
>indexing as just another operation. eg
>
>x = a[2 * 3]
>
>could be
>
>T0 = 2 * 3;
>T1 = a [ T0;
>x = T1;
>
>You'll also need something for indirect assignments, maybe like this:
>
>a[2 * 3] = x
>
>=>
>
>T0 = 2 * 3;
>T1 = a [ T0;
>*T1 = x;
The simple expression grammar I wrote doesn't really support arrays. It
wouldn't be too hard to add them in, but I think I'll hold off on it until
I need to do it in my C compiler.
I didn't think array indexing would be any different than what you posted,
but what exactly would a tree representation of an array access look like?
For example:
i = a[j * 2];
I guess the same issue applies to all aggregate types, including structs
and unions (neither of which I have considered in my C compiler yet -- I've
still got the basic types to implement):
i = s.a;
Should I have an array access node or should I have the parser break it
down to: *(a + (j * 2) * sizeof(a's type)) ?
I would think that some sort of array node would be more appropriate for an
abstract syntax tree:
i = a[j * 2];
=
/ \
i array_access
/ \
a *
/ \
j 2
If the formatting doesn't come out right in your email:
=
i
array_access
a
*
j
2
The array_access node has 2 children: The array identifier and the index.
>So see if you can add array indexing to your CSE code version 2, and work out
>the basic commutativity rules (such as A [ (B + C) => (A + B) [ C ),
>and see if you can make either of the optimisations above come out
>of your existing CSE for free...
I'd have to incorporate some logic specifically for array index CSE,
wouldn't I? Assuming I use an array_access node, I'd have to...
But, if I assumed that sizeof(array type) doesn't matter, I could simply
have the parser automatically simplify array accesses into an addition.
This would only be practical if the language was only designed to run in
some interpreted environment without a real concept of "machine addresses"
(addresses would always be a multiple of the variable size.)
Looking at the generated code, I can't help but wonder if CSE wouldn't be
easier to perform as a separate pass rather than being integrated into the
code generator as part of the tree walking process.
a * 2 + a * 2;
->
T0 = a * 2
T1 = a * 2
T2 = T0 + T1
Some simple scanning ought to determine that T0 and T1 are the same. Then
again, it's easy for a human brain to pick up on this kind of stuff but
maybe I'll find it's not so easy when I actually sit down to write the code?
Another thing I was thinking of is an intermediate representation with an
unlimited amount of source operands. You mentioned permutations like:
a+b+c+d and a+d+c+b and an idea you had was to construct a + node in an AST
with an arbitrary number of children. This is basically the same idea, but
implemented in IR:
i = a + b + c + d;
j = d + b + a + c;
->
+ i; a, b, c, d
+ j; d, b, a, c
For operations like + where order doesn't matter, the sources could be
reordered in some predictable way.
I don't know much about intermediate representations yet, so I don't know
if this is feasible. I've read that the most common forms are triples and
quads. If dealing with arbitrarily long lists of source operands is a
problem, then perhaps they could be decomposed into normal triples and
quads after an initial CSE has been performed...
It'll be fun to take a look at this issue when it comes up in my C compiler :)
BTW, is there a reason for messages on the list being moderated? It seems
there's not much use for it considering the list is still so low on traffic.
----
Bart
Sorry for the late reply, I wanted to compile and try your code first.
Input:
a = 3;
b = a;
c = d = e = 2 * 3;
1 + 2 * 3 / a;
Code:
a = 3
b = a
T0 = 2 * 3
e = T0
d = e
c = d
T1 = T0 / a
T2 = 1 + T1
It dang ol' works! I'm impressed.
Why don't you try the canonical ordering of nodes now,
so that 3 * 2 and 2 * 3 can be caught.
That should be relatively easy. After that, here's the next
thing to try:
I don't know if you're doing this already, but you can treat array
indexing as just another operation. eg
x = a[2 * 3]
could be
T0 = 2 * 3;
T1 = a [ T0;
x = T1;
You'll also need something for indirect assignments, maybe like this:
a[2 * 3] = x
=>
T0 = 2 * 3;
T1 = a [ T0;
*T1 = x;
When indexing is looked at like just another operation, some of the other
optimisations that are often done as special cases by some compilers just
become regular cases of the general optimiser you've already written.
For instance, here's a piece of real live code from a program of mine:
static char buff[128];
static int bp;
void setconst(int m)
{
buff[bp+1] = 78;
buff[bp+5] = m;
m = m >> 8;
buff[bp+4] = m;
m = m >> 8;
buff[bp+3] = m;
m = m >> 8;
buff[bp+2] = m;
bp = bp + 5;
}
(it's not great C because it was actually machine translated from
another language, but compilers have to cope with badly written code
for whatever reason it was created)
Now, there are two optimisations here, and I'll cover them both; the
one you would choose depends on your target architecture. Your optmiser
however should be capable of handling either style:
---------------
A: every access to buff includes a base bp. See if you can assign
buff+bp to a temp, then index from the temp.
let's just take a short extract to work an example:
buff[bp+5] = m;
m = m >> 8;
buff[bp+4] = m;
=> non-optimised tree
T0 = bp + 5;
T1 = buff [ T0;
*T1 = m;
T2 = m >> 8;
T3 = bp + 4;
T4 = buff [ T3;
*T4 = T2;
=> optimised
T0 = buff + bp;
T1 = T0 [ 5;
*T1 = m;
T2 = m >> 8;
T3 = T0 [ 4;
*T3 = T2;
------------------
B: because buff is static, its address is effectively known at compile time,
so instead of making buff+bp the common factor, add any constants in the
index to buff and create a new address. So let's say buff was stored
at location 100, well in the example below we'll pretend that "@100" is
the absolute address of buff[0]. In real life of course it would be
plugged in by a linker, but the same trick is still possible. It just
takes a little work...
buff[bp+5] = m;
m = m >> 8;
buff[bp+4] = m;
=> non-optimised tree
T0 = @100 + 5;
T1 = T0 [ bp;
*T1 = m;
m = m >> 8;
T2 = @100 + 4;
T3 = T2 [ bp;
*T3 = m;
=> optimised
T1 = @105 [ bp;
*T1 = m;
m = m >> 8;
T2 = @104 [ bp;
*T2 = m;
This is a case of both CSE and constant folding.
------------------------
OK, I might not have got your temporaries right in detail but you should
at least see what I'm getting at here.
So see if you can add array indexing to your CSE code version 2, and work out
the basic commutativity rules (such as A [ (B + C) => (A + B) [ C ),
and see if you can make either of the optimisations above come out
of your existing CSE for free...
The tricky part of course will be choosing which optimistion to take in
real life, when you have a choice that's mutually incompatible. You
can't usually just generate both and see which is best, because they
are part of a larger sequence and the number of choices starts to expand
exponentially. I don't know of a good answer to this other than always
selecting one preferred choice first. Maybe someone else does.
G
--- In compilers101@yahoogroups.com, Evan Clarke <pbclarke@a...>
wrote:
> It would be good if the members of this list could post a
description on
> what project they are working on currently, along with how far
along
> they are - it will help others to know what everyone is doing.
I run another list for a historical project to recover lost software
written in the early days of my old University, Edinburgh. We used
to use a language called Imp, but it went stiff when the machines it
ran on were all decomissioned and they stopped teaching in the
language - it had entirely disappeared by the time the 386 came out,
so there are no Intel compilers for it.
I've taken the parser written in the 70's which generated a portable
intermediate code (much like the recently reinvented .net stuff) and
am writing a back-end which generates low-level C for use as an
intermediate code so that we can use GCC as a binary code
generator. It seemed easier than learning about the internals of
GCC's ASTs.
My friend Andy Davis (who I should invite to join this list) has
written a native 8086 code generator for the language recently and
is now working on a full 386 one. It's a bit of a friendly rivalry
to see who gets there first, though the truth is he probably will
now because although I got off to a fast start, I've had to put mine
aside recently as I prepare for a new job I've just accepted that
starts on Jan 2nd.
You can see my back-end that translates "icode" to C here:
http://www.gtoal.com/athome/edinburgh/imp77/gtoal/i2c.c.html
The imp source of the parser is here:
http://www.gtoal.com/athome/edinburgh/imp77/gtoal/pass2.i.html
and my best translation so far of that to C (the job is 90% done)
is here:
http://www.gtoal.com/athome/edinburgh/imp77/gtoal/pass2.c.html
I'm in the middle of adding dynamically sized arrays, and const/own
array initialization; also there's a small bug in sub sub fields of
structs. Other than that though it's close to being bootstrapped.
Oh - and if you're interested, here's Andy's pass2 in Imp; it's
not a full code generator, it too generates another intermediate
form, but one that is very close to 8086 format so that it can be
patched up by the third pass/assembler:
http://www.gtoal.com/athome/edinburgh/imp77/abd/PASS2.IMP
G
On Sat, 2002-12-21 at 10:25, Bart wrote:
> Hello,
> <big snip!>
> ----
> Bart
I would like to take this moment to introduce myself (to those who were
not part of dynarec.com's mailing list - or those that missed the three
posts I made to it ;-) ) - I am from the Dynarec.com mailing list.
I am also working on a simple compiler project much like Bart's -
unfortunatily it has been stalled for the past couple of months (exams
and whathaveyou), but now I am on holidays and will be able to get some
real work done (first re-reading the dragon book).
It would be good if the members of this list could post a description on
what project they are working on currently, along with how far along
they are - it will help others to know what everyone is doing.
I would also like to make a collection of links to online lectures, etc
of compiler theory stuff - I did have a whole collection of them, but I
dont anymore (long story).
Anyway - must be off - and sorry for any typos - I am still dehydrated
from work (long story again).
--
Evan
Hello,
I've attached a yacc grammar for a simple expression parser which emits
code with CSE.
It builds a DAG using n-ary trees (with pointers, not the cell method.) It
uses a high amount of recursion which would make it an unsuitable technique
for a production-quality compiler.
No constant folding or arithmetic optimizations are performed (T0 = T1 - T1
will not reduce to T0=0, for example), but those would be trivial to add.
The output code is a little braindead, it uses lots of temporary variables,
but this would be easy to analyze away and I don't see it causing any real
trouble.
The language itself is really simple, it's just arithmetic expressions
(using the +, -, *, and / operators) and assignments (=) which can be
chained together just as with C.
Brackets are also allowed, though they have no real effect (they're simple
there to illustrate how to construct nodes with an arbitrary number of
children.) Only integers are allowed and single-letter variables a-z and
A-Z (they don't have to be defined.)
There is no evaluator for the output, so all that's emitted is code.
The only really interesting thing about this grammar is how it builds the
n-ary trees. The rest of it isn't too great, IMHO, especially with all the
recursion.
In response to this:
>This only works within a basic block, but the algorithm is this:
>when you assign to a variable x, create a notional new variable (let's
>call it x') and assign to that instead. Any time you use
>x from here on, substitute x'. If you assign to it again, call it x'', etc.
>Then the CSE falls out in the wash.
>
>x = a + b
>y = p + q
>x = p + q
>y = a + b
>x = y
>
>=>
>
>x = a + b
>y = p + q
>x' = p + q
>y' = a + b
>x'' = y'
This requires data to be maintained for each variable (to know how many
times it has been shadowed.) I like this idea and the main reason I didn't
use it is because it was a bit simpler to just increment a temporary
variable counter and not keep track of any data related to the variables.
I think a real CSE implementation would be best done after the tree is
translated to an IR. This makes me wonder whether DAGs are going to be
worth the trouble in my C compiler. If I'm not mistaken, I don't think GCC
uses DAGs... I never saw anything that would indicate it does when I looked
through the sources. It seems to just build a tree.
/*
* cse.y
*
* Example of a simple expression parser with common sub-expression
* elimination (CSE) which builds DAGs with n-ary trees and generates simple
* code.
*
* The primary problem with building n-ary trees lies with compound
* statements, where the number of children cannot be determined until all of
* them have been processed.
*
* For nodes which have a fixed number of children (such as arithmetic operat-
* ors with 2 children), there is no problem, but for compound statements, the
* solution is to have a tree built for each statement and then appended to
* a list.
*
* Each list element is a struct which contains 2 fields: the tree, and a
* "next" pointer. When a node is to be created, it counts the number of
* elements in the CHILD_LIST, allocates that many DAG * pointers, and sets
* each pointer to a member of the list.
*/
%{
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
/*****************************************************************************
Data Structures
Definitions for data structures related to the DAGs.
*****************************************************************************/
/*
* DAG
*
* A dag node.
*/
typedef struct dag_struct DAG;
struct dag_struct
{
int op; // '+', '-', '*', '/', '=', 's', 'n', or 'v'
/*
* Data:
*
* This union serves two purposes:
*
* 1. For 'n' (number) and 'v' (variable) nodes, the appropriate
* field is set to the data that the lexer provided.
* 2. During code generation, the "num" member is set to a scratch
* variable name (0 means T0, 1 is T1, etc.) if the node's op is
* NOT 'n' or 'v', otherwise, "num" is a constant number if op
* == 'n' and "var" is a variable if op == 'v'.
*
* For nodes other than 'n' and 'v', data.num is set to -1 by default in
* order to signify to the code generator that no scratch register has
* yet been assigned (-1 is invalid for scratch registers.)
*/
union
{
int num; // number (if op == 'n')
char var; // variable (if op == 'v')
} data;
/*
* Children:
*
* Nodes can have an arbitrary number of children. The child[] array is an
* array of pointers to child DAG nodes and only has as many elements as
* the num_children member specifies.
*/
int num_children; // number of children
DAG **child; // up to 3 children
/*
* Global list:
*
* A linear list of all the nodes is maintained by chaining them all
* together with this pointer. This helps with CSE.
*/
DAG *next; // next node in the global list of nodes
};
/*
* CHILD_LIST
*
* Sometimes, the number of children for a given DAG node cannot be
* determined until after all of the children have been created. In this
* grammar, this situation only occurs with lists of statements.
*
* The problem is resolved by creating a list of CHILD_LIST structures, each
* of which contains a DAG and a pointer to the next structure. As the parser
* parses a series of statements in a statement list, a new DAG is added to
* the list for each statement encountered.
*
* When all is done, the parent node of all of these children is created, the
* list is counted, and enough pointers are allocated to hold all of the
* children. Then, each DAG in the list is plucked one at a time and attached
* to the parent node's child[] array.
*/
typedef struct child_list_struct CHILD_LIST;
struct child_list_struct
{
DAG *node; // the child node itself
CHILD_LIST *next;
};
/*****************************************************************************
Global Data
The DAG and global list of nodes are defined here.
*****************************************************************************/
/*
* DAG
*/
static DAG *dag; // DAG (NULL if no non-null statements in program)
static DAG *node_list; // global list of nodes (newest first)
/*****************************************************************************
Function Prototypes
These functions are used by the parser.
*****************************************************************************/
static DAG *MakeArbitraryNode(int, CHILD_LIST *);
static DAG *MakeNode(int, int, DAG *, DAG *);
static DAG *MakeDataNode(int, int);
static int yylex();
static void yyerror(const char *);
%}
/*****************************************************************************
Token Definitions
YYSTYPE and the tokens are defined here. Tokens are only defined for numbers
and variable names. The lexer will return ASCII characters for everything
else which helps make the grammar a little more readable.
*****************************************************************************/
%union
{
int num; // number
char var; // variable
DAG *node;
/*
* A list of child DAGs. The 2 pointer system is for fixed-time insertion
* of new elements to the end of the list (each time one is inserted, the
* "last" pointer is updated, but "first" remains unchanged.)
*/
struct
{
CHILD_LIST *first, *last; // pointers to first and last children
} children;
}
%token <num> NUMBER
%token <var> VARIABLE
%type <children> stmt_list
%type <node> stmt
%type <node> expr
%type <node> assgn_expr
%type <node> factor
%type <node> element
%%
/*****************************************************************************
Grammar
Ye olde grammar.
The language is really quite simple. It's just a series of statements
terminated by semicolons. Multiple statements can appear between brackets,
but there are no scope rules, so these brackets do nothing more than serve
as a demonstration for how to implement compound statements using n-ary
tree structures.
There are 52 possible variable names: a-z and A-Z. They are always available
and cannot be declared or undeclared.
A single statement may be an assignment statement:
a = 3;
b = a;
c = d = e = 4 * 3;
Or an arithmetic expression using +, -, *, and / operators:
1 + 2 * 3 / a;
The above example clearly does nothing since there is no assignment, but it's
perfectly valid.
*****************************************************************************/
/*
* file
*
* A file is simply a list of statements. stmt_list is set to a list of
* DAGs. These are attached to a statement list node.
*/
file: stmt_list
{
if ($1.first != NULL) // if there were any non-null statements
dag = MakeArbitraryNode('s', $1.first);
else
dag = NULL;
}
| // empty
;
/*
* stmt_list
*
* Takes advantage of left recursion in bottom-up parsing to allow an
* arbitrary number of statements to be repeated in a list. Consult Holub's
* book to learn about how this sort of left recursion works.
*
* When a stmt is on the stack, the first production is reduced to stmt_list.
* Then, another stmt will make "stmt_list stmt" which again reduces to
* stmt_list, etc. So, for each stmt_list, the "stmt_list->stmt"
* production gets reduced once (and it happens first.)
*
* A CHILD_LIST is created and each time the second production is reduced, the
* action executed adds the DAG associated with stmt to stmt_list and
* attaches it to $$.
*/
stmt_list:
stmt // this is the first kind of stmt_list production to be
// reduced because of how left recursion works
{
$$.first = $$.last = NULL;
if ($1 != NULL) // don't corrupt child list w/ NULLs
{
$$.first = malloc(sizeof(CHILD_LIST));
$$.last = $$.first;
$$.first->node = $1;
$$.first->next = NULL;
}
else
$$.first = $$.last = NULL;
}
|
stmt_list stmt
{
$$ = $1; // keep the list alive by passing it to $$
if ($2 != NULL) // don't corrupt child list with NULLs
{
if ($$.last == NULL) // nothing has been allocated!
{
$$.first = malloc(sizeof(CHILD_LIST));
$$.last = $$.first;
$$.first->node = $2;
$$.first->next = NULL;
}
else // allocate to end of list
{
($$.last)->next = malloc(sizeof(CHILD_LIST));
($$.last)->next->node = $2;
($$.last)->next->next = NULL;
$$.last = ($$.last)->next;
}
}
}
;
/*
* stmt
*
* A single statement, which can be an expression, an assignment, or a
* compound statement (just another statement list.) C-style null-statements
* are perfectly acceptable and don't generate any DAG nodes.
*/
stmt: expr ';' { $$ = $1; }
| ';' { $$ = NULL; }
| '{' stmt_list '}'
{
if ($2.first == NULL) // no actual statements
$$ = NULL;
else
$$ = MakeArbitraryNode('s', $2.first); // make a 's'
(statement list) node and attach all the children
}
| '{' '}' { $$ = NULL; }
| assgn_expr ';' { $$ = $1; }
;
/*
* assgn_expr
*
* Assignment expression. The entire expression itself takes the value of the
* newly-modified variable, thus chaining is possible:
*
* a = b = c = 3 * 4;
*
* Note that the variable node is created first, and then the operator node
* is created and overwrites $$.
*/
assgn_expr: VARIABLE '=' assgn_expr
{
$$ = MakeDataNode('v', $1);
$$ = MakeNode(2, '=', $$, $3);
}
| VARIABLE '=' expr
{
$$ = MakeDataNode('v', $1);
$$ = MakeNode(2, '=', $$, $3);
}
;
/*
* expr
*
* The lowest precedence expression rules (+ and -) are handled here.
*/
expr: expr '+' factor { $$ = MakeNode(2, '+', $1, $3); }
| expr '-' factor { $$ = MakeNode(2, '-', $1, $3); }
| factor { $$ = $1; }
;
/*
* factor
*
* Multiplication and division are higher precedence than +/- (expr.)
*/
factor: factor '*' element { $$ = MakeNode(2, '*', $1, $3); }
| factor '/' element { $$ = MakeNode(2, '/', $1, $3); }
| element { $$ = $1; }
;
/*
* element
*
* A basic element -- a number, variable, or parenthetical expression.
*/
element: '(' expr ')' { $$ = $2; }
| '(' assgn_expr ')' { $$ = $2; }
| NUMBER { $$ = MakeDataNode('n', $1); }
| VARIABLE { $$ = MakeDataNode('v', $1); }
;
%%
/*****************************************************************************
DAG Functions
Functions for making DAG nodes and attaching children to them.
Nodes other than 'n' and 'v' have data.num set to -1 to indicate that no
scratch register has been assigned to the node by the code generator yet
(see the code generator code and the DAG structure definition for more info.)
*****************************************************************************/
/*
* MakeArbitraryNode():
*
* Creates a node which can have an arbitrary amount of children.
*
* Attaches all of the DAGs in the CHILD_LIST as children to the node (first
* argument.) This is done by counting the number of DAGs in the list,
* allocating that many child pointers, and then taking each element in the
* list and putting it in the newly-allocated child[] array.
*
* Next, the result is checked against all of the other DAG nodes to see if
* there is already one like this, and if so, this one is discarded and the
* existant one is returned.
*
* NOTE: When searching for a matching node in the global node list, the
* "data" member is not checked because it is assumed that it is unused in
* this sort of node.
*/
static DAG *MakeArbitraryNode(int op, CHILD_LIST *cl)
{
CHILD_LIST *l;
DAG *node, *n;
int num_children, i, all_equal;
/*
* Count number of children by iterating through list
*/
for (l = cl, num_children = 0; l != NULL; l = l->next, num_children++)
;
/*
* Make the node
*/
if ((node = calloc(1, sizeof(DAG))) == NULL)
return NULL;
node->op = op;
node->data.num = -1;
/*
* Allocate space for their pointers and then copy them in
*/
node->num_children = num_children;
node->child = malloc(sizeof(DAG *) * num_children);
for (i = 0, l = cl; i < num_children; i++, l = l->next)
node->child[i] = l->node;
/*
* Try to see if a matching node already exists
*/
for (n = node_list; n != NULL; n = n->next)
{
if (n->op == node->op && n->num_children == node->num_children)
{
all_equal = 1; // will be cleared if not all children equal
for (i = 0; i < n->num_children; i++)
{
if (n->child[i] != node->child[i])
{
all_equal = 0;
break;
}
}
if (all_equal)
{
free(node); // destroy current node
return n;
}
}
}
/*
* No match found, insert this node in the global list and return it
*/
node->next = node_list;
node_list = node;
return node;
}
/*
* MakeNode():
*
* Makes a DAG node with the specified number of children (0 is acceptable.)
* The number of children determines how big child[] will be. If the number
* of children is 1 or 2, they are passed as parameters a and b.
*
* NOTE: When searching for a matching node in the global node list, the
* "data" member is not checked because it is assumed that it is unused in
* this sort of node.
*
* MakeNode() is not intended for number ('n') and variable ('v') nodes, use
* MakeDataNode() for that purpose.
*/
static DAG *MakeNode(int num_children, int op, DAG *a, DAG *b)
{
DAG *node;
/*
* Try to see if a matching node exists
*/
for (node = node_list; node != NULL; node = node->next)
{
if (node->op != op) // can't be the same node!
continue;
if (node->num_children != num_children)
continue;
/*
* Ugly and not optimal... I know, but it's easy to understand :)
*/
if (num_children == 0)
return node; // must be a match!
else if (num_children == 1)
{
if (node->child[0] == a)
return node;
}
else if (num_children == 2)
{
if (node->child[0] == a && node->child[1] == b)
return node;
}
}
/*
* Otherwise, allocate a new one
*/
if ((node = calloc(1, sizeof(DAG))) == NULL)
return NULL;
node->next = node_list;
node_list = node;
node->op = op;
node->data.num = -1;
node->num_children = num_children;
if (num_children)
{
node->child = malloc(sizeof(DAG *) * num_children);
node->child[0] = a;
if (num_children == 2)
node->child[1] = b;
}
return node;
}
/*
* MakeDataNode():
*
* Virtually the same as the above MakeNode() function, except it is intended
* for use with 'n' and 'v' nodes only.
*/
static DAG *MakeDataNode(int op, int data)
{
DAG *node;
for (node = node_list; node != NULL; node = node->next)
{
/*
* Only check op and fields... Children are irrelevant for 'n' and 'v'
*/
if (node->op == op)
{
if (op == 'n')
{
if (node->data.num == data)
return node;
}
else // 'v'
{
if (node->data.var == (char) data)
return node;
}
}
}
if ((node = calloc(1, sizeof(DAG))) == NULL)
return NULL;
node->next = node_list;
node_list = node;
node->op = op;
if (op == 'n')
node->data.num = data;
else if (op == 'v')
node->data.var = data;
else
node->data.num = -1; // to indicate this isn't being used (codegen)
node->num_children = 0;
return node;
}
/*****************************************************************************
Code Generator
Code is output in a simplistic quadruple format (op, dest, src, src) with a
scratch variable assigned for each operation. This means that a lot of
scratch variables are generated that could be eliminated, but that would be
fairly trivial as a post-step.
CSE (common sub-expression elimination) is performed during code generation.
It's really quite simple and most of the work was done in constructing the
DAGs themselves. When a scratch variable is found to be assigned to a given
node (which happens once the code generator touches it the first time), it is
re-used.
If a sub-expression is no longer valid (when another statement changes the
contents of a variable used as an operand, for instance), the scratch
variable is disassociated with the node (data.num is set to -1.)
The code generator assumes (especially in InvalidateDependencies()) that
all nodes other than 'n', 'v', and '=' can only be represented by a temporary
variable (whose number is in data.num), Tx.
*****************************************************************************/
static int current_scratch = 0;
/*
* NewScratch():
*
* Returns a new scratch variable.
*/
static int NewScratch()
{
return current_scratch++;
}
/*
* PrintOperand():
*
* Prints the current node as an operand. It assumes that it has already been
* consumed by GenerateCode(). The rules for operand printing are:
*
* - If op == 'n', print data.num.
* - If op == 'v', print data.var.
* - If op == '=', print data.var.
* - Otherwise, data.num contains the number of a scratch variable (Tx)
* that must be printed.
*/
static void PrintOperand(DAG *node)
{
switch (node->op)
{
case 'n':
printf("%d", node->data.num);
break;
case 'v':
case '=':
putchar(node->data.var);
break;
default:
printf("T%d", node->data.num);
break;
}
}
/*
* InvalidateDependencies():
*
* Invalidates all dependencies on the variable which holds this node. Scans
* the global node list to find nodes that have a child which is the current
* node. Recursively eliminates dependencies on dependencies, etc.
*/
static void InvalidateDependencies(DAG *node)
{
DAG *n;
int i;
for (n = node_list; n != NULL; n = n->next)
{
for (i = 0; i < n->num_children; i++)
{
if (n->child[i] == node)
{
InvalidateDependencies(n);
/*
* '=' and 'v' both use data.var to store an actual (a-z,A-Z)
* user-defined variable which must never be invalidated.
*/
if (n->op != '=' && n->op != 'v') // these 2 will always have
an actual (a-z,A-Z) variable
n->data.num = -1;
break;
}
}
}
}
/*
* GenerateCode():
*
* Generates code from a DAG and prints it to stdout.
*/
static void GenerateCode(DAG *node)
{
int i;
if (node == NULL)
return;
switch (node->op)
{
case 's':
/*
* List of statements
*/
for (i = 0; i < node->num_children; i++)
GenerateCode(node->child[i]);
break;
case '=':
/*
* Assignment
*/
GenerateCode(node->child[1]); // child[0] guaranteed to be a var
node->data.var = node->child[0]->data.var; // this node becomes
// equivalent to the var
printf("%c = ", node->child[0]->data.var);
PrintOperand(node->child[1]);
printf("\n");
InvalidateDependencies(node->child[0]);
break;
case '+':
case '-':
case '*':
case '/':
/*
* Binary arithmetic operators
*/
if (node->data.num != -1) // scratch register already assigned?
break; // do nothing (parent node will treat this
// as an operand by re-using the scratch
// register)
GenerateCode(node->child[0]);
GenerateCode(node->child[1]);
node->data.num = NewScratch();
printf("T%d = ", node->data.num);
PrintOperand(node->child[0]);
printf(" %c ", node->op);
PrintOperand(node->child[1]);
printf("\n");
break;
case 'n':
case 'v':
break; // do nothing
}
}
/*****************************************************************************
Lexical Analyzer
A simple lexical analyzer which recognizes tokens from file input.
*****************************************************************************/
static FILE *fp;
static int yylex()
{
int i, num;
if (feof(fp))
return 0;
i = fgetc(fp);
while (i == ' ' || i == '\t' || i == '\r' || i == '\n')
i = fgetc(fp);
if (isdigit(i))
{
num = i - '0';
while (1)
{
i = fgetc(fp);
if (isdigit(i))
{
num *= 10;
num += (i - '0');
}
else
{
ungetc(i, fp);
yylval.num = num;
return NUMBER;
}
}
}
else if (isalpha(i))
{
yylval.var = i;
return VARIABLE;
}
return i;
}
/*****************************************************************************
Main
main(), yyerror(), and friends.
*****************************************************************************/
static void PrintDAG(DAG *node, int indent)
{
int i;
if (node == NULL)
return;
for (i = 0; i < indent * 2; i++)
printf(" ");
if (node->op == 'n')
printf("%d\n", node->data.num);
else if (node->op == 'v')
printf("%c\n", node->data.var);
else
{
printf("%c\n", node->op);
for (i = 0; i < node->num_children; i++)
PrintDAG(node->child[i], indent + 1);
}
}
static void yyerror(const char *msg)
{
fprintf(stderr, "%s\n", msg);
}
int main(int argc, char **argv)
{
if (argc <= 1)
{
puts("usage: cse <file>");
exit(0);
}
else
{
if ((fp = fopen(argv[1], "r")) == NULL)
{
fprintf(stderr, "error: unable to open %s\n", argv[1]);
exit(1);
}
}
node_list = NULL;
if (yyparse())
exit(0);
puts("DAG:\n----\n");
PrintDAG(dag, 0);
puts("\nCode:\n-----\n");
GenerateCode(dag);
return 0;
}
Hello,
I did a little bit of work on a simple CSE demo last night. All that's left
to do is the actual CSE/code generation ;) Right now I've got the grammar
and DAG-construction working. I'll post it here when I'm done (which will
probably be sometime _after_ I see "The Two Towers" today ;)), it may be
useful as a resource for other beginners.
----
Bart