On Sat, Jun 6, 2009 at 9:39 AM, Graham Toal<gtoal@...> wrote:
> It took about 2 days of programming and 2 days of debugging (one of which
> was Rainer's :-) ) to get an x86 code generator working for the toy compiler
> I wrote last year.
>
> There were two bugs in the generated code; Rainer caught the first one, to
> do with comparisons. The second was that I was failing to assign to any
> array elements. Turned out to be simple: I had "mov eax,[ecx]" when I should
> have had "mov [ecx], eax"
>
> It now runs my test program, Ecce, perfectly. (Ecce is an old line-based
> text editor; I actually still use a later variant of it, embedded within
> emacs, as my main editor)
>
> I defined a very simple stack machine, and then wrote a program that read
> the stack-machine assembly code and macro expanded each instruction into a
> functionally equivalent set of x86 instructions.
>
> The generated code is something like a third the speed and three times the
> size of 'proper' x86 code, but it works.
next development - I'm using the stack-based code as an intermediate
code from which to generate more traditional x86 instructions in a
load/store format; one way to look at it is that the code generator
performs pattern-based rewrite rules; another way to look at is is
that it is a nasty peepholing hack :-) it's amazing how similar the
two approaches are.
The new compiler output is at the same URL as before, but if you
looked at the original code I think you will be quite surprised by how
it looks now.
There are still many optimisations that can be made, but it's close to
the diminishing returns point now - the code is now far faster than
regular GCC and only slightly worse than GCC -O.
Have a look for yourself:
http://www.gtoal.com/compilers101/intro/gtoal/interp/ecce.S.txt
The stack-machine instructions are preserved as comments.
Unfortunately the current compiler structure doesn't let me also pass
through the original high-level source, so you'll have to look at
http://www.gtoal.com/compilers101/intro/gtoal/interp/ecce.t.txt if you
want to compare.
The optimisation code is interesting only as an example of how well
code can be improved by a handful (well, about 20) of simple
peepholing rules. It's not a good example of how to write a code
generator. The same optimisations should really be made at a higher
level, but I think it is primarily due to the small number of general
purpose registers in the x86 that this style of code generator can
appear to work well. I don't think this approach would work as well
on an ARM for example, with 16 general purpose registers.
By the way there's about 8 hrs programming work between the original
'dumb' code and this. And this is without knowing any clever x86
programming tricks, just the 'obvious' code.
The trick with peepholing is to define your assumptions about the
generated code and take advantage of them. The same peepholing rules
would generate incorrect code if the input didn't come from your own
compiler. (If you don't take advantage of those assumptions, for
example that after storing a register, that register is never referred
to again until it is reloaded) then you severely limit the
opportunities for this sort of optimisation.
G