Skip to search.

Breaking News Visit Yahoo! News for the latest.

×Close this window

compilers101 · Compilers 101

The Yahoo! Groups Product Blog

Check it out!

Group Information

? Already a member? Sign in to Yahoo!

Yahoo! Groups Tips

Did you know...
Real people. Real stories. See how Yahoo! Groups impacts members worldwide.

Messages

Advanced
Messages Help
Messages 330 - 359 of 1492   Oldest  |  < Older  |  Newer >  |  Newest
Messages: Show Message Summaries Sort by Date ^  
#330 From: Graham Toal <gtoal@...>
Date: Sat Jul 24, 2004 4:44 am
Subject: Good web page
graham_toal
Send Email Send Email
 
I found this guy tonight, I tend to agree with some of his
snide comments ;-)

    http://www.softpanorama.org/Algorithms/compilers.shtml

G

#331 From: Rick Clark <rickclark58@...>
Date: Sat Jul 24, 2004 2:54 pm
Subject: Re: Good web page
rickclark58
Send Email Send Email
 
--- Graham Toal <gtoal@...> wrote:
> I found this guy tonight, I tend to agree with some
> of his
> snide comments ;-)

Very interesting. Thanks for the link.

=====
Rick Clark
http://home.grandecom.net/~rickclark58/



__________________________________
Do you Yahoo!?
Y! Messenger - Communicate in real time. Download now.
http://messenger.yahoo.com

#332 From: Graham Toal <gtoal@...>
Date: Fri Aug 6, 2004 2:08 am
Subject: compiler and book
graham_toal
Send Email Send Email
 
I found another 'intro to compilers' type book online.  Caveat - the code
generation section is extremely brief, which is a shame because that's
the most interesting part of a compiler in my opinion.  Like too many
academic texts this one concentrates on parsing.

http://aleron.dl.sourceforge.net/sourceforge/inger/CompilerConstruction.pdf

I couldn't find the source of the compiler anywhere, just a windows
executable.  Seems rather cheeky to put it on *source*forge..!

G

#333 From: Graham Toal <gtoal@...>
Date: Fri Aug 6, 2004 3:17 am
Subject: good teaching compiler now online!
graham_toal
Send Email Send Email
 
http://not.meko.dk/Hacks/Alvilda/The%20Alvilda%20Optimizing%20Compiler.html

I found this page a couple of years ago but there was no source.

The author has now found his missing source and put it online.

It's pretty good.  It shows all the main techniques of an optimising
compiler, nicely modularised.

(The download is a .tar.gz file with unix-style line endings, so may
not be immediately useful for Windows users.  If necessary I'll put
up an expanded copy on the web this weekend if anyone need it)

G

#334 From: Mauro Persano <mauro_persano@...>
Date: Fri Aug 6, 2004 5:04 pm
Subject: Yet another toy compiler
mauro_persano
Send Email Send Email
 
Hello, group

I wrote a compiler for a small subset of C the other
day. It can deal with things like shortcut operators,
coercing the result of a logical expression into a
scalar, and... not a lot more. The output is i386
assembly code.

It (apparently) generates correct code for Dik
Winter's pi calculator:

int
a=10000,b,c=2800,d,e,f[2801],g;main(){for(;b-c;)f[b++]=a/5;
for(;d=0,g=c*2;c-=14,printf("%.4d",e+d/a),e=d%a)for(b=c;d+=f[b]*a,
f[b]=d%--g,d/=g--,--b;d*=b);}

Well, in case anyone is interested, the source code
is here:

   http://ondska.dyndns.org/~fuse/mcc_i386.tar.gz

I don't see anyone actually learning anything from
this, since the code is so ugly - sorry, I was in a
rush to see if it would work! Anyway, I'm currently
working on another, more complete, one - with a little
more ahead planning this time.

Thanks,

   -- Mauro

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

#335 From: "Piyush Narang" <crazybitez@...>
Date: Sun Sep 12, 2004 1:27 pm
Subject: Lexical analyzer for C
dinosaurclub
Send Email Send Email
 
Hi,
   I have to implement a lexical analyzer for the C language...I
however am running into some problems.. ;-<

Firstly It is whether i should have a TOKEN for each operator like + -
* / and so on or one general operator to deal with them all and some
way of differentiating between them...

Secondly i have decided to go in for a simulated dfa for my tokens..I
am not to sure how i should simulate this..Use a function for each
state of the dfa ? Any other ideas?

Thanks,
Piyush

#336 From: "ed_davis2" <ed_davis2@...>
Date: Mon Sep 13, 2004 2:18 pm
Subject: Re: Lexical analyzer for C
ed_davis2
Send Email Send Email
 
--- In compilers101@yahoogroups.com, "Piyush Narang" wrote:
>   I have to implement a lexical analyzer for the C language...I
> however am running into some problems.. ;-<
>
> Firstly It is whether i should have a TOKEN for each operator like + -
> * / and so on or one general operator to deal with them all and some
> way of differentiating between them...

Personal opinion (e.g., I can't prove it either way): I prefer
the former - unique tokens.  It seems simplest to me, but I've
never written a full c compiler - just a subset-c compiler (for a
VM).

> Secondly i have decided to go in for a simulated dfa for my tokens..I
> am not to sure how i should simulate this..Use a function for each
> state of the dfa ? Any other ideas?

Why a DFA?  For a language like C, scanning seems simple enough
that a simple ad hoc scanner seems appropriate.  Using a DFA just
seems to make it harder.

Surely divide and /* are a little difficult, but not difficult
enough to use a DFA?

Again, all of this is just my humble opinion, which is
admittedly not worth the time it took me to compose this :-)

#337 From: Rainer Thonnes <rainer@...>
Date: Mon Sep 13, 2004 4:01 pm
Subject: Re: Re: Lexical analyzer for C
rainer@...
Send Email Send Email
 
On Monday 13 September 2004 2:18 pm, ed_davis2 wrote:

> Why a DFA?  For a language like C, scanning seems simple enough
> that a simple ad hoc scanner seems appropriate.  Using a DFA just
> seems to make it harder.

Are you saying that an ad hoc scanner *isn't* a DFA?  :-)

> Surely divide and /* are a little difficult, but not difficult
> enough to use a DFA?

Another potential problem is how to interpret, say, +++.
Does a+++b mean (a++)+b or a+(++b)?

> Again, all of this is just my humble opinion, which is
> admittedly not worth the time it took me to compose this :-)

I think you're being too humble.

#338 From: "ed_davis2" <ed_davis2@...>
Date: Mon Sep 13, 2004 9:27 pm
Subject: Re: Lexical analyzer for C
ed_davis2
Send Email Send Email
 
--- In compilers101@yahoogroups.com, Rainer Thonnes wrote:
> On Monday 13 September 2004 2:18 pm, ed_davis2 wrote:
>
> > Why a DFA?  For a language like C, scanning seems simple
> > enough that a simple ad hoc scanner seems appropriate.  Using
> > a DFA just seems to make it harder.
>
> Are you saying that an ad hoc scanner *isn't* a DFA?  :-)

It is certainly possible that I don't know a DFA from a NFA from
FSA.  I got them confused in college, and I still do now.

I was under the impression that a DFA was a state machine,
whereas an ad hoc scanner was just, er, well, ad hoc.  For
instance:

These examples are taken from
http://www.scifac.ru.ac.za/compilers/


/* FSA scanner */

   static void initScannerStates() {
     int i;
     for (i = 0; i <= 255; i++) state0[i] = 0;
     for (i = '0'; i <= '9'; i++) state0[i] = 16;
     for (i = 'a'; i <= 'z'; i++) state0[i] = 17;
     for (i = 'A'; i <= 'Z'; i++) state0[i] = 17;
     state0['<'] =  2;  state0['>'] =  4;  state0['='] =  6;
     state0['!'] =  8;  state0['&'] = 10;  state0['|'] = 12;
     state0['('] = 14;  state0[')'] = 15;  state0[EOF] =  1;
   }

   static void getSym() {
   // Scans for next sym from input
     while (ch > EOF && ch <= ' ') getChar();       // skip whitespace
     StringBuffer symLex = new StringBuffer();
     int symKind = noSym;
     int state = state0[ch];
     boolean finished = false;
     while (!finished) {
       symLex.append(ch);
       getChar();
       switch (state) {
         case  1:
           symLex = new StringBuffer("EOF"); symKind = EOFSym; finished
= true; break;
         case  2:
           if (ch == '=') state = 3; else { symKind = lssSym; finished
= true; } break;
         case  3:
           symKind = leqSym; finished = true; break;
         case  4:
           if (ch == '=') state = 5; else { symKind = gtrSym; finished
= true; } break;
         case  5:
           symKind = geqSym; finished = true; break;
         case  6:
           if (ch == '=') state = 7; else { symKind = assignSym;
finished = true; } break;
         case  7:
           symKind = eqlSym; finished = true; break;
         case  8:
           if (ch == '=') state = 9; else { symKind = notSym; finished
= true; } break;
         case  9:
           symKind = neqSym; finished = true; break;
         case 10:
           if (ch == '&') state = 11; else { symKind = noSym; finished
= true; } break;
         case 11:
           symKind = andSym; finished = true; break;
         case 12:
           if (ch == '|') state = 12; else { symKind = noSym; finished
= true; } break;
         case 13:
           symKind = orSym; finished = true; break;
         case 14:
           symKind = lParenSym; finished = true; break;
         case 15:
           symKind = rParenSym; finished = true; break;
         case 16:
           symKind = numSym; finished = !Character.isDigit(ch); break;
         case 17:
           if (!Character.isLetterOrDigit(ch)) {
             finished = true; symKind = literalKind(symLex);
           }
           break;
         default:
           symKind = noSym; finished = true; break;
       }
     }
     sym = new Token(symKind, symLex.toString());
   }

And then an ad hoc scanner:

/*  Ad hoc scanner */

   static void getSym() {
   // Scans for next sym from input
     while (ch > EOF && ch <= ' ') getChar();       // skip whitespace
     StringBuffer symLex = new StringBuffer();
     int symKind;
     if (Character.isLetter(ch)) {
       do {
         symLex.append(ch); getChar();
       } while (Character.isLetterOrDigit(ch));
       symKind = literalKind(symLex);
     }
     else if (Character.isDigit(ch)) {
       do {
         symLex.append(ch); getChar();
       } while (Character.isDigit(ch));
       symKind = numSym;
     }
     else {
       symLex.append(ch);
       switch (ch) {
         case EOF:
           symLex = new StringBuffer("EOF");   // special representation
           symKind = EOFSym; break;            // no need to getChar
here, of course
         case '<':
           symKind = lssSym; getChar();
           if (ch == '=') { symLex.append(ch); symKind = leqSym;
getChar(); }
           break;
         case '>':
           symKind = gtrSym; getChar();
           if (ch == '=') { symLex.append(ch); symKind = geqSym;
getChar(); }
           break;
         case '=':
           symKind = assignSym; getChar();
           if (ch == '=') { symLex.append(ch); symKind = eqlSym;
getChar(); }
           break;
         case '!':
           symKind = notSym; getChar();
           if (ch == '=') { symLex.append(ch); symKind = neqSym;
getChar(); }
           break;
         case '|':
           symKind = noSym; getChar();
           if (ch == '|') { symLex.append(ch); symKind = orSym;
getChar(); }
           break;
         case '&':
           symKind = noSym; getChar();
           if (ch == '&') { symLex.append(ch); symKind = andSym;
getChar(); }
           break;
         case '(':
           symKind = lParenSym; getChar(); break;
         case ')':
           symKind = rParenSym; getChar(); break;
         default:
           symKind = noSym; getChar(); break;
       }
     }
     sym = new Token(symKind, symLex.toString());
   }


> Another potential problem is how to interpret, say, +++.
> Does a+++b mean (a++)+b or a+(++b)?

I was under the assumption that you scan for the longest match,
so given that, a+++b would b (a++)+b.

#339 From: Rainer Thonnes <rainer@...>
Date: Fri Sep 17, 2004 1:31 pm
Subject: Re: Re: Lexical analyzer for C
rainer@...
Send Email Send Email
 
On Monday 13 September 2004 9:27 pm, ed_davis2 wrote:
> --- In compilers101@yahoogroups.com, Rainer Thonnes wrote:
> > On Monday 13 September 2004 2:18 pm, ed_davis2 wrote:
> > > Why a DFA?  For a language like C, scanning seems simple
> > > enough that a simple ad hoc scanner seems appropriate.  Using
> > > a DFA just seems to make it harder.
> >
> > Are you saying that an ad hoc scanner *isn't* a DFA?  :-)
>
> It is certainly possible that I don't know a DFA from a NFA from
> FSA.  I got them confused in college, and I still do now.

DFAs and NFAs are both FSAs (OK, I know it's incorrect to use
an 's' to denote plurals here, as the word is "automata").  The D
and N stand for deterministic and nondeterministic, i.e. without
and with, respectively, crystal balls.  We need not concern
ourselves with the nondeterministic variety.

> I was under the impression that a DFA was a state machine,

Indeed.

> whereas an ad hoc scanner was just, er, well, ad hoc.

But it's still a DFA even though it's ad hoc.  Do not be confused
into believing that a recogniser is only a DFA if it is based on
lookup table technology.

A typical parser will use a hierarchical DFA, often at two levels,
one to recognise tokens in a stream of characters, the other to
recognise code fragments (statements) in a stream of tokens.  Often
the token recogniser is coded ad-hoc and the other one uses a
tabular method, with the table generated separately from something
like a BNF description of the language, or alternatively it uses
code generated by a parser-generator, which can also be described
as ad-hoc.

But there is no reason why one should not do it the other way
round, or any other way.  Various approaches drift into and out of
favour, and at the moment my favourite is to use ad-hoc for both.

#340 From: Rick Clark <rickclark58@...>
Date: Fri Sep 17, 2004 1:23 pm
Subject: Re: Re: Lexical analyzer for C
rickclark58
Send Email Send Email
 
> Why a DFA?  For a language like C, scanning
> seems simple enough that a simple ad hoc scanner
> seems appropriate.

I have written a translator for my AtomVM that uses a
basic/c like syntax. It is quite "ad-hoc", almost
simplistic in fact, yet seems to work very well, it is
very easy to extend and so far, it seems quite solid
at catching syntax errors. I really only use two main
functions: one to scan for statements and control
structures, and the other for assigments.

Of course the syntax is very simple and I did take
some steps to make translating a little easier (since
I could :)).

For example, variables are defined as Dim %a, with the
percent indicating a variable. I don't need a type
specifier since all variables are of type variant. All
statements use () around parameters, so parsing either
a statement, function or control structure is trivial:

dim %c, %d, %e = "This is a string."

%f = 12 + 34.5 * (6 / 4)

%a = sqrt(25)

while (%f < 10)
   %f = %f + 1
   print ("f: ";%f)
endwhile

for (%cnt, %start, %end, %inc)
   print("For Body: "; %cnt)
endfor

Using this format, I found that as I gather tokens, I
just need to check to see if it is an assignment or
statement. Math assignments are recursively evaluated,
so it uses the "built-in" stack that is generated with
recursive functions. Function assignments are
evaluated left to right in a linear fashion, as are
program statements. The control structures are built
recursively.

While this may not be the "right" way to do this, it
seems to work quite well and is extremely easy to
extend. If I need to add a new statement or function,
I just add it to the statement or function selector
and write a new procedure for it. It also seems quite
solid at finding syntax errors, catching all the bad
code I have thrown at it so far.

In case you are interested, here is a sample output
for the while above:

# Atom Assembler Code Generated by BAX
#While Test file for BAX
#while statement
		 addsymbol: f, 0
		 print: "while-endwhile statement:"
L0@  nop:
		 getsymbol: R1, f
		 pushdata: R1
		 set: R1, 10
		 popdata: R2
		 ltz: R2, R1
		 rzflag: R1
		 set: R3, 0
		 eqz: R1, R3
		 brit: L1
		 getsymbol: R1, f
		 pushdata: R1
		 set: R1, 1
		 popdata: R2
		 add: R1, R1, R2
		 setsymbol: f, R1
		 getsymbol: R1, f
		 print: "f: "; R1
		 goto: L0
L1@  nop:
		 end:


=====
Rick Clark
http://home.grandecom.net/~rickclark58/



__________________________________
Do you Yahoo!?
Yahoo! Mail is new and improved - Check it out!
http://promotions.yahoo.com/new_mail

#341 From: Graham Toal <gtoal@...>
Date: Sat Sep 18, 2004 1:39 am
Subject: Re: Re: Lexical analyzer for C
graham_toal
Send Email Send Email
 
> A typical parser will use a hierarchical DFA, often at two levels,
> one to recognise tokens in a stream of characters, the other to
> recognise code fragments (statements) in a stream of tokens.  Often
> the token recogniser is coded ad-hoc and the other one uses a
> tabular method, with the table generated separately from something
> like a BNF description of the language, or alternatively it uses
> code generated by a parser-generator, which can also be described
> as ad-hoc.
>
> But there is no reason why one should not do it the other way
> round, or any other way.  Various approaches drift into and out of
> favour, and at the moment my favourite is to use ad-hoc for both.

I have for some years had an inkling that it might be possible
to use a single state machine for both grammar and tokenisation.

We've talked before about tries (and DAGs/DAWGs) for recognising
words, eg as an alternative to a hash table.  It's the very same
mechanism that TDRD table-driven compilers (like the Edinburgh
compilers that Rainer will be familiar with) use for parsing
grammar.

Whenever a variable would be declared, it could be inserted in
the grammar at the point where in a normal parser a single token
would be inserted, then the act of parsing would *always* be
a traversal of a state machine at a character by character level.

One consequence of this is that undeclared variables would be
a syntax error rather than a semantic error!  Such a grammar
would have to be heavily augmented with both semantic actions
to be carried out after successful recognition plus parse-time
actions to set a context for error reporting.

I think it is doable and would probably be quite fast, though
it may be rather complex to write or at any rate a distinct
departure from anything that's gone before.  (Unless I was
asleep that day in compiler class and someone has done this
before that I'm not aware of?)

Another consequence of this would be that it would be really
easy to make languages whose syntax could be modified as a
result of declarations within the source being compiler, like
Wonderpop's syntax directed macros.  It would be interesting
to see if a language could be designed as a minimal set of primitives
and then a 'header file' would extend the language constructs
up to the full language spec.

G

#342 From: Rainer Thonnes <rainer@...>
Date: Sat Sep 18, 2004 12:17 pm
Subject: Re: Re: Lexical analyzer for C
rainer@...
Send Email Send Email
 
On Friday 17 September 2004 1:23 pm, Rick Clark wrote:

> Using this format, I found that as I gather tokens, I
> just need to check to see if it is an assignment or
> statement. Math assignments are recursively evaluated,
> so it uses the "built-in" stack that is generated with
> recursive functions.

Does this recursive expression evaluator deal with
precedence of operators?  In other words, does it
evaluate  A+B*C  as  A+(B*C)  or as  (A+B)*C?
If the latter, I guess you'd require the programmer to
use brackets when wanting the former.

It is possible, in a reasonably simple recursive evaluator,
to deal with "natural" precedence even without resorting
to explicit operand and operator stacks, but there's a trick
involved.

#343 From: Rick Clark <rickclark58@...>
Date: Sat Sep 18, 2004 8:21 pm
Subject: Re: Re: Lexical analyzer for C
rickclark58
Send Email Send Email
 
--- Rainer Thonnes <rainer@...> wrote:
> Does this recursive expression evaluator deal with
> precedence of operators?

Yes, it uses the standard operator precedence found in
most languages. Parenthesis are also supported.

=====
Rick Clark
http://home.grandecom.net/~rickclark58/

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

#344 From: "M. Uli Kusterer" <Witness.of.TeachText@...>
Date: Sat Sep 18, 2004 9:06 pm
Subject: Re: Re: Lexical analyzer for C
witness_of_t...
Send Email Send Email
 
At 6:23 Uhr -0700 17.09.2004, Rick Clark wrote:
>For example, variables are defined as Dim %a, with the
>percent indicating a variable. I don't need a type
>specifier since all variables are of type variant. All
>statements use () around parameters, so parsing either
>a statement, function or control structure is trivial:
>
>dim %c, %d, %e = "This is a string."

Rick,

   if you're interested in making this a little more mainstream: Most
other C-like languages that have something like this use "$" instead.
This will also avoid conflicts with the "%" operator, which gets you
the remainder of a division.

   Of course, if it's just for yourself, "%" is okay, but if you plan
on distributing this it may increase readability, and everyone who
learns your language will be able to easily move on to PHP and
similar languages.
--
Cheers,
M. Uli Kusterer
------------------------------------------------------------
         "The Witnesses of TeachText are everywhere..."
                     http://www.zathras.de

#345 From: "pennington6809" <rich@...>
Date: Sun Sep 19, 2004 1:40 pm
Subject: Re: Lexical analyzer for C
pennington6809
Send Email Send Email
 
--- In compilers101@yahoogroups.com, Rainer Thonnes <rainer@b...>
wrote:
> On Monday 13 September 2004 9:27 pm, ed_davis2 wrote:
> > --- In compilers101@yahoogroups.com, Rainer Thonnes wrote:
> > > On Monday 13 September 2004 2:18 pm, ed_davis2 wrote:
> > > > Why a DFA?  For a language like C, scanning seems simple
> > > > enough that a simple ad hoc scanner seems appropriate.  Using
> > > > a DFA just seems to make it harder.
> > >
> > > Are you saying that an ad hoc scanner *isn't* a DFA?  :-)
> >
> > It is certainly possible that I don't know a DFA from a NFA from
> > FSA.  I got them confused in college, and I still do now.
>
> DFAs and NFAs are both FSAs (OK, I know it's incorrect to use
> an 's' to denote plurals here, as the word is "automata").  The D
> and N stand for deterministic and nondeterministic, i.e. without
> and with, respectively, crystal balls.  We need not concern
> ourselves with the nondeterministic variety.
>
> > I was under the impression that a DFA was a state machine,
>
> Indeed.
>
> > whereas an ad hoc scanner was just, er, well, ad hoc.
>
> But it's still a DFA even though it's ad hoc.  Do not be confused
> into believing that a recogniser is only a DFA if it is based on
> lookup table technology.
>
> A typical parser will use a hierarchical DFA, often at two levels,
> one to recognise tokens in a stream of characters, the other to
> recognise code fragments (statements) in a stream of tokens.  Often
> the token recogniser is coded ad-hoc and the other one uses a
> tabular method, with the table generated separately from something
> like a BNF description of the language, or alternatively it uses
> code generated by a parser-generator, which can also be described
> as ad-hoc.
>
> But there is no reason why one should not do it the other way
> round, or any other way.  Various approaches drift into and out of
> favour, and at the moment my favourite is to use ad-hoc for both.

First of all, I'd like to say "hi". I just stumbled onto this list.
I read some of the previous posts and it looks as if you have been
discussing some interesting topics.

As a bit of an introduction, I used to write compilers for a living.
I've written C and Modula-2 compilers (and assemblers, linkers, etc.)
for several microcontrollers (Motorola 68XX and 68XXX, National
32000).

To get on topic, lately I've been using DFAs for both scanning and
parsing. All my previous compilers were ad-hoc.

I prefer this new (to me!) method because it gives me a lot of
flexibility. I can make changes to the lexer and the parser using the
higher level representation (in my case, regular expressions for the
lexer and a yacc-like syntax for the grammar).

I'm working on a little project now that uses these ideas. There is
more information on this project at
http://pennware.com/language/introduction.html
I have some external and internal documentation on line as well as
examples of syntax definition files, etc. The documentation is about 6
months out of date: I'm hoping to update it soon.

One of the interesting (again, to me at least!) aspects of this
project is that the lexical/syntax/parse tree definition file is read
in at target language compile time, so it is very convienient to make
changes and see how they affect the resulting parse.

Another interesting aspect of this project is that the parse tree is
generated in a way that makes it look very lisp-like. I'm currently
implementing a small built-in lisp interpreter to allow manipulation
of the parse tree.

-Rich

#346 From: "Chris Cranford" <chris.cranford@...>
Date: Sun Sep 19, 2004 2:06 pm
Subject: Stack Machines
chris.cranford@...
Send Email Send Email
 
I'm in the process of developing a stack machine runtime environment and
have a few implementation questions.

First, is there any recommended layout design for the memory map of the
stack machine?  I've seen various articles refer to the memory layout in
different ways.  Some have used a single memory block partitioned for
different purposes such as CODE, STACK, HEAP, and LITERAL pool and others
have used two memory blocks where one stored the CODE and as the code is
loaded, a portion of the other is set aside for the LITERAL POOL, STACK, and
HEAP.  Any suggestions?

Secondly, I realize that my opcodes can either reference data using a
pointer that resides on the stack that much be dereferenced during the
operation or via a literal value that is stored on the stack.  But what
confuses me is the understanding of how this works in conjunction with
variables of different types, such as an integer or string.

Assuming a very basic-like language syntax:

Dim i as Integer = 2
Print i

In this case, i is a local value referenced off the stack.  So, the
following PCODE could be used to represent the above operations:

DSP    1    // SP=SP-1 (allocate variable i)
PUSHI  2    // push integer value 2
ADR    0    // push address of i
STORI       // store SOS in address TOS
ADR    0    // push address of i
LOADI       // dereference address to literal
PRTI        // print integer on TOS
PRTLN       // print a new line
ISP    1    // SP=SP+1 (deallocate variable i)
HALT        // halt execution

Now if the variable had been a string, it could have been declared two
different ways:

// Heap stored
Dim s1 as String
s1 = "Hello World"
Print s1

Tranlates to:

DSP    1             // SP=SP-1 (allocate variable s1)
PUSHS  "Hello World" // look up label and push string id
ADR    0             // push address of s1
STORS                // dereference string id to contents
                      // from literal pool and allocate a
                      // heap block area storing the heap
                      // pointer in address 0 (s1)
ADR    0             // push address of s1
PRTS                 // dereferences s1 (heap pointer) to
                      // the string and sends to output.
PRTLN                // print a new line
ADR    0             // push address of s1
ZAPS                 // destroy heap pointer to TOS address
ISP    1             // SP=SP+1 (deallocate variable s1)
HALT                 // halt execution

// Stack stored
Dim s2 as String * 30
s2 = "Hello World"
Print s2

Translates to:

DSP   1              // SP=SP-1 (allocate variable s2)
PUSHS "Hello World"  // lookup label and push string id
ADR   0              // push address of s2
STRFS 30             // allocates 8 4-byte stack slots
                      // to hold the string which is
                      // copied byte/byte from literal
                      // pool directly onto the stack.
                      // The starting SP for the string
                      // is then stored in s2.
ADR   0              // push address of s2
PRTFS                // dereferences s2 to stack pointer
                      // where "Hello World" starts and
                      // sends the contents to output.
PRTLN                // print a new line
ISP   1              // SP=SP+1 (deallocate variable s2)
HALT                 // halt execution

Does this look right? Can anyone offer any input or feedback?

Thanks
Chris
--

#347 From: Richard Pennington <rich@...>
Date: Sun Sep 19, 2004 3:14 pm
Subject: Re: Stack Machines
pennington6809
Send Email Send Email
 
Hi Chris,

I have a couple of questions and observations. See below.

On Sunday 19 September 2004 09:06 am, Chris Cranford wrote:
> I'm in the process of developing a stack machine runtime environment and
> have a few implementation questions.
>
> First, is there any recommended layout design for the memory map of the
> stack machine?  I've seen various articles refer to the memory layout in
> different ways.  Some have used a single memory block partitioned for
> different purposes such as CODE, STACK, HEAP, and LITERAL pool and others
> have used two memory blocks where one stored the CODE and as the code is
> loaded, a portion of the other is set aside for the LITERAL POOL, STACK,
> and HEAP.  Any suggestions?

Do you really want to differentiate addresses? By that I mean: Do you want
things in the stack to be in a different address space that things in the
heap or the constant pool?

I think it would simplify things if you invisioned the complete memory as one
big space and split it up between the CODE, STACK, HEAP, and LITERAL areas.

One common method:

Top of memory

STACK
... (empty space for the stack and heap to grow)
HEAP
LITERAL
CODE

Bottom of memory

The stack grows down, the heap grows up. If they cross, *bang*! Out of memory.

>
> Secondly, I realize that my opcodes can either reference data using a
> pointer that resides on the stack that much be dereferenced during the
> operation or via a literal value that is stored on the stack.  But what
> confuses me is the understanding of how this works in conjunction with
> variables of different types, such as an integer or string.
>
> Assuming a very basic-like language syntax:
>
> Dim i as Integer = 2
> Print i
>
> In this case, i is a local value referenced off the stack.  So, the
> following PCODE could be used to represent the above operations:
>
> DSP    1    // SP=SP-1 (allocate variable i)
> PUSHI  2    // push integer value 2
> ADR    0    // push address of i
> STORI       // store SOS in address TOS
> ADR    0    // push address of i
> LOADI       // dereference address to literal
> PRTI        // print integer on TOS
> PRTLN       // print a new line
> ISP    1    // SP=SP+1 (deallocate variable i)
> HALT        // halt execution

I'm not sure I understand your opcodes correctly. Shouldn't the first ADR
instruction be:

ADR 1

??

I picture this:

(Assume SP starts at 10)

DSP 1  // SP = 9
PSHI 2  // SP = 8, *SP = 2
ADR 1  // SP = 7, *SP = 9
(I assume 1 is the offset in the stack from where the offset is. Another
alternative is to save the actual stack address (i.e. 9) on the stack).
STORI  // Store *(SP+1) into *SP, SP = SP + 2
ADR 0  // SP = 8, *SP = 9
LOADI  // *SP = **SP
PRTI 	 // print *SP, SP = SP + 1
PRTNL  // print "\n"
ISP 1 	 // SP = SP - 1, deallocate i
HALT

Not that the offset of I changes as various things are pushed on the stack.

>
> Now if the variable had been a string, it could have been declared two
> different ways:
>
> // Heap stored
> Dim s1 as String
> s1 = "Hello World"
> Print s1
>
> Tranlates to:
>
> DSP    1             // SP=SP-1 (allocate variable s1)
> PUSHS  "Hello World" // look up label and push string id
> ADR    0             // push address of s1
> STORS                // dereference string id to contents
>                      // from literal pool and allocate a
>                      // heap block area storing the heap
>                      // pointer in address 0 (s1)
> ADR    0             // push address of s1
> PRTS                 // dereferences s1 (heap pointer) to
>                      // the string and sends to output.
> PRTLN                // print a new line
> ADR    0             // push address of s1
> ZAPS                 // destroy heap pointer to TOS address
> ISP    1             // SP=SP+1 (deallocate variable s1)
> HALT                 // halt execution

This is OK, if the offset of s1 changes as i's did above.

>
> // Stack stored
> Dim s2 as String * 30
> s2 = "Hello World"
> Print s2
>
> Translates to:
>
> DSP   1              // SP=SP-1 (allocate variable s2)
> PUSHS "Hello World"  // lookup label and push string id
> ADR   0              // push address of s2
> STRFS 30             // allocates 8 4-byte stack slots
>                      // to hold the string which is
>                      // copied byte/byte from literal
>                      // pool directly onto the stack.
>                      // The starting SP for the string
>                      // is then stored in s2.
> ADR   0              // push address of s2

s2's offset should be something like 30 now, right?

Oh, wait! I think you mean:

DSP 30  // SP = SP - 30 (allocate s2)
PSHS "Hello World" // SP = SP - 1, *SP = &"hello world"
ADR 1  // SP = SP - 1; *SP = (base of s2 on the stack)
STRFS  // store string pointed to by *(SP + 1) into **SP, SP = SP + 2.
ADR 0  // SP = SP - 1; *SP = (base of s2)

> PRTFS                // dereferences s2 to stack pointer
>                      // where "Hello World" starts and
>                      // sends the contents to output.
> PRTLN                // print a new line
> ISP   1              // SP=SP+1 (deallocate variable s2)

This should probably be

    ISP 30

> HALT                 // halt execution
>
> Does this look right? Can anyone offer any input or feedback?
>
> Thanks
> Chris

-Rich

--
Richard Pennington
Email: rich@...
http://www.pennware.com ftp://ftp.pennware.com

#348 From: "Chris Cranford" <chris.cranford@...>
Date: Mon Sep 20, 2004 12:04 am
Subject: Re: Stack Machines
chris.cranford@...
Send Email Send Email
 
Hi Richard,

> > First, is there any recommended layout design for the memory map of the
> > stack machine?  I've seen various articles refer to the memory layout in
> > different ways.  Some have used a single memory block partitioned for
> > different purposes such as CODE, STACK, HEAP, and LITERAL pool and
others
> > have used two memory blocks where one stored the CODE and as the code is
> > loaded, a portion of the other is set aside for the LITERAL POOL, STACK,
> > and HEAP.  Any suggestions?
>
> Do you really want to differentiate addresses? By that I mean: Do you want
> things in the stack to be in a different address space that things in the
> heap or the constant pool?

They don't have to be.  The goal however is to make the code generation
process easy and to keep the address calculations easy when the bytecode
is being rendered.  Otherwise, it doesn't matter to me.

> I think it would simplify things if you invisioned the complete memory as
one
> big space and split it up between the CODE, STACK, HEAP, and LITERAL
areas.
>
> One common method:
>
> Top of memory
>
> STACK
> ... (empty space for the stack and heap to grow)
> HEAP
> LITERAL
> CODE
>
> Bottom of memory
>
> The stack grows down, the heap grows up. If they cross, *bang*! Out of
memory.

Does this mean that I need the following pointers:

   PC - program counter
   SP - stack pointer
   BP - base pointer
   MP - mark stack pointer
   HP - heap pointer ??

The last is what I'm a little confused about.  I honestly haven't seen any
implementations with such a pointer so I'm curious how I am suppose to know
where the heap begins and where the last heap allocation has occured?

The remainder of your post makes perfect since I think I have finally
grasped how I am to handle variables of both integers and strings with the
exception of my heap questions above.

Thanks,
Chris

#349 From: Richard Pennington <rich@...>
Date: Mon Sep 20, 2004 1:05 am
Subject: Re: Stack Machines
pennington6809
Send Email Send Email
 
Hi Chris,

(Comments below)
On Sunday 19 September 2004 07:04 pm, Chris Cranford wrote:
> Hi Richard,
[snip]
> > One common method:
> >
> > Top of memory
> >
> > STACK
> > ... (empty space for the stack and heap to grow)
> > HEAP
> > LITERAL
> > CODE
> >
> > Bottom of memory
> >
> > The stack grows down, the heap grows up. If they cross, *bang*! Out of
>
> memory.
>
> Does this mean that I need the following pointers:
>
>   PC - program counter
>   SP - stack pointer
>   BP - base pointer
>   MP - mark stack pointer
>   HP - heap pointer ??

I would have:

PC - point to the current instruction.
SP - point to the bottom of the stack.
HP - point to the next available heap memory.

I'm not sure what functionality your BP and MP have, but the above three
registers are enough to implement a bytecode interpreter for a language that
has local variables and heap based allocations.

If you what to support a language with nested functions/procedures you
probably also want a frame pointer (which may be what you mean by BP or MP).

>
> The last is what I'm a little confused about.  I honestly haven't seen any
> implementations with such a pointer so I'm curious how I am suppose to know
> where the heap begins and where the last heap allocation has occured?

In the memory map I gave above,

PC will point into CODE,
SP (and the framepointer, if any) will point into the STACK
HP will start out pointing at the bottom of the heap.

Basically, HP is used to implement the functionality of the *nix sbrk()
function like this (psuedo code):

void *sbrk(int size)
{
     void *tmp = HP;
     HP = HP + size;
    return tmp;
}

You will need a bit more code around this to implement malloc() and free() or
new and delete type functions. You'll have to maintain a free list of memory
that has been returned to the heap and only call the sbrk() function if now
block of memory in the free list is large enough to satisfy the memory
request.

You don't need HP (or a heap at all) if your language doesn't support
dynamically allocated memory, so a simple byte code interpreter may only need
PC and SP.

>
> The remainder of your post makes perfect since I think I have finally
> grasped how I am to handle variables of both integers and strings with the
> exception of my heap questions above.
>
> Thanks,
> Chris
>

-Rich

--
Richard Pennington
Email: rich@...
http://www.pennware.com ftp://ftp.pennware.com

#350 From: "ed_davis2" <ed_davis2@...>
Date: Mon Sep 20, 2004 3:52 pm
Subject: Re: Lexical analyzer for C
ed_davis2
Send Email Send Email
 
--- Rainer Thonnes wrote:

> DFAs and NFAs are both FSAs (OK, I know it's incorrect to use
> an 's' to denote plurals here, as the word is "automata").  The
> D and N stand for deterministic and nondeterministic, i.e.
> without and with, respectively, crystal balls.  We need not
> concern ourselves with the nondeterministic variety.
>
> > I was under the impression that a DFA was a state machine,
>
> Indeed.
>
> > whereas an ad hoc scanner was just, er, well, ad hoc.
>
> But it's still a DFA even though it's ad hoc.  Do not be
> confused into believing that a recogniser is only a DFA if it
> is based on lookup table technology.

Thanks, this makes things clearer.

> But there is no reason why one should not do it the other way
> round, or any other way.  Various approaches drift into and out
> of favour, and at the moment my favourite is to use ad-hoc for
> both.

Me too.  I've become good enough with YACC and COCO/R that I can
do what I need to, but when whatever it is I'm translating gets
complicated enough, I end up fighting the tool.

Wirth's translation rules regarding EBNF are so simple, that is
what I have been using.  A quick summary:

EBNF:
     "x" - a terminal symbol
     sym - a non-terminal

(sym1 sym2 sym3) - used for grouping
[sym]   - the enclosed symbol occurs 0 or 1 times
{sym}   - the enclosed symbol occurs 0 or more times
sym1|sym2 - the or symbol - either sym1 or sym2

Examples:

     identifiers and integers:
     -------------------------
     identifier = letter {letter | digit}
     integer = digit {digit}
     letter = "a" | "b" | ... | "z"
     digit = "0" | "1" | ... | "9"

     an if statement:
     ----------------
     "if" expr "then" stmt-seq ["else" stmt_seq "endif"]

EBNF construct, followed by translation rule:

sym denotes a global variable always representing the symbol last
read from the source text by a call to procedure next;

"x"
     if sym = "x" then next() else error endif
(exp)
     pr(exp)
[exp]
     if sym in first(exp) then pr(exp) endif
{exp}
     while sym in first(exp) do pr(exp) enddo
fac-0 fac-1 ... fac-n
     pr(fac-0); pr(fac-1); ... pr(fac-n);
term-0 | term-1 | ... | term-n
     case sym of
         when first(term-0): pr(term-0);
         when first(term-1): pr(term-1);
         ...

         when first(term-n): pr(term-n);
     endcase

So, to parse the "if statement", using the translation rules
gives the following:

if sym = if_sym then
     next();
     expr();
     if sym = then_sym then next() else error() endif
     stmt_seq();
     if sym = else_sym then
         next();
         stmt_seq();
     endif
     if sym = endif_sym then next() else error() endif
endif

Then again, this is so simple and standardized, it just begs to
be automated.  And of course it has been, many times.  Where it
gets complicated (for me at least) is attributing the grammar, so
as to build a parse tree, and error handling.

Hmmm, maybe I've talked myself into working again with COCO/R, or
perhaps giving Antlr a look.

#351 From: "ed_davis2" <ed_davis2@...>
Date: Mon Sep 20, 2004 4:09 pm
Subject: Re: Lexical analyzer for C
ed_davis2
Send Email Send Email
 
--- Rick Clark wrote:

> For example, variables are defined as Dim %a, with the
> percent indicating a variable. I don't need a type
> specifier since all variables are of type variant. All
> statements use () around parameters, so parsing either
> a statement, function or control structure is trivial:
>
> dim %c, %d, %e = "This is a string."
>
> %f = 12 + 34.5 * (6 / 4)
>
> %a = sqrt(25)
>
> while (%f < 10)
>   %f = %f + 1
>   print ("f: ";%f)
> endwhile
>
> for (%cnt, %start, %end, %inc)
>   print("For Body: "; %cnt)
> endfor

Please don't take this the wrong way.  The following is only my
simple opinion.  And be sure and take it for what it cost you
(nothing!)

I strongly dislike adding a symbol to variable names.  For me, it
is just too much noise, and makes the source much too hard to
read.

From what you have shown, it appears that all statements
(except assignments) begin with a keyword.  Therefore, I don't
see the need to prefix variables with a symbol.

Again, this is just my opinion.  Since I wasn't raised on BASIC
(Assembly -> COBOL -> Pascal -> C and so on), the %foo just looks
weird.  Yes, EXEC2 has its &var, but at least they got rid of
that in REXX (just in case you're an old IBM mainframer).  And
Perl and PHP have those abominations too.  But I don't have to
like them :-)

While not quite as bad, I don't particularly care for the
parenthesis around expressions, but rather prefer the "then" for
if's and "do" for loops, in order to separate the expression part
from the statement part.

In any case, good job!  And I hope you'll continue posting your
progress. Atom has certainly come a long way!

#352 From: "ed_davis2" <ed_davis2@...>
Date: Mon Sep 20, 2004 4:29 pm
Subject: Re: Lexical analyzer for C
ed_davis2
Send Email Send Email
 
--- pennington6809 wrote:

> I prefer this new (to me!) method because it gives me a lot of
> flexibility. I can make changes to the lexer and the parser
> using the higher level representation (in my case, regular
> expressions for the lexer and a yacc-like syntax for the
> grammar).

I don't have nearly as much compiler experience as you, but in my
dabblings, I've always been enamored with the tools (for the
very reason you state above), but once I get into something more
complicated, I end up fighting the tool.  Of course, I could be
using the wrong tools (YACC or COCO/R), or maybe I don't know the
tool as well as I think I do.

Why do you use BNF rather than EBNF?  EBNF just seems so much
easier to use (repetition) rather than BNF (recursion).  Is EBNF
somehow tied to less powerful grammars?  Is it easier to specify
attributes for a BNF grammar?

> I'm working on a little project now that uses these ideas.
> There is more information on this project at
> http://pennware.com/language/introduction.html I have some
> external and internal documentation on line as well as examples
> of syntax definition files, etc. The documentation is about 6
> months out of date: I'm hoping to update it soon.

Cool!  Will you (one day) be making this available for others to
try out?

> Another interesting aspect of this project is that the parse
> tree is generated in a way that makes it look very lisp-like.
> I'm currently implementing a small built-in lisp interpreter to
> allow manipulation of the parse tree.

So, does this mean that you probably should write the entire tool
in lisp? :-)

Sorry about that - I've recently been reading lots of stuff by
Paul Graham: http://www.paulgraham.com/

He thinks that the best programmers use the richest language,
which of course many would say would be lisp.

Lots of interesting essay's there, for instance:

http://www.paulgraham.com/iflisp.html
http://www.paulgraham.com/icad.html

For lots of fun, be sure to read this:

http://www.paulgraham.com/fix.html

#353 From: "ed_davis2" <ed_davis2@...>
Date: Mon Sep 20, 2004 4:47 pm
Subject: Re: Stack Machines
ed_davis2
Send Email Send Email
 
--- Chris Cranford wrote:

> First, is there any recommended layout design for the memory
> map of the stack machine?  I've seen various articles refer to
> the memory layout in different ways.  Some have used a single
> memory block partitioned for different purposes such as CODE,
> STACK, HEAP, and LITERAL pool and others have used two memory
> blocks where one stored the CODE and as the code is loaded, a
> portion of the other is set aside for the LITERAL POOL, STACK,
> and HEAP.  Any suggestions?

I've used the latter.  When I load a binary, the header tells me
how much is code, how much is static data, and how much
non-static data is needed.  So, the second method just seems
convenient.  But, it can make stack/heap collisions a problem.

In the former method, at least the stack and heap are in separate
data spaces, so possibly you can resize them when they run out.

But, then you have to keep up with additional segment registers,
so it is definitely a trade-off.

A confession - when I added strings to a simple macro language,
handled as in the latter case, I put strings into separate
storage.  Pointers to them are in the data-stack segment, but the
actual string data storage was somewhere else.

Also, none of my toy's have had dynamic allocation (as in malloc
and friends) available to the user, so I have never had a need
for a heap.

> Dim i as Integer = 2 Print i
>
> In this case, i is a local value referenced off the stack.  So,
> the following PCODE could be used to represent the above
> operations:

> DSP    1    // SP=SP-1 (allocate variable i)
> PUSHI  2    // push integer value 2
> ADR    0    // push address of i
> STORI       // store SOS in address TOS
> ADR    0    // push address of i
> LOADI       // dereference address to literal
> PRTI        // print integer on TOS
> PRTLN       // print a new line
> ISP    1    // SP=SP+1 (deallocate variable i)
> HALT        // halt execution

You will probably find that "push address of variable;
dereference address" to be very common sequence.  It might be a
good idea to come up with an instruction that performs both of
those steps.

#354 From: Rick Clark <rickclark58@...>
Date: Mon Sep 20, 2004 5:12 pm
Subject: Re: Re: Lexical analyzer for C
rickclark58
Send Email Send Email
 
--- ed_davis2 <ed_davis2@...> wrote:

Hey Ed,

> Please don't take this the wrong way.  The following
> is only my
> simple opinion.  And be sure and take it for what it
> cost you
> (nothing!)

Never. I always appreciate your input. Since I don't
know what I am doing anyway :), any input is helpful.

> I strongly dislike adding a symbol to variable
> names.  For me, it
> is just too much noise, and makes the source much
> too hard to
> read.

I am of a different view on this. I think that it is
easier to read source when you can differentiate
between keywords and variables. When I see foo buried
down in the code somewhere, I can't tell if this is a
keyword or variable without looking it up. However
@foo, is clearly a variable. (BTW, I am using the
ampersand (@) now for variable designators since it
was brought to my attention that the % identifier
might cause confusion.)

> While not quite as bad, I don't particularly care
> for the
> parenthesis around expressions, but rather prefer
> the "then" for
> if's and "do" for loops, in order to separate the
> expression part
> from the statement part.

I have always thought that having to write "then" was
just too much typing for little gain in readability,
so I chose the C way on this. However, the main reason
is that it makes parsing much easier and allows me to
make a single pass over the language constructs in the
translation process. It is not that I am lazy, just
mostly clueless when it comes to compiler concepts. :)

> In any case, good job!  And I hope you'll continue
> posting your
> progress. Atom has certainly come a long way!
>

Thank you! Again, your input is both appreciated and
always helpful.



=====
Rick Clark
http://home.grandecom.net/~rickclark58/



_______________________________
Do you Yahoo!?
Declare Yourself - Register online to vote today!
http://vote.yahoo.com

#355 From: "Chris Cranford" <chris.cranford@...>
Date: Mon Sep 20, 2004 9:22 pm
Subject: Re: Re: Stack Machines
chris.cranford@...
Send Email Send Email
 
Can someone here show me how you declared your stack?  I totally forgot to
think about the concept of different data types that play an important role
in things like division computation where real (fractions) come up and have
to be considered.

Thanks for the input
Chris

#356 From: Richard Pennington <rich@...>
Date: Mon Sep 20, 2004 9:59 pm
Subject: Re: Re: Lexical analyzer for C
pennington6809
Send Email Send Email
 
Hi Ed,

(Comments below)
On Monday 20 September 2004 11:29 am, ed_davis2 wrote:
> --- pennington6809 wrote:
> > I prefer this new (to me!) method because it gives me a lot of
> > flexibility. I can make changes to the lexer and the parser
> > using the higher level representation (in my case, regular
> > expressions for the lexer and a yacc-like syntax for the
> > grammar).
>
> I don't have nearly as much compiler experience as you, but in my
> dabblings, I've always been enamored with the tools (for the
> very reason you state above), but once I get into something more
> complicated, I end up fighting the tool.  Of course, I could be
> using the wrong tools (YACC or COCO/R), or maybe I don't know the
> tool as well as I think I do.

I'm enamored as well. That's why I'm writing my own tools.

>
> Why do you use BNF rather than EBNF?  EBNF just seems so much
> easier to use (repetition) rather than BNF (recursion).  Is EBNF
> somehow tied to less powerful grammars?  Is it easier to specify
> attributes for a BNF grammar?

Sorry for the confusion. My grammars do support [] and ? for one or more and
{} and * for zero or more (e.g.):

ifstate: if expr then statement [else statement].

>
> > I'm working on a little project now that uses these ideas.
> > There is more information on this project at
> > http://pennware.com/language/introduction.html I have some
> > external and internal documentation on line as well as examples
> > of syntax definition files, etc. The documentation is about 6
> > months out of date: I'm hoping to update it soon.
>
> Cool!  Will you (one day) be making this available for others to
> try out?

Yes. I'm hoping to have something usable by the end of the year.

>
> > Another interesting aspect of this project is that the parse
> > tree is generated in a way that makes it look very lisp-like.
> > I'm currently implementing a small built-in lisp interpreter to
> > allow manipulation of the parse tree.
>
> So, does this mean that you probably should write the entire tool
> in lisp? :-)

:-)

No, I want to keep the lisp-like stuff limited to the part that translates
syntax trees to the internal form. I hope it is a very thin layer.

>
> Sorry about that - I've recently been reading lots of stuff by
> Paul Graham: http://www.paulgraham.com/
>
> He thinks that the best programmers use the richest language,
> which of course many would say would be lisp.
>
> Lots of interesting essay's there, for instance:
>
> http://www.paulgraham.com/iflisp.html
> http://www.paulgraham.com/icad.html
>
> For lots of fun, be sure to read this:
>
> http://www.paulgraham.com/fix.html
>

I'll check these out. I agree about the richest language part, however. I just
happen to want to use a different language for each part: (E)BNF for grammar,
lisp for tree manipulation, (???) for optimization (I'm working on that one
;-).

-Rich
--
Richard Pennington
Email: rich@...
http://www.pennware.com ftp://ftp.pennware.com

#357 From: "ed_davis2" <ed_davis2@...>
Date: Tue Sep 21, 2004 12:42 am
Subject: Re: Stack Machines
ed_davis2
Send Email Send Email
 
--- Chris Cranford wrote:

> Can someone here show me how you declared your stack?  I
> totally forgot to think about the concept of different data
> types that play an important role in things like division
> computation where real (fractions) come up and have to be
> considered.

This one just has integers and reals.  It is from a toy language
I was playing with, trying to see how I could use both integers
and reals:

typedef union {
     int i;
     double r;
} Stack;

This one has integers, reals, and characters.  It is from another
toy language that I was working on to see if I could extend the
first one to have a variant type:

typedef struct {
     int type;
     union {
         int i;
         double r;
         char c;
     };
} Stack;

#358 From: "Chris Cranford" <chris.cranford@...>
Date: Tue Sep 21, 2004 1:38 am
Subject: Re: Re: Stack Machines
chris.cranford@...
Send Email Send Email
 
> > Can someone here show me how you declared your stack?  I
> > totally forgot to think about the concept of different data
> > types that play an important role in things like division
> > computation where real (fractions) come up and have to be
> > considered.
>
> This one just has integers and reals.  It is from a toy language
> I was playing with, trying to see how I could use both integers
> and reals:
>
> typedef union {
>     int i;
>     double r;
> } Stack;
>
> This one has integers, reals, and characters.  It is from another
> toy language that I was working on to see if I could extend the
> first one to have a variant type:
>
> typedef struct {
>     int type;
>     union {
>         int i;
>         double r;
>         char c;
>     };
> } Stack;

Ah, so you basically allocate an array of Stack objects populating each
object according to the type of data being pushed.  I suppose that is one
way and probably the most straight forward approach.

But if I wanted to allocate 1 big block of memory as follows:

   +--------+ 0x0000
   |  CODE  |
   +--------+
   | LIT PL |
   +--------+
   |  HEAP  |
   +-/    /-+
   |  STCK  |
   +--------+ 0xFFFF

And my code segment was to push integer with value 1 and print it, I would
need:

   Stack Memory[32768]; // 32k block

   Memory[0].type = TYPE_INT;
   Memory[0].i    = 0x01; // push
   Memory[1].type = TYPE_INT;
   Memory[1].i    = 0x01; // operand 1
   Memory[2].type = TYPE_INT;
   Memory[2].i    = 0x02; // print integer
   Memory[3].type = TYPE_INT;
   Memory[3].i    = 0x03; // halt

This would imply that the HP=4, SP=32768.  During execution, the SP would
get decremented to 32767 and this slot would be populated with an integer
type with value 1 and then the print operation would pop that value from the
stack and send it to the output stream.

Any other alternative on how to do this large memory block but be able to
support multiple data types at the stack and heap level?

I was looking at some code earlier today that another program generates.
Does this make any sense to you:

STORE SCALAR 1  // stores byte
FETCH SCALAR 1  // fetchs byte
STORE SCALAR 1  // stores boolean
FETCH SCALAR -1 // fetchs boolean
STORE SCALAR 4  // stores integer
FETCH SCALAR -4 // fetchs integer
STORE SCALAR 4  // stores real
FETCH SCALAR -4 // fetchs real
STORE SCALAR 2  // stores short
FETCH SCALAR -2 // fetchs short
STORE SCALAR 2  // stores word
FETCH SCALAR 2  // fetchs word

From what I gather, the store scalar operand denotes the number of bytes
required to store the numeric value; however, the fetch scalar sometimes
uses a positive or negative operand.  Maybe this denotes some form of
byte-reorder required based on the stored stack value?

Thanks
Chris

#359 From: Richard Pennington <rich@...>
Date: Tue Sep 21, 2004 2:37 am
Subject: Re: Re: Stack Machines
pennington6809
Send Email Send Email
 
On Monday 20 September 2004 08:38 pm, Chris Cranford wrote:
> > > Can someone here show me how you declared your stack?  I
> > > totally forgot to think about the concept of different data
> > > types that play an important role in things like division
> > > computation where real (fractions) come up and have to be
> > > considered.
> >
> > This one just has integers and reals.  It is from a toy language
> > I was playing with, trying to see how I could use both integers
> > and reals:
> >
> > typedef union {
> >     int i;
> >     double r;
> > } Stack;
> >
> > This one has integers, reals, and characters.  It is from another
> > toy language that I was working on to see if I could extend the
> > first one to have a variant type:
> >
> > typedef struct {
> >     int type;
> >     union {
> >         int i;
> >         double r;
> >         char c;
> >     };
> > } Stack;
>
> Ah, so you basically allocate an array of Stack objects populating each
> object according to the type of data being pushed.  I suppose that is one
> way and probably the most straight forward approach.
>
> But if I wanted to allocate 1 big block of memory as follows:
>
>   +--------+ 0x0000
>
>   |  CODE  |
>
>   +--------+
>
>   | LIT PL |
>
>   +--------+
>
>   |  HEAP  |
>
>   +-/    /-+
>
>   |  STCK  |
>
>   +--------+ 0xFFFF
>
> And my code segment was to push integer with value 1 and print it, I would
> need:
>
>   Stack Memory[32768]; // 32k block
>
>   Memory[0].type = TYPE_INT;
>   Memory[0].i    = 0x01; // push
>   Memory[1].type = TYPE_INT;
>   Memory[1].i    = 0x01; // operand 1
>   Memory[2].type = TYPE_INT;
>   Memory[2].i    = 0x02; // print integer
>   Memory[3].type = TYPE_INT;
>   Memory[3].i    = 0x03; // halt
>
> This would imply that the HP=4, SP=32768.  During execution, the SP would
> get decremented to 32767 and this slot would be populated with an integer
> type with value 1 and then the print operation would pop that value from
> the stack and send it to the output stream.
>
> Any other alternative on how to do this large memory block but be able to
> support multiple data types at the stack and heap level?
>
> I was looking at some code earlier today that another program generates.
> Does this make any sense to you:
>
> STORE SCALAR 1  // stores byte
> FETCH SCALAR 1  // fetchs byte
> STORE SCALAR 1  // stores boolean
> FETCH SCALAR -1 // fetchs boolean
> STORE SCALAR 4  // stores integer
> FETCH SCALAR -4 // fetchs integer
> STORE SCALAR 4  // stores real
> FETCH SCALAR -4 // fetchs real
> STORE SCALAR 2  // stores short
> FETCH SCALAR -2 // fetchs short
> STORE SCALAR 2  // stores word
> FETCH SCALAR 2  // fetchs word
>
> From what I gather, the store scalar operand denotes the number of bytes
> required to store the numeric value; however, the fetch scalar sometimes
> uses a positive or negative operand.  Maybe this denotes some form of
> byte-reorder required based on the stored stack value?
>
> Thanks
> Chris

We could get very basic and act like a real processor. The instruction opcode
would indicate the type of operand (s).

unsigned char memory[SIZE];

unsigned char *sp = &memory[SIZE]; // Point to the top of memory.
unsigned char *pc = &memory[0];  // The pc starts at 0.


memory[0] = PUSHI4; 		 // Push immediate four byte operand.
memory[1] = 0; 			 // Four byte operand (big endian).
memory[2] = 0;
memory[3] = 0;
memory[1] = 1;
memory[2] = PRINTINT; 	 // Print the int at the top of the stack.
memory[3] = HALT;

The instruction decode might look like:

for (;;)
{
   switch (*pc++)
   {
   case PUSHI4: // Push the four byte immediate value.
     sp -= 4;  // size of my integer.
     sp[0] = *pc++;
     sp[1] = *pc++;
     sp[2] = *pc++;
     sp[3] = *pc++;
     break;
  case PRINTINT:
     int value;
     value  = *sp++; // Get the high byte.
     value <<= 8;  // Shift the high byte up.
     value |= *sp++;
     value <<= 8;
     value |= *sp++;
     value <<= 8;
     value |= *sp++; // Get the low byte.
     printf("%d\n", value);
     break;
    case ADDINT:  // Another example: add two ints, leave the result at TOS.
      int arg1, arg2;
     arg1  = *sp++; // Get the high byte.
     arg1 <<= 8;  // Shift the high byte up.
     arg1 |= *sp++;
     arg1 <<= 8;
     arg1 |= *sp++;
     arg1 <<= 8;
     arg1 |= *sp++; // Get the low byte.
     arg2  = *sp++; // Get the high byte.
     arg2 <<= 8;  // Shift the high byte up.
     arg2 |= *sp++;
     arg2 <<= 8;
     arg2 |= *sp++;
     arg2 <<= 8;
     arg2 |= *sp++; // Get the low byte.
     arg1 += arg2;  // Compute the result.
     // Push the result.
     sp -= 4;  // size of my integer.
     sp[0] = arg1 >> 24;
     sp[1] = arg1 >> 16;
     sp[2] = arg1 >> 8;
     sp[3] = arg1;
     break;

   case HALT:
     exit(0);
     break;
  }
}

This looks a little messy, you could make appropriate macros/functions to do
pushes and pops, e.g.:

#define push(arg) do { sp -= 4; sp[0] = (arg) >> 24; sp[1] = (arg) >> 16; \
   sp[2] = arg >> 8; sp[3] = arg; } while(0);

#define pop(arg) do {  arg  = *sp++; arg <<= 8; arg |= *sp++; arg <<= 8; \
     arg |= *sp++; arg <<= 8; arg |= *sp++;  } while(0);

The ADDINT above becomes:

case ADDINT:
     int arg1, arg2;

     pop(arg1);
     pop(arg2);
     arg1 += arg2;
     push(arg1);
     break;

And SUBINT is now easy to do:

case SUBINT:
     int arg1, arg2;

     pop(arg1);
     pop(arg2);
     arg1 1= arg2;
     push(arg1);
     break;

This approach can be written in a host machine independent way,
isn't necessarily the best aproach, especially if floats are also allowed. How
about

#define push(arg, type) ( sp -= sizeof(type), *(type *)sp = (arg))
#define pop(arg, type) ((arg) = *(type *)sp, sp += sizeof(type))

Then ADDINT is:

case ADDINT:
     int arg1, arg2;
     pop(arg1, int);
     pop(arg2, int);
     arg1 += arg2;
     push(arg1, int);
     break;

and ADDFLT is:

case ADDFLT:
     float farg1, farg2;
     pop(farg1, float);
     pop(farg2, float);
     farg1 += farg2;
     push(farg1, float);
     break;

Not that this approach is very fast, uses the same "endianess" as the host
machine, and is also very dangerous.

You have to keep you SP aligned so that they stay on a boundry acceptable to
your host. On many machines using this technique with four byte integers and
four byte floating point numbers can work very well.
This method is not only not portable, but probably invokes undefined behavior
according to the ISO C standard. It does work, however. ;-)

-Rich

--
Richard Pennington
Email: rich@...
http://www.pennware.com ftp://ftp.pennware.com

Messages 330 - 359 of 1492   Oldest  |  < Older  |  Newer >  |  Newest
Add to My Yahoo!      XML What's This?

Copyright © 2010 Yahoo! Inc. All rights reserved.
Privacy Policy - Terms of Service - Guidelines NEW - Help