fpga-cpu · FPGA CPU and SoC discussion list
CISCifying RISC calls and returns
Message #722 of 3302
Re: [fpga-cpu] Re: CISCifying RISC calls and returns

Hi,

Since I designed the MicroBlaze, I can say why MicroBlaze ended up the way it
did.
There is a tradeoff between area and performance, and I tried to find a good
compromise.
MicroBlaze has good enough performance and can still be implemented even in
smaller devices.
I could have made MicroBlaze smaller but with less performance, or vice versa.

In MicroBlaze, this is what I did for the link register.
In PowerPC the link register is a special register, and in order to save it
you first have to move it to a
general purpose register; the same goes for restoring it.
So it actually takes three instructions: one for the branch, one for moving
the link register to a general purpose register, and
one for saving to memory.
Since MicroBlaze has 32 registers, I can use one of them as the link register
(r15) and thus save one instruction compared to PPC.
A branch-and-save-onto-stack instruction would have to do:
1. PC <- PC + offset
2. mem(r1) <- PC
3. r1 <- r1 - 1
This would introduce one adder, one mux and more decoding logic and would
increase the size of MicroBlaze by 10%.
The gain would be 1 clock cycle for every branch to a subroutine that isn't a
leaf subroutine.
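
To make the comparison concrete, here is a minimal Python sketch of the two call styles. It is a toy model, not the MicroBlaze implementation: the `Cpu` class, the function names, the word-addressed dictionary memory and the choice to let all three transfers read the values from before the update are assumptions made for illustration. The instruction counts are the ones given in this message.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Toy machine state (hypothetical, for illustration only)."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)
    mem: dict = field(default_factory=dict)  # word-addressed memory


def branch_and_link(cpu, offset, link=15):
    """Link-register call: keep the old PC in a general purpose register."""
    old_pc = cpu.pc
    cpu.regs[link] = old_pc
    cpu.pc = old_pc + offset


def branch_and_push(cpu, offset, sp=1):
    """Stacking call: the three transfers, all reading pre-update values."""
    old_pc, old_sp = cpu.pc, cpu.regs[sp]
    cpu.pc = old_pc + offset    # 1. PC <- PC + offset
    cpu.mem[old_sp] = old_pc    # 2. mem(r1) <- PC
    cpu.regs[sp] = old_sp - 1   # 3. r1 <- r1 - 1 (needs the extra adder)


def call_instructions(scheme, is_leaf):
    """Instructions spent to branch and, if needed, save the return address.

    A leaf subroutine never has to store the link register, so every
    scheme costs a single branch in that case.
    """
    if is_leaf:
        return 1
    return {"ppc_style": 3, "link_in_gpr": 2, "branch_and_push": 1}[scheme]
```

In this model the stacking call saves exactly one instruction over the link-in-a-general-register scheme, and only for non-leaf subroutines, which is the 1 clock cycle gain weighed against the roughly 10% area increase.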

Which one is better depends on the application, but I took the decision to
have a link register because it is cleaner and wouldn't force
one register to be the stack pointer. Adding the logic may or may not
decrease the clock frequency: even if it's not on the critical path,
the design will be bigger and thus move things further apart from each other,
which can in itself introduce larger routing delays.

For the coming MMU on MicroBlaze: yes, an MMU will introduce a 1 clock cycle
penalty for looking up an MMU table.
In PPC that's normal.
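
As a rough way to see what such a penalty could cost, here is a small Python sketch. It assumes, purely for illustration, that every memory access pays the lookup penalty and that the base machine runs at a fixed cycles-per-instruction figure. The function names and parameters are hypothetical, and the instruction mix has to come from the application being measured.

```python
def cycles_with_mmu(instructions, mem_accesses, base_cpi=1.0, lookup_penalty=1):
    """Total cycles if each memory access pays a fixed lookup penalty."""
    return instructions * base_cpi + mem_accesses * lookup_penalty


def slowdown(instructions, mem_accesses, base_cpi=1.0, lookup_penalty=1):
    """Ratio of cycles with the penalty to cycles without it."""
    base_cycles = instructions * base_cpi
    total = cycles_with_mmu(instructions, mem_accesses, base_cpi, lookup_penalty)
    return total / base_cycles
```

Under these assumptions the slowdown is set by the fraction of instructions that access memory, so load/store-heavy code would feel the extra cycle most.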

Göran Bilski


robfinch@... wrote:

> > This delivers flexibility, and more importantly, simplicity, in the
> > instruction set and in the implementation.
>
> I was thinking in terms of from a programming standpoint. AFAIK
> programming wise, there is no advantage. In terms of hardware, I
> won't argue there is no advantage. It depends on your goals. Keeping
> things simple and general purpose leads to hardware that has a small
> footprint and is fast.
>
> I have to admit, stacking the pc register instead of using a link
> register does lead to a little more hardware. The overwhelming reason
> to do things that way is that I find it aesthetically pleasing. If you
> are looking at the benefit versus cost from a mathematical
> perspective there is probably little or no difference, in which case
> it would logically be better to go with the simpler hardware. But, I
> was brought up on the likes of the 6502,Z80,68k,x86 processors and
> the RISC paradigm of using a link register just seems plain alien,
> especially considering what you really want to do with the pc is
> stack it on a subroutine call. The human factor.
>
> I don't know if I'll implement push and pop instructions because the
> pop instruction requires a register file with two write ports in
> order to execute in a single cycle. There's also no real reason to
> implement them as subroutine parameters should be passed via
> registers, not on the stack.
>
> The processor I'm working on now (the BlueBird) is getting to be
> quite large (over 40% of SpartanII 2S200). It has loads of different
> instruction formats, complex decoding, a four stage instruction
> pipeline and other features. I've found I've been able to add complex
> features to the processor without affecting the performance because
> the thing that limits performance right now is access to the block
> ram. My instincts tell me that the block ram access is what is going
> to limit performance and I've not worried so much about the
> complexity of the decoder. If anybody wants a good laugh, the current
> Verilog source for the BlueBird (not working yet) is available on my
> website.
>
> I've been wondering what Xilinx is going to do a couple of years from
> now when their customers start asking for memory management for the
> MicroBlaze soft cpu. My guess is extra pipeline stalls.
>
> Sorry for the long post.
> Rob http://www.birdcomputer.ca
>
> PS. Does AFAIK = as far as I know? (I've been guessing what all the
> internet acronyms are.)




Tue Nov 20, 2001 4:58 pm

goran.bilski@...

I opted for an internal return stack for the YARD-1A core, which has worked out well so far, although I have yet to decide on the exact overflow handling...
brimdavis@...
brimdavis
Nov 21, 2001
3:38 am

Hi, Since I designed the MicroBlaze, I can say why MicroBlaze ended up to way it did. There is a tradeoff between area and performance and I tried to find a...
Goran Bilski
goran.bilski@...
Nov 20, 2001
4:58 pm

Hi, I forgot to mention that MicroBlaze instruction for branch and link can have any of the 32 registers as the link register. The compiler needs however to...
Goran Bilski
goran.bilski@...
Nov 20, 2001
5:04 pm

... Some compilers have a mechanism to allow the programmer to tell the compiler a routine is a leaf function. Some versions of GCC have this option. GCC for...
veronica.merryfield@...
veronica_mer...
Nov 20, 2001
5:31 pm

... Years back I remember in Dr Dobb's (early 1980's?) somebody had an idea for marking C-functions calling depth so that leaf functions could use static ...
Ben Franchuk
woodelf1
Nov 20, 2001
5:34 pm

... Other than Small C, are GCC and LCC the only reasonably easy to port C compilers out there? I still figure a C compiler (source & basic library) need not...
Ben Franchuk
woodelf1
Nov 20, 2001
5:40 pm

Ben ... LCC isn't big. It isn't difficult to port. LCC is ANSI C only. The library support is poor. GCC is big. It is difficult to port. It is C and C++. The...
veronica.merryfield@...
veronica_mer...
Nov 20, 2001
5:56 pm

Hi, LCC takes one week to port but the optimization isn't that great. No good register allocation, etc. No libraries. GCC takes 1-3 months to port but the...
Goran Bilski
goran.bilski@...
Nov 20, 2001
6:02 pm

Rob, I meant to pick up on this when I saw it, so apologies for the delay. ... This was said in a broader discussion about link registers versus stack and also...
Veronica Merryfield
veronica_mer...
Nov 21, 2001
10:45 pm

Hi Veronica, ... Cache flushing is a problem right now as there is no good way to invalidate the cache all at once. I've thought of a couple of different...
robfinch@...
rtfinch35
Nov 22, 2001
8:04 am

... Don't forget the old 'exchange' register instructions like on the Z80. They too could be adapted easily with the larger FPGA RAM nowadays. ... Let's not...
Ben Franchuk
woodelf1
Nov 22, 2001
1:54 am

... Need all the software have the same cache size? Can't you have small and medium fixed size cache for core kernel/irq service and the standard cache for...
Ben Franchuk
woodelf1
Nov 22, 2001
6:06 pm

Go eat dinner !!!!! Happy thanksgiving __________________________________________________ Do You Yahoo!? Yahoo! GeoCities - quick and easy web site hosting,...
Ed Corter
artiedc
Nov 22, 2001
7:28 pm

... As I was driving home tonight I was thinking about the issues this brings up. There are certain known processes that could have fixed "things" set aside...
Veronica Merryfield
veronica_mer...
Nov 22, 2001
9:08 pm

... Seems like a good idea to me. You wouldn't want this memory cached so it would have to be in a non-cacheable memory area. I've thought of having a memory...
robfinch@...
rtfinch35
Nov 23, 2001
3:09 am

Various ideas and comments on fast context switches in FPGA CPUs: (Please treat this entire article with a big "If I Recall Correctly" prefix -- I wrote this...
Jan Gray
jsgray@...
Nov 23, 2001
6:04 am

< Rob tips his hat to the designers that have come before, incl. members of this newsgroup. > ... (slots) x ... threads on ... any ... [i], ... register ... ...
robfinch@...
rtfinch35
Nov 23, 2001
10:05 am

... The Alto is quite a neat machine, but it does NOT have the potential for doing microtask switches in any cycle. It uses cooperative multitasking, and will...
Eric Smith
jdripper
Nov 23, 2001
10:09 pm

... You're right. My mistake. I thought NTASK management was automatically done in hardware (e.g. as higher priority tasks need service), but it is not: ...
Jan Gray
jsgray@...
Nov 23, 2001
10:59 pm

... In case it isn't apparent from the literature, it is *important* that the Alto microcode not change tasks at arbitrary times. The memory control requires...
Eric Smith
jdripper
Nov 24, 2001
12:46 am

Hi ... I'm told that credit goes to David May. -jc...
Campbell, John
john.campbell@...
Nov 26, 2001
5:12 pm