fpga-cpu · FPGA CPU and SoC discussion list
Message #155 of 3302
RE: [fpga-cpu] Just what is small and is it the best?

> Small is often good but is the smallest the BEST?

If "smallest" delivers on requirements (e.g. fast enough, C programmable,
has interrupt handling, or what have you), probably yes.

"A small cat is better than a large cat because it eats less, poops less,
and sheds less." "So it follows that the ideal cat is a cat of zero
length?"

As with so many things, the first few resource units provide the essentials.
The rest are luxuries. As you climb the luxury curve, each resource spent
provides less and less additional value. Sometimes supposed luxuries (like
deeper pipelines) make things worse.


If you add up the number of 4-LUTs in a minimal "bare necessities" n-bit
processor datapath, for example,

Cost   What
n      1-port 16-entry register file
n      adder/subtractor
n      logic unit
0      TBUF-based immediate mux
0      TBUF-based operand mux
---
3n     total

you can build a simple streamlined RISC datapath in only 3n logic cells.
Maybe even 2n if your ALU operation is "add/nand". If you're willing to
multi-cycle it (take k cycles per word) then it's 3n/k or 2n/k.
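
As a rough sanity check on the arithmetic above, here is a minimal Python sketch of that cost estimate. The function name, the `alu` labels, and the rounding up to whole LUTs are my own assumptions for illustration; the per-bit costs (3n, or 2n for "add/nand", divided by k when multi-cycled) are the ones from the table and paragraph above.

```python
import math

# Per-bit 4-LUT cost of the minimal datapath, taken from the table above.
# "add+logic": register file + adder/subtractor + logic unit = 3 per bit.
# "add/nand":  the 2n variant mentioned in the text.
PER_BIT_COST = {"add+logic": 3, "add/nand": 2}

def minimal_datapath_luts(n, alu="add+logic", k=1):
    """Estimate the 4-LUT count of an n-bit bare-necessities datapath.

    k is the number of cycles taken per word when multi-cycling.
    Rounding up to a whole LUT is an assumption of this sketch.
    """
    if n <= 0 or k <= 0:
        raise ValueError("n and k must be positive")
    return math.ceil(PER_BIT_COST[alu] * n / k)

if __name__ == "__main__":
    print(minimal_datapath_luts(16))                 # 3n        -> 48
    print(minimal_datapath_luts(16, "add/nand"))     # 2n        -> 32
    print(minimal_datapath_luts(16, k=4))            # 3n/k, k=4 -> 12
```

This only counts datapath LUTs; control logic is extra.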

But it takes a few cycles to execute even one "RISC instruction" like add
r3,r1,r2:

(Assume r[0]=0 and r[2]=2 hold constants, rPC=1 is the register file index
of the PC, bus is a 3-state bus, t is a temp reg, and ir is the instruction
register.)
; increment PC and fetch insn
t = bus <- r[2]
r[rPC] = mar = bus <- r[rPC] + t
ir = mem[mar]
; add instruction
t = bus <- r[ir.ra]
t = bus <- r[ir.rb] + t
r[ir.rd] = t
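
The register transfers above can be traced with a small Python sketch. Several things here are assumptions of mine, not part of the original: a 16-bit word, one step per register-transfer line, and a hypothetical pre-decoded instruction format `(rd, ra, rb)` stored directly in a dictionary that stands in for memory.

```python
R_PC = 1        # register-file index of the PC (rPC=1 above)
MASK = 0xFFFF   # assumption: 16-bit datapath

def fetch_and_add(regs, mem):
    """Run the six register transfers for one 'add rd,ra,rb'.

    regs: list of 16 ints; regs[0] is 0 and regs[2] is the constant 2.
    mem:  dict mapping address -> (rd, ra, rb), a hypothetical
          pre-decoded instruction format used only in this sketch.
    Returns the number of register-transfer steps taken.
    """
    steps = 0
    # increment PC and fetch insn
    t = regs[2]                      # t = bus <- r[2]
    steps += 1
    mar = (regs[R_PC] + t) & MASK    # r[rPC] = mar = bus <- r[rPC] + t
    regs[R_PC] = mar
    steps += 1
    rd, ra, rb = mem[mar]            # ir = mem[mar]
    steps += 1
    # add instruction
    t = regs[ra]                     # t = bus <- r[ir.ra]
    steps += 1
    t = (regs[rb] + t) & MASK        # t = bus <- r[ir.rb] + t
    steps += 1
    regs[rd] = t                     # r[ir.rd] = t
    steps += 1
    return steps

if __name__ == "__main__":
    regs = [0] * 16
    regs[2] = 2
    regs[R_PC] = 0x00FE
    regs[4], regs[5] = 7, 35
    mem = {0x0100: (3, 4, 5)}        # add r3,r4,r5
    steps = fetch_and_add(regs, mem)
    print(steps, regs[3], hex(regs[R_PC]))   # 6 42 0x100
```

Each step touches the single register-file port at most once, which is why even a simple add spreads over several steps.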

If you're only building a toaster SoC, or a toaster channel processor, where
100 kHz frequency would be quite adequate, you might as well build the 3n or
3n/k datapath.

But if that's not fast enough, if you need closer to one instruction per
cycle, you must add resources. The first thing you add is a dedicated PC
register, PC adder/incrementer, and PC mux. Next you add a second read port
to the register file, and perhaps a concurrent write port too. And you add
a result multiplexer to select among the various results (add, logic,
shifts, load-data-in, return address, etc.):

Cost     What
2n-4n    2r1w 16-entry register file
n        adder/subtractor
n        logic unit
0-6n     result multiplexer
n        PC
n        PC incrementer
n        PC mux
---
7n-15n   total
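
To check the totals, here is a short Python sketch that sums the low and high ends of each row for a given word width. The dictionary and function names are mine, chosen for illustration; the per-row costs are copied from the table above.

```python
# (low, high) cost of each block, in units of n 4-LUTs, from the table above.
FAST_DATAPATH = {
    "2r1w 16-entry register file": (2, 4),
    "adder/subtractor": (1, 1),
    "logic unit": (1, 1),
    "result multiplexer": (0, 6),
    "PC": (1, 1),
    "PC incrementer": (1, 1),
    "PC mux": (1, 1),
}

def datapath_lut_range(table, n):
    """Return (low, high) 4-LUT estimates for an n-bit datapath."""
    low = sum(lo for lo, _ in table.values()) * n
    high = sum(hi for _, hi in table.values()) * n
    return low, high

if __name__ == "__main__":
    print(datapath_lut_range(FAST_DATAPATH, 1))    # (7, 15), i.e. 7n-15n
    print(datapath_lut_range(FAST_DATAPATH, 16))   # (112, 240)
```

Where a design lands in that range depends mostly on the register file implementation and how many result sources feed the multiplexer.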

This costs a lot more, but it now runs at approximately one instruction per
cycle.

If you still need more speed, you'll add pipelining to reduce the cycle
time. (But add 2n (or more) for result forwarding muxes for each stage.)
Each new pipeline stage you add will reduce the cycle time until the
diminishing returns set in, possibly due to the extra interconnect delay
incurred by signalling across many result forwarding multiplexers.
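
The diminishing returns can be illustrated with a toy Python model. Every delay number below is hypothetical and chosen only to show the shape of the curve; none comes from a real device or from measurements. The only figure taken from the text above is the 2n LUTs of forwarding muxes per added stage.

```python
# All delays are HYPOTHETICAL illustration values, not measurements.
LOGIC_NS = 40.0          # total combinational delay to be split across stages
REG_OVERHEAD_NS = 2.0    # per-stage register setup/clock-to-out overhead
FORWARD_NS = 1.5         # extra forwarding/interconnect delay per added stage

def cycle_time_ns(stages):
    """Toy cycle-time estimate for a pipeline with the given stage count."""
    if stages < 1:
        raise ValueError("stages must be at least 1")
    return LOGIC_NS / stages + REG_OVERHEAD_NS + FORWARD_NS * (stages - 1)

def forwarding_luts(n, stages):
    """Extra LUTs for forwarding muxes: 2n per stage beyond the first."""
    return 2 * n * (stages - 1)

if __name__ == "__main__":
    for s in range(1, 7):
        print(s, round(cycle_time_ns(s), 2), forwarding_luts(16, s))
```

With these made-up numbers the cycle time drops sharply from one stage to two, improves only slightly by the fifth stage, and gets worse at the sixth, while the forwarding mux cost keeps growing linearly.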

If you still need more speed, you'll think about multiple issue,
out-of-order, LIW, custom function units, or perhaps multiple processors on
chip.

Including control unit overhead, etc., xr16 is about 300 logic cells / 16
bits = ~20n overall, and xr32 is about 14n overall.

Jan Gray
Gray Research LLC




Sat Oct 7, 2000 12:56 am

jsgray@...

