> Say I want to build a cycle accurate model of an existing
processor. Say Intel 386 for example. I have access to all the
data sheets and plenty of other information, so I have a very good
specification as well as a brief internal design.
>
> How exactly do I go about handling this project so that I make the
best use of all the information available, in a very systematic way?
I have put together a cycle-accurate implementation of the 6502
processor. Whether this is a good approach to use or not, this is how
I accomplished the task:
First, ignore the cycle accuracy. I'm of the opinion it's better to
get a working processor first, then make it cycle accurate. But keep
the cycle accuracy in the back of your mind. In other words, don't put
together a bit-serial implementation of a byte-wide processor, a
microcoded version of a RISC cpu, etc. The conversion required later
will probably only cause problems. My advice is to follow a design
pattern similar to the original's to begin with. Once you have a
working processor, go back and patch it up so it's cycle accurate.
Get to cycle accuracy in stages. First try and make the cpu *faster*
than cycle accurate while keeping it simple at the same time. Once
it's faster than the original, it's easier to go back and add in
additional 'nop' cycles to slow the instructions down so they match
the original timing. Reducing the speed of a design is probably a lot
easier than trying to increase the speed of a design that started off
on the wrong foot with the wrong architecture.
Keep the pipelining of the original in mind. If the original
processor is pipelined so that instructions execute in a single
cycle, then you'll have to duplicate that pipelining in order to get
the single cycle instruction execution.
Tackle cycle accuracy first on the instructions that are a) easy to
make cycle accurate, and b) likely to be the critical ones for cycle
accuracy. It might be acceptable for other, less critical
instructions to be non-cycle accurate.
Cycle accuracy is mostly marketing hype. It's great to be able to say
the processor is 100% cycle-accurate, but it's not normally a
requirement. Code that depends on cycle accuracy is strongly
discouraged because different versions of a processor (even within
the same generation from the same manufacturer) could potentially
have different timings. With today's complex systems involving
overlapped instruction sequences, cache accesses, interrupts, etc.,
almost no one depends on cycle accuracy because it's an unreliable
approach.
Where cycle accuracy has been used in the past is for simple systems
where clock cycles were counted to determine timing delays. Most of
these delays consist of loops that simply decrement a counter. So
critical instructions to make cycle accurate are probably branch /
loop instructions and decrements / increments.
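
As a worked example of the kind of delay loop meant here, the sketch
below counts the cycles of a typical 6502 countdown loop
(LDX #n / DEX / BNE). The per-instruction cycle counts are the
documented NMOS 6502 values. The function names and the 1 MHz default
clock are assumptions for illustration only, and the branch is
assumed not to cross a page boundary.

```python
# Documented NMOS 6502 cycle counts for the instructions in the loop.
LDX_IMM = 2        # LDX #n
DEX = 2            # DEX
BNE_TAKEN = 3      # BNE, branch taken, no page crossing
BNE_NOT_TAKEN = 2  # BNE, falling through on the last pass


def delay_loop_cycles(n):
    """Cycles for: LDX #n / loop: DEX / BNE loop, with 1 <= n <= 255."""
    if not 1 <= n <= 255:
        raise ValueError("n must be in 1..255")
    taken = n - 1  # every pass but the last branches back
    return LDX_IMM + n * DEX + taken * BNE_TAKEN + BNE_NOT_TAKEN


def delay_microseconds(n, clock_hz=1_000_000):
    """Delay in microseconds at the given clock (1 MHz is an assumption)."""
    return delay_loop_cycles(n) * 1_000_000 / clock_hz
```

The total works out to 5n + 1 cycles, so a clone that gets DEX or a
taken BNE wrong by a single cycle changes every such delay by roughly
20%.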
For my '02 implementation, in the first pass I had many instructions
that took longer than the original. Once I had the processor
basically working, I then looked at how I could streamline the cpu. I
streamlined the cpu to reduce all the instructions to the minimum
number of cycles (once again not trying too hard to keep cycle
accuracy). This was the second iteration of the cpu. At this point I
had all instructions executing in the same or fewer clock cycles than
the original. For the third iteration of the processor, I went back
and added in additional 'nop' cycles to extend the instructions out
to the same timings as the original.
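
The padding step can be sketched as a simple table comparison. This
is only an illustration, not the code used for my core: the reference
values are the documented NMOS 6502 cycle counts for a few opcodes,
while the "streamlined" counts and all of the names are hypothetical.

```python
# Reference cycle counts for a few NMOS 6502 instructions.
REFERENCE_CYCLES = {
    "LDA_IMM": 2,
    "LDA_ABS": 4,
    "INC_ABS": 6,
    "JSR": 6,
}

# Hypothetical cycle counts for a streamlined (second-iteration) core.
STREAMLINED_CYCLES = {
    "LDA_IMM": 2,
    "LDA_ABS": 3,
    "INC_ABS": 4,
    "JSR": 5,
}


def padding_cycles(reference, streamlined):
    """Return the idle cycles to add per instruction to match the reference."""
    padding = {}
    for name, target in reference.items():
        actual = streamlined[name]
        if actual > target:
            # Padding can only slow an instruction down, never speed it up.
            raise ValueError(f"{name} is slower than the original")
        padding[name] = target - actual
    return padding
```

The error case is the point of doing the streamlining pass first: an
instruction that is still slower than the original cannot be fixed by
adding 'nop' cycles.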
Note there are different kinds of cycle accuracy as well. My '02 has
instruction timing accuracy, but not bus-cycle by bus-cycle accuracy
(although it's very close).
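
To make the distinction concrete, here is a small sketch of the two
checks. The trace format (one list of (address, "R"/"W") tuples per
instruction), the function names, and the example traces are all
assumptions made up for illustration. They are not captured from real
hardware or from my core.

```python
def instruction_timing_matches(reference, candidate):
    """True if each instruction takes the same number of cycles in both traces."""
    return len(reference) == len(candidate) and all(
        len(ref_cycles) == len(cand_cycles)
        for ref_cycles, cand_cycles in zip(reference, candidate)
    )


def bus_cycles_match(reference, candidate):
    """True if every cycle shows the same bus activity in both traces."""
    return reference == candidate


# Illustrative traces for one 6-cycle read-modify-write instruction.
# They differ only in what the bus does during the fifth cycle.
REFERENCE_TRACE = [[
    (0x0200, "R"), (0x0201, "R"), (0x0202, "R"),
    (0x1234, "R"), (0x1234, "W"), (0x1234, "W"),
]]
CANDIDATE_TRACE = [[
    (0x0200, "R"), (0x0201, "R"), (0x0202, "R"),
    (0x1234, "R"), (0x1234, "R"), (0x1234, "W"),
]]
```

With these traces the instruction timing check passes while the bus
cycle check fails, which is the kind of difference software counting
cycles would never notice but external hardware watching the bus
might.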
Note obtaining cycle accuracy cost about 10% in clock speed and about
10% in size. The cycle accurate version runs at a clock frequency
about 10% lower and consumes about 10% more fpga resources.
(Cycle accuracy uses the fpga resources less efficiently than they
could otherwise be used in this case). I have an option to build the
code without cycle accuracy for better performance and size.
=================================================
I spent about a year getting the '02 basically working. It was more
than another year before I had it cycle accurate. These were not
really man years, but I spent a lot of time at it on weekends and
evenings. It probably represents many man-months of effort anyway (I
can code and get things working very fast....)
The x86 series is a complex processor. Twice I've started an 8086
clone, but then dropped it after just a few hours. I'd estimate it
to be about three or four times more complex than the '02, meaning it
would probably take me about five years to get a decently working
version (without working on it full time). Something like the 386 is
several times more complex than that, so the other poster's comment
about spending 30 man years isn't an unreasonable time estimate.
Still, if you like a challenge.....
Implementing an existing processor has a lot of attraction because of
the existing base of software and tools.
Depending on what your goals are, it might be easier to get
386-comparable performance with a much simpler processor. For
instance, isn't the xr16 20 MIPS?
Rob