??? 10/09/07 07:39
#145530 - try to benefit from prior art Responding to: ???'s previous message
Jan Waclawek said:
Richard,
I think I understand the point of your approach - and this is exactly on the relative cost of the "elements". You work out of the premise that the clocked elements - latches, counters - are "expensive". I'd rather "spend" more silicon on speed at the moment, I just don't know if I won't overshoot it in some respect.

I'm not so sure I agree with you about how I approach this sort of problem. I have said that less silicon ("gates") is required to produce a transparent latch than a clocked "D" flip-flop. The approach I'm encouraging you to consider is slower in FPGA because of the routing and cascading delays. If you were to design the same thing in hard logic, you'd find that the multiplexers are much faster than in FPGA, since a 15-input gate is just that, and not 15 concatenated LUTs. If you design the ALU such that it can perform all the operations required, and you'll have to do that anyway, and then add an upper byte to the adder/subtractor (the upper byte only needs no-op, add and subtract to perform inc of PC, and inc/dec of DPTR and SP in one stroke), I believe it will consume considerably less logic and impose no more propagation delay than the considerably more complex state machines and architecture that you've sketched.

Jan Waclawek said:
But, on the other hand, your approach is certainly not how the 200 megaclock out-of-order-execution super-'51 is going to be accomplished...

True enough! The way in which out-of-order instruction execution and valid parallel execution would be effected would be through access to a larger instruction word than just 8 bits, so multiple instructions could be examined and assessed for execution simultaneously.

Jan Waclawek said:
I am not going to design an FPGA-optimal '51 - the reason is obvious, that I know nothing on FPGAs, so I can't optimise for them... Also, I am sure that these paths are walked often enough to be well known to the insiders, no reason to try to push it in this way.
I am just playing...

Yes, and it should be fun! I'm encouraging you to look at two regions, namely the ALU, and the "mappable region" wherein lies the source and destination of every operation, whether a data operation or an address operation, as they appear in that cartoon I provided. That model, I believe, provides a path for every operation that an 805x has to perform on either address or data objects. The data sources can be accessed into both sides of the ALU. Clearly what I've shown you so far is an overly simplified notion, but the data flow is what I feel is important. If you want a one-clocker, that's one way to get there.

There are, of course, destinations and sources that aren't so obvious. For example, in addition to the register constructs that drive the pins, there has to be a data bus register, and there has to be an address bus register. The latter is sourced from the PC part of the time, but it also is sourced from the SP or DPTR, and sometimes it is sourced from the PC+[IOR] (in relative addressing) or PC+[Rn], or DPTR+[A], among others. The data bus register can be sourced from very many places, but that model supports all of them, and within a single clock cycle.

The fact that indexing must occur implies that there must be another adder somewhere, right? If you didn't mind the extra cycle, you could do the job without the extra adder, but an adder is small and only adds two levels of logic. In reality, it could be the same adder that's used to increment/decrement the DPTR ... oh ... wait ... that's not going to work ... sometimes you have to add an index to DPTR ... <sigh> ... Well, you see what I mean, don't you?

Jan Waclawek said:
Jan

No, Jan, I don't want to deprive you of the fun of devising your own mechanism for developing your own internal architecture. I just think you'd benefit from considering what others have done before. I am not the first one to come up with this construct. I used it in the mid-'80's to build a TTL model of a 6502 that ran ... well ... sort-of ... in an APPLE-][ via a tentacle. The entire idea is that if you allow the ALU to perform ALL the arithmetic for you, then you have only data routing and timing to consider. Of course, you have to decode instructions, but that's just a ROM, isn't it? The state machines get their initialization instruction-by-instruction from the opcode. Those aren't affected by status, or what's happened in the past.

RE
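
To make the address-register idea above a bit more concrete, here is a rough behavioural sketch in Python of a decode ROM that picks the address source (PC, SP or DPTR) and an optional index for a separate address adder. This is only an illustration under stated assumptions: the names `Regs`, `DECODE_ROM` and `address_bus` are hypothetical, only a handful of 8051 opcodes are listed, and `pc` is assumed to already point past the current instruction. It is not taken from any real core.

```python
# Hypothetical behavioural model; names and table layout are illustrative only.
from dataclasses import dataclass

MASK16 = 0xFFFF


def sign_extend8(value):
    """Interpret an 8-bit value as a signed offset in the range -128..127."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


@dataclass
class Regs:
    pc: int = 0    # assumed to point past the current instruction
    sp: int = 0
    dptr: int = 0
    a: int = 0


# "Decode ROM": opcode -> (base register, index source).
# The lookup depends only on the opcode, not on status or history.
DECODE_ROM = {
    0x80: ("pc", "rel"),     # SJMP rel       : PC + signed offset
    0x83: ("pc", "a"),       # MOVC A,@A+PC   : PC + A
    0x93: ("dptr", "a"),     # MOVC A,@A+DPTR : DPTR + A
    0xE0: ("dptr", "none"),  # MOVX A,@DPTR   : DPTR, no index
    0xD0: ("sp", "none"),    # POP direct     : SP, no index
}


def address_bus(opcode, regs, operand=0):
    """Return the value loaded into the address bus register for this opcode."""
    base_name, index_name = DECODE_ROM[opcode]
    base = getattr(regs, base_name)       # source multiplexer
    if index_name == "rel":
        index = sign_extend8(operand)     # signed relative offset
    elif index_name == "a":
        index = regs.a & 0xFF             # unsigned accumulator index
    else:
        index = 0                         # adder passes the base through
    return (base + index) & MASK16        # the separate address adder


regs = Regs(pc=0x0102, sp=0x30, dptr=0x1234, a=0x10)
print(hex(address_bus(0x93, regs)))        # DPTR + A
print(hex(address_bus(0x80, regs, 0xFE)))  # PC - 2
```

The point of the sketch is that the per-opcode table does all the selecting, so the adder itself stays a plain base-plus-index unit. In hardware the dictionary would be a ROM or decoder and `getattr` a multiplexer.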