#148130 - different strokes for different folks
??? 12/13/07 08:43
Responding to: ???'s previous message
Russ Cooper said:
A lot depends on what your goals are.
If you want a faithful recreation of the Intel debacle, with all its failures, including that apparently unreliable positive-going reset, then you'd probably take one approach. If, on the other hand, you want an efficient, high-performance core that consumes a minimal amount of the resources in your programmable logic, you might consider something different.

Richard said:
Well, as with my Morse code decoder, the primary goal is to become proficient as an FPGA user. This project is just a means to that end. Having said that, I think it would be silly simply to duplicate what Intel did. For one thing, I'm not sure it's possible to discover exactly what they did. Then, of course, the game is completely different now, given the wealth of resources available on the FPGA. So what I'm after is something that strikes a happy balance between simple and fast. I don't care so much about conserving FPGA resources; for my purposes, they're there to be squandered.

In a learning mode, that's true, for sure. If you don't limit yourself to an 8-bit adder, i.e. if you have an adder that adds across, say, 16 bits, as you might want for incrementing the PC or for computing (A)+DPTR in a single stroke, then you'd "squander" some resources to make that all happen faster. Further, if you really want to build the loadable synchronous up/down counters needed for the SP and DPTR, as well as the loadable up-counter for the PC, you are certainly free to do that. However, you pay a price in complexity: the timings for each will be different, and you'll have to live with the slowest one as the rate-determining step. That doesn't mean it will be slower than a common ALU for all internal arithmetic, though. If you design an ALU that does everything that's needed, including operations on the DPTR, SP, PC, etc., then the multiplier and divider are the only real challenges.

Richard said:
I don't agree with this approach.
It seems to me that the Harvard architecture simply screams for concurrent operation of the code-fetching mechanism and the other side that manipulates the data. I haven't thought about it very much, but it seems like trying to run everything through the same ALU would make an instruction like LCALL a long, drawn-out ordeal. Is it really that expensive to dedicate an adder or two to things like incrementing the program counter and calculating relative jump addresses? I guess I'll find out!

Yes, you will. You may surprise yourself with the result, too. I'd envision an ALU that's wide rather than deep, so the paths through the thing are all short and, above all, essentially the same in timing. You won't need shift registers to do the shifts and rotates; multiplexers do that more elegantly. However, it may turn out that the muxes are too slow, as they require lots of logic. Nobody says you have to use a single clock cycle for everything, either. Just because I like 'em doesn't mean the multi-clock-cycle types are worse in any way. After all, improving performance above what you can buy is just about the only justification for building an MCU in an FPGA.

Richard said:
Wrong in my case. See above.

Well, I've been wrong before ... From a practical standpoint, however, one does have to think about the benefit/cost ratio. FPGAs aren't that cheap yet. The ads say they are, but that's the cost per bit/gate/flipflop or whatever, and the marketing guys lie. I'd build a '5x core in an FPGA if I had to have custom hardware that I couldn't obtain more cheaply with a discrete MCU and a CPLD. I'd certainly have to run the numbers first, though.

Russ Cooper said:
If you treat SFRs, internal data memory, external data memory, and code memory as distinctly mapped memory regions, with registers as a special case, things easily fall into place. Think about it!

Richard said:
I have thought about this. I agree that the SFRs, registers, and internal data memory should all be thought of as the same.
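That unified view of SFRs, registers, and internal RAM comes down to a very small decode. For direct addressing on the 8051, one address bit separates the two regions: 0x00-0x7F selects internal RAM (the register banks live in its bottom 32 bytes) and 0x80-0xFF selects the SFR page. A hedged Python sketch of that decode, with labels of my own choosing:

```python
def direct_region(addr: int) -> str:
    """Classify an 8051 direct address into its memory region.

    Applies to direct addressing only; indirect @Ri addressing above
    0x7F reaches the upper internal RAM on an 8052, not the SFRs.
    """
    addr &= 0xFF
    return "SFR" if addr & 0x80 else "IRAM"

print(direct_region(0xE0))  # ACC is at direct address 0xE0, in SFR space
print(direct_region(0x30))  # general-purpose internal RAM
```

In hardware this is a single mux-select bit, which is why the regions "easily fall into place" once you map them distinctly.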
However, I think the code memory obviously falls on the "other side" of the Harvard architecture, and that the 16-bitness and limited instructions for accessing XDATA make it different enough from the internal memory that it should be separate. You'll see these opinions reflected in my block diagrams.

I'm not entirely in agreement with that, but I guess you'll figure it out. Isn't the only difference between internal and external code space a matter of whether the signals are brought out to the external interface? You'll need the same sorts of signals internally, won't you? Regardless of where the physical space you're using resides, you have to give it control strobes, don't you?

That Harvard architecture was developed at a time when memory access times were slow, and overlapping the processing of data and addresses was pretty helpful. Nowadays the same principles apply, but they offer much less advantage. When you "do" something to the data, you have to "do" it first, and then put it somewhere. I see little advantage in leaving the ALU idle when addresses could be making their way through the thing while it would otherwise be waiting for something else. The situation is very much dependent on how the time is spread out over the various operations. Since memory access time is probably about a fourth of the time through the ALU, or perhaps less, the Harvard advantages are swallowed up in the logic and routing delays.

I like to think of SFRs as I/O space, since that's where the I/O lives. It just lends itself to a really simple picture if you do that.

-- Russ
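A footnote on the earlier remark that multiplexers handle shifts and rotates more elegantly than shift registers: a rotate is purely combinational bit steering, with nothing clocked, as this Python model of the 8051's RL and RLC instructions shows. Function names are my own, not from the thread.

```python
def rl(a: int) -> int:
    """8051 RL A: rotate the accumulator left, bit 7 wrapping into bit 0."""
    a &= 0xFF
    return ((a << 1) | (a >> 7)) & 0xFF

def rlc(a: int, c: int) -> tuple:
    """8051 RLC A: rotate left through carry; returns (new A, new carry)."""
    a &= 0xFF
    return ((a << 1) | (c & 1)) & 0xFF, (a >> 7) & 1

print(hex(rl(0x81)))    # bit 7 lands in bit 0: 0x3
print(rlc(0x81, 0))     # (0x02, 1): old bit 7 becomes the new carry
```

Each output bit is just a mux selecting one of the input bits (or the carry), which is why no shift register, and no extra clock cycles, are needed.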