??? 12/15/07 03:16 Read: times |
#148263 - Size vs. Performance Responding to: ???'s previous message |
Richard said:
How does it help you to duplicate resources that are already inherently addressable?

Well, there are two issues here. One is the need for certain registers to be implemented discretely, as opposed to being represented by a location in a RAM. The other is whether those registers that are implemented discretely should ALSO appear in the RAM.

It's fairly obvious that the SFRs need to be implemented as discrete registers, because in general, their outputs need to be available continuously, either to configure various features of the peripherals that they control, or to speed up certain instructions. A second, similar reason is that some SFRs contain bits that get manipulated individually, and that would be messy if they were stored in a RAM. Finally, it may make sense for performance reasons to implement SP and DPTR as loadable counters. Putting them in a RAM would clearly preclude that option.

In addition to the SFRs, R0 and R1 should also be implemented as discrete registers in order to speed up instructions that use the @Ri addressing mode.

The second issue is whether or not the discrete registers should be duplicated in the RAM. Jan convinced me here that the SFRs should not be. I still think that R0 and R1 should be shadowed in the RAM only because it saves a little bit of hardware and doesn't cost anything (the affected RAM locations would be unused otherwise).

Richard said:
Compare is really an XOR, isn't it? If you XOR and then behave as you would in JNZ, wouldn't that do the job?

I believe this was a reference to the CJNE instruction. The answer is no. CJNE sets the carry bit to reflect the relative magnitudes of the two operands, so you need to do a subtract and then just throw away the result.

Richard said:
If you build an ALU that muxes its inputs from the available resource pool, and is capable of loading as well as of incrementing/decrementing DPTR, and SP, and other destinations, and loading and incrementing PC, it clearly has to contain a 16-bit Adder/Subtractor.
If you think about the ALU as the intersection of lots of logic paths, like a railroad yard, and the controlling state machine(s) as providing the steering controls to those logic paths, you'll see that data can be sourced and sunk by the same resources at opposite ends of a single operation.

You have expressed this idea many times, and I think I understand it fully. I also think that it leads to a good solution if the goal is to minimize hardware. However, as I mentioned a day or two ago, I want to strike some sort of balance between speed and simplicity. So I'm willing to employ a few extra adders and/or counters and/or whatever as necessary to help performance.

On the other hand, I want to be able to implement and debug this thing successfully, so I'm nowhere close to considering the really interesting stuff like pipelining and branch prediction and concurrent instruction execution. We'll leave those things to Silicon Labs, and I'll be happy enough to get "Hello, world" running on my FPGA.

-- Russ
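
For anyone following the CJNE point above, here is a rough C model of why an XOR is not enough. It is only a sketch of the flag behavior described in the post (carry reflects the relative magnitudes, the difference is discarded). It is not a description of any particular core, and the names cjne_result, cjne, and xor_differs are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch: what CJNE leaves behind. */
typedef struct {
    bool jump;   /* true when the operands differ (branch taken) */
    bool carry;  /* true when a < b as unsigned bytes (borrow out of a - b) */
} cjne_result;

/* Model of CJNE a, b, rel: subtract, keep the flags, discard the difference. */
static cjne_result cjne(uint8_t a, uint8_t b)
{
    /* Do the subtract in 9 bits so the borrow shows up in bit 8. */
    uint16_t diff = (uint16_t)((uint16_t)a - (uint16_t)b) & 0x1FFu;
    cjne_result r;
    r.jump  = (diff & 0xFFu) != 0;   /* nonzero low byte: operands differ */
    r.carry = (diff & 0x100u) != 0;  /* borrow: a was smaller than b */
    return r;                        /* the 8-bit difference is thrown away */
}

/* The XOR-then-JNZ idea: detects inequality but has no ordering information. */
static bool xor_differs(uint8_t a, uint8_t b)
{
    return (uint8_t)(a ^ b) != 0;
}
```

Note that xor_differs(3, 5) and xor_differs(5, 3) give the same answer, while cjne(3, 5) and cjne(5, 3) both report a jump but differ in the carry. That carry is the information the subtract provides and the XOR cannot.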