??? 09/24/07 21:59 Modified: 09/24/07 22:08
#144939 - pictures and amusement Responding to: ???'s previous message
Well, I thought it would be the software and the description that would be the main source of entertainment; but I admit pictures are much more enjoyable, except that I really can't draw (and I also hate PC-based drawing tools, they are too clumsy for my clumsy hands).
However, for what I am doing now, that sort of picture is not really needed - as I said, the gross picture is uninterestingly plain, and the detailed picture is too complex to be interesting. What I do - regardless of whether it leads to quick results or to a mess - is try to think about an instruction - or, better, a class of instructions - as a sequence of events, all of which have to happen in a given order to fulfill the task of the instruction; then I try to think about how this can be achieved using some simple logic like latches and gates.

For example, let's just randomly point to the opcode chart... eyes closed, finger points directly to the middle of the sheet... voila, MOV @R1,#data, opcode 77h. Why not. So, we need to pick up the content of R1 (or Rn, n=0..1 - this is a group of instructions anyway) first, to form the address into IRAM. This means reading the RAM - address 0,1 or 8,9 or 10h,11h or 18h,19h, depending on the bank, which comes from PSW (assuming we already have one :-) ) - to get the address first.

Here comes the first "I knew this or that beforehand" (I was thinking about the "gross picture" before, but no drawings, just a bunch of ideas). Note that the "@Rn" instructions (except MOVX @Rn) come together with the same "Ri" instructions (notable exception: DJNZ Ri,dspl is "paired" with the infamous XCHD A,@Rn). I would like to implement the @Rn instructions at no cycle penalty relative to their Ri counterparts. But, while the Ri instruction has an address generated simply by decoding the opcode (oh, the decoding consists of taking the bottommost 3 bits of the opcode and concatenating them with the bank bits from PSW), for the @Rn instruction, after the Rn "decoding" I'd need an extra RAM access to extract the address. Again, I don't have exact data, but intuitively the RAM, while an integral part of the '51 core, is certainly more off-hand (read: slow) than a register inside the core.
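The two address paths described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the actual design: the names `iram`, `psw_bank`, `reg_direct_addr` and `reg_indirect_addr` are made up here, and only the lower 128 bytes of IRAM are modelled. The bank select bits RS1:RS0 are taken from PSW bits 4 and 3, as on the standard '51.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the lower 128 bytes of internal RAM. */
static uint8_t iram[128];

/* Bank select bits RS1:RS0 sit in PSW bits 4 and 3. */
static uint8_t psw_bank(uint8_t psw)
{
    return (uint8_t)((psw >> 3) & 0x03);
}

/* Register operand: bank bits concatenated with the low 3 opcode bits.
   No RAM access is needed to form this address. */
static uint8_t reg_direct_addr(uint8_t psw, uint8_t opcode)
{
    return (uint8_t)((psw_bank(psw) << 3) | (opcode & 0x07));
}

/* Indirect operand: the lowest opcode bit selects R0 or R1 of the
   current bank, and the pointer then has to be read out of IRAM.
   That read is the extra RAM access discussed in the text. */
static uint8_t reg_indirect_addr(uint8_t psw, uint8_t opcode)
{
    uint8_t pointer_location =
        (uint8_t)((psw_bank(psw) << 3) | (opcode & 0x01));
    assert(pointer_location < sizeof iram);
    return iram[pointer_location];
}
```

For opcode 77h with bank 1 selected (PSW = 08h), `reg_indirect_addr` reads the pointer from IRAM address 09h, whereas `reg_direct_addr` never touches `iram` at all. That difference is the cycle penalty in question.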
So the extra RAM access to get the address costs me extra cycles, and that's what I don't want. I am ready to pay the cost of speed in silicon. So, I design an extra set of 2x4 registers, for 4 banks of R0 and R1, into the very core, in an SFR manner; they are written simultaneously with writes to RAM addresses 0, 1, 8, 9, etc., and their (tristateable) outputs are tied to the internal address bus, driving it when the corresponding @Rn instruction has to be executed. As this involves only a few gates, I expect this process (opcode decoding, assertion of the extra Rn register's output enable, and address output) to be fast enough to achieve performance similar to the simple address generation for the Ri instructions.

Time for a drawing? Well, maybe...:

[drawing]

(now I see I forgot to finish "INTERNAL ADDRESS BUS" inside the bottom "bus"). But, honestly, was it illustrative enough? I doubt it. Sorry, I can't do any better. And this illustrates only a fraction of a particular problem; it is far from the gross overall picture, and at the same time it is not too detailed - for example, it does not show what exactly is decoded and how the address into the extra registers is generated, nor does it really indicate that the write occurs both into RAM addresses 0, 1, etc. and into the extra registers. So, what exactly is the value of these drawings, except for entertainment?

But let's proceed. Once the instruction is decoded (if ((OPCODE = something_with_@Rn) and (proper_phase_of_instruction_execution)), sorry for the Pascalism - yes, I still actively practise the C-hatred :-) ), the address is output to the address bus from the extra Rn registers, and the WRITE signal is asserted for RAM (accounting for the difference between IRAM and SFR depending on the highest address bit - another "lookahead"), while the just-fetched second byte of the instruction is output to the data bus from the fetch unit.
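A behavioural sketch of the extra registers and of the write described above might look like this in C. It is a rough model under assumptions, not the real logic: `extra_rn`, `iram_write`, `extra_rn_address` and `exec_mov_ind_imm` are hypothetical names, timing is ignored completely, and only the lower 128 bytes of IRAM are modelled, so a pointer with the highest address bit set is simply not written here.

```c
#include <assert.h>
#include <stdint.h>

#define IRAM_SIZE 128u          /* assumption: lower 128 bytes only */

static uint8_t iram[IRAM_SIZE];
static uint8_t extra_rn[4][2];  /* hypothetical: [bank][0 = R0, 1 = R1] */

/* True for 00h, 01h, 08h, 09h, 10h, 11h, 18h, 19h: R0/R1 of the 4 banks. */
static int is_r0_r1_address(uint8_t addr)
{
    return (addr & 0xE6u) == 0;
}

/* Every IRAM write also updates the matching extra register, if any. */
static void iram_write(uint8_t addr, uint8_t data)
{
    assert(addr < IRAM_SIZE);
    iram[addr] = data;
    if (is_r0_r1_address(addr)) {
        extra_rn[addr >> 3][addr & 1u] = data;
    }
}

/* Address the selected extra register would drive onto the address bus. */
static uint8_t extra_rn_address(uint8_t psw, uint8_t opcode)
{
    return extra_rn[(psw >> 3) & 0x03u][opcode & 0x01u];
}

/* MOV @Rn,#data (opcodes 76h and 77h): pointer from the extra register,
   data from the second instruction byte. Returns 1 if a write happened. */
static int exec_mov_ind_imm(uint8_t psw, uint8_t opcode, uint8_t second_byte)
{
    if ((opcode & 0xFEu) != 0x76u) {
        return 0;               /* not this instruction group */
    }
    uint8_t address_bus = extra_rn_address(psw, opcode);
    uint8_t data_bus = second_byte;
    if (address_bus >= IRAM_SIZE) {
        return 0;               /* upper half is outside this sketch */
    }
    iram_write(address_bus, data_bus);
    return 1;
}
```

Note that the pointer lookup never reads `iram`; it only reads `extra_rn`, which is kept in step by `iram_write`. Routing the final write through `iram_write` as well covers the corner case where the pointer itself targets one of the R0/R1 locations.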
As this is a two-byte instruction, there is no hurry - there are at least two full cycles ("fetch" cycles, but that's a clock cycle in this design) to accomplish a single write. Yet another lookahead - I anticipate that the internal RAM is fast enough that RAM and SFR accesses can be squeezed into less than a half-clock interval (less by the ALU delay), so that single-byte read-modify-write instructions (such as INC Ri) can be executed within a single "fetch" cycle. But in this case there is ample time; the write window could be left open for almost the whole second cycle... It would be nice if one could write down: "leave the write window open for one or half a cycle, as it suits you better for optimisation of the logic"; I just don't know if this is possible... So I end up simply opening the window only for the minimum time.

Time for a "timing" diagram:

[timing diagram]

I've warned you, there is no formalism in these. Try to figure it out for yourself. Sorry - again, this is only a tool for myself.

Now, this proceeds to the C-code stage (yes, no program, code). I see it as simple and straightforward - in the first "stage", nothing special happens; the second-byte latch happens "automatically", the mechanism for it is already there. Only in the second "stage" does the data latch have to open its tristateable buffer to the data bus, and the selected "extra" register has to open to the address bus; at the end of the second "stage", the IRAM write signal has to be generated. At this stage, I believe, it's simple to write down, either in C or in Verilog/VHDL; or, for those who like very big and complex schematics, in 74xx logic, for example.

I know this is all very shaky and far from being a real design, but, after all, this is supposed to be fun more than anything else...

Enjoy!

Jan Waclawek