??? 01/05/08 14:37 Modified: 01/05/08 15:25
#149075 - re: Historically Responding to: ???'s previous message
Russell said:
I can think of two methods of implementing an ALU using a FPGA - implement it as the bare logic as you would in TTL gates or as a lookup table driven approach. The first method is usually not the most efficient in FPGAs, so the solution might be in using both approaches. This choice may be made by Verilog - I haven't any experience with this.

Based primarily on the advice of Andy Peters, I've been taking a third approach, namely just writing what I want the ALU to do and then trusting the synthesizer to figure out the details. I realize how naive this is, but at this point, I need to get something working, no matter how bad it is, so that I can then try different solutions as you suggest below in order to see what really works best. (What "best" happens to mean at the time will be a separate issue!)

Russell said:
Nevertheless, the status bits can be expressed as a function of the inputs - whether you choose to intercept the carry at the adder to adder level as you would in a TTL implementation or let Verilog decide by giving it the equation.
If it were me, I'd try a few solutions and see what generated the best result - best in LUT usage is usually a good gauge, or speed.

That's my plan, but I have run into a very interesting quirk that has made it impossible so far. I find that the LUT usage for any given implementation of the core is very sensitive to what 8051 test program I happen to have implemented at the time! This was a big surprise for me when I first noticed it, and still remains a bit of a mystery. However (and I'm really just guessing here), here's what I think is happening. My current model has the code memory implemented as a simple case statement, like this:

    module CodeMemory (
        input      [15:0] address,
        output reg [7:0]  data
    );
        always @(address) begin
            case (address)
                16'h0000: data = 8'h90;  // MOV DPTR,#0103h
                16'h0001: data = 8'h01;
                16'h0002: data = 8'h03;
                16'h0003: data = 8'h02;  // LJMP 0
                16'h0004: data = 8'h00;
                16'h0005: data = 8'h00;
                default:  data = 8'h00;  // Fill unused space with NOPs
            endcase
        end
    endmodule

For tiny little programs like this one, the synthesizer reduces this to a bit of combinational logic and implements it within a few LUTs. Then, I think that because the synthesizer knows what the program is, it can and does optimize away some of the CPU logic that doesn't happen to be used for that program. Of course the synthesizer doesn't know about the 8051 op codes or anything like that, but consider the above program for a moment. I'm guessing that the synthesizer probably does notice that bits 2, 3, 5, and 6 of 'data' are never set, and probably does simplify some/much/all of the other logic to take advantage of that fact.

Of course sooner or later I'll have to stop guessing and actually figure this out and fix it, but first I want to get all the instructions working. For the record, as of this morning I have about 2/3 of the opcodes implemented and quickly tested within short little ad hoc programs like the one above. I look to have the remaining ones done in another couple of weeks. Then the real fun will begin as I try to figure out what I screwed up and how to make the whole thing better generally.

-- Russ
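
P.S. One possible way to test the guess about the code memory, offered as an untested sketch rather than a fix: hold the program in an inferred ROM loaded from a file instead of in a case statement. The module name CodeMemoryRom, the parameters INIT_FILE and ADDR_BITS, and the file name program.hex are all made up for illustration. The working assumption is that a registered read gets mapped to block RAM, and that the tools do not propagate block RAM contents into the rest of the logic. Whether a particular synthesizer behaves that way would need to be checked. The registered read also adds one clock of latency that the case-statement version does not have, so the core's fetch timing would have to allow for it.

```verilog
// Sketch only: code memory as an inferred ROM rather than a case statement.
// INIT_FILE and ADDR_BITS are hypothetical parameters, not part of the
// original design.
module CodeMemoryRom #(
    parameter INIT_FILE = "program.hex",  // hypothetical hex image of the program
    parameter ADDR_BITS = 12              // 4 KB of code space in this sketch
) (
    input             clk,
    input      [15:0] address,
    output reg [7:0]  data
);
    reg [7:0] rom [0:(1 << ADDR_BITS) - 1];
    integer i;

    initial begin
        // Fill everything with NOPs (00h) first, as the case default did,
        // then overlay the program bytes read from the file.
        for (i = 0; i < (1 << ADDR_BITS); i = i + 1)
            rom[i] = 8'h00;
        $readmemh(INIT_FILE, rom);
    end

    // Registered read: data appears one clock after address is presented.
    // Address bits above ADDR_BITS are ignored, so addresses wrap around
    // instead of returning NOPs as the case statement's default did.
    always @(posedge clk)
        data <= rom[address[ADDR_BITS-1:0]];
endmodule
```

For the six-byte test program above, program.hex would hold the bytes 90 01 03 02 00 00, separated by whitespace. If the LUT count of the core stops moving when only the contents of that file change, that would support the guess; if it still moves, the cause is somewhere else.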