

Old thread has been locked -- no new posts accepted in this thread
???
05/16/09 00:00


 
#165400 - I do not believe bigger is better ...
Responding to: ???'s previous message
Per Westermark said:
Richard said:
I didn't intend to imply that a 1500-gate ALU, such as I suggested the 805x core might use, should be compared with the 100K-gate ALU common in modern processors. [...]

The ALU is tiny. It is possible to make an 8051 variant with two, three or maybe five ALUs without burning any large number of transistors. So what problem did you have with my comment:
"The ALU etc of the 8051 are so tiny that it is very easy for the pipeline to compute both sides of a conditional branch, and throw away the wrong alternative."?

It doesn't matter, even if you have 1K ALU's, if you don't have a 2K-bit (128 bytes in either direction, since that's what branching allows) concurrent "view" into code memory. With 48 bits you can do nearly everything other than branch prediction.
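The arithmetic behind those two figures can be checked in a few lines. This is only a sketch; the constant names are made up for illustration, and the reach is taken from the signed 8-bit offset that 8051 relative branches use.

```python
# 8051 relative branches (SJMP, JZ, DJNZ, ...) carry a signed 8-bit offset.
REL_MIN, REL_MAX = -128, 127

window_bytes = REL_MAX - REL_MIN + 1   # every byte a relative branch can reach
window_bits = window_bytes * 8

fetch_bits = 48                        # fetch width proposed in the thread
fetch_bytes = fetch_bits // 8

print(window_bytes, window_bits)       # 256 2048
print(fetch_bytes, fetch_bytes // 2)   # 6 3
```

So a 48-bit fetch covers six 1-byte or three 2-byte instructions, while covering every possible relative branch target would take a 2048-bit view.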

Richard said:
It also allows selective out-of-order OR concurrent execution of 3 2-byte instructions, or up to 6 single byte instructions, selectively, of course.

Loading (at least) 48 bits at a time could allow the execution of two instructions at a time. But what was the point you wanted to make?

Have you forgotten what we (or at least I) were discussing?

No, though I may not know what YOU are discussing.

Per said:
The question to ask is if any 8051 1-clocker can exist without either pipelining or an internal clock doubler.

I'm not too happy with the "can exist" in the middle of the sentence - there is, for example, a clockless ARM core, so obviously it must be possible to make a 1-clocker without pipeline or internal clock doubler. But are there any?

I don't know ... and, frankly, I don't care. Do you either know or care? If one MCU executes code substantially faster than another, that's sufficient. A properly written simulator should quickly reveal which MCU "does it faster" or, at least, "does it fast enough." Sadly, I know of no such simulator for the 805x series. Similarly, I doubt there's one for ARM. I started to write one for the DS89C4x0's, but Maxim was unable or unwilling to provide timing details for their UART, among other things, and it took nearly a year to extract timing information from them regarding the relationship between address signals during ALE. I never really got precise timing information. I was able to make what I wanted to do work quite solidly, but without the support of specifications, I couldn't use it. Consequently, I abandoned my simulator until further notice.

As a follow-up to your last post: Can you do out of order execution of multiple instructions without a pipeline to reorder the instructions?

Well, you could execute them concurrently with other instructions. For example ... if one instruction increments DPTR while the next left-shifts A, those potentially can happen at the same time, since there's no resource collision.
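A rough way to picture the "no resource collision" test is to list what each instruction reads and writes and compare the sets. The sketch below is a simplified model, not how any real core does it: the `Insn` type and `can_issue_together` helper are hypothetical names, and PSW flags are ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insn:
    """Hypothetical resource model: registers an instruction reads and writes."""
    name: str
    reads: frozenset
    writes: frozenset

def can_issue_together(first: Insn, second: Insn) -> bool:
    """True when the two instructions share no resource hazard."""
    raw = first.writes & second.reads    # second needs a result of first
    war = first.reads & second.writes    # second overwrites an input of first
    waw = first.writes & second.writes   # both write the same resource
    return not (raw or war or waw)

INC_DPTR = Insn("INC DPTR", frozenset({"DPTR"}), frozenset({"DPTR"}))
RL_A = Insn("RL A", frozenset({"A"}), frozenset({"A"}))
MOVX_A_DPTR = Insn("MOVX A,@DPTR", frozenset({"DPTR"}), frozenset({"A"}))

print(can_issue_together(INC_DPTR, RL_A))         # True
print(can_issue_together(INC_DPTR, MOVX_A_DPTR))  # False
```

INC DPTR and RL A touch disjoint registers, so they could in principle run side by side; INC DPTR followed by MOVX A,@DPTR could not, because the second instruction needs the updated DPTR.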

Richard said:
Instructions would be removed from the instruction stream as they're executed, and replaced by subsequent code-space content.

Does sound like a pipeline to me. So what are you trying to say? That you can have one-clocker chips without a pipeline just as long as they have a pipeline? If you are trying to correct me, then correct me based on something I have written. If you are trying to agree with me - please try to write the text so that it sounds like you are agreeing, and not arguing against me. Right now, I don't know exactly what your aims are.


Not quite. You simply set a bit in the corresponding position in the buffer and the instruction will be "gone" in the sense that it will be replaced by the subsequent portion of the stream. This isn't a pipeline because it isn't registered. A pipeline is, typically, a registered latency engine. It increases the number of steps in order to decrease the total amount of time that has to be consumed in a set of processes. For example, if you have a logic block with several more layers of logic in some paths through the block than in others, you might want to register the stages to synchronize them. Since the depth of the logic is a rate-determining step, you might want to insert a pipeline register in the long path, thereby turning the operation of that path into a two-clock path, while others remain one clock long. The block is then a two-clock block only when the long path is in use, which improves overall performance. I would hope to avoid that, however.
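The "set a bit and the instruction is gone" idea can be modelled in software roughly as follows. This is only a behavioural sketch: the 6-byte window matches the 48-bit view mentioned earlier, `retire_and_refill` is a made-up name, and a software list says nothing about whether real hardware would need registers to do this.

```python
from typing import List

WINDOW_BYTES = 6  # 48-bit view into code memory, as proposed in the thread

def retire_and_refill(window: List[int], done: List[bool], stream: List[int]) -> List[int]:
    """Drop bytes whose 'done' bit is set, then top the window up from the stream."""
    kept = [byte for byte, finished in zip(window, done) if not finished]
    while len(kept) < WINDOW_BYTES and stream:
        kept.append(stream.pop(0))
    return kept

# Window holds INC DPTR (0xA3), RL A (0x23), MOV A,#0x55 (0x74 0x55), NOP, NOP.
window = [0xA3, 0x23, 0x74, 0x55, 0x00, 0x00]
done = [True, True, False, False, False, False]   # first two instructions executed
stream = [0xF0, 0x04, 0x22]                       # MOVX @DPTR,A ; INC A ; RET

window = retire_and_refill(window, done, stream)
print([hex(b) for b in window])  # ['0x74', '0x55', '0x0', '0x0', '0xf0', '0x4']
```

The two executed single-byte instructions disappear from the front and two new bytes of code space slide in at the back.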

Richard said:
A single input clock can easily be manipulated into a two-phase clock thereby producing a non-overlapping clock pair with a ~40% duty cycle on each phase of the input clock. That provides a convenient system for clocking the separate address and data arithmetic operations.

Of course it can. Why do you think I wrote: "Using a two-phase clock, you would still have quite interesting times to get data from the code space, decode, retrieve input data, compute and store back the result within one low-to-high and one high-to-low clock transition."

Richard said:
I implied no proof of any sort.

But you wanted to imply something. The existence of dual-phase clocks? Don't think you have to spend too much time talking about that, since it should be considered common knowledge. The question here is: Are the existing 8051 one-clockers able to do what they do with just a two-phase clock and no pipeline? This as a follow-up to Kai's comment "Well, the DP8051 has such a hidden feature, he is using pipelining".

Do you think I care whether they use pipelining? What I care about is how fast they can fetch a byte, do something with it, then put it away. ARM, MIPS, etc, architectures handle their ideal word size just fine. If I'm processing bytes, though, I don't want to give away byte performance just so I can claim I'm using a 32-bit processor.

Richard said:
Single-cycle branch prediction requires a much deeper view into program memory than concurrent or out-of-order execution of instructions. That's why I've chosen to ignore it for now.

If you have double execution units, then you don't need to do any branch prediction. You process all instructions with one clock "lag", and you compute both sides of the branch. As soon as you know the outcome of the branch, you throw away one alternative and let the other alternative "take", in the same clock cycle as you are busy with the next instruction of the branch half you did take. Very hard to do with external flash memory, given the tiny bandwidth of the external memory interface. A lot easier with internal code space, where you may just as well have a 128-bit interface from the flash, and maybe 4 or 8 cache lines. Snooping the pipeline would allow potential branches to be loaded. There are a large number of ways an 8051 can be sped up a lot, if someone is interested and the code is only internal, or the processor has a very wide external interface. The interest in doing it will depend on the market shares taken by $1 ARM chips.
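As a behavioural illustration of "compute both sides, throw away the wrong one", here is a minimal sketch. It assumes a toy register state held in a dict; `run_both_sides` and the helper names are hypothetical, and real hardware would do both computations in parallel rather than one after the other.

```python
from typing import Callable, Dict

State = Dict[str, int]

def run_both_sides(state: State,
                   condition: Callable[[State], bool],
                   taken: Callable[[State], State],
                   fall_through: Callable[[State], State]) -> State:
    """Compute both branch alternatives, then keep only the one the condition selects."""
    taken_result = taken(dict(state))          # private copy for the taken path
    fall_result = fall_through(dict(state))    # private copy for the fall-through path
    return taken_result if condition(state) else fall_result

def inc_r0(s: State) -> State:
    s["R0"] = (s["R0"] + 1) & 0xFF
    return s

def dec_a(s: State) -> State:
    s["A"] = (s["A"] - 1) & 0xFF
    return s

def a_is_zero(s: State) -> bool:
    return s["A"] == 0

# Models "JZ target": the target does INC R0, the fall-through does DEC A.
print(run_both_sides({"A": 0, "R0": 7}, a_is_zero, inc_r0, dec_a))  # {'A': 0, 'R0': 8}
print(run_both_sides({"A": 3, "R0": 7}, a_is_zero, inc_r0, dec_a))  # {'A': 2, 'R0': 7}
```

Neither alternative touches the committed state until the condition is known, which is why no prediction is needed; the cost is the duplicated work and the extra fetch bandwidth for both paths.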

Out-of-order execution is good for a PC processor, where you have very expensive computational blocks that you want to run at best possible utilization. A FPU block is huge. A 64x64->128-bit integer multiplier isn't so small either. But would out-of-order execution be something you want in an 8051-class microcontroller? If you can see at compile time that two instructions should be swapped, then the programmer who needs to count clock cycles doesn't need the swap feature - he can swap the instructions in the source. If it can't be seen at compile time, then the programmer will not be able to predict the actual clock count. You may also find that the pipeline complexity to predict all combinations of instructions to figure out if they can be reordered or run concurrently may be quite significant, compared to a "stupid" processor that just runs down both sides of a conditional branch until the condition is known. The brute force alternative would actually be simpler. Another thing. A superscalar 8051 is most definitely possible to build. Compared to a superscalar current-generation x86 processor, it would even be trivial (and quite small). But a bit-set and bit-clear (of the same bit) after each other would not combine well. And adding a single byte somewhere in the instruction stream would directly ripple through the program until you reach a point where instructions may not be run concurrently. I would think a pipelined 8051 capable of 500MHz would be more useful than a superscalar 8051 where you would need the assembler to output a special listing showing how instructions have been combined just so you can figure out if that single extra instruction will break already verified timing blocks.
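The ripple effect is easy to see with a toy pairing rule. The sketch below assumes a hypothetical dual-issue core that greedily pairs each instruction with the next one whenever they touch no common resource; the rule, the `pair_greedy` name and the resource sets are illustrative only and do not describe any existing 8051.

```python
from typing import List, Set, Tuple

Insn = Tuple[str, Set[str]]   # (mnemonic, resources touched)

def pair_greedy(program: List[Insn]) -> List[Tuple[str, ...]]:
    """Group instructions into issue slots, pairing neighbours that share no resource."""
    slots = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and not (program[i][1] & program[i + 1][1]):
            slots.append((program[i][0], program[i + 1][0]))
            i += 2
        else:
            slots.append((program[i][0],))
            i += 1
    return slots

program = [("SETB P1.0", {"P1"}), ("INC DPTR", {"DPTR"}),
           ("CLR P1.0", {"P1"}), ("RL A", {"A"})]

print(pair_greedy(program))
# [('SETB P1.0', 'INC DPTR'), ('CLR P1.0', 'RL A')]

print(pair_greedy([("NOP", set())] + program))
# [('NOP', 'SETB P1.0'), ('INC DPTR', 'CLR P1.0'), ('RL A',)]

print(pair_greedy([("SETB P1.0", {"P1"}), ("CLR P1.0", {"P1"})]))
# [('SETB P1.0',), ('CLR P1.0',)]
```

One extra NOP at the front changes every pairing after it and adds an issue slot, and the back-to-back set and clear of the same bit never pair at all. That is the kind of change a cycle-counting programmer would need a special listing to catch.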

But I did not start this to discuss potential improvements in future 8051 chips - there are too many possible improvements, but it will be the economy and the competitors that will decide what will happen. I was just wondering if any of the one-clockers are non-pipelined and just using two clock transitions.

A microcontroller is not suitable for the huge pipelines in today's PC processors. But two clock transitions is very little. Just going to three or four makes a huge difference. But going to three or four clock transitions in a one-clocker would require a pipeline where you may run overlapping instructions, or a clock doubler. Has anyone managed to stay with just two clock transitions?


What the chip makers do within their chips is of no interest to me, beyond what help it offers (precious little, nowadays, as they don't tell us much!) in understanding their timing. My job is to apply them correctly, and not to reverse-engineer them.

I would say that Maxim/Dallas hasn't told me much about the relationship between their external clock and the signals they produce. The relationship between ALE, addresses, and X1 isn't defined, as they reference things to ALE.

Ultimately, what matters to me is how fast the MCU can perform certain operations. I use the Maxim/Dallas DS89C4x0 MCU's when I need the dual DPTR's and the auto-increment/decrement or auto-swap of DPTR, or the variable timing of external bus cycles, or the various clocking options. There are, of course, other MCU's that may be better for things that don't need those features. There may be features in other MCU's that make me use other MCU's.

So far, I haven't been impressed with what ARM and other 32-bit MCU's can do with bytes. When ARM's come out with upwards of 64TB of FLASH on-chip, along with, say, 512 GB of on-chip RAM, and a performance of 50 exaflops per femtosecond, and a vast array of peripherals, all for $1 or less, I guess I'll have no choice but to give them a try, but, in the meantime, as I'm able to find adequate inexpensive 8-bitters, not all in the 805x-camp, I'll stick with them, thank you very much.

RE



