??? 05/09/09 04:29
#165182 - You're right, in a sense ... Responding to: ???'s previous message
Per Westermark said:
It seems like the problem has finally surfaced.

If only it were a single problem!

Richard said:
Now, Per, what part of the clear claim, "Dhrystone 2.1 benchmark program runs exactly 8.1 times faster than the original 80C51 at the same frequency," is unclear?

Most probably the "at the same frequency" part. Don't write your next answer before you have spent a bit of time reading what follows.

I thought I'd pointed out that the common MIPS/MHz claim is based on a 64KB block of NOPs. Back in the "old days", microprocessor vendors made a habit of claiming MIPS figures based on that. It's not a valid reflection of the relative performance of a processor core.

Let's say that you are correct.

I'll accept that ... for now.

Let's say that a 1-clocker is always exactly 12 times faster than a 12-clocker, when both are run with the same crystal frequency.

That, of course, is not really the case ... but go on ...

That means that a 25MHz 1-clocker can do 25 MIPS, while a 12 MHz 12-clocker can do 1 MIPS.

That's only sure to be true if the entire code body is 64k NOPs or other strictly single-cycle instructions.

The manufacturer, in this case, claimed that the ASIC solution was 8.1 times faster than a 12-clocker when both the ASIC and the 12-clocker had the same clock frequency.
Are you still with me? In short:

A 12MHz 12-clocker does 1 MIPS.

Only with a code-space full of single-cycle instructions that have no impact on execution speed.

A 12MHz 1-clocker does 12 MIPS.

Only with a code-space full of single-cycle instructions that have no impact on execution speed.

A 12MHz ASIC that is 8.1 times faster than a 12-clocker at the same clock frequency does 8.1 MIPS.

That, actually, isn't the vendor's claim. The vendor claims that the core executes Dhrystone 2.1, compiled with an undefined compiler, undefined parameters, and undefined resources, producing a rating of 8.1 MIPS.

But then step to 25MHz.
A 25MHz 12-clocker does 2.08 MIPS. A 25MHz 1-clocker does 25 MIPS. A 25MHz ASIC that is 8.1 times faster than a 12-clocker at the same clock frequency does 16.9 MIPS. Let's continue with the ASIC.

Not a wise thing unless you can commit to 10K units or more. Even then, it's expensive, and you'd best not make any mistakes.

A 130MHz ASIC that is 8.1 times faster than a 12-clocker at the same clock frequency (i.e. being a 1.48-clocker) does 87.75 MIPS - way faster than your "any 25MHz one-clocker".

That's probably true, and in a custom ASIC, it can probably be made to run faster.

Richard said:
In fact, any one-clocker at 25 MHz is as fast as that one at 300 MHz, at first glance.

Keep in mind that the MIPS/MHz claim is based on nothing but NOPs or other single-cycle instructions in the code-space.

Did you fail to see the "at first glance"? A 12-clocker would need to run at 300 MHz to manage 25 MIPS.
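Every figure in the exchange above comes from one relation: MIPS = clock (MHz) / average clocks per instruction. Here is a minimal Python sketch of that arithmetic. It assumes a 12-clocker averages exactly 12 clocks per instruction and a 1-clocker exactly 1 (the NOP-style best case), and it models the DT8051 only through the vendor's claimed 8.1x ratio at equal clock, not through any measured data. The function and constant names are just illustrative.

```python
# Sketch: MIPS = clock frequency (MHz) / average clocks per instruction (CPI).
# Assumptions: a "12-clocker" averages exactly 12 clocks per instruction, a
# "1-clocker" exactly 1 (the NOP-style best case), and the DT8051 is modeled
# only through the vendor's claimed 8.1x ratio at equal clock frequency.

TWELVE_CLOCKER_CPI = 12.0
ONE_CLOCKER_CPI = 1.0
CLAIMED_SPEEDUP = 8.1  # vendor's Dhrystone 2.1 ratio vs. a 12-clocker, same clock


def mips(clock_mhz: float, cpi: float) -> float:
    """Millions of instructions per second for a clock in MHz and an average CPI."""
    if cpi <= 0:
        raise ValueError("cpi must be positive")
    return clock_mhz / cpi


def equivalent_cpi(speedup: float, baseline_cpi: float = TWELVE_CLOCKER_CPI) -> float:
    """Average CPI implied by a speedup over the baseline core at the same clock."""
    return baseline_cpi / speedup


if __name__ == "__main__":
    dt_cpi = equivalent_cpi(CLAIMED_SPEEDUP)  # roughly 1.48 clocks per instruction
    for clock in (12, 25, 130, 300):
        print(f"{clock:>3} MHz: 12-clocker {mips(clock, TWELVE_CLOCKER_CPI):7.2f}  "
              f"1-clocker {mips(clock, ONE_CLOCKER_CPI):7.2f}  "
              f"8.1x core {mips(clock, dt_cpi):7.2f}  MIPS")
```

Keep in mind that the 1-clocker column is the NOP-only best case. With a real instruction mix its average CPI rises above 1, and that is exactly the point being argued here.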
But the DT8051 is not (!) a 12-clocker (or it wouldn't have been 8.1 times faster than the original 12-clocker). It is most probably a one-clocker, which means that it would do NOP instructions 12 times faster than a 12-clocker, and at the same speed as your "any one-clocker".

This is where you lose me. First of all, there's no indication in any of the three separate datasheets that the core is a one-clocker, nor is there any indication of the specifics of their implementation/simulation environment. They could, in fact, be "doing things" to the clock and have built a 12-clocker, or they could have a one-clocker, or something in between. They do, after all, have an "ASIC" version that is claimed to be rated at 8.1 Dhrystone 2.1 MIPS, an ALTERA FPGA version, also claimed to be rated at 8.1 Dhrystone 2.1 MIPS, and a XILINX FPGA version, again claimed to be rated at 8.1 Dhrystone 2.1 MIPS, albeit at somewhat different clock speeds. 8.1 isn't an unreasonable number for a one-clocker operating with some "appropriate" instruction mix. What's used here is whatever Dhrystone 2.1 produces with their compiler, etc.

If you find hardware to run the soft-core at 300MHz without the need to add extra wait states, then it would do 202.5 MIPS - way more than any 25MHz 8051 processor ever released. Your 25MHz processors would need to do about 8 instructions per clock cycle to keep up.

Perhaps, but at what cost? The vendor provides no information about the actual device in which their implementation/simulation was performed, nor do they indicate what other resources were consumed. In an FPGA, based on recent quotes from AVNET, it's likely to cost 10x-1000x as much as a commercially available MCU, once the resources required for the features that justify using an FPGA are added to the computation. My complaint has not been about the claim of performance, however unpersuasive, but about the inadequacy of the information provided, even in their datasheets.
There's no real way to assess their device based on their marketing drivel, and that's all they've provided so far.

The problem here is that you made the assumption that the maximum possible speed of the soft-core was 8.1 times faster than a 12 MHz 12-clocker. But the manufacturer's claim was specifically "8.1 times faster than standard 80C51 at the same frequency". The speed of the DT8051 will not stay still, but will climb with increased clock frequency.
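The point that the soft-core's speed climbs with its clock can be put in numbers. This sketch assumes the claimed 8.1x ratio holds unchanged at every clock, with no added wait states or memory bottlenecks. That is an assumption, not vendor data. It computes three things: the soft-core's MIPS at a given clock, the instructions per clock a fixed-frequency part would need to match it, and the clock at which the soft-core reaches a given MIPS figure. The helper names are hypothetical.

```python
# Sketch: how the soft-core's MIPS scales with clock, under the assumption
# that the claimed 8.1x-over-a-12-clocker ratio holds at every frequency
# (no wait states, no memory bottlenecks). Not vendor data.

CLAIMED_SPEEDUP = 8.1
TWELVE_CLOCKER_CPI = 12.0


def softcore_mips(clock_mhz: float) -> float:
    """MIPS of a core running CLAIMED_SPEEDUP times a 12-clocker at the same clock."""
    return clock_mhz / TWELVE_CLOCKER_CPI * CLAIMED_SPEEDUP


def required_ipc(target_mips: float, clock_mhz: float) -> float:
    """Instructions per clock a core at clock_mhz needs in order to reach target_mips."""
    return target_mips / clock_mhz


def matching_clock_mhz(target_mips: float) -> float:
    """Clock (MHz) at which the soft-core reaches target_mips."""
    return target_mips * TWELVE_CLOCKER_CPI / CLAIMED_SPEEDUP


if __name__ == "__main__":
    target = softcore_mips(300.0)  # about 202.5 MIPS
    print(f"soft-core at 300 MHz: {target:.1f} MIPS")
    print(f"a 25 MHz part needs {required_ipc(target, 25.0):.1f} instructions/clock")
    # A NOP-rated 25 MHz one-clocker claims 25 MIPS; the soft-core matches it at:
    print(f"soft-core reaches 25 MIPS at {matching_clock_mhz(25.0):.1f} MHz")
```

On these assumptions the soft-core would match a NOP-rated 25 MHz one-clocker at roughly 37 MHz. That pits a Dhrystone-based ratio against a NOP-based rating, which is the apples-to-oranges problem this thread is about, so treat the crossover as illustrative only.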
The uneven figure 8.1, on the other hand, is the result of running a benchmark containing a mix of instructions, where not all instructions are exactly 12 times faster than on the 12-clocker. Some take one clock. Some take two clocks. Some may take even more clocks. This can be seen clearly at Jan Waclawek's excellent link: http://www.efton.sk/51comp/51comp.htm

If 50% of the instructions take one clock and 50% take two clocks, then you get a processor averaging 1.5 clocks/instruction, which is about 8x faster than the original 8051. This sounds like a very good fit for the manufacturer's claims regarding the DT8051.

It depends entirely on the instruction mix, doesn't it?

So the end result is: Select any 25 MHz one-clocker you can find on the market. Run it against the DT8051 at 25 MHz, and you would need to read the fine print to figure out which one will be faster. And which one is faster may depend a lot on the mix of instructions. Pump up the clock frequency of the DT8051, and your 25 MHz one-clockers would need to be super-scalar, i.e. doing more than one instruction per clock cycle, to keep up.

That's exactly what I believe, i.e. their claim should be based on the same benchmark as used by the typical MCU maker, e.g. the code-space filled with NOPs, which it can execute with maximal brevity.

Do you now see the problem with your final claim?

I don't see a problem, since I've clearly pointed out that it's based on the popular MIPS/MHz claims from some one-clocker manufacturers.

What is important to notice, on the other hand, is that even if a DT8051 can manage a quite impressive MIPS count, it may not be the most cost-effective way to get a fast microcontroller. Because of cost, it might be better to jump to a different architecture and select a fast ARM9 or ARM11 or any of the Freescale MPC processors instead. Of course, that's unless the free gates in an FPGA allow you to solve a problem a standard microcontroller can't.
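The instruction-mix argument is just a weighted average. Here is a small sketch of it. The second mix is hypothetical. The sketch also simplifies the baseline by assuming the original 8051 averages 12 clocks per instruction. In reality its instructions take one, two, or four 12-clock machine cycles, so a careful comparison would weight the baseline by the same mix.

```python
# Sketch: average clocks per instruction (CPI) for an instruction mix, and the
# speedup it implies over a baseline core. Simplifying assumption: the baseline
# (original 8051) averages exactly 12 clocks per instruction.

from typing import Iterable, Tuple

BASELINE_CPI = 12.0


def average_cpi(mix: Iterable[Tuple[float, float]]) -> float:
    """mix holds (fraction_of_instructions, clocks_per_instruction) pairs."""
    pairs = list(mix)
    total = sum(fraction for fraction, _ in pairs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"instruction fractions must sum to 1, got {total}")
    return sum(fraction * clocks for fraction, clocks in pairs)


def speedup(mix: Iterable[Tuple[float, float]], baseline_cpi: float = BASELINE_CPI) -> float:
    """How many times faster than the baseline core, at the same clock."""
    return baseline_cpi / average_cpi(mix)


if __name__ == "__main__":
    half_and_half = [(0.5, 1), (0.5, 2)]      # the 50/50 case from the post
    heavier = [(0.4, 1), (0.5, 2), (0.1, 3)]  # hypothetical mix
    for name, mix in (("50/50", half_and_half), ("hypothetical", heavier)):
        print(f"{name}: CPI {average_cpi(mix):.2f}, {speedup(mix):.2f}x a 12-clocker")
```

The 50/50 case gives 1.5 clocks per instruction and exactly 8x. Reaching 8.1x needs an average of about 1.48 clocks per instruction, so a small shift in the mix moves the result to either side of the vendor's figure.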
There are times when "solving a problem" is more important than "finding the cheapest solution".

On that we can surely agree, though I'd be careful about the 16- and 32-bitters, as they often only allow one to do 8-bit things VERY slowly, relative to an 8-bitter. If you're doing 8-bit things, it's useful to have an 8-bit-oriented instruction set.

So in the end:
Richard said:
What I pointed out, I thought, was that the manufacturer's claims don't match their own claims. I don't know why you think you should add even more unsubstantiated claims.

I have no interest in adding any unsubstantiated claims. I'm just trying to stop one that seems to have been caused by a reading error. Didn't you stop to think about why the claims didn't seem to add up, and that maybe the claims did add up but you missed an important clue somewhere?

I looked at all that and simply categorized the whole lot as typical "marketing drivel" designed to confuse rather than to inform.

Everyone is wrong now and then. The important thing is to always be ready to reevaluate known truths when new information arrives.

I don't believe any new truths have been introduced, aside, perhaps, from the apparent fact that these so-called specifications are actually just the typical marketing drivel that we're routinely fed. I'm a seasoned man and have been wrong, not always happily, often enough that it doesn't upset me. If you want truths, you have to obtain the compiled binaries of their Dhrystone 2.1 benchmark, as implemented and compiled by them, so you can run it on the MCU of your choice. Without that, any comparison is, at best, a guess. If you want truths about their ASIC/FPGA implementations, you have to know details about their implementation, too. If they use on-chip ROM and RAM, the device will cost many times as much as if it's off-chip, but it will run much faster. There's always a reason why they don't provide complete information.

I'd point out, though, that their lowest-cost XILINX implementation, in the XC3S250E, would fit even if the debug hardware support is used. That device has 2448 slices, and the DT8051 with their TTAG debugger requires 1172. That's nearly half (about 48%) of a device ranging in cost from $18 to $37 on www.octopart.com, which often only produces rough estimates.
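As a quick sanity check on how much of that Spartan-3E device the core would occupy, here is a trivial calculation. It uses only the two slice counts quoted in the post (2448 available, 1172 for the DT8051 plus debugger). Those counts are taken as given here and should be checked against the Xilinx and vendor documents.

```python
# Sketch: FPGA slice utilization from the two figures quoted in the post.
# The counts are as quoted; verify them against the Xilinx and vendor documents.

XC3S250E_SLICES = 2448           # slices in the XC3S250E, as quoted
DT8051_WITH_DEBUG_SLICES = 1172  # DT8051 plus debugger, as quoted


def utilization(used: int, available: int) -> float:
    """Fraction of the device's slices consumed by the design."""
    if available <= 0:
        raise ValueError("available must be positive")
    if used > available:
        raise ValueError("design does not fit in the device")
    return used / available


if __name__ == "__main__":
    fraction = utilization(DT8051_WITH_DEBUG_SLICES, XC3S250E_SLICES)
    spare = XC3S250E_SLICES - DT8051_WITH_DEBUG_SLICES
    print(f"{fraction:.1%} of the device used, {spare} slices left for everything else")
```

That works out to just under half the device, leaving 1276 slices for whatever else the FPGA was chosen to do.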
That doesn't allow for the full internal code and data space, so there's no telling how it would interact with external code and data memories. None of this suggests that one shouldn't consider using their soft-core, but it definitely means that they should provide a great deal more precise, and less marketing-friendly, information.

RE