04/18/05 09:56
#91810 - especially for those...
...who gave me -1 with a "not useful" reason.

I did not set out to be either useful or not; I just replied to Jan's question. Anyway, if somebody did not understand what I said and explained there, here are some pictures for those who find my English hard to understand.

The first picture shows the Altera multiplier. The upper one is an unclocked, non-pipelined, purely combinational multiplier. The lower one is a clocked multiplier that uses its clock input for pipelined operation.

To use the pipeline, the user must provide a clock and enable its use. The next example shows a 2-stage pipeline setup:

Now, about the differences between these two ways of multiplying.

Non-pipelined multiplier.
See the picture above. When one or more input pins of A[] and/or B[] change, output Q[] reflects the change after a delay. This delay depends on the device, its speed grade, signed/unsigned mode, bus widths, etc. Because this multiplier is not clocked, the delay does not depend on the system clock. Such a multiplier may be used in a CPLD that does not use a clock at all.
Here is the timing simulation of 0x3FF x 0x3FF:

You can see that the input data change at the 10ns mark and the valid result appears at about the 30.6ns mark. Before that the result is not valid and changes many times, because of the cascade of adders and carry nets in the hardware implementation.
Typical delays for a non-pipelined 10x10-bit multiplier implemented in an Altera ACEX EP1K10-1 device are:


Clocked multiplier with pipeline.
See the picture above once again. When one or more input pins of A[] and/or B[] change, output Q[] does not reflect the change immediately but only after a rising clock edge. Depending on how many stages deep the pipeline is, the valid result appears on the output lines after the corresponding number of clock cycles.
One idea behind pipelining is that, once the number of stages is defined, the whole process can be divided into smaller parts, one per stage. The small parts execute quickly and do not require the large, complex logic of the purely combinational implementation. For example, a multiplication can be done not as 10x10 bits but as 10x5 + (10x5)<<5 instead. This shortens the carry chains and gives a faster response after the final clock.
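The 10x5 + (10x5)<<5 decomposition can be checked in software. The following is only a minimal Python sketch of the arithmetic, not of the Altera hardware; the function name `split_multiply` and the choice to split operand B (rather than A) are assumptions made for illustration.

```python
def split_multiply(a: int, b: int, low_bits: int = 5) -> int:
    """Multiply a by b using two narrower partial products.

    Hypothetical helper: b is split into a low and a high part of
    `low_bits` bits each, mirroring the 10x5 + (10x5)<<5 idea.
    """
    mask = (1 << low_bits) - 1
    b_low = b & mask            # lower 5 bits of B
    b_high = b >> low_bits      # upper 5 bits of B
    partial_low = a * b_low     # first 10x5 partial product
    partial_high = a * b_high   # second 10x5 partial product
    # The high partial product is weighted by 2**low_bits before the final add.
    return partial_low + (partial_high << low_bits)


if __name__ == "__main__":
    # Same operands as the simulation: 0x3FF x 0x3FF
    print(hex(split_multiply(0x3FF, 0x3FF)))  # 0xff801
```

Each partial product is only 15 bits wide, which is where the shorter carry chains come from.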
Here is the timing simulation of 0x3FF x 0x3FF for a multiplier with a 2-stage pipeline:

As you can see, the first clock at the 20ns mark does not change the output. Only after the second clock (30ns mark) plus a delay does the valid result appear at the output pins.
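The clock-by-clock behaviour described here can be imitated with a small behavioural model. This is a hedged Python sketch, not the real megafunction: the class name `TwoStageMultiplier`, the way the work is split between the two stages, and the use of `None` for "no valid result yet" are all assumptions for illustration, and the propagation delay after the clock edge is not modelled.

```python
class TwoStageMultiplier:
    """Behavioural model of a 2-stage pipelined 10x10-bit multiplier."""

    def __init__(self):
        self.stage1 = None  # partial products captured by the first register
        self.q = None       # output register; None means no valid result yet

    def clock(self, a: int, b: int):
        """Model one rising clock edge with a and b present on the inputs."""
        # Stage 2 uses what stage 1 held BEFORE this edge, as flip-flops do.
        if self.stage1 is not None:
            low, high = self.stage1
            self.q = low + (high << 5)
        # Stage 1 captures the two 10x5 partial products of the new inputs.
        self.stage1 = (a * (b & 0x1F), a * (b >> 5))
        return self.q


if __name__ == "__main__":
    mult = TwoStageMultiplier()
    print(mult.clock(0x3FF, 0x3FF))       # None: first clock, output unchanged
    print(hex(mult.clock(0x3FF, 0x3FF)))  # 0xff801: valid after second clock
```

Note that a new pair of operands can be clocked in on every edge, so the pipeline costs latency but not throughput.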
With a pipelined multiplier, the result time is the sum of the pipeline delay, defined by the number of clocks, plus the final delay. In fact, the final delay also depends somewhat on the number of pipeline stages. That is not obvious from the current example, but if somebody is interested I can show a real table/diagram for a divider where it is seen very clearly.
For a clocked 10x10-bit multiplier implemented in an Altera ACEX EP1K10-1 device with a 2-stage pipeline, the last-clock-to-result delay is:

As you see, the max. delay is about 15ns, while the delay for the non-pipelined multiplier is about 22ns. But we know that the full time from data being clocked into the pipeline to valid result output includes one clock period (i.e. between the 1st and 2nd clocks), so it is about 10+15=25ns for the example above, where a 100MHz system clock is used.
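The 10+15=25ns arithmetic can be written as a small formula. The sketch below is in Python and assumes the time is measured from the clock edge that captures the input data, so an N-stage pipeline contributes N-1 full clock periods before the final clock-to-output delay. The function name and this generalisation beyond the 2-stage case are assumptions, not something taken from the Altera tools.

```python
def pipelined_result_time_ns(stages: int, clock_mhz: float,
                             final_delay_ns: float) -> float:
    """Time from the capturing clock edge to a valid result, in ns.

    Assumed model: (stages - 1) clock periods between the first and the
    last clock edge, plus the last-clock-to-result delay.
    """
    period_ns = 1000.0 / clock_mhz
    return (stages - 1) * period_ns + final_delay_ns


if __name__ == "__main__":
    # 2-stage multiplier at 100MHz with about 15ns final delay
    print(pipelined_result_time_ns(2, 100, 15))  # 25.0
```

With these numbers the pipelined result arrives slightly later than the roughly 22ns of the combinational version; the gain is that the logic between registers is shorter.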

Now some notes I need to add.

1. The examples above use a 2-stage pipeline. In fact, for 10x10-bit multiplication a one-stage pipeline is enough. In that case the one-stage pipelined multiplier produces a valid result about 14.2ns after the rising clock edge. I used a two-stage pipeline here only to demonstrate its features. But there are many applications where 2-, 3- or even 6-stage pipelined functions produce a result faster than 1-stage ones. Need a little example? Well, take a 16:16-bit divider which produces a 16-bit quotient and a 16-bit remainder.
System clock: 50MHz
-----------+---------------------+---------
Number of  | Valid result time   | LE used
pipeline   | (including          |
stages     | pipeline delay)     |
-----------+---------------------+---------
0          | 210ns               | 285
1          | 175ns               | 326
2          | 130ns               | 367
3          | 115ns               | 410
4          | 135ns               | 451
-----------+---------------------+---------
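The trade-off in the table can be read off programmatically. This is just a small Python sketch over the figures quoted above; the variable and function names are made up for illustration, and nothing here comes from a synthesis tool.

```python
# (pipeline stages, valid result time in ns, logic elements used)
# Figures copied from the 16:16-bit divider table at 50MHz.
DIVIDER_RESULTS = [
    (0, 210, 285),
    (1, 175, 326),
    (2, 130, 367),
    (3, 115, 410),
    (4, 135, 451),
]


def fastest_configuration(rows):
    """Return the row with the shortest valid-result time."""
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    stages, time_ns, le_used = fastest_configuration(DIVIDER_RESULTS)
    base_time, base_le = DIVIDER_RESULTS[0][1], DIVIDER_RESULTS[0][2]
    print(f"fastest: {stages} stages, {time_ns}ns")
    print(f"time saved vs. no pipeline: {base_time - time_ns}ns")
    print(f"extra LEs vs. no pipeline: {le_used - base_le}")
```

For these figures the minimum is at 3 stages; a fourth stage adds another clock period and more LEs without shortening the final delay enough to pay for it.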


Regards,
Oleg
