
FPGA: optimum PWM generation



I have a requirement to generate 8 channels of PWM, each offset in phase from the others.


I am looking at two methods of doing this:


1a.) Create a 256-element Boolean array holding the bit pattern for the desired PWM duty cycle (done inside a parallel slow loop)

1b.) Create 8 copies of this array, each rotated by 32 elements from the previous one (1/8 of 256 = 32)

1c.) Index each array in a timed loop to output the bit patterns to 8 DIOs.
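As a rough software model of method 1 (not LabVIEW code, just a sketch to make the idea concrete), the rotated lookup tables could be built as below. The function names and the choice to rotate each copy "to the right" are assumptions made for illustration; the original post does not say which direction the rotation goes.

```python
PERIOD = 256                 # table length from the post
CHANNELS = 8
STEP = PERIOD // CHANNELS    # 32-element rotation between neighbouring channels

def make_pattern(duty):
    """One PWM period: True for the first `duty` ticks, False for the rest."""
    return [i < duty for i in range(PERIOD)]

def rotate(pattern, n):
    """Rotate right by n elements, so the rising edge moves n ticks later."""
    n %= len(pattern)
    return pattern[-n:] + pattern[:-n] if n else list(pattern)

def build_tables(duty):
    """Step 1b: eight copies of the base pattern, each rotated by a further 32."""
    base = make_pattern(duty)
    return [rotate(base, ch * STEP) for ch in range(CHANNELS)]

def outputs_at(tables, tick):
    """Step 1c: index every table with the same tick to get the 8 DIO values."""
    return [table[tick % PERIOD] for table in tables]
```

Note that this needs 8 x 256 stored Booleans, which is the storage cost the question is asking about.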




2a.) In a timed loop, create 8 counters (i = i+1), each held in its own shift register

2b.) Initialise the 8 shift registers, outside the loop, with count values of 0, 32, 64, 96, 128, 160, 192, 224

2c.) For each counter, do a comparison inside the loop against the PWM duty-cycle value.

2d.) Output each of the Boolean comparison results to the 8 DIO channels.
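A minimal software model of method 2 might look like the following. It is only a sketch: the counters are assumed to be 8-bit and to wrap at 256, the comparison is assumed to be "counter less than duty", and the function names are made up for the example.

```python
PERIOD = 256
INITIAL_COUNTS = [0, 32, 64, 96, 128, 160, 192, 224]   # step 2b

def step(counters, duty):
    """One loop iteration: compare every counter with the duty value (2c),
    then increment each counter, wrapping like an 8-bit register (2a)."""
    outputs = [c < duty for c in counters]
    next_counters = [(c + 1) % PERIOD for c in counters]
    return next_counters, outputs

def simulate(duty, ticks):
    """Run the loop for `ticks` iterations and record the 8 outputs (2d)."""
    counters = list(INITIAL_COUNTS)
    trace = []
    for _ in range(ticks):
        counters, outputs = step(counters, duty)
        trace.append(outputs)
    return trace
```

Here the only state is eight 8-bit counters; the waveform is computed on each iteration rather than read from a table.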


The first method seems quite elegant but uses 8x256 array elements (kind of like memory)


The second method uses 8 shift registers (counters), which in this example can be 8-bit, but calculates the PWM for each channel "on the fly" in the same loop.


Would it be correct to think that the second method, although less elegant, would be more efficient?






Message 1 of 23

I wrote what I thought was a neat optimized PWM routine at a previous job; unfortunately I can't post it here as I no longer have access to it.  Here is what I did as far as I can remember.  In my case I didn't need the phase offset, but the same general approach should still apply.


1. Create a memory block sized to hold one value for each of your counters, and set initial values for them to create your phase offset.

2. Create shift registers or memory blocks to hold the number of ON cycles and the number of OFF cycles.  If your counters might run with different duty cycles you'll need two for each counter and the memory block will be a better choice.

3. Add one more shift register - a U8 works well - to hold the overall state of your digital outputs.

4. Decrement each counter by 1 in a loop.  Whenever a counter reaches 0, invert the appropriate bit in the output state shift register and reload the counter with the appropriate count (so if the counter was 0 and the state was previously OFF, load the ON time).

5.  After iterating through each counter, write the overall state to your outputs.
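Steps 1 to 5 can be modelled in software roughly as follows. This is a sketch from the description only, not the original routine (which the poster no longer has). It assumes the toggle happens when a counter reaches 0 after the decrement, and that every ON and OFF count is at least 1. The names `tick`, `on_counts` and `off_counts` are hypothetical.

```python
def tick(counters, state, on_counts, off_counts):
    """Advance every channel by one tick and return the new (counters, state).

    counters   -- remaining ticks until each channel toggles (step 1)
    state      -- integer whose bit n is the output of channel n (step 3)
    on_counts  -- ON duration per channel, in ticks (step 2)
    off_counts -- OFF duration per channel, in ticks (step 2)
    """
    counters = list(counters)
    for ch in range(len(counters)):
        counters[ch] -= 1                       # step 4: decrement
        if counters[ch] == 0:
            state ^= 1 << ch                    # invert this channel's bit
            is_on = (state >> ch) & 1
            # Reload with the duration of the state just entered.
            counters[ch] = on_counts[ch] if is_on else off_counts[ch]
    return counters, state                      # step 5: caller writes `state` out
```

With a single channel, `on_counts=[3]`, `off_counts=[5]` and an initial counter of 1, the output goes high on the first tick, stays high for 3 ticks, then low for 5, and repeats. Different initial counter values per channel give the phase offset.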


If you don't need full clock speed, you can do this in a single-cycle timed loop with a case structure and a shift register operating like a state machine.  For example, states 0-7 would update counters 0-7 respectively (reuse the shift register as an address to read and write the memory block).  Case 8 writes all the outputs.  Case 9 does nothing except reset to loop back to case 0.  You've effectively divided your clock rate by a factor of 10 but chances are you don't need to run at the full clock frequency (I don't even know if the digital outputs could update that fast).  I hope this makes sense.
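The ten-state sequencing described above could be sketched like this, with one call standing in for one iteration of the single-cycle timed loop. It is a software model under assumptions: the list `counters` plays the role of the memory block, `pending` is the output-state shift register, `pins` is what has actually been written to the digital outputs, and all names are invented for the example.

```python
NUM_CHANNELS = 8
NUM_STATES = 10      # 0-7 update one counter each, 8 writes outputs, 9 idles

def sctl_iteration(step, counters, pending, pins, on_counts, off_counts):
    """Model one clock cycle; returns (next_step, pending, pins).

    `counters` is updated in place, like a memory block addressed by `step`.
    """
    if step < NUM_CHANNELS:
        ch = step                                # reuse the state as the address
        counters[ch] -= 1
        if counters[ch] == 0:
            pending ^= 1 << ch                   # toggle this channel's bit
            is_on = (pending >> ch) & 1
            counters[ch] = on_counts[ch] if is_on else off_counts[ch]
    elif step == NUM_CHANNELS:
        pins = pending                           # case 8: write all outputs at once
    # case 9: nothing to do except wrap back to case 0
    return (step + 1) % NUM_STATES, pending, pins
```

Because the outputs are only written in case 8, all eight channels change on the same clock edge, and each PWM tick lasts ten clock cycles.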

Message 2 of 23

You really want to avoid arrays on the FPGA, since they take up gate space. If you want to store or retrieve data, use locally scoped memory blocks (or DMA to stream back to RT if it's a lot). In your case you can get away without any of this by simply using the options of the Single-Cycle Timed Loop (SCTL). The SCTL gives you both speed and a smaller footprint on the FPGA. If all you use are SCTLs, you never have to worry about synchronization between loops, since they are guaranteed to iterate at the same time. Also, the SCTL has a built-in "Offset" option which you can use to make each loop start at a specified offset relative to the others.


My example uses 9 SCTLs: one for each of your 8 PWM channels, and a 9th to do the actual digital output. You'll notice that I tried to use the 2^N VI as much as possible because these are quite literally "free" operations on the FPGA. It takes up no gate space at all, because multiplying or dividing by a power of two is just a logical shift. I use the built-in "Offset" node to offset each loop by (Period/8)*Loop Number.
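To illustrate the arithmetic only (this says nothing about how the attached VI is wired), the per-loop offset (Period/8)*Loop Number can be computed with a right shift by 3 in place of the division. The function name and the `channels_log2` parameter are assumptions for the example.

```python
def phase_offsets(period_ticks, channels_log2=3):
    """Offsets of (period / 2**channels_log2) * n for each loop number n.

    Dividing by a power of two is written as a right shift, which on an
    FPGA amounts to re-labelling wires rather than building a divider.
    """
    step = period_ticks >> channels_log2         # period / 8 when channels_log2 == 3
    return [step * n for n in range(1 << channels_log2)]

offsets = phase_offsets(256)
# offsets == [0, 32, 64, 96, 128, 160, 192, 224]
```

For a period of 256 ticks this reproduces the initial counter values 0, 32, ..., 224 from the original question.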

Message 3 of 23
I just noticed my example has a bug in each of the loops, the "Greater Than" primitive should be a "Less than or Equal to" primitive such that if the count is greater than the period, the counter is reset back to 0.  Alternatively you can swap the True/False wires.

Message 4 of 23
I'm curious to know if I'm wrong here, but I do not think that using all those local variables is efficient in terms of FPGA space use.
Message 5 of 23
In this case, since I don't have any writers to the pulse/period local Variables, the Xilinx compiler will optimize them away to a single copy.  It would be as if I moved both controls outside of the loops and wired them to each of the 8 loops and used the tunnel node to wire up to the comparison operators.    The Output1-8 local variables only take up an extra 8 bits.

Message 6 of 23

Thanks guys for all your replies. In the end I implemented an algorithm based on option-2 (see attached).


I tried both in the FPGA and the example using arrays occupied 4,500 SLICES, whereas the 2nd example occupied 905 SLICES.


I have another question, though: the loop counter "i" is a 32-bit integer, so what should happen in the FPGA when it reaches the end of its count?


I left the thing running over the weekend and the value of i reached (2^32)/2, but the loop was still going.




Message 7 of 23



I found your example a quite interesting way of doing the PWM, by having a time-offset for each loop and synchronising all the loops.


In my adopted method (see above), I have a similar algorithm, but the 8 shift-registers are running in one single loop and the offsets are created by using a different, offset, initial-value for each register.


Is there perhaps an advantage, in terms of improving synchronisation or reducing skew between channels, to having each channel in its own loop?




Message 8 of 23

I don't know how to edit my last message, so I am adding this as a new reply:




I have compiled your PWM example, but I was surprised to find that although it occupies 1058 SLICES out of 14366, it took about 28 minutes to compile, compared to 12 minutes for my other version (see PWM.doc, above). I have tried changing the While Loop in my adopted method into a Timed Loop and found that I can reduce the number of SLICES to 660, the lowest yet.


However, with regard to using the SCTL, does the SELECT function inside the loop add any delay?


Message 9 of 23

Arnie1 wrote:


However, with regard to using the SCTL, does the SELECT function inside the loop add any delay?


There's no such thing as a delay in a Single-Cycle Timed Loop.  The reason it's named that is because everything inside the loop executes in one FPGA clock cycle.  If you have too much code inside your SCTL you'll get a compilation error, but up until that point you can add as much code as you want without adding any additional delay.


With regards to your question about the iteration counter: at least in standard LabVIEW, when the i terminal reaches its maximum it stays there for all future loop iterations.  I assume FPGA operates similarly.  This allows you to wire the i terminal to a case structure and have a 0 case that executes only once.
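The difference between a saturating iteration terminal and a wrapping PWM counter can be sketched as below. This assumes the i terminal is a signed 32-bit integer that holds at its maximum, as described above for standard LabVIEW; whether LabVIEW FPGA behaves identically is exactly the assumption made in the reply. The function names are made up for the example.

```python
I32_MAX = 2**31 - 1      # largest signed 32-bit value, just under (2^32)/2

def next_iteration_saturating(i):
    """Iteration terminal model: counts up, then holds at I32_MAX."""
    return i if i >= I32_MAX else i + 1

def next_count_u8(count):
    """8-bit PWM counter model: wraps from 255 back to 0."""
    return (count + 1) & 0xFF
```

Under that assumption the loop keeps running after i stops changing, which matches the observation that i sat at about (2^32)/2 while the loop was still going. A PWM counter kept in its own 8-bit shift register is unaffected either way, since it does not depend on i.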


It would be a lot easier for many of us if you could attach your images in a standard format (.png, .gif, .jpg) instead of as Word documents.

Message Edited by nathand on 11-17-2008 08:50 AM
Message 10 of 23