FPGA Register limitations

Intaris · ‎04-03-2014

Implementation details for LV 2012 would suffice.

Intaris · ‎04-03-2014

Phew. Today was a very frustrating day.

I finally managed to get a version working that sends the values from a 160MHz loop to a 40MHz loop without resorting to BRAM, without scaling up the clock speeds so that they match, and without having to live with the 2 cycles of handshaking.

I can implement a group of Registers for a single variable (instead of a single register as I have previously used). I then write to each one in turn on the producer side. Then, on the consumer side, I read from the registers one at a time with an index difference of 1 (per cycle: write N, read N+1).

This seems to work. I need to investigate if the index shift is repeatable as there's no way to change this in the end software. The final "aha" moment was writing to the registers in series but reading them all in parallel, but only actually using one value at a time (stepping through the registers essentially, but the "blind" read functions take care of the otherwise annoying handshakes). I feel this may prove unstable in real software but it's worth a try. It's basically a custom FIFO using Registers as elements.......

Shane.

Dragis · ‎04-03-2014

One thing to keep in mind here is that writing the values separately on each tick of the faster clock cycle can cause the values to show up at different clock cycles on the read side. If it is important that the data be atomic, you'll want to buffer up an array of 4 values on the writer side and then commit that whole block on the last cycle and then the data will show up aligned on the next (aligned) cycle of the slower clock.
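The buffer-then-commit idea can be sketched in software. This is only a toy Python model, not LabVIEW FPGA code: the class name `BlockWriter` and the block size of 4 are assumptions for illustration, and the clock-crossing handshake itself is not modelled.

```python
class BlockWriter:
    """Collects values on fast ticks and commits them as one block."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.pending = []      # values gathered during the current block
        self.committed = None  # last complete block, as the slow side sees it

    def fast_tick(self, value):
        self.pending.append(value)
        if len(self.pending) == self.block_size:
            # Commit the whole block in one step so the reader never
            # observes a mix of old and new elements.
            self.committed = tuple(self.pending)
            self.pending = []


writer = BlockWriter()
snapshots = []
for tick in range(8):
    writer.fast_tick(tick)
    snapshots.append(writer.committed)
print(snapshots)
```

The committed block only changes on every fourth fast tick, which is what keeps the four values together.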

Intaris · ‎04-03-2014

The values showing up on different cycles on the read side is exactly the point. This is essential to what I'm trying to achieve. Each register carries one quarter of the total data being transferred, with the handshaking for each register being offset, hence the ability to have a valid new value each and every iteration of the slower loop. The data spread over the four registers is NOT atomic, it's interleaved. The series of data is essentially transferred via Register N, N+1, N+2, N+3, N, N+1, N+2 and so on. The trick is keeping the writing and reading indexes aligned.
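The interleaving can be simulated roughly as follows. This is a toy Python model under stated assumptions, not the actual register implementation: each register carries one value at a time, a write becomes visible exactly `HANDSHAKE_TICKS` slow-clock ticks later, and a write that arrives while a transfer is in flight is dropped. `CrossDomainRegister` and `run_interleaved` are hypothetical names.

```python
HANDSHAKE_TICKS = 3  # assumed latency, in slow-clock ticks
NUM_REGISTERS = 4


class CrossDomainRegister:
    """Toy model: one value in flight at a time, visible after a fixed delay."""

    def __init__(self):
        self.visible = None    # value the reader currently sees
        self.in_flight = None  # value still being handshaked across
        self.ready_at = 0

    def advance(self, now):
        if self.in_flight is not None and now >= self.ready_at:
            self.visible = self.in_flight
            self.in_flight = None

    def write(self, value, now):
        if self.in_flight is not None:
            return False  # previous transfer not finished: value is dropped
        self.in_flight = value
        self.ready_at = now + HANDSHAKE_TICKS
        return True


def run_interleaved(ticks):
    bank = [CrossDomainRegister() for _ in range(NUM_REGISTERS)]
    received, dropped = [], 0
    for t in range(ticks):
        for reg in bank:
            reg.advance(t)
        # Reader uses index N+1, the register written longest ago.
        value = bank[(t + 1) % NUM_REGISTERS].visible
        if value is not None:
            received.append(value)
        # Writer uses index N and moves on by one every tick.
        if not bank[t % NUM_REGISTERS].write(t, t):
            dropped += 1
    return received, dropped


received, dropped = run_interleaved(12)
print(received, dropped)
```

In this model the read index N+1 points at the register written three ticks earlier, so after the initial fill a new value arrives on every tick and nothing is dropped. Whether the real index offset is that repeatable is exactly the open question.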

The example is for a single parameter being transferred between fast and slow loop. In reality, this would all be duplicated for each unit being multiplexed within the faster loop.

Intaris · ‎04-04-2014

I still have some open questions regarding how the registers work. I'm trying to find a bomb-proof solution to my problem and knowing the parameters important to the transfer of data via register is important for that.

At the moment, the 33% throughput characteristic (one new data point every 3 read cycles) when writing and reading at the same rate from different clock domains is hurting me. Although I have a solution (interleaving - see above), I think I may be able to simplify this if I could clear up a few points regarding the register implementation. In all of the following cases, each clock I am referring to is > 40MHz.

The above-mentioned handshaking only happens between different clock domains (writing in X & reading in Y), right? Using a register within a single clock domain (writing and reading in X) will have a 100% throughput characteristic?
If I am writing in X and also reading in both X and Y, the read in X will proceed without handshaking whereas the read in Y will involve handshaking, is this correct? Or will the presence of a register read in a different clock domain force all reads to resort to handshaking?
When transferring data across clock regions, does the slower of the two "control" the handshaking protocol, or is it always the read function which does this?
If I write two different values to a register in subsequent clock cycles, will the second data point be lost due to handshaking or will the last value always eventually get propagated? The help on the new "Handshake" items in LV 2013 seems to indicate the data may be lost.

I realise this is getting into the gritty details of the implementation but I'm really trying to squeeze as much as possible out of my architecture and "small" details like this are capable of bringing the whole house of cards crashing down.

Shane.

Intaris · ‎04-04-2014

Testing would indicate that the first two points are as I assumed. Handshaking occurs ONLY between different clock domains, even if the same register is used for both the same and a different clock domain. There is lost data between the high and low speed loops even though WITH THE SAME REGISTER there is no data loss between reading and writing within the same high speed loop.

Am I right in assuming these are implemented as two independent registers behind the scenes? How can one read work without handshaking and the other require it?

Dragis · ‎04-04-2014

Sometimes you just have to get down in the dirt to get things working : )

If you really want to understand this, here is a decent overview of how synchronization is done in hardware: http://www.stanford.edu/class/ee183/handouts/synchronization_pres.pdf. You can skip down to the part about handshaking if you just want to see what hardware is involved. The implementation for LabVIEW FPGA is a slightly modified/optimized version of this.

For your questions ...

You are correct, when the write and read are in the same clock domain there is no synchronization overhead and the value is available on the next clock cycle. When multiple clocks are involved, think of there being one version of that register in each clock domain in which it is accessed. When a write occurs, there is some logic (see pdf above) that moves that value safely to the clones of the register in the other clock domains.
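The "one copy per clock domain" picture can be sketched like this. It is a toy Python model, not the real logic: `MultiDomainRegister` is a hypothetical name, both domains are assumed to tick at the same rate, and the handshake is reduced to a fixed 3-tick delay during which new values cannot start crossing.

```python
class MultiDomainRegister:
    """Toy model of one register read in its own and in a foreign clock domain."""

    def __init__(self, sync_ticks=3):
        self.sync_ticks = sync_ticks  # assumed handshake latency, in ticks
        self.local = None      # copy in the writer's clock domain
        self.remote = None     # clone in the other clock domain
        self.in_flight = None  # value currently crossing domains
        self.ready_at = 0

    def tick(self, now, value):
        """Advance one tick and write `value` from the local domain."""
        if self.in_flight is not None and now >= self.ready_at:
            self.remote = self.in_flight
            self.in_flight = None
        self.local = value  # same-domain copy updates without a handshake
        if self.in_flight is None:
            # The crossing logic is idle, so this value starts its transfer.
            self.in_flight = value
            self.ready_at = now + self.sync_ticks


reg = MultiDomainRegister()
local_seen, remote_seen = [], []
for t in range(9):
    reg.tick(t, t)
    local_seen.append(reg.local)
    remote_seen.append(reg.remote)
print(local_seen)
print(remote_seen)
```

The local copy follows every write, while the remote clone only picks up every third value, which matches the behaviour described in the test above.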

And because the transfer has to be safe, you can lose data if you push data into the write side more often than the 2-3 cycles of latency (in the slowest clock domain) it takes to get the data to the other regions. Again, all of this "can" be optimized away in some cases if the two clock domains are related nicely (derived from the same source clock, etc.), but LabVIEW FPGA does not currently do that optimization.

Also, for anyone that cares, note that the same clock domain must be the same exact clock. If you derive two 40 MHz clocks from two different base clocks, that would be a clock crossing because they may not be aligned with one another.
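As a rough illustration of how the write rate affects data loss, here is a small Python sketch. It assumes a simple busy/idle model with a fixed latency of 3 ticks (the upper end of the 2-3 cycles quoted above); `delivered_fraction` is a hypothetical helper, not anything from LabVIEW.

```python
def delivered_fraction(write_period, handshake_ticks=3, total_ticks=120):
    """Fraction of writes that get across in a toy busy/idle handshake model."""
    busy_until = 0
    written = delivered = 0
    for now in range(0, total_ticks, write_period):
        written += 1
        if now >= busy_until:
            # The crossing logic is idle: this write starts a new transfer.
            delivered += 1
            busy_until = now + handshake_ticks
        # Otherwise the write lands during a transfer and is lost.
    return delivered / written


for period in (1, 2, 3, 4):
    print(period, delivered_fraction(period))
```

Writing on every tick gets about one value in three across, and the loss disappears once the write period reaches the assumed latency.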

That is a lot of info, so please keep asking questions.

Intaris · ‎04-04-2014

OK, it's nice to see that my thoughts on the subject are starting to align with reality, something I've grown to value instead of taking for granted as in younger years. 🙂

I'm slowly making progress on my architecture and I think I'll be able to salvage my original architecture with a few tweaks.

Regarding the related clocks, it's important that the clocks are phase locked (exact same base clock) and have frequencies which are whole multiples of each other (40 & 80, 120 & 40 but not 120 & 80) so that the conditions for handshake-free transfers could theoretically be possible, right? As long as the starting points (and end points) of each clock cycle in both domains are always aligned. This is not the case with 120MHz and 80MHz, even though both may be an integer multiple of the base clock (40MHz).
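The alignment condition can be checked with a little arithmetic. The sketch below is plain Python and assumes ideal clocks derived from one source with zero phase offset, so that a clock of f MHz has rising edges at multiples of 1/f µs. The function names are made up for illustration.

```python
import math


def coincident_edge_rate_mhz(a_mhz, b_mhz):
    """Rate at which rising edges of both clocks fall on the same instant."""
    # Edges coincide at common multiples of 1/a and 1/b microseconds,
    # which are the multiples of 1/gcd(a, b) microseconds.
    return math.gcd(a_mhz, b_mhz)


def every_slow_edge_aligned(a_mhz, b_mhz):
    """True if each edge of the slower clock lines up with a fast-clock edge."""
    return coincident_edge_rate_mhz(a_mhz, b_mhz) == min(a_mhz, b_mhz)


for pair in ((40, 80), (120, 40), (120, 80)):
    print(pair, every_slow_edge_aligned(*pair))
```

For 120 & 80 the edges only coincide at a 40 MHz rate, so every second edge of the 80 MHz clock falls between edges of the 120 MHz clock.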

Dragis · ‎04-04-2014

That's right, if you were using 40 MHz and 120 MHz clocks derived from the same clock source the optimization "would" be possible. However, you would have to do it yourself using CLIP for now until LabVIEW FPGA natively supports it.

Intaris · ‎04-04-2014

@Dragis wrote:

That's right, if you were using 40 MHz and 120 MHz clocks derived from the same clock source the optimization "would" be possible. However, you would have to do it yourself using CLIP for now until LabVIEW FPGA natively supports it.

I choose to interpret the word "until" as being a promise. That's a nice way to end the week.
