Double registering SCTL inputs

Intaris · 03-19-2014

I'm trying to get a new FPGA architecture to compile at insanely high speeds (320MHz). I had it compiled previously at this speed. I have fixed a few bugs, added a low-pass filter and now want to get it compiling at as high a frequency as possible to know how far I can push the architecture before we decide on a definite implementation.

I am using an LVOOP approach with a lot of VI-defined registers within instances of modules I want to use. These registers then provide the interface between the different SCTLs. Loop 1 has configuration data which produces an output which is written to a VI-defined register. This register is then read in a separate SCTL (at a different frequency) which produces further data then written to another register Y. Again this register serves as an input for a further loop and so on. Eventually the data is written to hardware. This way I can chain together several processes running at (for the local requirements) optimal clock speeds. The registers allow inter-SCTL communication at different clock speeds.

I have been fixing timing errors one at a time as they pop up in an effort to creep gradually higher with the clock speed. I'm currently stuck at 198MHz (for code which was previously compilable at 320MHz). My timing report says that instead of the required 3.13ns for my timed loop I am getting 4.26ns, 2.62ns of which are routing delays for the SCTL itself. Obviously I'm using almost exclusively High-Throughput math with a lot of pipelining.
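As a cross-check on the figures above, the arithmetic can be sketched in a few lines. This is only an illustration in Python (not LabVIEW); the function names are made up for the example, and the only inputs are the numbers quoted from the timing report.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def max_freq_mhz(path_delay_ns: float) -> float:
    """Highest clock rate (MHz) that a path with this total delay could meet."""
    return 1000.0 / path_delay_ns


required_ns = period_ns(320.0)   # budget for a 320MHz loop
reported_ns = 4.26               # total path delay from the timing report
routing_ns = 2.62                # routing part of that delay
logic_ns = reported_ns - routing_ns
routing_share = routing_ns / reported_ns

print(f"required period : {required_ns:.3f} ns")
print(f"overshoot       : {reported_ns - required_ns:.3f} ns")
print(f"logic delay     : {logic_ns:.2f} ns")
print(f"routing share   : {routing_share:.0%}")
print(f"path limit      : {max_freq_mhz(reported_ns):.1f} MHz")
```

Taken on its own, the reported path is dominated by routing rather than logic, which is why placement (and anything that loosens it) looks like the place to search.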

Could it be that the inter-SCTL communication is my bottleneck at the moment? The loop complaining of timing errors reads 3 registers from a hardware loop running at 80MHz (14-bit Device input), 3 registers from a custom frequency source (64-bit 120MHz) and reads 40 and writes 16 registers which are all written to / read from in a debug loop running at 10MHz (with FP Controls). I would like to have this loop run at 320MHz to multiplex 8 units of code without having to duplicate resources on the FPGA target. Latency is not a huge issue so I can tweak there. Other multiples of 40MHz are also acceptable and 160MHz is currently doable (4x multiplexing).
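To make the multiplexing idea concrete, here is a minimal sketch in Python (not LabVIEW). It assumes a simple round-robin scheme in which each loop iteration serves one unit in turn; the function names are hypothetical and only the clock rates and unit counts come from the description above.

```python
def per_unit_rate_mhz(loop_mhz: float, units: int) -> float:
    """Effective update rate of each unit when one loop is shared round-robin."""
    return loop_mhz / units


def schedule(units: int, ticks: int) -> list[int]:
    """Index of the unit served on each loop iteration (assumed round-robin)."""
    return [tick % units for tick in range(ticks)]


# 8 units on a 320MHz loop and 4 units on a 160MHz loop both give 40MHz per unit.
print(per_unit_rate_mhz(320.0, 8))
print(per_unit_rate_mhz(160.0, 4))
print(schedule(8, 16))
```

Either configuration leaves each unit updating at 40MHz; the faster loop simply shares one copy of the logic between twice as many units.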

I have been reading up on the high-throughput FPGA nodes, and the documentation mentions that their inputs and outputs are double-registered when benchmarking to facilitate placement during compilation. I was wondering if there is an equivalent in LabVIEW to further decouple the different SCTLs from each other to ease placement. Can I simply use two registers in series to achieve this double-registering?

Shane

T-REX$ · 03-19-2014

Hi Shane,

Can you post the design?

There are a couple things that could be going on in the routing... Before adding extra registers to the interloop comm, what did you add before noticing the decrease in clock rate? Anything that could be using a fixed resource (DSP, BRAM, DIO Line etc.)?

I'd be pretty surprised if it was the clock-domain crossing stuff holding you back, but manually adding extra registers (feedback/feed-forward nodes) around the read/write might help.

Could you post the Xilinx Log at least?

Cheers!

TJ G

Intaris · 03-19-2014

I have a compilation going on at the moment so as soon as it's finished (assuming it fails) I'll attach the log.

Regarding the changes: Perhaps some DSPs (High-Throughput Addition and Subtraction plus several 48-bit wide and up to 8 deep discrete delays, no DIO, no BRAM). I fixed some bugs (I wasn't writing to all output registers originally which resulted in them being optimised out of the design.....) and added a low-pass filter (our own design).

We also have a lot of stuff going on in a custom CLIP which is about 1-2 hops away (logically) from the high-speed loop. I notice that whenever the high-speed loop does not compile, the CLIP clock also reports timing errors. This is why I'm thinking the two are affecting each other.

Shane.

PS After reading the help I thought that discrete delays would allow better timings than feedback nodes but other sources suggest that feedback nodes with compile initialisation and no reset signal can also be instantiated as SRLs (the compiler can optimise) whereas a discrete delay is ALWAYS an SRL.

PPS My design iterations are currently rather long. I'm not only implementing a new FPGA architecture, I'm learning about PLLs and Lock-Ins as I go. The first compilation was not code which would actually work. Once I hit the 320MHz (theoretically at least), I started simulations to check the functionality and fix bugs. That was 4 days ago. A lot has changed since then.

Intaris · 03-19-2014

While I'm waiting for the compile to finish (Insert xkcd comic here) I re-read your answer.

Adding feedback nodes INSIDE the SCTL might help? I assumed I would need to have an external intermediate SCTL to help with the routing...... Do you have any more detailed information on that subject?

Shane

PS I'm still in my learning phase with FPGA. I think I know a lot but I've only been FPGA programming for a little more than a year. Please bear with me if some of my preconceptions are completely off.

PPS Here's the latest log. My "optimisation" made things worse.

T-REX$ · 03-19-2014

Feedback nodes (or discrete delays) inside the SCTL could help if the clock boundary crossing logic was spread out. Basically, when you cross clock domains, you need a couple of registers on either side of the boundary to make sure that the data doesn't hit any metastable state caused by changing bits on different clock edges. If the "best" placement of the design needs one clock grouping near (for example) the bus comm logic, and one clock grouping near (again, for example) the DSP slices, then adding extra registers on either side of that communication could help the design meet timing, since each extra register gives the router more room to reach across the chip and absorb the necessary increase in routing delay.
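The cost of those extra registers is latency only. Here is a minimal behavioural sketch in Python (not LabVIEW) of two registers in series inside one clock domain; the class name and the sample data are made up for illustration. It models the added delay, not metastability or placement.

```python
from collections import deque


class RegisterChain:
    """Behavioural model of `depth` back-to-back registers in one clock domain."""

    def __init__(self, depth: int, initial=0):
        # Each slot is one register, holding its initial (power-up) value.
        self._regs = deque([initial] * depth, maxlen=depth)

    def clock(self, value):
        """One clock edge: shift `value` in and return the value shifted out."""
        if not self._regs:           # depth 0 behaves like a plain wire
            return value
        out = self._regs[0]          # oldest register drives the output
        self._regs.append(value)     # maxlen discards the oldest entry
        return out


samples = [1, 2, 3, 4, 5]
double_reg = RegisterChain(depth=2)
delayed = [double_reg.clock(s) for s in samples]
print(delayed)   # same data, two clock cycles later
```

The data comes out unchanged but two cycles late, so the trade is two cycles of latency for two extra places where the tools are free to break up a long route.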

That said, I don't think that's the issue here. As you suspected, the timing violations do appear to be associated with your CLIP. Based on the timing report, you seem to be failing most on the 80MHz Derived CLIP Clock. Maybe some additional registers around the IO Nodes (either in LabVIEW, or in your CLIP) would help you meet timing around there.

Do you have some "bus data" (FIFOs or CTRL/INDCTR) being routed directly into your CLIP? I ask because the bus clock is having some difficulty too. Additional registers there might help. Also, do you have Synch Registers Enabled/Disabled around your CLIP IO (right-click on them in the project)?

I also want to point out that the more fabric you use on the FPGA, the more routing delay you should expect. If you were able to compile at ~320MHz with a mostly empty design (let's say just your CLIP), I think a reasonably full design (60-70% slice utilization) would only be expected to hit somewhere in the ballpark of 180-220MHz, but that's just my gut... other guts may vary.

Cheers!

TJ G

Intaris · 03-19-2014

T-REX$ wrote:
Do you have some "bus data" (FIFOs or CTRL/INDCTR) being routed directly into your CLIP? I ask because the bus clock is having some difficulty too. Additional registers there might help. Also, do you have Synch Registers Enabled/Disabled around your CLIP IO (right-click on them in the project)?

Not quite, but my 320MHz clock is derived from a DCM-created CLIP clock. The 80MHz derived clock IS my 320MHz clock. It is an external clock source routed through the CLIP and a DCM. Could this essentially be the issue?

Synch registers are all at maximum.
