
LabVIEW FPGA: Multiple SCTL versus one SCTL (same clock domain)

Hello NI forums,

 

-------------------

Question:

-------------------

See the attached picture from a modified version of the LabVIEW DRAM FIFO example. It probably explains my question more effectively than the paragraphs below.

 

What is the difference to the LabVIEW / Xilinx compilers, if any, between placing two independent branches of code in the same SCTL and placing them in individual SCTLs (in the same clock domain)?

 

-------------------

Misc. comments:

-------------------

I have briefly experimented with this concept using the included LabVIEW DRAM FIFO example (example finder >> Hardware Input and Output >> FlexRIO >> External Memory >> Simple External Memory FIFO.lvproj).

 

I compiled the default example (the read and write interfaces are in separate 40MHz SCTLs) five separate times. Then I put the read and write interfaces in the same 40MHz SCTL and compiled another five times. The result (when both read and write interfaces were in the same SCTL) was a reduction in resource usage (according to the compilation summary).

 

However, due to my lack of knowledge I'm hesitant to conclude that placing everything in one SCTL is always the best option. For example, I do not know what is created 'behind the scenes' with each SCTL. Perhaps putting independent branches of code in separate SCTLs makes it possible to route clock, reset, implicit enable, etc. signals more effectively.

 

-------------------

Background information:

-------------------

My task involves acquiring 2 channels of analog data using the NI 5772 and PXIe-7966. Data acquisition takes place in a 200MHz SCTL, and downstream processing is performed in a 100MHz SCTL.

 

During the vast majority of the 100MHz SCTL processing stages of the FPGA VI, the 2 channels of data do not interact with each other. So it would be easy for me to place them in separate 100MHz loops if doing so would somehow help the design (timing, resource usage, etc.).

 

 

--------------

Thanks!

 

Message 1 of 12

I would also be interested in that.

At NI Week 2 years ago I was told that splitting the code into several timed loops can help if you get timing violations. Everything in one timed loop shares the same enable line for its flip-flops, which can make routing difficult and in turn lead to timing violations because of routing delays.

Can anybody share more insights?

What additional resources are used (if any) when splitting one loop into multiple (with same clock)?

 

Message 2 of 12

Based on your example, I see nothing wrong with using a single SCTL. It is probably saving a little bit of resources from clock routing. But I wouldn't think it would be enough to actually worry about unless your FPGA is getting really close to being full.


Message 3 of 12

I would organize the code in a way that makes it easiest to understand, debug, and maintain. There are some cases where breaking it up into multiple loops might fix a timing issue; there are other cases where combining them might reduce resource utilization allowing the design to fit on the chip.

 

In this particular example, either way would be sufficient so it really depends on whether the code is related or not and if it needs to be tested separately. I would think through those high-level design details first and design the application accordingly.

Message 4 of 12

Just out of interest, what is the resource usage differential between the two versions?

Message 5 of 12

There is some amount of overhead associated with each loop to deal with data flowing into and out of the loop (both via tunnels and through other resources). It's generally minimal compared to the logic within a loop, but it does exist, and depending on how the application is written, combining into a single loop may allow the compiler to share some of these boundary resources between multiple items. Again, this generally is not an issue, so don't architect around the number of loops as a primary concern.

Message 6 of 12

Yes, this is exactly why I am now looking into alternative configurations.

 

In my actual design the FPGA usage is more extensive than the example. I've copied the resource usage summary for the latest version of my FPGA VI (several compiles of the same VI). I have the most resource intensive parts of the design completed, but I still need to add some more things (hopefully they will fit).

 

If you are curious, I have 8 Xilinx FFT cores configured as 1024-point, streaming I/O, 100MHz, 12-bit input and output width, and using as much block RAM and as many DSP48s as possible. That is what is taking up most of the resources.

 

Device Utilization
---------------------------
Total Slices: 94.1% (13850 out of 14720)
Slice Registers: 66.6% (39219 out of 58880)
Slice LUTs: 61.7% (36343 out of 58880)
DSP48s: 50.0% (320 out of 640)
Block RAMs: 43.4% (106 out of 244)

 

Device Utilization
---------------------------
Total Slices: 98.3% (14466 out of 14720)
Slice Registers: 66.6% (39219 out of 58880)
Slice LUTs: 61.7% (36344 out of 58880)
DSP48s: 50.0% (320 out of 640)
Block RAMs: 43.4% (106 out of 244)

 

Device Utilization
---------------------------
Total Slices: 92.6% (13630 out of 14720)
Slice Registers: 66.6% (39219 out of 58880)
Slice LUTs: 61.7% (36342 out of 58880)
DSP48s: 50.0% (320 out of 640)
Block RAMs: 42.2% (103 out of 244)

 

Device Utilization
---------------------------
Total Slices: 94.0% (13843 out of 14720)
Slice Registers: 66.6% (39219 out of 58880)
Slice LUTs: 61.7% (36345 out of 58880)
DSP48s: 50.0% (320 out of 640)
Block RAMs: 43.4% (106 out of 244)

 

Device Utilization
---------------------------
Total Slices: 93.2% (13723 out of 14720)
Slice Registers: 66.6% (39219 out of 58880)
Slice LUTs: 61.7% (36344 out of 58880)
DSP48s: 50.0% (320 out of 640)
Block RAMs: 43.4% (106 out of 244)
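
The run-to-run variation in these five compiles of the same VI can be put into numbers with a short script. This is only a sketch: the values are copied from the summaries above, and the variable and function names are made up for illustration.

```python
from statistics import mean

DEVICE_SLICES = 14720  # total slices on the device, from the summaries above

# "Total Slices" used in each of the five compiles of the same VI
total_slices = [13850, 14466, 13630, 13843, 13723]
# "Slice LUTs" used in the same five compiles
slice_luts = [36343, 36344, 36342, 36345, 36344]


def spread(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)


mean_slices = mean(total_slices)
slice_spread = spread(total_slices)
slice_spread_pct = 100.0 * slice_spread / DEVICE_SLICES
lut_spread = spread(slice_luts)

print(f"Mean total slices: {mean_slices:.1f}")
print(f"Total slices spread: {slice_spread} ({slice_spread_pct:.1f}% of the device)")
print(f"Slice LUTs spread: {lut_spread}")
```

Slice registers are identical in all five runs and the LUT count moves by only a few, while Total Slices moves by several percent of the device. That suggests the Total Slices figure mostly reflects how the same logic happened to be packed on a given run, so a single compile is a weak basis for comparing two designs.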

 

 

Message 7 of 12

Are these all for one SCTL or all for multiple SCTLs? What is the difference in resource usage between the two approaches?

Message 8 of 12
 
 

Just out of interest, what is the resource usage differential between the two versions?

----------------------------------------------------------------------------------------------------------------------------

 

 

 

 

 

 

 

In response to the above comment,

 

This is a little embarrassing, but it seems the resource usage is more similar than I initially thought for this particular example. I think the previous compilations that I based my assumption on coincidentally used more resources in the 2-SCTL case. I just compiled each version two additional times (see below).

 

Here's the version with everything in one loop:

 

Device Utilization
---------------------------
Total Slices: 17.6% (2587 out of 14720)
Slice Registers: 9.5% (5583 out of 58880)
Slice LUTs: 8.2% (4855 out of 58880)
DSP48s: 0.0% (0 out of 640)
Block RAMs: 2.5% (6 out of 244)

 

Device Utilization
---------------------------
Total Slices: 16.9% (2493 out of 14720)
Slice Registers: 9.5% (5583 out of 58880)
Slice LUTs: 8.3% (4858 out of 58880)
DSP48s: 0.0% (0 out of 640)
Block RAMs: 2.5% (6 out of 244)

 

 

Here's the version with the read and write in separate loops:

 

Device Utilization
---------------------------
Total Slices: 16.4% (2407 out of 14720)
Slice Registers: 9.5% (5583 out of 58880)
Slice LUTs: 8.2% (4852 out of 58880)
DSP48s: 0.0% (0 out of 640)
Block RAMs: 2.5% (6 out of 244)

 

Device Utilization
---------------------------
Total Slices: 19.4% (2859 out of 14720)
Slice Registers: 9.5% (5583 out of 58880)
Slice LUTs: 8.3% (4859 out of 58880)
DSP48s: 0.0% (0 out of 640)
Block RAMs: 2.5% (6 out of 244)
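
One rough way to read these four results is to check whether the Total Slices ranges of the two configurations overlap. The sketch below uses only the numbers posted above; the names are hypothetical, and with two compiles per configuration it is a sanity check, not a statistical test.

```python
# "Total Slices" from the two compiles of each configuration posted above
one_loop = [2587, 2493]   # read and write in a single SCTL
two_loops = [2407, 2859]  # read and write in separate SCTLs


def value_range(values):
    """Return (smallest, largest) of the given values."""
    return min(values), max(values)


def ranges_overlap(a, b):
    """Return True if the min..max intervals of a and b share any value."""
    a_lo, a_hi = value_range(a)
    b_lo, b_hi = value_range(b)
    return a_lo <= b_hi and b_lo <= a_hi


one_mean = sum(one_loop) / len(one_loop)
two_mean = sum(two_loops) / len(two_loops)
overlap = ranges_overlap(one_loop, two_loops)

print(f"One loop:  mean {one_mean:.0f}, range {value_range(one_loop)}")
print(f"Two loops: mean {two_mean:.0f}, range {value_range(two_loops)}")
print(f"Ranges overlap: {overlap}")
```

Both one-loop results fall inside the range spanned by the two-loop results, so these compiles do not show a clear difference between the configurations.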

Message 9 of 12

Intaris:

 

In the version that I just posted, everything that runs at 100MHz is in the same SCTL.

 

I will try a separate 100MHz loop version after I finish some new additions to the VI. It will probably take me a few days.

 

Once I have the new version I will compile it in both configurations (one and two 100MHz SCTLs) and post the results here.

Message 10 of 12