Real-Time Measurement and Control

Using Host-to-Target FIFO to configure Block Memory for Linear Interpolation Table

I'm new to the community, so my apologies if this is a redundant post. I searched for a while and didn't see anything that was close enough to my challenge.

 

I have an FPGA system (PCIe-7852R) which does the following:

  1. Reads 8 channels of analog data at 80kHz
  2. Takes the analog input I16 values, splits off the least significant N bits (configurable in the FPGA), and uses them to compute a fractional value for interpolating between correction values that are 2^N counts apart. ***Note: the piece-wise linear correction values are stored in a pre-defined master lookup table that is segmented into sub-tables. The first part of the master lookup table is a collection of pointers for the various linear interpolations, defining the start of the individual channel correction tables and the various interpolation specifics (value of N, where the particular sub-table starts, location of zero within the table, etc.)
  3. After corrections are made, the modified data of 4 of the analog channels is fed into two different sub VIs which perform two separate 2D interpolations (same as for the analog inputs) to compute an analog output which is then written to two of the FPGA analog outputs. 2 more of the corrected analog inputs feed a 2-ch SGL precision PID controller.
  4. A Boolean control switches between the PID and Manual mode on the PID controller.
  5. Finally, all 8 analog channels are also fed into a Target-to-Host FIFO.
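The interpolation in step 2 can be sketched in Python as follows. This is only an illustrative model, not the actual FPGA code: the function name `interpolate`, the offset-binary indexing, and the assumption that the table spans the full I16 range with one extra endpoint entry are all hypothetical simplifications (the real design uses per-channel sub-tables with their own start and zero locations).

```python
def interpolate(sample, table, n_bits):
    """Piecewise-linear lookup for a signed 16-bit sample.

    Assumption: table holds 2**(16 - n_bits) + 1 correction values,
    one per segment boundary across the whole I16 range.
    """
    offset = sample + 32768                  # shift I16 range to 0..65535
    index = offset >> n_bits                 # upper bits select the segment
    frac = offset & ((1 << n_bits) - 1)      # lower N bits are the fraction
    y0 = table[index]
    y1 = table[index + 1]
    # Integer interpolation: the shift replaces a divide by 2**n_bits
    return y0 + (((y1 - y0) * frac) >> n_bits)


if __name__ == "__main__":
    demo_table = [i * 100 for i in range(17)]  # 16 segments, N = 12
    print(interpolate(2048, demo_table, 12))
```

Because the segment width is a power of two, the divide in the interpolation reduces to a right shift, which is the main reason for spacing the correction values 2^N counts apart.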

If the Master LUT is read-only (i.e. initialized when I compile the FPGA and never written to later), everything compiles and works. However, as soon as I add a Host-to-Target FIFO to asynchronously write data to the Master LUT (so I can re-load correction data and keep the system tuned without having to re-compile every time something changes), I get placer errors (Place543 to be specific). Even more infuriating, it's always an issue placing 100 FFs and 160 LUTs, as if there is some specific resource conflict which is not clearly identified. I've tried this a few ways and it never seems to work.

 

I've done all the standard optimizations (remove front panel objects, remove memory arbitration where possible, etc.) but it is still failing to compile. The hard-coded equivalent (no FIFO writing to Memory) consumes close to 80% of FFs and LUTs, so it is already pretty full. Resource estimation of the version which includes the FIFO-Memory code is always below 100% in pre-synthesis and synthesis resource utilization.

 

While I have many questions, here are three related and key questions:

  • Are there existing standard coding structure examples for FPGA systems with a Host-to-Target FIFO feeding a Block RAM memory entity which is also read by sub-vis elsewhere in the main FPGA VI? I've looked and so far have not been successful.
  • I am reading a lot of data from memory in order to accomplish the linear interpolation - Is there any documentation or examples out there which directly discuss reading sections of memory on FPGAs? The method is single address read/write, so I just used a FOR loop with a shift register initialized to the starting address and a +1 increment fed into the read method address which then outputs an I16 array of my memory data.
  • Is there a magical compiler error debugging guide from NI (or best practices to support debugging) which can help trace back placer failures to the offending source code? The naming conventions that come out of the compiler are tough to follow.

I'm working on a code snippet to show what I am trying to do and to test the baseline functionality (which will probably work), but in the meantime I am hoping to find information on the underlying technical details and best practices for this type of FIFO-Memory interaction. Any help will be appreciated.

Message 1 of 13

I don't know of any examples of what you're trying to do but I do something similar in my code (writing block ram fed by a FIFO). I send my address with the data using split/join number but it's more or less the same concept. I haven't run into the same issue that you are but the fact that you're starting at 80% utilization is a concern.

 

I have one small optimization that I don't know if you're using. Put the FIFO and the MemWrite in a single cycle loop. I use the handshaking terminal from the FIFO to decide if MemWrite executes.

 

I don't know of a magic guide. It sounds like you have a good idea of what's going wrong based on utilization. I'm hoping there's some slack out there in the PID code. I'm guessing because you have 8 channels, there likely is optimization available (serialize the data and use a for loop?)

Message 2 of 13

Thanks Nanocyte.

 

One of the latest things I did was to move my asynchronous FIFO/memory update into a SCTL. It sounds like your FIFO is maybe a U32 or U64 with ADDR:DATA packed in each FIFO data element? I could see that being effective for non-sequential write operations where the memory is written semi-randomly. Since my data is basically a giant lookup table, I would typically write the entire thing rather infrequently. I'm unsure whether it is more space efficient to use a smaller FIFO element (I16) with an incrementing loop for the FIFO->MEM write, or a larger (U32) FIFO element which includes ADDR & DATA with a case structure tied to the handshaking. I haven't done a ton of sample code & compile testing, but maybe it's time to start? 🙂
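To make the two options concrete, here is a rough Python model of both write schemes. It is a sketch only: the 16-bit address / 16-bit data split within the U32 and all function names are assumptions for illustration, not the layout either poster actually uses.

```python
def pack(addr, data):
    """Pack a 16-bit address and a signed 16-bit value into one U32 element."""
    return ((addr & 0xFFFF) << 16) | (data & 0xFFFF)


def unpack(word):
    """Split a U32 element back into (address, signed 16-bit data)."""
    addr = (word >> 16) & 0xFFFF
    data = word & 0xFFFF
    if data & 0x8000:          # restore the sign of the I16 payload
        data -= 0x10000
    return addr, data


def write_packed(memory, words):
    """Option A: each FIFO element carries its own address (random access)."""
    for word in words:
        addr, data = unpack(word)
        memory[addr] = data


def write_sequential(memory, start_addr, values):
    """Option B: I16 elements only; the address is an incrementing counter."""
    addr = start_addr
    for value in values:
        memory[addr] = value
        addr += 1
```

The packed form doubles the FIFO element width but needs no address counter, while the sequential form keeps the element narrow at the cost of a counter and a defined start address. Which one is smaller on the FPGA would still have to be checked by compiling both.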

 

I've been going back through my code and trying to commonalize as much of the multi-channel interpolation as possible. I've already found some places where I can replace multiplier/divider functions with bit shifting. I'm re-architecting now to serialize the interpolation code across the 6 channels where I think I can.
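As a small illustration of the multiplier/divider replacement, the Python sketch below shows the power-of-two case. The helper names are hypothetical. Note the assumption about rounding: an arithmetic right shift rounds negative values toward negative infinity, which may differ from what a given divider function does.

```python
def mul_pow2(x, k):
    """Multiply by 2**k using a left shift."""
    return x << k


def div_pow2(x, k):
    """Divide by 2**k using an arithmetic right shift (floors the result)."""
    return x >> k
```

This only applies when the scale factor is a power of two; other constants still need a multiplier or a sum of shifted terms.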

 

Still wish it were easier to trace from VI to VHDL, but I feel like I am limping along.

 

Thanks for your thoughts. Sometimes it's just nice to see someone else is doing something similar to feel better about the path you're on.

Message 3 of 13

Hi DanAllis,

 

In regards to your debugging inquiry, NI does have a guide for best FPGA debugging practices, and has a couple of FPGA debugging tools to help ease the headaches that FPGA compilation can undoubtedly cause.

Mike B.
Technical Support Engineer
National Instruments
Message 4 of 13

Hey DanAllis,

 

A few things that might be helpful - 

  • If you're hitting the compile error after adding a FIFO, you can shrink a bit of its FPGA footprint by shrinking the FPGA-side buffer. Right click on the FIFO in the project >> Properties >> General. Reduce the "requested number of elements". There is a bottom end to this somewhere. If you end up getting overflow because the FPGA-side buffer is small, you can try increasing the size of the host-side buffer (I think this is done via a FIFO.configure method on the host)
  • The same idea goes for BRAM: you can decrease the requested number of elements.
  • This White Paper shows a good summary of the FPGA data structures: Understanding Communication Options Between the Windows HMI, RT Processor, and FPGA: Table 5. Interp.... [Edit: in the table there are links to LabVIEW FPGA Help pages. I personally find these Help pages pretty useful. They're also in the CHMs we ship with the LabVIEW FPGA module]
  • For even more depth (you might find this interesting) I recommend the NI LabVIEW High-Performance FPGA Developer’s Guide
  • As for debugging FPGA compiles - yes, there are tools, but they're not terribly pretty unless you're well into VHDL. Xilinx makes a tool (name is escaping me now) that can help with analysis. Generally I've always been able to tweak my project faster than use these tools, but that's probably just because I know LabVIEW more than the Xilinx tool.
Andrew T.
"His job is to shed light, and not to master" - Robert Hunter
Message 5 of 13

Thanks Mike. I was looking more for a Compiler debugging tool to help bridge the gap between the NI VI code and the initial VHDL that goes to Xilinx. Ideally, there are some intermediate files I can open in the Xilinx tools and use the VHDL schematic viewer as a way to see the full extent of the compiled code.

 

In this case, I solved my problem by code generalization/consolidation and forcing those now reusable chunks to be non-reentrant sub-vis. This seems to be key for space saving on FPGA but gets very little play in the FPGA Development guides. It meant going from 7200/7200 slices on my PCIe-7852R to 6500/7200 slices.

Message 6 of 13

Thanks Andrew - I hadn't read that white paper before. It is certainly a good read to help understand comm flow between devices.

 

I was never having issues with the FIFO as much as with fitting all of my code onto my FPGA. It turned out to be a lot of redundant code that was being compiled into N blocks of the same logic. In the NI LabVIEW High-Performance FPGA Developer's Guide there are two sentences in the 94 pages that were relevant to me. In order to force the compiler to treat a block of reused sub-vi code as a single block in the FPGA, you have to make the sub-vi non-reentrant. That forces the logic to be created once, and then inputs are muxed in over the routing fabric in the FPGA. It makes a huge difference to area utilization. I'm running slow enough that I can process everything serially and save the area.

 

Of all the things which would potentially cause issues with over allocation of FPGA resources, this would seem one of the most likely candidates. Not sure why it doesn't get more play in the dev guide.

 

As for the debugging tools - I like the Xilinx ISE tools and have used them way back when. I'm going to take a look back at the intermediate files and see if I can open them standalone in ISE. The last time I tried, I was probably not trying hard enough 🙂

Message 7 of 13

Once I resolved the FPGA bloat issue, I was able to get a FIFO-to-Block Ram Asynchronous update loop to compile just fine (image attached). Still waiting to hear from NI if the coding structure is logical. My biggest concern is whether the timeout boolean is sufficient to ensure that as I read the FIFO, it will properly increment and fill the Block Memory with the address increment feedback node.
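The behavior in question can be modeled roughly in Python as below. This is a behavioral sketch, not the attached VI: `fifo_read`, `update_loop`, and the `start_addr` parameter are hypothetical names, and the model assumes the FIFO read returns a timed-out flag whenever no element is available.

```python
from collections import deque


def fifo_read(fifo):
    """Model of a non-blocking FIFO read: returns (element, timed_out)."""
    if fifo:
        return fifo.popleft(), False
    return 0, True                      # no data this iteration


def update_loop(fifo, memory, start_addr, iterations):
    """Write each valid FIFO element to memory at an incrementing address."""
    addr = start_addr                   # plays the role of the feedback node
    for _ in range(iterations):
        element, timed_out = fifo_read(fifo)
        if not timed_out:               # only write and advance on valid data
            memory[addr] = element
            addr += 1
    return addr
```

In this model the address advances only on iterations where the read did not time out, so gaps in the host's writes do not leave holes in the table. The sketch does not guard against the address running past the end of the memory, which a real design would need to handle.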

 

I'm also waiting to hear back from NI regarding the best practices for memory access. Maybe digging into the VHDL will shed some light...

 

 

Message 8 of 13

You might want to consider sending us a snippet rather than an image.

 

How big is the PID_Config array? That is potentially a big resource hog.


Yes, your use of a timeout there is fine. I'm assuming it just feeds through on the true case.

 

I'm a little curious about your LUT_Update true case, I'm hoping you just set the write address to zero. You really shouldn't need that control. Ideally you'd just rely on the fifo and the length.

 

I don't understand the wait in the loop. Why is that there? Can you maybe get this all in a single cycle?

 

Your use of implies is not how most people do it. Without seeing the true case, I can't give you equivalent logic.

 

Please label the constants going into the shift register.

Message 9 of 13

It's only 3 channels. I had considered encoding the values into a block of memory, but only if I couldn't get the design to compile with my other changes. There is an NI example that's effectively the same code that I swiped and used.

 

When LUT_UPDATE feedback = T (after a complete update occurred or was cancelled) the address counter is reset to a START_ADDR control. This allows for partial writes to the block memory. I can't do everything in a single cycle on this hardware (only the DRAM block memory can execute in a SCTL).

 

I was looking over resource utilization and was trying to limit the operators by using the Implies function. The idea is to react to the edge of the signal, so it is sample(N) = T AND sample(N-1) = F. I could have used an AND compound arithmetic function with an invert on one input, but the Implies provides the same function with the inverse output (effectively a NOR with opposite input inversion as the example above).
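The equivalence described here can be checked with a short truth-table sketch in Python. The function names are hypothetical, and the input ordering (current sample as the first Implies input, previous sample as the second) is an assumption about how the VI is wired.

```python
def implies(x, y):
    """Logical implication: false only when x is true and y is false."""
    return (not x) or y


def rising_edge(current, previous):
    """True only when the signal goes from false to true."""
    return current and not previous


def edge_via_implies(current, previous):
    """Rising edge expressed as the inverse of (current implies previous)."""
    return not implies(current, previous)
```

With this ordering, `implies(current, previous)` is false on exactly the one input combination where a rising edge occurs, so downstream logic has to treat false as "edge detected".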

 

Code Snippet is attached.

 

Message 10 of 13