
LabVIEW FPGA Idea Exchange


We're starting development on an UltraScale device, a KU40, and are missing the option to utilise the DSP48E2 primitive as we have for the DSP48 and DSP48E1.

 

Intaris_0-1670929335347.png

 

NXG seemed to have it, but as we all know, NXG is no more.

 

https://www.ni.com/docs/en-US/bundle/labview-nxg-fpga-module-cdl-api-ref/page/dsp48e2.html

 

Can we please have a DSP48E2 primitive for LabVIEW FPGA? I would really like access to the new features supported, including the wider multiplier.

Sometimes you might not care about the outputs under certain input conditions. "Not caring" can lead to significant improvements in optimization, and thus in resource utilization, but right now there's no way to tell LabVIEW that you "don't care". I propose creating new data types that can represent "don't care". It should start with booleans, but when you convert a boolean array to an integer, if one of the booleans is a "don't care", the numeric output should also become a "don't care", which is yet another data type we would need.

 

Here's what a "don't care" might look like if the user doesn't care about the output when the input is 2:

dont care.png
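To illustrate the kind of simplification a "don't care" enables, here is a minimal sketch in plain Python (standing in for the boolean logic, not LabVIEW): if the output for one input combination is marked "don't care", the optimizer is free to pick whichever value gives the simpler circuit.

```python
from itertools import product

def f_fully_specified(a, b, c):
    # Every input combination is forced to a definite output value.
    return c and not (a and b)

def f_with_dont_care_resolved(a, b, c):
    # If the output for (a, b, c) = (1, 1, 1) is a "don't care", the
    # optimizer may choose 1 there, and the function collapses to just c.
    return c

# The two functions agree on every input we actually care about,
# i.e. everything except a = b = c = True.
for a, b, c in product((False, True), repeat=3):
    if (a, b, c) != (True, True, True):
        assert f_fully_specified(a, b, c) == f_with_dont_care_resolved(a, b, c)
print("equivalent on all cared-about inputs")
```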

P2P is a very useful technology for sharing data between NI targets.

 

Could this be provided for GPUs?

I think it would be useful if LabVIEW kept track of device utilization for each compilation. The data could be presented as a graph, which might give the developer useful clues about how the project is approaching the limits of the FPGA. This data could also, optionally, be stored in the same folder as the bitfile so that the developer can review the history with their source control.

utilization 3.PNG
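Until something like that is built in, the idea can be approximated with a small post-compile step: append each build's utilization numbers to a CSV that lives next to the bitfile, so both the graph and the source-control history come for free. A rough sketch (hypothetical script; the field names and example values are placeholders, since the real numbers would come from the compilation report):

```python
import csv
import datetime
import pathlib

def log_utilization(bitfile: str, slices_pct: float, dsp_pct: float, bram_pct: float):
    """Append one row of utilization figures to a CSV stored next to the bitfile."""
    bit_path = pathlib.Path(bitfile)
    history = bit_path.with_name(bit_path.stem + "_utilization.csv")
    new_file = not history.exists()
    with history.open("a", newline="") as fh:
        writer = csv.writer(fh)
        if new_file:
            writer.writerow(["timestamp", "slices_%", "dsp48_%", "block_ram_%"])
        writer.writerow([datetime.datetime.now().isoformat(timespec="seconds"),
                         slices_pct, dsp_pct, bram_pct])

# Example call after a compile finished with 72.4% slices, 35.0% DSP48s, 58.1% BRAM.
log_utilization("MyTarget.lvbitx", 72.4, 35.0, 58.1)
```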

 

The High Performance FPGA Developer's Guide lists a parallelized bubble sort. I tried this out and found that it actually doesn't work. While this lattice successfully gets the maximum value to the top and the minimum value to the bottom, it doesn't completely sort the values in between.

Bubble Sort.JPG

 

In this example, if the highest value started at the end of the array (red) and the 2nd highest value started 2nd from the end (pink), the highest value ends up at the top, but the 2nd highest value ends up 3rd from the top.

Bubble Sort - Markup.JPG

This lattice can be completed to sort the middle sections by adding 3 more columns: one with 3 Min/Max blocks comparing the center 6 values, one with 3 Min/Max blocks comparing the center 4, and a final Min/Max operation comparing the middle 2. This completes the sort, but takes 7 sequential steps instead of the 4 listed. The following works to sort the entire array:

Bubble Sort - Corrected.JPG
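For anyone who wants to sanity-check a candidate lattice before spending a compile on it, here is a minimal Python sketch (not the exact network from the guide): it applies a list of compare-exchange stages and verifies the result against the 0-1 principle, using the known-good odd-even transposition network for 8 values as an example. Truncating that network to 4 stages shows the same kind of "almost sorted" failure described above.

```python
from itertools import product

def apply_network(stages, values):
    """Run a compare-exchange (Min/Max) network over a list of values."""
    v = list(values)
    for stage in stages:
        for i, j in stage:        # each pair is one independent Min/Max block
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
    return v

def sorts_everything(stages, n):
    # 0-1 principle: a compare-exchange network sorts all inputs
    # if and only if it sorts every vector of 0s and 1s.
    return all(apply_network(stages, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))

# Odd-even transposition network for 8 values: 8 stages of alternating
# even/odd neighbour compares, which is known to sort completely.
even = [(i, i + 1) for i in range(0, 7, 2)]   # (0,1) (2,3) (4,5) (6,7)
odd  = [(i, i + 1) for i in range(1, 7, 2)]   # (1,2) (3,4) (5,6)
transposition = [even if s % 2 == 0 else odd for s in range(8)]

print(sorts_everything(transposition, 8))       # True  -- full network sorts
print(sorts_everything(transposition[:4], 8))   # False -- too few stages
```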

 

 

In LabVIEW it is not allowed to exit an SCTL running in an external clock domain. LabVIEW claims it could lead to instability of the code due to glitches etc. on the external clock.

I propose leaving the option open to the programmer to take that risk, which is not always actually present. It can lead to more understandable code.

For example, I have code where I read data from an NI 5752 ADC module and store it in block RAM (32 ADC channels, 32 block RAMs). Reading from that ADC implies acquiring the data in the external ADC clock domain, so the writing to memory is also in that clock domain.

I needed to implement a function to reset that memory as well. That means writing to the memory, which has to be done in an SCTL in the same external clock domain.

However, this reset function (subVI) cannot be inserted into the normal "enable chain" of the main program, since the SCTL cannot be terminated and the memory-reset subVI therefore never terminates.

 

I had to resort to an ugly trick to get this done. In the main program I create a dead branch that does the reset. That subVI never stops, but after the reset has been done it sends a signal via a FIFO to the "wait reset" subVI in the main enable chain. The "wait reset" subVI runs in the default clock domain and can exit its wait loop once the reset signal has been received.

Capture.PNG

However, this trick is not easy to understand from the program. It would have been easier if the reset function (the externally clocked loop) could exit by itself and be inserted in the main enable chain. That would have been more logical.

 

 

The error cluster has a string, "source", to identify where the error occurred. In FPGA code the string is only accepted (no broken run arrow) inside an error cluster. I guess this is implemented this way to maintain code compatibility when you move code to another kind of target. The problem is that whatever you write in these strings in an FPGA environment is ignored.

Some people use the same error code for a kind of error, changing only the source to identify where it occurred. I received software, written by somebody else, that used error code 5000 for all user-defined errors, changing only the "source" string. That gave me no clue where the error was happening.

Since in an FPGA target only the "code" is useful in an error cluster, I propose two solutions:

1) A warning, at compilation time, when a string in an error cluster is not empty;

2) The FPGA compiler converts the ASCII characters in the "source" string of the error cluster into a fixed-size array of bytes [U8]. This array would be converted back into a string on a target that can handle it. This is very common when you read an error cluster indicator in an FPGA VI from an RT VI. This solution has a little overhead, but it maintains 100% compatibility.

I like the second solution a bit more. A limited number of characters should be allowed in order to save memory. One solution is a configurable option that determines the maximum number of characters allowed.
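As a rough sketch of what option 2 could look like (plain Python for illustration; the maximum length is a hypothetical, configurable setting):

```python
MAX_SOURCE_BYTES = 32   # hypothetical configurable limit, to bound FPGA memory use

def source_to_u8_array(source: str) -> list[int]:
    """FPGA side of the proposal: encode the error-cluster "source" string
    as a fixed-size [U8] array, truncated or zero-padded to MAX_SOURCE_BYTES."""
    data = source.encode("ascii", errors="replace")[:MAX_SOURCE_BYTES]
    return list(data) + [0] * (MAX_SOURCE_BYTES - len(data))

def u8_array_to_source(raw: list[int]) -> str:
    """Host/RT side: rebuild the string, dropping the zero padding."""
    return bytes(b for b in raw if b != 0).decode("ascii")

packed = source_to_u8_array("AI read timeout")
print(packed[:8])                    # [65, 73, 32, 114, 101, 97, 100, 32]
print(u8_array_to_source(packed))    # "AI read timeout"
```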

I love using enums because they can often make discrete options much clearer. Example:

enum clarity.PNG

But, at least as of 2017, the code below will use fewer resources and propagate faster, because LabVIEW uses an 8-bit register for the enum above instead of the two bits below.

 

My solution is to allow smaller integer representations of enums (I'd use a U2 in the above case).

 

Ideally the user wouldn't even have to specify the integer size; it could just be calculated as the minimum at edit time and shown to the user.
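The edit-time calculation itself is trivial: the minimum register width is just the bit length of (number of items − 1). A quick sketch:

```python
def min_enum_bits(n_values: int) -> int:
    """Smallest register width that can encode n_values enum items."""
    return max(1, (n_values - 1).bit_length())

for n in (2, 3, 4, 5, 9):
    print(n, "items ->", min_enum_bits(n), "bits")
# 3 or 4 items fit in a U2, 5-8 items in 3 bits, and so on,
# instead of the 8-bit register LabVIEW uses today.
```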

If I am choosing to offload multiple FPGA compilations to either a local or a cloud compile farm, can we not at least do the intermediate file generation in parallel? Our current design takes approximately 10-15 minutes to generate intermediate files. For 5 cloud compiles, this blocks my IDE for around an hour.

 

Since the file creation processes are independent of each other, why can't we do them in parallel?

In this thread, I learned that you can't change the sbRIO analog IO to Raw. I would like that functionality to help reduce FPGA resource usage.

raw sbrio2.png

Currently, when you wire a fixed-point number into a case structure, it is coerced to the next largest integer and you get a red coercion dot:

Allow fxd point integers in case structures.PNG

This is unfortunate because you then have to have a default case. It would be nice if case structures could take the fixed-point type, since there isn't any of the ambiguity that exists with floating point. Using a smaller number for the selector might also provide an optimization.

 

 

The FIFO read looks like an event-based node (like a dequeue or a wait on occurrence), and I think a lot of people assume it will use minimal CPU resources while it is waiting for data. I'm wondering if we can have an option that behaves like that. For example, could we have a fixed-size FIFO read where the FPGA triggers an interrupt to let the RT side know the data is ready?
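Something close to this can already be approximated with the FPGA IRQ mechanism: have the FPGA assert an interrupt once a block of data is in the FIFO, and let the host block on that interrupt instead of polling the FIFO read. A rough host-side sketch using the nifpga Python API (the bitfile, resource, FIFO and IRQ names are placeholders, and the method names are quoted from memory, so treat them as an assumption to check against your installed version):

```python
from nifpga import Session

FIFO_NAME = "DMA_FIFO"      # hypothetical names defined in the FPGA VI
DATA_READY_IRQ = 0
BLOCK_SIZE = 1024

with Session(bitfile="MyFpga.lvbitx", resource="RIO0") as session:
    for _ in range(10):     # read ten blocks, then stop
        # Sleep until the FPGA raises the "data ready" interrupt (or until
        # the timeout) instead of spinning on the FIFO read.
        status = session.wait_on_irqs(DATA_READY_IRQ, timeout_ms=5000)
        if status.timed_out:
            continue
        session.acknowledge_irqs(status.irqs_asserted)
        block = session.fifos[FIFO_NAME].read(BLOCK_SIZE, timeout_ms=0)
        print(len(block.data), "elements,", block.elements_remaining, "remaining")
```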

I've searched but can't see anything similar - please add a method for setting the timeout for FPGA nodes. This includes the 'Open FPGA reference' and FPGA IO nodes.

 

If you disconnect a cRIO FPGA (e.g. NI 9148) from the network, it takes 20-30 s for the IO node or Open FPGA reference to execute. This is really bad for the user experience: if the user tries to exit the application during this time, it may take half a minute for the application to exit. It also means that in most use cases you may have to wait that long to realise your FPGA has disconnected (you can obviously have an external watchdog loop to check that the node is executing in a timely manner).

 

Please allow me to configure the timeouts for these nodes similar to the TCP/UDP or VISA nodes. They are very similar in how they operate to the FPGA nodes (i.e. a hardware device driver which is susceptible to disconnects!) so I don't understand why these have been omitted.

 

I wouldn't mind having to set the timeout as part of opening the FPGA reference and then internally have it use the same timeout for other IO nodes as follows:

2014-09-29_17-16-29.jpg

 

 

When transferring FPGA projects from one target machine to another, I would like not to be forced to recompile the VI code when testing the FPGA code and passing parameters via the front panel on the new target. Workarounds like creating a small host front panel to allow parameter changes rather defeat the object of FPGA front-panel execution.

Well it seems that persistent/non-volatile memory is not available on FPGA targets for scenarios involving cycling power.

 

Apparently, the recommended approach is to transfer the data to the host and store it to disk.

 

This is a bit problematic in that the choice is either to write data to disk at a high rate or to accept that the most recent data might not be reloaded on restart. For instance, an operator might expect to know exactly how many revolutions a shaft has undergone across power cycles of the FPGA target. A guarantee that this information is as up to date as possible probably can't be met (maybe not even when transferring data to disk at a high rate).

 

So I'd like to request persistent/non-volatile memory support on FPGA targets.

A very useful feature of the FPGA Butterworth filter is the ability to use it multiple times, saving FPGA resources.

 

However, this is not possible for 32-bit-wide filters, only for 16-bit filters.

 

It would be useful if the 32-bit filters could go multichannel too, at least two channels.

 

 

On the cRIO-9068, the third serial port and the second Ethernet adapter are actually implemented on the FPGA, and FPGA resources are consumed to redirect them to the real-time side. Currently developers have no access to these resources on the FPGA, only from the real-time side.

 

I would like some I/O Nodes for interacting with these devices on the FPGA. NI could put up some examples of how they could be used.

 

Today these resources are invisible to the developer, except for the additional long compile time and the resources used (about 7%).

 

I have attached pictures of the FPGA design and of the resources consumed by a blank VI.

 

 

Sincerely,

Jens Eriksen

 

 

As part of my quest to solve problems arising from over-cautious register transfers HERE, I found a solution which WOULD have worked if I had been able to force multiple clocks derived from the same source to have synchronised start points (so that the iteration counters of the loops are known relative to each other). It seems that clocks derived from the same base clock do not necessarily all start with iteration zero at the same time.

 

My suggestion would be to either:

  • Give some option to force such loops to have synchronous starts (also when using external clocks), or
  • Allow loops with external clocks to terminate so that we can put together our own synchronisation method.

Shane

HERE I detailed a problem I currently have with Registers between two clock domains which are closely related (phase-locked).

 

It turns out that there is handshaking going on which, essentially, is not really necessary. It would be nice to have the option of something similar to a Register for clock domains where we explicitly know the relationship between the clocks and thus don't require handshaking.

 

Shane.

The CORDIC High throughput functions available in LabVIEW are capable of running at high frequencies, thus allowing FPGA code to (for example) multiplex multiple demodulators without exploding device utilisation.

 

Unfortunately, the option to apply a Gain correction to the results does not pipeline the actual multiplication, thus artificially limiting the available speed of the CORDIC algorithms.

 

In my code I always deactivate the gain compensation and do it "manually", allowing the code to compile at much higher frequencies and utilise the FPGA device more efficiently.

 

It would be great if it were possible to also pipeline this multiplication as part of the CORDIC High-throughput node instead of being forced to implement the multiplication separately.
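For reference, the manual compensation amounts to a single extra fixed-point multiply by the reciprocal of the CORDIC gain (the gain tends to about 1.6468 for a large iteration count, so the coefficient is about 0.6073), which can then be registered as its own pipeline stage. A bit-exact sketch of the arithmetic in Python (the 16-bit coefficient width here is an arbitrary choice):

```python
# Reciprocal of the CORDIC gain (1/1.646760...) as a U0.16 fixed-point coefficient.
CORDIC_GAIN_INV = round(0.6072529350088813 * 2**16)   # = 39797

def compensate(raw: int) -> int:
    """One multiply plus a shift; in hardware this sits behind its own register,
    so it pipelines independently of the CORDIC iterations."""
    return (raw * CORDIC_GAIN_INV) >> 16

print(compensate(32768))   # 19898, i.e. about 32768 * 0.6073
```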