Large number of samples through FIFO - avoid overflow on target buffer

ManuelQ · ‎06-12-2013

My hardware is a PXI system, with a 7952R FlexRio FPGA module and a NI5761 14 bit 250 MS/s digitizer.

Most of the posts I have seen deal with acquisition rates of a few MS/s or less and record sizes of a few thousand samples. Also, my analog read as well as the FIFO write is in a SCTL, not a while loop.

I want to acquire, say, 100000 samples per record (or more, up to millions of samples) at a rate of 250 MS/s.

Here is the basic problem as far as I have been able to condense it (I believe this is it). I am using the SingleSample Acquisition example as a test project.

The DMA FIFO comprises the target buffer and the host buffer. In practice, the maximum number of elements the target buffer can have is 32767.

Now, as soon as I request more than 32767 samples on the host, the target buffer overflows. The host FIFO read node waits until the timeout is eventually reached, and the number of elements remaining in the FIFO (I assume in the host buffer) gets tremendously large. It scales with the time the read node waits: the longer the timeout I specify, the larger the number gets.

This happens even though the FIFO depth is large enough (the host buffer is 5x the number of samples per record).

At first this suggested that the DMA transfer rate is too slow. However, it also happens if I acquire at only 125 MS/s (taking every 2nd sample), which means 14 bit / 8 x 125 MS/s, so roughly 250 Mbytes/s. This is well below the transfer rate as far as I know, so it should not be the reason. Or am I overlooking something?
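As a quick check of the arithmetic above, here is a small Python sketch. It assumes decimal megabytes (1 MB = 1e6 bytes) and compares two cases: samples packed tightly at 14 bits, and each sample padded to 16 bits. Which packing the example project actually uses is an assumption, and the helper name is hypothetical.

```python
def stream_rate_mb_per_s(sample_rate_hz: float, bits_per_sample: int) -> float:
    """Return the sustained data rate in MB/s (1 MB = 1e6 bytes)."""
    return sample_rate_hz * bits_per_sample / 8 / 1e6


# 125 MS/s with samples packed at exactly 14 bits each.
packed_14 = stream_rate_mb_per_s(125e6, 14)

# 125 MS/s with each 14-bit sample padded to 16 bits (assumed packing).
padded_16 = stream_rate_mb_per_s(125e6, 16)

print(f"14-bit packed: {packed_14:.2f} MB/s")
print(f"16-bit padded: {padded_16:.2f} MB/s")
```

The two results (218.75 MB/s and 250 MB/s) bracket the rough figure quoted above, so the estimate holds either way. Whether the bus can sustain that rate is a separate question.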

The only solution I see is to limit the record size to max. 32767 elements at a time.

Does anyone have experience reading large numbers of samples on an FPGA-digitizer configuration using a FIFO?

Simply run the single sample CLIP example VI and try to acquire 1000 samples; it will work. Try to acquire, say, 10000 samples and it will time out as described above.

Thank you!

Dave.T · ‎06-12-2013

So, when you're asking for 32767 samples on the FPGA, the current project is implementing the memory in the U32 data type. That equates to 65534 samples, or 1048 kbit. This is going to be the maximum allowed size you can create for the 7952R.
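The numbers in this reply can be reproduced with a few lines of Python. The sketch assumes that each U32 element carries two samples (which is what the 65534 figure implies) and that "kbit" means 1000 bits.

```python
ELEMENTS = 32767          # maximum target buffer depth quoted in the thread
BITS_PER_ELEMENT = 32     # U32 data type
SAMPLES_PER_ELEMENT = 2   # assumption: two samples packed into each U32

total_samples = ELEMENTS * SAMPLES_PER_ELEMENT
total_bits = ELEMENTS * BITS_PER_ELEMENT

print(total_samples)        # 65534
print(total_bits / 1000)    # 1048.544 (kbit)
```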

Your 7952R has 128 MB of DRAM; you can store the data there if you're running out of room.

It seems odd to me that it would be timing out with only 10k elements though. I think it should be able to handle that. After the timeout occurs, can you try reading 0 elements on the FIFO and reading the "Elements Remaining"? I'd be interested to see how much data is in the FIFO when it times out.

You might have to clear the error in order for the subsequent FIFO reads to execute successfully.

Something like this should work.

National Instruments
FlexRIO & R-Series Product Support Engineer

ManuelQ · ‎06-13-2013

Dear Dave,

Thank you for the quick reply. I did a very quick test using the Single Sample CLIP project.

The Single Sample CLIP project has a target-scoped FIFO of 261 elements and a target-to-host FIFO of 2047 elements.

Up to 5510 requested elements it works fine, without overflowing the target buffer of the FIFO. Above that it gets stuck.

For 10000 requested elements, 3344 were remaining. For 6000 I believe it was about 2881. This was independent of the timeout (10 or 20 seconds), and was read after I cleared the error as you suggested.

Given what you said about the 1048 kbit maximum, it seems like I cannot stream 100 ksamples or even Msamples, but need to use DRAM as you proposed.

I might try the DRAM today. I did a quick first trial and will test it later (see screenshots). Maybe you could have a look at it and tell me if this should work in principle. The first image shows the samples being read in a SCTL (compiled for 125 MHz) and put in the memory. There are some counters, because I want the samples to be pushed every N ticks into 1 of 2 memories (ON and OFF; true means ON, false means OFF). The second image shows the data being transferred from the memory to the FIFO OUTSIDE of the SCTL. Not sure...

Thanks so much!

ManuelQ · ‎06-13-2013

One more thing: I will first try block memory instead of DRAM, since I am using a SCTL.

Dave.T · ‎06-13-2013

I think the real issue here is that you're on PXI, which has a max theoretical throughput of 100 MB/s. If you were trying to continuously stream data back to the host on one channel of the 5761, you're already at 500 MB/s. The best solution would be to upgrade to PXIe, which will get you closer to 750 MB/s.

Otherwise, you'll have to get clever with triggering to maximize your throughput without overflow. Do you have an idea of how long this acquisition is going to run and the aggregate MB/s throughput you expect, considering the 500 MB/s per channel, and downtime between triggers?
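The bottleneck argument can be illustrated with a simple fill/drain model. This is a minimal sketch, not a description of how the DMA engine actually behaves: it treats the target buffer as a tank that the FPGA fills at a constant rate while the bus drains it at a lower constant rate, and it ignores burst transfers, arbitration, and the host buffer. The function name and the example rates are hypothetical.

```python
def max_record_before_overflow(depth: int,
                               write_rate: float,
                               drain_rate: float) -> float:
    """Elements that can be written before a FIFO of `depth` overflows.

    write_rate and drain_rate are in elements per second. If the drain
    keeps up with the writer, the buffer never overflows.
    """
    if drain_rate >= write_rate:
        return float("inf")
    # The buffer fills at the difference between the two rates.
    time_to_fill = depth / (write_rate - drain_rate)
    return write_rate * time_to_fill


# Illustrative numbers only: a 2047-element target buffer, the FPGA writing
# 125e6 elements/s, and the bus draining 25e6 elements/s
# (100 MB/s with 4-byte elements).
limit = max_record_before_overflow(2047, 125e6, 25e6)
print(f"overflow after roughly {limit:.0f} elements")
```

In this model the usable record size is only modestly larger than the target buffer depth, no matter how large the host buffer is, which matches the kind of behavior reported earlier in the thread.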

Regarding the DRAM, I would be careful putting that inside of a case structure. Weird things can happen on FPGA when you put them inside a case structure (inside a SCTL). More on that here:

http://zone.ni.com/reference/en-XX/help/371599H-01/lvfpgaconcepts/fpga_sctl_and_synchro/

I'm pretty sure implementing it as a LabVIEW memory (which you have) should not cause any weird behavior, but it's always good practice to use case structures on LVFPGA as a MUX, instead of a state. Approaching it like this can make compilation a lot easier, and avoid headaches when mysterious things happen.

ManuelQ · ‎06-13-2013

Hi Dave,

Ideally, we would like to have a total integration time of, say, 1 second at 8 ns per sample (125 MS/s), so 125 MS/s x 1 s = 125 MS. At 14 bits per sample that makes 14 x 125 MS = 1.75 Gbit, and 1.75 Gbit / 8 is about 219 MByte, if I'm not wrong. Even better would be 100 times that volume 🙂
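The volume estimate above can be checked with a short Python sketch. It assumes tightly packed 14-bit samples and decimal units (1 MB = 1e6 bytes); the function name is hypothetical.

```python
def record_volume_mb(sample_rate_hz: float,
                     duration_s: float,
                     bits_per_sample: int) -> float:
    """Total data volume in MB (1 MB = 1e6 bytes) for one acquisition."""
    total_bits = sample_rate_hz * duration_s * bits_per_sample
    return total_bits / 8 / 1e6


one_second = record_volume_mb(125e6, 1.0, 14)      # 1 s of integration
hundred_fold = record_volume_mb(125e6, 100.0, 14)  # 100 times that volume

print(f"{one_second:.2f} MB for 1 s, {hundred_fold:.0f} MB for 100 s")
```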

However, all we do is average sequences of 1000 elements, which could also be done on the FPGA, as we are already doing for another application.

I managed to implement the block memory. See attached images. It works like a charm. But it is limited in space (so far I use 10000 elements).

I used the same structure but with DRAM. As a result only 0 was streamed to the host. This is probably the weird behavior you were talking about. Instead of struggling with the DRAM, which won't be enough memory anyway, I will probably go towards using block memory to save the averaged sequence (1000 elements) and stream it to the host after averaging.

Thanks again for your help!

ManuelQ · ‎06-13-2013

One thing that was/is hard to accept is that even though the effective depth of the FIFO may be 2 million (due to the host buffer), it still overflows when the number of samples to be transferred is considerably larger than the target buffer of the FIFO. I guess this is due to the limited transfer rate to the host...

ManuelQ · ‎06-14-2013

Dear Dave,

Based on the block memory implementation I tried to do an averaging. I would like to read a number of m elements, store them in a 1D array and then repeat the same thing, adding another m elements to the array, i.e. stacking.

The problem is the fixed size issue. The array has to be fixed size. I followed all 4 troubleshooting hints. It still shows me the error "Array must be fixed size in current target", even though I set it to be 32000 elements wide.

Would you happen to have a hint?

Thank you!

Dave.T · ‎06-14-2013

@ManuelQ wrote:

One thing that was/is hard to accept is that even though the effective depth of the FIFO may be 2 million (due to the host buffer), it still overflows when the number of samples to be transferred is considerably larger than the target buffer of the FIFO. I guess this is due to the limited transfer rate to the host...

You're saying it overflows on the FPGA? I suspect that it all has to do with the bottleneck of the PXI transfer.

@ManuelQ wrote:

Dear Dave,

Based on the block memory implementation I tried to do an averaging. I would like to read a number of m elements, store them in a 1D array and then repeat the same thing, adding another m elements to the array, i.e. stacking.

The problem is the fixed size issue. The array has to be fixed size. I followed all 4 troubleshooting hints. It still shows me the error "Array must be fixed size in current target", even though I set it to be 32000 elements wide.

Would you happen to have a hint?

Thank you!

Can you post up the code so I can get a better idea of what you're doing? It sounds like the signal is periodic of say time t, and you want to collect data from time t, t+1, t+2, then add t+t, t+t+1, t+t+2 to the original. Once you have, say, 100 samples, divide that array by 100 and pass that data to the host.

Is this close to what you're trying to do? For troubleshooting purposes, it might be easier just to decimate the acquisition at first to see how much data you can get through the PXI bus.
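The stacking scheme described here can be sketched in Python to make the data flow explicit. This is only a host-side illustration of the idea, not LabVIEW FPGA code: the accumulator has a length fixed up front (mirroring the fixed-size requirement on the FPGA), each new record of m elements is added element-wise, and the sum is divided once at the end. The function name and the sample values are hypothetical.

```python
def stack_and_average(samples, record_len, n_records):
    """Average n_records consecutive records of record_len samples each.

    Element i of the result is the mean of sample i across all records.
    """
    if len(samples) < record_len * n_records:
        raise ValueError("not enough samples for the requested records")

    acc = [0] * record_len  # fixed-size accumulator, one slot per position
    for r in range(n_records):
        base = r * record_len
        for i in range(record_len):
            acc[i] += samples[base + i]

    # Divide once after stacking, then hand the result to the host.
    return [a / n_records for a in acc]


# Two records of four samples each.
stream = [1, 2, 3, 4,
          3, 4, 5, 6]
averaged = stack_and_average(stream, record_len=4, n_records=2)
print(averaged)  # [2.0, 3.0, 4.0, 5.0]
```

Only `record_len` values leave the accumulator per averaging cycle, so the data rate toward the host drops by a factor of `n_records` compared with streaming every raw sample.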

Dave.T · ‎06-14-2013

If you're just trying to average sequential samples, then you could use the Mean, Variance, and Standard deviation VI in the FPGA math palette.