FIFO issue in continuous streaming data from host to target FPGA

nathand · ‎10-28-2015

It would be a good idea to put all the data in the DMA FIFO first, then set Start to True. Without that, you're basically guaranteed a timeout, because there's no data yet in the FIFO when you start looking for that data on the FPGA.

Why do you stop the Write FIFO? If not all the data has been transferred from the host memory to the FPGA when you stop the FIFO transfer, the remaining elements may not be transferred.

It is possible that you're trying to move data faster than your hardware can handle, but try those changes first. If that doesn't fix it, another option might be to create an FPGA memory large enough to hold all the data, load the data into that memory from the FIFO, and once all the data is loaded, then read from the memory block to set the outputs.

nathand · ‎10-28-2015

@tintin_99 wrote:

It doesn't work even with a 10 MHz FPGA clock. Again, as soon as the toggling starts, and after a few full cycle_clk periods, it goes into a weird state and everything stops. I think it is related to the transfer between host and target.

This is almost definitely because you stop the FIFO before all the data has been read from it. As I explained in an earlier message, a DMA FIFO has two buffers: a large one on the host, and a small one on the FPGA. Whenever the FPGA buffer becomes nearly empty, more data is transferred automatically from the host buffer to fill the FPGA buffer. When you stop the DMA FIFO, that transfer process stops, even if there is still data remaining in the host buffer. There is no reason to stop the DMA FIFO.
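To make the two-buffer behaviour concrete, here is a toy Python model of it. This is only a sketch of the description above, not NI's implementation: the function name, the buffer depth and the one-read-per-cycle assumption are all made up for illustration. In particular, the real FIFO Stop may also discard what is already buffered, which this model does not attempt to reproduce.

```python
from collections import deque

def run_dma(host_data, fpga_depth, stop_after_cycles=None):
    """Toy model of a host-to-target DMA FIFO.

    Returns (elements the FPGA consumed, elements left stranded on the host).
    All names and sizes here are hypothetical.
    """
    host = deque(host_data)   # large host-side buffer
    fpga = deque()            # small FPGA-side buffer
    consumed = []
    cycle = 0
    running = True
    while True:
        if stop_after_cycles is not None and cycle >= stop_after_cycles:
            running = False   # host stopped the FIFO: refills end here
        # The DMA engine tops up the FPGA buffer only while the channel runs.
        while running and host and len(fpga) < fpga_depth:
            fpga.append(host.popleft())
        if not fpga:
            break             # every FPGA read would time out from now on
        consumed.append(fpga.popleft())  # FPGA reads one element per cycle
        cycle += 1
    return consumed, list(host)

all_read, none_left = run_dma(range(20), fpga_depth=4)
some_read, stranded = run_dma(range(20), fpga_depth=4, stop_after_cycles=5)
print(len(all_read), len(none_left))   # full transfer
print(len(some_read), len(stranded))   # stopped early: data left on the host
```

In the stopped run the FPGA only drains what was already in its small buffer, then starves, which matches the "a few more cycles and then it stops" symptom.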

tintin_99 · ‎10-28-2015

nathand,

Even if I remove the Stop FIFO, it just outputs a few more clock cycles and then it stops.

Do you mean I should use a flat sequence and first pass all the data from the FIFO to the memory? It would be great if you could send me simple code showing how to do this.

Using memory is fine, but this pattern is the smallest I have (it only has 6003 steps). The other pattern has 3M steps, which needs a memory size of 30M for 30M FPGA ticks. I am sure I will face utilization issues.

How can I calculate how fast the FIFO can transfer data from host to target? Is there a formula for that?

nathand · ‎10-28-2015

tintin_99 wrote:

Even if I remove the Stop FIFO, it just outputs a few more clock cycles and then it stops.

What if you don't close the FPGA reference until it finishes outputting the pattern? It's possible that closing the FPGA reference is also stopping the DMA channel.

tintin_99 wrote:

Do you mean I should use a flat sequence and first pass all the data from the FIFO to the memory? It would be great if you could send me simple code showing how to do this.

I just meant change the order of your code so that the FIFO Write is before setting the Start control to True.

tintin_99 wrote:

How can I calculate how fast the FIFO can transfer data from host to target? Is there a formula for that?

It depends on your hardware. You can do a rough calculation with a simple VI: download an array of a known size from the host to the target, and use the FPGA to measure the number of ticks between reading the first element and the last element. I've done such benchmarking before, but at a previous job, so I don't have any code to share.
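Once you have the tick count from such a benchmark, turning it into a rate is simple arithmetic. A minimal Python sketch follows; the function name and the example numbers are invented for illustration and are not measurements from any real target.

```python
def fifo_throughput(num_elements, ticks_first_to_last, fpga_clock_hz,
                    bytes_per_element=4):
    """Convert a tick count measured on the FPGA into transfer rates.

    ticks_first_to_last: FPGA ticks between reading the first and the
    last element, so it spans (num_elements - 1) element intervals.
    Returns (elements per second, bytes per second).
    """
    elapsed_s = ticks_first_to_last / fpga_clock_hz
    elements_per_s = (num_elements - 1) / elapsed_s
    return elements_per_s, elements_per_s * bytes_per_element

# Made-up example: 1,000,001 U32 elements seen over 40,000,000 ticks
# of a 40 MHz FPGA clock (that is, 1 second).
rate_elems, rate_bytes = fifo_throughput(1_000_001, 40_000_000, 40e6)
print(rate_elems, rate_bytes)
```

Repeating the measurement with a few array sizes helps separate the sustained rate from the initial burst that comes out of the FPGA-side buffer.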

Here are two threads with some speed estimates (one mine, one from NI); you can search for more on the forum:

http://forums.ni.com/t5/Digital-I-O/PCI-7813R-R-series-DMA-transfer-speed/td-p/1596214

http://forums.ni.com/t5/LabVIEW/FPGA-target-to-host-DMA-transfer-speed/td-p/2548249

nathand · ‎10-28-2015

tintin_99 wrote:

Even if I remove the Stop FIFO, it just outputs a few more clock cycles and then it stops.

I (or even better, you!) should have checked the help: "The Close FPGA VI Reference function also stops all DMA FIFOs on the FPGA." So, it's not at all surprising that removing the FIFO Stop gets you a few more clock cycles, then you close the FPGA reference and kill the DMA transfer.

Put something in your code so that you know when it finishes outputting the pattern - for example, you might wait on an interrupt in the host, and set the interrupt after the pattern finishes. I bet that will get it working the way you want.

tintin_99 · ‎10-29-2015

nathand,

You are absolutely right. I removed the Close FPGA VI Reference and it can now finish the transactions.

However, now I see another issue.

After adding a check to read from the FIFO only when timeout is false, the test time has increased significantly, because at many steps the FPGA, running at 100 MHz, has to wait until data is available.

So running my pattern now takes 3.12 ms, while with the previous code running at 60 MHz the test time was 996 µs (the target is 603 µs).

You can see in the attached images that the FPGA waits because of FIFO timeouts, and this slows down the test.

It seems the timeout happens a lot and the host FIFO can't keep up with the FPGA speed. Do you have any advice?

nathand · ‎10-29-2015

To make sure I understand what your charts show: the "60Mhz without check" is similar to the original code from the beginning of this thread, where you are doing some of the pattern generation on the FPGA?

I assume that your check for a DMA FIFO Timeout is simply a case structure around the output logic? Or are you doing something more complicated?

From one of the VIs you shared earlier, it appeared you are using fewer than 16 unique outputs per FIFO element, but transferring a 32-bit element. If you stuff two 16-bit values into one 32-bit element, you can read from the FIFO every other cycle, which will effectively double your transfer speed. Likewise, if you know you don't need to update every output on every cycle, you might be able to select some outputs to update on even cycles, and some to update on odd cycles.

Of course you can also attempt to shift some of the generation work back to the FPGA, if you can make it fit in your timing constraints. If you can't fit all the logic in a 100 MHz loop, but you can generate several output cycles in parallel in a slower loop, then you might be able to make that work with one or more local FIFOs to transfer the data. You never shared enough of your original code in a version of LabVIEW in which I could open it, so I can't suggest how you could speed it up, but I'll take a look if you save your VIs back to an earlier version.
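To illustrate the two-values-per-element idea, here is a small Python sketch of the host-side packing and the matching split. The function names, the choice of putting the first value in the low 16 bits, and the zero padding for odd-length patterns are all assumptions for the example; the FPGA side would do the equivalent split with its own number-splitting logic.

```python
def pack_u16_pairs(values):
    """Pack consecutive 16-bit values into 32-bit FIFO elements.

    Assumption: the first value of each pair goes in the low 16 bits.
    An odd-length pattern is padded with one zero step.
    """
    values = list(values)
    if len(values) % 2:
        values.append(0)
    return [
        (values[i] & 0xFFFF) | ((values[i + 1] & 0xFFFF) << 16)
        for i in range(0, len(values), 2)
    ]

def unpack_u16_pair(element):
    """Split one 32-bit element back into (first, second) 16-bit values."""
    return element & 0xFFFF, (element >> 16) & 0xFFFF

packed = pack_u16_pairs([0x1234, 0xABCD, 0x0001])
print([hex(e) for e in packed])
print([hex(v) for v in unpack_u16_pair(packed[0])])
```

With this layout the FIFO carries half as many elements for the same pattern, so each element covers two output cycles.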

tintin_99 · ‎10-30-2015

Yes. The 60 MHz run uses the original code, where all the pattern generation was done on the FPGA.

In the original code I read from the FIFO at cycle_clk speed (for example, when the divider value is 10 and the FPGA clock is 60 MHz, cycle_clk is 6 MHz).

Above 6 MHz I ran into FIFO issues.

In the new code, I read from the FIFO on every FPGA clock tick, and I definitely get a lot of FIFO timeouts and have to wait for the host much longer.

Let me repeat what you said, to make sure I understood correctly. You think that by concatenating two 16-bit numbers into one 32-bit element, reading from the FIFO every 2 FPGA clock ticks instead of every tick, and using the first 16-bit chunk on odd iterations and the other on even iterations, I can double the speed?

I think it is possible.

However, if I add more logic to the FPGA, I am sure I won't be able to compile the code at 100 MHz, and I will have timing violations again.

Also, even if I double the speed, the time will only be reduced to about 1.6 ms, which is still higher than the 1 ms of the old code.
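As a rough check of the numbers quoted in this thread, the Python sketch below works out what a 2x speedup gives and what factor would be needed to hit the target. It assumes, as a simplification, that the whole 3.12 ms is limited by the transfer and scales linearly with the speedup; the variable and function names are just for illustration.

```python
measured_ms = 3.12   # 100 MHz loop with the timeout check (from this thread)
old_code_ms = 0.996  # original code at 60 MHz (from this thread)
target_ms = 0.603    # target duration (from this thread)

def after_speedup(duration_ms, factor):
    """Duration if the run is purely transfer-limited and scales linearly."""
    return duration_ms / factor

doubled_ms = after_speedup(measured_ms, 2)   # two values per element
needed_factor = measured_ms / target_ms      # speedup needed for the target

print(f"2x speedup: {doubled_ms:.2f} ms (old code: {old_code_ms} ms)")
print(f"speedup needed to reach the target: {needed_factor:.2f}x")
```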

Let me try it. I will get back to you soon

nathand · ‎10-30-2015

tintin_99 wrote:

Let me repeat what you said, to make sure I understood correctly. You think that by concatenating two 16-bit numbers into one 32-bit element, reading from the FIFO every 2 FPGA clock ticks instead of every tick, and using the first 16-bit chunk on odd iterations and the other on even iterations, I can double the speed?

Yes, that's correct. In my work (a few years back) I was trying to transfer data from a host to an FPGA and found that the transfer rate was in elements, rather than bytes (that is, it took the same amount of time to transfer 100 U8 elements as to transfer 100 U32 elements) so even though I only needed to read 1 byte per cycle, I used a 32-bit (4-byte) FIFO and read from it once every 4 cycles to maximize transfer speed.
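The read-once-every-4-cycles scheme can be modelled in a few lines of Python. This is a sketch of the idea only: the generator name is made up, the least-significant-byte-first order is an assumption, and timeouts are ignored (a real loop would hold its phase until a read succeeds).

```python
def bytes_per_cycle(elements):
    """Yield one byte per simulated FPGA cycle from a list of U32 elements.

    The FIFO is touched only on every 4th cycle; the other three cycles
    reuse the element already read. Byte order is an assumption.
    """
    current = 0
    for cycle in range(4 * len(elements)):
        phase = cycle % 4
        if phase == 0:
            current = elements[cycle // 4]      # the only FIFO read
        yield (current >> (8 * phase)) & 0xFF   # pick the byte for this cycle

stream = list(bytes_per_cycle([0x04030201, 0x08070605]))
print(stream)
```

If the transfer rate really is per element, this gets four output cycles out of every element transferred instead of one.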

I wouldn't be surprised if setting the digital output value takes a significant portion of the clock cycle at 100 MHz. If so, you might be able to do the generation on the FPGA by pipelining: that is, generating the desired output on one cycle, putting it in a shift register, then on the next cycle setting those outputs while generating the next set of outputs in parallel. Did you try that technique already when you hit the timing constraints in your 100 MHz code?
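Here is a small Python model of that two-stage pipeline, in case it helps to see the data flow. It is a software sketch only: `run_pipelined` and `generate` are hypothetical names, and the variable `register` stands in for the shift register. On the FPGA both stages run in parallel within one cycle; here they are simply written so that the output stage uses the value stored on the previous cycle.

```python
def run_pipelined(steps, generate):
    """Model a two-stage pipeline.

    Stage 1 computes the value for the current step; stage 2 drives the
    value computed on the previous cycle. Returns what reaches the outputs.
    """
    register = None   # pipeline register, empty before the first cycle
    driven = []
    for step in list(steps) + [None]:   # one extra cycle flushes the register
        if register is not None:
            driven.append(register)     # stage 2: set outputs from last cycle
        # Stage 1: compute the next value (nothing to compute while flushing).
        register = generate(step) if step is not None else None
    return driven

outputs = run_pipelined(range(4), lambda s: s * 2)
print(outputs)
```

The outputs are the same values in the same order, just one cycle later, and each stage only has to fit half of the work into a clock period.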

tintin_99 · ‎10-30-2015

Can I use a Feedback Node instead of a shift register for pipelining?

Does this do the job?

Also, I can read from the FIFO on even or odd FPGA tick counts, but then I can't use the timed loop counter to select the odd and even data inside the timeout case. How can I force it to use the first chunk when timeout is false and then use the second chunk in the next iteration?
