LabVIEW

FPGA DRAM Performance drop of Reading while Writing

Hi there, I'm working with a FlexRIO 7935 and a 1483, and I want to use the 2 GB of DRAM on the FlexRIO as a large ring buffer that is updated in real time. Here is my plan.

 

The DRAM will be read and written concurrently, with a write rate of roughly 1/5 of the read rate (the ratio can change, of course).

 

I write to DRAM in the following fashion: addresses 0 to 255, 1024 to 1279, 2048 to 2303, and so on. In the 2nd second I write to 256 to 511, 1280 to 1535, and so on. Each segment holds one image frame. I repeat this process until the whole DRAM is filled, and after that I start from address 0 again, all while DRAM reading is ongoing.

 

I read the DRAM sequentially, and at the end of the DRAM the read wraps around and continues from address 0.
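The write order described above can be sketched in Python as a generator of address ranges. This is only an illustration of the addressing scheme: the name `write_segments` and its parameters are made up here, and the default `mem_size` of 4096 is a tiny stand-in for the real DRAM so the output stays readable.

```python
def write_segments(segment=256, stride=1024, mem_size=4096):
    """Yield (first, last) address pairs in the order they are written.

    Pass 0 writes [0, segment) of every stride-sized block, pass 1 writes
    the next segment of every block, and so on until the memory is full.
    mem_size is a hypothetical small value, not the real DRAM depth.
    """
    for p in range(stride // segment):          # one pass per segment slot
        for base in range(0, mem_size, stride): # one block per stride
            first = base + p * segment
            yield first, first + segment - 1


for first, last in write_segments():
    print(f"{first} to {last}")
```

The sequential reader, by contrast, simply walks addresses 0, 1, 2, ... and wraps at the end, so reads and writes touch the same region in different orders.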

 

Here is my question: I would expect this kind of operation to perform relatively well, since most of the accesses are sequential, but how much will we suffer from the read/write switching overhead?

 

I checked the user guide: the DRAM has a bandwidth of 10.5 GB/s. Ignoring overhead, my application can work smoothly with 4.3 GB/s of reads.
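As a rough budget check using only the numbers in this post (10.5 GB/s peak, 4.3 GB/s of reads, writes at about 1/5 of the read rate), the combined traffic can be worked out as below. The variable names are just for illustration, and the calculation ignores every kind of overhead.

```python
peak_gb_s = 10.5       # DRAM bandwidth quoted from the user guide
read_gb_s = 4.3        # required read rate
write_ratio = 1 / 5    # write rate relative to read rate (approximate, per the post)

write_gb_s = read_gb_s * write_ratio
total_gb_s = read_gb_s + write_gb_s
utilization = total_gb_s / peak_gb_s   # fraction of peak needed, overhead ignored

print(f"write: {write_gb_s:.2f} GB/s, total: {total_gb_s:.2f} GB/s, "
      f"utilization: {utilization:.0%}")
```

So the plan needs roughly half of the quoted peak bandwidth before any read/write switching or access overhead is taken into account.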

 

So is there anything I can do to improve the overall performance? Any suggestions?

 

Thanks!

Message 1 of 8
(3,511 Views)

You need to queue up many I/O operations so that the DRAM controller can batch process the sequential access.

 

If you're only reading element by element (synchronously), then you're only going to get a fraction of the maximum bandwidth.

 

So when you request a read operation, you might need to queue up 32-64 of them in order to allow the DRAM access to be more efficient. This is because initial access is a lot slower than sequential access. The more operations you can queue up for the DRAM controller to handle itself, the closer you'll get to the theoretical bandwidth.
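A very rough way to see why queue depth matters is a Little's-law style estimate: the data in flight divided by the access latency bounds the throughput, up to the peak. The sketch below is a toy model, not a description of the actual memory controller. The function name is made up, the 200 ns latency is an invented placeholder, and 64 bytes per request assumes a 512-bit access width.

```python
def achievable_bandwidth(outstanding, bytes_per_request, latency_s, peak_bytes_per_s):
    """Estimate throughput as (requests in flight * size) / latency, capped at peak."""
    return min(peak_bytes_per_s, outstanding * bytes_per_request / latency_s)


PEAK = 10.5e9       # bytes/s, the figure quoted in the first post
WORD = 64           # bytes per request (assumes 512-bit accesses)
LATENCY = 200e-9    # seconds per access; hypothetical value for illustration only

for n in (1, 8, 32, 64):
    gb_s = achievable_bandwidth(n, WORD, LATENCY, PEAK) / 1e9
    print(f"{n:3d} outstanding requests -> about {gb_s:.2f} GB/s")
```

With one request at a time the estimate is a small fraction of the peak, and it only approaches the peak once a few dozen requests are in flight, which is the same trend as the 32-64 suggestion above. The exact crossover depends entirely on the real latency.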

Message 2 of 8
(3,485 Views)

It's hard to say too much without seeing any code, but you might want to take a look at the DRAM FIFO API, as it will handle the read/write arbitration for you. There is an example under Hardware Input and Output > FlexRIO > External Memory > Memory Throughput Test.lvproj.

 

I had to look into this recently, and beyond making sure that you have many enqueued read requests as Intaris mentioned, you will want to look at your overall access pattern. For instance, even if you are accessing the same set of sequential addresses, writing multiple samples (WWWWW) and then reading multiple samples (RRRR) provides a better data rate than an interleaved WRWRWRWRW pattern. I am told this is because the DRAM memory controller requires several clock cycles to prepare the DRAM chip for writing or reading whenever the direction changes.
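The effect of batching can be illustrated with a toy model that charges a fixed number of idle cycles for every change of direction. The function name and the 8-cycle turnaround cost are assumptions for illustration only; the real penalty depends on the DRAM and the controller.

```python
def bus_efficiency(pattern, turnaround_cycles=8):
    """Fraction of cycles spent moving data for a string of 'R'/'W' accesses.

    Each access is modeled as one data cycle. Every change of direction
    (R->W or W->R) adds `turnaround_cycles` idle cycles (hypothetical cost).
    """
    if not pattern:
        return 0.0
    switches = sum(1 for prev, cur in zip(pattern, pattern[1:]) if prev != cur)
    data_cycles = len(pattern)
    return data_cycles / (data_cycles + switches * turnaround_cycles)


batched = "W" * 8 + "R" * 8      # WWWWWWWWRRRRRRRR: one direction change
interleaved = "WR" * 8           # WRWRWR...: a direction change on every access

print(f"batched:     {bus_efficiency(batched):.2f}")
print(f"interleaved: {bus_efficiency(interleaved):.2f}")
```

Under this model the batched pattern pays the turnaround cost once, while the interleaved pattern pays it on almost every access, so longer runs in one direction amortize the penalty better.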

 

Also make sure you are using the full 512-bit data width for the 7935R. If you are only reading/writing 256 bits at a time, you are only going to get 50% of the expected throughput.

Matt J | National Instruments | CLA
Message 3 of 8
(3,472 Views)

Hi, Intaris, thank you for your advice.

 

"The more operations you can queue up for the DRAM controller to handle itself, the closer you'll get to the theoretical bandwidth."

 

Yes, that's what I was thinking. I noticed the DRAM parameter configuration page has an item named "Max outstanding requests for data", and I found I can set it as high as 512.

 

Will this parameter affect the performance somehow? I couldn't find any details about it. Do you have any idea what it does?
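For reference, a limit on outstanding requests is usually understood as a cap on reads that have been requested but not yet retrieved. The sketch below models only that general idea; the class and method names are invented, and it is not a description of how the NI memory interface implements this setting.

```python
class OutstandingRequestTracker:
    """Toy model of a cap on read requests issued but not yet retrieved."""

    def __init__(self, max_outstanding=512):
        self.max_outstanding = max_outstanding
        self.outstanding = 0

    def request(self):
        """Issue a read request; return False (stall) if the cap is reached."""
        if self.outstanding >= self.max_outstanding:
            return False
        self.outstanding += 1
        return True

    def retrieve(self):
        """Consume one returned data word, freeing a slot for a new request."""
        if self.outstanding == 0:
            raise RuntimeError("nothing to retrieve")
        self.outstanding -= 1


tracker = OutstandingRequestTracker(max_outstanding=4)   # small cap for the demo
issued = [tracker.request() for _ in range(6)]
print(issued)             # the last two requests are refused until data is retrieved
tracker.retrieve()
print(tracker.request())  # a slot is free again
```

In this picture a larger cap lets more requests be in flight before the requester has to stall, as long as the retrieve side keeps draining data.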

 

Also, I'll try to use block writes and block reads as much as I can.

 

 

Message 4 of 8
(3,459 Views)

Thank you, Jacobson.

 

I'm trying to do something like this:

WWWW________________WWWW________________

RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR

(_ marks addresses that are skipped, with no operation)

at the same time.

 

I have already looked at the Memory Throughput Test example, but it is not quite what I expected: when writing and reading at the same time, the arbiter seems to automatically split the time 50% for writing and 50% for reading, regardless of how much data is being written or read.

 

In my case, I don't have as much data to write as to read, so am I right that I should just use the DRAM primitives for better performance? If so, I have some further questions.

 

There are three primitives for DRAM: Request data, Retrieve data, and Write data.

 

I suppose I can call request and write at the same time (as long as fewer than 512 elements are outstanding, which is the value I set for "max outstanding data requests" in the DRAM configuration, maybe?). I should also be able to call request and retrieve at the same time; otherwise we wouldn't be able to stream DRAM data out, would we?

 

But what about doing all three at the same time? Is there a lower level of arbitration going on underneath? If so, will the lower-level arbiter accumulate up to 512 elements and write them all at once?

Message 5 of 8
(3,457 Views)

Have you tried the DRAM FIFO functions?

 

Edit: never mind, I just saw this was already recommended.


Certified LabVIEW Architect, Certified Professional Instructor
ALE Consultants

Introduction to LabVIEW FPGA for RF, Radar, and Electronic Warfare Applications
Message 6 of 8
(3,450 Views)

Hi Terry_ALE:

 

I actually considered it at first, but later I realized that the DRAM FIFO mode can only do first-in, first-out operation, while my application needs to write addresses 0..3, 16..19, 32..35 and then read them out sequentially, so it won't fit.
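A small Python model shows the mismatch. It uses the segment size of 4 and stride of 16 from the addresses above, a made-up memory size of 32, and a `deque` standing in for FIFO behavior; none of this reflects the actual DRAM FIFO API.

```python
from collections import deque

SEGMENT, STRIDE, MEM_SIZE = 4, 16, 32   # MEM_SIZE is a hypothetical tiny memory

# Write order: 0..3, 16..19, then 4..7, 20..23, and so on.
write_order = [
    base + p * SEGMENT + i
    for p in range(STRIDE // SEGMENT)
    for base in range(0, MEM_SIZE, STRIDE)
    for i in range(SEGMENT)
]

fifo = deque(write_order)                # a FIFO hands data back in write order
fifo_read_order = [fifo.popleft() for _ in range(len(write_order))]
wanted_read_order = list(range(MEM_SIZE))  # the application reads 0, 1, 2, ...

print(fifo_read_order[:8])
print(fifo_read_order == wanted_read_order)
```

Every address is still written exactly once, but a FIFO would return the data in write order rather than address order, which is why address-based access is needed here.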

 

Message 7 of 8
(3,442 Views)

I try to avoid using the DRAM primitives. The memory IDL or the DRAM FIFO covers all of the use cases I've had. The memory IDL is the interface I would recommend for what you've described. It allows access to specific DRAM addresses and condenses write, request, and retrieve into a single VI with an easier-to-use interface that includes arbitration options. You can see it in action if you open the Getting Started - External Memory example.

 

memory_IDL.png

 

 

Message 8 of 8
(3,408 Views)