FPGA DRAM usage

Intaris · ‎12-13-2012

We have some existing code which we are porting to newer FPGA hardware, and we are looking into using the DRAM capabilities of the new board. The board is the PXIe 7965R.

I have looked at the examples and have read the Help on the subject. I currently don't have a device to test on (I will have it again soon) and I want to create a universal solution, so I am trying to find out which limitations I need to be aware of in order to create a truly reusable and scalable solution. At the moment I am looking at creating a ring buffer using DRAM.

I want to have a system which continually fills the ring buffer and then, upon triggering, outputs a subset via DMA FIFO to the host. The data requested may or may not be completely acquired at the point of triggering, so the logic determining which address is requested when is more complex than a simple sequential read. The "Request Data" side is non-deterministic.
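For illustration, the address logic described above could be modelled in software as follows. This is only a Python sketch of the bookkeeping, not LabVIEW FPGA code; the class name, the `pre`/`post` window parameters and the overrun check are assumptions made for the example and do not come from the NI memory API.

```python
from dataclasses import dataclass


@dataclass
class RingBufferAddresser:
    """Tracks the write position of a ring buffer and hands out read addresses."""
    depth: int            # number of elements in the ring (hypothetical parameter)
    written: int = 0      # total elements written so far (monotonic counter)
    next_read: int = 0    # absolute index of the next element to request
    read_end: int = 0     # absolute index one past the last element to request

    def write_address(self) -> int:
        """Return the address for the next write; call once per element written."""
        addr = self.written % self.depth
        self.written += 1
        return addr

    def trigger(self, pre: int, post: int) -> None:
        """Select `pre` elements before and `post` elements after the trigger."""
        if pre + post > self.depth:
            raise ValueError("requested window does not fit in the ring")
        # Cannot reach back before the very first write.
        self.next_read = max(self.written - pre, 0)
        self.read_end = self.written + post

    def request_address(self):
        """Return the next address to request, or None if none is available yet."""
        if self.next_read >= self.read_end:
            return None   # the whole window has been requested
        if self.next_read >= self.written:
            return None   # element not acquired yet, try again later
        if self.written - self.next_read > self.depth:
            raise RuntimeError("data was overwritten before it was requested")
        addr = self.next_read % self.depth
        self.next_read += 1
        return addr


# Example: 10 elements written into an 8-deep ring, then a trigger asking for
# 3 elements before and 2 elements after the trigger point.
rb = RingBufferAddresser(depth=8)
for _ in range(10):
    rb.write_address()
rb.trigger(pre=3, post=2)
available = [rb.request_address() for _ in range(4)]
```

The fourth request returns `None` because the post-trigger elements have not been written yet; the request side simply retries after further writes, which is the sporadic behaviour described above.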

My understanding (Using the Memory item, not the CLIP version):

The memory interface behind the scenes is designed to run at 40MHz.
If I have a single memory partition and a single reader / writer, do I still need to take care of the "Input Valid" and "Ready for Input" signals if writing and reading at, say, 8MHz? Bear in mind that I will have more than one DRAM system running concurrently, so some arbitration may be required behind the scenes.
How am I to deal with arbitration when writing to the DRAM interface? I need some kind of caching in order to write new values only when the DRAM interface is ready, right? How does this look in practice? My imagination is coming up with some complicated schemes which are most likely not necessary.
When the Write method goes from being busy to being able to receive new data, I must wait at least ONE more cycle (in which clock domain?) before writing new data to the node. What happens if I have already wired up the new data and the "Input Valid" nodes? Are they simply ignored until it's time to do something? Do I really need to wait for "Ready for Input" plus one clock cycle before setting the new values?
The setting "Number of outstanding requests for data" allows me to queue up N data requests between the "Request Data" and the "Retrieve Data" nodes, right? Both of these nodes run completely asynchronously, right?
If I want to access the data from "Retrieve Data" as quickly as possible, I simply leave "Ready for Output" set to TRUE and watch the returned "Output Valid", right? I can then fully decouple the "Request Data" and "Retrieve Data" functions. In a ring buffer I need certain logic in "Request Data" to make sure I'm requesting the correct values, so this will be called more sporadically, whereas "Retrieve Data" can be in a permanent state of waiting.
How high a throughput can we expect if we're doing RWRWRWRW instead of RRRRWWWW? Presumably the addressing latencies of the DRAM will have an effect, but how drastic are these? Would the maximum throughput drop from 800MB/s to 80, 100, 400 or 500MB/s?

I'm aware of the 128 Bit data width. I have plans for this, but my main questions are regarding the synchronisation aspects and how complicated my caching / synchronisation code needs to be.

Phew.

Mega thanks to anyone willing to enter into the discussion with an FPGA newling.

Shane.

PS I have read THIS (Great info from Ben Sisney) and THIS (Great info also) and the help on the subject.

James_McN · ‎12-13-2012

Hi,

The memory interface is actually designed to run at 100MHz (x 128bit to get the 1.6GB/s speed per bank). 40MHz is for the FIFO CLIP interface.
The handshaking signals are your friends for most of the points. Ready for Input indicates that the item is ready for input on the next iteration; this avoids missing cycles unnecessarily.

Any data written when not ready for input will be ignored.
If you fill the outstanding requests, that is what will cause it to not be ready.
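As a rough illustration of the write-side handshake, here is a small Python simulation. It is a toy model under stated assumptions, not NI's implementation: the `MemoryWritePort` class, the `busy_pattern` argument and the `fifo_depth` parameter are all invented for the example. It shows a small FIFO holding samples while the port is not ready, and data being ignored if it is presented when the port did not report ready.

```python
from collections import deque


class MemoryWritePort:
    """Toy write port with a 'ready for input' flag (hypothetical model)."""

    def __init__(self, busy_pattern):
        self.busy_pattern = list(busy_pattern)  # True = port busy on that cycle
        self.cycle = 0
        self.ready_for_input = True   # readiness reported after the previous cycle
        self.stored = []

    def clock(self, data, input_valid):
        """Run one cycle; data is accepted only if the port had reported ready."""
        accepted = bool(input_valid and self.ready_for_input)
        if accepted:
            self.stored.append(data)
        # Decide readiness for the NEXT cycle.
        busy = self.busy_pattern[self.cycle % len(self.busy_pattern)]
        self.ready_for_input = not busy
        self.cycle += 1
        return accepted


def run(samples, busy_pattern, fifo_depth=4):
    """Feed one sample per cycle through a small FIFO into the write port."""
    if all(busy_pattern):
        raise ValueError("port is never ready, the FIFO could not drain")
    port = MemoryWritePort(busy_pattern)
    samples = deque(samples)
    pending = deque()     # the small cache in front of the memory
    overflowed = 0        # samples lost because the cache was full
    while samples or pending:
        if samples:
            sample = samples.popleft()
            if len(pending) < fifo_depth:
                pending.append(sample)
            else:
                overflowed += 1
        # Only assert input valid when the port reported ready beforehand.
        if port.ready_for_input and pending:
            port.clock(pending.popleft(), input_valid=True)
        else:
            port.clock(None, input_valid=False)
    return port.stored, overflowed


stored, overflowed = run(range(6), busy_pattern=[False, True])
```

In this model a cache of a few elements is enough as long as the average write rate stays below what the port accepts; if the port is busy too often, the cache overflows and samples are lost, which is what the `overflowed` counter reports.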

You are correct, if you have downstream functions which can always take data then you can wire true to ready for output.
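The decoupled request / retrieve pattern can be sketched in Python like this. It is a simplified model built on assumptions: the `ReadPort` class and its method names are hypothetical, memory latency is ignored, and the only behaviour taken from this thread is that a full queue of outstanding requests makes the request side not ready, and that the retrieve side can keep "ready for output" true and watch "output valid".

```python
from collections import deque


class ReadPort:
    """Toy read port with a bounded queue of outstanding requests (hypothetical)."""

    def __init__(self, memory, max_outstanding):
        self.memory = memory
        self.max_outstanding = max_outstanding
        self.outstanding = deque()

    @property
    def ready_for_input(self):
        # Not ready once the outstanding-request queue is full.
        return len(self.outstanding) < self.max_outstanding

    def request(self, address):
        """Queue a read request; returns False (request ignored) if not ready."""
        if not self.ready_for_input:
            return False
        self.outstanding.append(address)
        return True

    def retrieve(self, ready_for_output=True):
        """Return (data, output_valid); latency is not modelled."""
        if not ready_for_output or not self.outstanding:
            return None, False
        return self.memory[self.outstanding.popleft()], True


port = ReadPort(memory=[10, 11, 12, 13], max_outstanding=2)
results = [port.request(0), port.request(1), port.request(2)]
first = port.retrieve()
```

Here the third request is refused because two requests are already outstanding; after one retrieve the request side becomes ready again.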
I'm not sure about the benchmarks for the throughput. I once worked on an application where we were doing truly random access in RWRWRW mode and saw a big drop in throughput (I think about 1MS/s in that case versus a maximum of 50MS/s for RWRWRW). Sequential access is faster than random access though. The FIFO CLIP achieves 40MS/s, which will be highly sequential; I am not sure exactly how they implement this to achieve this rate. Sorry I can't be more specific, but hopefully it gives you a rough range.

James Mc
========
Ask me about Rust & NI Hardware
My writings are at https://www.wiresmithtech.com/devs/

Intaris · ‎12-13-2012

@James_McN wrote:

Hi,

The memory interface is actually designed to run at 100MHz (x 128bit to get the 1.6GB/s speed per bank). 40MHz is for the FIFO CLIP interface.

Gotcha. Thanks.

The handshaking signals are your friends for most of the points. Ready for Input describes that the item is ready for input on the next iteration, this avoids missing cycles unnecessarily.

Any data written when not ready for input will be ignored.

If you fill the outstanding requests that is what will cause it to not be ready.

You are correct, if you have downstream functions which can always take data then you can wire true to ready for output.

I'm not sure about the benchmarks for the throughput. I once worked on an application where we were doing truly random access in RWRWRW mode and saw a big drop in throughput (I think about 1MS/s in that case versus a maximum of 50MS/s for RWRWRW). Sequential access is faster than random access though. The FIFO CLIP achieves 40MS/s, which will be highly sequential; I am not sure exactly how they implement this to achieve this rate. Sorry I can't be more specific, but hopefully it gives you a rough range.

50MS/s at 128 bit = 800MByte / s, right? 1MS/s = 16MByte/s, right?

Thanks.

Shane.

James_McN · ‎12-19-2012

@Intaris wrote:

50MS/s at 128 bit = 800MByte / s, right? 1MS/s = 16MByte/s, right?

Hi Shane,

Yeah, that's right. In that project we were speccing everything in samples, but the conversion you have written is correct.
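The conversion can be written out as a quick check. This is just the arithmetic from the thread in Python, assuming one sample is one 128-bit memory word and using decimal units (1 MB = 10^6 bytes); the function name is made up for the example.

```python
BITS_PER_SAMPLE = 128
BYTES_PER_SAMPLE = BITS_PER_SAMPLE // 8   # 16 bytes per 128-bit word


def bytes_per_second(samples_per_second: int) -> int:
    """Convert a sample rate (128-bit words per second) to bytes per second."""
    return samples_per_second * BYTES_PER_SAMPLE


peak = bytes_per_second(100_000_000)       # 100 MHz interface, per bank
rwrw_max = bytes_per_second(50_000_000)    # 50 MS/s figure from the thread
random_rwrw = bytes_per_second(1_000_000)  # 1 MS/s figure from the thread

for label, rate in [("peak", peak), ("RWRW max", rwrw_max), ("random RWRW", random_rwrw)]:
    print(f"{label}: {rate / 1e6:.0f} MB/s")
```

On these numbers the random-access case seen in that project is 16 MB/s against a theoretical 1600 MB/s per bank, i.e. a factor of 100 below the peak.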

James Mc
