I read the document about DRAM on the FlexRIO 793x; it says the DRAM bandwidth is about 10.67 GB/s. I can confirm that when I access the DRAM in only one direction, the bandwidth is around 10.6 GB/s.
But when I have to switch between read and write, performance drops quite a lot: if I write 256 pages, I can only read another 4608 pages at a frequency of 1152 Hz, for an overall bandwidth of around 1.2 GB/s. Yes, I'm using the IDL driver for DRAM access, and all write and read requests are to sequential pages.
Is this normal? Losing about 85% of the performance seems unacceptable. I understand there will be some performance loss when switching from write to read, or vice versa, but I have already taken care of the access pattern: I'm not issuing one read followed by one write, I actually issue 256 writes followed by ~4000 reads, so the switching overhead should be relatively low.
I also tried writing 1024 pages and requesting ~18000 reads, but the overall performance did not increase; total bandwidth is still around 1.2~1.3 GB/s.
I can make the W/R switches as infrequent as possible, but that doesn't seem to help much, and I don't understand why. If the overhead is caused by R/W switching, then writing a huge amount of data and then reading a huge amount of data should bring the performance close to the one-way bandwidth, but that didn't happen in my case. Is a 1024-page request still too small?
Say I'm writing to DRAM at 1 kHz and reading from DRAM at 15 kHz, and I set the IDL grant times as the picture shows. In my mind, if I grant 4096 ticks for write and 61440 for read, the overall performance should be much better than setting the write grant time to 1 and the read grant time to 15. Am I right?
But the two settings don't show any difference. I checked the subVI in the DRAM IDL: it uses a round-robin schedule to grant W/R access, and I think "grant time" really means granted DRAM ticks, so I'm quite confused by this function.
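If the arbiter really is a plain round-robin that alternates a write window and a read window, then each side's long-run share of the DRAM ticks depends only on the ratio of the two grant values, which would explain why the two settings behave identically. A toy calculation of that (plain Python to illustrate my understanding, not NI code):

```python
def share(grant_w, grant_r):
    """Fraction of DRAM ticks granted to writes per round-robin cycle,
    assuming the arbiter simply alternates grant_w write ticks and
    grant_r read ticks."""
    return grant_w / (grant_w + grant_r)

# Both settings give writes exactly 1/16 of the ticks:
print(share(4096, 61440))  # 0.0625
print(share(1, 15))        # 0.0625
```

So as far as long-run share goes, (4096, 61440) and (1, 15) are the same setting; only the burst length per window differs.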
I hope the links below help you optimize the performance of DRAM:
Grant time specifies how many consecutive ticks the memory process allows writes/reads to occur for, since, as you've mentioned, sequential operations are more efficient.
The reason you're seeing poor performance is that you have an unbalanced access pattern: you write for 4096 consecutive ticks, then read for 61440 consecutive ticks. That means you sit idle for at least 57344 ticks. Put another way: you switch to writes and put 4096 DRAM words into DRAM, then you switch to reads and pull those 4096 words back out, then you wait another 57344 ticks (because there's nothing left in DRAM; you've already pulled it all out), and only after the full 61440 read ticks have elapsed do you switch back to writes.
Try a grant of 128 writes / 128 reads. That gives a balanced access pattern much closer to the maximum theoretical throughput.
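To make the idle time concrete, here is a toy model (plain Python, not NI code, and the FIFO-style streaming pattern is my assumption about the use case): a round-robin arbiter alternates grant_w write ticks and grant_r read ticks in front of a FIFO; writes always have data to store, and a read tick that finds the FIFO empty is wasted.

```python
def utilization(grant_w, grant_r, cycles=16):
    """Fraction of DRAM ticks doing useful work when a round-robin
    arbiter alternates grant_w write ticks and grant_r read ticks
    over a FIFO (one word per tick in this toy model)."""
    fifo = 0
    useful = total = 0
    for _ in range(cycles):
        # Write window: every tick stores one word.
        fifo += grant_w
        useful += grant_w
        total += grant_w
        # Read window: only ticks that find data in the FIFO are useful.
        reads = min(grant_r, fifo)
        fifo -= reads
        useful += reads
        total += grant_r
    return useful / total

print(utilization(4096, 61440))  # 0.125
print(utilization(128, 128))     # 1.0
```

0.125 of 10.67 GB/s is about 1.33 GB/s, which lines up with the ~1.2-1.3 GB/s measured, while 128/128 keeps the memory busy on every tick apart from the (much smaller) real-world turnaround cost.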
I believe you are talking about DRAM FIFO mode; it does just what you say. I have tested DRAM FIFO, and it does reach peak performance when read and write are granted the same number of clocks.
And I have to admit I made a mistake when packing Camera Link data (8x 8-bit pixels) into 64xU8 data: I didn't decrement the data valid signal properly, which made the overall data flow run at 8 times the rate I had designed for. Considering this, 1.2 GB/s times 8 is very close to the 10.6 GB/s NI declares, which came as a pleasant surprise, because I was worried I might only get about 70~80% of the DRAM's maximum performance. It turns out I can get >90% of the bandwidth.
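A quick arithmetic check of that claim, using the rated 10.67 GB/s and the 1.2~1.3 GB/s I measured above:

```python
rated = 10.67                     # GB/s, NI's quoted DRAM bandwidth
for measured in (1.2, 1.3):       # GB/s, overall bandwidth I observed
    real_traffic = measured * 8   # the data-valid bug inflated traffic 8x
    print(f"{real_traffic:.1f} GB/s -> {real_traffic / rated:.0%} of rated")
```

So the DRAM was actually moving 9.6~10.4 GB/s of real traffic, i.e. 90~97% of the rated bandwidth.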
It can actually sustain an unbalanced access pattern while maintaining relatively high performance (~90%), just as I mentioned above, but only in IDL or DRAM primitive mode.