Hardware Developers Community - NI sbRIO & SOM


FPGA DRAM Access: Hard mode

One of the requests for the SOM that keeps coming up is FPGA DRAM access. First of all, I want to say that I personally have spent a lot of time researching good solutions to this problem that are safe, useful, and reliable, and that the problem seems deceptively simple but gets a lot more complicated the more you look at it. And it's not just me; we've invested a lot of developer and intern time trying to solve this. So if anyone worries that we don't care, I promise you that we do. Secondly, a lot of people have problems that can be solved in many different ways, and sometimes what they think they need is best solved in a different way. Sometimes FIFOs, IRQs, Controls/Indicators, and UDVs do exactly what you are trying to do, in exactly the way you need it done. But there are some situations where they fall short.

 

While I cannot (yet) offer you a feature that solves everyone's problems in a perfect way, there are some power users who are willing to leave the easy-mode zone of well-supported NI features, use our hardware in ways it was not necessarily intended, and dig into its guts to squeeze out performance with low-level features that many of our support engineers would be hard-pressed to help you with. I'm not an expert on Zynq or Linux by any means, and many of our support engineers are even less so. Leaving the path of well-supported NI features can do interesting things like brick your hardware or put it into a state that is difficult to recover from. But that is part of the fun of traveling down this rough path.

 

I am not 100% sure that any of the information I am posting here is really correct. Most of it I learned from walking around NI and pestering people who are much smarter than me. It's possible I've forgotten some important details, so if you notice an incorrect one, please let me know. No guarantees that any of this is safe, correct, or factual.

 

Let's suppose that you are one of these customers who is willing to explore the uncharted territory of not-really-supported NI features to solve your specific problem. Let's say that this specific problem is that you have large quantities of data coming from the FPGA and you need to do some preprocessing on the data before it reaches the host. Let's say that the amount of data you need random access to for this preprocessing is more than 4MB but less than 64MB. Let's say that you have done everything you can to reduce the data to be smaller than 4MB (compression, etc.) but you can't. Let's also say that your application has to run on an sbRIO specifically, and that your processing has to happen onboard, meaning you can't attach some kind of memory chip to an RMC or SOM carrier.

 

As of today, this particular set of requirements falls into a gap of things that our software features and hardware cannot easily perform. None of our sbRIOs have DRAM dedicated to the FPGA. FIFOs don't offer random access. Our newest sbRIOs are on Zynq, and I believe the Zynq's BRAMs can't hold more than about 4MB. Ideally, the FPGA could store all of this data in the DRAM that the host is using (it's Zynq, after all, so the FPGA has just as much access to it as the host does). So if this really is a deal breaker for you, then we need to leave the world of well-supported NI features.

 

The first problem to solve is that even if your FPGA had access to the system memory space, you would need a location in that space that is safe to drop data into. If you look at the Zynq TRM, you can see which pieces of the system map to DDR vs. OCM vs. the PL (from a quick glance, it looks like the OCM is at the bottom, DDR is above that, and the PL starts at 0x4000_0000), but everything in the DDR is being managed by Linux. If you randomly start dropping data into physical memory, your system will almost certainly crash.

 

Ideally, our RIO user-mode library would allocate that memory for us (malloc), which would be scattered throughout physical memory, and the RIO driver could take that list of physical pages, page-lock them, and communicate those addresses to the FPGA, which could then perform scatter-gather. But scatter-gather is complicated, and ain't nobody got time for that. Another option is that, from the RIO driver, we could allocate one big block of physically contiguous, page-locked memory and send that one address to the FPGA so we don't have to perform scatter-gather. But due to memory fragmentation, there might not exist a contiguous block of physical memory that is large enough for our purposes. Instead of allocating this memory, we can simply prevent Linux from having access to it at all from boot time. That way we are guaranteed to have this memory available. Linux determines how much memory it has from a U-Boot environment variable; if we change that variable to a smaller number, Linux won't touch anything above it.

 

The second-stage boot loader on Zynq is U-Boot. There is an environment variable in U-Boot that tells Linux how much memory it is allowed to use. I believe sbRIO Zynq boards have 512MB of RAM. We are going to change that to 448MB (leaving the upper 64MB unused by the OS). This command does it:

 

fw_setenv othbootargs mem=448M

 

If you are wondering what the current values of all the U-Boot variables are, you can use this command:

 

fw_printenv

 

The next time you boot, Linux will have less RAM, and MAX will show you how much RAM your target now sees. Now that Linux isn't using the upper 64MB of RAM, the FPGA needs a way to access it.

 

There are many AXI ports that connect the Zynq PL and PS. We could use one of the HP ports, but it's tricky to get access to those on an NI system. One of the most useful ports is the ACP, because it offers cache coherency, and cache-incoherency issues are not very fun to debug. All FIFO bus interface logic is connected to the ACP through a DMA component, and there's a secondary interface to that DMA component that not very many things use. We just need access to this interface. There was a Labs release a while ago for Host Memory Buffer (HMB). One of its pieces was a LabVIEW FPGA API that gives us raw access to system memory; using this interface, the FPGA can read or write any physical address in the system memory space. The interface is complicated, the documentation is lacking, the example is confusing, and the installer is kind of a nasty hack, but we left the world of easy, supported behavior when we started this journey.

 

HMB looks like it's here: https://forums.ni.com/t5/NI-Labs-Toolkits/Host-Memory-Buffer-for-CompactRIO/ta-p/3501829

 

Now that you have a safe area in memory to store your data and your FPGA VI has complete access to system memory, you can start storing data. But there's one more piece missing: host access. If we are planning on sending the results of all of this complicated preprocessing down to host memory anyway, why should we have to FIFO it over?

 

Ideally, you would write a simple Linux memory driver that presents a handle to user mode, and user mode could call mmap on that handle to get a virtual address for this area of physical memory. As it happens, there is already a Linux device node that you can mmap to get a virtual address to physical memory: it's called /dev/mem. If you write a quick C application that calls mmap on /dev/mem, you can start reading and writing this memory. Neat!
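 

Here's a minimal sketch of that C application, assuming 512MB of total RAM and the mem=448M setting above, so the reserved window starts at physical address 0x1C000000 (448MB) and is 64MB (0x0400_0000) long. The addresses and the little read/write at the end are just illustrations; adjust them to match your target and whatever layout your FPGA VI actually uses. You'll need to run it as root, since /dev/mem is privileged.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* These values assume 512MB total RAM with mem=448M: the window Linux
 * no longer manages starts at 448MB and runs for 64MB. */
#define RESERVED_BASE 0x1C000000UL   /* 448MB */
#define RESERVED_SIZE 0x04000000UL   /* 64MB  */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open /dev/mem");
        return 1;
    }

    void *map = mmap(NULL, RESERVED_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, RESERVED_BASE);
    if (map == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    volatile uint32_t *buf = map;

    /* Read a word the FPGA may have written, then write one back. */
    printf("word[0] = 0x%08x\n", (unsigned)buf[0]);
    buf[1] = 0xDEADBEEF;

    munmap(map, RESERVED_SIZE);
    close(fd);
    return 0;
}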

 

If you are allergic to C, it turns out Python has an mmap module, so you can do it all from Python. Neat!

 

If all of this went smoothly (that's a big if), then you have everything working, with one small caveat. One of the disadvantages of using super-low-level tools is that you get exposed to super-low-level details of your system architecture. It turns out that the ARM processor sometimes reorders instructions, and it also sometimes reorders memory accesses. While it's possible that your specific application doesn't care, it's very possible that if those instructions and memory accesses get rearranged, you end up with old or corrupted data, and these issues are very confusing to debug. ARM has memory barrier instructions that impose stricter guarantees on ordering, so if your application relies on a particular ordering of host instructions, that's something you'll have to read up on.
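 

As a rough example of what a barrier buys you, here is a sketch under the assumption of a hypothetical layout where buf[0] is a "data ready" flag that the FPGA polls and the words after it are the payload. It uses GCC's __sync_synchronize() full barrier to make sure the payload writes are visible before the flag is raised:

#include <stdint.h>

/* Hypothetical layout: buf[0] is a "data ready" flag the FPGA polls,
 * buf[1..n] is the payload. buf points into the /dev/mem mapping. */
void publish(volatile uint32_t *buf, const uint32_t *payload, int n)
{
    for (int i = 0; i < n; i++)
        buf[1 + i] = payload[i];   /* write the payload first */

    __sync_synchronize();          /* full memory barrier (dmb on ARM): make
                                      the payload visible before the flag */

    buf[0] = 1;                    /* then raise the ready flag */
}

Without the barrier, the hardware may let the flag become visible before the payload, which is exactly the kind of stale-data problem described above.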

 

Well, that's all I got. Hopefully someone finds this useful.

 

--Neil

Message 1 of 4

Thanks Neil,

 

I'll have to look into this later. We were looking into something like this, but we were just able to fit a small use case on the FPGA that we might live with for now.

Kyle Hartley
Senior Embedded Software Engineer

Message 2 of 4

 

Host Memory Buffer made it into our 17.0 release! Ross Houston and I gave an NI Week presentation about it; my slides are attached.

 

 

Message 3 of 4

Nice!!!! 

Kyle Hartley
Senior Embedded Software Engineer

Message 4 of 4