I have approximately the code shown below, split across two loops. One loop receives bitmap images over a TCP socket and puts the bitmaps into a queue. The other loop dequeues the bitmaps and sends them to the FPGA. Is there any way to speed up this process? My DMA FIFO is 32 bits wide, but the data coming from TCP Read is a string. Can I avoid the Unflatten From String conversion to a U32 array while still using a 32-bit FIFO? My tests show that the limit on DMA transfer rate is in elements, not bytes, so switching to an 8-bit FIFO cuts my bandwidth by a factor of 4, which overwhelms any gain I'd get by replacing the unflatten with String To Byte Array. I tried the DMA Acquire Write Region method, but I got a Feature Not Supported error (even though the help doesn't mention that not all targets support that method). Implementing a decompression algorithm on the FPGA is an option, but it requires more effort and adds complexity.
That's not any better than Unflatten From String; it might even be worse. The problem with both your approach and Unflatten From String is that converting the string to a U32 array involves copying every element of the string into the new array. I'm hoping for a way to do this in place or, even better, to skip the step entirely.
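To make the distinction concrete, here is a rough analogy in Python rather than LabVIEW (LabVIEW is graphical, so this is only an illustration of the idea, not something you can wire up). The function names `unpack_copy` and `view_in_place` are made up for this sketch. The first builds a new list of 32-bit values from the bytes, which is the kind of per-element copy described above. The second reinterprets the same buffer as 32-bit words without copying, which is the behaviour being asked for. Note the assumption that the reinterpreted view uses the machine's native byte order, which may not match the byte order the data was sent in.

```python
import struct

def unpack_copy(payload: bytes) -> list:
    """Copying conversion: builds a new list of big-endian U32 values."""
    count = len(payload) // 4
    return list(struct.unpack(f">{count}I", payload[:count * 4]))

def view_in_place(payload: bytearray) -> memoryview:
    """Zero-copy reinterpretation: the same bytes seen as native-order 32-bit words.

    Assumes len(payload) is a multiple of 4 and that format "I" is 4 bytes wide,
    which holds on mainstream platforms but is checked here anyway.
    """
    view = memoryview(payload).cast("I")
    assert view.itemsize == 4
    return view

payload = bytearray(range(8))          # stand-in for data from a TCP read
copied = unpack_copy(bytes(payload))   # independent copy of the data
words = view_in_place(payload)         # shares memory with payload

first_before = words[0]
payload[0] = 0xFF                      # change the underlying buffer
view_tracks_buffer = words[0] != first_before  # the view reflects the change
```

Because `words` shares memory with `payload`, no per-element work happens when it is created, whereas `unpack_copy` touches every element once. Whether a given LabVIEW primitive behaves like the first or the second case is exactly the open question in this thread.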
OK, that wasn't a well-thought-out idea. Let me try again: instead of Unflatten From String, try Type Cast. Somehow it's about twice as fast when I benchmark it on a PC.
That's odd, but thanks - it's an improvement. The help says, "Casts x to the data type, type, by flattening it and unflattening it using the new data type," so I assumed that Type Cast is just a special case of Unflatten From String. Apparently it's a slightly optimized special case.