Very slow memory transfer when passing arrays to Python node

SebastienT · ‎09-09-2021

Hello,

I would like to use the LabVIEW 2020 64-bit Python node to process a stack of images (tens of 2048 x 2048 16-bit images), previously acquired by a camera in LabVIEW, with some custom ML Python code. I wrote a test program that runs successfully and passes the images as a UINT16 3D array to the Python node, but the memory transfer from LabVIEW to Python is painfully slow. For instance, passing a modest stack of 10 images (80 MB) takes over 13 s!

I do not know what kind of memory copy / data conversion LabVIEW performs under the hood, but in practice it is much faster to write the images to disk and read them back from the Python function. I would like to avoid disk transfer, and I think this kind of Python ML application might be useful to other people. Is there a way to achieve a faster memory transfer?

Note: Fiddling around, I could achieve over 4x speedup by reshaping the 3D array to 1D before passing it to the Python node, but it is still orders of magnitude slower than the speed I would expect on this computer (Intel i9-10900X 3.7 GHz, 128 GB RAM, Windows 10 Enterprise x64).

Best,

Sébastien

SebastienT · ‎09-09-2021

The speedup from serializing 3D arrays to 1D is actually more modest than I thought (about 15%). The 4x speedup was actually observed when passing a 3D array whose largest dimensions come last rather than first.

So in summary, UINT16 3D arrays are transferred to the Python node at approximately 25 MB/s. This is still very slow for a memory transfer: writing the 3D array to a binary file from LabVIEW and reading it from the Python node is about 25x faster on the same machine.
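For readers who want to check the numbers, here is a small back-of-the-envelope sketch using only the figures quoted in the two posts above (10 images, 2048 x 2048, 16-bit, about 13 s, roughly 4x speedup). Nothing in it is measured, and the 13 s value is a lower bound, so the rates are approximate upper bounds.

```python
# Figures quoted in the thread; nothing here is measured.
WIDTH = HEIGHT = 2048
BYTES_PER_PIXEL = 2              # 16-bit pixels
N_IMAGES = 10

stack_bytes = N_IMAGES * WIDTH * HEIGHT * BYTES_PER_PIXEL
stack_mib = stack_bytes / 2**20  # size of the stack in MiB

initial_rate = stack_mib / 13.0      # 3D array, transfer of about 13 s
reordered_rate = 4 * initial_rate    # after the roughly 4x speedup

print(f"stack size:       {stack_mib:.0f} MiB")
print(f"initial rate:     {initial_rate:.1f} MiB/s")
print(f"after 4x speedup: {reordered_rate:.1f} MiB/s")
```

The result lands close to the 25 MB/s quoted above, which suggests that figure refers to the reordered array.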

santo_13 · ‎09-09-2021

If you already have a good implementation in Python, why not implement the whole thing in Python without dragging LV in as an intermediary?

Any cross-language data transfer is going to be slow because multiple handshakes and conversions are required. Personally, I am against any large data transfer between programming languages; instead you can use TCP/IP, ActiveX, or other proven methods of inter-process communication.

Santhosh
Soliton Technologies

SebastienT · ‎09-13-2021

It is actually a rather complex system where we drive tens of components with ad hoc LabVIEW drivers, so it is not really practical to implement everything in Python. Here we really need the best of both worlds 🙂

I understand that there might be several memory copies / conversions involved, but I am really surprised that the time actually spent copying the array is several orders of magnitude longer than what the memory bandwidth would theoretically allow. Do you have any details on why it is so slow, and whether this is inherent to Windows inter-process communication or might point to an inefficient LabVIEW implementation of the Python node?

Mark_Yedinak · ‎09-13-2021

Without seeing the code, it is not possible to say whether other things are happening that cause the delay. Given that you say it is an ad hoc collection of drivers and VIs, it doesn't sound like there is any kind of design or architecture for this system, and who knows where inefficiencies may lie.

Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot

BertMcMahan · ‎09-13-2021

A few random thoughts from someone who hasn't used the Python nodes much:

-Are you receiving back a 3D array? Could you try the "flatten to 1D" trick for received data as well? In other words- are you sure it's the "getting data into Python" part that's slow, not the Python part itself or the return from Python? Oftentimes you can speed up large data transfers by preallocating arrays, but that's more on the LabVIEW side of things than the Python one.

-The Help for the Python node mentions that, by default, arrays are converted to Python lists. You can right-click that and tell it to convert it to a NumPy array. I have no idea how Python handles lists, but I bet you it's not the same way that LabVIEW handles arrays. NumPy arrays might be better... who knows.

Unfortunately you are probably having to step through each element of the array one by one and copy it from LabVIEW's memory into Python's memory. The Python node may not know the size of the array when it creates its own memory space and will have to allocate space, fill it, reallocate it, fill it, and so on. That's likely what's taking so long. Memory reallocation of massive datasets will nuke performance in LabVIEW, and it's probably the case in your example.

See this post:

https://stackoverflow.com/questions/311775/create-a-list-with-initial-capacity-in-python

The top voted answer has some issues (for example, it only does 10k elements where you have 4 million per image), so review the second answer. That notes that Python lists have no built-in preallocation. NumPy does have preallocation:

https://stackoverflow.com/questions/3491802/what-is-the-preferred-way-to-preallocate-numpy-arrays

so that might help, but that depends on whether the LabVIEW Python node preallocates NumPy arrays. Honestly I don't know. I would first try switching from Python lists to NumPy arrays to see if that helps. If that doesn't work, and the issue is indeed array reallocation, you'll need to switch to something like TCP. That way, you can open a session, preallocate your memory, then transfer your images via TCP into the preallocated space.
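To make the TCP idea concrete, here is a minimal sketch of the receiving side in Python, using only the standard library. The helper name `recv_exact_into` and the 4096-byte dummy payload are illustrative assumptions, and a local socket pair stands in for the LabVIEW TCP sender, whose framing is not described in this thread.

```python
import socket


def recv_exact_into(sock, buf):
    """Fill the preallocated buffer `buf` from `sock`, writing in place."""
    view = memoryview(buf)
    received = 0
    while received < len(view):
        # recv_into writes directly into the buffer slice, so no
        # intermediate bytes objects are created and concatenated.
        n = sock.recv_into(view[received:])
        if n == 0:
            raise ConnectionError("socket closed before the buffer was full")
        received += n
    return received


# Demo: a local socket pair stands in for the LabVIEW sender (assumption).
sender, receiver = socket.socketpair()
payload = bytes(range(256)) * 16      # 4096 bytes of dummy "pixel" data
frame = bytearray(len(payload))       # allocated once, reusable per image
sender.sendall(payload)
count = recv_exact_into(receiver, frame)
sender.close()
receiver.close()
print(count)
```

For real image stacks the buffer would be sized to one image (or the whole stack) up front, and the sender would run in a separate process so that large transfers do not block.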

Then again the bottleneck might not be preallocation, but that's where I'd start.

PS: Assuming you have 30 images you'll have ~126 million array elements. If Python doubles its memory allocation each time, and it starts with 1 element, you'll have to reallocate memory 27 times, possibly copying the entire dataset each time unless it can allocate contiguous memory, which may or may not be possible.
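The arithmetic in the PS can be checked with a few lines of Python. This is only a sketch of the hypothetical "start at 1 element and double each time" scenario described above. As far as I know, CPython lists actually grow by a smaller factor than 2, and how the Python node fills its list is not documented in this thread, so the real number of reallocations may differ.

```python
def doublings_needed(n_elements, start=1):
    """Count how many capacity doublings are needed to hold n_elements."""
    capacity, count = start, 0
    while capacity < n_elements:
        capacity *= 2
        count += 1
    return count


n_elements = 30 * 2048 * 2048         # 30 images of 2048 x 2048 pixels
print(n_elements)                     # 125829120, i.e. ~126 million
print(doublings_needed(n_elements))   # 27
```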

SebastienT · ‎09-13-2021

The whole system is complex (and requires a combination LV / Python for the reason described above) but to isolate and test array transfer speed to the Python node I have written an extremely simple VI passing a stack of images (3D array) and reading back an image. Please see it below.

SebastienT · ‎09-13-2021

Brilliant insights! After switching from Python list to NumPy array, the transfer is over 20 times faster! I can now transfer 80 images (each 8 MB) in less than one second, which is about 25% faster than saving the 3D array to my fastest local disk and reading it from Python. Serializing the 3D array now slightly decreases the performance (possibly due to the overhead of the LabVIEW operation).

The only annoying detail is that LabVIEW uint16 arrays seem to be transferred to Python as uint32 NumPy arrays, which in practice means the performance could be even slightly better.

Regarding your question on the returned array, it is actually much smaller than the input array, and preallocation should be possible since the size and type of the returned array are set in the LabVIEW VI (the same is true of the input array: its size is completely deterministic).

Thanks a lot!

Best,

S.

francescofabbro · ‎09-13-2022

Can you attach sample code for the Python side as well?
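The original Python code was not posted in this thread. As a stand-in, here is a minimal sketch of what the Python side might look like, assuming the node input is configured as a NumPy array. The function name `process_stack` and the maximum-intensity projection are placeholders, not the original poster's ML code, and the uint32-to-uint16 conversion is based only on the observation reported above.

```python
import numpy as np


def process_stack(stack):
    """Hypothetical entry point called by the LabVIEW Python node.

    `stack` is expected to be a 3D array ordered (images, rows, columns).
    """
    stack = np.asarray(stack)
    if stack.ndim != 3:
        raise ValueError(f"expected a 3D stack, got {stack.ndim}D")
    # The thread reports uint16 data arriving as uint32; convert back.
    # With copy=False this is a no-op if the dtype is already uint16.
    stack = stack.astype(np.uint16, copy=False)
    # Placeholder for the real processing: maximum-intensity projection.
    return stack.max(axis=0)


# Small self-test with a dummy 2 x 3 x 4 stack.
demo = np.arange(24, dtype=np.uint32).reshape(2, 3, 4)
result = process_stack(demo)
print(result.shape, result.dtype)
```

In the VI, the Python node would point at the module file containing this function and call it by name; the return terminal type has to match what the function returns.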

rolfk · ‎09-14-2022

A Python list is an object of elements. For multidimensional lists it needs one object for each row and another for each plane. Object creation in Python is very slow compared to normal execution, so yes this is going to cost a lot of time to do.

A NumPy array, on the other hand, is simply one object with some metadata describing the number of dimensions and the size of each dimension, plus the actual data as a single block of memory, pretty much the same as a LabVIEW array. There is still a copy made, since LabVIEW can't just hand off its internal data pointer to an external application like Python, but that is a single copy of one memory block, not a loop within a loop creating umpteen objects and copying some data into each of them.
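The difference can be illustrated with the standard library alone. In this sketch a typed `array.array` stands in for the single contiguous block (NumPy is not needed to make the point), and the helper name `list_object_count` is made up for illustration.

```python
from array import array


def list_object_count(planes, rows):
    """List objects in a nested 3D list: outer list + planes + rows."""
    return 1 + planes + planes * rows


planes, rows, cols = 2, 3, 4

# Nested lists: every row and every plane is a separate list object, and
# each element slot holds a reference to a Python int object.
nested = [[[0] * cols for _ in range(rows)] for _ in range(planes)]
small_count = list_object_count(planes, rows)

# Contiguous buffer: one object holding all pixels as raw 16-bit values.
flat = array("H", [0]) * (planes * rows * cols)
buffer_bytes = len(flat) * flat.itemsize

# Scaled to the 10-image stack from the first post.
stack_count = list_object_count(10, 2048)

print(small_count, buffer_bytes, stack_count)
```

On top of the list objects counted here, the nested-list form also needs one element reference per pixel, which is where most of the per-element work goes.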

Rolf Kalbermatter
My Blog
