I would like to use the LabVIEW 2020 64-bit Python node to process a stack of images (tens of 2048 x 2048 16-bit images) previously acquired by a camera in LabVIEW with some custom ML Python code. I could write a test program that runs successfully and passes the images as a UINT16 3D array to the Python node, but the memory transfer from LabVIEW to Python is painfully slow. For instance, passing a modest stack of 10 images (80 MB) takes over 13 s!
I do not know what kind of memory copy / data conversion is performed by LabVIEW under the hood, but in practice it is vastly faster to write the images to disk and read them back from the Python function... I would like to avoid the disk round trip, and I think this kind of Python ML application might be useful to other people. Is there a way to achieve a faster memory transfer?
Note: Fiddling around, I could achieve an over 4x speedup by reshaping the 3D array to 1D before passing it to the Python node, but it is still orders of magnitude slower than the speed I would expect on this computer (Intel i9-10900X 3.7 GHz, 128 GB RAM, Windows 10 x64 Enterprise).
The speedup from serializing 3D arrays to 1D is actually more modest than I thought (about 15%). The 4x speedup was actually observed when passing a 3D array whose largest dimensions come last rather than first.
So, in summary, UINT16 3D arrays are transferred to the Python node at approximately 25 MB/s. This is still very slow for a memory transfer: writing the 3D array to a binary file from LabVIEW and reading it from the Python node is about 25x faster on the same machine.
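For reference, the disk route described above can be done in one call on the Python side. This is only a sketch: the file name and stack shape are illustrative, and it assumes LabVIEW writes the raw uint16 data with no header.

```python
import os
import tempfile

import numpy as np

# Sketch of the disk route: LabVIEW writes the raw uint16 stack to a
# binary file; Python reads it back in a single call (shape is illustrative).
shape = (10, 2048, 2048)                      # 10 images, 80 MB total

# Stand-in for the stack LabVIEW would write to disk.
stack = np.random.randint(0, 2**16, size=shape, dtype=np.uint16)
path = os.path.join(tempfile.gettempdir(), "stack.bin")
stack.tofile(path)                            # headerless binary dump

# Python side: one bulk read, then reshape (no per-element copying).
loaded = np.fromfile(path, dtype=np.uint16).reshape(shape)
print(np.array_equal(stack, loaded))          # True
```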
If you already have a good implementation in Python, why not implement the whole thing in Python without dragging LabVIEW in as an intermediary?
Any cross-language data transfer is going to be slow because multiple handshakes and conversions are required. Personally, I am against any large data transfer between programming languages; instead, you can use TCP/IP, ActiveX, or other proven methods of inter-process communication.
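To illustrate the socket-based route, here is a minimal sketch of moving an image between two endpoints as raw bytes. A `socketpair` stands in for a real LabVIEW-to-Python TCP connection; in practice one side would be a LabVIEW TCP Write, and a real receiver would loop until all expected bytes arrive.

```python
import socket

import numpy as np

# socketpair() stands in for a LabVIEW <-> Python TCP connection.
sender, receiver = socket.socketpair()

frame = np.arange(16, dtype=np.uint16).reshape(4, 4)   # toy 4x4 "image"
sender.sendall(frame.tobytes())                        # raw uint16 bytes, no conversion

# For this tiny payload a single recv suffices; a real receiver would
# loop until frame.nbytes bytes have been collected.
data = receiver.recv(frame.nbytes)
restored = np.frombuffer(data, dtype=np.uint16).reshape(4, 4)
print(np.array_equal(frame, restored))                 # True
```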
Without seeing the code it is not possible to say if there are other things happening that are causing the delay. Given that you say it is an ad hoc collection of drivers and VIs, it doesn't sound like there is any design or architecture for this system, and who knows where inefficiencies may lie.
A few random thoughts from someone who hasn't used the Python nodes much:
-Are you receiving back a 3D array? Could you try the "flatten to 1D" trick for received data as well? In other words, are you sure it's the "getting data into Python" part that's slow, not the Python part itself or the return from Python? Oftentimes you can speed up large data transfers by preallocating arrays, but that's more on the LabVIEW side of things than the Python one.
-The Help for the Python node mentions that, by default, arrays are converted to Python lists. You can right-click that and tell it to convert it to a NumPy array. I have no idea how Python handles lists, but I bet you it's not the same way that LabVIEW handles arrays. NumPy arrays might be better... who knows.
Unfortunately you are probably having to step through each element of the array one by one and copy it from LabVIEW's memory into Python's memory. The Python node may not know the size of the array when it creates its own memory space and will have to allocate space, fill it, reallocate it, fill it, and so on. That's likely what's taking so long. Memory reallocation of massive datasets will nuke performance in LabVIEW, and it's probably the case in your example.
See this post:
The top-voted answer has some issues (for example, it only uses 10k elements, where you have 4 million per image), so review the second answer, which notes that Python lists have no built-in preallocation. NumPy does have preallocation:
so that might help, but that depends on whether the LabVIEW Python node preallocates NumPy arrays. Honestly I don't know. I would first try switching from Python lists to NumPy arrays to see if that helps. If that doesn't work, and the issue is indeed array reallocation, you'll need to switch to something like TCP. That way, you can load a session, preallocate your memory, then transfer your images via TCP into the preallocated space.
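The preallocation idea can be sketched like this: reserve the full block once with `np.empty`, then copy each incoming frame into its slice so no reallocation ever happens (sizes and the frame source are illustrative):

```python
import numpy as np

# Preallocate the whole stack once; NumPy reserves one contiguous block.
n_frames, height, width = 10, 2048, 2048
stack = np.empty((n_frames, height, width), dtype=np.uint16)  # single allocation

for i in range(n_frames):
    # Stand-in for a frame arriving from the camera / TCP stream.
    frame = np.full((height, width), i, dtype=np.uint16)
    stack[i] = frame          # in-place copy into the preallocated slice

print(stack.shape, stack.nbytes)  # (10, 2048, 2048) 83886080
```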
Then again the bottleneck might not be preallocation, but that's where I'd start.
PS: Assuming you have 30 images you'll have ~126 million array elements. If Python doubles its memory allocation each time, and it starts with 1 element, you'll have to reallocate memory 27 times, possibly copying the entire dataset each time unless it can allocate contiguous memory, which may or may not be possible.
The whole system is complex (and requires a combination of LabVIEW and Python for the reasons described above), but to isolate and test the array transfer speed to the Python node I have written an extremely simple VI passing a stack of images (3D array) and reading back an image. Please see it below.
Brilliant insights! Switching from a Python list to a NumPy array, the transfer is over 20 times faster! I can now transfer 80 images (8 MB each) in less than one second, which is about 25% faster than saving the 3D array to my fastest local disk and reading it from Python. Serializing the 3D array to 1D now slightly decreases performance (possibly due to the overhead of the LabVIEW reshape operation).
The only nagging detail is that LabVIEW uint16 arrays seem to be transferred to Python as uint32 NumPy arrays, which in practice means that the performance could be even slightly better.
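If the node really does hand the data over widened to uint32, a cheap workaround on the Python side is to cast back (this sketch assumes the widening behaviour described above; the cast copies, but halves the footprint of everything downstream):

```python
import numpy as np

# Stand-in for an array the node delivered as uint32 instead of uint16.
arr_u32 = np.arange(8, dtype=np.uint32)

# astype copies here (the dtype changes), but downstream processing
# then works on half the memory.
arr_u16 = arr_u32.astype(np.uint16)

print(arr_u16.dtype, arr_u16.nbytes)   # uint16 16
```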
Regarding your question on the returned array: it is actually much smaller than the input array, and preallocation should be possible since the size and type of the returned array are set in the LabVIEW VI (the same is true for the input array; its size is completely deterministic).
Thanks a lot!