
State machine architecture, shift registers, subVI memory handling

I am having some performance issues with my code and I am wondering if the way I have designed it is causing memory problems. The code is a state machine in a timed loop, with the state machine logic inside a subVI. The main VI stores data in shift registers and passes it into the subVI via a cluster (defined in a strict type def control). The images below show an example of what I mean. The code is in a subVI strictly to keep it organized; it does not need to be a subVI. My real code is much larger, however, with multiple subVI state machines in the same loop. Some run in parallel, while others have data dependencies on each other and so must run sequentially (there are no dependency loops).

 

On the first iteration of the loop, some of the state machine VIs load calibration data from CSV files (and I am sure this happens only on the first loop iteration, not repeatedly). They store some moderately large 1D and 2D arrays of double values in the data cluster held in the shift register. Without loading the CSV files the VI runs just fine. When the cal loading is enabled, the CPU usage goes up and I lose communication with the RIO. Even if I do nothing with the data, just having it loaded causes a large increase in CPU load. What I am wondering is whether this is a memory- or CPU-inefficient way of storing a large buffer of data. Is the memory buffer that stores the clusters in the shift register being recopied every time the subVI is called? Would it help to make the VI inlined or reentrant?

 

After reading the "Developing Efficient Data Structures" article, it seems like just bundling and unbundling the array is potentially causing the data to be recopied. This would be very bad, as it happens multiple times each loop iteration. I would not have thought this copied the data, since it is just a wiring operation. I only used clusters to keep things organized; without them, each subVI would have many wires in and out, and it would be very difficult to decipher which signal is which. It seems that both practices I use purely for the sake of organization and readability (subVIs and clusters) have a detrimental effect on performance. Is there a way to get the organizational benefits of subVIs and clusters without the data overhead?

 

In a related question, are global variables the most efficient way to pass data between parallel running loops? I have read that local and global variables copy the data into another buffer when reading or writing. Is there a way to get around this? In my code I use variables and there is always a single writer and multiple readers. For each reader to have to copy the data into their own buffers seems inefficient.

 

Thanks for any helpful comments on this issue.

Message 1 of 11

No, global variables would probably be the absolute worst method you could use, for many reasons. The same goes for local variables. I would look at using either an Action Engine (LV2-style global) or a single-element queue to hold your data. Both offer some protection against race conditions and are preferred methods for passing data.
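The single-element-queue idea can be sketched in a text language. Here is a rough Python analogy (the data layout and function names are hypothetical, and LabVIEW itself has no text API like this): one slot holds the shared data, and each accessor must dequeue the element before touching it, which is what serializes access and prevents races.

```python
import queue

# Rough analogy of a LabVIEW single-element queue (SEQ): the queue
# holds exactly one element. A caller dequeues it, uses or updates
# it, and enqueues it back; while one caller holds the element,
# any other caller blocks on get(), preventing race conditions.
seq = queue.Queue(maxsize=1)
seq.put({"gain": 1.0, "offset": 0.0})  # initialize the single element

def update_gain(new_gain):
    data = seq.get()         # take the element (other callers now block)
    data["gain"] = new_gain  # modify the shared data
    seq.put(data)            # put it back for the next caller

def read_gain():
    data = seq.get()
    value = data["gain"]
    seq.put(data)            # always return the element, even on a read
    return value

update_gain(2.5)
print(read_gain())  # -> 2.5
```

The same dequeue/modify/enqueue discipline is what makes the LabVIEW pattern safe: the data can never be read and written at the same time.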

 

Is this a timing-critical task? If so, you may want to pull the elements of the state machine that will not run very often (initialization, for example) outside the state machine. Also, separate the UI from the processing tasks; don't put the UI code in with the processing code. UI tasks will impact your performance.



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
Message 2 of 11

Remember that if you fork a wire you make a copy of the data. One efficient way to send the contents of a large array to another location is by reference. Look for examples of data value references and the In Place Element Structure.
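Since LabVIEW wires carry data by value, a text-language analogy may help. This Python sketch (array contents and function names are hypothetical) contrasts by-value copying, which is roughly what a forked wire costs you, with in-place access through one shared buffer, which is roughly what a data value reference plus In Place Element Structure gives you:

```python
import copy

big_table = [float(i) for i in range(1000)]  # stands in for a large calibration array

def offset_by_value(data):
    # Like forking the wire into a modifying function: the callee
    # works on its own buffer, so the original is untouched, but
    # time and memory are spent duplicating all 1000 elements.
    local = copy.deepcopy(data)
    local[0] += 1.0
    return local

def offset_in_place(data):
    # Like a DVR used with an In Place Element Structure: the one
    # shared buffer is modified directly and nothing is duplicated.
    data[0] += 1.0

shifted = offset_by_value(big_table)
print(big_table[0], shifted[0])  # -> 0.0 1.0 (original buffer untouched)

offset_in_place(big_table)
print(big_table[0])              # -> 1.0 (the one shared buffer changed)
```

The by-value version is safer but pays a full copy on every call; the reference version touches one buffer, which is why it scales to large arrays.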

 

I think you could put the data value reference into a global variable. Of course, you could also put your finger into a light socket. Just because you can doesn't mean it is a good idea. 😛

 

 

=====================
LabVIEW 2012


Message 3 of 11

 


@SteveChandler wrote:

Remember that if you fork a wire you make a copy of the data. One efficient way to send the contents of a large array to another location is by reference. Look for examples of data value references and the In Place Element Structure.

 

 


Simply from a technical aspect, I recall some folks from NI saying that the compiler does try to minimize data copies, and in some cases a forked wire will not result in a data copy. However, if you are trying to keep memory allocation to a minimum, it is a good rule of thumb to assume that every forked wire results in a data copy.

 



Mark Yedinak
Message 4 of 11

Yeah, but without seeing his code it is hard to say. Edit: Wasn't done! I accidentally hit the post button. Hurry...

 

You can get a good idea of what is going on by showing buffer allocations. Tools/Profile/Show Buffer Allocations.

 



Message 5 of 11

 


@SteveChandler wrote:

Yeah, but without seeing his code it is hard to say.


I agree completely. I was just stating that it isn't always the case. It is a safe assumption though.

 



Mark Yedinak
Message 6 of 11

I have found it more convenient to keep large arrays out of the cluster. I have not really checked whether there is a performance difference, but it seems reasonable that keeping the arrays out of the cluster might avoid some memory reallocations when the array is populated. In the past I usually passed the arrays via shift registers or Action Engines; now I would consider queues or LVOOP options as well.

 

Lynn

Message 7 of 11

I had a similar problem, and I ended up using most of the techniques suggested in the previous responses: a functional global variable (action engine), data value references, the In Place Element Structure, monitoring buffer allocations, and also watching the memory usage while running key parts of the program step by step. In the end I was able to improve things a lot, although I never exhaustively characterized which of these steps mattered most. By the way, I suspect that the little black squares that indicate data copies when "Show Buffer Allocations" is selected are not always correct; sometimes there is no data copy even when there is a black square.

Message 8 of 11

Thanks for the comments; it has all been helpful. Mark, you mentioned that UI tasks should be handled in other loops. They actually are. This code will be compiled and run as a real-time build specification on a CompactRIO, so the front panel items are not meant for a user; they exist strictly to pass data to other loops running in parallel. The reason multiple loops are used is that they perform lower-priority tasks. For instance, one loop stores data in shared variables for user interface VIs running on local PC targets; this loop does not run as fast as the main control loop. Another loop runs a statechart, which I don't want in the main control loop because it also runs at a slower rate. Local variables pass information among the loops. I purposely have all controls and indicators in the fastest loop, while local variable access, which has more overhead, is used in the slower, less important loops.

There are no race conditions: every variable has only one writer and one or two consumers. Action engines and queues seem like an overcomplication of a very simple concept, and it would be a lot of work to replace all the variables. I don't think the variables are causing a big performance issue, but I was wondering if there is a better way. Would the memory/performance benefit really justify complicating the code? The main issue is the arrays.

 

I did some experimentation, and it seems the real problem is passing the arrays into and out of the subVI. I tried moving the arrays out of the cluster and giving each one its own shift register and in/out terminals on the subVI, but that did not improve things. Using Show Buffer Allocations, I could see that just passing the cluster/arrays into the VI allocated new buffers. I then tried using feedback nodes to store the arrays, so they are never passed into and out of the subVI, and this worked: CPU usage did not take a huge hit when the cals were loaded. I will try the data value reference next. The only thing that concerns me with that method is that the In Place Element Structure seems to require that I read AND write the data. What if I only need to read? Once I read the data into the arrays from the files, I never need to write to them again; I am just going to use them as lookup tables to interpolate values from. According to the help file, the In Place Element Structure replaces the data in the original memory, but I am not modifying the data. Can I read it without writing to it?

Message 9 of 11

Local variables will create a copy of the data. If you are working with large data sets, I think you would be best off using DVRs. Even though the DVR appears to write the data back, I don't think it actually does; it works with the memory in place.

 

Large data sets can be a problem. Can you decompose the data and work on smaller portions of it? If not, you should use DVRs. I would also recommend looking at encapsulating the large data sets in LVOOP classes: you can use the DVR internally and provide a clean interface for working with your data.
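The encapsulation idea can be sketched in Python as a rough analogy (the class, its methods, and the sample data are all hypothetical, not any LabVIEW or NI API): the large calibration table lives in one shared buffer inside the class, and callers only ever receive small interpolated scalars, never a copy of the whole array, which mirrors a LVOOP class holding a DVR internally.

```python
import bisect

class CalTable:
    """Analogy to a LVOOP class wrapping a DVR: the calibration
    table is stored once, and read-only lookups return scalars,
    so the large arrays are never copied out to callers."""

    def __init__(self, xs, ys):
        self._xs = xs  # monotonically increasing breakpoints
        self._ys = ys  # calibration values, one per breakpoint

    def interpolate(self, x):
        # Linear interpolation between neighboring breakpoints;
        # only two elements of the shared table are ever touched.
        i = bisect.bisect_right(self._xs, x)
        i = min(max(i, 1), len(self._xs) - 1)  # clamp to valid segment
        x0, x1 = self._xs[i - 1], self._xs[i]
        y0, y1 = self._ys[i - 1], self._ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

cal = CalTable([0.0, 1.0, 2.0], [10.0, 20.0, 40.0])
print(cal.interpolate(1.5))  # -> 30.0
```

This also answers the read-only concern above in spirit: the interface exposes lookups only, so nothing ever needs to write the table back after it is loaded.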



Mark Yedinak
Message 10 of 11