Inline subVIs and memory usage

As an example here is how I am rocking out to 80's hairband music from the internet right now (minus a few event handlers I omitted for brevity).

 

MP3Streamer.png

It is a similar flow to what you have. Each row represents a step of data processing and, except for the ends, each is both a producer and a consumer. The top row gathers raw data from the source. The next step breaks the raw data into physical frames. Since the data for a given audio frame can span multiple physical frames, the next step consumes physical frames and, when a complete logical frame is present, sends it off to be decoded. The frame is decoded and then sent to the player.
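LabVIEW diagrams are graphical, so the chain above has no direct textual form, but the producer/consumer idea can be sketched in Python (all names here are hypothetical; `queue.Queue` stands in for a LabVIEW queue refnum and a `None` sentinel stands in for shutdown):

```python
import queue
import threading

def stage(inq, outq, work):
    """One pipeline step: consume from inq, process, produce to outq.
    A None sentinel propagates shutdown down the chain."""
    while True:
        item = inq.get()
        if item is None:
            if outq is not None:
                outq.put(None)   # pass the shutdown downstream
            break
        result = work(item)
        if outq is not None:
            outq.put(result)

# Hypothetical stand-ins for the stream -> frames -> decode -> play chain.
raw_q, frame_q = queue.Queue(), queue.Queue()
played = []
threads = [
    threading.Thread(target=stage, args=(raw_q, frame_q, str.upper)),
    threading.Thread(target=stage, args=(frame_q, None, played.append)),
]
for t in threads:
    t.start()
for chunk in ["frame1", "frame2"]:
    raw_q.put(chunk)
raw_q.put(None)   # the stream is finished
for t in threads:
    t.join()
# played == ["FRAME1", "FRAME2"]
```

Each stage runs in parallel and only touches its two queues, which is the same decoupling the rows in the diagram give you.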

 

This is quite like your system: multiple parallel data-driven steps. As you can see, there is no need for sprawling BDs. I like being able to see the steps from the top down; I am slightly bummed that there is a bit of right-to-left as you move down. The key I have found is to put the loops into subVIs when possible (I like to use the loop glyph supplied with the icon editor).

Message 11 of 16

It's been a long time since I worked with it, but I believe IMAQ already has "C-like" buffers, where you have to explicitly make copies of the data when you need them. There shouldn't be any performance effect from passing an IMAQ buffer into a SubVI. So you might be able to boost performance by making more use of the IMAQ functions.

 

Once you copy the data from the IMAQ buffer into a LabView 2D array, it is up to the runtime engine to decide whether it needs to make a copy of the data in order to maintain LabView's dataflow-based execution rules. Tools -> Profile -> Show Buffer Allocations is a useful tool for showing where copies are being made.

 

The structure you describe should be pretty easy to break into SubVIs. We use a pattern we call "scuttling", where the Top-Level VI instantiates queues/notifiers and passes them into SubVIs that "run forever" until scuttled by killing the queue/notifier in the caller. The TLVI looks like this:

TLVI.png

 

and each SubVI looks like the one below. LabView is perfectly happy having a dozen (or a hundred) of these running in parallel, all waiting on an event from the Queue/Notifier. When the notifier is killed, Dequeue Element throws an error and the SubVI exits. (Timing out does not throw an error, so by adding a timeout this becomes a task that waits n milliseconds between iterations. We use this all the time to run periodic tasks.)
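A rough Python analogue of the scuttling idea (the names are mine, not a LabVIEW API; a `SCUTTLE` sentinel stands in for the caller killing the queue, since Python queues cannot be destroyed out from under a waiter, and the timeout branch models the periodic-task variant):

```python
import queue
import threading

SCUTTLE = object()   # stands in for killing the queue/notifier in the caller

def scuttled_worker(q, on_item, on_timeout=None, timeout_s=None):
    """Run forever: handle queued items; on timeout run periodic work;
    exit when the caller scuttles the queue."""
    while True:
        try:
            item = q.get(timeout=timeout_s)
        except queue.Empty:
            if on_timeout is not None:
                on_timeout()   # periodic task, fired every timeout_s
            continue
        if item is SCUTTLE:
            break              # analogue of Dequeue Element erroring out
        on_item(item)

q = queue.Queue()
seen = []
t = threading.Thread(target=scuttled_worker, args=(q, seen.append))
t.start()
q.put("tick")
q.put(SCUTTLE)   # the Top-Level VI "kills the queue"
t.join()
```

Dozens of these can wait in parallel, each on its own queue, and the caller shuts them all down by scuttling each queue.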

 

You can do a Harvard parallel (pipelined) architecture very easily this way, where Task A -> Task B -> Task C all run in parallel, each reading from a queue, doing some work, and writing to a queue.

 

scuttling-pattern.png

 


LabView queues are very robust but I do think there is a buffer allocation when you dequeue an element, so you might still want to work with pointers/IMAQ refnums.

Message 12 of 16

Despite what the buffer viewer says, there is no buffer allocation when you dequeue an element (which is why single-element queues are so fast for singleton data storage). The buffer viewer is showing the default data allocation that is only used if there is no data in the queue. You can verify this fairly easily by loading a queue with a 1 million point array, then dequeueing it. Watch the OS task manager for memory allocation as you step through the code and you will see the operation is in place.

Message 13 of 16

WOW! That is very clear and concise! I am assuming each of those rows is a different class? I am familiar with C++ object-oriented programming but I've yet to delve into LVOOP. That's next, right after Core 3, which I am frantically trying to finish so I can make sure I am doing things correctly for my current application.

 

Regarding the structure you have here, let me know if this is how it works:

 

  1. All classes initialize.
  2. The Decode class creates queues and passes them to an audio loop. The audio loop then deals with data as it comes across the queues and waits.
  3. Similarly, the unpack class creates some queues and passes them to the decode loop. The decode loop then deals with data as it comes across the queues and waits.
  4. Repeat this until you get up to the stream loop, where you push a URL to the stream loop and it constantly pulls from the stream.

I'd be interested in seeing an example of the data you pass from the initialize queue methods to the loop methods. Is it just a data queue and a message-passing queue?

Message 14 of 16

Rob, luckily the data from the camera is converted to an array, pushed to a queue, pulled from the queue, and then pushed to the GPU. If your hypothesis is correct, then removing the queue step should mean no copy is made.

 

Regarding the "C-like" buffers, these would be buffers in RAM, not on the framegrabber or camera, correct? One of the issues we have with performance is that there appears to be a bottleneck as we loop over our Grab Acquire.

Grab Acquire.png

 

One of our goals with this code is a control application, which means pulling data off the camera in the smallest number of lines (e.g. 10 lines of 896 pixels, or 1 line of 896 pixels). Our camera can deliver lines at ~200 kHz, and as long as we acquire 1000+ lines in Grab Acquire we don't have any issues with turnaround time. However, even if we remove the "IMAQ Image to Array" function as well as the push to the queue, we find that as we reduce the number of lines we start to miss lines, because the amount of time spent in Grab Acquire appears to be too high. Even playing with the duty cycle (e.g. acquire a few lines at 200 kHz and then have a long period of downtime), we found it hard to get the loop time below 300 us.

 

A quick question regarding your "scuttling" pattern: you would have a separate message-passing queue for each subVI, correct?

Message 15 of 16

@ColeV wrote:

WOW! That is very clear and concise! I am assuming each of those rows is a different class? I am familiar with C++ object-oriented programming but I've yet to delve into LVOOP. That's next, right after Core 3, which I am frantically trying to finish so I can make sure I am doing things correctly for my current application.

 

Regarding the structure you have here, let me know if this is how it works:

 

  1. All classes initialize.
  2. The Decode class creates queues and passes them to an audio loop. The audio loop then deals with data as it comes across the queues and waits.
  3. Similarly, the unpack class creates some queues and passes them to the decode loop. The decode loop then deals with data as it comes across the queues and waits.
  4. Repeat this until you get up to the stream loop, where you push a URL to the stream loop and it constantly pulls from the stream.

I'd be interested in seeing an example of the data you pass from the initialize queue methods to the loop methods. Is it just a data queue and a message-passing queue?


You are close, though I prefer to think of the data flowing down. The stream class consumes data from the network location (via TCP/IP) and produces data for the frames class. The frames class consumes the string data, which is free-form, and syncs it to produce physical frames of MP3 data for the unpacker. The unpacker consumes the physical frames and produces the logical frames, which contain the compressed audio data. The decoder consumes the compressed data and produces audio waveforms. The audio class puts the waveform data into the sound buffer. I see the data as being pushed rather than pulled, myself. Each class has some state data it needs and typically does what it can; for example, the stream may produce one and a half physical frames of data. The frames class consumes the full frame and leaves the partial frame to be combined with the next input from the stream.

 

1. That is correct, all classes initialize. The only input is an optional limit to the queue size. I use the dice typecast to string to generate a random "name". The queue function that follows obtains a producer and a consumer queue refnum from that name. Both are passed to the consumer.

2. That is correct, the decoder consumes data from the unpack class (logical frames of MP3 data which it converts to frames of audio). The converted audio is fed via the unpack class producer queue to its consumer (the audio class).

 

I call this pattern "data-driven" because I leave the data alone and pass it between queues as is (no messaging in the queue itself). That way, every call to consume a piece of data does not have to begin with a message comparison (usually it is 'data' a million times and 'stop' once, which costs me a million and one comparisons), and I minimize the packing and unpacking of data, which creates opportunities for unnecessary data copies. The stream class produces a string queue, the frames class produces a queue of MP3 Physical Frame Class, and finally the Decode class produces a queue of waveforms. The messaging is tucked into the two queue refnums I pass; both are created by obtaining from the same (random) name. The producer uses the producer refnum to produce data until it is done, then immediately releases the queue (but does not force-destroy it). The consumer dequeue has a timeout: it could be a network issue, and I can change the display to reflect this, or it could time out because the queue is empty and the producer has been released. A simple validity check in the timeout case tells me whether the producer is finished. If it is, the consumer shuts down any producer queues it was using and exits. When the last piece of data exits, the audio player waits for the music to end (in case it was a file and you want to hear the end), then releases the final queues and shuts down.
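The two-refnum handshake can be sketched outside of G. A minimal Python analogue (class and method names are hypothetical; Python has no direct equivalent of releasing a named queue refnum, so a released flag plays the role of the validity check):

```python
import queue

class NamedQueue:
    """Sketch of one named LabVIEW queue seen through two refnums:
    a producer side that can be released, and a consumer side whose
    timeout case performs the validity check."""
    def __init__(self, maxsize=0):
        self._q = queue.Queue(maxsize)   # optional limit to the queue size
        self._released = False

    # --- producer refnum ---
    def produce(self, item):
        self._q.put(item)

    def release(self):
        self._released = True            # like Release Queue, no force-destroy

    # --- consumer refnum ---
    def consume(self, timeout=0.05):
        try:
            return self._q.get(timeout=timeout)
        except queue.Empty:
            if self._released:           # producer finished and queue drained
                raise StopIteration
            return None                  # plain timeout (e.g. slow network): keep waiting

nq = NamedQueue()
for frame in ("f1", "f2"):
    nq.produce(frame)
nq.release()                             # producer is done

got = []
while True:
    try:
        item = nq.consume()
    except StopIteration:
        break                            # shut down downstream queues and exit
    if item is not None:
        got.append(item)
```

Because data rides the queue untouched, the consumer never pays a per-item message comparison; shutdown is signaled entirely by the released-and-empty state.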

 

It is great. The stream class can be a file, a string, a URL, an audio CD, a DLNA server, my software-defined radio, or whatever, via dynamic dispatch. The middle three classes I show handle MP3 from all sources; different data types (WAV or CD audio, for example) simply replace those classes.

 

I also have an optional file-based ring buffer so I can create a TiVo-like buffer to pause my internet radio, live FM or whatever simply by inserting a new class in the chain.  I can buffer the audio data, or the compressed data based on where I place the buffer class and what data it expects.

 

My usual advice here is "No LVOOP in a loop". Things like dynamic dispatch are fabulous for switching modules and the like, but not quite perky enough to be in tight loops. I also try my best to have methods which contain loops, and to avoid loops which contain methods. Inlining has gotten much better, and the compiler has as well, but it is still easy to find cases the compiler cannot handle optimally. A method usually has an unpack, do stuff, repack pattern. Putting it in a loop leads to packing and unpacking repeatedly; usually that is in place, but it is very painful when it is not. Instead I try to have a method which unpacks, does a bunch of stuff in a loop, and packs. Oh, and those property nodes look very pretty, but they once brought my code to its knees because they are not inlinable as of yet. The underlying accessor VIs are inlinable, but not the node itself. Still smarting from that one.
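The unpack/repack point translates outside of G. A hypothetical Python sketch, where `AudioBuffer` stands in for a LVOOP class's private data:

```python
class AudioBuffer:
    """Hypothetical stand-in for a LVOOP class's private data cluster."""
    def __init__(self, samples):
        self.samples = samples

def scale_sample(buf, i, gain):
    """'Method inside a loop': every call unpacks and repacks the object,
    so a caller looping over i pays for a copy per iteration."""
    s = list(buf.samples)      # unpack
    s[i] *= gain               # do stuff
    return AudioBuffer(s)      # repack

def scale_all(buf, gain):
    """'Loop inside a method': unpack once, run the tight loop on the
    bare array, pack once."""
    s = list(buf.samples)      # unpack once
    for i in range(len(s)):    # tight loop, no per-iteration copies
        s[i] *= gain
    return AudioBuffer(s)      # pack once
```

Both produce the same result, but looping `scale_sample` from the caller copies the sample array every iteration, while `scale_all` copies it twice in total.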

 

 

Message 16 of 16