LabVIEW Development Best Practices Blog


Using Variant Attributes for High-Performance Lookup Tables in LabVIEW

Elijah_K
Active Participant

If your instinct is to use an array to manage a lookup table, keep reading - there's a far better way.  In case you're not familiar with the term, a 'lookup table' is a large table of values that need to be retrieved programmatically, in arbitrary order - you may also see these referred to as a 'dictionary' or 'map.'  As the name implies, lookup tables are typically used when you want to retrieve a specific element or value from a large dataset based on some unique identifier, or key.  If the data you need is stored in a large, unsorted array, the only way to retrieve an item is to inspect every element until you find the one you need - this is often referred to as a 'brute-force' search, and it is inefficient and slow.  As I'll explain, variant attributes provide a highly efficient and performant alternative to this approach in LabVIEW.
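
LabVIEW is a graphical language, so the examples in this post are block diagrams, but if you think in text, here is a rough C++ sketch of the difference (the types and names here are purely illustrative, not anything from the Measurement Utility):

    #include <map>
    #include <optional>
    #include <string>
    #include <vector>

    struct Entry { std::string key; double value; };

    // Brute force: inspect every element until the key matches - O(N).
    std::optional<double> LinearLookup(const std::vector<Entry>& table,
                                       const std::string& key) {
        for (const auto& e : table)
            if (e.key == key) return e.value;
        return std::nullopt;
    }

    // Lookup table: a tree keyed by the identifier finds it in O(log N).
    std::optional<double> MapLookup(const std::map<std::string, double>& table,
                                    const std::string& key) {
        auto it = table.find(key);
        if (it == table.end()) return std::nullopt;
        return it->second;
    }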

A real-world scenario that may help cement this concept is the task of looking up a word in a dictionary - the word itself is the unique identifier, and the information we hope to retrieve is the definition of that word. No one would ever try to find a word by inspecting every page in order, as this could take an extremely long time.  Instead, we quickly skip to the word we want thanks to the fact that the dictionary stores its words in sorted order.  A typical dictionary (especially a print edition) also has the luxury of being a predefined dataset that remains fixed - words are not regularly added or removed.

In software, large datasets are often dynamic - elements are regularly added, changed or removed, which necessitates an efficient algorithm to easily find, retrieve, store and modify these items.  I recently published this Measurement Utility, which employs several lookup tables that I'll use as examples.  If you're not familiar with it, this utility is an architectural illustration of how to abstract measurements, the hardware that the measurements interact with, and the results that individual measurements return.  The framework keeps track of the measurements that are currently running, the hardware that the system has access to, results from completed measurements, and other characteristics of the system.  Since the framework is designed to run arbitrary measurements with arbitrary (but compatible) hardware, I needed a dynamic and performant way to easily store and retrieve various pieces of information.

The Measurement Utility uses a total of seven lookup tables to store various pieces of information, all of which are retrieved using a unique identifier string. If you're interested in exploring any of these and their use, look for them in the private data of the Controller Object (in user.lib). The data-type of the returned value is predefined and indicated in parentheses:

- Table 1: Given the name of a measurement, return the queue to send messages to this task (Queue Reference)

- Table 2: Given the name of a measurement, return the Object representing this measurement (Class)

- Table 3: Given the name of a piece of hardware, return the Object representing this device (Class)

- Table 4: Given the name of a measurement, return an array of device types (e.g., DMM, FGEN) it needs (Array of Strings)

- Table 5: Given the type of hardware needed, return an array of device IDs (e.g., PXIe-4110) that match the type (Array of Strings)

- Table 6: Given the device ID of a piece of hardware, return whether or not it is currently in use (Boolean)

- Table 7: Given the name and time of a previously run measurement, return the Object representing the results of this measurement (Class)

Before going any further, we need to understand what a variant attribute is.  A variant is a LabVIEW data-type that can encapsulate and store any piece of data; however, when using it as a lookup table, nothing is actually stored in the variant itself.  What we care about for the sake of creating a lookup table is a little-known property of the variant data-type: it can store attributes!  If you look under the 'Cluster, class and variant' palette and dig into the 'variant' sub-palette, you'll find the API we'll use - in particular, we care about the following functions:

- Set Variant Attribute

- Get Variant Attribute

- Delete Variant Attribute

imageA2.png

Figure 1: The API for storing and retrieving key-value pairs using variant attributes is straightforward and easy to use

This very simple API takes advantage of high-performance algorithms under the hood to sort, manage and retrieve key-value pairs.  As usual in LabVIEW, you don't have to know or understand how this mechanism works to take advantage of it, and it saves you the trouble of keeping a dataset ordered and writing the algorithms necessary to parse it efficiently (think hash tables and linked lists).
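
If it helps to see the three functions in textual form, here is a hedged C++ analogue of what they do conceptually - a std::map of attribute names to values standing in for the variant's attribute table, with std::any standing in for the stored variant values (this is an illustration, not NI's actual implementation):

    #include <any>
    #include <map>
    #include <string>

    // A stand-in for a variant's attribute table: names map to values.
    using Attributes = std::map<std::string, std::any>;

    // Set Variant Attribute: insert or replace; 'replaced' reports whether the key existed.
    void SetAttribute(Attributes& v, const std::string& name,
                      std::any value, bool& replaced) {
        replaced = v.count(name) > 0;
        v[name] = std::move(value);
    }

    // Get Variant Attribute: 'found' reports whether the key exists.
    std::any GetAttribute(const Attributes& v, const std::string& name,
                          bool& found) {
        auto it = v.find(name);
        found = (it != v.end());
        return found ? it->second : std::any{};
    }

    // Delete Variant Attribute: returns true if something was removed.
    bool DeleteAttribute(Attributes& v, const std::string& name) {
        return v.erase(name) > 0;
    }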

One of the logical next questions is, 'how much more performant is a variant-attribute table than simple brute force?'  The actual speed will obviously vary depending upon where in an array the data value happens to be located, so to compare algorithms we examine the worst-case scenario.  The worst case for brute force is that you'll have to look through every single element before finding the item at the very end, so we denote this as O(N) complexity, meaning there is a linear relationship between the number of elements and the time the operation may take.  Variant attributes are implemented using a C++ std::map structure (this has not always been the case - only in more recent versions of LabVIEW), which has O(log N) complexity - a considerable difference even as N approaches a modest size.  For one million elements, the worst-case linear search inspects all 1,000,000 of them, while a balanced tree needs only about 20 comparisons (log2 of 1,000,000).
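
A quick way to convince yourself is to time both approaches. Here is a minimal C++ timing sketch (not the benchmark behind Figure 2) that looks up the worst-case element both ways:

    #include <chrono>
    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    int main() {
        const int n = 100000;
        std::vector<std::pair<std::string, int>> array;
        std::map<std::string, int> table;
        for (int i = 0; i < n; ++i) {
            std::string key = "key" + std::to_string(i);
            array.emplace_back(key, i);
            table[key] = i;
        }
        // Worst case for the linear search: the last element.
        const std::string target = "key" + std::to_string(n - 1);

        auto t0 = std::chrono::steady_clock::now();
        int hit = -1;
        for (const auto& e : array)              // brute force: O(N)
            if (e.first == target) { hit = e.second; break; }
        auto t1 = std::chrono::steady_clock::now();
        int hit2 = table.at(target);             // tree lookup: O(log N)
        auto t2 = std::chrono::steady_clock::now();

        using us = std::chrono::microseconds;
        std::printf("linear: %lld us (found %d)\n",
                    (long long)std::chrono::duration_cast<us>(t1 - t0).count(), hit);
        std::printf("map:    %lld us (found %d)\n",
                    (long long)std::chrono::duration_cast<us>(t2 - t1).count(), hit2);
    }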

ImageB.png

Figure 2: A comparison of the two algorithms reveals a considerable difference in performance - the variant-attribute table is represented by the blue line, which is very hard to see next to the linear growth of the array search.

It's also much simpler than writing the code necessary to parse and search an array yourself.  The following block diagram illustrates how Table 2 is used in the Measurement Utility to retrieve the specific object representing a measurement based on its name.

Image C.png

When loading a new measurement into the system, Tables 2 and 4 are populated with information: Table 2 is the only location where the Measurement Object is stored (to avoid data copies), and Table 4 allows quick retrieval of the required hardware types when needed.  You could argue that Table 4 is unnecessary, but since retrieving this information is a common operation (performed every time a user selects a measurement from the UI drop-down), I made the design decision not to fetch the measurement object every single time.

ImageD.png

For those of you familiar with the Actor Framework, the Measurement Class is actually derived from the Actor Class.  Every time a measurement is spun up, the queue for that actor is stored in Table 1, which makes it easy to retrieve whenever a message needs to be sent to that measurement.  Upon completion of the measurement, that queue is deleted from the lookup table.  Measurements return their results (as an object) upon completion, and these are stored in yet another lookup table (Table 7).  This is especially useful, since we can easily accrue a large number of results, and we want users to be able to quickly retrieve and review them.

Hopefully this gives you a good understanding of how variant attributes can be used as a much more efficient alternative to arrays.  If you're interested in learning more, I would encourage you to explore the Measurement Utility referenced by this article. As usual, let me know if you have any thoughts or feedback, or want additional explanation!

Elijah Kerry
NI Director, Software Community
Comments
soupy
Member

can you give any lessons learned on using variant lookup tables? what errors could the variant lookup primitives throw? how should they be handled?

it would probably be worth the time to wrap all this up into a class to incorporate any lessons learned.

Elijah_K
Active Participant

Once I felt comfortable with the concept, I didn't have any major problems.  When retrieving, the only error condition I can think of is 'not found,' which is indicated by the boolean output 'found' - similarly, the only condition to watch when setting a new value is 'already exists,' which is indicated by the boolean output 'replaced.'  Aside from that, the only other potential problem I can think of is casting the returned variant to the wrong data-type, so you will want to catch any errors from the 'Variant to Data' function.
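
In the C++ analogue sketched earlier, those failure modes look roughly like this (illustrative only; std::any_cast throwing plays the role of a 'Variant to Data' type mismatch):

    #include <any>
    #include <cstdio>
    #include <map>
    #include <string>

    int main() {
        std::map<std::string, std::any> attrs;   // stand-in for a variant's attributes
        attrs["reading"] = 3.14;                  // Set: 'replaced' would be false here

        auto it = attrs.find("reading");          // Get: check 'found' first
        if (it == attrs.end()) {
            std::puts("not found");               // the 'found' output was false
            return 0;
        }
        try {
            double value = std::any_cast<double>(it->second);  // ~ 'Variant to Data'
            std::printf("value = %f\n", value);
        } catch (const std::bad_any_cast&) {
            std::puts("stored type was not a double");  // the cast error to catch
        }
    }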

Elijah Kerry
NI Director, Software Community
mike_nrao
Member

Variant attributes are implemented using a C++ std::map structure (this has not always been the case - only in more recent versions of LabVIEW), which has O(log N) complexity

Have you tested this to verify O(log N) in LabVIEW?  I've read elsewhere that Variants in my code may harm performance due to data copies.

I'm curious to see how it performs for both insertion and retrieval as a function of N compared to the "brute-force" linear array search.

crelf
Trusted Enthusiast

I've read elsewhere that Variants in my code may harm performance due to data copies.

I've used variant attributes a lot and have found them to be extremely fast. I don't have any empirical data, but I know they're quick





Active Participant

Lookup is fast. Data copies are slow*. You can minimize them by using the IPE Structure to access a variant, modify it, and cast it back to variant when you're done.

* slow only really matters if you're doing high-throughput data collection or managing a huge data set. In general, don't put waveforms in variant attributes unless you're prepared to manage inplaceness manually.

TurboPhil
Active Participant

FWIW, at our company, our typical use case is to use the variant attributes to store "pointers" to data, rather than the data itself. So, either data value references or array indexes that map to a separate array of data.

For example, for a current value table, all channel data is stored in a DBL array, and then we have a separate "mapping" variant that we use to find the index of a specific channel within that array. It makes it a two-step process to get to the data, but as previously mentioned, variant lookups are fast, and it avoids data copies.

As a general rule, I try to minimize the number of times the variant attributes need to be written. Reading is fast. Writing is slow.
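
For readers who want that scheme spelled out, here is a small hedged C++ sketch of the idea (TurboPhil's actual code is LabVIEW; the class and names here are made up): the bulk data lives in one flat array, and the lookup structure stores only indexes into it.

    #include <cstddef>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // A current value table: channel data stays in one flat array; the map
    // only resolves a channel name to an index. Write the map rarely.
    class CurrentValueTable {
    public:
        void AddChannel(const std::string& name) {   // the slow, rare operation
            index_[name] = values_.size();
            values_.push_back(0.0);
        }
        void Write(const std::string& name, double v) {
            values_[index_.at(name)] = v;            // fast: lookup + in-place write
        }
        double Read(const std::string& name) const {
            return values_[index_.at(name)];         // fast: no copy of the table
        }
    private:
        std::unordered_map<std::string, std::size_t> index_;  // name -> array index
        std::vector<double> values_;                           // the actual data
    };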

Daklu
Active Participant

+1.  I use lookup tables fairly regularly in my code, though they've been small enough that performance wasn't a concern and a pair of arrays (one for keys and one for values) was plenty adequate.

"(this has not always been the case - only in more recent versions of LabVIEW)"

Do you know what version the change was made?  A couple years ago I prototyped some general purpose collection classes and I've been considering revisiting them.  I usually use LV2009 to create reusable code modules.  Does that version use the same search techniques?

DavidStaab
Active Participant

TurboPhil, I think you're describing a hash table.

TurboPhil
Active Participant

DavidStaab wrote:

TurboPhil, I think you're describing a hash table.

Yep. I guess that deviates somewhat from the original "dictionary" functionality outlined in this post, but I think the hash table design pattern--while slightly more complex--is much more flexible, as it can be applied to regularly-updating data (like current value tables), not just static "dictionary" lookup.

crelf
Trusted Enthusiast

TurboPhil wrote:

DavidStaab wrote:

TurboPhil, I think you're describing a hash table.

Yep. I guess that deviates somewhat from the original "dictionary" functionality outlined in this post, but I think the hash table design pattern--while slightly more complex--is much more flexible, as it can be applied to regularly-updating data (like current value tables), not just static "dictionary" lookup.

CVTs and hash tables solve two different issues, but they can be made to fit each other. I like the concept of a hash-table-driven CVT, but I use it only when I really need the performance. Generally, variant-attribute CVTs fit most use cases.





Active Participant

Eli, I feel Fig 2 is misleading to the novice reader. It depicts worst-case access time for the two approaches, but it doesn't accurately represent mean lookup delay for randomly accessed data. In random-access of data sets of even a few hundred elements, a linear search isn't noticeably slower than a search in a balanced binary tree (which is roughly how the variant is implemented). Since variants carry their own hazard of being very slippery while coding, it's sometimes merited to just use a simple 1D array lookup if you're not juggling thousands of elements.

(By "slippery", I mean that there's no type checking on inputs of type variant. You can wire anything to a variant input, including an array of variants, which is interpreted as a single variant.)

AristosQueue (NI)
NI Employee (retired)

Michael Lacasse wrote:

I've read elsewhere that Variants in my code may harm performance due to data copies.

A) Be sure that you're reading up-to-date info. Another developer and I, who really liked variant attributes as a workaround for the lack of any other lookup-table structure, pushed the performance up substantially in LV 2009, achieving the O(log N) behavior.

B) Just because you can retrieve the variant attribute itself in O(log N) does not say anything about the time needed to extract the value of that variant. Lots of people introduce expensive data copies at this stage. For high-speed access, make sure you use the Inplace Element Structure and the "Variant To/From Element" pair of nodes, and if you want to extract the data from the variant and use it outside of the IPE, use the Swap Values primitive, like this:

Untitled.png

AristosQueue (NI)
NI Employee (retired)

TurboPhil wrote:

As a general rule, I try to minimize the number of times the variant attributes need to be written. Reading is fast. Writing is slow.

See the picture I posted in response to Michael Lacasse... the swap trick works for accelerating writing, too, assuming you've already got a variant that contains the expected data type.

TurboPhil
Active Participant

AQ - not to derail the conversation, but can you explain the Swap Values primitive example you posted? It's never been clear to me what that function should be used for, and I don't see how it helps in your example. I presume the point of the code in the example is to do this:

swap_values.PNG

So why is it better to use the "Swap Values" primitive?

Taggart
Active Participant

I'm sure AQ will have a better answer, but try clicking Show Buffer Allocations.  You'll notice in AQ's example that there is no buffer where the array is passed out of the IPE, whereas in your example there is.  My guess is that's why AQ recommends that approach.  If you have a large array, that extra allocation could be a big deal depending on your application.

Sam Taggart
CLA, CPI, CTD, LabVIEW Champion
DQMH Trusted Advisor
Read about my thoughts on Software Development at sasworkshops.com/blog
GCentral
AristosQueue (NI)
NI Employee (retired)

TurboPhil:

Short story:

In some ways, this is just a helpful note to other developers reading my code -- the swap makes it very clear what I just did here. But it can (not necessarily will) have a real impact on compilation. The LV compiler -- like any other compiler -- is not sentient. It recognizes some patterns but not others, and it is often very minor deviations from a pattern that cause less-than-optimal compilation; giving hints to the compiler about what you're doing lets it do better optimization. In this particular case, the goal is to avoid data copies. The Swap primitive is that hint.

Longer story:

Consider the picture you posted: A variant comes in. In memory, that looks like an allocated block that points to a type descriptor and an array. If the compiler does not recognize this situation, it will compile it as "copy the value out of the variant into the output tunnel, then copy the array in the input tunnel into the variant". But if the compiler recognizes this as a swap, it could just exchange the various array pointers and not copy any arrays at all. There are two ways the compiler could recognize this as a swap. The first way is LV R&D can teach the compiler to recognize "if you generate code that says copy-value-to-new-location and that is followed immediately by an instruction overwrite-value-in-this-location then instead, swap the pointer from the output location to this location and then swap the pointer of this location with the input location". The second way is that the programmer declares, "I am doing a swap" and so the compiler knows to do swap instructions right off the bat.

It's the equivalent of a text compiler recognizing that "temp = i; i = j; j = temp" can be replaced with "i = i xor j; j = i xor j; i = i xor j", which is the same number of instructions but saves a memory allocation. The compiler can recognize some special patterns. But if I write this: "temp = i; do_something_else(); i = j; j = temp", it might not recognize the swap optimization.
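
In C++ terms, the three flavors of swap look like this (a sketch; std::swap plays the role of the explicit "I am doing a swap" declaration):

    #include <utility>

    void SwapWithTemp(int& i, int& j) {  // the pattern the compiler must *recognize*
        int temp = i;
        i = j;
        j = temp;
    }

    void SwapWithXor(int& i, int& j) {   // same effect without a temporary
        i ^= j;                          // (caveat: zeroes the value if i and j
        j ^= i;                          //  alias the same variable)
        i ^= j;
    }

    void SwapDeclared(int& i, int& j) {  // declaring the intent outright,
        std::swap(i, j);                 // like the Swap Values primitive
    }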

In the same way, if you do anything other than a swap inside that Inplace Element Structure, it might get in the way of the compiler recognizing the optimization.

The LV compiler keeps getting better at recognizing ever more complex patterns, doing regressive analysis, moving whole code blocks around, testing different compilation variations to see which one actually is more efficient, etc. Someday, I hope to live long enough to see sentient compilers. But even when we have sentient compilers, giving them hints will still make them run faster (there are fewer things to check if you're explicit about what you want to happen), and the code-readability benefit will always apply.

TurboPhil
Active Participant

Well spoken, AQ. I see what you mean in this particular instance, but as is the case with many of these advanced memory management concepts, I'm not sure that I understand it well enough to know when it is best to use it. As such, I'll probably end up being "superstitious" about it, and overusing it when it does not really help. Maybe it's just me, but I really think the advanced LV community would benefit from more detailed explanations of how the compiler works, either through more tutorials like this or possibly dedicated training courses. I feel like I come across an interesting comment like this (usually posted from your account) that seems to just barely scratch the surface of the topic, but it never matures enough for a bonehead like me to fully understand it. [I imagine there are a lot of other LabVIEW users who got into LV from science/engineering (academically, I'm a "physicist", but I have long since transitioned to "bumbling LV software engineer"), who don't have the Computer Science background for a lot of these concepts to be immediately intuitive.]

And, one last aside: your example above is specifically for handling data in a variant, as opposed to data inside variant attributes, right? So it might be a little confusing, given the initial thrust of this article. There's no inplaceness that can be done on reading/writing variant attributes, is there?

David_Staab
Active Participant

TurboPhil wrote:

I'll probably end up being "superstitious" about it, and overusing it when it does not really help.

I'll defer to someone from LV R&D for responding to the rest of your post, but I'd like to respond to this piece. Because the LV linking/compilation process is so mythical -- even "Show Buffer Allocations" should be viewed with a grain of salt, like a weather report -- I have the same tendencies. I had to print this reminder out and tape it to my monitor for the duration of my current project:

Write the software first, then determine if it is too slow, and profile the code to find out where the bottle neck is, and then refactor as needed. Don't optimize for performance before you know whether you need to.

AristosQueue (NI)
NI Employee (retired)

TurboPhil wrote:

I'm not sure that I understand it well enough to know when it is best to use it. As such, I'll probably end up being "superstitious" about it, and overusing it when it does not really help.

This primitive is pretty easy to use without creating problems, though: when you think of yourself as swapping two values, use the primitive. If nothing else, it gives you cleaner wires than the criss-cross that often results otherwise. The compiler will take the hint. In this sense, there's no such thing as "overusing it". The compiler might do exactly the same with or without the primitive, but you'll never introduce a deoptimization by using it.

Now, there are some other "I did this just to trick the compiler into more optimal code" superstitions that bite you in the long run. The LV compiler (as with all compilers) tends to get smarter about good clean code, not hacky code that follows some non-standard pattern. So in general, Staab is right... write your code clean and let the compiler do its thing for you.

AristosQueue (NI)
NI Employee (retired)

David_Staab wrote:

Because the LV linking/compilation process is so mythical

And it is only going to get more so. The compiler is waaaay smarter than its authors at this point. Even describing how many data copies actually get made at run time in some fairly simple examples, most of the compiler authors would get it wrong without very slow analysis by hand. LabVIEW is (finally) crossing the line that C and other languages crossed so long ago --- worry about your algorithms, not the assembly code they become, because the compiler knows more about that than you ever will.

DavidPL
Member

I think this article suggests storing each measurement data set as a different attribute of a variant because Get Variant Attribute is efficient at finding the attribute, and variant attributes can be added at run time, unlike cluster elements. This would be used instead of a 2D array or an array of clusters. LabVIEW has a Search 1D Array function that I would assume is equally efficient. It should not be difficult to keep an array of the names of the parameters to be searched.

My concern is how efficient the variant VIs are at handling large amounts of data, as I would not think that is what variants are designed to do.

Searching for a measurement slows down the system when it is done continuously. I have seen programs where, if a measurement is put in a real-time graph, the program has to search for it every time a point is posted to the graph. Rather than doing this, I use a list of indexes for each plot, data file, etc. I create a pull-down list of the measurements. The user selects the measurement they want, and the system does not have to search for it (the user does not have to type in the measurement name). If there are many measurements, I keep the user list sorted, with an array that translates the list order to the measurement index. If there are a huge number of measurements, I create a tree structure for the user to use. I avoid index changes by adding parameters at the end of the list and allowing for empty parameter indexes. If the list is so dynamic that parameter indexes cannot be kept constant, then do the search once, or rarely, only when the indexes are changed or loaded. Store the names of the parameters for each plot, calculation, and data file, and generate the index tables when the setup is loaded or changed. Just be sure to have a function that updates each list of names when a parameter is renamed.

tstahl
Member

I believe Search 1D Array is a linear search and Get Variant Attribute is a binary search (thus being far more efficient).

Large amounts of data can be handled using a DVR or index to an array in the data type.

rolfk
Knight of NI

tstahl wrote:

I believe Search 1D Array is a linear search and Get Variant Attribute is a binary search (thus being far more efficient).

Large amounts of data can be handled using a DVR or index to an array in the data type.

While you are certainly right about the Search 1D array, I'm pretty sure you are way off on the variant attribute.

For Search 1D Array to do anything other than a linear search, the data would have to be sorted. In general, that is not the case.

Variant attribute names are strings, and again a binary search makes little sense there: keeping an array sorted on strings costs a lot of performance yet offers much less advantage than a hash. Instead, using a more or less sophisticated hash algorithm over the attribute names makes the lookup of an attribute, in the best case, a direct lookup - and depending on the fill factor of the hash table, only a tiny bit less than that.

Rolf Kalbermatter
My Blog
altenbach
Knight of NI

Elijah,

I somehow missed this discussion completely until I saw it mentioned in the NI News.

I have done similar work (with a few twists), starting with my LabVIEW Idea about associative arrays and the extension to my proposed memoization structure discussed in the last paragraph of my comment here.

In my NI Week 2012 talk I presented my associative cache implementation using variant attributes and compared it with an alternative implementation using "search array". Similar to your results, the variant attributes scale much better.

I needed to implement a fixed-size cache, because the entries are relatively large. Once the size is exhausted, I need to overwrite the oldest entry (EDIT: actually, the entry that has not been accessed for the longest time), requiring carrying an "age" array along as well as an array of keys (we need the key to delete an entry!). I keep the actual data in a fixed-size array and only use variant attributes to keep track of [key (attribute) | array index (data)], so while a key is several hundred bytes, the stored data is a simple I32. Even running a simulation overnight in a tight loop does not show any memory leaks.

Since the keys are also fixed size, I have the gut feeling that I could improve the performance by dropping the variant attributes and implementing a fully "in place" algorithm using my own binary search and doing an ultra-simple re-sort on each insert (basically just a subarray shift & replace element). However, a cache operation using my current variant-attribute implementation is already several hundred times faster than a recalculation in my particular application, so any algorithmic improvement would give minimal additional benefit and I have not explored it further.

I have been side-tracked with other things, but I hope I can write up my implementation into a community document in the near future. Stay tuned.
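
For anyone who wants to experiment with the same idea outside LabVIEW, here is a hedged C++ sketch of a fixed-size, least-recently-used cache - not altenbach's code, just the same ingredients: a lookup structure mapping key to slot index, the large data kept in a preallocated array, and an age ordering that any access refreshes.

    #include <cstddef>
    #include <list>
    #include <optional>
    #include <string>
    #include <unordered_map>
    #include <vector>

    class LruCache {
    public:
        explicit LruCache(std::size_t capacity) : data_(capacity) {}

        std::optional<double> Get(const std::string& key) {
            auto it = index_.find(key);
            if (it == index_.end()) return std::nullopt;
            Touch(it->second.age);                    // any access resets the age
            return data_[it->second.slot];
        }

        void Put(const std::string& key, double value) {
            if (auto it = index_.find(key); it != index_.end()) {
                data_[it->second.slot] = value;       // existing entry: overwrite
                Touch(it->second.age);
                return;
            }
            std::size_t slot;
            if (index_.size() < data_.size()) {
                slot = index_.size();                 // cache not yet full
            } else {
                const std::string& oldest = lru_.back();  // least recently accessed
                slot = index_.at(oldest).slot;            // reuse its data slot
                index_.erase(oldest);                     // we kept the key, so we can delete it
                lru_.pop_back();
            }
            lru_.push_front(key);
            index_[key] = {slot, lru_.begin()};
            data_[slot] = value;
        }

    private:
        struct Entry { std::size_t slot; std::list<std::string>::iterator age; };
        void Touch(std::list<std::string>::iterator it) {
            lru_.splice(lru_.begin(), lru_, it);      // move to front: most recent
        }

        std::unordered_map<std::string, Entry> index_;  // key -> slot + age marker
        std::list<std::string> lru_;                    // front = most recently used
        std::vector<double> data_;                      // fixed-size data storage
    };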

DavidPL
Member

Yes, to do a binary search, the array would have to be sorted by the identifier. So variant attributes are kept sorted by their name? If implementing this in your own code, don't actually move or insert values - just keep a pointer array that points to each value in order. In C/C++ this could be implemented directly, but in LabVIEW an auxiliary index array would be required.

I would think that for Altenbach's fixed-size cache a simple ring buffer would suffice, since once the buffer was filled, the pointer would always be pointing to the oldest location - no separate "age" array required. I have kept track of the age (time stamp) of some of my parameters that were acquired by communication with an instrument or another system, so that if the external device stops updating values, the value can expire and the lack of communication can be indicated on the display.

altenbach
Knight of NI

From other discussions long ago, variant attributes are stored in a red-black tree, they are not "sorted by name" in the conventional sense.

A ring buffer would not suffice, because the entries are accessed out of order. I was not fully clear with my description. I don't toss the entry that has been added the longest time ago, I toss the entry that has not been accessed for the longest time (big difference!). Any access (e.g. retrieve entry, reserve entry, etc) will reset the age.

rolfk
Knight of NI

altenbach wrote:

From other discussions long ago, variant attributes are stored in a red-black tree, they are not "sorted by name" in the conventional sense.

Well, they are sorted, but often not in alphabetical order - rather by some hash code. And computing that hash code from the data is the real magic. It should be large enough to have a reasonable chance of generating a unique hash code for every unique input value, yet it should also fit into the natural data size for the system (usually an integer). And the hash algorithm should be able to spread the codes evenly across the entire range for varied input data. A simple XOR or ASCII-code summation is not very useful, as it will tend to clump most typical inputs into a small portion of the possible hash-code range.

And when there is a hash-code collision (or the actual value has been found), the insert and retrieval algorithms have to do an actual comparison of the values - but given a good hash algorithm, that should only happen once per insert or search operation.
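
Whether or not the attribute table actually hashes (a comment above says recent versions use a red-black tree), the point about naive hashes clumping is easy to demonstrate. A minimal C++ sketch, using the well-known FNV-1a hash as the "good" example:

    #include <cstdint>
    #include <cstdio>
    #include <string>

    // Naive hash: summing character codes clumps similar keys together.
    uint32_t SumHash(const std::string& s) {
        uint32_t h = 0;
        for (unsigned char c : s) h += c;
        return h;
    }

    // FNV-1a: xor-and-multiply per byte spreads keys across the whole range.
    uint32_t Fnv1a(const std::string& s) {
        uint32_t h = 2166136261u;                 // FNV offset basis
        for (unsigned char c : s) { h ^= c; h *= 16777619u; }  // FNV prime
        return h;
    }

    int main() {
        const char* keys[] = {"DMM_0", "DMM_1", "DMM_2", "FGEN_0"};
        for (const char* k : keys)
            std::printf("%-8s sum=%5u  fnv1a=%10u\n", k, SumHash(k), Fnv1a(k));
        // The sum hashes land within a few units of each other, so similar
        // keys collide easily; the FNV-1a values are spread across the full
        // 32-bit range, giving far fewer collisions.
    }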

Rolf Kalbermatter
My Blog
Chris_Cilino
Active Participant

Hey y'all, I know this is an old thread, but I've created something that may be useful that I'm calling the "LabVIEW Container". The name is still up for grabs. Here's a link to the community page:

https://decibel.ni.com/content/docs/DOC-36470

And here's a super short video that gets you the main idea:

http://www.screencast.com/t/i4DVZswx

In short, think of it as a dynamic cluster.. or a variant attribute table with hierarchy.

AELmx
Member

I have been using search algorithms. I read information from text files and store it in arrays.  At first I used string arrays, but noticed it was too slow, and I was also concerned about memory usage.  Later I replaced the array with a single string separated by \n, so I search the string using regular expressions, count the number of lines before the match, and use that as the index for taking the values I need from the arrays. But I'm a bit concerned about whether this is the optimal solution, and I'm having trouble with a file of about 4 million lines - it takes several seconds to read the data.  It seems this variant option is a better solution; it would be worth trying.

AristosQueue (NI)
NI Employee (retired)

I want to let everyone in this thread know about a new feature of LabVIEW 2016 -- the In-Place Element structure has a new pair of border nodes to let you access the attributes of a variant without copying them out of the variant. This will VASTLY improve the performance of tools like the one this thread is discussing.

I strongly encourage everyone who works on this to look at the shipping example:

labview\examples\Performance\Variant Attribute Lookup Table\

Screen Shot 2016-08-02 at 11.23.03 AM.png

tstahl
Member

I've gotten a long way into updating our LabVIEW 2012 core applications to 2015, and you post something that could significantly improve performance. Darn you!

AristosQueue (NI)
NI Employee (retired)

These are the kinds of problems I like creating for customers.