LabVIEW Idea Exchange

cancel
Showing results for 
Search instead for 
Did you mean: 
altenbach

Associative Arrays!

Status: Completed

Available in LabVIEW 2019 with the Collection > Map functions and VIs. 

LabVIEW  has a somewhat hidden feature built into the variant attributes functionality that easily allows the implementation of high performance associative arrays. As discussed elsewhere, it is implemented as a red-black tree.

 

I wonder if this functionality could be exposed with a more intuitive set of tools that does not require dummy variants and somewhat obscure VIs hidden deeply in the variant palette (who would ever look there!).

 

Also, the key is currently restricted to strings (Of course we can flatten anything to strings to make a "name" for a more generalized use of all this).

 

I imagine a set of associative array tools:

 

 

  • Create associative array (key datatype, element datatype)
  • insert key/element pair (replace if key exists)
  • lookup key (index key) to get element
  • read all keys
  • delete key/element
  • delete all keys/elements
  • dump associative array to disk
  • restore associative array from disk
  • destroy associative array
  • ... (I probably forgot a few more)
 
 
I am currently writing such a tool set as a high performance cache to avoid duplicate expensive calculations during fitting (Key: flattened input parameters, element: calculated array).
 
However, I cannot easily write it in a truly generalized way, just as a version targeted for my specific datatype. I've done some casual testing and the variant attribute implementation is crazy fast for lookup and insertion. Somebody at NI really did a fantastic job and it would be great to get more exposure for it.
 
Example performance: (Key size: 1200bytes, element size 4096: bytes, 10000 elements) 
insert: ~60 microseconds
random lookup: ~12 microseconds
(compare with a random lookup using linear search (search array): 10ms average. 1000x slower!)
 
Thanks! 

 

25 Comments
Jim_Kring
Trusted Enthusiast

I'd love a native dictionary API.  Here's an OpenG implementation:

 

http://wiki.openg.org/Oglib_dictionary

 

-Jim

JackDunaway
Trusted Enthusiast

If I understand correctly: Associative Arrays would buck two LabVIEW "array" preconceptions: the elements must be the same datatype, and the entire array is contiguous in memory. Is this correct?

 

Also, if you only have 10k elements, why does the key need to be 1200 bytes? Two bytes as an enum representation should be enough to index 10k elements.

shew82
Member

I would definately be fore this addition: Using variant attibutes is all well and good, but posses a bit of a code-readability challenge for future maintainers (e.g. what if they dont know this sneaky trick!)

 

As far as tge 1200byte lookup goes, I guess it all depends on what your lookup is - on one of my systems, the look up is a customer's part number. This number is far larger in terms of bytes than is needed to identify the parts in my system, but as I cannot know what part numbers will be created in the future a enum is not an option.

altenbach
Knight of NI

Jack wrote: >Also, if you only have 10k elements, why does the key need to be 1200 bytes? Two bytes as an enum representation should be enough to index 10k elements.

 

True, but that does not solve the problem in my case. The slow calculation has a large set of input parameters with a near infinite number of combinations, so I have two choices (1) Use an algorithm to compress the key size, e.g. like Zobrist hashing. I could do a MD5 on the flattened parameters, and further reduce it to 16 bits using piecewise XOR. Unfortunately, a 16 bit key would make the chances of a cache collision significant. (2) Just use the the full flattened parameters and settings as key. While I initially did not think it would be reasonable or feasible, some testing with variant attributes showed that the speed would be acceptable. 10000 was just an example number. I don't know the final size of the cache or the key.

 

I am not saying that I found the holy grail here. Maybe I will significantly change the algorithm in the future. Still, while writing my variant attribute benchmarks it occurred to me that the tools suggested in this idea could be of general use. Note that the above quoted benchmark was a bit biased and engineered to be dramatic. The key strings only differed in the last few bytes with a long lorem ipsum prepended. If the difference is in the first bytes, the speed difference is less, but the variants are still dramatically faster. Of course as the number of elements increases, the difference between logN and N becomes even more significant.

 

> Associative Arrays would buck two LabVIEW "array" preconceptions: the elements must be the same datatype, and the entire array is contiguous in memory. Is this correct?

 

I think the elements of any given associative array should be the same (as defined when creating it). Same for the keys.

 

Message Edited by altenbach on 04-05-2010 09:07 AM
jdunham
Member

This would help our team too.  We use variant associative arrays all over the place.  I have requested this feature in the past and it doesn't seem to go anywhere, but maybe voting will help.  

 

@JackDunaway: Don't think of it as an array, but rather as something like a queue (which is a FIFO array) Instead of FIFO, this one is accessed by matching a key.  It would be great to have the API as similar to the queue APIs as possible.

 

One feature that would be really helpful is element locking.  Our associative arrays are often clusters, and it is common to want to access the cluster, modify a value, and reinsert it.  It's necessary to wrap the sequence in a semaphore to prevent race conditions, but I code this so often that it would be very helpful to have it built in.  Again, this would be like the queue API, where there would be a timeout for reinserting an item.

JackDunaway
Trusted Enthusiast

1. Could multiple keys point to the same data value?

2.  Is it a requirement that each array be composed of a single datatype, or is that an opinion? (i.e., would a "get" operation return strictly typed or flattened data?)

3. Does the array require a contiguous block of memory, or are new elements dynamically allocated to arbitrary mem locations?

Ray.R
Knight of NI

This would be an interesting feature to have.  I can think of several uses for this.

Wish we could implement a human version of it 😉

Jarrod_S.
Active Participant
I'm upset I can only Kudo this once.
Jarrod S.
National Instruments
jkurtw
NI Employee (retired)

Would the ability to create/destroy and search for single-process variables satisfy these requirements?

 

I posted that idea here, but the focus of that idea was more about getting a "tag" in LabVIEW.

 

This seems more general purpose, but are the requirements the same as long as the lookup performance is satisfactory?

 

 

Cynic2Winter
NI Employee (retired)

I agree that this would be a great feature to have.  Most (or all) modern programming language have built-in high-performance container types.  It is frustrating that LabVIEW only has the array type.

 

In addition to associative arrays, it would be useful if LabVIEW implemented a set type.  Determining if an element is present in a set is an efficient operation.  Compare this to an array, where the element has to be compared to every other element in the array to determine if it is present.

 

Here is a description of the Python implementation of the set type:

 

http://docs.python.org/library/stdtypes.html#set-types-set-frozenset