Poor search performance in cluster with array of singles (100k)

Ola_A · ‎09-29-2014

In a datalogging project I have created the a datatype to store some periodical data. I have a cluster (see picture) that consists of a name, StringData, NumericalData, and ArrayOfSinglesData (100k points predefined). So each data can be a scalar string, scalar number or array of numbers.

I then make an array of this for all my variables, maybe 150 or so.

Works nicely, but the performance when adding new data (searching for the correct name) is quite poor. Quite high CPU and extremely many page faults even for slow logging like 1-10 times per second.

The input to the add data method is an array of (NameString, DataString). So for each new data I loop through all variables, find the one with the correct namestring and add the new data (replce for the scalar string and number, add for arrayOfSingles).

Since I only have 150 variables I thought a linear search should be ok, but probably the program also moves around the 100k singles for each compare, or that is my idea at least.

Can you suggest what is a good thing to do to improve performance?

I added a more clever, binary search, but this needed sorting the variables, and since I have two names for each variable (called internal and external) that I use when adding, this only helped for half of the searches.

If I find time I would like do divide the cluster so all ArrayOfSingles are separate, but maybe you can tell me right away if this is useful?

(Labview 2012sp1)

GerdW · ‎09-29-2014

Hi Ola,

use the well-known approach of key-value-pairs embedded in variant attributes as described here!

- Handling a lot of arrays with 100k points each will be not that fast. You might also consider limiting the history of each value to more reasonable size.

- Using a cluster for all entries might also increase execution time as it requires one more step of referencing your data. You might think about a variant only handling the scalar data and a second variant only handling the corresponding array data…

Best regards,
GerdW

using LV2016/2019/2021 on Win10/11+cRIO, TestStand2016/2019

Yamaeda · ‎09-29-2014

Just like Gerds link shows i guess you're looping through the array looking for your name?

I just tested, and using DVR to extract only the ID and afterwards grabbing a single element is quite fast. I'll attach my test so you can try for yourself.

/Y

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems

Hooovahh · ‎09-29-2014

If you are doing key value pairs I suggesting using my Variant Repository. You write key value pairs, with the write data being anything, and then read that back. It's implemented using XNodes so it should run as fast as an inlined VI. Type adaption is done by the XNode so you tell it the data type it should be, or it looks up stream. It can also read tags within tags, or save and load them from a file.

Unofficial Forum Rules and Guidelines
Get going with G! - LabVIEW Wiki.

16 Part Blog on Automotive CAN bus. - Hooovahh - LabVIEW Overlord

Ola_A · ‎09-30-2014

Yaemeda, I tried your solution by changing the search loop to one where the names are retrieved by the "In Place Element Structure Index Array", and it works great, thanks!

Search time is less that 1/1000 of before, just as in your example, and the code can run even on a very slow computer!

The application still produces a steady 10 000 000 page faults (seen in task manager) per hour though, so I have some more code to improve!

Yamaeda · ‎09-30-2014

@Ola_A wrote:

The application still produces a steady 10 000 000 page faults (seen in task manager) per hour though, so I have some more code to improve!

That sounds like data copies when arrays need to be reallocated, but hard to say without seeing some code.

/Y

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems

CoastalMaineBird · ‎09-30-2014

The input to the add data method is an array of (NameString, DataString). So for each new data I loop through all variables, find the one with the correct namestring and add the new data (replce for the scalar string and number, add for arrayOfSingles).

The problem is that you are doing a linear search. If you have a new data and search thru the list and find it at element 50000, you insert it there.

If you have another new data for the same name, you DO THE SEARCH AGAIN. With any luck at all, you get the same answer, but you've just searched 50000 records to get it.

Consider using a hash value. Think of a hash value as a "signature" of the string name for the record. There is no numerical value to the number, but it will be a unique number (if you do it right) for each unique name. It's a fast operation.

I've done something similar to your problem, but I could establish a firm limit on the array size (8192 in my case). If you can establish such a limit, then here you go.

Pick a max size (N)

You init an array of size N of empty records.

You init an array of size N of hash values (U32 = 0)

If you want to write a record, perform a hash value on the new record's name.

Convert the hash value to an index I by using only the bottom M bits (2^M = N) of the hash.

If the hash table at index I is zero, then it's never been used:

---- Store new record at index I of the records array.

---- Store the hash value at index I of the hash table.

If the hash table at index I is not zero, and if it matches your hash, then this is your record - use it as you will.

If the hash table at index I is not zero, and it DOES NOT match your hash, then this is a collision. Rare, but possible.

In case of collision, perform a secondary hash on the same name, add the secondary hash to the index, and add 1 for good measure, and test again.

The basic idea is that you DIRECTLY convert a name string (of any length) into a unique array index.

My timing says that finding an index takes less than 5 uSec. You can retrieve any record by name in that time.

If you want to retrieve a record by name, you do the same procedure:

---- find the hash value of the name.

---- convert to index

---- check the hash table at that index.

---- If hash table[i] is zero, no such record exists.

---- if hash table[i] matches the hash, there's your record

--- if collision, adjust the index and try again.

The collision rate depends on how full the table is - try to keep the table size (N) more than twice what you expect to hold.

I can recommend hash functions if you are interested.

This sounds complicated, but it is VERY fast.

Steve Bird
Culverson Software - Elegant software that is a pleasure to use.
Culverson.com

Blog for (mostly LabVIEW) programmers: Tips And Tricks

Ola_A · ‎10-02-2014

Thanks GerdW, Yamaeda, Hooovahh, and CoastalMaineBird for all good suggestions!

To sum up the different solutions to my problem I can use indexes to improve the search, either as key value pairs stored as variant attibutes or hash tables.

I really liked the explanation about hash tables, I think I understand it (apart from collisions, do I just look at the next entry in the hash table, and the next... until an empty slot is found?)

Another suggestion is to use In Place Element Structures to improve performance when working with large arrays. (I am not familiar to the DVR Data Value Reference? concept, but could use the In Place Structure anyway)

The quick solution for me was to implement the In Place Element Structure for Indexing the array, but I will definately use the other suggestions next time I write something similar.

I also think I have all tools needed to speed up the rest of this code, the Show buffer allocations should be a good starting point. Probably I need to learn more about VI inlining and call by reference also, and I know these are described in Labview help. There are also a real time tool to show when memory is allocated I remember now.

Ola

Yamaeda · ‎10-02-2014

I think my In-place-solution will also lower the page faults as you should only extract 1 array each time.

As for the search, the hash table mentioned is great, but feels unnecessary for 150 values. In that case it'd be easier to separate the cluster into a names and a values-array, search the name array to get the index, and extract from the value array (array of clusters of arrays, or an array of variants that you convert to 1D array).

/Y

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems

LabVIEW

Poor search performance in cluster with array of singles (100k)

Poor search performance in cluster with array of singles (100k)

Re: Poor search performance in cluster with array of singles (100k)

Re: Poor search performance in cluster with array of singles (100k)

Re: Poor search performance in cluster with array of singles (100k)

Re: Poor search performance in cluster with array of singles (100k)

Re: Poor search performance in cluster with array of singles (100k)

Re: Poor search performance in cluster with array of singles (100k)

Re: Poor search performance in cluster with array of singles (100k)

Re: Poor search performance in cluster with array of singles (100k)