Northern European CLD Summits


Large Datasets in LabVIEW

Hey Guys,

Here are the slides from today's session on large datasets; I hope they gave you some useful ideas. You can view the speaker notes using the options button on Google Drive at https://docs.google.com/presentation/d/18aS8gXcMtLelJmKy_FTYs9t-dNq8o1v5LqxylvQegmQ/pub?start=false&... (I've made them user friendly!).

Some links:

www.wiresmithtech.com/blog - I will be putting this up there as well, and I hope to post some information on what I find with MongoDB when I get to it.

http://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/ - a beginner's guide to big O notation.

http://bigocheatsheet.com/ - an overview of the scalability of some common CS algorithms.

https://www.coursera.org/course/algs4partI - This is the algorithms course I started online. It's quite involved and requires some Java! I'm not sure whether there are similar courses available in other languages.

James Mc
========
CLA and cRIO Fanatic
My writings on LabVIEW Development are at devs.wiresmithtech.com

Re: Large Datasets in LabVIEW

Hi James, thank you for the presentation. It was an interesting sample case, very similar to a project that PTP had just developed for my old company.

One thing I have since been wondering about is the method for storing the rolling data. The lossy enqueue seems to be the simplest method (in terms of coding); the second method I'd considered was overwriting elements in an array.

In my early days of LabVIEW I wrote something that inserted an element onto the end of an array and then deleted the first element, but this quickly consumed RAM. My assumption was that the RAM storing the deleted elements wasn't being released, so the array was effectively traveling through memory, leaving a trail of unusable memory behind it. I'm a little concerned that the lossy enqueue may do something similar in terms of memory usage?

[Attachment: Rolling Record.png]


Re: Large Datasets in LabVIEW

Hi James, thanks for the presentation. As the initial proposer of the session I was curious to see what you would present, and it was quite interesting.

One of the things I'm looking at, and have an interest in, is an API for storing and retrieving data in my application - essentially having 'data' coming from multiple sources (e.g. DAQ, CAN). This was previously attempted by another developer using variant attributes and DVRs, but it lacked some of the features I needed to make it reusable.

This is kind of what I've got so far:

[Attachment: 2014-05-15_10-13-42.png]

The idea is that my user interface and periodic logging can use this API to access the data.

The way it works under the hood is by storing a variant in a DVR with the DVR reference held in an FGV. My 'data' is stored as variant attributes to allow fast lookups based on the name of the data (e.g. a CAN signal name or sensor/actuator name).

In the implementation I'm basing this on, the actual data itself is stored as a variant in a DVR, and that DVR reference is stored in the main variant (so you have a variant with variant attributes that are DVRs to variants - wow, that's confusing!). I'm not sure if that is overkill, or if it's needed to stop the main variant from growing too large or causing lots of unnecessary memory allocations.
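For anyone following along outside LabVIEW, here is a rough Python sketch of that layout, with a lock standing in for each DVR and a plain dict standing in for the variant-attribute lookup (the names SignalStore and SignalRef are mine, not from the thread):

```python
import threading

class SignalRef:
    """Stands in for the inner DVR: one value behind its own lock,
    so updating one signal does not block access to the others."""
    def __init__(self, value=None):
        self._lock = threading.Lock()
        self._value = value

    def get(self):
        with self._lock:
            return self._value

    def set(self, value):
        with self._lock:
            self._value = value

class SignalStore:
    """Stands in for the outer DVR/FGV: a name -> SignalRef dictionary
    (the variant-attribute lookup) behind a single lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._refs = {}

    def ref(self, name):
        # The outer lock is held only long enough to find (or create)
        # the inner reference, so large values never pass through it.
        with self._lock:
            return self._refs.setdefault(name, SignalRef())

store = SignalStore()
store.ref("EngineSpeed").set(3450.0)
print(store.ref("EngineSpeed").get())  # 3450.0
```

The point of the nesting shows up here: the outer table only ever holds small references, so it never has to copy the data itself when it grows or is searched.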

My next step is to investigate typing of the data using classes: a base data class holding the variant plus some basic signal information, and sub-classes for the data types I'm interested in, each overriding a base 'to string' function for logging that converts the variant to the appropriate data type. A non-OO version would store the type information as an enum/string.

I also want to float the idea of being able to attach queues/events to signals for lossless history or for notifying when new data has arrived.

I'm not sure if there are any serious pitfalls that I'm about to fall into as I try to scale this up.

0 Kudos
Message 3 of 6
(3,325 Views)

Re: Large Datasets in LabVIEW

Hi Richard,

That's a good question. You are somewhat correct about that RAM method. Every time you delete an element from the front you are changing the size of the array, forcing the array to be moved in memory: it may be a complete relocation, or at the very least every element has to be copied forward one place. If it is relocated, the memory becomes highly fragmented.

The best way to do it with an array is to allocate the full size up front and then maintain a pointer/index to the oldest element. With each new data point you overwrite the oldest element and increment the index; to get the complete buffer you read everything after the pointer, then everything before it, and join the two. This is the way I normally implement these, as sketched below.
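The pattern itself is language-agnostic, so here is a minimal Python sketch of that pre-allocated circular buffer (the class name RollingBuffer is mine, for illustration):

```python
class RollingBuffer:
    """Fixed-size circular buffer: allocate once, then overwrite
    the oldest element instead of resizing the array."""
    def __init__(self, size):
        self._data = [0.0] * size   # one allocation, reused forever
        self._index = 0             # points at the oldest element
        self._count = 0             # how many slots hold real data

    def add(self, value):
        # Overwrite the oldest element and advance the pointer.
        self._data[self._index] = value
        self._index = (self._index + 1) % len(self._data)
        self._count = min(self._count + 1, len(self._data))

    def snapshot(self):
        # Read everything after the pointer, then everything before it,
        # and join the two halves into oldest-to-newest order.
        if self._count < len(self._data):
            return self._data[:self._count]
        return self._data[self._index:] + self._data[:self._index]

buffer = RollingBuffer(4)
for sample in [1.0, 2.0, 3.0, 4.0, 5.0]:
    buffer.add(sample)
print(buffer.snapshot())  # [2.0, 3.0, 4.0, 5.0] - 1.0 was overwritten
```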

The lossy queue is an interesting question, as the queue's memory handling is hidden a little, but this is my understanding:

  • First, it shouldn't have as severe a problem, since the queue stores the individual points in memory rather than one massive array that has to be moved around all of the time.
  • I believe it also works hard to reuse elements that have been allocated and are no longer in use. For example, you can write a load of elements and read them out, but rather than freeing the memory immediately, the queue functions will hold onto it to save further allocations later.

For these reasons I suspect the lossy queue will not show as severe a memory impact as deleting off the front of an array.
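For a feel of that behaviour in a text language, Python's collections.deque with a maxlen behaves much like a lossy enqueue: the buffer is bounded up front and the oldest element is silently dropped rather than the storage being resized. This is an analogy only, not a claim about LabVIEW's internal queue implementation:

```python
from collections import deque

# A bounded deque acts like a lossy queue: once full, each append
# silently discards the oldest element instead of growing the buffer.
history = deque(maxlen=5)
for sample in range(8):
    history.append(sample)

print(list(history))  # [3, 4, 5, 6, 7] - the first three were dropped
```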

James Mc
========
CLA and cRIO Fanatic
My writings on LabVIEW Development are at devs.wiresmithtech.com

Re: Large Datasets in LabVIEW

Hi Sam,

Wow, that is weirdly similar to what we started working on in our coding session! My hope for that was primarily to keep it lightweight, though, so we ruled out subclassing different types (although it is technically possible). So the specific concern is having all of the system data widely available, is that correct? And is performance a major concern?

I think there are three options I have seen for this sort of problem:

  • One is what you have shown: a variant dictionary storing the data so it can be found by name is a good method, but it needs to live in either an FGV or a DVR so that everyone has access (you shouldn't need both, as they offer similar functionality). Putting everything in as a variant is flexible but will probably come with a performance hit. Also be aware that the DVR or FGV can become a bottleneck, so I would think about giving a process the option of just getting the underlying DVR and working with that directly.
  • Another solution I've seen is the Current Value Table (CVT) library, which uses a fixed-size FGV at its core to store the data. It looks like they now also use variants for the lookups.
  • For super high performance things get much more advanced. I have worked on an architecture inspired by the VeriStand engine, where all of the processes send their data down RT FIFOs to a "data engine". This kept a table of current values and could map outputs of one process to inputs of another (there is a rough sketch of this pattern after the list). It does require using doubles for everything though!
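A very loose sketch of that data-engine pattern, with Python's queue.Queue standing in for an RT FIFO; the routing table and signal names are invented for illustration:

```python
import queue
import threading

# Producers push (name, value) samples down a FIFO instead of touching
# shared state; the engine owns the current-value table outright.
fifo = queue.Queue()
current_values = {}                            # name -> latest double
mappings = {"ProcessA.Out1": "ProcessB.In1"}   # hypothetical routing table

def data_engine():
    while True:
        name, value = fifo.get()
        if name is None:
            break                              # shutdown sentinel
        current_values[name] = value
        # Map one process's output onto another's input, VeriStand-style.
        target = mappings.get(name)
        if target is not None:
            current_values[target] = value

engine = threading.Thread(target=data_engine)
engine.start()
fifo.put(("ProcessA.Out1", 42.0))
fifo.put((None, None))
engine.join()
print(current_values)  # {'ProcessA.Out1': 42.0, 'ProcessB.In1': 42.0}
```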

I hope that gives some inspiration, but I would be interested to hear what particular concerns are spurring this, as that will probably help pick the best solution.

James Mc
========
CLA and cRIO Fanatic
My writings on LabVIEW Development are at devs.wiresmithtech.com

Re: Large Datasets in LabVIEW

Hi James,

Thanks for the information!

The use case for this is that we have systems that pull data from different sources (e.g. CAN over USB, CAN via cRIO, DAQ) and these signals need to be accessible from multiple places in the software (e.g. logging, various UIs). This is all PC based...

One of the key aims was for it to be flexible - the user can load up a new CAN definition file and all of the data is just...there. Hence the desire to use a name/value dictionary.

As for the performance - this is quite important - but I don't think we're doing anything earth-shattering: probably reading in 300 or so CAN elements every 10 ms (so on the order of 30,000 value updates per second) and displaying a lot of these on the UI every 500-1000 ms.

There are some other things that were important for us, like being able to timestamp data so we can use timeouts.

I wanted the core storage mechanism to be data-type agnostic so I can reuse this in other applications or for storing more complex data types - hence thinking about using classes (one for each data type, so I can use dynamic dispatch to format to string) or a type identifier.
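Sketching that class idea in Python (the class names and fields here are my own illustration, not the actual design): a base class carries the value plus basic signal information, and each subclass overrides a to_string method so the logger never needs to know the concrete type.

```python
from datetime import datetime

class Signal:
    """Hypothetical base class: the raw value plus basic signal info.
    Subclasses override to_string for logging."""
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.timestamp = datetime.now()  # one way to carry the timestamp attribute

    def to_string(self):
        raise NotImplementedError

class NumericSignal(Signal):
    def to_string(self):
        return f"{self.name}={self.value:.3f}"

class BooleanSignal(Signal):
    def to_string(self):
        return f"{self.name}={'ON' if self.value else 'OFF'}"

# Dynamic dispatch picks the right formatter for each concrete type,
# the text-language equivalent of LabVIEW dynamic dispatch.
for signal in (NumericSignal("EngineSpeed", 3450.0), BooleanSignal("PumpOn", True)):
    print(signal.to_string())
```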

One of the things that I wasn't sure about was how to store the extra attributes I'm interested in (e.g. timestamp): either replace my variant with a cluster of the attributes plus the variant data, or store them as variant attributes.

I wanted to be able to use this in other applications - if I can crack a nice little API for this then I'm hoping it'll speed up my development of UI/logging functions, with all the data stored behind the same core API.

Maybe I'm asking too much - maybe I can't have it all...
