07-17-2015 03:01 AM
Hi all,
We're building a networked test system that uses a centralised database for retrieving and storing information for each test station, of which there are many. As a part of this, each test station will generate data that needs to be stored in the database, and we've been thinking about the best way to achieve this.
These are our requirements:
Regarding frequency, the quicker test stations will produce a single result (Pass) every 30 seconds, whilst the slower systems may produce an array of DBLs once every five minutes. If the database gets 'large' then we'll look at archiving out old data once in a while, so size isn't a concern.
We expect that the problem of needing to flexibly store various data in a database has been solved a thousand times before, so to save us reinventing the wheel we were wondering if there's a published solution out there, an industry standard approach perhaps, that we can refer to?
Any advice thankfully received!
RTD
07-17-2015 03:48 AM - edited 07-17-2015 03:49 AM
As far as I know, there isn't really an 'industry standard' for storing test results because every company has their own requirements for what data they want to store/retrieve (e.g. the data stored with each test). There are probably COTS (commercial off the shelf) solutions available but they may not do what you need or may be completely overkill.
As a rough idea, this is something I put together recently as a small system for storing test measurement data:
This allows me to store any number of measurements associated with a specific test run. If I needed to store anything other than floats I would either add a str_data column or create a table for each type of measurement I'm expecting. I've also done some work for storing production line test data - that was a lot more complicated and ran into the order of 15 or so tables.
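(As a rough sketch of that kind of two-table layout, here it is in SQLite; the table and column names are my own guesses for illustration, not the actual schema:)

```python
import sqlite3

# Hypothetical layout: one row per test run, many measurement rows per run.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_run (
    id         INTEGER PRIMARY KEY,
    station    TEXT NOT NULL,
    started_at TEXT NOT NULL            -- ISO-8601 timestamp
);
CREATE TABLE measurement (
    id        INTEGER PRIMARY KEY,
    run_id    INTEGER NOT NULL REFERENCES test_run(id),
    name      TEXT NOT NULL,
    dbl_data  REAL                      -- add a str_data column for non-float results
);
""")
run_id = conn.execute(
    "INSERT INTO test_run (station, started_at) VALUES (?, ?)",
    ("station-01", "2015-07-17T03:48:00")).lastrowid
conn.executemany(
    "INSERT INTO measurement (run_id, name, dbl_data) VALUES (?, ?, ?)",
    [(run_id, "voltage", 4.98), (run_id, "current", 0.21)])
# All measurements for one specific test run:
rows = conn.execute(
    "SELECT name, dbl_data FROM measurement WHERE run_id = ?", (run_id,)).fetchall()
```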
As for the frequency/size of the database - database size isn't really an issue; they are designed to work with millions of rows (as long as you choose appropriate id number formats and indexing). You're more likely to run out of disk space on the server before you hit any real limitations. It should be pretty easy to estimate the amount of space your database will use over time (add up the record size per row and multiply by the number of rows generated over X months/years).
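(The back-of-envelope arithmetic might look something like this; the row size, station count and rates below are made-up numbers, so substitute your own:)

```python
# Rough storage estimate: bytes per row x rows generated per period.
row_bytes = 8 + 8 + 32                       # id + double + name/overhead (guessed)
results_per_station_per_day = 2 * 60 * 24    # one result every 30 seconds
stations = 20
days = 365

total_bytes = row_bytes * results_per_station_per_day * stations * days
total_gb = total_bytes / 1024**3             # roughly 0.94 GB per year here
```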
As for a database engine - MySQL and SQL Server are common, but neither of these is really suited to storing arrays of data (you'd have to convert the array into rows and/or serialise it into a string, which makes calculations harder) - you might want to look at something like MongoDB (stores records as 'documents' and they don't have to have a rigid structure).
Pretty much all database engines have some sort of interface which you can use with LabVIEW or other applications (e.g. ODBC, command line, TCP/IP, DLL).
07-17-2015 03:59 AM
Hi Sam,
Thank you for your reply. Our current approach is similar to your suggestion, and includes a table of Reports (each with a primary key) and multiple tables of Results. Each Results table handles one data type, and each entry references the Report key. With this we can create Reports with multiple Results entries of various datatypes. It's also expandable to new datatypes. But we're not sure about the chronology - how can we ensure the data in the Report is returned from the database in the same order it was created? For example, if we acquire a numeric, then a boolean, then another numeric result, those three results might come back from the database as two numerics and then a boolean. The order is now changed, and that will affect report content generation.
Storing arrays seems troublesome. We're using SQL Server, so we might need to serialise the data and keep it interpretable for other applications.
I'd be interested in seeing how others solve this problem. We need flexibility to accommodate unknown future datatypes (which we have), and potentially large data sets too. For this we expect we would simply store a path to a file and keep the data outside of the database itself.
07-17-2015 04:11 AM
You could put an 'order' column on the results table. Alternatively, if they need to be sorted chronologically, put a timestamp on the results. You can also sort by the id.
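(For example, assuming an auto-incrementing id column, insertion order comes back with a simple ORDER BY; a quick SQLite sketch:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (id INTEGER PRIMARY KEY, report_id INTEGER, value TEXT)")
# Insert a numeric, a boolean, then another numeric for the same report.
conn.executemany("INSERT INTO result (report_id, value) VALUES (?, ?)",
                 [(1, "numeric:3.14"), (1, "bool:TRUE"), (1, "numeric:2.72")])
# Sorting by the auto-assigned id returns rows in the order they were inserted.
ordered = [v for (v,) in conn.execute(
    "SELECT value FROM result WHERE report_id = ? ORDER BY id", (1,))]
```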
If you're not bothered about the database doing calculations on the arrays (so it's just for archiving) then you can just comma-separate the values and store the array as text. We did that for storing configuration keys - we wanted to know which configuration keys were used for a test but weren't too bothered about lookups on the data, so we just comma-separated the ids and stored them as text.
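(A minimal sketch of that comma-separated round trip, assuming a plain DBL array:)

```python
# Round-trip a DBL array as comma-separated text. Fine for archiving,
# but the database can't do maths on the values in this form.
values = [1.5, 2.25, -3.0]
as_text = ",".join(repr(v) for v in values)        # repr() round-trips doubles exactly
restored = [float(s) for s in as_text.split(",")]
```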
Also - most database engines support a binary BLOB data type - you can use this to store binary data such as files in the database itself. Store a BLOB of file data and the file name in a 'files' table. You can use this for any custom data types as well (but be aware of things like endianness when dealing with binary data).
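(A quick sketch of the 'files' table idea in SQLite, where BLOB columns accept raw bytes directly:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, data BLOB)")
payload = bytes(range(8))                      # stand-in for real file contents
conn.execute("INSERT INTO files (name, data) VALUES (?, ?)", ("trace.bin", payload))
# The bytes come back exactly as stored, alongside the file name.
(name, data), = conn.execute("SELECT name, data FROM files").fetchall()
```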
MongoDB is good for these sorts of flexible applications because it has no rigid structure - your 'records' can contain any fields/data types.
07-17-2015 04:35 AM
In actuality, I got the description of our current approach wrong. This is a snippet of the design:
We have a TestReport table and a TestResult table. Each Report is linked to multiple Results entries, which each reference one of the smaller type-specific tables, such as ResultBOOL.
This gives us our ordering through the ID field and we can expand to additional datatypes by creating more specific tables on the right hand side.
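(A sketch of that design as I read it, with names approximated from the description - the ID on TestResult gives the global ordering, and ValueType/ValueID point into the type-specific tables:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TestReport (ID INTEGER PRIMARY KEY, StationName TEXT);
CREATE TABLE TestResult (
    ID        INTEGER PRIMARY KEY,      -- global insertion order across all types
    ReportID  INTEGER REFERENCES TestReport(ID),
    ValueType TEXT,                     -- 'BOOL', 'DBL', ... selects a specific table
    ValueID   INTEGER                   -- row id inside that specific table
);
CREATE TABLE ResultBOOL (ID INTEGER PRIMARY KEY, Value INTEGER);
CREATE TABLE ResultDBL  (ID INTEGER PRIMARY KEY, Value REAL);
""")
report = conn.execute("INSERT INTO TestReport (StationName) VALUES ('station-01')").lastrowid
bool_id = conn.execute("INSERT INTO ResultBOOL (Value) VALUES (1)").lastrowid
conn.execute("INSERT INTO TestResult (ReportID, ValueType, ValueID) VALUES (?, 'BOOL', ?)",
             (report, bool_id))
dbl_id = conn.execute("INSERT INTO ResultDBL (Value) VALUES (3.3)").lastrowid
conn.execute("INSERT INTO TestResult (ReportID, ValueType, ValueID) VALUES (?, 'DBL', ?)",
             (report, dbl_id))
# One query recovers both the original order and which table holds each value.
order = [t for (t, _) in conn.execute(
    "SELECT ValueType, ValueID FROM TestResult WHERE ReportID = ? ORDER BY ID", (report,))]
```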
But it's not just about the storage, it's about reasonable levels of retrieval too. What would a query to this database return for all the Results of a given ReportID? When you think about it, you would get a large 2D variant array (as that's how LabVIEW returns data), with columns from all the specific tables concatenated into one. How do we know what datatype each cell is?
Alternatively, we ask for all the BOOL results, then all the INT results, then all the STR results etc., which makes for multiple calls to the database.
Is there a better way to store the data in the database such that retrieving it is simpler?
07-17-2015 05:23 AM
Have you considered the way TestStand stores data?
07-17-2015 06:57 AM
@jcarmody wrote:
Have you considered the way TestStand stores data?
I've long toyed with the idea of writing a LabVIEW API for this. Someday. 😄
07-17-2015 07:15 AM
I've had a look at the link; I think that will take some time to understand fully. Does it have more flexibility than the aforementioned solution? I can't see queries against it being simple.
07-17-2015 07:49 AM
07-18-2015 03:29 PM
@RTDSafety wrote:
Alternatively, we ask for all the BOOL results, then all the INT results, then all the STR results etc., which makes for multiple calls to the database.
Is there a better way to store the data in the database such that retrieving it is simpler?
We basically store all results in tResults with attributes ID, Type, Value (DBL) and Value (string). Int, bool and single/double values can all easily be stored as doubles, with the Type attribute determining how each should be returned. If the Type is String, it's of course the varchar field that's used. One clever solution for the special double values +/-Inf and NaN, which cannot be stored in SQL Server, is to store them as a StringNumber type instead, meaning the value goes into the String field, written with Format Into String and read back with Extract From String. The benefit is of course one data format and one query for all normal data.
Arrays are a special type which uses a sub-table; the string field can store its ID/GUID.
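(A sketch of that single-table scheme, with the StringNumber workaround for +/-Inf and NaN; the column names are guessed, and Python string formatting stands in for the LabVIEW functions:)

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tResults (
    ID INTEGER PRIMARY KEY, Type TEXT, ValueDbl REAL, ValueStr TEXT)""")

def store(value):
    # +/-Inf and NaN go into the string column as a 'StringNumber';
    # every other numeric lands in the single double column.
    if isinstance(value, float) and not math.isfinite(value):
        conn.execute("INSERT INTO tResults (Type, ValueStr) VALUES ('StringNumber', ?)",
                     (repr(value),))
    elif isinstance(value, str):
        conn.execute("INSERT INTO tResults (Type, ValueStr) VALUES ('String', ?)", (value,))
    else:
        conn.execute("INSERT INTO tResults (Type, ValueDbl) VALUES ('Double', ?)",
                     (float(value),))

def load(row_id):
    typ, dbl, s = conn.execute(
        "SELECT Type, ValueDbl, ValueStr FROM tResults WHERE ID = ?", (row_id,)).fetchone()
    if typ == "Double":
        return dbl
    if typ == "StringNumber":
        return float(s)        # 'inf', '-inf' and 'nan' parse straight back to doubles
    return s                   # plain strings (an array's ID/GUID could live here too)

store(1.5); store(float("inf")); store(float("nan")); store("PASS")
```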
/Y