DIAdem


My Datafinder, Settings to use for Large Index files


Hi Group,

 

I am using My DataFinder to index a large number of files (over 32,000 different files) with a custom VBS DataPlugin. This has created an index file that is well over 15 GB (SQLite file size). I am not sure what the limits are for My DataFinder, or what index timeout (or other settings) should be used when the index file is this large.

Notes: The property names used by all files are very consistent, as they are all read by the same DataPlugin. The overall property count is less than 60.

 

This is my question list:

1) What index file size has been found to work well, and what is the limit on index file size for My DataFinder?

 

2) If we are going to have large index/database files, what values should be used in the configuration under the advanced and timeout settings?

 

3) I am using around 15 different optimized properties. What effect will this number of optimized properties have on index size or query speed?

 

4) When indexing a file area, and then finding that I need to make changes to index properties for which are optimized, can the changes just be made on the fly, or will the index need to be reset and rebuilt?

 

5) When I notice that a search fails with a timeout, I change the timeout parameter in the advanced settings and click OK. It seems that the older timeout is still being used. What event will cause My DataFinder to use the new timeout?

 

6) The limit for returning results to the Navigator is 32,000. Does this limit apply to the script interface as well, or just to the Navigator interface?

 

 

Maybe what I really need to do is use DataFinder Server. If I were to use DFSE, what is the maximum index size that has been found to work well? (Of course, the machine would have over 32 GB of RAM and at least 8 cores.)

What would be the practical limit on property count, as well as on the number of optimized properties? Maybe there is an equation that would allow this to be modeled, so that the setup of a system can be optimized.

 

Paul

Message 1 of 3

Hi Paul,

 

This is a pretty detailed list of questions that would probably be better suited to a service request. It's going to take a fair amount of research to determine the performance effects of all these different settings on the DataFinder, so getting an Applications Engineer on the job will make sure you have someone looking into these questions (as many of them are pretty detailed and/or low-level for a user community forum).

 

 

NI Technical Support
http://www.ni.com/en-us/support.html

NickelsAndDimes
Product Support Engineer - sbRIO
National Instruments
Message 2 of 3
Solution
Accepted by topic author Pesmith8

Hi Paul,

 

1) What index file size has been found to work well, and what is the limit on index file size for My DataFinder?

I have personally used a My DataFinder index of about 50 GB, though query performance can be slow, depending on what level you're querying and where the most properties are located. Any index smaller than 2 GB (which would fit in RAM in a 32-bit application) would be solid. Anything bigger is a candidate for the DataFinder Server, though that's a judgment call based on your performance needs. My DataFinder keeps indexing and responding to queries well beyond that size.
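As a rough illustration of the 2 GB rule of thumb above, here is a minimal Python sketch that checks an index file's size against that threshold. The function names and the returned labels are my own for illustration, and the path you pass in is whatever your index file happens to be; nothing here is a DataFinder API.

```python
import os

# Threshold taken from the rule of thumb above (2 GB fits in a 32-bit process).
TWO_GB = 2 * 1024**3


def classify_index_size(size_bytes: int) -> str:
    """Return a rough label for an index of the given size in bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < TWO_GB:
        return "ok-for-my-datafinder"
    # Larger indexes still work, but may be slow; this is a judgment call.
    return "consider-datafinder-server"


def classify_index_file(path: str) -> str:
    """Look up the file size on disk and classify it (path is hypothetical)."""
    return classify_index_size(os.path.getsize(path))
```

For the 15 GB index described in the question, `classify_index_size(15 * 1024**3)` falls into the second category, which matches the suggestion to weigh the DataFinder Server against your performance needs.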

 

2) If we are going to have large index/database files, what values should be used in the configuration under the advanced and timeout settings?

The size of the index doesn't strongly affect the indexing time, so you can leave that timeout largely the same as when the index was small. You will need to increase the query timeout; I'd try 30 seconds to start with. You may also need to increase the connect/browse timeout depending on how many files you have in a given folder; again, 30 seconds is a good guess.

 

3) I am using around 15 different optimized properties. What effect will this number of optimized properties have on index size or query speed?

Theoretically that should make the queries involving those properties run faster, but it will definitely increase the index size. It will also make it possible for you to access the DISTINCT values of a string property and the min/max values of numeric and datetime properties. It will also let you use custom datetime properties in queries, so there are a lot of benefits. On the DataFinder Server the opposite is also possible: if any of the properties declared by the DataPlugin are never used in your queries, it is possible to "exclude" selected properties from being added to the index.
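To make the trade-off above a bit more concrete, here is a loose analogy in Python using the standard `sqlite3` module. It assumes that an optimized property behaves roughly like an indexed column in a SQLite table: the extra index costs storage, and in exchange DISTINCT and MIN/MAX lookups become cheap. The table, column names, and sample rows are invented for illustration and are not the actual DataFinder index schema.

```python
import sqlite3


def demo_optimized_property():
    """Build a toy table, index two columns, and run DISTINCT and MIN/MAX queries."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per indexed data file.
    conn.execute("CREATE TABLE files (name TEXT, operator TEXT, max_temp REAL)")
    rows = [
        ("run1.dat", "Paul", 71.5),
        ("run2.dat", "Brad", 64.0),
        ("run3.dat", "Paul", 88.2),
    ]
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)

    # The analogue of "optimizing" a property: each index adds storage
    # but lets the engine answer these lookups without scanning every row.
    conn.execute("CREATE INDEX idx_operator ON files (operator)")
    conn.execute("CREATE INDEX idx_max_temp ON files (max_temp)")

    distinct_ops = [
        r[0]
        for r in conn.execute("SELECT DISTINCT operator FROM files ORDER BY operator")
    ]
    lo, hi = conn.execute("SELECT MIN(max_temp), MAX(max_temp) FROM files").fetchone()
    conn.close()
    return distinct_ops, lo, hi
```

With 15 optimized properties the analogue would be 15 such indexes, which is consistent with the larger index file described in the question.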

 

4) When indexing a file area, and then finding that I need to make changes to index properties for which are optimized, can the changes just be made on the fly, or will the index need to be reset and rebuilt?

You're going to need to rephrase this question. What's a "file area"? What is meant by "make changes to index properties for which are optimized"? Whether the index needs to be reset will hinge on what changes you're talking about here. Any file that is re-indexed will just update its previous records in the database. Adding or removing a property optimization will require large-scale operations on the whole index (though not necessarily reindexing all the data files).

 

5) When I notice that a search fails with a timeout, I change the timeout parameter in the advanced settings and click OK. It seems that the older timeout is still being used. What event will cause My DataFinder to use the new timeout?

This should work.  Are you talking about an interactive session or a programmatic one?  How do you determine that the edited timeout value is not being used?

 

6) The limit for returning results to the Navigator is 32,000. Does this limit apply to the script interface as well, or just to the Navigator interface?

This is a hard limit for the row-based query, which is the default query mode. You can switch to the column-based query mode to return more than 32,000 query results. Note that as you request more columns be returned, you will be allowed fewer returned query rows. I think the limit for 1 column is something like 1 million queried records returned, for 2 columns it's something like 500,000 queried records, etc.
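The two figures quoted above suggest the row limit in column-based mode shrinks in inverse proportion to the number of requested columns. Here is a small Python sketch of that arithmetic. The inverse-proportional rule is an assumption extrapolated from those two approximate data points, not a documented formula, and the function name is made up for illustration.

```python
# Approximate figures quoted above; not official limits.
ROW_BASED_LIMIT = 32_000
ONE_COLUMN_LIMIT = 1_000_000


def approx_max_rows(columns: int) -> int:
    """Estimate the column-based row limit, assuming it scales as 1/columns."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    return ONE_COLUMN_LIMIT // columns
```

Under this assumption, `approx_max_rows(2)` gives 500,000, matching the second figure, and the estimate would drop to about the row-based limit of 32,000 at roughly 31 columns.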

 

Remember that My DataFinder is a single-threaded, 32-bit application, so it will NOT take advantage of all the resources on a fancy computer. If that's your goal, again consider the DataFinder Server.

 

Brad Turpin

DIAdem Product Support Engineer

National Instruments

Message 3 of 3