Optimum hardware configuration for DIAdem with large data files?

Brad Turpin provided scripts that help load large text files into DIAdem. These scripts behave differently on different computers, and so do the DataPlugins. Can anyone recommend a hardware configuration for loading and processing data files in excess of 1 GB? The dual-processor machines do not work with DIAdem, but the dual-core single-processor machines do. Has anyone tried an extreme dual-core or quad-core single-processor machine with DIAdem?
 
Some of the results of our benchmarks are listed below.
The file size is 785 MB.
M90 is a Dell laptop.
Workstations 1 and 2 are Dell Precision 670.  
Fuel system workstation was custom built.
M70 is a Dell laptop  
 
The benchmark test converts an ASCII text file into binary. Each benchmark result is the difference between the start and end times.
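For readers who want to reproduce this kind of measurement outside DIAdem, the start/end timing could be sketched in Python as below. This is only an illustration: `time_conversion` and `format_elapsed` are hypothetical helper names, and the original tests were run with VBScripts inside DIAdem, not with this code.

```python
import time

def time_conversion(convert, *args):
    """Run convert(*args) and return (result, elapsed_seconds).

    `convert` stands in for whatever routine performs the conversion;
    it is a placeholder, not a DIAdem function.
    """
    start = time.perf_counter()
    result = convert(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

def format_elapsed(seconds):
    """Format a duration in the same style as the results in this post."""
    return f"{seconds:.0f} seconds ({seconds / 60:.1f} minutes)"

print(format_elapsed(672))  # -> 672 seconds (11.2 minutes)
```

Running each configuration several times and reporting every trial, as done for Workstation 1, helps separate hardware effects from background load.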

Benchmark Tests Using National Instruments (Brad Turpin) Scripts

 

M90 Laptop 1 – 592 seconds (9.87 minutes)

Workstation 2 – 4 GB, hyperthreading – 672 seconds (11.2 minutes)

Workstation 2 – 4 GB, no hyperthreading – 699 seconds (11.65 minutes)

Workstation 2 – 2 GB, no hyperthreading – (original configuration) - 681 seconds (11.35 minutes)

Fuel System Lab Workstation – 678 seconds - trial 1 (11.3 minutes)

Fuel System Lab Workstation – 648 seconds - trial 2 (10.8 minutes)

 

Benchmark Tests Using DataPlugIn

M90 Laptop 1 – 2312 seconds (38.5 minutes)

M90 Laptop 2 – 2433 seconds (40.5 minutes)

M70 Laptop - 4914 seconds (81.9 minutes)

Fuel System Lab Workstation - 1710 seconds (28.5 minutes)

Workstation 1 - 815 seconds - trial 1 (13.6 minutes)

Workstation 1 - 812 seconds - trial 2 (13.5 minutes)

Workstation 1 - 820 seconds - trial 3 (13.7 minutes)

Workstation 2 - 1741 seconds - trial 1 (with other applications running) (29.0 minutes)

Workstation 2 - 1149 seconds - trial 2 (with no other applications running) (19.1 minutes)

 

Message 1 of 9

Hi Ken,

That's a nice set of benchmarks.  It's been a while since we last talked about this topic, though I do remember the VBScripts that you used for those tests.  Could you remind me what DIAdem version you are using?

Brad Turpin
DIAdem Product Support Engineer
National Instruments

Message 2 of 9
Brad,
 
For these benchmark tests, all workstations and laptops are using DIAdem 10.1.
 
Does DIAdem run on 64 bit?
 
What about VISTA 64 bit? Perhaps that will speed up the process?
 
 
Message 3 of 9
Update to benchmark testing with Workstation 3:
 
 
Some of the results of our benchmarks are listed below.
The file size is 785 MB.
M90 is a Dell laptop.
Workstations 1 and 2 are Dell Precision 670.  
Fuel system workstation was custom built.
M70 is a Dell laptop  
Workstation 3 is a Dell Precision 390 (3 GHz, 1 GB RAM).
 
The benchmark test converts an ASCII text file into binary. Each benchmark result is the difference between the start and end times.

Benchmark Tests Using National Instruments (Brad Turpin) Scripts

 

M90 Laptop 1 – 592 seconds (9.87 minutes)

Workstation 2 – 4 GB, hyperthreading – 672 seconds (11.2 minutes)

Workstation 2 – 4 GB, no hyperthreading – 699 seconds (11.65 minutes)

Workstation 2 – 2 GB, no hyperthreading – (original configuration) - 681 seconds (11.35 minutes)

Fuel System Lab Workstation – 678 seconds - trial 1 (11.3 minutes)

Fuel System Lab Workstation – 648 seconds - trial 2 (10.8 minutes)

 

Benchmark Tests Using DataPlugIn

M90 Laptop 1 – 2312 seconds (38.5 minutes)

M90 Laptop 2 – 2433 seconds (40.5 minutes)

M70 Laptop - 4914 seconds (81.9 minutes)

Fuel System Lab Workstation - 1710 seconds (28.5 minutes)

Workstation 1 - 815 seconds - trial 1 (13.6 minutes)

Workstation 1 - 812 seconds - trial 2 (13.5 minutes)

Workstation 1 - 820 seconds - trial 3 (13.7 minutes)

Workstation 2 - 1741 seconds - trial 1 (with other applications running) (29.0 minutes)

Workstation 2 - 1149 seconds - trial 2 (with no other applications running) (19.1 minutes)

Workstation 3 - 2021 seconds - trial 1 (33.7 minutes)

Workstation 3 - 2052 seconds - trial 2 (34.2 minutes)

Workstation 3 - 2037 seconds - trial 3 (33.9 minutes)

Workstation 3 - 2013 seconds - trial 4 (33.6 minutes)

 

Message 4 of 9

Hi Ken,

In answer to your latest questions, I do not expect this process to run any faster on a 64 bit operating system.  DIAdem is a 32 bit application, so the operating system would have the extra work of thunking back and forth between 32 bit and 64 bit environments.  I would expect that to slow down the process, if anything.

DIAdem 10.2 Beta 3 (available on the Beta web site) does run on Vista, but here again I would not expect it to improve your execution speed.  Each new Microsoft operating system takes something like twice the memory of the previous operating system, so in general on Vista you would be dealing with less RAM available to the DIAdem application.  Note, though, that a 32-bit operating system will only grant up to 2 GB of RAM to any one application (such as DIAdem), so if you have 4 GB of RAM, say, and if Vista and all your other processes take no more than 2 GB, then DIAdem would have all the RAM it's going to get from the OS.  Now if you had 8 GB of RAM, then it MIGHT help to be on a 64-bit operating system, because then the OS MIGHT give DIAdem more than 2 GB of RAM to work with, and DIAdem MIGHT be able to use more than 2 GB.  I don't really know, and I don't think it's been tested.

I would concentrate on making sure that you give DIAdem the full 2 GB of high speed RAM to work with and also invest in very high speed hard disk drives.  It sounds like you already have plenty fast CPUs going there.

Brad Turpin
DIAdem Product Support Engineer
National Instruments

Message 5 of 9
Brad,
 
Dell will not allow us to run a benchmark on a workstation we plan on purchasing, so perhaps you can address the following:
 
 
As shown above in the benchmark results;
 
When comparing the benchmark tests we have performed there is a noticeable difference in using the dataplugin on different workstations. Benchmark times vary from 13.5 - 40.5 minutes.
 
When comparing the benchmark tests when using your script the benchmark times vary only from 9.8 to 11.6 minutes.
 
We need to use the DIAdem standard data plugin more often than the script you prepared for us for loading in large data files.
 
Why would the workstation that performed best with your script perform worst with the dataplugin?
 
Is there anything within your script that allows a dual core processor to parse out data faster than with the dataplugin?
 
Would a quad core processor be more beneficial to the data plugin or your script?
 
 
Message 6 of 9

Hi Ken,

Just to confuse the issue further, I actually sent you 2 different scripts, one that used the DataPlugin to load individual channels one at a time, and the other which used the older DAT header approach to load individual channels one at a time.  My guess is that you are using this second script with the DAT header code, which turned out to be faster.

Are you asking why there is a difference between dragging and dropping a file from the NAVIGATOR into the Data Portal and running my script, or are you asking why there is a difference between using the 1st script that loads channels with the DataPlugin and using the 2nd script that loads channels with the DAT header approach?

Let me go ahead and give you some answers independent of your answer back to me on my question above.  Both of my scripts load only 1 channel at a time from the ASCII file into the Data Portal, then immediately save that one loaded channel to a new and separate binary file.  The benefits you get from this approach are as follows:

1)  Only one channel has to be in DIAdem memory at any time.  DIAdem channels in the Data Portal are always stored as DBLs in RAM.  The hope with this approach is that each channel (as DBLs) will fit into your available RAM without having to resort to virtual memory, which could easily occur if you try to load all channels into DIAdem memory at the same time.
2)  Storing each channel in its own separate binary file gives you maximum random access speed for selective loading and interval/reduced loading of the binary file afterwards.
3)  Loading one channel at a time gives you an easy way to show a "progress bar" on the overall process, which as you've noted tends to take a while.
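The one-channel-at-a-time pattern described above can be sketched outside DIAdem as follows. This is not Brad's VBScript: it is a minimal Python illustration that assumes a tab-delimited ASCII file with one header line of channel names, and the function name, the `.r64` extension, and the raw float64 layout are all made up for the example.

```python
import os
import tempfile
from array import array

def convert_one_channel_at_a_time(ascii_path, out_dir, delimiter="\t"):
    """Write each column of a delimited ASCII file to its own float64 file.

    Assumes the first line holds the channel names (hypothetical layout).
    """
    with open(ascii_path) as f:
        names = f.readline().rstrip("\n").split(delimiter)
    out_paths = []
    for col, name in enumerate(names):
        channel = array("d")  # only this one channel is held in memory
        with open(ascii_path) as f:
            next(f)  # skip the header line
            for line in f:
                if line.strip():
                    channel.append(float(line.split(delimiter)[col]))
        out_path = os.path.join(out_dir, f"{name}.r64")
        with open(out_path, "wb") as out:
            channel.tofile(out)  # save immediately, then move on
        out_paths.append(out_path)
        print(f"converted {col + 1}/{len(names)}: {name}")  # crude progress report
    return out_paths

# Usage with a tiny made-up file
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.txt")
    with open(src, "w") as f:
        f.write("time\tpressure\n0.0\t1.5\n1.0\t2.5\n")
    paths = convert_one_channel_at_a_time(src, tmp)
    pressure = array("d")
    with open(paths[1], "rb") as f:
        pressure.frombytes(f.read())
```

The trade-off in this sketch is that the ASCII file is re-read once per channel, exchanging extra disk reads for a small, predictable memory footprint.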

By contrast, dragging and dropping an ASCII file from the NAVIGATOR into the Data Portal (thereby using the DataPlugin directly) causes all channel values to be loaded into DIAdem memory (as DBLs) at the same time, most likely necessitating virtual memory.  If, however, you right-click on the ASCII file and choose "Open with", then use selective loading to load only 1 channel, you are effectively reproducing the first part of the 1st script.  If you then manually output that one channel to a new binary file with the DAT export functionality, you are effectively completing the steps of the 1st script.
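A rough back-of-the-envelope check shows why loading everything at once can push a machine into virtual memory. The channel count, values per channel, and free-RAM figure below are hypothetical, since the thread does not state them; only the 8 bytes per DBL value is fixed.

```python
BYTES_PER_DBL = 8  # a double-precision value occupies 8 bytes

def dbl_footprint(n_channels, n_values):
    """Bytes needed to hold n_channels of n_values each as doubles."""
    return n_channels * n_values * BYTES_PER_DBL

def fits_in_ram(n_channels, n_values, free_ram_bytes):
    """True if the channels fit in the given amount of free physical RAM."""
    return dbl_footprint(n_channels, n_values) <= free_ram_bytes

# Hypothetical figures: 200 channels of 500,000 values, 512 MiB of free RAM
n_channels, n_values = 200, 500_000
free_ram = 512 * 1024**2

all_at_once = dbl_footprint(n_channels, n_values)  # 800,000,000 bytes
one_channel = dbl_footprint(1, n_values)           # 4,000,000 bytes

print(fits_in_ram(n_channels, n_values, free_ram))  # -> False
print(fits_in_ram(1, n_values, free_ram))           # -> True
```

Under these assumed numbers, the full data set needs roughly 200 times the memory of a single channel, which is the gap the channel-by-channel scripts exploit.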

If, on the other hand, you are comparing the performance of the 1st script against the 2nd script, the only difference there is the way in which each channel is loaded individually and sequentially.  The older DAT header approach is less flexible and involves fewer layers of software, and often can load ASCII data about twice as fast as a DataPlugin can.  This is just a difference in the underlying C code which parses the ASCII file and passes the data to the Data Portal.

Hope that helps,
Brad Turpin
DIAdem Product Support Engineer
National Instruments

Message 7 of 9
Brad,
 
With DIAdem version 9.1, the TDR files would not recall channel names. With DIAdem version 10+,  TDR files do make use of channel names instead of channel numbers and therefore I now load header information. So I used the data plugin wizard and made a new data plugin that identifies the header information. That dataplugin loads large files quickly on only one type of computer but not on other types.
 
With the data header script you developed for us there is consistency for loading large files on all computers. 
 
However the end users in the lab want data in a DAT/R64 format which results from using the dataplugin method.
 
With the script you provided for us the files separate into DAT/I16 formats and the end users have difficulty locating those files if your script is not used on their machine.
 
So our goal is to configure a new computer that will use the data plugin method and lower the 13 minute benchmark to approximately 5 minutes or less.
 
Based on the information you provided in the previous response, virtual memory may be the issue. We will need to follow your instructions to dedicate at least 2 GB of RAM to DIAdem. 
 
If we cannot achieve 5 minutes or less with a dataplugin, then we will have to use your script and retrieve the I16 files.
 
Is there a method of combining and then converting I16 to R64 files? If so then I would run your header script which performs well on all machines, then combine all I16 files (maximum of 200 files) and then convert into R64 format so other end users can have 2 files (DAT and R64) to work with.
 
Message 8 of 9

Hi Ken,

Both the 1st script I sent you (which uses a DataPlugin, either mine or yours, to load the data), and also the 2nd script I sent you (which uses the DAT header approach to load the data), can output the data to binary files in a variety of data types.  I made I16 the default data type because it has the smallest hard drive footprint while preserving most of the data precision.  You can change one of the parameters at the top of either script so that the separate binary files output alongside the DAT header file are REAL64 or REAL32.  So this would be an easy change.
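To illustrate the footprint-versus-precision trade-off between I16 and REAL64, here is a small Python sketch. It assumes a simple linear factor/offset scaling onto the 16-bit integer range; the thread does not describe how the DAT format actually scales I16 data, so treat the scheme and the function names as assumptions.

```python
from array import array

I16_MIN, I16_MAX = -32768, 32767

def quantize_i16(values):
    """Map floats onto 16-bit integers with a linear factor and offset."""
    lo, hi = min(values), max(values)
    factor = (hi - lo) / (I16_MAX - I16_MIN) or 1.0  # 1.0 for a constant channel
    offset = lo - I16_MIN * factor
    codes = array(
        "h",
        (max(I16_MIN, min(I16_MAX, round((v - offset) / factor))) for v in values),
    )
    return codes, factor, offset

def dequantize_i16(codes, factor, offset):
    """Recover approximate float values from the 16-bit codes."""
    return [c * factor + offset for c in codes]

values = [0.0, 1.5, -2.25, 10.0]
codes, factor, offset = quantize_i16(values)
restored = dequantize_i16(codes, factor, offset)

# 2 bytes per value as I16 versus 8 bytes per value as REAL64
i16_bytes = len(codes) * codes.itemsize
r64_bytes = len(values) * array("d").itemsize
```

In this scheme each restored value differs from the original by at most half of `factor`, which is one way to read "most of the data precision" at a quarter of the REAL64 disk footprint.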

If, on the other hand, what you mean is that you want the converted binary file to have ONE binary R64 file to go along with the DAT header, then that would require additional effort and processing time.  As I explained in my last post, the script you are using to load the data channels never has more than one channel's data in DIAdem at any one time, so saving all those channels to a single binary file would require an extra process or a fundamental change in the script-- in either case the processing would take more time.

If you do want only one R64 file, then I would suggest that you add a few lines of code at the end of the script which loads the newly converted DAT header (matched with the N binary files), then saves that data out to a DAT or TDM or TDMS file.  My recollection is that loading the converted DAT file took less than half a minute, so this would probably be the fastest way to arrive at only 1 consolidated binary file.  You could even use the new DataFileHeader object in DIAdem 10.1 to read out the header properties with your DataPlugin from the original ASCII file (without loading any of the data) after you've loaded the converted DAT file into the Data Portal, so that when you save the data out to TDM or TDMS file you get all the right File, Group, and Channel properties, as laid out in your DataPlugin.  Note that if you use the TDMS file format, you only have one binary file to keep track of.  Also note that both the TDM and TDMS file formats can be loaded directly into Excel with the TDM Excel Add-in available for free download on our web site.
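The consolidation step can also be pictured as a generic second pass over the per-channel files. The Python sketch below is not DIAdem code and does not write a real DAT, TDM, or TDMS file: it assumes each channel was stored as raw 16-bit integers with a known factor and offset, and it appends the channels one after another as float64 blocks into a single file. The file names and the `(path, factor, offset)` layout are hypothetical.

```python
import os
import tempfile
from array import array

def consolidate_to_r64(channel_files, out_path):
    """Append each I16 channel file to out_path as a float64 block.

    channel_files: list of (path, factor, offset) tuples (hypothetical layout).
    Returns the number of values written for each channel.
    """
    counts = []
    with open(out_path, "wb") as out:
        for path, factor, offset in channel_files:
            codes = array("h")
            with open(path, "rb") as src:
                codes.frombytes(src.read())  # one channel in memory at a time
            block = array("d", (c * factor + offset for c in codes))
            block.tofile(out)
            counts.append(len(block))
    return counts

# Usage with two tiny made-up channel files
with tempfile.TemporaryDirectory() as tmp:
    specs = []
    for name, raw, factor, offset in [
        ("ch1", [0, 100, -100], 0.5, 5.0),
        ("ch2", [1, 2, 3], 2.0, 0.0),
    ]:
        path = os.path.join(tmp, name + ".i16")
        with open(path, "wb") as f:
            array("h", raw).tofile(f)
        specs.append((path, factor, offset))
    merged = os.path.join(tmp, "all_channels.r64")
    counts = consolidate_to_r64(specs, merged)
    merged_values = array("d")
    with open(merged, "rb") as f:
        merged_values.frombytes(f.read())
```

Because the pass still handles one channel at a time, memory use stays low; the cost is the extra read and write of every channel, which matches the additional processing time mentioned above.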

Brad Turpin
DIAdem Product Support Engineer
National Instruments

Message 9 of 9