DIAdem

DataFileLoad vs DataFileLoadSel

I'm trying to understand the speed difference between these two commands, and whether there is a better way to speed up this process. Here is my situation.
 
I have a file with 300+ groups and 12 channels per group. Rather than loading all the data and removing the unwanted groups/channels, my current script uses DataFileLoadSel to load only the channel I want from each group. When everything is loaded, the Data Portal shows all the groups with 1 channel each, which I then post-process.
 
The data file is a custom .csv file, for which I wrote a DataPlugin that converts the raw data into the groups and channels I want. The plugin indexes the file properly and has no issues. I also wrote a separate utility to convert the raw data to TDM format, which I can then import as described above.
 
When I use DataFileLoadSel to load a channel from the .csv file, it takes ~5 secs per channel, so it takes nearly 30 min to complete. If I use the TDM version it takes 1

Message 1 of 7
Hi jsmalley,

The main difference is between the CSV and TDM file formats. A CSV file is stored in ASCII (text) format. A TDM file set typically stores the header as XML (the .tdm file) and the values in binary (the .tdx file).
Binary files can be read much faster than ASCII files. Especially if you have large amounts of data, I highly recommend using a binary file format. Since your original data are stored as CSV, and given the amount of data, it makes sense to convert them.
You can do this with your own converter or with DIAdem. In DIAdem you can write a small script that imports the data and saves them as a TDM file. This could run as an overnight process.
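Much of that speed difference comes from parsing. As a rough illustration (a minimal Python sketch, not DIAdem script; all function names are hypothetical), the code below stores the same values once as CSV-style ASCII text and once as raw 8-byte binary doubles, similar in spirit to the binary values in a .tdx file. Reading the text version means splitting and converting every single value, while the binary version is a direct copy out of the byte buffer.

```python
import array


def to_csv_bytes(values):
    # ASCII (CSV-like): every value is formatted as text, one per line
    return "\n".join(repr(v) for v in values).encode("ascii")


def from_csv_bytes(data):
    # Every value must be split out and converted from text to a float
    return [float(tok) for tok in data.decode("ascii").splitlines()]


def to_binary_bytes(values):
    # Binary: every value is a fixed-size 8-byte double, no text formatting
    return array.array("d", values).tobytes()


def from_binary_bytes(data):
    # The values are copied straight out of the byte buffer, no text parsing
    out = array.array("d")
    out.frombytes(data)
    return out.tolist()


values = [i * 0.1 for i in range(100_000)]
# Both formats round-trip the same values exactly
assert from_csv_bytes(to_csv_bytes(values)) == values
assert from_binary_bytes(to_binary_bytes(values)) == values
```

Wrapping each reader in `timeit.timeit(lambda: ..., number=10)` lets you compare the two on your own machine; no timings are claimed here because they depend on the hardware.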

The difference between DataFileLoad and DataFileLoadSel is that DataFileLoad imports the complete file, whereas DataFileLoadSel imports only the selected channels. If you only need a subset of your channels, DataFileLoadSel is faster.

But there is another way to access your channels. You can register them (select the file and right-click to open the context menu) or use DataFileLoad or DataFileLoadSel with the optional parameter "Register". If you register your data, only the header is loaded. The values are read directly from the file, using a special and very fast technique, but only when they are actually used. Registered channels are read-only because they are read directly from the file. This is the fastest way.
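The register behavior described here can be modeled roughly as lazy loading. The Python sketch below is a conceptual model only, not how DIAdem implements registering; the class, its attributes, and the reader callback are hypothetical. The header (name, length) is available immediately, the values are read from the file on first access and then reused, and writes are rejected because the channel mirrors the file.

```python
class RegisteredChannel:
    """Hypothetical model of a registered channel (not DIAdem's implementation)."""

    def __init__(self, name, length, read_values):
        # Header information is available as soon as the channel is registered
        self.name = name
        self.length = length
        self._read_values = read_values  # callable that reads the values from the file
        self._values = None              # nothing read from the file yet

    @property
    def is_loaded(self):
        return self._values is not None

    @property
    def values(self):
        # The values are read from the file on first use only, then reused
        if self._values is None:
            self._values = tuple(self._read_values())
        return self._values

    def __setitem__(self, index, value):
        # Registered channels mirror the file, so they are read-only
        raise TypeError(f"channel {self.name!r} is registered and read-only")


reads = []


def read_from_file():
    # Stand-in for reading the channel values from the data file
    reads.append(1)
    return [1.0, 2.0, 3.0]


ch = RegisteredChannel("Speed", 3, read_from_file)
print(ch.length, ch.is_loaded)   # prints: 3 False (header only)
print(ch.values[0], len(reads))  # prints: 1.0 1 (first use triggers one read)
```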
I hope this information helps.

Greetings
Walter

Message 2 of 7
Hi Walter.
 
Adding the part that was cut off before:

"When I use DataFileLoadSel to load a channel from the .csv file, it takes ~5 secs per channel, so the complete file load takes nearly 30 min. If I use the TDM version, it takes 1 sec per channel, so the complete file load takes 5 min.

The odd thing is that if I use DataFileLoad for all channels, it takes ~10 secs to load the whole file, regardless of whether I use the TDM file or my plugin's .csv file. Why is DataFileLoadSel taking so long?"
 
So basically, loading single channels one at a time takes much longer than loading the entire file. Is DataFileLoadSel just generally slower for large files?
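A quick check of the numbers reported here (plain Python arithmetic; GROUPS = 300 is a lower bound for "300+", since the exact count is not given) shows how lopsided this is: the per-channel loop costs far more than one full load, which suggests a fixed cost per call rather than a cost per amount of data read.

```python
GROUPS = 300          # "300+ groups": lower bound, the exact count is not given
CSV_PER_CALL = 5.0    # ~5 secs per DataFileLoadSel call on the .csv file
TDM_PER_CALL = 1.0    # ~1 sec per DataFileLoadSel call on the TDM file
FULL_LOAD = 10.0      # ~10 secs for DataFileLoad of the whole file, either format


def loop_load_seconds(groups, seconds_per_call):
    # One DataFileLoadSel call per group, each paying roughly the same cost
    return groups * seconds_per_call


csv_loop = loop_load_seconds(GROUPS, CSV_PER_CALL)
tdm_loop = loop_load_seconds(GROUPS, TDM_PER_CALL)
print(csv_loop / 60, tdm_loop / 60)   # 25.0 5.0 (minutes)
print(csv_loop / FULL_LOAD)           # 150.0 (times longer than one full load)
```

25 min for exactly 300 groups is consistent with "nearly 30 min" for somewhat more than 300.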
 
I have also tried using "Register", but it doesn't help much with the DataFileLoadSel command.
 


Message Edited by jsmalley on 01-14-2008 07:29 AM
Message 3 of 7
Hi jsmalley,

The command DataFileLoadSel works like this: open the file, load the specified channels, and close the file. So if you call the command once per channel, which means about 300 times, it takes as long as you measured. But there is a way to optimize it: specify all the channels you want to load in one command. For example, if you want to load the first channel of every channel group, the command is:

Call DataFileLoadSel(MyFileName, DataPlugin, "[1-300]/[1]")

This should be faster than loading the whole file. I hope this information helps.
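The open/load/close explanation amounts to a cost model with a fixed overhead per call. The Python sketch below uses hypothetical function names, and the cost model itself is an assumption rather than measured DIAdem behavior. It builds the selection strings for both approaches, in the "[group]/[channel]" form used in this thread, and shows why a per-group loop pays the open/close overhead once per group while the batched call pays it only once.

```python
def per_group_selectors(group_count, channel):
    # One selection string per group, so one DataFileLoadSel call per group
    return [f"[{g}]/{channel}" for g in range(1, group_count + 1)]


def batched_selector(group_count, channel):
    # One selection string that covers every group, so a single call
    return f"[1-{group_count}]/{channel}"


def estimated_seconds(calls, per_call_overhead, data_seconds):
    # Hypothetical cost model: every call opens and closes the file once
    # (fixed overhead), while reading the selected values costs the same
    # in total however the selection is split across calls
    return calls * per_call_overhead + data_seconds


loop = per_group_selectors(300, "[1]")
print(loop[0], loop[-1])               # [1]/[1] [300]/[1]
print(batched_selector(300, "[1]"))    # [1-300]/[1]
```

With a per-call overhead c and a time d to read the selected values, the loop costs roughly 300*c + d and the batched call c + d, so the gap grows with the number of groups.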

Greetings
Walter

Message 4 of 7

That won't work because I have multiple groups: 300+ groups with 12 channels per group, and I only want 1 channel from each group for histogram analysis. I have a loop like this for each plugin type. For example, for TDM:

Case "TDM"
  Select Case Trim(ChannelSet)
    Case "", "1-"
      ' No channel subset requested: load the complete file
      Call DataFileLoad(FilePaths(i), DataPlugin)
    Case Else
      ' Read only the file header to get the number and names of the groups
      Set oDataFileHeader = DataFileHeaderAccess(FilePaths(i), "TDM", True)
      oDataFileGroupCount = oDataFileHeader.GroupCount
      for j = 1 to oDataFileGroupCount
        oDataFileGroupName = oDataFileHeader.GroupNameGet(j)
        Msg = "Loading Group" & " " & j & " of " & oDataFileGroupCount
        Call MsgBoxDisp(Msg, "MB_NOBUTTON", "MsgTypeNote", 0, 0, 1)
        ' Recreate the source group in the Data Portal and make it the default group
        Call GroupCreate(oDataFileGroupName)
        Call GroupDefaultSet(GroupCount)
        ' Register the selected channel first (values stay in the file) to check its length
        Call DataFileLoadSel(FilePaths(i), DataPlugin, "[" & j & "]/" & ChannelSet, "Register")
        if (ChnPropValGet("[" & GroupCount & "]/" & ChannelSet, "Count")) = 0 then
          ' Empty channel: drop the whole group
          Msg = "Removing zero length data group" & " " & j
          Call MsgBoxDisp(Msg, "MB_NOBUTTON", "MsgTypeNote", 0, 0, 1)
          Call GroupDel(GroupCount)
        else
          ' Replace the registered channel with a fully loaded copy
          Call ChnDel("[" & GroupCount & "]/" & ChannelSet)
          Call DataFileLoadSel(FilePaths(i), DataPlugin, "[" & j & "]/" & ChannelSet)
        end if
      next
  End Select

Case "PALM_USERLOGPLUGIN"
  Select Case Trim(ChannelSet)
    Case "", "1-"
      ' No channel subset requested: load the complete file
      Call DataFileLoad(FilePaths(i), DataPlugin)
    Case Else
      ' Read only the file header to get the number and names of the groups
      Set oDataFileHeader = DataFileHeaderAccess(FilePaths(i), DataPlugin, True)
      oDataFileGroupCount = oDataFileHeader.GroupCount
      for j = 1 to oDataFileGroupCount
        oDataFileGroupName = oDataFileHeader.GroupNameGet(j)
        Msg = "Loading Group" & " " & j & " of " & oDataFileGroupCount
        Call MsgBoxDisp(Msg, "MB_NOBUTTON", "MsgTypeNote", 0, 0, 1)
        ' Recreate the source group in the Data Portal and make it the default group
        Call GroupCreate(oDataFileGroupName)
        Call GroupDefaultSet(GroupCount)
        ' Register the selected channel first (values stay in the file) to check its length
        Call DataFileLoadSel(FilePaths(i), DataPlugin, "[" & j & "]/" & ChannelSet, "Register")
        if (ChnPropValGet("[" & GroupCount & "]/" & ChannelSet, "Count")) = 0 then
          ' Empty channel: drop the whole group
          Msg = "Removing zero length data group" & " " & j
          Call MsgBoxDisp(Msg, "MB_NOBUTTON", "MsgTypeNote", 0, 0, 1)
          Call GroupDel(GroupCount)
        else
          ' Replace the registered channel with a fully loaded copy
          Call ChnDel("[" & GroupCount & "]/" & ChannelSet)
          Call DataFileLoadSel(FilePaths(i), DataPlugin, "[" & j & "]/" & ChannelSet)
        end if
      next
  End Select

 

 

Message 5 of 7
I understand now, Walter. After obtaining the group count from the header, I tried this:
 
 Call DataFileLoadSel(FilePaths(i), DataPlugin, "[1-" & oDataFileGroupCount & "]/" & ChannelSet)
 
This takes about 10 secs to load the channel set from the .csv file, and 3 secs from the TDM or TDMS version.
 
Unfortunately, I lose all the group information this way, because all the channels are loaded into one group. But it does show that, with DataFileLoadSel, there is a significant timing difference between loading multiple channels in one call and loading them in a loop, one group at a time.
 
It would be good to have a way to load multiple channels while preserving their group information.
Message 6 of 7
Hi jsmalley,

I think you will be happy to hear that you don't lose the group information. Select a channel that you loaded with DataFileLoadSel and look at its properties in the Data Portal. There you will find the channel history:

"Group name"        :   The group in which this channel is currently stored
"Data store source" :   The file in which this channel is currently stored
"Source file"       :   The file name from which this channel was loaded
"Source file path"  :   The path of this file
"Data source type"  :   The type of the source file
"Source context"    :   The channel group in which the loaded channel is located in the source file

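So after one batched load, the "Source context" property should be enough to rebuild the original grouping. Below is a minimal Python sketch of that regrouping step, assuming each channel's properties are available as a dict; the "Name" key, the example group names, and the function name are hypothetical, and the exact format of the "Source context" value in DIAdem is not confirmed here. In a DIAdem script, the same idea would mean creating one group per distinct source context and moving the matching channels into it.

```python
from collections import defaultdict


def regroup_by_source_context(channels):
    """Rebuild the original group layout after one batched DataFileLoadSel.

    `channels` is a list of dicts, each holding the channel's "Name" and its
    "Source context" (the channel group it came from in the source file).
    Returns {source group: [channel names]} in the order first seen.
    """
    groups = defaultdict(list)
    for channel in channels:
        groups[channel["Source context"]].append(channel["Name"])
    return dict(groups)


# Hypothetical example: three channels that all ended up in one Data Portal group
loaded = [
    {"Name": "Pressure", "Source context": "Run_001"},
    {"Name": "Pressure", "Source context": "Run_002"},
    {"Name": "Pressure", "Source context": "Run_003"},
]
print(regroup_by_source_context(loaded))
# {'Run_001': ['Pressure'], 'Run_002': ['Pressure'], 'Run_003': ['Pressure']}
```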

Greetings
Walter
Message 7 of 7