Apparent inconsistency of "spreadsheet string to array"

Roo Payne · ‎08-11-2004

System: LabView6.1, XPpro, 512MB, 2.4GHz

I have a sub-vi which I call to read a single column of data from a large CSV file. Files can be >500000 rows with 4-8 cols (i.e. 15-30MB). The sub-vi uses "Read File", with byte stream type unwired, to return a string. This string is then passed to "spreadsheet string to array" (SStA) to obtain a 2D array. "Index Array" is then used to obtain the required data column.

If I wire the format string of SStA with %f, and the array type with a 2D double, the sub-VI takes about 8 seconds every time I call it.

If I wire format string with %s, and array type with 2D string (and convert to a double later, after index array), the sub-VI takes over 50s the first time it is called, but only about 1s on each subsequent call.

The bottleneck is ONLY at SStA. I am not explicitly preallocating any arrays.

Can anyone explain what LabView is doing internally with respect to memory allocation? Why is the second routine slow on the first call only, and subsequently considerably faster than the first routine? Is there any alternative way of "priming" the second sub-vi so it is not slow on the first call? I guess reading only subsets of data at a time might help, but I'd still like to understand what is going on with the current approach.
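
For illustration only, the "reading only subsets" idea can be sketched in a text language (Python here, since LabVIEW diagrams are graphical). This is a minimal sketch, assuming a comma-delimited file with no header row and purely numeric fields. `read_column` and its parameters are hypothetical names, not LabVIEW functions, and the sketch says nothing about what SStA does internally. It parses one line at a time and keeps only the wanted field, so the full 2D array is never built.

```python
import io
from array import array


def read_column(stream, column, delimiter=","):
    """Parse one column of a delimited text stream into packed doubles.

    Hypothetical helper: only one line is held as text at any time,
    and only the requested field of each row is converted.
    """
    values = array("d")  # contiguous 8-byte doubles
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        fields = line.split(delimiter)
        values.append(float(fields[column]))
    return values


# Small in-memory stand-in for a large CSV file.
sample = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n")
print(list(read_column(sample, 1)))  # [2.0, 5.0]
```

With a real file, the stream would be an open file object, for example `open(path, newline="")`, where `path` is whatever file is being read.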

rolfk · ‎08-11-2004

Roo Payne wrote:

> Can anyone explain what LabView is doing internally with respect to
> memory allocation? Why is the second routine slow on the first call
> only, and subsequently considerably faster than the first routine? Is
> there any alternative way of "priming" the second sub-vi so it is not
> slow on the first call? I guess reading only subsets of data at a time
> might help, but I'd still like to understand what is going on with the
> current approach.

Basically NI seems to have added a performance optimization with
internal caching to the Spreadsheet String to Array function in the case
of string outputs. Without it, it would always take 50 seconds. If the
input string changes significantly, the speedup probably won't be as
large anymore.

For numeric type outputs no caching has been added and wouldn't help
that much in comparison.

Rolf Kalbermatter

shoneill · ‎08-11-2004

Well, I'm going to offer an educated guess here.

Numerical data types in LabVIEW take up a defined amount of memory. Strings, on the other hand, can vary. When performing operations on large numbers of strings (especially when creating arrays of strings) there are many more memory operations required than for numerical data types, as the compiler cannot know in advance how much memory needs to be allocated for each string. This results in a LOT of copying and moving in memory.

It might just be that LabVIEW allocates this memory the first time (thus the 50 seconds) but doesn't release it immediately. Since this memory space is still reserved for the following execution, the memory operations disappear, and the execution time is massively improved.

I might be wrong, but I observed something similar in the second-last Coding challenge (Meta-word) where using a defined-length data type as an approximate representation of a string led to a huge increase in performance. I think the inability to pre-determine the size of a string array is crippling due to the extra memory operations required during processing.
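
To make the fixed-size versus variable-size point concrete, here is a small Python sketch (Python only because LabVIEW code is graphical). It is an illustration of the general idea, not a description of LabVIEW's actual memory layout: `pack_doubles` and `pack_strings` are hypothetical helpers, and the length-prefixed record format is an assumption chosen for the example. The buffer size for doubles follows from the element count alone, while the buffer size for strings is only known after every string has been inspected.

```python
import struct


def pack_doubles(values):
    """Pack floats as consecutive 8-byte doubles; size is 8 * len(values)."""
    return struct.pack(f"<{len(values)}d", *values)


def pack_strings(strings):
    """Pack strings as (4-byte length, bytes) records; size depends on content."""
    out = bytearray()
    for text in strings:
        data = text.encode("ascii")
        out += struct.pack("<i", len(data))  # length prefix
        out += data                          # variable-length payload
    return bytes(out)


numbers = pack_doubles([1.5, 22.75, 3.0])
texts = pack_strings(["1.5", "22.75", "3.0"])
print(len(numbers))  # 24: known in advance from the count (3 * 8)
print(len(texts))    # 23: (4+3) + (4+5) + (4+3), known only after scanning
```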

Slightly off-topic, I wish LV could make use of a limited-length string similar to what's available in other languages. I think this would nicely combine the flexibility of strings with the performance of fixed-length datatypes.

Hope this helps

Shane.

Using LV 6.1 and 8.2.1 on W2k (SP4) and WXP (SP2)

Roo Payne · ‎08-12-2004

Thanks for these. I wonder if this behaviour is common in other LabView functions?

rolfk · ‎08-12-2004

Roo Payne wrote:

> Thanks for these. I wonder if this behaviour is common in other
> LabView functions?

It would be a safe bet that there are other functions which try to
optimize at least for large string arrays. LabVIEW has come a long way
and they have refined some internal algorithms in version 6 and 7
considerably to avoid some performance penalties.

That this makes execution time seem inconsistent, as in your case, is a
small penalty to pay. Just consider that before the optimization the
same string operation would probably always have taken around 95% of
the time the first call takes now, whereas the new version runs about
50 times faster on repeated execution.

Rolf Kalbermatter
