Scan from String fails to generate expected error

mjjaeger · ‎10-07-2010

All,

I'm writing a program that requires me to examine a generic ASCII *.txt file for input data. The provided files vary widely in composition, with differences in the number of header rows, and multiple formats for date and time in the columns. I'm trying to write the import code to be "smart," so that it will automatically exclude the headers and time/date columns from the array of data.

I thought I had this figured out, using the "Scan from String" VI. I'm using a series of loops to go through each individual string in the delimited file, and then parsing those strings using a progressively less-conservative set of format strings. However, I run into problems when trying to examine "2010/10/07" or similar strings. When I use "Scan from String" to attempt to accept this as a double-precision number, I get back "2010." What I really want is for "Scan from String" to throw an error, so I can correctly classify the entire "2010/10/07" stream as a string. See attached VI for an example.

Is there a way to prevent "Scan from String" from ignoring the parts of the input string that don't fit? In other words, force it to try to convert the entire string into a number?

Any help you can offer would be appreciated. Thanks,

Mark

altenbach · ‎10-07-2010

You could check if the remaining string is empty.
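
For readers outside LabVIEW, the same idea can be sketched in Python. This is only an illustration, not the attached VI: the helper names `scan_number` and `is_pure_number` are hypothetical, and the pattern is a rough stand-in for what a floating-point scan would consume from the front of a string.

```python
import re

# Rough stand-in for a floating-point scan: an optional sign, digits with an
# optional decimal part, and an optional exponent, anchored at the start.
_LEADING_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def scan_number(text):
    """Return (value, remainder); value is None when no number leads the text."""
    match = _LEADING_NUMBER.match(text)
    if match is None:
        return None, text
    return float(match.group()), text[match.end():]


def is_pure_number(text):
    """True only when the whole field is consumed by the numeric scan."""
    value, remainder = scan_number(text.strip())
    return value is not None and remainder == ""


if __name__ == "__main__":
    for field in ["2010/10/07", "3.14", "-1e3", "Header"]:
        print(repr(field), scan_number(field), is_pure_number(field))
```

With "2010/10/07" the scan still yields 2010, but the non-empty remainder "/10/07" is what marks the field as a string rather than a number.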

(Still, what you are doing seems a bit too complicated. There must be a smarter way to do all this. Do you have some example files?)

LabVIEW Champion.

mjjaeger · ‎10-07-2010

Good point, I hadn't thought to check the remaining string (I'm ignoring it in my program).

I've been thinking about this for a couple of days now, and I'm afraid that I can't identify any easier ways to accomplish what I need to. See attached sample files. Basically, I need to be able to identify the start of the "actual data" in each file (i.e. 1 2 3). The files may have different numbers of header lines, columns, or basically anything; the only thing I know is that there will be an array of data in there somewhere.

If you can point out a more concise method of parsing, I'd be extremely grateful. If not, I'll probably try to implement the "poll for remaining string" functionality that you mentioned.

Thank you for your response!

Mark

ben64 · ‎10-07-2010

You can use a regular expression to filter the data that you want, then convert it to numbers using the Spreadsheet String To Array function.

note: there is probably a simpler regex that would do the job.

Ben64

mjjaeger · ‎10-07-2010

Ben,

Very impressive, thanks! I've never used the "Regular Expression" VI before, primarily because I don't understand how to format that input string. I'll have to study what you sent to make sure I can figure it out.

Any suggestions on a beginner's tutorial for creating the RegEx strings?

Mark

nathand · ‎10-07-2010

A couple of sites with information on regular expressions were mentioned in this thread.

Darin.K · ‎10-07-2010

@ben64 wrote:

note: there is probably a simpler regex that would do the job.

Look behind assertions are pretty cool, but my regex tip for the day will be the \K code which resets the beginning of the match. I would simplify the regex in ben64's example as follows.

:\d{2}\s\K[\h\d\.]+\n?

Instead of looking for the entire timestamp, I find a colon followed by two digits and a space. Then I reset the match using '\K', so only the part that matches the remainder of the regex is returned as the whole match. For that part I look for a combination of horizontal whitespace (\h), digits (\d), and decimal points (in case the numbers aren't all integers). There may or may not be a newline (\n); sometimes the last line of a file does not have one, so I added the ? quantifier.
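
As a rough cross-check outside LabVIEW, the pattern above can be approximated in Python. Python's built-in `re` module supports neither `\K` nor `\h`, so this sketch swaps in a capture group for the match reset and `[ \t]` for horizontal whitespace. The sample text and the function name `data_after_timestamp` are assumptions for illustration; the original sample files are not reproduced here.

```python
import re

# Approximation of  :\d{2}\s\K[\h\d\.]+\n?
# The parentheses play the role of \K: only group 1 is kept as the result.
PATTERN = re.compile(r":\d{2}\s([ \t\d.]+\n?)")


def data_after_timestamp(text):
    """Return the numeric part that follows each timestamp in the text."""
    return [match.group(1) for match in PATTERN.finditer(text)]


if __name__ == "__main__":
    # Hypothetical file content: one header line, then timestamped rows.
    sample = (
        "Header line\n"
        "2010/10/07 12:30:45 1 2 3\n"
        "2010/10/07 12:30:46 4.5 6 7"
    )
    print(data_after_timestamp(sample))
```

The header line is skipped because it contains no colon followed by two digits and whitespace, and the last row is still returned even though it has no trailing newline.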

mjjaeger · ‎10-08-2010

Thanks a lot everyone, very helpful suggestions. I'll post some of my final code (when it gets to that point) to demonstrate how I used your suggestions.

Mark

ben64 · ‎10-08-2010

Thanks for the tip, Darin. I was not familiar with the use of \K (very useful). I tried the \d regex, but for an unknown reason it wasn't detecting anything this time (it usually works; I'm using LV2009), so I had to use [0-9].

I think it would also be a good thing to add an optional minus sign to the match pattern in case of a negative value.

:\d{2}\s\K[\h\-?\d\.]+\n?
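
A small Python sketch of the same idea, extended to turn each matched row into numbers. Several things here are assumptions: the sample rows, the function name `parse_rows`, the capture group standing in for `\K`, and `[ \t]` standing in for `\h` (Python's `re` module has neither). Inside a character class a `?` is a literal question mark rather than a quantifier, so the sketch lists only the minus sign.

```python
import re

# Character class: horizontal whitespace, minus sign, digits, decimal point.
ROW = re.compile(r":\d{2}\s([ \t\-\d.]+)")


def parse_rows(text):
    """Return one list of floats per timestamped row found in the text."""
    rows = []
    for match in ROW.finditer(text):
        # split() drops the whitespace between values; float() handles the sign.
        # A malformed token such as a lone "-" would raise ValueError here.
        rows.append([float(token) for token in match.group(1).split()])
    return rows


if __name__ == "__main__":
    # Hypothetical rows containing negative and fractional values.
    sample = (
        "10/07/2010 08:15:00 -1.5 2 3\n"
        "10/07/2010 08:15:01 4 -5 6.25\n"
    )
    print(parse_rows(sample))
```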

Ben64
