


How can I read in a selected portion of a large ASCII file (~200 MB)?

Solved!

Check the outputs of the Read From Spreadsheet File. 😉

Sorry Yamaeda, I am feeling a little dense because I'm not sure what you're telling me with that sentence.  I am checking the outputs of every Read From Spreadsheet File VI.  Is there a particular instance or a particular output that you're referring to?

 

I have refined my previous post:

 

[Attached snippet: ImportASCII 3.png]

 

This starts reading close to where I want (I start reading 1,538 rows early; out of 8+ million, that's "close").  But I don't understand why this isn't starting to read exactly where I want.  The last row in subarray3 is exactly the row prior to where I want to start reading, which is what I expect.  I determine the total string length contained in subarray3 and add that to the offset from the start of the read.  Why isn't that the correct number of characters to offset?
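For reference, the arithmetic I'm doing amounts to this (a rough Python sketch rather than LabVIEW, with placeholder names; I'm assuming each row contributes its character count plus a CR/LF):

    # Rough sketch of my offset arithmetic (Python, not LabVIEW).
    # 'rows_before_start' stands in for subarray3: every row prior to
    # the row where I want the read to begin.
    EOL = "\r\n"                                # assumed CR/LF line endings

    def offset_to_start(rows_before_start, initial_offset=0):
        """Characters to skip so the next read starts at the target row."""
        total = initial_offset                  # offset where subarray3 began
        for row in rows_before_start:
            total += len(row) + len(EOL)        # row text plus its terminator
        return total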

 

Scott

Message 11 of 25

I assume there's a CR/LF at the end of every line ...

What I'd do:

If you want to get to the 1,000,000th value and then read 125 values, I'd simply do a Read From Spreadsheet File of 1,000,000 lines and use the returned offset as the starting point in the 'real' data reading loop (instead of calculating the offset).
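Something like this in spirit (a Python sketch of the dataflow, not LabVIEW; the file name and counts are placeholders):

    # Sketch of the idea (Python, not LabVIEW): do one throwaway read of the
    # first 1,000,000 lines just to learn the file position, then read the
    # 125 lines of real data starting from that position.
    with open("bigfile.txt", "r") as f:          # placeholder file name
        for _ in range(1_000_000):               # the "read 1000000 lines" step
            f.readline()
        mark = f.tell()                          # equivalent of "mark after read"

        f.seek(mark)                             # start the 'real' read here
        wanted = [f.readline() for _ in range(125)]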

 

/Y

Message 12 of 25

@Yamaeda wrote:

I assume there's a CR/LF at the end of every line ...

What I'd do:

If you want to get to the 1,000,000th value and then read 125 values, I'd simply do a Read From Spreadsheet File of 1,000,000 lines and use the returned offset as the starting point in the 'real' data reading loop (instead of calculating the offset).


I thought about the CR/LF for each row, but I don't think that's it.  Subarray3 is 16,618 rows long and I am short by 1,538 rows.  Those two numbers don't correlate in any way that I can think of.

 

I'm actually just going to use a hack for this, since there are only three specific files and they are the "only" files I will ever need to load for this test. 😄  I just went through the text files and determined the exact byte count needed to load the correct row for each file.  Then I hard-coded those constants into a case structure and load the right value based on the file name.  So that will work for this problem.
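Written out, the hack boils down to something like this (a Python sketch, not LabVIEW; the file names and byte counts are made up, not my real values):

    # Sketch of the hack (Python, not LabVIEW).  Names and byte counts are
    # placeholders for the three real files and their measured offsets.
    HARD_CODED_OFFSETS = {
        "test_file_A.txt": 123_456_789,   # bytes up to the first row I want
        "test_file_B.txt": 98_765_432,
        "test_file_C.txt": 111_222_333,
    }

    def offset_for(file_name):
        # stands in for the case structure keyed on the file name
        return HARD_CODED_OFFSETS[file_name]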

 

I would still like to know how to load a specific part of a text file properly.  Has everybody else always just loaded the entire file?  I know that was my initial attempt, but still.

 

Thanks for all of your help, Yamaeda and Diane.

Message 13 of 25

Your snippet shows 4 Read From Spreadsheet File functions, yet you only use the "mark after read (chars.)" output on one of them.  Several times you were told to use that output on the other functions as the offset point for the next read, instead of trying to calculate the bytes inaccurately.  Why aren't you using them?
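In other words (a rough Python analogy, not LabVIEW, with a placeholder file name): the position reported after one read becomes the offset fed to the next read, with no byte math on your part:

    # Rough analogy in Python (not LabVIEW): chain the reads, letting the
    # position after each read be the offset of the next one.
    offset = 0
    with open("bigfile.txt", "r") as f:              # placeholder file name
        for lines_to_skip in (1_000_000, 1_000_000, 500_000):
            f.seek(offset)                           # start where the last read stopped
            for _ in range(lines_to_skip):
                f.readline()
            offset = f.tell()                        # "mark after read" -> next offset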

Message 14 of 25
Solution
Accepted by topic author doyles

Well, I won't claim that this is the "proper" way to do it, but have a look and see what you think.  There's not a doubt in my mind that others can do it better, and perhaps this will show up in the Rube Goldberg thread.  So be it!

 

Maybe it at least points you in a direction.  I know you've hard-coded your way around your problem, but who knows?  Maybe this will be of use to you or someone else. 

Message 15 of 25

Awesome!  Thank you Diane! 


@DianeS wrote:

perhaps this will show up in the Rube Goldberg thread.


HA!  YOUR code is Rube Goldberg!  You saw that stuff above that I pasted in, right? 😮  I was pretty sure mine fit that bill, but I didn't care.  Now I know it does, and I still don't care.  It resulted in a working solution.

 


I know you've hard-coded your way around your problem, but who knows?  Maybe this will be of use to you or someone else. 

It was a hack.  It was not a solution.  Thank you for eliminating that for me.

 

 

I FINALLY get the comments referring to wiring the "mark after read" from one RFSF to another.  Sorry for the thick-headedness.

 

The first RFSF is an attempt to skip about 2.5 million rows to speed up the program.  I purposefully did not use that "mark after read" output so that I could attempt the large skip.  But after reviewing Diane's code I understand your comments, and that made me think that this is closer to what you have been trying to tell me to do:

 

[Attached snippet: ImportASCII 5.png]

 

I thought that might be faster than looping through, but both took about the same time to run.  Apparently the reads don't take as long as I thought they would.  The loop will have less of a hit on memory, so I will essentially use your code, Diane.

 

It still seems like it should be simple to find that location in the file without having to read all of the file up to that location into LabVIEW.
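If the elapsed-time column always increases, I imagine something like a binary search on byte positions could do it; here is a rough Python sketch of that idea (not LabVIEW, not tested against my files, and the parsing of the first field is an assumption):

    # Sketch only: binary-search byte offsets so the target row is found
    # without reading everything before it.  Assumes elapsed time is the
    # first comma-separated field and increases monotonically.
    def find_offset(path, target_time):
        with open(path, "rb") as f:
            lo, hi = 0, f.seek(0, 2)               # search range: whole file
            while lo < hi:
                mid = (lo + hi) // 2
                f.seek(mid)
                f.readline()                       # discard the partial line
                line = f.readline()
                if not line:                       # fell off the end: look earlier
                    hi = mid
                    continue
                first = line.split(b",")[0].decode("ascii", errors="replace")
                try:
                    t = float(first)
                except ValueError:                 # header rows parse as "early"
                    t = float("-inf")
                if t < target_time:
                    lo = mid + 1
                else:
                    hi = mid
            return lo                              # approximate offset near the target row

    # usage: seek to find_offset(...), discard the partial line, then scan
    # forward a few lines to the exact starting row.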

 

Thanks everyone.

 

Scott

Message 16 of 25

Glad I was able to help!  You can probably speed things up somewhat by using "Open File", then using "Read from Text File"  (right-click on the function and check "Read Lines" to read lines instead of characters) to read your data, and then obtaining the file mark position after the read using "Get File Mark".  That way you only open the file once -- at the beginning -- and close it once -- after you've read the section you want to read.  Everything else would remain pretty much the same.
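Written out (Python used purely as shorthand for the dataflow; the chunk size and the test for your starting row are placeholders), the pattern is:

    # Open once, read lines in chunks, remember the mark, close once.
    def chunk_contains_start(lines, start_time=1.0):
        """Placeholder test: does any line's elapsed time reach start_time?"""
        for line in lines:
            try:
                if float(line.split(",")[0]) >= start_time:
                    return True
            except ValueError:
                pass                               # header rows, blanks, etc.
        return False

    with open("bigfile.txt", "r") as f:            # "Open File" -- once, up front
        while True:
            chunk = [f.readline() for _ in range(10_000)]   # line-mode read
            mark = f.tell()                        # "Get File Mark" after the read
            if not chunk[-1] or chunk_contains_start(chunk):
                break                              # EOF, or we found the start
    # leaving the 'with' block closes the file -- once, at the end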

 

"Read From Spreadsheet File" opens and closes the file every time you call it, so you're opening and closing the file with each iteration of the loop.

 

On the other hand, if it ain't broke, don't fix it.  🙂

Message 17 of 25

@DianeS wrote:

You can probably speed things up somewhat by using "Open File", then using "Read from Text File"  (right-click on the function and check "Read Lines" to read lines instead of characters) to read your data, and then obtaining the file mark position after the read using "Get File Mark".


I tried it; unfortunately it doesn't work because of the file format.  The text file is structured like a CSV file:

 

"header, header,"

"header, header,"

"elapsedtime, data,"

"elapsedtime, data,"

    .

    .

    .

"elapsedtime, data,"

 

That first comma must be getting treated as an EOL character.  The first "column" (elapsedtime) is read properly but the corresponding data is not (which does improve the read time drastically).  I don't see a way to change that with the Read from Text File VI.  I did try switching the Convert EOL input, with no difference.

 

Both elements are read fine if "Read Lines" is unchecked.  However, then the number of characters (bytes) needs to be known.
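If I read by characters instead of lines, I suppose I'd have to rebuild the lines myself, something like this (a Python sketch of the idea, not LabVIEW; the file name and chunk size are placeholders):

    # Read fixed-size character chunks, then split them back into lines,
    # carrying any partial last line over to the next chunk.
    leftover = ""
    with open("bigfile.txt", "r") as f:            # placeholder file name
        while True:
            chunk = f.read(65536)                  # read by characters, not lines
            if not chunk:
                break
            lines = (leftover + chunk).split("\n")
            leftover = lines.pop()                 # incomplete last line, keep for next pass
            for line in lines:
                fields = line.rstrip("\r").split(",")
                # fields[0] is elapsedtime, fields[1] is data (when present)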

 

If I'm missing something - yet again - please point it out.

Message 18 of 25

So when you use "Read from Text File" with "Read Lines" checked, what comes out?  It should be a 1D array of strings.  Each element of the array should contain the data from 1 line.

 

As I understand you (and please tell me if I'm wrong), you are getting a 1D array of strings like the following, because your commas are interpreted as EOLs:

 

timestamp

data

timestamp

data

 

...

timestamp

data

 

Is that right?  Ok.  So decimate that array.  (Just use the "decimate array" function on the Array palette.)  Convert the relevant decimated portion (the one containing your timestamps) to a 1D array of numbers using "Fractional String to Number" (string palette).  Search the appropriate array for your starting timestamp, as before.

 

For your actual data of interest, you can do the same thing.  Convert both decimated string arrays to numeric arrays.  Analyze your data.
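In shorthand (Python, just to show the slicing; the values and the starting time are placeholders), the idea is:

    # 'lines' stands in for the 1D string array coming out of the line-mode
    # read, with timestamps and data interleaved.
    lines = ["0.001", "1.23", "0.002", "1.31", "0.003", "1.44"]   # example only

    timestamps = [float(s) for s in lines[0::2]]   # "decimate": every other element
    data       = [float(s) for s in lines[1::2]]   # the interleaved partners
                                                   # (string-to-number conversion)
    start_time = 0.002                             # placeholder target
    start = next((i for i, t in enumerate(timestamps) if t >= start_time), None)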

Message 19 of 25

@DianeS wrote:

As I understand you (and please tell me if I'm wrong), you are getting a 1D array of strings like the following, because your commas are interpreted as EOLs:

 

timestamp

data

timestamp

data

 

...

timestamp

data

 

Is that right?


No, it's a 1D array of strings containing just the timestamps (and header info).  The data isn't in the array at all.  It looks like this (including the commas):

 

header,header,

unit,unit,

timestamp,

timestamp,

...

timestamp,

 

A couple of notes:

  • the commas are part of each element in the array of strings.
  • the first two rows of the file contain two commas each, and those array elements contain both commas as well.
  • I opened the file itself and the data "column" is NOT followed by a comma.

So I no longer think the comma is being treated as an EOL, since every element in the array of strings ends with a comma, and there are two commas in the first two elements.

 

I can only read the data if I don't read lines, but that puts me right back at the original problem.

 

 

Message 20 of 25