How can I read in a selected portion of a large ASCII file (~200 MB)?

doyles · ‎04-26-2012

I have several large ASCII files that I need to read in. These files are part of a standard test for an application I wrote. How well my application analyzes the files determines how well the program accomplishes its primary task. For an actual test, the application captures the live data as a 2D array of doubles and analyzes the data in that form. This array is 3 million elements long (1 MS/s for 3 s). I typically never deal with any form of ASCII file, as all the data is stored as TDMS using this method (although I need to update this link with some critical changes). (Thanks again Ben)

The ASCII files I need to read all have two rows of header data, followed by ~8 million rows of data representing the data I typically capture. Each row contains one data point and one accumulated time value. I need to separately load each file, analyze the data within, and report the results. The part I need help with is loading the large file. I have thus far been unable to load a file without memory issues. One reprieve is that, similar to a real test, I actually only need 3 million of the rows for the analysis. But I need to be able to select which 3 million rows based upon the time values within the file. Technically, I only need the single column of data and the sampling rate represented by the time values.

How can I select a specific section of an ASCII file and read it into LabVIEW as a 2D array of doubles? Can that be done for 3 million data points without crippling the system by using all of the memory just to accomplish that task? An alternate "last resort" version would be to run a separate program to create a TDMS file that I could then read and go from there. But I would prefer to read the file directly into my application.

I am running on an HP EliteBook 8540w with Win 7 Enterprise (64bit), i7 dual-core CPU (2.67GHz) with 8GB RAM and LV2011 32-bit.

Thanks,

Scott

DianeS · ‎04-26-2012

Hi,

I assume that your file is a txt or csv file (i.e. could theoretically be opened by Notepad or something similar?) If so, you can use "Read From Spreadsheet File" to read the data in chunks. It has both a "number of rows" input and a "start of read offset" input.

Alternatively you can use the regular file I/O VIs. "Read From Text File" has a "count" input, so you can tell it how much of the file to read. Then use "Set File Position" to set the file mark to the end of the chunk you just read, so you can read the next chunk. Use "Spreadsheet String to Array" to convert the text file string to an array of numbers.
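For readers who think more easily in text code, here is a rough Python sketch of the same count-and-position idea. It is not LabVIEW and not a definitive implementation: `read_chunk` is a hypothetical helper, and it assumes whitespace-delimited numeric rows and a start position that is already past the header rows.

```python
import io


def read_chunk(stream, position, count):
    """Read up to `count` bytes from `position`; return (rows, next_position).

    The chunk is trimmed back to the last complete line so that the next
    read starts on a row boundary.
    """
    stream.seek(position)          # analogous to "Set File Position"
    block = stream.read(count)     # analogous to the "count" input
    if not block:
        return [], position        # end of file

    end = block.rfind(b"\n") + 1   # one past the last complete line
    if end == 0 or len(block) < count:
        end = len(block)           # no newline found, or final chunk of the file

    rows = [
        [float(field) for field in line.decode("ascii").split()]
        for line in block[:end].splitlines()
        if line.strip()
    ]
    return rows, position + end
```

The returned position is where the next call should start, so it can be carried from one call to the next in a loop.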

The LabVIEW help is a wonderful thing...I've learned a lot from it. Start poking around the file function help files and you should be able to put something together.

Hope that gets you started!

d

doyles · ‎04-26-2012

Hi Diane,

Yes, it is a .txt file. Sorry for that omission.

I have read a number of the help files and searched for anything I could find regarding "LabVIEW large ascii" and a bunch of other related searches.

I have also attempted to use the Read From Text File VI as well as the Read From Spreadsheet File VI. These are the VIs I used in the attempts that resulted in memory issues. Another omission in the original post.

Using the Read From Spreadsheet File VI came the closest to what I need to do. Since your response, I have been making further attempts at bringing in only a part of the file using this VI. The issue I have now is getting the correct chunk of data from the file: the offset is a number of characters, while the size to be read is a number of rows. I'm having difficulty correlating the known number of rows read with the unknown number of characters on each row, to use for the number of characters to offset.

In an attempt to make that unknown a known, I read in various sections of each file as a string array and determined that every row (that I sampled) has two columns, with 18 and 11 characters respectively for the two elements of a row of the string array. But when I try to use the 29 characters per row as a multiplier with the number of rows that I want to offset (also adding in the offset that I determined by trial and error for the header rows), I end up starting the read partway through a column, not at the start of a new row. What characters does String Length not count when looking at an array of strings? I tried adding an additional 1, 2, and 3 to the multiplier in case it wasn't counting any column delimiters or an end-of-line character on each row. I still am not getting the full first element of the array.

Am I missing something else?

Thanks for the help.

DianeS · ‎04-26-2012

It seems like this should work:

Doesn't it? "Read From Spreadsheet File" gives you the file mark after it's done reading. Use that as your offset to select your next chunk of data.

doyles · ‎04-26-2012

My wife often complains that I can be looking right at something and still not see it. I cannot deny it as it's happened several times. It appears it happens even when I'm looking at the help files. The "mark after read" used for the offset gets the chunk part of it working properly. Thank you Diane.

Unfortunately I still don't know how to get to the exact position in the file that I need. Below is the VI as I currently have it. I have verified that the value entering the expression node (label "A") is very close (within 5 rows) to the row number that the array should start reading from. However, the array that gets read in (label "B") after I attempt to convert the rows to characters (label "C") is significantly off from the desired starting point. I've tried to adjust the multiplier, but no value gets it correct. Is the multiplier concept a typical solution to get to the number of characters? Could you describe a better way to do this? Notice that the character offset (label "C") I'm adding for the header is different than it is for the first read (label "D"), which aligns properly with the columns. I can't explain that, but that is the way it works. If I change the multiplier, I need to change the header offset. I don't understand that either. I believe there is something fundamental about how the characters are counted that I don't understand.

Even with all of that, I decided that being earlier in the file than I desired wouldn't really matter because the while loop will find the exact element I need. And I've verified that it does find the correct element using the subarray and a probe. So all that should be left to do is to convert the for loop iteration value (which represents the row number of the array) into a number of characters, add that onto the starting point of the array location and do the final read from that point. The end result is that the array I'm reading has started reading at a point past where I should be starting. I don't understand that as I'm not even converting to the number of characters, just adding the number of rows directly - which is 1/29th what the number of characters would be.

Hopefully I'm missing something simple again. But I'm not, ahem, seeing it.

Scott

doyles · ‎04-26-2012

I was able to determine that the number of characters does differ for some of the rows. That was what I initially expected.
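Because the row lengths vary, no fixed characters-per-row multiplier can map a row number to a file offset; the offsets have to be accumulated from the actual line lengths, including delimiter and end-of-line bytes. Below is a minimal Python sketch of that idea (not LabVIEW). The helpers `build_sparse_index` and `seek_to_row` and the `stride` parameter are hypothetical names for illustration, and the sketch assumes two header rows and a file opened in binary mode.

```python
import io


def build_sparse_index(stream, header_rows=2, stride=100_000):
    """Record the byte offset of every `stride`-th data row."""
    stream.seek(0)
    for _ in range(header_rows):
        stream.readline()              # skip the header rows

    index = []
    position = stream.tell()           # offset of data row 0
    row = 0
    while True:
        line = stream.readline()
        if not line:
            break
        if row % stride == 0:
            index.append(position)
        position += len(line)          # len() counts delimiter and EOL bytes too
        row += 1
    return index


def seek_to_row(stream, index, row, stride=100_000):
    """Position `stream` at the start of data row `row`; return that offset."""
    stream.seek(index[row // stride])  # jump to the nearest indexed row at or before
    for _ in range(row % stride):
        stream.readline()              # walk forward the remaining rows
    return stream.tell()
```

Storing only every `stride`-th offset keeps the index small for a file with millions of rows, at the cost of reading at most `stride - 1` extra lines per seek.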

DianeS · ‎04-26-2012

It might be easier to use the output file mark from your first call to "Read From Spreadsheet File" as your initial starting point...that way you know for certain that you're at the start of the data in your file. I'll noodle at it a little bit, right now I'm kind of distracted with something else. I agree that it would be nice to be able to track which row you're on instead of tracking a character offset. Let me think on it.

Yamaeda · ‎04-26-2012

The input to Read From Spreadsheet File is rows, not characters. The offset is characters. If you read e.g. 1000000 lines, you'll get the file pointer position as an output; feed it to a shift register and use it as the offset on the next loop iteration.
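The loop described here can be sketched in Python (not LabVIEW) roughly as follows. This is an illustration under assumptions, not the poster's VI: `read_window` is a hypothetical function, the data value is assumed to be in the first column and the accumulated time in the second, and the `mark` variable plays the role of the value carried in the shift register.

```python
import io
from itertools import islice


def read_window(stream, t_start, n_wanted, header_rows=2, rows_per_chunk=1_000_000):
    """Return (samples, dt) for n_wanted rows starting at the first time >= t_start.

    Only the data column is kept; dt is the mean time step over the kept rows.
    """
    for _ in range(header_rows):
        stream.readline()                  # skip the header rows
    mark = stream.tell()                   # carried between iterations

    samples = []
    t_first = t_last = None
    while len(samples) < n_wanted:
        stream.seek(mark)                  # start where the previous chunk ended
        chunk = list(islice(iter(stream.readline, b""), rows_per_chunk))
        if not chunk:
            break                          # end of file
        mark = stream.tell()               # position after this chunk
        for line in chunk:
            fields = line.decode("ascii").split()
            if len(fields) < 2:
                continue                   # skip blank or malformed rows
            value, t = float(fields[0]), float(fields[1])
            if t < t_start:
                continue                   # before the window of interest
            if t_first is None:
                t_first = t
            t_last = t
            samples.append(value)
            if len(samples) == n_wanted:
                break

    dt = (t_last - t_first) / (len(samples) - 1) if len(samples) > 1 else None
    return samples, dt
```

Only one chunk of rows is held in memory at a time in addition to the samples being kept, which is the point of reading by rows and carrying the position forward.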

/Y

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems

doyles · ‎04-26-2012

@Yamaeda wrote:

The input to Read from spreadsheet file is Rows, not characters. The offset is characters.

Exactly. I can determine where I need to be in the file by the row value. I don't know how to determine where I need to be by the number of characters. I need the number of characters for the offset to get to the correct row, one way or another. I need help with converting the row value to the number of characters. I have an idea that I'm working through but I'll be away for a couple hours. Kids, wife, that kind of stuff.

Here's a snippet of the path I'm heading down. I haven't worked through it yet and had to leave at this point. I'm closer with this idea I think, but I have to look further.

Yamaeda · ‎04-27-2012

" I need the number of characters for the offset to get to the correct row, one way or another. "

Check the outputs of the Read from spreadsheet file. 😉

/Y

