LabVIEW


How to read big files

Hi all,

 

I have a big text file (about 140 MB) containing data that I need to read and save (after analysis) to a new file.

The text file contains 4 columns of data (so each row has 4 values to read).

When I try to read the whole file at once I get a "Memory full" error message.

I tried to read only a certain number of lines at a time and then write them to the new file. This is done using the loop in the attached picture (just a portion of the code); the loop repeats as many times as needed.

The problem is that for such big files this method is very slow, and if I increase the number of lines read each time, I still see the PC's free memory descending slowly in the Performance window...

 

Does anybody have a better idea how to implement this kind of task?

 

Thanks,

Mentos.

Message 1 of 15

I would recommend reading the file in chunks (a good size is about 10 kB) and processing each chunk. It is possible that you will have leftover data at the end of a chunk, so you will need a shift register to hold that last bit of data, which will be prepended to the next chunk you read. If you need to process the data line by line, do that after reading the large chunk and operate on the string data itself, rather than reading the file line by line.
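
Outside of LabVIEW, the same chunk-plus-leftover pattern can be written in a few lines. A minimal Python sketch, where the file name, chunk size, and per-line processing are placeholder assumptions:

    # Read a large text file in fixed-size chunks, carrying any partial
    # trailing line over to the next iteration (the "shift register").
    CHUNK_SIZE = 10 * 1024  # ~10 kB per read, as suggested above

    def process_line(line):
        # Stand-in for whatever per-line analysis is needed.
        pass

    with open("big_data.txt", "r") as src:
        leftover = ""                  # partial line from the previous chunk
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            lines = (leftover + chunk).split("\n")
            leftover = lines.pop()     # last element may be an incomplete line
            for line in lines:
                process_line(line)
        if leftover:                   # flush the final partial line
            process_line(leftover)

Only one chunk (plus at most a fraction of a line) is ever held in memory, which is what keeps the footprint flat.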



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
Message 2 of 15

Hi Mark,

 

Sorry for the late reply.

I tried your recommendations and it really works better.

I still see the free memory descending slowly, but I hope it won't be critical.

I'll have to do some more checking.

 

Thanks!

Message 3 of 15

Can you post your code? You shouldn't have any issues with memory if this is done correctly. However, without seeing the code I can't really tell what is wrong.



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
Message 4 of 15

Your inner loop can be replaced with a Spreadsheet String to Array function.
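
For reference, the text-language equivalent of Spreadsheet String to Array is parsing a whole chunk into a 2-D array in one step. A rough Python sketch, assuming the 4-column whitespace-delimited format described above:

    def chunk_to_array(chunk):
        # Parse a multi-line chunk of whitespace-delimited text into a
        # list of rows of floats (roughly what Spreadsheet String to
        # Array does with a %f format string).
        return [[float(v) for v in line.split()]
                for line in chunk.splitlines() if line.strip()]

    example = "1.0 2.0 3.0 4.0\n5.0 6.0 7.0 8.0\n"
    print(chunk_to_array(example))  # [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]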

 

/Y

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems
Certified-LabVIEW-Developer
Message 5 of 15

Hi Mark & Yamaeda,

 

I made some tests and came up with two different approaches - see the VIs & example data file attached.

The Read lines aproach.vi reads a chunk with a specified number of lines, parses it, and then saves the chunk to a new file.

This worked more or less OK, depending on the delay. However, in reality I'll need to write the first 2 columns to the file, and only after that the 3rd and 4th columns. So I think I'll need to read the file twice - the first time take the first 2 columns and save them to the file, then repeat the loop and take the other 2 columns and save them...
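
(For illustration, the two-pass idea might look like the following rough Python sketch; the file names, delimiter, and column order are assumptions:)

    # Hypothetical two-pass split: pass 1 writes columns 1-2, pass 2
    # appends columns 3-4, so the whole file never sits in memory.
    def copy_columns(src_path, dst_path, col_indices, chunk_lines=10000):
        with open(src_path) as src, open(dst_path, "a") as dst:
            while True:
                lines = [ln for ln in (src.readline() for _ in range(chunk_lines)) if ln]
                if not lines:
                    break
                for ln in lines:
                    vals = ln.split()
                    dst.write(" ".join(vals[i] for i in col_indices) + "\n")

    copy_columns("big_data.txt", "out.txt", (0, 1))  # pass 1: columns 1 & 2
    copy_columns("big_data.txt", "out.txt", (2, 3))  # pass 2: columns 3 & 4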

Regarding the free memory: I see it drop a bit during the run, and it goes back up once I run the VI again.

 

The Read bytes aproach.vi reads a specified number of bytes in each chunk until the whole file has been read. Only then does it save the chunks to the new file. No parsing is done here (it's just for the example) - only reading & writing, to check whether the free memory stays constant.

I used 2 methods for saving - with the String Subset function and with the Replace Substring function.

When using Replace Substring (the disabled part) the free memory was 100% stable, but it was very slow.

When using the String Subset function the data was saved VERY fast, but some free memory was consumed.

The reading part also consumed some free memory, at a rate that depended on the delay I used.

 

Which method looks better?

What do you recommend changing?

Message 6 of 15

Hi Mentos,

 

Both methods look fine; it really depends on what works best for you. I would use the Read bytes aproach.vi, but as far as using Replace Substring compared to the String Subset function, that's really up to you. If time is not a big issue, I would go with Replace Substring. If time is an issue, then go with String Subset and just keep an eye on memory.
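
The memory behavior Mentos saw matches the usual trade-off between growing a string and writing into a preallocated one. A small Python sketch of the two strategies (the data is made up; this shows the idea, not the VI):

    # Growing by concatenation (akin to building with String Subset)
    # allocates new buffers as it goes; writing into a preallocated
    # buffer (akin to Replace Substring on a preallocated string)
    # keeps memory flat at the cost of an up-front allocation.
    chunks = [b"abc", b"defg", b"hi"]

    grown = b""
    for c in chunks:
        grown += c                      # reallocates on every iteration

    buf = bytearray(sum(len(c) for c in chunks))  # preallocated once
    pos = 0
    for c in chunks:
        buf[pos:pos + len(c)] = c       # in-place write, no reallocation
        pos += len(c)

    assert grown == bytes(buf)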

 

Tim O

Applications Engineer
National Instruments
Message 7 of 15

Here is more detail on what I was describing. For posting here I didn't create any subVIs, but I probably would do that if this were going into an application. The processing I do on the data is for example purposes only, to illustrate how you can parse the chunks of data; naturally you would need to implement your specific processing, but this shows handling the data a line at a time. The loop with the event structure is simply there to allow you to exit the program. The queue data could be a cluster containing a command and the data; that way you could post an element to the queue indicating that all the data has been read from the file and that the processing loop should exit.

 

[Attached image: Large File Processing.png]
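
In text form, the structure is a classic producer/consumer pair connected by a queue: one loop reads chunks and enqueues complete lines, the other dequeues and processes them. A rough Python sketch of the same pattern (the file names, chunk size, and sentinel convention are assumptions, not what the posted VI uses):

    import queue
    import threading

    CHUNK_SIZE = 10 * 1024
    DONE = None  # sentinel posted by the producer when the file is exhausted

    def producer(path, q):
        # Read the file in chunks and enqueue lists of complete lines.
        leftover = ""
        with open(path) as src:
            while True:
                chunk = src.read(CHUNK_SIZE)
                if not chunk:
                    break
                lines = (leftover + chunk).split("\n")
                leftover = lines.pop()
                q.put(lines)
        if leftover:
            q.put([leftover])
        q.put(DONE)  # tell the consumer to exit

    def consumer(q, out_path):
        with open(out_path, "w") as dst:
            while True:
                lines = q.get()
                if lines is DONE:
                    break
                for line in lines:
                    dst.write(line + "\n")  # stand-in for real per-line processing

    q = queue.Queue()
    t1 = threading.Thread(target=producer, args=("big_data.txt", q))
    t2 = threading.Thread(target=consumer, args=(q, "processed.txt"))
    t1.start(); t2.start()
    t1.join(); t2.join()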



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
Message 8 of 15

There was an issue with writing to the file. Here is the fixed version.

 

[Attached image: Large File Processing.png]

 



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
Message 9 of 15

I should have noted that your Read bytes VI would have problems with very large files. Even though you are reading in chunks, you are still reading the entire contents of the file into memory before doing anything with it. You need to keep only a portion of the file in memory at any one time.

 

The example I posted above doesn't require the parallel loops and queues; you could process each chunk of data within the read loop as well. I showed the queued method because it is a good approach when you are reading from some device: the reads complete very quickly, and you avoid the buffer overruns that can happen when your processing takes a fair amount of time.

 

The queued implementation can also result in excessive memory usage if you don't process the data in a timely fashion. If you read the entire file and queue the chunks before ever dequeuing anything, you will again have the entire contents of the file in memory at one time.
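
One way to bound that memory usage is a fixed-size queue, so the reader blocks whenever the processing loop falls behind. A minimal Python sketch of the idea (the size limit is an arbitrary assumption):

    import queue
    import threading

    # put() blocks while the queue is full, so the reader can never get
    # more than max_chunks ahead of the processing loop.
    max_chunks = 8
    q = queue.Queue(maxsize=max_chunks)

    def reader():
        for i in range(100):
            q.put(f"chunk {i}")   # blocks once 8 chunks are in flight
        q.put(None)               # sentinel: no more data

    def processor():
        while True:
            item = q.get()
            if item is None:
                break
            # ... process the chunk here ...

    threading.Thread(target=reader).start()
    processor()

In LabVIEW the analogous fix is to wire a maximum queue size into Obtain Queue, so that Enqueue Element waits whenever the queue is full.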



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
Message 10 of 15