11-06-2019 09:44 AM - edited 11-06-2019 09:51 AM
Hi,
I am trying to merge several large files into a single file. Each file contains several hundred megabytes of data in several columns of tab-delimited text.
If I simply open them, merge them in a concatenating For Loop, and write a new file, I constantly get a "not enough memory" error.
What is an easy way to merge several files into one while avoiding the memory overload? Each file contains 4 columns with millions of rows of data; the final file should have the same number of rows as before, but with each file's columns appended side by side.
Thank you for your help
Solved! Go to Solution.
11-06-2019 09:55 AM
Adding columns to a text file is difficult, if not impossible.
If you must use a text file, then I suggest the following (it may not work):
Open the first file, convert it to binary, and close it. Open the second file, convert it to binary, and append it to the binary array; repeat for the third file.
Now you have a binary array of points. Convert it back to text in chunks, say 100k rows at a time, appending each chunk to the new text file.
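A rough text-language analogue of those steps (Python rather than LabVIEW VIs; file names and the doubles datatype are assumptions): everything still passes through one in-memory binary array, but doubles take far less space than their text representation, and the text output is written in fixed-size chunks.

```python
import array

def merge_files_via_binary(input_paths, output_path, columns=4, chunk_rows=100_000):
    """Convert each text file to binary, append to one binary array of
    points, then write the array back out as text in chunks of rows."""
    points = array.array("d")
    # Step 1: open each file in turn, convert its text to binary
    # numbers, and append to the growing binary array.
    for path in input_paths:
        with open(path) as src:
            for line in src:
                points.extend(float(v) for v in line.split("\t"))
    # Step 2: convert the binary array back to text in chunks,
    # appending each chunk to the new file.
    per_chunk = chunk_rows * columns
    with open(output_path, "w") as out:
        for start in range(0, len(points), per_chunk):
            chunk = points[start:start + per_chunk]
            lines = (
                "\t".join(repr(v) for v in chunk[i:i + columns])
                for i in range(0, len(chunk), columns)
            )
            out.write("\n".join(lines) + "\n")
```

Note that, like the suggestion above, this stacks the files' rows rather than adding columns; it trades memory for the cost of one extra text-to-binary-to-text round trip.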
mcduff
11-06-2019 10:09 AM - edited 11-06-2019 10:12 AM
@LukasSc wrote:
Hi,
I am trying to merge several large files into a single file. Each of the files contains several hundred Megabytes of Data in several columns of Tap deliminated txt files.
If I just try to open them, merge them in a concatenating for loop and writing a new file I constantly get the error not enough memory.
What is an easy way to merge several files into one, while avoiding the memory overload issue? Each file contains 4 columns with millions of rows of data, the final file should have the same amount of rows as previously but each column added at the end.
Thank you for your help
This task would be better performed in Excel itself.
If you were clever you could:
11-06-2019 10:10 AM
I would read all of the files one line at a time, merge each set of lines into a single line, and write that line out. Repeat until you have read all of at least one of the files.
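As a sketch of that approach (in Python rather than LabVIEW, with hypothetical file names), only one line per input file is ever held in memory:

```python
def merge_line_by_line(input_paths, output_path):
    # Open every input at once; read one line from each, join the
    # columns with tabs, and write the merged line immediately.
    files = [open(p) for p in input_paths]
    try:
        with open(output_path, "w") as out:
            # zip stops as soon as any one file is exhausted, i.e.
            # "until you have read all of at least one of the files".
            for lines in zip(*files):
                out.write("\t".join(line.rstrip("\n") for line in lines) + "\n")
    finally:
        for f in files:
            f.close()
```

Memory use stays constant regardless of file size, at the cost of per-line I/O overhead.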
11-06-2019 10:17 AM
From the command window (or System Exec):
copy /Y [file 1] + [file 2] + .. + [file n] [destination file]
11-06-2019 10:49 AM
I think copy will append them row by row (adding rows, not increasing the column count).
You may be able to hold all of your file references open and scan blocks from each file, using Spreadsheet String To Array (of strings) with the appropriate Transpose and Build Array operations, to go through faster than line by line; but it seems likely that you cannot open and read all of any one file at once and still make progress.
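A sketch of the block-wise variant (Python standing in for the LabVIEW string functions; the block size is an arbitrary assumption): all file references stay open, and rows are pulled in blocks rather than one line at a time.

```python
from itertools import islice

def merge_in_blocks(input_paths, output_path, block_rows=100_000):
    # Keep all file references open and pull a block of rows from each,
    # so per-read overhead is paid once per block rather than per row.
    files = [open(p) for p in input_paths]
    try:
        with open(output_path, "w") as out:
            while True:
                blocks = [list(islice(f, block_rows)) for f in files]
                if not all(blocks):  # at least one file is exhausted
                    break
                for lines in zip(*blocks):
                    out.write("\t".join(l.rstrip("\n") for l in lines) + "\n")
    finally:
        for f in files:
            f.close()
```

Memory use is bounded by block_rows times the number of files, never by total file size.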
11-07-2019 02:16 AM
OK, I think you have some ideas for a solution. But storing hundreds of megabytes of data as text files is not a good idea, and merging several of them into a gigabyte-sized text file is worse. What do you need it for? It will be extremely slow to use and to extract data from. Consider using TDMS for storing that amount of data instead.
11-07-2019 03:35 AM
Wow, TDMS files do seem promising. I might have to rewrite some of my data acquisition procedures, but it seems worth it. Thank you for pointing that out; I wasn't even aware of the format.
11-07-2019 03:52 AM
@LukasSc wrote:
Wow, TDMS files do seem promising. I might have to rewrite some of my data acquisition procedures, but it seems worth it. Thank you for pointing that out; I wasn't even aware of the format.
If you're willing to switch to TDMS files, you have the significant advantage of being able to write new columns separately.
That would allow you to parse entire files at a time, and just keep adding new columns (perhaps based on a header row in the existing files?) to the TDMS file.
In TDMS, these would probably be Channels (although you can also use Groups to collect similar measurements, to separate by source file, or however you wish).
11-07-2019 04:29 AM