03-06-2017 04:36 PM - edited 03-06-2017 04:37 PM
@tyk - E5-2667v4 Xeon @ 3.20 GHz. Installed memory (RAM): 256 GB (192 GB usable). It is a "sexy beast" of a system.
The built-in subVI has a data type of 32-bit signed integer, and it does type conversion at the connector. I could specify anything I wanted, but at 2^31 the offset rolls over to -2^31. I tried it.
03-06-2017 05:23 PM - edited 03-06-2017 05:24 PM
There may be a way to create a DSN for the file, then use the Database Connectivity Toolkit to read it.
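For anyone curious what that looks like outside of LabVIEW, here is a rough DSN-less sketch of the idea using Python and pyodbc as a stand-in for the Database Connectivity Toolkit calls (the folder path and file name are placeholders, the Microsoft Text Driver must be installed, and delimiter details would live in a schema.ini file in the same folder):

    import pyodbc  # assumes the pyodbc package and the ODBC text driver are available

    # The folder acts as the "database"; each text file in it is exposed as a "table".
    conn = pyodbc.connect(
        r"Driver={Microsoft Text Driver (*.txt; *.csv)};"
        r"Dbq=C:\data\chunks;"
        r"Extensions=asc,csv,tab,txt;"
    )
    cur = conn.cursor()
    cur.execute("SELECT * FROM [chunk_000.txt]")  # hypothetical file name
    for row in cur.fetchmany(5):
        print(row)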
03-06-2017 05:48 PM - edited 03-06-2017 05:51 PM
So you were thinking something like the following?
https://docs.microsoft.com/en-us/sql/odbc/microsoft/text-file-format-text-file-driver
That is less than ideal because, as I said, I have ~1 TB split into ~10 GB chunks, so it would mean a lot of manual DSN creation. It would be a lot easier if I could recursively search the directory for all of my "txt" files and feed their names into a chunk-processing loop that performs the processing I am looking for (see the sketch below).
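Something like this minimal sketch, say (process_chunk is a hypothetical placeholder for the real cleaning/database work):

    from pathlib import Path

    def process_chunk(path):
        # Placeholder: the real work (clean the data, write to SQLite) goes here.
        print(path, path.stat().st_size, "bytes")

    # Recursively find every "txt" chunk and feed it to the processing loop.
    for txt in sorted(Path(r"C:\data").rglob("*.txt")):
        process_chunk(txt)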
03-06-2017 05:57 PM
I guess it's a little late in the development cycle to suggest saving in a binary format rather than ASCII? (Ducking.)
03-06-2017 05:59 PM
@bilko - I so dearly wish. It wasn't my choice. I advocated for a decent binary format for... 3 years. Too much organizational inertia/momentum. ... Technical debt?
03-06-2017 06:10 PM
Still waiting for some code. Otherwise I'm just guessing, I'm afraid.
Don't forget that by using a 32-bit application (you said you were using LabVIEW 32-bit) you only have access to ~2 GB of memory for the entire process, irrespective of how much physical memory you actually have (I don't believe LabVIEW is compiled with the Large Address Aware flag, which would still only take you to a maximum of 3 GB). Some 32-bit data processing applications work around this by spawning multiple independent processes to increase their effective capacity under Windows; it wouldn't surprise me if 32-bit R does this.
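To illustrate the multi-process idea (a Python sketch, not anything LabVIEW or R actually ship; summarize and the folder path are hypothetical): each worker below is a separate OS process with its own address space, so no single process ever has to hold more than one chunk.

    from multiprocessing import Pool
    from pathlib import Path

    def summarize(path):
        # Placeholder per-chunk work; here, just count lines.
        with open(path, "rb") as f:
            return str(path), sum(1 for _ in f)

    if __name__ == "__main__":
        files = sorted(Path(r"C:\data").glob("*.txt"))
        with Pool(processes=4) as pool:          # 4 independent processes
            for name, n_lines in pool.map(summarize, files):
                print(name, n_lines)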
03-06-2017 06:34 PM
Does each line have a fixed number of characters? If so, @tyk gave you the answer. It will take some work.
If there is not a fixed number of characters, this will still work with the following modification.
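For the fixed-length case, the arithmetic is just row index times record length. A minimal sketch, assuming a hypothetical record length that includes the line terminator:

    RECORD_LEN = 128  # bytes per line, INCLUDING the newline (e.g. \r\n on Windows)

    def read_rows(path, first_row, n_rows, record_len=RECORD_LEN):
        """Jump straight to first_row and read n_rows, touching nothing else."""
        with open(path, "rb") as f:
            f.seek(first_row * record_len)   # file offsets here are 64-bit
            return f.read(n_rows * record_len).splitlines()

    # e.g. rows 2,000,000 .. 2,000,099 of an 8 GB file, nearly instantly:
    # rows = read_rows(r"C:\data\chunk_000.txt", 2_000_000, 100)

For variable-length lines, one common modification is to scan the file once, record the byte offset of each newline, and then seek using that index instead.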
Cheers,
mcduff
03-06-2017 10:15 PM
@EngrStudent wrote:
It isn't going to work. The "set file position" is in bytes and is a 16-bit integer.
If you are referring to the "Get File Position" and "Set File Position" functions on the File I/O palette, you are correct that this is "position in bytes", but it is a 64-bit, not 16-bit, integer. Thus the maximum file size is 2^63 bytes, on the order of 9 exabytes, probably more than your largest hard drive ...
Bob Schor
03-07-2017 02:07 AM
@EngrStudent wrote:
Background:
- I am trying to read in a delimited text file with ~10 million rows, that is ~8 GB in size.
- The "read text file" function chokes on memory. It can't allocate enough memory; it dies "at the gate" before I can even use it.
- I'm cleaning up the data and writing to SQLite3 using the SQLite3 library by Dr. Powell.
- After I read the spreadsheet I use "flatten to string" then count characters.
Now I can't see why a pointer in FAT can't make the read part of this work reasonably quickly.
I am trying to use a "read delimited spreadsheet" and increment the start of read offset by number of characters. The idea is that I "gulp" a reasonable number of lines, count characters, increment the character skip value, process the lines, then return to "read delimited spreadsheet" with a larger "characters to skip" number.
It gets to ~1.43 million lines and dies.
Question:
- is there any reason it should die at that row count?
- why might this code be terminating at only a small part of the way through the data?
Wouldn't it be easier to just use Read from Text File in line mode (e.g. 1E6 lines at a time) and do a String Length instead of Flatten? Repeat until Error 4 (EOF).
Maybe something like this, using your Flatten approach:
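The attached LabVIEW snippet doesn't paste as text, but the loop is structured roughly like this (a Python sketch of the same idea; gulp_lines and the file name are hypothetical):

    from itertools import islice

    def gulp_lines(path, chunk_lines=1_000_000):
        """Yield successive blocks of up to chunk_lines lines each."""
        with open(path, "r") as f:
            while True:
                block = list(islice(f, chunk_lines))
                if not block:
                    break            # end of file (LabVIEW would report error 4)
                yield block

    total_chars = 0
    for block in gulp_lines("big_file.txt"):
        # String Length on each line, instead of Flatten To String:
        total_chars += sum(len(line) for line in block)
        # ... process the block here ...
    print(total_chars, "characters read")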
03-07-2017 09:00 AM
I am a little late to the game but I will share anyway.
As I wrote in this NI Week paper, Read From Spreadsheet uses an I32 as the byte offset. It will choke when the offset passes 2 GB (2^31 bytes), and the file offset will appear to be negative.
The Read From Spreadsheet VI is not password protected, so it is possible to go through all of the sub-VIs and change the offset to an I64.
It ain't easy but I have done it and it did work.
Also (see that paper) be aware that it takes time to convert the strings to numbers for files that large. Be patient!
In my case I broke the work up into multiple producer/consumer loops chained together, so that I put all of the cores to work at the same time: some reading text from the file, others looking for newlines, and still others converting text to numbers.
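In textual form the chain looks something like this sketch (Python threads and queues standing in for LabVIEW loops and queues; note that LabVIEW loops genuinely run on separate cores, whereas a Python version would need multiprocessing for real CPU parallelism; the file name and tab delimiter are assumptions):

    import queue
    import threading

    SENTINEL = None
    raw_q = queue.Queue(maxsize=4)    # reader -> line finder
    line_q = queue.Queue(maxsize=4)   # line finder -> number converter

    def reader(path):
        """Producer: pull big byte chunks off disk."""
        with open(path, "rb") as f:
            while chunk := f.read(16 * 1024 * 1024):   # 16 MB gulps
                raw_q.put(chunk)
        raw_q.put(SENTINEL)

    def line_finder():
        """Consumer/producer: carve raw chunks into whole lines."""
        leftover = b""
        while (chunk := raw_q.get()) is not SENTINEL:
            *lines, leftover = (leftover + chunk).split(b"\n")
            line_q.put(lines)
        if leftover:
            line_q.put([leftover])
        line_q.put(SENTINEL)

    def converter(out):
        """Consumer: text to numbers (assumes tab-delimited, no empty fields)."""
        while (lines := line_q.get()) is not SENTINEL:
            for ln in lines:
                if ln.strip():
                    out.append([float(x) for x in ln.decode("ascii").split("\t")])

    rows = []
    threads = [threading.Thread(target=reader, args=("big_file.txt",)),
               threading.Thread(target=line_finder),
               threading.Thread(target=converter, args=(rows,))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(rows), "rows parsed")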
And since I had to be able to process the same data file possibly multiple times (customer changes their mind and wants to take another, closer look), I compressed the data and wrote it as an indexed binary file so that I only had to parse the huge text file once.
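A minimal sketch of one way to build such a compressed, indexed binary file (not Ben's actual implementation; the row format, float64 values, and zlib compression are all assumptions):

    import struct
    import zlib

    def write_indexed(rows, data_path, index_path):
        """Each row: 4-byte compressed-size header + zlib-compressed float64s."""
        offsets = []
        with open(data_path, "wb") as data:
            for row in rows:
                offsets.append(data.tell())
                payload = zlib.compress(struct.pack(f"<{len(row)}d", *row))
                data.write(struct.pack("<I", len(payload)))
                data.write(payload)
        with open(index_path, "wb") as idx:      # fixed 8 bytes per row
            idx.write(struct.pack(f"<{len(offsets)}q", *offsets))

    def read_row(data_path, index_path, n):
        """Random access to row n without ever re-parsing the text."""
        with open(index_path, "rb") as idx:
            idx.seek(8 * n)
            (offset,) = struct.unpack("<q", idx.read(8))
        with open(data_path, "rb") as data:
            data.seek(offset)
            (size,) = struct.unpack("<I", data.read(4))
            raw = zlib.decompress(data.read(size))
            return struct.unpack(f"<{len(raw) // 8}d", raw)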
As I said, a little late but it may help someone someday.
Ben