Opening a large text file (.cwa)

Paddy1985 · 05-31-2017

Hi all,

I have a very basic problem: I would like to open a file containing accelerometer data. I'm aware that it's hex-coded, which isn't the problem. The file is 245 MB and contains accelerometer (numerical) data sampled at 25 Hz over 7 days. When I simply try to open it (without doing anything further with the data), my operating system gives the default message <memory full>.

I was wondering if somebody could guide me with a simple structure for reading a file in parts and processing it without having to load the complete dataset into memory?
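LabVIEW is graphical, so as a rough text-language illustration only, here is a minimal Python sketch of the usual pattern: open the file in binary mode, read a fixed-size chunk, process it, and repeat, so that only one chunk is held in memory at a time. The 1 MiB chunk size and the `handle_chunk` callback are arbitrary, hypothetical choices; in LabVIEW the rough equivalent is a While Loop around a binary read with a fixed count.

```python
from typing import Callable, Iterator


def iter_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file's bytes in pieces of at most chunk_size bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            yield chunk


def process_file(path: str, handle_chunk: Callable[[bytes], None],
                 chunk_size: int = 1 << 20) -> int:
    """Pass each chunk to handle_chunk (a hypothetical callback); return total bytes seen."""
    total = 0
    for chunk in iter_chunks(path, chunk_size):
        handle_chunk(chunk)
        total += len(chunk)
    return total
```

For example, `process_file("data.cwa", lambda c: None)` (file name is a placeholder) walks the whole file while holding only about 1 MiB of it at any moment; the real decoding would go inside the callback.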

All help much appreciated.

Bob_Schor · 05-31-2017

You haven't told us much about the file except that it is hex-encoded (which I assume means that the numeric data are expressed as hexadecimal strings, so the number "ten" (5+5) would be written as "A" or "0A" instead of "10"), but you haven't said how the strings are separated (by commas, by <NL>, by relative spacing). You also haven't shown us the code you are using to read these data (please attach the actual VI, not a picture of its Block Diagram).

Assuming that the data are, indeed, in a text file separated by line breaks, it is easy to configure Read from Text File to read a single line. You could parse the file one line at a time, accumulating the data in an array. 7 days of 25 Hz acquisition is about 15 million samples, which, even with three channels and 10 characters per number, would only be about 450 MB, not something that should cause LabVIEW or your OS to choke.
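Assuming, as above, a text file with one sample per line and the channels written as comma-separated hexadecimal strings (the comma separator is a guess, since the thread hasn't established the layout), a Python sketch of line-at-a-time parsing plus the size estimate might look like this. It illustrates the idea only; it is not LabVIEW code, and the function names are hypothetical.

```python
from typing import Iterator, Tuple


def parse_hex_lines(path: str, sep: str = ",") -> Iterator[Tuple[int, ...]]:
    """Yield one tuple of integers per non-blank line, reading lazily.

    Assumes each field is a hexadecimal string such as "A" or "0A" (both give 10).
    """
    with open(path, "r", encoding="ascii") as f:
        for line in f:  # iterating a file object reads one line at a time
            line = line.strip()
            if not line:
                continue
            yield tuple(int(field, 16) for field in line.split(sep))


def estimate_text_size(days: int, rate_hz: int, channels: int,
                       chars_per_number: int) -> Tuple[int, int]:
    """Return (number of samples, approximate text size in bytes)."""
    samples = days * 24 * 3600 * rate_hz
    return samples, samples * channels * chars_per_number
```

`estimate_text_size(7, 25, 3, 10)` returns `(15120000, 453600000)`: about 15 million samples and roughly 450 MB of text, matching the estimate above (separators and line endings not counted).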

I just noticed that you said "my operating system gives the default message <memory full>". How are you opening the file? Can you open it in, say, Notepad without it crashing?

Provide a few more details -- this should be solvable.

Bob Schor

billko · 05-31-2017

This appears to be some kind of binary, maybe proprietary, file format? Googling the file extension shows there is conversion software available that will convert the data into a CSV file. Maybe that would be easier to manipulate? (I suppose it is understood that these files are usually associated with activity over days, so I guess they wouldn't offer that solution unless it was a practical one.)

Bill

(Mid-Level minion.)
My support system ensures that I don't look totally incompetent.
Proud to say that I've progressed beyond knowing just enough to be dangerous. I now know enough to know that I have no clue about anything at all.
Humble author of the CLAD Nugget.

Paddy1985 · 06-02-2017

Hi guys, thanks for both your responses.

Bob, you’re correct: I’ve only just started to explore this file format, and I’m not yet sure how it’s separated, as I don’t have any code yet. So far I have only been using the manufacturer’s software (which I’m trying to move away from), as I’ve got around 44,000 datasets coming in over the next few months.

You’re correct about the ‘strange’ message saying the memory is full. The data does open in Notepad, so I was surprised myself that LabVIEW didn’t want to handle it. I’ve attached a snapshot of the first xx characters in a .txt file (which looks like gobbledygook!).

So far I’ve only tried the most basic function of opening the file, nothing fancy yet, as this is the first time I’m working with this file extension, this kind of data, and this file size.

Perhaps it would be worth posting a small file (10-minute measurement) so you can see and play with it yourselves?

Billko: it is indeed from a company that provides software to export the data as a .csv file, and I already have code to analyse the data from there. Ideally, though, given the scale described above, I’d like to bypass their .csv conversion and set the program up as a stand-alone data cruncher / server.

Further detail:

Company / sensor: Axivity, AX3

Sensor information: http://axivity.com/downloads/ax3

Github: https://github.com/digitalinteraction/openmovement/

Example of .cwa file (too large to attach here): http://axivity.com/downloads/ax3

I hope the above helps to gain a bit more insight.

Your help is much appreciated, both of you!

Ben · 06-02-2017

You did not share how exactly you were trying to open the file.

If you were just trying to read it as a tab-delimited file, and it is not one, the memory full error is no surprise.

If you were reading it as an array of something and the file was not of that type, the memory full error is again expected.

Opening the file as U8s and working from there is most likely the first step until you can get a handle on the data format.
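Reading raw bytes (the U8 view) and printing them as a hex dump is a common first look at an unknown binary format. A minimal Python sketch, where `read_section` and `hex_dump` are hypothetical helper names and 16 bytes per row is just a convention:

```python
def read_section(path: str, offset: int = 0, n: int = 256) -> bytes:
    """Read up to n bytes starting at a byte offset, without loading the rest of the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(n)


def hex_dump(data: bytes, width: int = 16, offset: int = 0) -> str:
    """Format bytes as rows of 'offset  hex bytes  printable ASCII'."""
    rows = []
    for i in range(0, len(data), width):
        row = data[i:i + width]
        hex_part = " ".join(f"{b:02X}" for b in row)
        # Non-printable bytes are shown as '.' so text fragments stand out
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        rows.append(f"{offset + i:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(rows)
```

For example, `print(hex_dump(read_section("data.cwa", 0, 64)))` (placeholder file name) shows the first 64 bytes; repeating byte patterns or readable fragments are the clues to look for.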

What would help is to have a small file in the original format and the CSV version of the same data. That may get you somewhere.

Depending on the nature of the file format, it may be possible to reverse-engineer it. I have done so many times, but I should admit that there have been some file formats that simply were not worth the time and effort to reverse-engineer.

Let me close by suggesting that, as you work, you keep asking yourself:

"Is this an exercise in futility?"

Ben

Retired Senior Automation Systems Architect with Data Science Automation LabVIEW Champion Knight of NI and Prepper LinkedIn Profile YouTube Channel

Bob_Schor · 06-02-2017

Yes, that is definitely a proprietary Binary format (you can see "patterns" at various places in the file, but nothing obvious in Human-readable form).

There are a number of things you could contemplate. One is to use their software to "export" the data into a more "LabVIEW-friendly" format (probably still binary, for compactness, but with fewer, or at least "known-to-you", Bells and Whistles). Another is to examine their firmware and "figure it out for yourself". A third is to query the device's User Community to see if anyone else has figured this out.

Bob Schor

Paddy1985 · 06-02-2017

Hi,

And yes, you're both absolutely right; however, you know what it's like when 'the boss' sets you a challenge and you accept it before realising you're stuck at step 1 (knowing the other 10 steps are already completed).

As requested: opening the smaller file (56 MB, as per the website) is simple and takes around 23 seconds to load (based on 10 repeats). However, the 251 MB file is impossible to open.

Ben · 06-02-2017

Rather than trying to open it as a text file, open it as a binary file with a data type of "U8" and start by reading just a small section at a time.

What you do with the U8s and how they are interpreted depends on the file encoding. That is where the detective work is required.

Take the decoded file and look for things like column headers. Then try to find those same headers in the encoded file.
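Searching the encoded file for a byte string you already know, such as a header copied from the CSV export, can be done without loading the whole file. This Python sketch scans in chunks and carries over the last `len(pattern) - 1` bytes so that matches straddling a chunk boundary are not missed. The function name and any pattern you pass are hypothetical, and a header may not be stored as plain ASCII in the binary file at all.

```python
def find_all(path: str, pattern: bytes, chunk_size: int = 1 << 20):
    """Return a list of every byte offset where pattern occurs, scanning in chunks."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    keep = len(pattern) - 1  # bytes carried over to catch matches across chunk boundaries
    offsets = []
    tail = b""
    base = 0  # file offset of tail[0]
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            start = buf.find(pattern)
            while start != -1:  # overlapping matches are reported too
                offsets.append(base + start)
                start = buf.find(pattern, start + 1)
            # Keep only the last `keep` bytes; they may begin a match finished by the next chunk
            cut = max(len(buf) - keep, 0)
            tail = buf[cut:]
            base += cut
    return offsets
```

Usage might look like `find_all("data.cwa", b"Time")`, with both the file name and the pattern as placeholders; offsets that repeat at a fixed spacing would hint at a record or block structure.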

The data may also be arrays of floats preceded by a byte count giving the number of floats. You will have to figure out whether there are byte counts, how many bytes make up a byte count, and how the byte count is stored (big endian vs little endian, etc.).

If you can figure out the endianness for the byte counts, then it is a safe bet the same endianness will be used for the data values.
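To show how byte order changes the meaning of the same bytes, and how a length-prefixed array might be parsed once the layout is known, here is a Python sketch using the standard `struct` module. The layout it assumes (an unsigned count of 1, 2 or 4 bytes followed by that many 32-bit floats) is purely hypothetical, an example of the kind of structure described above rather than the actual .cwa layout.

```python
import struct
from typing import List, Tuple

_COUNT_FORMATS = {1: "B", 2: "H", 4: "I"}  # unsigned integer format codes by byte width


def count_both_ways(buf: bytes, offset: int, count_size: int) -> Tuple[int, int]:
    """Decode the same count bytes as (little-endian value, big-endian value)."""
    code = _COUNT_FORMATS[count_size]
    little = struct.unpack_from("<" + code, buf, offset)[0]
    big = struct.unpack_from(">" + code, buf, offset)[0]
    return little, big


def read_counted_floats(buf: bytes, offset: int, count_size: int,
                        little_endian: bool) -> Tuple[List[float], int]:
    """Parse a count followed by that many 32-bit floats; return (values, next_offset)."""
    order = "<" if little_endian else ">"
    (count,) = struct.unpack_from(order + _COUNT_FORMATS[count_size], buf, offset)
    offset += count_size
    values = struct.unpack_from(f"{order}{count}f", buf, offset)
    return list(values), offset + 4 * count
```

`count_both_ways(b"\x02\x00", 0, 2)` gives `(2, 512)`. Checking which interpretation yields sensible counts across many records is one way to settle the byte order before decoding the values.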

I would be asking the boss how much time is justified.

Ben


Paddy1985 · 06-02-2017

Many thanks, Knight! I'll be spending a bit of time on this tonight.

Bob_Schor · 06-02-2017

The Good News is that opening, reading, and closing the longitudinal_data.cwa file that lists as 251,406KB as a binary U8 takes less than 1 second and returns 257,439,744 bytes (which is consistent if a KB is 1024 bytes). There is still the "decoding" problem ...
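The consistency check above is plain arithmetic: 251,406 × 1,024 = 257,439,744. A small Python sketch for doing the same check on any file, assuming (as Bob does) that the listed KB are 1,024-byte units; the helper name is hypothetical:

```python
import os
from typing import Tuple


def size_in_kib(path: str) -> Tuple[int, int, int]:
    """Return (size in bytes, whole KiB, leftover bytes) for a file on disk."""
    n_bytes = os.path.getsize(path)
    kib, remainder = divmod(n_bytes, 1024)
    return n_bytes, kib, remainder
```

For Bob's figures, `divmod(257_439_744, 1024)` is `(251406, 0)`, so the file is exactly 251,406 KiB.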

Bob Schor
