LabVIEW


Using LabVIEW to read file of unknown format/ type/content

Basically, I'm wondering if anyone has a good strategy for this. (Anyone working for the NSA recently?) The file has header info followed by streamed RF waveforms. I started doing some things (see attached VI). I expect it is just a question of trial and error: determining the header bytes, what format the data following the header was streamed in, little-endian or big-endian, I16 or U8 per waveform point, etc. Attached is a VI I started with to read the attached file. (Change the extension on the .csv file to .dat or anything else; I just used .csv because the forum's attachment filter did not like .dat.) Clearly there is some header information revealed in 'delimited string', but there are also nonsensical characters (even in the header). Thanks, Don
Message 1 of 11
I've done a lot of this sort of thing (though not with LabVIEW). I have reverse-engineered several different tape backup formats, as well as data communications protocols. I even started a personal project to make a blackboard system (a type of AI) to help a human do this kind of work. I've also written a number of such formats (I was one of the authors of the Microsoft Tape Format).

The easiest thing to do is to try to find a person who can send you a spec or at least a header file. Never underestimate the power of "social engineering". The ease of doing this depends, of course, on your relationship with the entity that wrote the program that generated the data.

Failing that, if you have access to the application that generated the data, you can generate different data files, trying to modify a single variable at a time and seeing what changed in the data file. It helps to set up some automation so you can try experiments quickly.

If you have the application, try to look at its file I/O patterns (use procmon if you're stuck using Windows, or strace, etc. if you're using a Unix-like system). Does it open a file and just stream data to it, or does it write stuff and then seek backwards to rewrite a header? Applications that never seek backwards have to either put data that isn't known up front (e.g. lengths of streams) at the end of the file, or they have to use delimiters (out-of-band data); if they're writing uncompressed binary, this may be difficult to do. So look for a chunk at the end of the file that may contain index or length information.

If there's a header, see what fields change when you acquire more or less data, or add or subtract channels.
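The change-one-variable experiment is easy to automate. Here is a minimal sketch in Python (the function name `diff_ranges` is just illustrative) that reports which byte ranges differ between two captures, so you can see which header fields moved when you changed a setting:

```python
def diff_ranges(a: bytes, b: bytes):
    """Return a list of (start, end) offsets, end exclusive, where a and b differ.

    Bytes are compared position by position over the common length; any
    length difference is reported as one final range covering the extra bytes.
    """
    ranges = []
    start = None
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i              # a run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, n))
    if len(a) != len(b):
        ranges.append((n, max(len(a), len(b))))
    return ranges
```

Read both files in binary mode (`open(path, "rb").read()`) and pass the two byte strings in. A field that changes by a small amount when you add one channel or a few more samples is a good candidate for a count or length.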

See if it looks like it's compressed or not. Long runs of repeated bytes are probably indicative of non-compressed data. Or there may be simple compression like run-length or delta coding.

Look for patterns that are indicative of compression (e.g. blocks of compression codes followed by dense data); try to pipe chunks of this through command-line decompression filters (like zcat) or use scripting language libraries to try to decompress at given offsets.
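Trying to decompress at given offsets can be scripted with Python's standard `zlib` module. This is only a sketch: the function name, the `min_output` threshold, and the choice of offsets are my assumptions, and it only covers Deflate-based streams (raw, zlib-wrapped, gzip-wrapped).

```python
import zlib

# wbits values understood by zlib: -15 = raw Deflate, 15 = zlib wrapper, 31 = gzip wrapper
WRAPPERS = {"raw deflate": -15, "zlib": 15, "gzip": 31}


def probe_offsets(data: bytes, offsets, min_output: int = 64):
    """Try to inflate data starting at each offset; return a list of hits.

    Each hit is (offset, wrapper_name, bytes_recovered). A candidate counts
    as a hit only if at least min_output bytes come out without an error.
    """
    hits = []
    for offset in offsets:
        for name, wbits in WRAPPERS.items():
            inflater = zlib.decompressobj(wbits)
            try:
                out = inflater.decompress(data[offset:])
            except zlib.error:
                continue       # not a valid stream of this kind at this offset
            if len(out) >= min_output:
                hits.append((offset, name, len(out)))
    return hits
```

A truncated stream does not raise here; you simply get whatever could be inflated, which is handy for incomplete samples. Raw Deflate has no signature, so treat a hit as a lead to inspect, not as proof.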

Get a good editor that lets you seek to specific bytes and display in hex or as characters.

You can use a scripting language in interactive mode to explore a file and test hunches; write little test functions on the fly and check them out. Compiled languages will just slow you down; you want to have a fast test/examine cycle.

I'd recommend Ruby (and its interactive shell irb), or Python, or even Perl (either using the debugger, or just 'perl -de1').
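As an example of the kind of throwaway helper that is useful in an interactive session, here is a small hex dump function in Python. The name and parameters are illustrative; any hex editor does the same job.

```python
def hexdump(data: bytes, start: int = 0, length: int = 64, width: int = 16) -> str:
    """Format data[start:start+length] as offset, hex bytes, and printable ASCII."""
    lines = []
    chunk = data[start:start + length]
    for i in range(0, len(chunk), width):
        row = chunk[i:i + width]
        hex_part = " ".join(f"{b:02x}" for b in row)
        # Show printable ASCII as-is and everything else as a dot
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{start + i:07x}: {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)
```

At the prompt, something like `print(hexdump(data, 0x280, 64))` lets you jump straight to an offset you are curious about.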

Once you have documentation of its structure, write your program in LabVIEW.

Message 2 of 11
Here's a successful example using a simple strategy.
 
It is often easiest to recognize valid data by sending it to a waveform graph. Just slice out the header and unflatten the remaining binary data to the various data types in big and little endian format. For any multibyte data, make sure to check all possible frame shifts, e.g. for I32 cut off a header size that is 4N, 4N+1, 4N+2, and 4N+3.
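For anyone who wants to try the same strategy outside LabVIEW, here is a rough Python equivalent of "unflatten to every type, byte order, and frame shift". It is a sketch, not the VI from this post: the function name, the set of types, and `max_points` are my assumptions.

```python
import struct

# struct format codes for signed/unsigned 8-, 16-, and 32-bit integers
TYPES = {"I8": "b", "U8": "B", "I16": "h", "U16": "H", "I32": "i", "U32": "I"}


def candidate_decodings(data: bytes, header_size: int, max_points: int = 1000):
    """Yield (label, values) for every type / byte order / frame shift combination.

    header_size is only a guess; for an N-byte type, the N shifts
    header_size, header_size + 1, ... are all tried.
    """
    for name, code in TYPES.items():
        size = struct.calcsize("<" + code)
        for order, endian in (("<", "little"), (">", "big")):
            if size == 1 and order == ">":
                continue                      # byte order is moot for 1-byte types
            for shift in range(size):
                skip = header_size + shift
                body = data[skip:]
                count = min(len(body) // size, max_points)
                values = struct.unpack(f"{order}{count}{code}", body[:count * size])
                if size == 1:
                    label = f"{name}, skip {skip}"
                else:
                    label = f"{name} {endian}-endian, skip {skip}"
                yield label, list(values)
```

Plot each candidate with whatever graphing tool you have and look for the one that resembles a waveform; the wrong byte order or frame shift usually looks like noise or has sudden jumps.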
 
 
 
Message 3 of 11
Hi Don,

I just had a look at it. Here are some hints. I am also the author of the Perl module Archive::Zip, so I'm reasonably familiar with the Zip file format. Looking through a hex dump of your data, my eyes were drawn to the string "PK" (for Phil Katz), which precedes the various pieces of a Zip file (see the PKWARE application note). The only legitimate signature was at file offset 0x285; this was "PK\x01\x02", which precedes a Zip file Central Directory structure (which is written at the end of a zip file).

Decoding this chunk of data (from 0x285-0x2b2):

0000280:                50 4b 01 02 14 0b 14 00 00 00 08       PK.........
0000290: 00 20 84 50 36 00 00 20 84 ff ff ff 7f 00 00 00  . .P6.. ........
00002a0: 00 05 00 00 00 00 00 00 00 ff ff 00 00 00 00 00  ................
00002b0: 00 00 00

yielded these notes:

sig
50 4b 01 02
Version made by  14 (2.0)
file attrib format 0b
version needed to extract 14
bitflag 0
desiredCompressionMethod 08 (deflated)
lastModFileDateTime date=3650 time=8420 (DOS format)
 day=16
 month=2
 year=2007
crc32 84200000
compressedSize 7fffffff
uncompressedSize 00000000
fileNameLength 5
extrafieldlength 0
filecommentlength 0
disknumberstart 0
internalfileattribs ffff
externalfileattribs 0
local header relative offset 0

This is followed by a file name of "CorRD".
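The same decoding can be reproduced mechanically. Below is a minimal sketch in Python using the `struct` module and the fixed 46-byte central directory header layout from the PKWARE application note; the bytes are copied from the hex dump above, and the variable names are my own.

```python
import struct

# Central directory file header: 46 fixed bytes, all fields little-endian
CD_HEADER = struct.Struct("<4sBBHHHHHIIIHHHHHII")

# The 46 bytes at file offsets 0x285-0x2b2, copied from the hex dump
raw = bytes.fromhex(
    "504b0102" "140b" "1400" "0000" "0800" "2084" "5036" "00002084"
    "ffffff7f" "00000000" "0500" "0000" "0000" "0000" "ffff"
    "00000000" "00000000"
)

(sig, made_by_version, made_by_host, needed, flags, method, dos_time, dos_date,
 crc32, comp_size, uncomp_size, name_len, extra_len, comment_len,
 disk_start, int_attr, ext_attr, local_offset) = CD_HEADER.unpack(raw)


def dos_datetime(date: int, time: int):
    """Unpack MS-DOS date/time words into (year, month, day, hour, minute, second)."""
    return (1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,
            time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2)


print(sig, made_by_version / 10, method, name_len)
print(dos_datetime(dos_date, dos_time))
```

The date word 0x3650 unpacks to 16 February 2007, matching the notes. The header ends at 0x2b2, so the 5-byte file name occupies 0x2b3-0x2b7 and whatever follows starts at 0x2b8.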

This is clearly not a zip file (there are no local file headers, for instance), but the presence of the valid (or nearly-valid) Zip CD structure would lead me to consider the data immediately following this as the contents of a Zip data stream compressed using the Deflate algorithm.

My Archive::Zip module may be of help (especially if you use/understand Perl); it uses Compress::Zlib to uncompress the actual data streams.

You should be able to use Compress::Zlib to decompress this (if it is indeed zlib-compressed data).

The only magic that I recall is the initialization of the inflator (the docs for Compress::Zlib mention this):

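use Compress::Zlib qw( Z_OK Z_STREAM_END MAX_WBITS );

# Negative WindowBits tells zlib to expect a raw Deflate stream (no zlib header), as in a Zip file.
my ($inflater, $status) = Compress::Zlib::inflateInit( '-WindowBits' => -MAX_WBITS() );

# $fh is assumed to be a binmode filehandle positioned at the start of the compressed data.
while ( read( $fh, my $buffer, 4096 ) ) {
    ( my $out, $status ) = $inflater->inflate($buffer);
    last unless $status == Z_OK || $status == Z_STREAM_END;    # anything else means error
    print $out;                                                # $out is the uncompressed (inflated) chunk
    last if $status == Z_STREAM_END;
}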


Message 4 of 11
Yeah, that looks like it. I'm assuming that the sample you posted wasn't complete (I get an "unexpected EOF" error from zcat), but when I stick a gzip file format header in front of your data stream (which is right after the CorRD string at 0x2B8) and run it through zcat, I get something that looks very much like a plain waveform dump in 16-bit little-endian format (bytes 0x10 through 0x152F are all 00):

0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
*
0001530: 0000 0000 0000 0000 0000 0000 0000 01ff  ................
0001540: 03ff 03ff 035c 0254 02ff 03ff 03ff 03ff  .....\.T........
0001550: 03ff 03ff 03ff 03ff 03ff 03b0 03ff 035a  ...............Z
0001560: 025c 028a 0154 0132 0144 0254 015a 011c  .\...T.2.D.T.Z..
0001570: 01f8 00b8 0050 0176 0140 01da 0094 0028  .....P.v.@.....(
0001580: 0052 005e 00ce 00c2 006c 004e 003a 0030  .R.^.....l.N.:.0
0001590: 002a 0034 0024 001c 000e 001c 001e 0010  .*.4.$..........
00015a0: 0016 0020 000c 0012 0016 000e 000a 0006  ... ............
00015b0: 0008 0004 000a 0010 000e 0012 000e 0010  ................
00015c0: 0006 0008 0008 000a 0010 000a 0006 0004  ................
00015d0: 0008 000a 0018 0018 0016 000c 001a 000c  ................
00015e0: 0008 0008 001a 0018 001c 001a 001c 001c  ................
00015f0: 001a 001e 001c 0014 0016 0016 0012 0016  ................
0001600: 0018 0012 0014 0014 000c 000c 000c 000e  ................
0001610: 0008 0006 0006 0006 0008 0006 0004 0006  ................
0001620: 0006 0006 000c 0008 000a 000c 0008 000a  ................
0001630: 000e 000c 000e 0010 000c 000a 000c 000c  ................
0001640: 000a 0008 000e 000c 000c 000c 0006 000a  ................
0001650: 000c 000c 000c 0006 000a 0008 0006 0008  ................
0001660: 000c 0006 0008 000c 0006 000a 000e 000e  ................
0001670: 000e 000c 0012 000c 000e 000c 0006 0008  ................
0001680: 0006 0008 0008 0008 0006 0008 0004 0006  ................
0001690: 0006 0006 0004 0008 000a 0008 0006 0006  ................
00016a0: 000a 0008 0008 000c 000a 000a 000a 000a  ................
00016b0: 0006 0004 0006 0006 0008 0006 0006 0006  ................
00016c0: 0004 0006 0004 0004 0006 0008 0004 0008  ................
00016d0: 0008 000c 0008 0008 0008 000a 000a 0004  ................
00016e0: 0008 000a 000a 0006 0006 0006 0006 0004  ................
00016f0: 0006 000a 0008 000c 0006 0008 000c 000a  ................
0001700: 000c 000c 000e 000a 000c 0008 000a 000c  ................
0001710: 000a 000a 0008 0006 000a 0006 0008 0008  ................
0001720: 0004 0006 0004 0004 0004 0002 0004 0004  ................
0001730: 0008 0006 0006 0008 0008 0004 0006 0004  ................
0001740: 0008 000a 0008 000a 000c 000e 000c 000a  ................
0001750: 000c 000a 000c 0008 000a 0004 0008 0006  ................
0001760: 0004 0006 0006 0004 0004 0002 0004 0004  ................
0001770: 0004 0002 0006 0008 0006 0006 0006 0006  ................
0001780: 0004 0008 0004 0006 0008 0008 0008 0006  ................
0001790: 0004 0004 0008 0006 0004 0004 0006 0004  ................
00017a0: 0004 0004 0004 0006 0004 0004 0004 0004  ................
00017b0: 0006 0004 0004 0004 0004 0004 0004 0004  ................
00017c0: 0004 0004 0004 0002 0004 0004 0004 0002  ................
00017d0: 0006 0006 0004 0004 0004 0004 0004 0006  ................
00017e0: 0006 0004 0004 0006 0006 0006 0004 0006  ................
00017f0: 0006 0004 0004 0004 0004 0006 0004 0004  ................
0001800: 0002 0004 0004 0006 0006 0004 0006 0006  ................
0001810: 0006 0004 0004 0004 0006 0006 0004 0004  ................
0001820: 0004 0006 0006 0004 0006 0004 0006 0006  ................
0001830: 0006 0006 0006 0008 0004 0004 0004 0004  ................
0001840: 0006 0004 0006 0006 0006 0006 0004 0006  ................
0001850: 0006 0004 0004 0006 0006 0004 0006 0000  ................
0001860: 01ff 03ff 03ff 03e4 02e4 02ff 03ff 03ff  ................
0001870: 03ff 03ff 03ff 03ff 03ff 03ff 03c2 0392  ................
0001880: 0300 0208 0246 0132 01ee 0002 021c 0122  .....F.2......."
0001890: 0100 01e4 0098 0038 0160 013c 01de 0094  .......8.`.<....
00018a0: 0026 004c 004e 00ba 00b8 0066 004c 0036  .&.L.N.....f.L.6
00018b0: 0036 002e 0034 0028 0020 0014 001a 001a  .6...4.(. ......
00018c0: 0014 0016 001c 000c 0012 0014 000c 0006  ................
00018d0: 0008 0008 0006 000c 000c 000c 0012 0010  ................
00018e0: 0010 0004 0008 000a 0008 000e 0008 0006  ................
00018f0: 0004 0004 0006 0012 0014 0016 0008 0018  ................
0001900: 000e 0006 000a 0018 0018 001a 001a 001e  ................
0001910: 001a 0018 0020 001a 0014 0016 0014 000e  ..... ..........
0001920: 0014 0012 000c 0010 0010 0012 000c 0006  ................
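The gzip-header trick can also be done without zcat. Here is a sketch in Python's standard `zlib` module; the 10-byte header follows RFC 1952, and the function name is my own. It assumes you have already sliced the raw Deflate bytes out of the file (here, everything from offset 0x2B8 onward).

```python
import zlib

# Minimal 10-byte gzip member header per RFC 1952:
# magic 1f 8b, method 8 (Deflate), no flags, mtime 0, no extra flags, OS 255 (unknown)
GZIP_HEADER = bytes([0x1F, 0x8B, 0x08, 0x00, 0, 0, 0, 0, 0x00, 0xFF])


def inflate_with_gzip_header(raw_deflate: bytes):
    """Prefix a raw Deflate stream with a gzip header and inflate what is there.

    Returns (output, complete). complete is False when the stream ended early
    or the gzip trailer (CRC-32 and length) is missing, as with a cut-off sample.
    """
    inflater = zlib.decompressobj(31)       # wbits=31 selects the gzip wrapper
    output = inflater.decompress(GZIP_HEADER + raw_deflate)
    return output, inflater.eof
```

Unlike zcat, this returns the partial output quietly instead of stopping with "unexpected EOF". In Python the header is not strictly needed, since `zlib.decompressobj(-15)` reads a raw Deflate stream directly.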



Message 5 of 11
It is probably significant that your data file is completely missing "\r" while having twice the statistically expected count of "\n". Thus it is unlikely that simple unflattening will do the trick. The data seems to be additionally processed in some way. Maybe this irregularity can give some hints. 😉
 
 
 
 
 


Message Edited by altenbach on 02-02-2008 11:16 AM
Message 6 of 11
Not sure about the little-endian data. Here's a view by byte (attached is the uncompressed result in a zip file).

0000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
*
0001530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 ff  ................
0001540: 03 ff 03 ff 03 5c 02 54 02 ff 03 ff 03 ff 03 ff  .....\.T........
0001550: 03 ff 03 ff 03 ff 03 ff 03 ff 03 b0 03 ff 03 5a  ...............Z
0001560: 02 5c 02 8a 01 54 01 32 01 44 02 54 01 5a 01 1c  .\...T.2.D.T.Z..
0001570: 01 f8 00 b8 00 50 01 76 01 40 01 da 00 94 00 28  .....P.v.@.....(
0001580: 00 52 00 5e 00 ce 00 c2 00 6c 00 4e 00 3a 00 30  .R.^.....l.N.:.0
0001590: 00 2a 00 34 00 24 00 1c 00 0e 00 1c 00 1e 00 10  .*.4.$..........
00015a0: 00 16 00 20 00 0c 00 12 00 16 00 0e 00 0a 00 06  ... ............
00015b0: 00 08 00 04 00 0a 00 10 00 0e 00 12 00 0e 00 10  ................
00015c0: 00 06 00 08 00 08 00 0a 00 10 00 0a 00 06 00 04  ................
00015d0: 00 08 00 0a 00 18 00 18 00 16 00 0c 00 1a 00 0c  ................
00015e0: 00 08 00 08 00 1a 00 18 00 1c 00 1a 00 1c 00 1c  ................
00015f0: 00 1a 00 1e 00 1c 00 14 00 16 00 16 00 12 00 16  ................
0001600: 00 18 00 12 00 14 00 14 00 0c 00 0c 00 0c 00 0e  ................
0001610: 00 08 00 06 00 06 00 06 00 08 00 06 00 04 00 06  ................
0001620: 00 06 00 06 00 0c 00 08 00 0a 00 0c 00 08 00 0a  ................
0001630: 00 0e 00 0c 00 0e 00 10 00 0c 00 0a 00 0c 00 0c  ................
0001640: 00 0a 00 08 00 0e 00 0c 00 0c 00 0c 00 06 00 0a  ................
0001650: 00 0c 00 0c 00 0c 00 06 00 0a 00 08 00 06 00 08  ................
0001660: 00 0c 00 06 00 08 00 0c 00 06 00 0a 00 0e 00 0e  ................
0001670: 00 0e 00 0c 00 12 00 0c 00 0e 00 0c 00 06 00 08  ................
0001680: 00 06 00 08 00 08 00 08 00 06 00 08 00 04 00 06  ................
0001690: 00 06 00 06 00 04 00 08 00 0a 00 08 00 06 00 06  ................
00016a0: 00 0a 00 08 00 08 00 0c 00 0a 00 0a 00 0a 00 0a  ................
00016b0: 00 06 00 04 00 06 00 06 00 08 00 06 00 06 00 06  ................
00016c0: 00 04 00 06 00 04 00 04 00 06 00 08 00 04 00 08  ................
00016d0: 00 08 00 0c 00 08 00 08 00 08 00 0a 00 0a 00 04  ................



Message Edited by ned_konz on 02-02-2008 01:25 PM
Message 7 of 11
Hi Guys - I really appreciate you looking at this.  I have written many translators for data whose file format / header spec, etc. was known.  In this case, at this time, I have no knowledge of the format (which is why I posted in the first place) other than that this is an ultrasonic A-scan file for some sort of X by Y scan.  I am trying the 'social engineering' strategy too, but I thought it would be interesting to post this type of problem to the forum since I have gotten so much help here before.  I cannot even comment on the waveform that Christian shows because all I know is that it is an ultrasonic waveform file.  I don't know the morphology of the inspection, i.e. whether there are just thru-transmitted echoes (that is what is indicated, since there is only one large echo followed by noise with no other major echoes) or whether the investigation was in reflection mode.  So I do not know exactly what the waveforms should look like.  I don't know the scan size either, but I thought a solid decoding of the header would certainly tell us that and give us most of the other pertinent info.

Sonix, Inc. (don't know if these guys are the vendor for this file) used to have a strange 8-bit format for their waves where I think one had to subtract 128 or add 128 depending on whether a value of the wavetrain was positive or negative, to decode the actual waveform stream.  That's all for now......Don
Message 8 of 11


@altenbach wrote:
It is probably significant that your data file is completely missing "\r" while having twice the statistically expected count of "\n". Thus it is unlikely that simple unflattening will do the trick. The data seems to be additionally processed in some way. Maybe this irregularity can give some hints. 😉


Actually, I'm not seeing that. I suspect that the file got damaged on your end (maybe by your browser or editor?).
My counts of bytes starting at offset 0x285 start like this:

0       3137    "\000"
1       3209    "\001"
2       3385    "\002"
3       3340    "\003"
4       3620    "\004"
5       3350    "\005"
6       3454    "\006"
7       3372    "\a"
8       3575    "\b"
9       3909    "\t"
10      3396    "\n"
11      3669    "\v"
12      3896    "\f"
13      3844    "\r"
14      3807    "\016"
15      3616    "\017"
16      3593    "\020"
17      3607    "\021"
18      3950    "\022"
19      3647    "\023"
20      3380    "\024"
21      3521    "\025"
22      3854    "\026"
23      3548    "\027"
24      3841    "\030"
25      3764    "\031"
26      3883    "\032"
27      3981    "\e"
28      3807    "\034"
29      3881    "\035"
30      3763    "\036"
31      3625    "\037"
32      3659    " "
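A count like the one above takes only a few lines to reproduce, which makes it easy to check whether a downloaded copy was altered in transit. This is a sketch in Python, not the script I used for the table; the function names are illustrative.

```python
from collections import Counter


def byte_histogram(data: bytes, start: int = 0):
    """Count how often each byte value (0-255) occurs in data[start:]."""
    return Counter(data[start:])


def report(counts, first: int = 0, last: int = 32) -> str:
    """Format counts for byte values first..last as 'value<TAB>count<TAB>repr' lines."""
    return "\n".join(f"{value}\t{counts[value]}\t{bytes([value])!r}"
                     for value in range(first, last + 1))
```

Read the file in binary mode and call `print(report(byte_histogram(data, 0x285)))`. A copy that has been through a text-mode conversion typically shows up as a missing or doubled count for "\r" or "\n" next to otherwise similar counts.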

Message 9 of 11
Looking at the uncompressed data as a little-endian signed 16-bit PCM stream with no compression seems to make sense.
I read the uncompressed data into Audacity and it looked like something that could come from a sensor.
Attached are the waveform view and the spectrum view (x=time, y=frequency, color=intensity) of the beginning of the data.

Does this look like something that makes sense?



Message Edited by ned_konz on 02-02-2008 05:37 PM
Message 10 of 11