05-17-2006 05:36 AM
I have a 300 MB binary file (actually it is a 2D array of DBL) that I need to transpose in order to read it faster in the next step.
If I read the data directly from the file, calculating the location of the next number to retrieve on the fly, it takes an extremely long time.
If I read the whole 300 MB into memory in order to use "Transpose Array" and then save it to a new file, LabVIEW gives me a "Memory Full" error.
Any ideas how I could convert such a large data file?
05-17-2006 07:28 AM
Is the data always the same size, or is it variable? Since you are trying to transpose the data, it seems that it is arranged in a fixed width x height. The only real solution to me would be to do it all with file pointers: have one pointer reading the source file down the columns while the other writes the rows to the destination file. It may be slower than doing it in memory, but it would remove the "Memory Full" errors. If the file is a fixed size it should not be too painful.
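In text form, the idea would look something like this (a Python sketch just to show the pattern, not LabVIEW; the row/column counts and the 8-byte DBL element size are assumptions you would have to supply):

ELEM = 8  # size of one DBL in bytes (assumed)

def transpose_with_pointers(src_path, dst_path, rows, cols):
    """Walk the source down each column; write the destination row by row."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for c in range(cols):                     # each source column becomes one output row
            for r in range(rows):
                src.seek((r * cols + c) * ELEM)   # jump to element (r, c) in row-major layout
                dst.write(src.read(ELEM))         # append it to the current output row

Every element costs a seek, so this is the slow end of the trade-off, but it never needs more than one element in memory at a time.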
The other solution is obvious: buy more memory. We are starting to see new PCs sold with 2 GB of RAM.
Matt
05-17-2006 07:36 AM
05-17-2006 08:39 AM - edited 05-17-2006 08:39 AM
Thank you for your suggestions.
1. The program has to work on any reasonable PC, so the trick of buying more memory will not work.
2. I have a huge swap size; it does not help.
3. The way I do it now: I read a number from one file at a specific place and save it to another file (transposing the data). It takes an hour and a half to convert a 300 MB file this way. This is too long, and I wanted to find a quicker way.
4. It looks like the only thing left to try is to "Create an array of the 'inverse' size" as suggested by LuI, but the problem is that if instead of a 300 MB file I have a 500 MB one (the size of the data is not fixed and depends on the model calculation that produces the results I need to analyse), then it will be a problem even just to load it into one array.
05-17-2006 10:34 AM
Try the redim.zip file located here: http://darkfader.net/toolbox/
It contains an exe file and a .cpp file. I have not run this program.
05-17-2006 10:42 AM
I still think the best approach is to leave the file on disk and use file pointers to read, transpose, and write the data. It sounds like a good challenge if you have the time to try it. If you are pressed for time, then use more direct methods for solving the problem.
Matt
05-17-2006 10:50 AM
@jonni wrote:
Thanks you for your suggestions,
3. The way I do it now: I read number from one file at the specific place and save it in another file (transposing data), it takes one and half hour to convert 300Mb file this way. This is to long and I wanted to find quicker way.
You should be able to dramatically speed this up by finding the right balance between memory usage and speed. You can easily read large chunks of adjacent elements corresponding to N columns (e.g. 10-25% of your data), transpose the subarray, and write it to the new file. Then go back and get the next chunk of columns, do the same, and append it to the output file. Repeat until all columns are processed.
For example, if you have an array:
abcd
efgh
ijkl
mnop
Read "ab, ef, ij, mn" with four read operations, transpose the 2D subarray, then write the rows:
aeim
bfjn
Repeat with the remaining columns and append.
To save memory, don't use any array indicators on the FP.
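Outside LabVIEW, the same chunked approach looks roughly like this (a Python sketch only, assuming a row-major file of 8-byte DBLs with known rows/cols; the chunk width is just a tuning knob, and the raw bytes are shuffled without ever being interpreted):

ELEM = 8  # size of one DBL in bytes (assumed)

def transpose_in_chunks(src_path, dst_path, rows, cols, chunk_cols=256):
    """Transpose by reading a band of adjacent columns at a time, never the whole file."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for c0 in range(0, cols, chunk_cols):
            width = min(chunk_cols, cols - c0)
            band = []                                 # 'rows' strips of 'width' elements each
            for r in range(rows):
                src.seek((r * cols + c0) * ELEM)      # one read per row grabs 'width' adjacent elements
                band.append(src.read(width * ELEM))
            for j in range(width):                    # source column (c0 + j) becomes one output row
                dst.write(b"".join(strip[j * ELEM:(j + 1) * ELEM] for strip in band))

The wider the chunk, the fewer passes you make over the source file, at the cost of holding a bigger band in memory; 10-25% of the data, as suggested above, is a reasonable starting point.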
05-17-2006 10:52 AM - edited 05-17-2006 10:52 AM
05-17-2006 10:58 AM
Agreed, in most cases disk operations are a thousand or more times slower than memory. In this case, though, the first priority is to make the application work on any computer; the first goal is always to make it work on the target.
And since the file is so large, the memory would be paging to disk anyway, so it has to hit the hard drive; any improvement from running this in memory is negated by the disk caching. At least this way the program will run on any PC.
Just an opinion,
Matt
05-17-2006 01:09 PM
Thank you all for the suggestions.
I am now reading a whole column from one file and saving it as a row to another file, instead of reading number by number. This didn't increase the speed much.
I guess I have to live with that 🙂