LabVIEW


LabVIEW is slow, but only in dev mode


Hello folks!

I have programmed in LabVIEW for a few years now, but I am nowhere near mastering it yet. I learned most of it here at work, but I am really getting the hang of it, and it is also a lot of fun. At work we had a rather big setup (1200-1500 subVIs) dating back to around LV 6 and 7 that ran perfectly fine in LV 8.5.1. I was given the job of upgrading the computer systems and all the software and hardware involved, which I did over the past year.

The computer now runs Win7 x64 (yep, it was a slow XP machine before) and drives an Aerotech system consisting of 2 axes and a bunch of other stuff. The main reason I jumped to LV2015 was that we had to upgrade the drivers and firmware for the Aerotech, and the oldest version they support is LV2010. I didn't fancy getting LV2010 since, well, it's old too.

Getting the code and VIs to run on LV2015 x64 was trivial, and rebuilding them for use with the new drivers also worked fine. However, at the core of this are some older C code nodes (it used to run on the old Code Interface Nodes, CINs, that are gone in newer LV). I am sadly no good at C, but I found the source code and got help turning it into .dlls that LV can call instead. And this is where things start getting crazy...

When I run the main VI and let it run, nothing goes bad: no memory leaks, no increase in RAM. When I get to the part that uses the .dlls, nothing seems odd at first either. But if I then exit the main program, I can stare at the spinning mouse cursor for an hour. There is more to it: the freeze doesn't happen straight away. If I run the .dll once and then leave the PC for an hour or two with the main VI idling, it will be like that when I come back. And more: if I don't exit but use Windows shortcuts (Ctrl+C, Ctrl+V) inside LV (not in other programs such as Excel), the whole PC freezes for a few seconds (I saw the PC clock freeze too!). The freeze gets longer the more time has passed since I ran the .dll... Weird! Once it takes 5+ seconds it is horrible to work with, and exiting LV usually requires Ctrl+Alt+Del and losing all work done because of the huge "exit lag". RAM usage does not increase, and there is no notable CPU load and no disk activity, although there is a big CPU spike when I use the Windows shortcuts and get the lag spike. What could have happened?

The .dll does a lot of heavy calculation and outputs 3 big 1D arrays with maybe 5-20 million elements each. If LV were to do those calculations it would take minutes; the .dll does it all in 10 seconds, so it can't be avoided. Is there anything in that code or in LV that needs to be set up so I can flush the memory after use, or something like that? I see the RAM usage stay high (1 GB) after I run the .dll, but it is much higher (2-3 GB) while it is actually running, since it is processing loads of data. Before running the .dll part, LV usually uses 400-500 MB. This is probably normal, as the output data is now held by LV.

I also get random "Access Violation" crashes to desktop (CTDs) some time after I run the .dlls. It can happen when I try to run them again, exit LV, click a button, or just type in another program while the main VI is running but idle. It's like LV is "infested" and going to die horribly at any moment once I have touched that part of the code...
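Crashes that show up some time after a native call can be a sign that the call wrote outside a buffer it was given. One way to test for that outside LabVIEW is to surround each output buffer with guard bytes and check them after the call. The sketch below is only an illustration under that assumption; `guarded_alloc`, `guarded_check`, `guarded_free`, `GUARD_BYTES` and `GUARD_VALUE` are hypothetical names, not part of any library mentioned in this thread.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES 64     /* size of each guard zone (illustrative choice) */
#define GUARD_VALUE 0xA5   /* fill pattern for the guard zones */

/* Allocate `size` usable bytes with a guard zone before and after them.
   Returns a pointer to the usable region, or NULL on failure. */
void *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(size + 2 * GUARD_BYTES);
    if (raw == NULL)
        return NULL;
    memset(raw, GUARD_VALUE, GUARD_BYTES);
    memset(raw + GUARD_BYTES + size, GUARD_VALUE, GUARD_BYTES);
    return raw + GUARD_BYTES;
}

/* Returns 1 if both guard zones still hold the fill pattern, 0 otherwise. */
int guarded_check(const void *user, size_t size)
{
    const unsigned char *raw = (const unsigned char *)user - GUARD_BYTES;
    size_t i;

    for (i = 0; i < GUARD_BYTES; i++) {
        if (raw[i] != GUARD_VALUE)
            return 0;
        if (raw[GUARD_BYTES + size + i] != GUARD_VALUE)
            return 0;
    }
    return 1;
}

/* Release a buffer obtained from guarded_alloc. */
void guarded_free(void *user)
{
    if (user != NULL)
        free((unsigned char *)user - GUARD_BYTES);
}
```

A small C test program could allocate the output arrays this way, call the exported function, and then call `guarded_check` on each buffer; a result of 0 would mean the function wrote before or past the buffer.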

 

Any idea what could be happening? It's very frustrating...

There is a workaround, however: when I build an .exe and run everything from there, there are no freezes, CTDs, lag or anything else. The strangeness only happens in dev mode. However, I need to run in dev mode to debug...

 

I tried searching up and down here and on Google, but to no avail so far, although there are reports of funny things going on with .NET. That is why I mentioned Aerotech above: all the new drivers are written in .NET, so I rely heavily on those now. In LV 8.5.1 we never used .NET. Could there be a connection? I can crash the .NET Aerotech driver if I start Visual Studio or MAX, for example, but their support doesn't know of any such issue.

There is also a frame grabber card installed, an NI PCIe-1427 attached to a CCD camera that we use sometimes. I know those cards can overload a weak PC, but that's why I got a beast of a PC now.

 

PC main specs:

16GB RAM (checked OK)

i7 5820K (6 cores, not OC) @3.3GHz

Windows 7 Professional 64 SP1

 

Thanks, and sorry for any typos; I am not a native English speaker but a Viking 🙂

0 Kudos
Message 1 of 8
(1,973 Views)

@Krokanwood wrote:

The .dll does a lot of heavy calculation and outputs 3 big 1D arrays with maybe 5-20 million elements each. If LV were to do those calculations it would take minutes; the .dll does it all in 10 seconds, so it can't be avoided.

 

PC main specs:

16GB RAM (checked OK)

i7 5820K (6 cores, not OC) @3.3GHz

Windows 7 Professional 64 SP1


Based on your PC specs, I would say recode the DLL in LabVIEW and see if you can't do it faster on this new PC.  That is a hoss of a machine and should be able to handle the calculations if you are efficient with your array operations and memory usage.  You haven't posted any code showing what kind of calculations you are doing, so it will be hard for anyone to help you out.


aputman
LabVIEW 2017
LabVIEW Programming
0 Kudos
Message 2 of 8
(1,959 Views)

@Krokanwood wrote:

The .dll does a lot of heavy calculation and outputs 3 big 1D arrays with maybe 5-20 million elements each. If LV were to do those calculations it would take minutes; the .dll does it all in 10 seconds, so it can't be avoided. Is there anything in that code or in LV that needs to be set up so I can flush the memory after use, or something like that? I see the RAM usage stay high (1 GB) after I run the .dll, but it is much higher (2-3 GB) while it is actually running, since it is processing loads of data. Before running the .dll part, LV usually uses 400-500 MB. This is probably normal, as the output data is now held by LV.

 


Can you share with us (a) the inputs to the DLL, (b) the outputs from the DLL, and (c) what the DLL is supposed to be doing?  I'll bet (my usual wager is a dime) that "modern LabVIEW" can do this quite rapidly, particularly on a modern i7 with 16GB of memory.

 

Bob Schor

0 Kudos
Message 3 of 8
(1,891 Views)

Aputman and Bob, thanks for the replies. I will try to explain and show the C code behind the .dlls. I'm not good with C, so I can't really tell in detail what's going on, but I'll show you the code.

 

I am sorry if this is a lengthy, nerdy post, but I will try my best to explain and paste all the C code I have.

 

There are 2 of them in total. One does linear interpolation, but I haven't really looked into it much because it works. I haven't seen LV behave crazily after running just that one, and this .dll is also used in our camera software. However, it is also used before the waveform generator that the other .dll implements. It is called in a loop at least a few hundred thousand times, so it needs to be fast. I mention it because if I want to become "dll free" I need to replace this one too, and speed is important for it.

Code follows:

 

Linterp.jpg

#include <windows.h>
#include <string.h>
#include <math.h>
#include <float.h>
#include "LinearIPDLL.h"

#define YIPARNUM 3


BOOL APIENTRY DllMain(HMODULE hModule,
    DWORD ul_reason_for_call,
    LPVOID lpReserved
)
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        break;
    case DLL_THREAD_ATTACH:
        break;
    case DLL_THREAD_DETACH:
        break;
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}


// Linear interpolation of (px, py), N points, at the Ni positions in pxi.
// The caller must provide pyi with room for Ni doubles.
__declspec(dllexport) int LinintDLL(const double *px, const double *py, const int N, const double *pxi, const int Ni, double *pyi)
{

    int iD = 0, iIP = 0, Nm1;

    double x1, x2, y1, y2, xi, a, b;
    double Dx;

    //DbgPrintf("LinearInterpolation: N:%d, Ni:%d",N, Ni);

    // nothing to do on empty input; also avoids reading px[0], py[0], pxi[0] below
    if ((N < 1) || (Ni < 1))
        return 0;

    y1 = *py;
    x1 = *px;
    xi = *pxi;

    // try to get a flying start: guess the start index from the first x spacing
    if (N > 2)
    {
        x2 = *(px + 1);
        Dx = x2 - x1;
        iD = (int)((xi - x1) / Dx - 1);
        if (!(iD < N))
            iD = 0;
        if (iD < 0)
            iD = 0;
        if (*(px + iD) > xi)
            iD = 0;
        //DbgPrintf("Indx estimation (xi:%4.2f, *(x+iD):%4.2f) : %d",xi, *(px+iD),iD);
    }


    // extrapolate lower bounds
    // (iIP is checked before pxi is read, so pxi[Ni] is never touched)
    while ((iIP < Ni) && (*(pxi + iIP) < x1)) //Xi=-1
    {
        *pyi++ = y1;
        iIP++;
    }

    // interpolate
    Nm1 = N - 1;
    while ((iIP < Ni) && (iD < Nm1))
    {
        x1 = *(px + iD);
        x2 = *(px + iD + 1);
        xi = *(pxi + iIP);

        if ((xi >= x1) && (xi < x2))
        {
            // found correct data range
            y1 = *(py + iD);
            y2 = *(py + iD + 1);

            a = (y2 - y1) / (x2 - x1);
            b = y1 - a*x1;
            *pyi++ = a*xi + b;
            iIP++;

            // here it is likely that next xi is < x2, so reuse the same a and b
            // (again, check iIP before reading pxi)
            while ((iIP < Ni) && (*(pxi + iIP) < x2))
            {
                *pyi++ = a * (*(pxi + iIP)) + b;
                iIP++;
            }

        }
        else
        {
            // xi not in range [x1, x2) => must move on...
            iD++;

        }

    }
    //extrapolate upper bounds
    y2 = *(py + N - 1);
    while (iIP < Ni)
    {
        *pyi++ = y2;
        iIP++;
    }

    //DbgPrintf("LinearInterpolation: iD:%d, iIP:%d\n",iD, iIP);
    return 0;
}
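When porting a routine like this to LabVIEW, a slow but obvious reference version can help confirm that the port produces the same numbers. The sketch below is not the DLL code: `lin_interp_ref` is a hypothetical name, it assumes `px` is strictly increasing, and it holds the end values outside the table, which is what the DLL appears to do for sorted inputs.

```c
#include <stddef.h>

/* Hypothetical reference implementation (not the DLL code).
   Assumes px[0..n-1] is strictly increasing.
   Outside the table the first/last y value is held. */
void lin_interp_ref(const double *px, const double *py, size_t n,
                    const double *pxi, size_t ni, double *pyi)
{
    size_t i, k;

    if (n == 0)
        return;

    for (i = 0; i < ni; i++) {
        double xi = pxi[i];

        if (n == 1 || xi < px[0]) {
            pyi[i] = py[0];               /* below the table (or single point) */
        } else if (xi >= px[n - 1]) {
            pyi[i] = py[n - 1];           /* at or above the last point */
        } else {
            /* linear scan for the segment with px[k] <= xi < px[k+1];
               the bound on k keeps the scan inside the table */
            k = 0;
            while (k + 2 < n && !(xi >= px[k] && xi < px[k + 1]))
                k++;
            pyi[i] = py[k] + (xi - px[k]) * (py[k + 1] - py[k]) / (px[k + 1] - px[k]);
        }
    }
}
```

Feeding the same small arrays to this function and to the LabVIEW version, and comparing the outputs element by element, gives a quick sanity check before the old DLL is retired.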

 

Now the important one: this is the big calculator. It takes a 2D waveform table as input and computes a polynomial so that the waveform is not so sharp when going from the top to the bottom of a sawtooth signal.

I'm sorry, I said it has 3 output arrays, but there are just 2. The third one is generated from the data in the other two, and that is done outside the .dll.

The inputs: WFtable is the input table. The first row holds positions in micrometers relative to a start point; that is the X-axis of the data, and it will be negative early on. The second and third rows contain the data to be polynomial-fitted, in a way I do not understand because it is very complicated. But two arrays come out, so those rows are the basis for them. The original code is from the 90s, written in Matlab, turned into a CIN, and now slapped together by me into a .dll with the help of a programmer who knows C but not what the code really does 🙂

The other inputs are used for calculating the signal as we go; there are a few. The code follows below, and yes, it's extremely messy. Speed is secondary here, but it should take no longer than 15 seconds to run.

 

A typical input would be a WF table of 100k to 2M columns in 3 rows, with N between 400k and 4M. dX is usually 0.015, which is a resolution of millimeter movement. The AOM output is the opening and closing of an acousto-optic modulator, so it is 1 or 0. At every point of X things happen: the AOM will be either 0 or 1 and a voltage will change. This is handled by a DAQ task afterwards.
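Since the Poly5 function below has its own malloc calls commented out, the two output arrays have to be allocated by the caller with N elements each before the call. The following sketch shows what that contract looks like from a plain C caller; `poly5_buffers`, `poly5_buffers_alloc`, `poly5_buffers_free` and `poly5_output_bytes` are hypothetical names used only for illustration, and the element types are taken from the Poly5 signature (`double *Y`, `char *AOM`).

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical helper type: the two output buffers implied by the
   Poly5 signature (Y: n doubles, AOM: n chars). */
typedef struct {
    double *y;
    char *aom;
    size_t n;
} poly5_buffers;

/* Bytes needed for both output arrays together. */
size_t poly5_output_bytes(size_t n)
{
    return n * (sizeof(double) + sizeof(char));
}

/* Allocate zero-filled output buffers for n samples.
   Returns 0 on success, -1 if n is 0 or an allocation fails. */
int poly5_buffers_alloc(poly5_buffers *buf, size_t n)
{
    buf->y = NULL;
    buf->aom = NULL;
    buf->n = 0;
    if (n == 0)
        return -1;

    buf->y = calloc(n, sizeof(double));
    buf->aom = calloc(n, sizeof(char));
    if (buf->y == NULL || buf->aom == NULL) {
        free(buf->y);
        free(buf->aom);
        buf->y = NULL;
        buf->aom = NULL;
        return -1;
    }
    buf->n = n;
    return 0;
}

/* Release the buffers and reset the struct. */
void poly5_buffers_free(poly5_buffers *buf)
{
    free(buf->y);
    free(buf->aom);
    buf->y = NULL;
    buf->aom = NULL;
    buf->n = 0;
}
```

With 8-byte doubles, N = 4M means about 32 MB for Y and 4 MB for AOM. On the LabVIEW side the equivalent is to wire arrays already sized to N elements, with matching element types, into the Call Library Function Node.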

 

PolyFitV5.png

#include <windows.h>
#include <string.h>
#include <math.h>
#include <float.h>
#include "LVpolyfit5.h"

BOOL APIENTRY DllMain(HMODULE hModule,
    DWORD ul_reason_for_call,
    LPVOID lpReserved
)
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}


// Function Prototypes
void CompPoly5(double p5[], const double a[], const double b[], const double r[], const double d);


__declspec(dllexport) int Poly5(const double *WFtable, const int wftM, const int wftN,
    const double RampDuty, const double startX, const double Dx, const int N, const double aomDuty, const double aomPhase, char *AOM, double *Y)
{
    double x, t, d, dd, A;
    double *pY;
    const double *r, *a, *b;
    double p5[6];
    // aom variables
    double dAom, aom1, aom2;
    char *pAOM;
    // to hold variables while we do poly5 first
    int iAfterRArrayStart, iAfterPoly5;
    double xAfterRArrayStart;

    int i, si;

    // Y and AOM are NOT allocated here (the old malloc calls are commented out),
    // so the caller must pass buffers with room for N elements each.
    // Note that AOM is written as char (1 byte per element), not int.

    // output vector Y
    //Y = (double *)malloc(N*sizeof(double));

    // output vector AOM
    //AOM = (int *)malloc(N*sizeof(int));

    // wft: the table must have exactly 3 rows (r, a, b) of wftM columns each
    if (wftN != 3)
    {
        return 1001;
    }

    r = WFtable;
    a = r + wftM;
    b = r + 2 * wftM;

    // DO WORK
    pY = Y;
    d = 1 - RampDuty;
    dd = 0;
    x = startX;
    pAOM = AOM;
    dAom = 1 - aomDuty;

    si = 0;
    i = 0;

    // fill in zeros (and AOM = 1) for x less than start of the r-array
    while ((x < *r) && (i < N))
    {
        *pY++ = 0.0;
        *pAOM++ = 1;
        i++;
        x += Dx;
    }
    iAfterRArrayStart = i;
    xAfterRArrayStart = x;

    // MUST initialize first
    dd = d*0.5*(*(r + si + 2) - *(r + si));
    // comp polynom for this slot
    CompPoly5(p5, (a + si), (b + si), (r + si), dd);
    // and slope of tri
    A = (*(b + si) - *(a + si)) / (*(r + si + 1) - *(r + si));

    // POLY 5
    while (i < N)
    {
        // find slot
        while (!((x >= *(r + si)) && (x < (*(r + si + 1) + dd / 2))) && (si < (wftM - 2)))
        {
            si++;
            dd = d*0.5*(*(r + si + 2) - *(r + si));
            // comp polynom for this slot
            CompPoly5(p5, (a + si), (b + si), (r + si), dd);

            // and slope of tri
            A = (*(b + si) - *(a + si)) / (*(r + si + 1) - *(r + si));
        }

        if (si == (wftM - 2))
            break;

        // poly5 or line?
        t = x - *(r + si);
        if (x < (*(r + si + 1) - dd / 2))
        {
            // tri
            *pY++ = A*t + *(a + si);
            i++;
        }
        else
        {
            // poly5
            *pY++ = p5[0] * t*t*t*t*t + p5[1] * t*t*t*t + p5[2] * t*t*t + p5[3] * t*t + p5[4] * t + p5[5];
            i++;
        }
        x += Dx;
    }
    iAfterPoly5 = i;

    // DO AOM SEPARATELY (restart from where the r-array begins)
    i = iAfterRArrayStart;
    x = xAfterRArrayStart;
    si = 0;
    //ddAom = dAom*0.5*(*(r+si+2)- *(r+si));
    aom1 = (1 - aomPhase)*dAom*(0.25*(*(r + si + 2) - *(r + si)));
    aom2 = (1 + aomPhase)*dAom*(0.25*(*(r + si + 2) - *(r + si)));

    while (i < N)
    {
        // find slot
        while (!((x >= *(r + si)) && (x < (*(r + si + 1) + aom2))) && (si < (wftM - 2)) && (i < iAfterPoly5))
        {
            si++;
            //ddAom = dAom*0.5*(*(r+si+2)- *(r+si));
            aom1 = (1 - aomPhase)*dAom*(0.25*(*(r + si + 2) - *(r + si)));
            aom2 = (1 + aomPhase)*dAom*(0.25*(*(r + si + 2) - *(r + si)));
        }

        if (si == (wftM - 2))
            break;

        // AOM on or off?
        if (x < (*(r + si + 1) - aom1))
        {
            //on
            *pAOM++ = 1;
            i++;
        }
        else
        {
            //off
            *pAOM++ = 0;
            i++;
        }

        x += Dx;
    }

    // pad AOM with 1 up to where the Y loop stopped
    while (i < iAfterPoly5)
    {
        *pAOM++ = 1;
        i++;
    }


    // fill the remainder: Y with zeros, AOM with 1
    while (i < N)
    {
        *pY++ = 0.0;
        *pAOM++ = 1;
        i++;
    }

    return 1000;
}


void CompPoly5(double p5[], const double a[], const double b[], const double r[], const double d)
{
#define n 1
    register char i;
    char bNaNfound = FALSE;
    double dx[2];
    dx[0] = r[1] - r[0];
    dx[1] = r[2] - r[1];

// B
p5[4] = -(-4.0*b[n - 1] * dx[n] * d*d*d*d*d + 4.0*a[n - 1] * dx[n] * d*d*d*d*d + 240.0*dx[n]
* pow(dx[n - 1], 5.0)*b[n - 1] - 240.0*dx[n] * pow(dx[n - 1], 5.0)*a[n] + 16.0*pow(dx[n - 1], 4.0
)*a[n] * d*d - 16.0*pow(dx[n - 1], 4.0)*b[n] * d*d + 4.0*d*d*d*d*d*a[n] * dx[n - 1] - 4.0*d*d*d*
d*d*b[n] * dx[n - 1] - 104.0*dx[n] * pow(dx[n - 1], 3.0)*d*d*b[n - 1] + 120.0*dx[n] * pow(dx[n - 1
], 3.0)*d*d*a[n] - 16.0*pow(dx[n - 1], 3.0)*dx[n] * a[n - 1] * d*d + 3.0*dx[n - 1] * d*d*d*d*dx
[n] * b[n - 1] + 12.0*dx[n - 1] * d*d*d*d*dx[n] * a[n - 1] - 15.0*dx[n] * dx[n - 1] * d*d*d*d*a[n] +
12.0*pow(dx[n - 1], 2.0)*d*d*d*d*b[n] - 12.0*pow(dx[n - 1], 2.0)*d*d*d*d*a[n]) / dx[n] / (d
*d*d*d*d) / dx[n - 1] / 8.0;

// F
p5[0] = 6.0*(-b[n - 1] + a[n]) / (d*d*d*d*d);


// E
p5[1] = (60.0*dx[n] * pow(dx[n - 1], 2.0)*b[n - 1] - 60.0*dx[n] * pow(dx[n - 1], 2.0)*a[n]
+ dx[n] * b[n - 1] * d*d - dx[n] * a[n - 1] * d*d + a[n] * dx[n - 1] * d*d - b[n] * dx[n - 1] * d*d) / dx[n] / (d*
d*d*d*d) / dx[n - 1] / 2.0;

// C
p5[3] = -3.0 / 4.0*(-4.0*pow(dx[n - 1], 3.0)*a[n] * d*d + 4.0*pow(dx[n - 1], 3.0)*b[n] * d
*d + d*d*d*d*dx[n] * b[n - 1] - 80.0*dx[n] * pow(dx[n - 1], 4.0)*b[n - 1] - d*d*d*d*b[n] * dx[n - 1]
+ 16.0*dx[n] * pow(dx[n - 1], 2.0)*d*d*b[n - 1] - 20.0*dx[n] * pow(dx[n - 1], 2.0)*d*d*a[n] +
4.0*pow(dx[n - 1], 2.0)*dx[n] * a[n - 1] * d*d + 80.0*dx[n] * pow(dx[n - 1], 4.0)*a[n] + d*d*d*d*
a[n] * dx[n - 1] - d*d*d*d*dx[n] * a[n - 1]) / dx[n] / (d*d*d*d*d) / dx[n - 1];

// D
p5[2] = -(60.0*dx[n] * pow(dx[n - 1], 2.0)*b[n - 1] - 60.0*dx[n] * pow(dx[n - 1], 2.0)*a
[n] - 3.0*dx[n] * b[n - 1] * d*d + 5.0*dx[n] * d*d*a[n] - 2.0*dx[n] * a[n - 1] * d*d + 2.0*a[n] * dx[n
- 1] * d*d - 2.0*b[n] * dx[n - 1] * d*d) / dx[n] / (d*d*d*d*d);

// A
p5[5] = -(-16.0*dx[n - 1] * d*d*d*d*d*dx[n] * a[n - 1] - 16.0*dx[n] * dx[n - 1] * d*d*d*d*d*
a[n] + 144.0*dx[n] * pow(dx[n - 1], 4.0)*d*d*b[n - 1] - 160.0*dx[n] * pow(dx[n - 1], 4.0)*d*d*a
[n] + 16.0*pow(dx[n - 1], 4.0)*dx[n] * a[n - 1] * d*d - 36.0*pow(dx[n - 1], 2.0)*d*d*d*d*dx[n] *
b[n - 1] - 24.0*pow(dx[n - 1], 2.0)*d*d*d*d*dx[n] * a[n - 1] + 60.0*dx[n] * pow(dx[n - 1], 2.0)*d
*d*d*d*a[n] + 3.0*b[n - 1] * d*d*d*d*d*d*dx[n] - 3.0*a[n - 1] * d*d*d*d*d*d*dx[n] + 3.0*d*d*d
*d*d*d*a[n] * dx[n - 1] - 3.0*d*d*d*d*d*d*b[n] * dx[n - 1] + 16.0*pow(dx[n - 1], 2.0)*d*d*d*d*
d*b[n] - 16.0*pow(dx[n - 1], 2.0)*d*d*d*d*d*a[n] - 16.0*pow(dx[n - 1], 5.0)*a[n] * d*d + 16.0
*pow(dx[n - 1], 5.0)*b[n] * d*d - 24.0*pow(dx[n - 1], 3.0)*d*d*d*d*b[n] + 24.0*pow(dx[n - 1],
3.0)*d*d*d*d*a[n] + 192.0*dx[n] * pow(dx[n - 1], 6.0)*a[n] - 192.0*dx[n] * pow(dx[n - 1], 6.0
)*b[n - 1]) / dx[n] / (d*d*d*d*d) / dx[n - 1] / 32.0;

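    // replace any NaN coefficient (e.g. from a division by zero) with 0
    for (i = 0; i < 6; i++)
    {
        if (_isnan(p5[i]))
        {
            bNaNfound = TRUE;
            p5[i] = 0.0;
        }
    }

    if (bNaNfound)
    {
        // do mean: fall back to a constant halfway between b[0] and a[1]
        p5[0] = p5[1] = p5[2] = p5[3] = p5[4] = 0.0;
        p5[5] = (b[n - 1] + a[n]) / 2;
    }
#undef n
}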
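The quintic in Poly5 is evaluated in expanded form (`p5[0] * t*t*t*t*t + ...`). As a possible simplification when rewriting, the same polynomial can be evaluated with Horner's rule, which needs only five multiplications per sample. This is a sketch, not the original code; `poly5_eval_horner` is a hypothetical name, and the coefficient order (`p5[0]` for the t^5 term down to `p5[5]` for the constant) is taken from the expression in Poly5.

```c
/* Evaluate p5[0]*t^5 + p5[1]*t^4 + p5[2]*t^3 + p5[3]*t^2 + p5[4]*t + p5[5]
   with Horner's rule. Hypothetical helper, not part of the original DLL. */
double poly5_eval_horner(const double p5[6], double t)
{
    double y = p5[0];
    int k;

    for (k = 1; k < 6; k++)
        y = y * t + p5[k];   /* fold in the next lower-order coefficient */
    return y;
}
```

The result can differ from the expanded form in the last few bits because the rounding happens in a different order, so a comparison against the old DLL output should use a small tolerance rather than exact equality.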

 

Now that I have written this post, I really think I could do it in LV and dump this code... It will just take me time. What do you think?

 

0 Kudos
Message 4 of 8
(1,868 Views)

@Krokanwood wrote:

Now that I have written this post, I really think I could do it in LV and dump this code... It will just take me time. What do you think?


Yes, it will take a little bit of time.  Just from a glance, it should not take a lot of time to rewrite it in LabVIEW.  I would highly recommend doing the rewrite and getting rid of the DLLs.  It will make your life easier on many fronts.


GCentral
There are only two ways to tell somebody thanks: Kudos and Marked Solutions
Unofficial Forum Rules and Guidelines
"Not that we are sufficient in ourselves to claim anything as coming from us, but our sufficiency is from God" - 2 Corinthians 3:5
0 Kudos
Message 5 of 8
(1,865 Views)
Solution
Accepted by topic author Krokanwood

Definitely, do it.  That looks fun. 😀

Just remember that LabVIEW is inherently multi-threaded, so don't limit it by using sequence structures.  And use subVIs when possible for code reuse.  Good luck.


aputman
LabVIEW 2017
LabVIEW Programming
0 Kudos
Message 6 of 8
(1,853 Views)

Thanks. I managed to rewrite the first, simpler one already. Formula Nodes work like a charm, and it's funny that they execute faster than LabVIEW primitives like + and - 🙂

 

Actually, it's faster than the .dll by a factor of 5! Good riddance to that one.

 

0 Kudos
Message 7 of 8
(1,826 Views)

@Krokanwood wrote:

Formula Nodes work like a charm, and it's funny that they execute faster than LabVIEW primitives like + and -


Your mileage may vary.  I have run into situations where the Formula Node is faster, but any recent benchmarks I have done show that the primitives are faster.  Yes, it has been a while since I did the benchmarks.  But the primitives can execute in parallel, whereas the FN is sequential, so you are likely to get slightly better performance with the primitives.

 

Again, YMMV.


GCentral
There are only two ways to tell somebody thanks: Kudos and Marked Solutions
Unofficial Forum Rules and Guidelines
"Not that we are sufficient in ourselves to claim anything as coming from us, but our sufficiency is from God" - 2 Corinthians 3:5
0 Kudos
Message 8 of 8
(1,820 Views)