LV threading 8.5 much worse than 8.2??? (or is it VISA or NIDAQmxbase?)

Scott, beware !!!

Your VI doesn't return the proper result. It works only if the gain and polarity are the same for all the channels!

Let's say we have two channels and one scan, so that the raw data array contains 2 elements.

According to your VI, the loop will run one time and only the first cluster element in the channels array will be read...

 

Chilly Charly    (aka CC)
Message 11 of 21
(1,785 Views)
This is a quick correction of the problem that gives correct results whatever the settings of the different channels. It still runs much faster than the original VI, and the speed increase seems proportional to the data size. For 2500 scans x 32 channels, it's about 200 times faster.
Chilly Charly    (aka CC)
Message 12 of 21
(1,784 Views)

Additional comments :

1/ After correcting a bias in the timing VI, the measured speed increase is close to 1600 :). It could probably be better if some parallelism were introduced.

2/ I believe the scaling constants should be 4095 and 65535 instead of 4096 and 65536.

 

Chilly Charly    (aka CC)
Message 13 of 21
(1,754 Views)

Alright CC.  I didn't actually test the variable gain feature, but since the gain array was auto indexed, it should select the correct gain and polarity for each row of the array.  At least that was the design of my modification, but I admit I didn't test it with variable gain.  I will try to look at it again on Monday but it is looking like a busy week.  I must have missed something obvious.

 

I used the same scaling constants that NI used in their original VI. I assumed that they had the correct value. I guess it is a question for the board designer whether 2^12 == 10 V or 2^12-1 == 10 V. AFAIK the common practice is the second one, since that means your range is inclusive of 10 V and not one LSB less, so I think you are correct. But I defer to NI on their calibration values.
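The difference between the two conventions is easy to see in a text-language sketch (Python standing in for the diagram logic; the 10 V full scale and 12-bit code are taken from the discussion, everything else is illustrative):

```python
# Hypothetical 12-bit ADC scaling: divisor 4096 vs 4095 for a 10 V range.
FS = 10.0  # assumed full-scale voltage from the thread

def scale_4096(code):
    # NI's original constant: the top code lands one LSB short of full scale
    return code * FS / 4096.0

def scale_4095(code):
    # CC's proposed constant: the top code maps exactly to 10 V
    return code * FS / 4095.0

# The maximum 12-bit code is 4095:
print(scale_4096(4095))  # -> 9.99755859375 (one LSB below 10 V)
print(scale_4095(4095))  # -> 10.0 (range inclusive of 10 V)
```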

 

The value of the speedup is of course hugely dependent on the size of the array and the CPU. The G5 FPU is really fast if you schedule it so both arithmetic units are running at full bore. The dual FPUs mean that it can do 2 FLOPs per clock tick. Thus, if it were fully utilized by an efficient compiler, it could do the 32 x 2500 array in 16 microseconds! That is only using 1 core with no parallelism, but real-world problems should keep it only a factor of 10 worse. There is still obviously room for improvement of that algorithm.
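The back-of-envelope arithmetic behind the 16 µs figure can be checked directly (the 2.5 GHz clock and one floating-point operation per sample are assumptions; G5 clock rates varied):

```python
# Rough peak-throughput estimate for the 32 x 2500 scaling pass.
elements = 32 * 2500          # 80,000 samples in the array
flops_per_element = 1         # assume one multiply per sample
clock_hz = 2.5e9              # assumed G5 clock rate
peak_flops = clock_hz * 2     # dual FPUs: 2 FLOPs per clock tick
seconds = elements * flops_per_element / peak_flops
print(seconds * 1e6)  # roughly 16 microseconds at full utilization
```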

LabVIEW Champion | LabVIEW Channel Wires

Message 14 of 21
(1,740 Views)

Sorry, guys, I don't have DAQmx installed, but I am curious about the following.

 

How was this VI in the earlier (fast!) version? Why was it changed?

 

Message 15 of 21
(1,738 Views)
CC,
It was that dang matrix transpose. Just move it to the other side of the For loop and it works fine. I had the wrong dimensions at first. I find arrays confusing when columns come before rows, or when DAQ returns a 2-D array and I have to remember which dimension is samples and which is channels. Since all my gains were the same, it worked for my case, but thanks for pointing out the error. I am posting (if Lithium will let me) both the fixed VI and an improved test that uses all gains and all bipolar/unipolar combinations and checks for differences.
 
BTW, today I was working on an Intel Mac, so it isn't the compiler making inefficient code for one CPU. I get the same factor of 1000 speedup on that CPU as well.
 
The oldest version of the VI that I can find installed is from NI-DAQmx Base v2.1.0. It seems to use the same slow algorithm. I am not sure why the problem suddenly showed up. Maybe the newer version finally used the DMA features and called this VI? Tom W. may be able to shed light on the historical process.
 
Dohhh.  Lithium got me.  Let me switch browsers.

LabVIEW Champion | LabVIEW Channel Wires

Message 16 of 21
(1,700 Views)

I was actually going to propose your "transpose+autoindex" variation to replace CC's "index column with [i]" version. However, they seem to be nearly indistinguishable in performance, so it does not really matter. I guess the overhead of the transposition is balanced by the overhead of indexing out columns (where the elements are not adjacent in memory).
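The two diagram styles being compared can be sketched in a text language (Python lists standing in for the LabVIEW 2-D array; the 4 x 3 shape is illustrative):

```python
# A small scans x channels array: rows are scans, columns are channels.
data = [[s * 10 + c for c in range(3)] for s in range(4)]  # 4 scans x 3 channels

# CC's "index column with [i]" style: pull each channel out by column index.
# Column elements are not adjacent in row-major memory, so this strides.
by_column = [[row[ch] for row in data] for ch in range(3)]

# The "transpose + autoindex" style: transpose once, then iterate over rows,
# which are now contiguous per channel.
transposed = list(map(list, zip(*data)))

print(by_column == transposed)  # -> True: same result either way
```

Either way the per-channel data comes out identical; the question in the thread is only which traversal the compiler handles faster.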

 

If you ever want to simplify the code even more, take the "division" out of the case structure, since it occurs in all of the cases. One instance is enough 🙂 (I actually prefer CC's solution of indexing into a LUT).
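Both suggestions together look something like this sketch (Python standing in for the diagram; the gain codes and factors are made up for illustration, not NI's actual values):

```python
# Hoist the shared division out of the per-gain cases, and replace the case
# structure with a lookup table keyed by the gain setting (values assumed).
GAIN_LUT = {1: 1.0, 2: 0.5, 5: 0.2, 10: 0.1}

def scale(code, gain_setting, fs=10.0):
    v = code * fs / 4095.0             # the common division, done once
    return v * GAIN_LUT[gain_setting]  # per-gain factor via LUT, no cases

print(scale(4095, 2))  # -> 5.0 (full-scale code at gain 2)
```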

 

You can also get away with a single unbundle node, just resize it for two outputs. Less clutter.

Message 17 of 21
(1,693 Views)

Altenbach made nearly all the comments I could have added to the discussion. 🙂 The transpose array seems to be slightly slower than indexing. Does it depend on array size? To me this is a logical result, though I was expecting a larger difference. Maybe we should ask NI to design autoindexing tunnels with selectable indexes (autoindex rows or autoindex columns)?

Scott,
your test VI still has a strong bias problem with timing: the first frame executes immediately, while the data are generated in the For loop. This results in a large time penalty for the first tested VI. You should have wired the data feeding the VI through the first frame instead of along the side of the second frame. I usually check a VI against itself before challenging VIs 😉
And I can't resist teasing you: you should add posts both to the Rube Goldberg thread (the comparison of the resulting arrays can be made with a single "not equal?" node) and to the "Ya know it's going to be a bad day when..." thread :D:D:D

I confess I have the same difficulties with autoindexing loops. I can't be proud of my 50% properly wired arrays 😄


I also have problems with the Lithium editor, but I refuse to quit Safari. When I need to attach a VI, I first zip it to avoid the "Unexpected error" message.

And I return to HTML mode when I want to insert pictures or links in the message. What a pity!

Chilly Charly    (aka CC)
Message 18 of 21
(1,686 Views)

CC,

I agree with everything except the Rube Goldberg comment!! There is a method to my madness! NEVER EVER compare real numbers without an epsilon! I was worried about single-bit variations when changing the possible order of computations. These could give variances of 10^-17 or so for double precision. So I started down that path to compute the percent variation, which is the real test. It turned out that the computation was exactly equal, but that is just luck.

 

I wasn't going to worry about errors of 1 part in 10^17, and I was testing to see how small my epsilon needed to be. It turned out that an epsilon of zero worked in this case, but that was something I didn't want to count on.
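The epsilon-comparison principle sth is defending can be sketched in Python (`math.isclose` is a standard-library way to express a relative epsilon; the tolerance value is illustrative):

```python
import math

# Never compare floats for exact equality when the order of operations may
# differ: rounding leaves bit-level differences even for "equal" results.
def nearly_equal(a, b, rel_eps=1e-12):
    return math.isclose(a, b, rel_tol=rel_eps, abs_tol=0.0)

x = 0.1 + 0.2
y = 0.3
print(x == y)              # -> False: differs in the last bits
print(nearly_equal(x, y))  # -> True: within a relative epsilon
```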

 

Using the array is one way I considered, but it does not have the "default" case. I went for bug-for-bug compatibility rather than that speedup. If you use the array, then you should check bounds and do the default case if the "gain" is not one of the listed types; otherwise you change the behavior in error cases.
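The bounds check sth describes amounts to a LUT lookup with a fallback (a Python sketch; the gain codes, factors, and default are assumed for illustration):

```python
# A bare array index would error or wrap on an unlisted gain code; a guarded
# lookup preserves the case structure's "default" behavior.
DEFAULT_FACTOR = 1.0
GAIN_LUT = {1: 1.0, 2: 0.5, 5: 0.2, 10: 0.1}

def gain_factor(gain_setting):
    # dict.get keeps the default case intact for out-of-range gain codes
    return GAIN_LUT.get(gain_setting, DEFAULT_FACTOR)

print(gain_factor(5))   # -> 0.2
print(gain_factor(99))  # -> 1.0 (unlisted gain falls back to the default)
```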

 

I can't see any significant difference between transpose/autoindex vs explicit indexing in arrays up to 10,000 x 32. That was a factor of 3400 speedup, but the difference between CC and sth was only 1 ms, i.e. the timing resolution. I really like the idea of selecting the index for auto-indexing on higher-dimensional arrays. However, moving bytes around memory is what modern CPUs are really, really good at doing.

 

You can also trade a few clock cycles for memory by doing the divide first, letting the LV compiler do its constant-folding magic, and then selecting between the two arrays. You double the gain array size but save a bunch of floating divides.
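In sketch form, that trade looks like precomputing both polarity tables once and then only selecting at runtime (Python standing in for the diagram; the gain list and scaling constants are taken from the thread's discussion but arranged hypothetically):

```python
# Precompute one scale factor per gain per polarity; the runtime path then
# does a single table select instead of a floating divide per sample.
GAINS = [1.0, 2.0, 5.0, 10.0]  # assumed gain list
FS = 10.0                      # assumed full-scale voltage

UNIPOLAR = [FS / (4095.0 * g) for g in GAINS]          # 0..FS over 12 bits
BIPOLAR = [2.0 * FS / (65535.0 * g) for g in GAINS]    # -FS..FS over 16 bits

def scale_factor(gain_index, bipolar):
    # doubled table, but no divides left on the per-sample path
    return (BIPOLAR if bipolar else UNIPOLAR)[gain_index]

print(scale_factor(0, False))  # volts per count at gain 1, unipolar
```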

 

The timing bias was just my making all the wires symmetric.   Bad choice in this case!!

 

Thanks for the pointer on Safari, I see you posted this to the thread.  Hopefully they will get this fixed.  The NI web pages have never ever passed the W3 validator for HTML.  Pitiful adherence to standards.

 

I think we have beaten this particular VI to death. Now for the other 3000 VIs in the NI-DAQmx Base package! The good thing is that, since it is all LV, it is completely transportable and debuggable, and not a black box (or "Dark Matter", as it used to be called) where you can't see what is happening.

LabVIEW Champion | LabVIEW Channel Wires

Message 19 of 21
(1,667 Views)

sth wrote: ... I agree with everything except the Rube Goldberg comment!! There is a method to my madness! NEVER EVER compare real numbers without an epsilon! I was worried about single-bit variations when changing the possible order of computations. These could give variances of 10^-17 or so for double precision. [...]
From a theoretical point of view, you are absolutely right, and I should apologize. 😞 However, from a practical point of view, in this specific case, the operations are exactly the same and there is no reason to get a difference even at the 10^-17 level. So I don't have to apologize. 😄
However again, I must confess that my thoughts never went so far...  you are again right, I have been lucky and I would have been puzzled for a while in many other situations. So I do humbly apologize. 🙂 😄

 

Chilly Charly    (aka CC)
Message 20 of 21
(1,654 Views)