09-14-2008 02:28 AM
Scott, beware!!!
Your vi doesn't return the proper result. It works only if the gain and polarity are the same for all the channels!
Let's say we have two channels and one scan, so that the raw data array contains 2 elements.
According to your vi, the loop will run only once, and only the first cluster element in the channels array will be read...
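Here is a rough text-language sketch of what I mean (the names and the scaling are made up for illustration; the real thing is your G diagram, of course):

```python
# Two channels, one scan -> raw contains 2 samples, channels contains 2 clusters.
raw = [2048, 1024]                                  # one sample per channel
channels = [{"gain": 1.0}, {"gain": 10.0}]          # per-channel clusters

# What the modified vi effectively does: the gain cluster is read once,
# so channels[0] is applied to every sample.
wrong = [s * 10.0 / 4095 / channels[0]["gain"] for s in raw]

# What it should do: pair each sample with its own channel cluster.
right = [s * 10.0 / 4095 / ch["gain"] for s, ch in zip(raw, channels)]

print(wrong)   # both samples scaled with gain 1.0
print(right)   # second sample scaled with gain 10.0
```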
09-14-2008 10:48 AM
Additional comments:
1/ After correcting a bias in the timing vi, the measured speedup is close to 1600x :). It could probably be better if some parallelism were introduced.
2/ I believe the scaling constants should be 4095 and 65535 instead of 4096 and 65536.
09-14-2008 03:17 PM
Alright CC. I didn't actually test the variable gain feature, but since the gain array was auto-indexed, it should select the correct gain and polarity for each row of the array. At least that was the design of my modification, but I admit I didn't test it with variable gain. I will try to look at it again on Monday, but it is looking like a busy week. I must have missed something obvious.
I used the same scaling constants that NI used in their original VI. I assumed that they had the correct value. I guess it is a question for the board designer whether 2^12 corresponds to 10 V or 2^12 - 1 does. AFAIK the common practice is the second one, since that makes the range inclusive of 10 V rather than one LSB less, so I think you are correct. But I defer to NI on their calibration values.
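To make the difference concrete, here is a quick sanity check (plain Python, illustrative only, not the VI's actual code):

```python
# 12-bit unipolar example: the codes run 0..4095.
full_scale = 2**12 - 1                 # 4095, the largest possible code

v_div_4095 = full_scale * 10.0 / 4095  # 10.0 V exactly at full scale
v_div_4096 = full_scale * 10.0 / 4096  # 9.99756 V, one LSB short of 10 V

print(v_div_4095, v_div_4096)
```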
The value of the speedup is of course hugely dependent on the size of the array and the CPU. The G5 FPU is really fast if you schedule it so both arithmetic units are running at full bore. The dual FPUs mean that it can do 2 FLOPs per clock tick, which at, say, a 2.5 GHz clock is 5 GFLOP/s. Thus, if it were fully utilized by an efficient compiler, it could do the 32 x 2500 array (80,000 samples at roughly one FLOP each) in 16 microseconds! That is only using one core with no parallelism, and real-world problems should keep it within a factor of 10 of that. There is still obviously room for improvement in that algorithm.
09-14-2008 03:49 PM
Sorry, guys, I don't have DAQmx installed, but I am curious about the following.
How was this VI in the earlier (fast!) version? Why was it changed?
09-15-2008 07:58 PM
I was actually going to propose your "transpose+autoindex" variation to replace CC's "index column with [i]" version. However, they seem to be nearly indistinguishable in performance, so it does not really matter. I guess the overhead of the transposition is balanced by the overhead of indexing out columns (where the elements are not adjacent in memory).
If you ever want to simplify the code slightly further, take the divisions out of the case structure, since they occur in all of the cases. One instance is enough 🙂 (I actually prefer CC's solution of indexing into a LUT.) A sketch of the idea follows below.
You can also get away with a single unbundle node; just resize it for two outputs. Less clutter.
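In text form, the division hoist is just this (hypothetical names and scale factors; the real edit is on the G diagram):

```python
# Before: every case of the structure repeats the divide by gain.
def scale_before(code, gain, bipolar):
    if bipolar:
        return code * 20.0 / 4095 / gain   # illustrative bipolar span
    else:
        return code * 10.0 / 4095 / gain   # illustrative unipolar span

# After: the shared "/ gain" appears once, outside the case.
def scale_after(code, gain, bipolar):
    span = 20.0 if bipolar else 10.0
    return code * span / 4095 / gain
```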
09-15-2008 09:00 PM
Altenbach made nearly all the comments I could have added to the discussion. 🙂 The transpose array seems to be slightly slower than indexing. Does it depend on array size? To me this is a logical result, though I was expecting a larger difference. Maybe we should ask NI to design auto-indexing tunnels with selectable indexes (auto-index rows or auto-index columns)?
Scott,
your test vi still has a strong timing bias: the first frame executes immediately, while the data are generated in the For loop. This results in a large time penalty for the first tested vi. You should have wired the data feeding the vi through the first frame instead of along the side of the second frame. I usually check a vi against itself before challenging vis 😉
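In text form the fix looks like this (a sketch only; the real vi uses a sequence structure, and these names are made up):

```python
import time

def make_data():
    return [float(i) for i in range(80000)]   # stands in for the For loop

# Biased: the clock starts before the data exists, so the generation
# time is charged to the first vi under test.
t0 = time.time()
data = make_data()
first_result = sum(data)                      # stands in for the first vi
t1 = time.time()

# Unbiased: the data is "wired through the first frame", i.e. it must
# exist before the timer fires.
data = make_data()
t2 = time.time()
second_result = sum(data)                     # same work, fair timing
t3 = time.time()

print(t1 - t0, t3 - t2)                       # the first time is inflated
```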
And I can't resist teasing you: you should add posts both to the Rube Goldberg thread (the comparison of the resulting arrays can be made with a single "not equal?" node) and to the "Ya know it's going to be a bad day when..." thread :D:D:D
I confess I have the same difficulties with auto-indexing loops. I can't be proud of my 50% properly wired arrays 😄
I also have problems with the Lithium editor, but I refuse to quit Safari. When I need to attach a vi, I first zip it to avoid the "Unexpected error" message.
And I switch to HTML mode when I want to insert pictures or links in the message. What a pity!
09-16-2008 07:51 AM
CC,
I agree with everything except the Rube Goldberg comment!! There is a method to my madness! NEVER EVER compare real numbers without an epsilon! I was worried about single-bit variations when changing the possible order of computations. This could give variances of 10^-17 or so for double precision. So I started down the path of computing the percent variation, which is the real test. It turned out that the computation was exactly equal, but that was just luck.
I wasn't going to worry about errors of 1 part in 10^17, and I was testing to see how small my epsilon needed to be. It turned out that an epsilon of zero worked in this case, but that was something I didn't want to count on.
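For the record, the test I had in mind looks like this in text form (the tolerance value is illustrative):

```python
def nearly_equal(a, b, eps=1e-12):
    """Relative comparison -- never test floats with == after reordering."""
    return all(abs(x - y) <= eps * max(abs(x), abs(y), 1.0)
               for x, y in zip(a, b))

# Reordering the arithmetic can perturb the last bits (~1e-17 relative
# for doubles); here even eps = 0 happened to pass, but that was luck.
```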
Using the array is one way I considered, but it does not have the "default" case. I went for bug-for-bug compatibility rather than that speedup. If you use the array, then you should check bounds and fall back to the default case if the "gain" is not one of the listed values. Otherwise you change the behavior in error cases.
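Something like this, in text form (hypothetical gain codes and default value; the point is keeping the fallback):

```python
GAINS = [1.0, 2.0, 5.0, 10.0]     # illustrative LUT indexed by gain code
DEFAULT_GAIN = 1.0                # whatever the original "default" case did

def gain_for(code):
    # Bounds check with a fallback, to stay bug-for-bug compatible
    # with the case-structure version instead of erroring out.
    return GAINS[code] if 0 <= code < len(GAINS) else DEFAULT_GAIN
```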
I can't see any significant difference between the transpose/auto-index and the explicit indexing versions on arrays up to 10,000 x 32. That was a factor of 3400 speedup, but the difference between CC's and sth's versions was only 1 ms, which is the timing resolution. I really like the idea of selecting the index for auto-indexing on higher-dimensional arrays. Then again, moving bytes around memory is what modern CPUs are really, really good at doing.
You can also trade a few clock cycles for memory by doing the divide first, letting the LV compiler do its constant-folding magic, and then selecting between the two arrays. You double the gain array size but save a bunch of floating-point divides.
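In text form (illustrative spans and gains; the folding happens once, up front):

```python
GAINS = [1.0, 2.0, 5.0, 10.0]

# Fold range/gain into two precomputed tables: twice the memory,
# but the per-sample work drops to a single multiply.
UNIPOLAR = [10.0 / 4095 / g for g in GAINS]
BIPOLAR  = [20.0 / 4095 / g for g in GAINS]

def scale(code, gain_idx, bipolar):
    table = BIPOLAR if bipolar else UNIPOLAR
    return code * table[gain_idx]
```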
The timing bias came from my making all the wires symmetric. A bad choice in this case!!
Thanks for the pointer on Safari; I see you posted it to the thread. Hopefully they will get this fixed. The NI web pages have never ever passed the W3C validator for HTML. Pitiful adherence to standards.
I think we have beaten this particular VI to death. Now for the other 3000 VIs in the NI-DAQmx Base package! The good thing is that since it is all LV, it is completely transportable and debuggable, and not a black box (or Dark Matter, as it used to be called) where you can't see what is happening.
09-16-2008 10:50 AM
sth wrote: ... I agree with everything except the Rube Goldberg comment!! There is a method to my madness! NEVER EVER compare real numbers without an epsilon! I was worried about single-bit variations when changing the possible order of computations. This could give variances of 10^-17 or so for double precision. [...]
From a theoretical point of view, you are absolutely right, and I should apologize. 😞 However, from a practical point of view, in this specific case the operations are exactly the same and there is no reason to get a difference even at the 10^-17 level. So I don't have to apologize. 😄
However, again, I must confess that my thoughts never went that far... you are right once more: I was lucky, and in many other situations I would have been puzzled for a while. So I do humbly apologize. 🙂 😄