LabVIEW

Number to fractional string intermittently wrong

Solved!
I'm having a hard time reproducing this, but here's what I tried:
 
While testing a program (in LV 7.0) on my fairly new laptop, I noticed that the display occasionally showed different results even though the inputs didn't change.
Some debugging later, I found out that the Number to Fractional String primitive would sometimes return different strings for the exact same value. Specifically, it would get 103 and would return either 103.000000... or 102.9999... even though it always received the same value with the same bit pattern.
 
I came up with a simple example that reproduced this behavior about 10-20% of the time. I ran it on another computer using 7.1 and the current LV beta and couldn't reproduce it, so I concluded that it had probably been fixed. I then ran it on yet another computer with LV 7.0 and couldn't reproduce it there either, which made me think that this might have to do with the computer and not with LV. So I went back to my laptop and tried to reproduce it, but for some reason it's not happening now.
 
Here's the simple example: it takes the number 103 and converts it to a fractional string in two separate ways. Both ways produced inconsistent results, at unpredictable times. Most of the time they produced 103.0000..., but every so often they produced 102.9999..., which then appeared in the first rows of the sorted array. The attached screenshot shows a typical wrong run. As I said, for some reason I can't reproduce it now, but that is what it looked like.
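
For readers without LabVIEW at hand, the structure of the example can be roughly sketched in Python. This is only a textual analogue, not the attached VI: `to_fractional_string` is a hypothetical stand-in for the Number To Fractional String primitive, and the 200 repetitions, 20 digits, and 5 ms delay are assumed values chosen for illustration. In a correct environment every string should come out as 103.000..., so the count of differing results should be zero.

```python
import time


def to_fractional_string(value, digits=20):
    """Hypothetical stand-in for the Number To Fractional String primitive."""
    return f"{value:.{digits}f}"


def run_once(n=200, delay_s=0.005):
    """Convert 103 to a string n times, waiting between conversions, then sort."""
    results = []
    for _ in range(n):
        results.append(to_fractional_string(103.0))
        time.sleep(delay_s)  # the delay in the loop appeared to matter in the report
    # Sorted as text, any "102.9999..." string lands ahead of "103.0000...".
    return sorted(results)


if __name__ == "__main__":
    strings = run_once()
    wrong = [s for s in strings if not s.startswith("103.")]
    print(f"{len(wrong)} of {len(strings)} conversions differ from 103.000...")
    print(strings[:3])
```

Because the strings are sorted as text, a single bad conversion is easy to spot in the first rows of the output.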
 
The delay in the loop seems to be the key to this. The original code took some time to run and caused this. When I set up the example with no delay it worked fine, but when I added a delay or used a probe, it did happen. When I changed the original code to be much simpler and take less time (yes, I didn't write it) the problem seemed to go away.
 
As I said, I'm not entirely sure this is LV 7.0's fault. What's different about this laptop is that it is a dual core and the LV process uses both CPUs (it's a Dell with Win XP). I suspect this might be the reason, and that it could be happening at a lower level than LV can control. Does anyone have any idea what could be causing this?

___________________
Try to take over the world!
Message 1 of 49
I could not reproduce the problem on Mac OS X (PPC Dual G5) under LV7.0 or 8.2. I tried adding probes and changing the delay. Even ran it once with execution highlighting on (glad you did not use 10000 repetitions!).

Lynn
Message 2 of 49

This is baffling.

You said "on my fairly new laptop".

This reminds me of the issue with the original Pentium chips, which would occasionally give bad results with floating point math under high loads.

If the laptop is still under warranty, you may want to send it back.

Ben

Retired Senior Automation Systems Architect with Data Science Automation, LabVIEW Champion, Knight of NI, and Prepper
Message 3 of 49

@johnsold wrote:

Even ran it once with execution highlighting on (glad you did not use 10000 repetitions!).

I forgot to say that the delay had to be between ~5-20 ms to get this to happen properly. Even if this is reproducible, the needed number would probably vary, but execution highlighting would be too slow.

In any case, I doubt a Mac would be subject to this, because it seems to be more likely that this issue stems from the C run-time, from the OS, from the CPU or from some combination of those than from LV itself.

Thanks anyway.

Ben, I would probably need pretty compelling proof that the computer itself is to blame to get my money back, and since I can't even reproduce it on the same computer at the moment, that seems like a difficult path.

I will have to keep trying to reproduce this.


___________________
Try to take over the world!
Message 4 of 49
Could this not be the result of round-off errors? When trying to convert a real number using full precision you may sometimes see dithering in the LSB. If you reduce the number of digits of precision a fair amount you eliminate the dithering and you should ALWAYS get the same result. Then again, maybe I'm off in left field on this.
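
The effect of reducing the displayed precision can be illustrated with a short Python sketch. It assumes, purely for illustration, a value that sits one representable step below 103; `math.nextafter` requires Python 3.9 or later. Note that this only shows how a slightly-off value is hidden by fewer digits; it does not explain how an identical bit pattern could produce two different strings.

```python
import math

exact = 103.0
# Assumed example value: the largest double strictly below 103.
just_below = math.nextafter(103.0, 0.0)

for digits in (20, 6):
    # With 20 digits the two values print differently;
    # with 6 digits both round to the same text.
    print(digits, f"{exact:.{digits}f}", f"{just_below:.{digits}f}")
```

With 6 digits both values display as 103.000000, while at 20 digits the second one shows up as 102.9999....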
Message 5 of 49
Operations like this must be absolutely deterministic. It is clearly the underlying data, not just the display formatting, else the sorting wouldn't have worked.
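
One way to check whether the underlying data really differs is to compare the raw bit patterns of the values before they are formatted. The following is a minimal Python sketch of that idea, not something taken from the VI; `bits` is a hypothetical helper, and the second value is an assumed example of a double just below 103 (`math.nextafter` requires Python 3.9 or later).

```python
import math
import struct


def bits(value):
    """Return the IEEE 754 double-precision bit pattern as 16 hex digits."""
    return struct.pack(">d", value).hex()


a = 103.0
b = math.nextafter(103.0, 0.0)  # assumed example: one step below 103

print(bits(a))  # 4059c00000000000
print(bits(b))  # 4059bfffffffffff
```

If two inputs show the same hex pattern but still format to different strings, the difference must arise inside the conversion itself and not in the data.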
 
I suspect a hardware failure, for example a weak memory location (most likely you are not running ECC memory). Did you notice any other irregularities (crashes, etc.) on this computer?
 
I would suggest running the Microsoft memory diagnostic, downloadable from the following link:
 
 
 
 
Message 6 of 49

OK, two updates:

First, I ran the tool suggested by Alten in both modes it has and it didn't find any memory faults.

Second, I managed to reproduce this and had a chance to play around with it a bit.

I think the problem is a race condition somewhere, because this happened while I was running the same program, which can get very memory intensive (I have 1 GB of RAM, and when I ran it now I was down to about 50 MB free). This still persists after closing LV, which is why I'm thinking that the problem is potentially with the page file (which is at 1.13 GB at the moment).
I also don't think it's faulty memory because then I would expect the results to be random and inconsistent.

Here's what I tried currently:

  • Change the affinity of the LV process to work with a single CPU: No effect.
  • Change the priority of the LV process: No effect.
  • Change debugging options, etc. on the VI: No effect.
  • Increase the priority of the VI or change the execution system of the VI to be something other than Standard: Bingo, that works. The number is always translated correctly. Change the priority or the ES back and it starts acting up again.

What's shown in the screenshot is a slight modification of the example - a while loop with no wait was added around the entire code, the internal wait was changed to 1 ms, the chart counts the number of 102.9999... in each iteration and the priority of the VI was changed to Background. As you can see, on average, about 20-30 elements in each iteration (out of 200) are wrong.


___________________
Try to take over the world!
Message 7 of 49
One correction - the affinity does affect it. Setting the LV process to use both CPUs brings down the number of wrong answers to about a third of what it was with a single CPU.

___________________
Try to take over the world!
Message 8 of 49
Ben, is there an efficient way of escalating this to R&D?

I assume that someone who can look inside the Number to Fractional String primitive will have much better tools to determine what could be the source of the error without needing to reproduce it.

___________________
Try to take over the world!
Message 9 of 49
I just ran your code with the following modifications:
1. NO Delay.
2. N = 1000000
 
(I have a monster of a PC)...
 
Even under this type of ridiculous load, I was unable to replicate this error...
 
I am curious what else was going on process-wise on the laptop when you saw this occur. Have you applied any updates or removed any systray software since then?
Message 10 of 49