Has anyone experienced missing steps in autonomous mode?

dummi · ‎02-28-2011

We have seen our robot sometimes miss steps in the autonomous sequence, and we are wondering if anyone has identified the root cause of this behavior.

FTC · ‎03-02-2011

We have the problem too! Sometimes it skips the (Wait For Time) step, so the motor stops before the time has elapsed. Need help!

BTreece · ‎03-03-2011

It is possible that you are running into what are called race conditions. These occur when you do not direct the flow of your program. In LabVIEW, code executes as soon as its input data is available. If you want to make sure that things execute in a certain order, you can use a sequence structure. A sequence structure will make sure that everything in one frame is complete before moving to the next. This should solve your issue.

Brandon T.

National Instruments

JX.Y · ‎03-03-2011

I checked the advanced programming manual and it says the NXT only supports a one-step sequence structure. Will that work to restrict the program flow too? I have tried the multi-step sequence structure, and with it LabVIEW would not download the program to the NXT (Error 1003).

I have also tried using the iteration count. But it didn't work as I expected. I set the loop to 200Hz but it didn't stop the motor after 200 counts, which I was assuming to be after 1 second. Am I wrong about that?

Thank you so much!

BTreece · ‎03-04-2011

Yes, that will work as well. All you need to do is run a wire from your program through that structure so that the program is forced to go through the sequence structure.

When you use loop timing it is setting the speed at which you would like the loop to operate. You do this by adding a loop delay. You could implement some logic that will cause your loop to stop or change to a new state after 200 iterations which would change the state of your motors.
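A minimal sketch of the "stop after 200 iterations" logic described above, written in Python rather than LabVIEW. The names `run_for_iterations` and `step` are hypothetical, and the `sleep` parameter exists only so the loop delay can be swapped out for testing.

```python
import time

def run_for_iterations(step, rate_hz=200, max_iterations=200, sleep=time.sleep):
    """Call step(count) once per pass, then stop after max_iterations passes."""
    period = 1.0 / rate_hz  # loop delay that sets the intended loop rate
    count = 0
    while count < max_iterations:
        step(count)    # e.g. keep the motor running (hypothetical callback)
        sleep(period)  # the loop delay
        count += 1
    return count       # caller would stop the motor at this point
```

Note that 200 passes only equal one second if each pass really takes 1/200 s. The loop delay sets a minimum period, so if the loop body itself takes noticeable time, the total will be longer than one second.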

Brandon T.

National Instruments

msbutler · ‎05-03-2011

Well, this is a bit wordy, but I think I've discovered the essence of this problem after discussing this issue at length with both National Instruments rep Jeff Steele and with Team 3658 (BOSON - Subatomic Particles, finished #2 in Edison division) last week at the World Championships. We think that the problem is actually within the sub-VI for moving the motors in "position" mode. On the FTC site, I noted that the problem was also seen by some teams using ROBOTC, while we were experiencing the problem in LabVIEW. However, based on what I think is actually happening, it doesn't surprise me that the problem would occur in either program.

What happens is this - when you are using the "position" sub-VI, you are telling the motor to turn a certain number of rotations or degrees (for example, "move from 0 to +8130"). However, the Tetrix encoders will sometimes throw out a spurious number (e.g., 1, 2, 3, 4, 8450, 6, 7). If that spurious number is higher than the limit set by your program for the move (e.g., +8130), then the sub-VI will immediately consider the move "finished" and move to the next step. So the program is not actually "skipping" a step, it is just receiving an erroneous input that tells it to finish the move early. That's why sometimes it appears to skip a step altogether, while at other times it appears to "stop short" in the middle of a move (i.e., not complete a move).
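The hypothesis above can be illustrated with a small Python sketch. It is not the actual sub-VI logic, only an assumed "finished when the reading reaches the target" check applied to the example readings from this post. The function name `steps_until_done` is hypothetical.

```python
def steps_until_done(readings, target):
    """Return how many readings are consumed before the move is judged finished.

    Returns None if no reading ever reaches the target.
    """
    for i, position in enumerate(readings, start=1):
        if position >= target:
            return i
    return None

# Example readings from the post, with one spurious value in the middle.
readings = [1, 2, 3, 4, 8450, 6, 7]
print(steps_until_done(readings, 8130))  # prints 5: the move "finishes" on the bad reading
```

With the spurious 8450 removed, the same check never fires on these readings, which matches the "stops short" symptom being caused by a single bad value.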

We figured that it must be some sort of HW issue somewhere in the encoder chain (encoder-encoder wire-motor controller-data cable-NXT) that causes the spurious signal, so we started replacing everything one at a time. At one point, after replacing the encoder cables themselves, it appeared to fix the problem (we ran our program 20 times straight with no errors). Unfortunately, the problem soon appeared again, and we were stumped.

However, the coach from 3658 told me that, after extensive research and doing all the things that we did, they solved the problem by switching from "position" mode for a move to "speed and time" mode - in other words, instead of telling the motor to move a certain number of degrees, they programmed it to move at a specified speed for a certain amount of time. This solves the problem because, even though the encoders may still occasionally throw out a spurious signal, the move is now bounded not by encoder position but by time, which is of course controlled by a very accurate clock in the NXT itself. So let's say you set the motor to move at 20 deg/sec for 6 seconds - the program is still using the encoders to get feedback on motor speed, but that won't create a program error in and of itself. The motor speed readings may be (20, 20, 20, 300, 20, 20, ...) - but that obviously erroneous reading in the middle is essentially smoothed out (i.e., the controller may send out a very brief signal to slow the motor down after reading 300, but it will immediately increase voltage to the proper level as soon as it reads the desired speed of "20" again). And that slight voltage surge starts and stops so quickly that it has no noticeable effect. So the move will continue to work until the 6 seconds is completed, at which point the program will go to the next move. The only way that the program could think that the move was completed "early" is if it received an erroneous clock signal from the NXT (which is very unlikely).

So we actually reprogrammed our robot in the middle of the World Championships last week to work in this "speed/time" mode, and sure enough, it seems to have solved the problem! Unfortunately, you need to be careful, as programming in "speed" mode can create new problems - when our robot locked up on a ramp during autonomous mode, the controller kept sending power to the wheels to try to get them up to the programmed speed - but since the robot wasn't able to move, the wheels were grinding on the ground, and the controller kept trying to increase the voltage to increase the speed to the set point. This resulted in us burning out BOTH motors during autonomous mode in one match, which sank us. So we need to figure out a better way to handle error correction based on unexpected encoder readings.
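One possible direction for the stall problem, sketched in Python under stated assumptions: watch the encoder position while power is applied and declare a stall if it barely changes for several samples in a row. The function name and both thresholds (`min_progress`, `max_stalled_samples`) are hypothetical and would need tuning on a real robot.

```python
def detect_stall(positions, min_progress=5, max_stalled_samples=10):
    """Return the sample index at which a stall is declared, or None.

    A stall is declared after max_stalled_samples consecutive samples in
    which the position changed by less than min_progress counts.
    """
    stalled = 0
    for i in range(1, len(positions)):
        if abs(positions[i] - positions[i - 1]) < min_progress:
            stalled += 1
            if stalled >= max_stalled_samples:
                return i  # caller would cut motor power here
        else:
            stalled = 0   # real movement resets the counter
    return None
```

Requiring several consecutive low-progress samples means a single bad reading does not trigger the cutoff.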

In addition, we don't think that the speed/time mode is as accurate as position mode, so we had noticeably more variation in terms of where our robot ended up at the end of a series of moves. But this certainly seemed to be an acceptable trade-off given the alternative of just skipping steps in the middle of a position move!

The best alternative would probably be to program in some better error handling in the position sub-VI - i.e., it should be able to "toss out" any reading that is not sequential (as the encoder rotates, it always must go immediately to the nearest 1/8 degree on either side, right?). It isn't physically possible for the encoder to skip from position 4 to position 8000 and then back to 5 - but the sub-VI apparently doesn't know that.
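A minimal Python sketch of the "toss out non-sequential readings" idea. It is an illustration only, not the sub-VI's code. The name `filter_encoder` and the `max_jump` threshold are assumptions, and the threshold would depend on the sample rate and top motor speed.

```python
def filter_encoder(readings, max_jump=50):
    """Drop readings that jump implausibly far from the last accepted value."""
    accepted = []
    last = None
    for value in readings:
        if last is None or abs(value - last) <= max_jump:
            accepted.append(value)
            last = value
        # otherwise: discard the outlier and keep the previous position
    return accepted

print(filter_encoder([1, 2, 3, 4, 8450, 6, 7]))  # prints [1, 2, 3, 4, 6, 7]
```

One limitation of this simple version: it trusts the first reading, so a bad first sample would cause every later good reading to be rejected.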

Hope this helps!

Michael Butler

Head Coach, Iron Eagles Robotics Team (FTC #3708)

JX.Y · ‎05-03-2011

Thank you very much for your feedback! That is great and I am pretty sure teams next year will find it helpful. We should open a topic and let everyone write down things that can be improved next year. I remember our team encountered a lot of problems this year and there isn't much information online that helps...

Philbot · ‎05-09-2011

Hi Michael.

I've done extensive testing on this myself, but in the LabVIEW world.

I narrowed the problem down to occasional bad data reads from the DC Motor controller.

It's a fact of life that every now and then the NXT will get a bad data read from the Motor controllers (which may be doing one of many actions).

The problem is that when you are doing a "run to position" command, the DC Motor Controller is actually closing the loop on the position, but it's reporting back to the NXT as to when it's done.

So once the NXT LabVIEW sets everything in motion (by setting the target position and speed) it sits in a loop waiting for the two "busy" bits (one for each motor) to go false.

Unfortunately (for LabVIEW users at least) when a glitch occurs on the I2C communications bus between the NXT and the Motor Controller, the most common result is a set of 0 bytes being returned instead of the real data (the low level VIs are actually coded to do this, D'OH!).

So... consequently this "0" byte makes it appear that the move is complete (busy bit goes to zero), so LabVIEW just returns a "done" to your autonomous code, which probably moves on to the next step in the sequence (prematurely).

This action can make it appear that the robot is missing steps, because dips and glitches in your power caused by motors starting up can often cause the telemetry glitch in the first place.

When I discovered this, I looked at the status byte coming back from the DC motor controller and realized that I could check for these error conditions, and I changed the code to return a "not done" unless the data was good. Essentially there is a bit that indicates that it should be "running to position", and a bit that indicates that it's still "busy" trying to get to that position. In my modified VI, I ONLY return a "Done" when the "Run to Position" bit is SET, and the Busy bit is CLEAR. Any other combination returns "not done".
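The check described above can be sketched in Python as follows. The bit masks are placeholders chosen for illustration and are NOT taken from the motor controller's documentation, and the function names are hypothetical. Only the logic (Done requires "Run to Position" SET and Busy CLEAR) comes from the description.

```python
RUN_TO_POSITION_BIT = 0x01  # hypothetical mask, not the real register layout
BUSY_BIT = 0x02             # hypothetical mask, not the real register layout

def naive_done(status):
    """Original behavior as described: done whenever the busy bit is clear."""
    return not (status & BUSY_BIT)

def move_done(status):
    """Modified behavior: done only if run-to-position is set AND busy is clear."""
    run_to_position = bool(status & RUN_TO_POSITION_BIT)
    busy = bool(status & BUSY_BIT)
    return run_to_position and not busy

glitch = 0x00                # all-zero byte returned after an I2C glitch
print(naive_done(glitch))    # True: the glitch looks like a finished move
print(move_done(glitch))     # False: the glitch is treated as "not done"
```

The key point is that an all-zero status byte can no longer pass for a completed move, because a real run-to-position status would still have the mode bit set.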

By doing this change I basically eliminated ALL the premature move terminations.

However, don't get me started about how "Run To Position" only runs at 65% full speed.

Your Time & Speed solution actually gives better speed performance.

Phil.

Get a life? This IS my life!

msbutler · ‎05-10-2011

Thanks to Phil for the awesome tips! Just FYI, my team is also using LabVIEW, and although some respondents have apparently seen the issue in RobotC, my informal surveys in St. Louis certainly seemed to indicate that the problem is less prevalent there for some reason (my guess is that they may have more robust error handling in their subroutine). And I'm not entirely sure that I agree with Phil about the actual source of the bad data reads - they may actually come from the encoder itself as opposed to the motor controller. I base this on the fact that we saw significant improvements in performance after we changed our encoder cables (but I will emphasize that this did not eliminate the problem, it just made it less prevalent).

Regardless, we all seem to be in agreement on the basic issue - a hardware error (somewhere in the encoder chain) can sporadically cause the motor controller to send a bad data point back to the NXT, and the lack of error handling by the "Run to Position" VI causes the program to return a "done" bit to the program and move on to the next step. There doesn't appear to be anything that we can do about eliminating the sporadic bad bits, so this issue needs to be handled via more robust error-handling in the sub-VI. Phil is much more of an expert on the SW side than I am, so while I understand the basic concept of his software fix, I haven't had a chance to actually investigate the details of what he is saying. However, I also think that this is something that should be corrected at the sub-VI level, and since I actually work for National Instruments, I'm going to try to elevate this issue directly to our team that works on our FTC programs. Phil, I'm also going to ask them about why "Run to Position" only runs at 65% speed (that's something that I didn't know!).

And one final comment - Phil mentions at the end of his post that the "Time & Speed" solution actually gives better speed performance. That may be true, but based on our past few tournaments, it is not nearly as accurate as the "Run to Position" mode. Even for a very simple sequence of moves over a short distance (e.g., moving forward, making a right angle turn, and then moving forward to align with the rolling goal), we have been seeing MUCH more variation in our final position than we ever saw when using the "Run to Position" mode. If it weren't for the physical design of our robot (which allows the rolling goal to sort of get "sucked in" to scoring position as we push against it), then I would guess that we would only have been able to score successfully in autonomous mode about 30-50% of the time. So I think it is imperative that these issues with "Run to Position" get addressed before next season.

I'm also going to post this on the FTC blog.

Thanks!

Michael Butler

Head Coach, Iron Eagles Robotics Team (FTC #3708)

Philbot · ‎05-10-2011

Hi

Thanks for enduring my long post.

The reason I believe that the problem is primarily between the NXT and the Motor controller is that I discovered this behavior went away when I accidentally ignored the "Done" state that was coming out of the Check MotorVI. So, if you issue a "Run to position" command and set the "return immediately" flag, then just wait a fixed time before going on to the next move, the moves ALL work perfectly. I spent a lot of time with a demo bot up on blocks, just running tests and recording the results. I could issue a "Run to position" command every 2 seconds (reversing direction each time) and assuming the actual move took about one second, it would run the correct distance with no errors for 20 minutes. If I then actually used the "Done" flag to determine if/when the move is complete, one time out of 15 (approx) the move would terminate prematurely due to an "apparent" done bit. This got worse as the 12V battery voltage dropped below 13.5V.
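The test pattern described above (issue the move, ignore the "Done" flag, wait a fixed time) might look like this in Python. `start_move` stands in for a hypothetical "Run to position" call with the "return immediately" flag set. The `sleep` parameter is injectable only so the sketch can be exercised without real delays.

```python
import time

def run_moves_fixed_wait(targets, start_move, wait_s=2.0, sleep=time.sleep):
    """Issue each move, then wait a fixed time instead of polling a done flag."""
    for target in targets:
        start_move(target)  # hypothetical: starts the move and returns at once
        sleep(wait_s)       # fixed wait; must exceed the longest expected move
```

The trade-off is that every move costs the full wait time even when it finishes early, and a move that takes longer than `wait_s` would be interrupted by the next command.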

This told me that the Motor controller was reading the encoder counts correctly, and only stopping when it reached the correct position. Now I guess it's possible that it may have internally been seeing occasional bad encoder reads, but remember that the encoders are quadrature, so the controller is "counting" overlapping pulses. It would be very difficult for the count to jump up, and then down based on noise alone. If you "see" this from the NXT, you could once again be getting a bad data transfer.

I actually posted this observation/patch a while back, but my attachment disappeared somehow.

http://decibel.ni.com/content/message/19996#19996

As for the Run To Position speed issue, I sent a white paper to HiTechnic and got a basic "That could be true" response. I recently posted my findings on Chief Delphi and have received mixed reviews. Not everyone agrees with my determination that this is a bug. Whatever... I'd just like to know one way or another, and whether it can be fixed.

http://www.chiefdelphi.com/media/papers/download/3089

I'm happy to gather up my code and make it available for review.... bearing in mind I've made a LOT of changes to "overcome" both of the issues listed in the two white papers.

Phil.

Get a life? This IS my life!
