This page is part of the NIWeek 2010 Robotics Demo series of blog posts. For an introduction to the demo and links to other parts of the series, visit http://decibel.ni.com/content/docs/DOC-13031.
One of the key components of robot navigation and mapping is the robot knowing where it is. To think about this component another way, imagine having a map that shows you the location of buried treasure, but you have no idea where you are on the map. Just as knowing your location is the key to finding the treasure, the ability of the robot to navigate successfully depends on it having knowledge of its location.
Normally we would employ GPS to determine the position of robots within their environment. However, with our demo taking place indoors and on a relatively small scale, we needed the sort of accurate and reliable position information that GPS could not provide.
With GPS out of the picture, we took the approach of using an overhead camera that could see the entire robot arena and identify each robot, its position, and its orientation. In performing all of these tasks, you could say that our implementation simulated GPS, which typically provides both positional and orientation information.
You can find all of the code for localization at (path as viewed in NIWeek 2010 Robotics Demo.lvproj):
My Computer > Localization > NI_Robotics_Localization GPS.lvlib > Localize Main.vi
This VI will actually run if you select Video in the camera source/type control, run the VI, and then select the attached video as the video file. Click the OK button to start the Debug Loop VI and observe some of what goes on under the hood when our localization code runs.
In order to provide constant information about robot positions, the camera(s) performing localization needs to be able to see all of the robots all of the time. This can be done with one camera or multiple cameras. Multiple cameras offer the advantage of providing some redundancy and additional accuracy, with the drawback of complexity added by the need to synchronize image information and simply needing to physically connect all of the cameras.
Using one camera can be relatively simple, but as with multiple cameras, there are some difficulties to overcome. When using one camera, ultimately you want to maximize your area of interest (in this case, the arena) in the camera’s image sensor. So, to allow our camera to see the entire field meant it must be far enough away that the arena is enitirely within its field of view (FOV). Most cameras—especially web cams—have an FOV of about 45 degrees. This would mean that the camera would need to be roughly 25 feet above the 20 x 20 ft arena if it is facing directly down at the field. Twenty-five feet might not seem like a lot, but given our test environment—empty office space—it was significant.
Another option is to place the camera off to the side. While the camera still needs to be at least 25 feet away, at least much of that distance could be in the horizontal direction. The problem with setting a camera off to the side, however, is twofold:
For the latter reason, placing the camera off the side and viewing the arena at a sharp angle results in much of the data returned by the image sensor being useless since the field is not present. The figure that follows illustrates this idea by showing how the robot arena looks with the camera at different angles.
The only problem with the “optimal” solution is that it requires some perspective correction since, as mentioned before, the camera is not perpendicular to the field. So with all that said about the optimal solution, we decided to mount the camera directly over the arena for simplicity's sake. If you recall, just a couple paragraphs ago we said that cameras mounted directly overhead must be 25 ft above the field. So how did we manage that in our office-space test environment? Read on for our solution.
Generally speaking, a webcam would have been the cheapest and easiest camera to work with. However, as mentioned before, webcams typically have a very small FOV, which requires a large distance between the camera and the field. Our office building, with ceilings no more than 10 ft high, required a more elegant solution. For a 20 x 20 ft field, our space limitation meant that we needed a camera with a FOV of at least 90 degrees—more than twice that of a typical webcam.
Fisheye lenses can offer the kind of FOV we needed (sometimes more than 180 degrees!), but by nature they introduce a lot distortion. The distortion makes image processing challenging and often computationally intensive. Additionally, the fisheye lens stretches the acquired image, generating an inconsistent mapping of pixel distance to actual distance, just as is the case with perspective distortion.
Fortunately, Theia Technologies makes a lens that solves this issue. Using something they call Linear Optical Technology, their ultra wide lens gives the camera a huge FOV (135+ degrees) while introducing almost no distortion.
With the wide-angle lens decided upon, we just needed a camera that could get the job done. We first looked at Basler’s BIP-1600c—an IP camera with a whopping 1600 x 1200 resolution. This was great, until we ran into the issue of using an IP camera like that for an application like ours. IP cameras compress the image to reduce its size when sent over a network, because networks typically can handle only up to 100 Mbps. Our resulting images had some artifacts introduced by the compression algorithms (usually MJPEG); artifacts appear as a simple result of any kind of image compression. The resulting images were also slightly latent due to the fact that the image had to be compressed and then uncompressed. These two factors would be pretty large obstacles to overcome in an application that involves localization. For example, in the case of the artifacts, an 11 x 11 inch identifying marker on our robot shows up as a 55 x 55 pixel image within the 1200 x 1200 video feed of the 20 x 20 ft arena; relative to 55 x 55 pixels, the artifacts produced by compression became quite large. The effect of image compression just made it too difficult to perform accurate and reliable localization given the amount of time we had to work on this project. The latency was an issue too, but we could potentially have handled that.
Our solution for the issues caused by the IP camera's image compression? The Basler Scout, scA 1600-14gc—a GiGE camera with 1628 x 1236 resolution that uses Gigabit Ethernet. The latter feature allowed us to acquire the raw image (without compression!) almost instantaneously. Before you get as excited as we were, when using a GigE camera you need to make sure that your network card allows for jumbo framing and, of course, that it is Gigabit Ethernet capable. For help with acquiring images from GigE vision cameras with NI’s vision acquisition software, see the Developer Zone: Acquiring from GigE Vision Cameras with Vision Acquisition Software. You will likely also need the high-performance network driver described in KnowledgeBase 510FS1WA: Using the High-Performance Driver with Intel(R) PRO/1000 Network Cards.
To suspend the camera over the field, we built a frame out of rectangular tube steel. The structure was essentially a horizontal 10 ft long piece of steel that slid into a triangular corner bracket that also attached to an 8 ft vertical steel tube. The 8 ft vertical piece then slid in to a base that measured 18 x 18 x 24 in. The base can be filled with bags of sand to act as a counterweight to the long camera boom. At the end of the camera boom was a small plate with a hole pattern that matched the camera so the two could be mated. The small plate attached to the camera boom through an intermediate steel piece that allowed the camera’s angle to the field to be adjusted so that it was perpendicular.
The figures that follow illustrate the parts of our camera frame and its placement over the robot arena.
Image acquisition refers to the act of software capturing the image data currently in view and converting it to a format that can be worked with. To allow flexibility in the development of our localization code, the localization process can acquire images from three different sources:
Image acquisition occurs in the VI at (path as viewed in NIWeek 2010 Robotics Demo.lvproj):
My Computer > Localization > NI_Robotics_Localization GPS.lvlib > Cameras > Localization Camera.lvclass > Acquire Camera Timed Loop.vi
With the camera mounted and acquiring images, one issue remained—how would our localization code differentiate between the four identically designed robots? To that end, we applied an 11 x 11 in. marker to the top of each robot, similar to the example that follows, that uniquely identifies it.
As shown above, each corner of the robot's marker contains a yellow dot. Also, in the second column of the first row on each robot's marker is a pinkish dot. We always place the marker with the pink dot at the front of the robot to indicate the robot's orientation. We chose this pinkish/fuschia color because in the color wheel, it is very different from the green-blue color of the marker background and therefore easy to detect. Finally, the remaining dot on each marker, which is blue in color, serves as the unique robot identifier. This dot can appear in any column of the second row, or in the second column of the third row. Each of these four positions corresponds to a particular robot, so once our code detects the position, we can identify each unique robot.
You can find the code that runs each time a new image is acquired and performs the “heart” of localization at (path as viewed in NIWeek 2010 Robotics Demo.lvproj):
My Computer > Localization > NI_Robotics_Localization GPS.lvlib > Global Camera Localization.lvclass > Localize.vi
The VI is documented and can step you through the process of how exactly we implemented localization. To briefly summarize, the vision processing algorithm works as follows:
1. Apply a color lookup table to the acquired image (refer to the next section).
2. Identify potential robot markers.
3. Examine each potential marker to determine if it has the right number of dots and the correct color of dots.
4. Based on the location of the yellow dots in the four corners, determine the robot’s orientation and position in the image.
5. Identify the specific robot based on the location of the blue, identifier dot.
6. Convert the robot’s location in the image to real-world coordinates.
View the following video at minutes 00:27 to 00:54 to see the localization at work.
A color lookup table (CLUT) played a very important role in the vision processing part of our localization code. The algorithm we used is based on the algorithm presented in reference (1), below. To understand the concept of CLUTs, imagine that three planes of color in an image, such as RGB, map to a three-dimensional space. R is the x-axis, G is the y-axis, and B is the z-axis. A color lookup table allows you to define a blob or multiple blobs in the space with a type of color. The result is a three-dimensional array where the indices of elements correspond to the RGB values of a pixel, and each element in the array represents a color class. When the CLUT is applied to an image, each pixel in the image is matched to its color class in the array. The resulting image contains only the color classes you defined. Using a CLUT can work better than any other kind of thresholding technique—even better than HSL thresholding—because blobs in the color space can be of any shape.
In order to create the CLUT, see the VI at (path as viewed in NIWeek 2010 Robotics Demo.lvproj):
My Computer > Localization > CLUT.lvlib > CLUT Trainer.vi
1. T. Rfer et. al, “Germanteam 2004, technical report of the robocup-team in the sony legged robot league,” Technical Report, HU-Berlin, U-Bremen, TU-Darmstadt, U-Dortmund, 2004.
As mentioned before, when the localization code running, you can click the OK button on the front panel of the Localize Main VI at My Computer > Localization > NI_Robotics_Localization GPS.lvlib
to start the Debug Loop VI. This VI shows you what's happening under the hood with regard to localization and allows you to change some parameters at run time. Additionally, you can use the Debug Loop VI to record video that you can use later to train the CLUT or test the localization routine again.