Machine Vision


Random “Get Image2.vi” timeout error -1074360293 during a GigE continuous acquisition

I have never posted on this forum before, but I often leverage the information available here to minimize my learning curve. I have not been able to find a solution to the issue below, so I decided it's time for my first post. Some of the information below has been regurgitated from the posts of others on this forum.

 

At random intervals ranging from once per hour to once per 20 hours, I get a “Get Image2.vi” timeout error -1074360293. This error occurs on multiple computers, in LabVIEW, in executables, in MAX, and in the camera vendor's software. I am running 4 different computers, each with 2 cameras. The surprising thing is that 1 of the computers never has the error while the other 3 do. This is true even when I swap around the GigE cards, the cables, and the cameras. The PC without the error is ~2 years old and the 3 that are failing are newer Alienware 51 computers. I have been unable to make any sense of this information despite a significant effort.

 

I am using an Adlink GIE62 card with the Intel PRO chipset. The cameras are FLIR (formerly Point Grey) BFLY-PGE-50S5M-C (5MP Sony CMOS).

 

Here is a list of everything I have tried so far without success:

 

  1. The software input value of “Get Image2” was changed from “Last New” to “Next”. This change had no noticeable impact, positive or negative, on the primary issue but is presumed to be a minor, generic programming improvement. We will default to “Next” moving forward.
  2. GigE driver version was evaluated and ruled out as the primary issue. Even so, a significant amount of data shows a real benefit from using the NI “High-Performance Driver” instead of the default Intel PRO/1000 driver. In addition to reducing processor overhead, this driver prevents Windows Firewall and antivirus software from blocking video packets. This type of packet blocking can be intermittent and difficult to troubleshoot, and should be avoided by always using the NI “High-Performance Driver”.
  3. GigE cable quality was previously studied and has proven to offer value regarding EMI tolerance. For the current issue, only the high-quality cables have been tested.
  4. Camera thermal variance was studied and ruled out as a contributing factor. In a short (~1 hour) test, one camera was limited to 28°C with heatsinking and a second was elevated to 75°C using a heater. There was no noticeable performance difference between the two cameras. As a reference, a camera mounted in a CMS will typically level off at ~45°C and a “floating” camera with no airflow (in a bag) will typically be ~65°C.
  5. Camera settings
    • Firewall Traversal attribute is Enabled. (This is not believed to be relevant when using the NI “High-Performance Driver” but is good practice.)
    • Camera Timeout values were not studied during this analysis, but it is believed that a value of ≥1000 will minimize the likelihood of false errors. This is important when the PC momentarily becomes distracted by other processes (like network communication), especially if the PC is in any form of hibernation. This includes when the processor has throttled down to save power (the Windows default).
    • Windows power management has been set to “performance mode” to prevent the CPU from “throttling down” or attempting to sleep in any way.
    • Jumbo Packet size is set to a value of 9000 to provide the best performance and minimize PC overhead.
    • Test Packet Enabled is set to disabled. This was studied and circumstantially ruled out as the primary cause of our camera crashing issue. Despite this, there are historical forum posts showing that having test packets enabled can cause the exact symptoms we are experiencing. This change is known to potentially cause a new error, 0xBFF60493, which is fixed by turning off the option "Display images on remote monitor" in MAX (Tools->NI Vision->Remote Image Options...).
    • Packet Resends Enabled is set to disabled. The supporting logic for this change was not directly found on the NI forum but instead was formulated from multiple separate pieces of information including:
      • Since image data packets are streamed using the UDP protocol, there is no protocol level handshaking to guarantee packet delivery. Therefore, the GigE Vision standard implements a packet recovery process to ensure that images have no missing data.
      • The GigE Vision header, which is part of the UDP packet, contains the image number, packet number, and timestamp. As packets arrive over the network, the driver transfers the image data within the packet to user memory. When the driver detects that a packet has arrived out of sequence (based on the packet number), it places the packet in kernel mode memory. All subsequent packets are placed in kernel memory until the missing packet arrives. If the missing packet does not arrive within a user-defined time, the driver transmits a resend request for that packet. The driver transfers the packets from kernel memory to user memory when all missing packets have arrived.
      • CPU Usage: Unlike other machine vision bus technologies, which can DMA images directly into memory, Gigabit Ethernet requires CPU usage for packet handling. This means that you are left with fewer CPU cycles to perform image analysis. While CPU usage can be mitigated to a certain extent by using the high-performance driver, it cannot be eliminated.
      • Latency: Because network packets can get lost (requiring resends) and network packet handling is CPU dependent, there is a non-trivial latency between the time a camera captures an image and the time it appears in user memory.

 

I have attached a simplified version of the code that reproduces the error and logs the error information to a file. As I said earlier, I don’t expect the code to be helpful since the issue also happens in MAX and in the camera vendor's SDK software.

 

Despite all of the above, the issue persists and I have no idea what to try next.

If you have any insight or know of any ideas I can test, please reply.

0 Kudos
Message 1 of 10
(3,893 Views)

As ugly as it sounds: use Wireshark to capture the GigE Vision communication on both the "good" and the "bad" PCs. Compare.

 

If there is no difference: run Wireshark continuously, and use the LabVIEW error (or, if it is not a catchable error, a change in the execution state of the VI) as the trigger to dump the packets.
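
A minimal sketch of the "capture continuously, dump on error" idea, assuming packets are available to a script as (timestamp, payload) pairs. `PacketRing` and its methods are hypothetical names for illustration; no real capture library is used here. Only the most recent packets are kept, so a run of many hours does not fill the disk, and the dump is triggered by whatever signals the timeout.

```python
from collections import deque


class PacketRing:
    """Keep only the most recent `capacity` packets in memory."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry automatically when full.
        self._ring = deque(maxlen=capacity)

    def record(self, timestamp, payload):
        self._ring.append((timestamp, payload))

    def dump(self):
        """Return the buffered packets (oldest first) and clear the ring."""
        snapshot = list(self._ring)
        self._ring.clear()
        return snapshot


ring = PacketRing(capacity=3)
for i in range(5):
    ring.record(timestamp=i, payload=f"packet-{i}")

# Simulated acquisition timeout: keep the traffic that led up to it.
dumped = ring.dump()
print(dumped)  # [(2, 'packet-2'), (3, 'packet-3'), (4, 'packet-4')]
```

Wireshark's capture options also include a ring-buffer mode that writes to a rotating set of files, which may be a simpler way to get the same effect.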

 

0 Kudos
Message 2 of 10
(3,851 Views)

Hi rwiley,

 

Thanks for your detailed message.

 

Just a hunch, but I would recommend trying to tweak your BIOS settings. Try the following changes (if applicable on your system):

Turbo Boost: Disabled

C-States: Disabled

Speed Step: Disabled

 

Being in Windows "performance mode" is good, but the CPU could still be doing some aggressive power saving stuff that could affect your acquisition. This could explain why an older computer doesn't show this behavior but the newer ones (with more fancy features) do. I wouldn't say that everyone should have to change BIOS settings like this, but if you're in a situation where you're having problems like you describe, it would probably be worth trying to see if this helps.

 

Also, some general questions: 

Are you seeing lost packets? What does your overall CPU utilization look like for the system? What about interrupts per second? Which version of VAS are you using?

 

Just a note on packet resends: There really shouldn't be a need to disable this. We don't do extra buffering in the kernel to keep track of packets when this is enabled and things come out of order -- the work to keep track of packets is constant. Each time a packet is received, it's copied to the right place in the image buffer. Once all the packets are received OR we've decided we've gotten all the packets we're going to get for this image (some complicated logic there based on timing, how many new packets after the missing one(s) have arrived, etc.), we let the user know the image buffer is complete. You're just eliminating the actual resend part. Resends could potentially cause trouble (interrupt storm) if you're in an extreme situation where you're dropping lots of packets and having to do a lot of resends, but if you're doing a lot of resends the acquisition probably isn't that stable anyway. In your case where you're acquiring for 20 hours straight I doubt you're in such a volatile situation where you're doing enough resends to have any significant impact on performance. If you have them enabled, do you observe that you're doing resends? You can query those attributes in LV or just look at the ethernet attributes tab in MAX to see those metrics.
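
To illustrate the "copy each packet to the right place in the image buffer" behavior described above, here is a toy sketch. It is not the driver's implementation: it assumes fixed-size payloads and zero-based packet numbers, and `reassemble` is a hypothetical name. The point is that out-of-order arrival needs no extra staging buffer, only a record of which packet numbers have been seen.

```python
def reassemble(packets, packet_count, payload_size):
    """Copy each packet's payload to its slot in the image buffer, in arrival order.

    packets: iterable of (packet_number, payload_bytes), possibly out of order.
    Returns (image_bytes, sorted list of packet numbers never received).
    """
    image = bytearray(packet_count * payload_size)
    received = set()
    for number, payload in packets:
        offset = number * payload_size  # position depends only on the packet number
        image[offset:offset + len(payload)] = payload
        received.add(number)
    missing = sorted(set(range(packet_count)) - received)
    return bytes(image), missing


# Packet 2 arrives before packet 1, and packet 3 never arrives.
arrivals = [(0, b"AA"), (2, b"CC"), (1, b"BB")]
image, missing = reassemble(arrivals, packet_count=4, payload_size=2)
print(image, missing)  # b'AABBCC\x00\x00' [3]
```

With resends enabled, the `missing` list is what a resend request would be built from; with resends disabled, those bytes simply stay unfilled.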

 

Let me know if you're able to try the BIOS stuff -- hopefully we can help you find a resolution!

-Katie

Software Engineer, Vision Acquisition Software R&D

Message 3 of 10
(3,836 Views)

b.ploetzeneder

I like your suggestion because it is something I have not tried, but I would like to understand the underlying logic. If the GigE card/driver, the cables, and the camera have been ruled out, what variance would I expect to see by monitoring the link between 2 known good components? My testing implies that the issue is related to the PC, but I don’t know what to test or change to uncover the primary source. I have just downloaded Wireshark. It doesn’t seem very intuitive or user friendly for a newbie, but I will proceed out of desperation. Regarding changing the execution state to dump packets, please explain what you mean, as I am not familiar with this terminology.

Thx for your response

RGW  

0 Kudos
Message 4 of 10
(3,832 Views)

Katie

Wow, I’m not sure if changing the BIOS settings will work but your logic is clear and solid. I will try this right away.

To answer your questions, in the last 18 hours camera #1 lost 63,294 packets and camera #2 lost 62,234 packets (with resends disabled). The lost packets tend to come in large chunks but do not always cause a timeout error. Each camera had one timeout (several hours apart), but both were issue free again by the very next frame. CPU utilization is typically <10%, but spikes on occasion to >50% for no apparent reason. This may correlate with the timeout error, but I don’t know how to test for this. I have already disconnected the PCs from the network to prevent a Windows update or anything else external from affecting the results.
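
One way to test for that correlation, sketched under the assumption that CPU utilization is logged with timestamps alongside the existing timeout log. The function name, the 5-second window, and all numbers below are made up for illustration and are not measurements from this system.

```python
def cpu_near_events(cpu_samples, event_times, window_s=5.0):
    """For each event time, return the peak CPU % seen within +/- window_s seconds.

    cpu_samples: list of (time_s, cpu_percent); event_times: list of time_s.
    Returns a list of (event_time, peak_percent), with None if no sample is close.
    """
    results = []
    for t in event_times:
        nearby = [cpu for ts, cpu in cpu_samples if abs(ts - t) <= window_s]
        results.append((t, max(nearby) if nearby else None))
    return results


# Synthetic data: one CPU spike at t=10 s, timeouts logged at t=11 s and t=40 s.
samples = [(0, 8.0), (5, 9.0), (10, 55.0), (15, 7.0), (20, 6.0)]
timeouts = [11, 40]
peaks = cpu_near_events(samples, timeouts, window_s=5.0)
print(peaks)  # [(11, 55.0), (40, None)]
```

If the peak near most timeouts is well above the usual <10%, the spikes are worth chasing; if not, they are probably unrelated.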

I have not looked at interrupts per second. How do you suggest I do that, and why?

We are running version 17.5 of VAS

New information: Yesterday I started tracking the buffer number when the timeout occurs. On 2 different cameras several hours apart, they both reported a timeout at buffer number 4294967294. I have no idea what this means or if it is relevant.

Your feedback on packet resends is excellent and much appreciated. I was taking a shot in the dark with that choice and thanks to you, I now know better. Even more valuable, I also know why. So thx for going the extra mile on your response.

 

RGW

0 Kudos
Message 5 of 10
(3,828 Views)

I am struggling with the same issue with an AVT Manta GigE cam in LV and/or MAX. The AVT GigE viewer will display an image at full frame rate for this camera (around 67 fps) forever with no issues... I'm getting the impression there is something not robust in the IMAQdx function(s).

For what it is worth, 2^32 = 4294967296, 2 more than your noted buffer number - perhaps a data type overflow/rollover issue?

0 Kudos
Message 6 of 10
(3,695 Views)

RGW,

 

I apologize for the super late follow-up but I wanted to check in and see if you have any updates on your situation. Did the BIOS changes help?

 

Regarding interrupts/second, that is a field that you can select in Performance Monitor (perfmon.exe) under the category of Processor Information. It's somewhat relevant because even if you have relatively low CPU utilization, a high number of interrupts per second can translate to enough latency in your application to cause you to lose packets.
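
As a rough sketch of how such a counter log could be reviewed afterwards, the snippet below parses a two-column CSV (timestamp, value) and flags samples above a threshold. It assumes the counter was exported to CSV, for example from Performance Monitor; the column layout, the threshold, and the sample values are assumptions for illustration only.

```python
import csv
import io


def read_counter_csv(text):
    """Parse CSV text into (timestamp, value) pairs.

    The first row is treated as a header; rows without a numeric value are skipped.
    """
    rows = list(csv.reader(io.StringIO(text)))
    samples = []
    for row in rows[1:]:
        if len(row) < 2:
            continue
        try:
            samples.append((row[0], float(row[1])))
        except ValueError:
            continue  # blank or non-numeric sample
    return samples


def spikes(samples, threshold):
    """Return the samples whose value exceeds the threshold."""
    return [(ts, value) for ts, value in samples if value > threshold]


# Synthetic log, not real measurements.
log_text = '"time","interrupts_per_sec"\n"10:00:00","4200.5"\n"10:00:01"," "\n"10:00:02","61000.0"\n'
parsed = read_counter_csv(log_text)
print(spikes(parsed, threshold=50000))  # [('10:00:02', 61000.0)]
```

Lining up the flagged timestamps with the timeout log shows whether interrupt bursts coincide with the lost packets.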

 

Regarding the buffer number 4294967294, this is a super common misconception that I don't think we have done a great job at documenting or explaining. For this reason I'll try to explain it in detail here for you and for anyone else who might have a similar question. 

 

This value doesn't mean it failed trying to acquire that particular number. In fact, IMAQdx's buffer number uses only 24 bits, so the maximum buffer number you can have is 2^24 - 1, or 16777215. After that maximum, it will wrap back around to 0. The rest of the bits are used to indicate special values, like Next, Newest, Oldest, Every, Last New, None, etc. When you fail to get an image (like in your timeout case), whatever buffer number you get out is not valid. I would expect that you should get the value that indicates None, which is 0xFFFFFFFC or 4294967292. What you see is 0xFFFFFFFE or 4294967294, aka "Newest Value." I'm guessing that's just what your policy was for getting the image, so we're seeing that buffer number out. I'll look into why it's that instead of "No Buffer Value" which is what I expected. Long story short, in an error case, you aren't actually getting a buffer back so the buffer number returned is not valid. It is expected behavior that every time you have an error you'd have the same buffer number out there, indicating you didn't actually get the buffer number that you wanted.
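
The arithmetic above can be sketched as follows. Only the two special values quoted in this post are included; the other special values exist but their numeric codes are not given here, so they are left out rather than guessed. The function names are hypothetical and not part of IMAQdx.

```python
BUFFER_NUMBER_BITS = 24
MAX_BUFFER_NUMBER = (1 << BUFFER_NUMBER_BITS) - 1  # 2^24 - 1 = 16777215

# Special values as quoted in this thread only.
SPECIAL_VALUES = {
    0xFFFFFFFC: "None (no buffer)",
    0xFFFFFFFE: "Newest",
}


def describe_buffer_number(value):
    """Say whether a returned buffer number is a real buffer or a special value."""
    if 0 <= value <= MAX_BUFFER_NUMBER:
        return f"buffer {value}"
    return SPECIAL_VALUES.get(value, f"unrecognized special value 0x{value:08X}")


def next_buffer_number(value):
    """Advance a real buffer number, wrapping to 0 after the 24-bit maximum."""
    return (value + 1) & MAX_BUFFER_NUMBER


print(describe_buffer_number(4294967294))  # Newest
print(describe_buffer_number(4294967292))  # None (no buffer)
print(describe_buffer_number(16777215))    # buffer 16777215
print(next_buffer_number(16777215))        # 0
```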

 

Hope this clarifies some things for you. Please let me know how your acquisition situation is going.

 

Katie

Message 7 of 10
(3,632 Views)

@wolly,

 

RGW is referring to a 10 GigE acquisition, which is significantly more processor intensive than a 1 GigE acquisition and can be more sensitive to system setup and acquisition configuration issues.

 

We have thoroughly tested and have many customers using 1 GigE cameras (including AVT Mantas) all day every day for weeks on end with no timeout issues. We fully expect this use-case to work very well, so I'm sorry to hear you are experiencing timeouts.

 

Please provide more information about your system setup and configuration, and perhaps we can help.

 

Also - please see my other post for clarification on the buffer number situation.

 

Thanks,

Katie

0 Kudos
Message 8 of 10
(3,630 Views)

Hello @all, 

 

This is Bernardo from Allied Vision in Germany. I agree with Katie: knowing a little more about the system configuration will help to understand the problem. From my experience, if you are using a NIC from Intel, you may also wish to set Interrupt Moderation Rate = Extreme and Receive Buffer = 2048, besides JumboPacket = 9000. It is also very important to choose the correct Ethernet cables; we recommend Cat6. If a switch is being used, verify that it also supports Jumbo Packets and deactivate any feature related to Storm Control or Broadcast Storm Control. Furthermore, if using a switch, please reduce the maximum bandwidth of each camera with the feature StreamBytesPerSecond. Otherwise, each camera will try to use the maximum bandwidth.
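
As a rough illustration of how a per-camera StreamBytesPerSecond value might be chosen when several cameras share one gigabit link, the sketch below splits the raw line rate evenly and leaves some headroom. The 10% headroom and the function name are assumptions for illustration, not a vendor recommendation.

```python
GIGE_BYTES_PER_SECOND = 1_000_000_000 // 8  # 125,000,000 bytes/s raw line rate


def per_camera_budget(camera_count, headroom_percent=10,
                      link_bytes_per_second=GIGE_BYTES_PER_SECOND):
    """Split a shared link evenly among cameras, leaving headroom_percent unused."""
    if camera_count < 1:
        raise ValueError("camera_count must be at least 1")
    usable = link_bytes_per_second * (100 - headroom_percent) // 100
    return usable // camera_count


print(per_camera_budget(2))                       # 56250000
print(per_camera_budget(4, headroom_percent=20))  # 25000000
```

The resulting figure would still need to be checked against the range the camera accepts for that feature.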

It is also highly recommended to use the cameras in a dedicated network. That means, the cameras should not be connected to the company network + Internet. 

 

If you have further questions regarding our cameras, please feel free to contact us.

 

 

Bernardo Luck Villanueva // Applications Engineer

Allied Vision Technologies GmbH



@kensign

Bernardo Luck // Applications Engineer
Allied Vision Technologies GmbH
Klaus-Groth-Str. 1, 22926 Ahrensburg, Germany
Message 9 of 10
(3,622 Views)

Hello All,

 

This is Manoj, working under RGW, who was the original poster of this issue. Sorry for the late reply. @Katie, yes, we did update the BIOS settings of the PC, and we did change the interrupt moderation rate as suggested. Even after all these changes, the problem still exists to date. Thanks a lot for the explanation of the buffer number.

 

The system details are as follows:

Type of GigE card used: Adlink GIE62, tried with both the Intel PRO driver and the NI GigE High-Performance Driver. The problem exists with both.

Computer: Area 51 R5 with Intel Core i7-7800X CPU and 16GB RAM.

OS: Windows 10 Pro.

Type of camera: FLIR (formerly Point Grey) BFLY-PGE-50S5M-C (5MP Sony CMOS).

 

 

Thanks,

Manoj

0 Kudos
Message 10 of 10
(3,524 Views)