06-14-2018 04:23 PM
I have never posted on this forum before, but I often use the information available here to shorten my learning curve. I have not been able to find a solution to the issue below, so I decided it's time for my first post. Some of the information below has been regurgitated from the posts of others on this forum.
At random intervals ranging from once per hour to once per 20 hours, I get a “Get Image2.vi” timeout error -1074360293. This error happens on multiple computers, in LabVIEW, in executables, in MAX, and in the camera vendor provided software. I am running 4 different computers, each with 2 cameras. The surprising thing is that 1 of the computers never has the error while the other 3 do. This is true even when I swap around the GigE cards, the cables, and the cameras. The PC without the error is ~2 years old and the 3 that are failing are newer Alienware 51 computers. I have been unable to make any sense of this information despite a significant effort.
I am using an Adlink GIE62 card with the Intel PRO chipset. The cameras are FLIR (formerly Point Grey) BFLY-PGE-50S5M-C (5MP Sony CMOS).
Here is a list of all the things I have tried so far without success
I have attached a simplified version of code that creates the error and logs the error information to a file. As I said earlier, I don’t expect the code to be helpful since the issue also happens in Max and the camera vendor provided SDK software.
Despite all of the above, the issue persists and I have no idea what to try next.
If you have any insight or know of any ideas I can test, please reply.
06-15-2018 02:14 AM
As ugly as it sounds: use Wireshark to capture the GigE Vision communication on both the "good" and the bad PCs, then compare.
If there is no difference: capture continuously with Wireshark, and use the LabVIEW error (or, if it is not a catchable error, a change in the execution state of the VI) to trigger a dump of the packets.
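One way to capture continuously without filling the disk is a ring of capture files, so that only the most recent traffic is kept until an error occurs. The sketch below only builds a command line for dumpcap (the capture tool installed with Wireshark). The interface name, file sizes, and output name are placeholders, and the function name is made up for illustration.

```python
def ring_capture_command(interface, out_path, file_kb=100_000, n_files=10,
                         capture_filter="udp"):
    """Build a dumpcap command that records into a fixed-size ring of files."""
    return [
        "dumpcap",
        "-i", interface,              # capture interface (name or index)
        "-f", capture_filter,         # capture filter; GigE Vision traffic is UDP
        "-b", f"filesize:{file_kb}",  # switch to a new file after this many kB
        "-b", f"files:{n_files}",     # keep only the newest n_files files
        "-w", out_path,               # base name for the ring files
    ]
```

The returned list can be passed to `subprocess.Popen` to start the capture. When the timeout error shows up, stop the capture (or copy the newest files aside) so the packets around the failure are preserved for comparison.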
06-15-2018 10:05 AM
Hi rwiley,
Thanks for your detailed message.
Just a hunch, but I would recommend trying to tweak your BIOS settings. Try the following changes (if applicable on your system):
Turbo Boost: Disabled
C-States: Disabled
Speed Step: Disabled
Being in Windows "performance mode" is good, but the CPU could still be doing some aggressive power saving stuff that could affect your acquisition. This could explain why an older computer doesn't show this behavior but the newer ones (with more fancy features) do. I wouldn't say that everyone should have to change BIOS settings like this, but if you're in a situation where you're having problems like you describe, it would probably be worth trying to see if this helps.
Also, some general questions:
Are you seeing lost packets? What does your overall CPU utilization look like for the system? What about interrupts per second? Which version of VAS are you using?
Just a note on packet resends: There really shouldn't be a need to disable this. We don't do extra buffering in the kernel to keep track of packets when this is enabled and things come out of order -- the work to keep track of packets is constant. Each time a packet is received, it's copied to the right place in the image buffer. Once all the packets are received OR we've decided we've gotten all the packets we're going to get for this image (some complicated logic there based on timing, how many new packets after the missing one(s) have arrived, etc.), we let the user know the image buffer is complete. You're just eliminating the actual resend part. Resends could potentially cause trouble (interrupt storm) if you're in an extreme situation where you're dropping lots of packets and having to do a lot of resends, but if you're doing a lot of resends the acquisition probably isn't that stable anyway. In your case where you're acquiring for 20 hours straight I doubt you're in such a volatile situation where you're doing enough resends to have any significant impact on performance. If you have them enabled, do you observe that you're doing resends? You can query those attributes in LV or just look at the ethernet attributes tab in MAX to see those metrics.
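To illustrate why out-of-order packets don't cost extra bookkeeping, here is a minimal sketch of the idea in Python. It is not the driver's actual code: the class name, zero-based packet IDs, and fixed payload size are assumptions made for the example, and the timing logic for giving up on missing packets is left out.

```python
class ImageAssembler:
    """Reassemble one image from fixed-size packets that may arrive out of order."""

    def __init__(self, image_size, payload_size):
        self.payload_size = payload_size
        self.buffer = bytearray(image_size)
        # Number of packets needed to cover the image (ceiling division).
        self.expected = -(-image_size // payload_size)
        self.received = set()

    def add_packet(self, packet_id, payload):
        """Copy a payload to its slot; the work is the same for any arrival order."""
        if packet_id in self.received:
            return  # duplicate, e.g. a resend that arrived twice
        offset = packet_id * self.payload_size
        self.buffer[offset:offset + len(payload)] = payload
        self.received.add(packet_id)

    def missing(self):
        """Packet IDs not yet seen; these are what a resend request would ask for."""
        return sorted(set(range(self.expected)) - self.received)

    def complete(self):
        return len(self.received) == self.expected
```

With resends disabled, the only difference is that nothing ever asks for the IDs returned by `missing()`. The per-packet copy is unchanged.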
Let me know if you're able to try the BIOS stuff -- hopefully we can help you find a resolution!
-Katie
Software Engineer, Vision Acquisition Software R&D
06-15-2018 10:28 AM
b.ploetzeneder
I like your suggestion because it is something I have not tried, but I would like to understand the underlying logic. If the GigE card/driver, the cables, and the camera have been ruled out, what variance would I expect to see by monitoring the link between 2 known good components? My testing implies that the problem is related to the PC, but I don’t know what to test or change to uncover the primary source. I have just downloaded Wireshark. It doesn’t seem very intuitive or user friendly for a newbie, but I will proceed out of desperation. Regarding changing the execution state to dump packets, please explain what you mean, as I am not familiar with this terminology.
Thx for your response
RGW
06-15-2018 11:04 AM
Katie
Wow, I’m not sure if changing the BIOS settings will work but your logic is clear and solid. I will try this right away.
To answer your questions, in the last 18 hours camera#1 lost 63,294 packets and camera#2 lost 62,234 packets (with resends disabled). The lost packets tend to come in large chunks but do not always cause a timeout error. Each camera had one timeout (several hours apart) but both were issue free again by the very next frame. CPU utilization is typically <10%, but spikes on occasion to >50% for no apparent reason. This may correlate with the timeout error but I don’t know how to test for this. I have already disconnected the PCs from the network to prevent a “windows update” or anything else external from affecting the results.
I have not looked at interrupts per second. How do you suggest I do that, and why?
We are running version 17.5 of VAS
New information: Yesterday I started tracking the buffer number when the timeout occurs. On 2 different cameras several hours apart, they both reported a timeout at buffer number 4294967294. I have no idea what this means or if it is relevant.
Your feedback on packet resends is excellent and much appreciated. I was taking a shot in the dark with that choice and thanks to you, I now know better. Even more valuable, I also know why. So thx for going the extra mile on your response.
RGW
07-18-2018 11:18 AM
I am struggling with the same issue with an AVT Manta GigE cam in LV and/or MAX. AVT GigE viewer will display an image at full frame rate for this camera (around 67fps) forever with no issues... I'm getting the impression there is something not robust with the IMAQdx function(s).
For what it is worth, 2^32 = 4294967296, 2 more than your noted buffer number - perhaps a data type overflow/rollover issue?
08-10-2018 03:25 PM
RGW,
I apologize for the super late follow-up but I wanted to check in and see if you have any updates on your situation. Did the BIOS changes help?
Regarding interrupts/second, that is a field that you can select in Performance Monitor (perfmon.exe) under the category of Processor Information. It's somewhat relevant because even if you have relatively low CPU utilization, a high number of interrupts per second can translate to enough latency in your application to cause you to lose packets.
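If you log that counter (or CPU utilization) with timestamps, you can check whether spikes line up with your timeouts. The sketch below is one rough way to do it in Python, assuming you already have the counter samples as (timestamp, value) pairs and the timeout timestamps from your own error log. The function name, threshold, and window are placeholders to adjust.

```python
from datetime import datetime, timedelta

def spikes_near_timeouts(samples, timeouts, threshold, window_s=5.0):
    """For each timeout, list the samples above threshold within +/- window_s."""
    window = timedelta(seconds=window_s)
    result = {}
    for t in timeouts:
        result[t] = [(ts, value) for ts, value in samples
                     if value > threshold and abs(ts - t) <= window]
    return result
```

A timeout with an empty list had no spike nearby, which would suggest that counter is not the culprit for that event.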
Regarding the buffer number 4294967294, this is a super common misconception that I don't think we have done a great job at documenting or explaining. For this reason I'll try to explain it in detail here for you and for anyone else who might have a similar question.
This value doesn't mean it failed trying to acquire that particular number. In fact, IMAQdx's buffer number uses only 24 bits, so the maximum buffer number you can have is 2^24 - 1, or 16777215. After that maximum, it will wrap back around to 0. The rest of the bits are used to indicate special values, like Next, Newest, Oldest, Every, Last New, None, etc. When you fail to get an image (like in your timeout case), whatever buffer number you get out is not valid. I would expect that you should get the value that indicates None, which is 0xFFFFFFFC or 4294967292. What you see is 0xFFFFFFFE or 4294967294, aka "Newest Value." I'm guessing that's just what your policy was for getting the image, so we're seeing that buffer number out. I'll look into why it's that instead of "No Buffer Value" which is what I expected. Long story short, in an error case, you aren't actually getting a buffer back so the buffer number returned is not valid. It is expected behavior that every time you have an error you'd have the same buffer number out there, indicating you didn't actually get the buffer number that you wanted.
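As a quick illustration of the numbers above, here is a small Python sketch that classifies a reported buffer number. It only knows the two special values quoted in this post; the other special values (Next, Oldest, Every, Last New) are deliberately left out because their numeric codes are not given here. The function names are made up for the example.

```python
BUFFER_NUMBER_MASK = 0xFFFFFF  # 24 usable bits, so the maximum is 16777215

# Only the two special values mentioned in this post.
SPECIAL_VALUES = {
    0xFFFFFFFC: "None",    # 4294967292
    0xFFFFFFFE: "Newest",  # 4294967294
}

def describe_buffer_number(value):
    """Say whether a reported buffer number is a real buffer or a special value."""
    if value in SPECIAL_VALUES:
        return f"special value: {SPECIAL_VALUES[value]}"
    if 0 <= value <= BUFFER_NUMBER_MASK:
        return "buffer number"
    return "unrecognized special value"

def next_buffer_number(value):
    """Advance a buffer number, wrapping back to 0 after 2^24 - 1."""
    return (value + 1) & BUFFER_NUMBER_MASK
```

For example, `describe_buffer_number(4294967294)` reports the "Newest" special value rather than a real buffer, and `next_buffer_number(16777215)` wraps to 0.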
Hope this clarifies some things for you. Please let me know how your acquisition situation is going.
Katie
08-10-2018 03:34 PM
@wolly,
RGW is referring to a 10 GigE acquisition, which is significantly more processor intensive than a 1 GigE acquisition and can be more sensitive to system setup and acquisition configuration issues.
We have thoroughly tested and have many customers using 1 GigE cameras (including AVT Mantas) all day every day for weeks on end with no timeout issues. We fully expect this use-case to work very well, so I'm sorry to hear you are experiencing timeouts.
Please provide more information about your system setup and configuration, and perhaps we can help.
Also - please see my other post for clarification on the buffer number situation.
Thanks,
Katie
08-13-2018 04:58 AM
Hello @all,
This is Bernardo from Allied Vision in Germany. I agree with Katie: knowing a little more about the system configuration will help to understand the problem. From my experience, if you are using a NIC from Intel, you may also want to set Interrupt Moderation Rate = Extreme and Receive Buffer = 2048, besides JumboPacket = 9000. It is also very important to choose the correct Ethernet cables; we recommend Cat6. If a switch is being used, verify that it also supports Jumbo Packets and deactivate any feature related to Storm Control or Broadcast Storm Control. Furthermore, if using a switch, please reduce the maximum bandwidth of each camera with the feature StreamBytesPerSecond. Otherwise, each camera will try to use the maximum bandwidth.
It is also highly recommended to use the cameras in a dedicated network. That means the cameras should not be connected to the company network or the Internet.
If you have further questions regarding our cameras, please feel free to contact us.
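As a rough starting point for the StreamBytesPerSecond value, the link bandwidth can be divided between the cameras sharing it. The Python sketch below does that arithmetic. The 90% headroom figure and the function name are assumptions for illustration, not a vendor recommendation, so check the camera documentation for the limits that actually apply.

```python
def stream_bytes_per_second(n_cameras, link_bits_per_second=1_000_000_000,
                            headroom_percent=90):
    """Split a shared link evenly between cameras, leaving some headroom."""
    link_bytes = link_bits_per_second // 8           # 1 Gbit/s = 125,000,000 bytes/s
    usable = link_bytes * headroom_percent // 100    # assumed safety margin
    return usable // n_cameras
```

For two cameras behind one 1 GigE uplink this gives 56,250,000 bytes per second each.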
Bernardo Luck Villanueva // Applications Engineer
Allied Vision Technologies GmbH
@kensign
08-28-2018 05:17 PM
Hello All,
This is Manoj, working under RGW, who was the original poster of this issue. Sorry for the late reply. @Katie, yes, we did update the BIOS settings of the PC, and we did change the interrupt moderation rate as suggested. Even after all these changes, the problem still exists to date. Thanks a lot for the explanation of the buffer number count.
The system details are as follows:
Type of GigE card used - Adlink GIE62, tried with both the Intel Pro driver and the NI GigE high performance driver - problem exists
Computer - Area 51 R5 with Intel Core i7-7800X CPU and 16GB RAM.
OS - Windows 10 Pro.
Type of camera - FLIR (formerly Point Grey) BFLY-PGE-50S5M-C (5MP Sony CMOS).
Thanks,
Manoj