Real-Time Measurement and Control

real time image processing: RT crashes intermittently

Hi, all,

 

I am working on a typical RT project. On the RT target (a desktop PC converted to an RT target), images are acquired from two FireWire cameras simultaneously and processed in a timed loop. Through single-process shared variables, some image-related data is transferred to another timed loop dedicated to data communication (it passes network-published shared variables to a remote PC). On the remote PC, another program (which works like a remote panel for display and control of the program running on the RT target) has a while loop that communicates with the RT target and updates the control signals and the displayed images.

 

The programs on both sides sometimes run OK for up to 12 hours without interruption, but sometimes the program running on the RT target crashes after a very short time, such as a couple of minutes, and the data communication to the remote PC is disconnected. When a crash happens, sometimes the RT target reports "reboot due to system error" and goes into "safe mode", and no program will run on the RT target anymore. Other times, the RT target just reboots, starts with startup.exe, and then everything works normally again with the data communication reconnected.

 

Since this is a real-time application for a harsh industrial task that requires the program to be crash-free, I am really concerned about this problem. I have tried everything I can to fix it but failed every time. I will attach the error log for you to take a look (each time it crashes, the same error occurs). One thing worth noting is that the memory statistics show all negative memory values. But I am confused: if memory is the problem, why can it sometimes run for up to 12 hours but sometimes only a few minutes?

 

BTW, I am not the only one who has encountered this problem. A user named Tavi in this post http://forums.ni.com/t5/Real-Time-Measurement-and/real-time-image-procession-problem-with-IMAQ-RT/m-... also had this problem, but no positive response was given.

 

Any suggestion is very welcome. The full error log is attached.

 

Great thanks,

 

Wei

 

0 Kudos
Message 1 of 15
(4,886 Views)

That "negative memory" in your log concerns me as well.  Tavi got a very similar error, but his memory shows up correctly in the log.  A good first step would be to run a memory test to make sure your memory is sound.  If you have a single bad memory address, the system will crash whenever it tries to access that address.  That access could happen at any point, whether a few minutes in or a few hours.
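
To illustrate why a single bad address can look random in time, here is a minimal Python sketch. It is only a toy model, not anything that runs on the RT target: the function name, the uniform random access pattern, and the sizes are all assumptions made up for illustration.

```python
import random


def accesses_until_fault(memory_size, bad_address, seed):
    """Count random accesses until the one faulty address is touched.

    Toy model: every access hits a uniformly random address. Real programs
    do not access memory this way; this only shows the variability.
    """
    rng = random.Random(seed)
    count = 0
    while True:
        count += 1
        if rng.randrange(memory_size) == bad_address:
            return count


if __name__ == "__main__":
    # Same "hardware" (one bad cell out of 10,000), different runs.
    for seed in range(5):
        print("run", seed, "failed after",
              accesses_until_fault(10_000, 1234, seed), "accesses")
```

Each run uses the same faulty cell, yet the number of accesses before it is hit varies widely from run to run, which is consistent with a crash after minutes in one session and after hours in another.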

 

Here's a link to a memory testing program that you can run from a USB Key.

Memtest86+

http://www.memtest.org/

 

Try running this for a few passes to check your memory, and let me know the results of this.

0 Kudos
Message 2 of 15
(4,870 Views)

Kyle,

 

Thank you so much for your reply.  Regarding the memory, I wonder about one possible cause of the error.  I have 4 GB of RAM (4 x 1 GB) installed on the MSI motherboard, and I wonder whether the system has trouble recognizing all of it or whether this causes some memory problem.

 

Regarding the memory testing software, which version (the auto-installer for USB key or the pre-compiled EXE file for USB key) should I use to test the memory installed on the RT target?  Do I just need to insert the USB key so that the RT target boots from it automatically and starts the test?

 

Thanks,

 

Wei

0 Kudos
Message 3 of 15
(4,865 Views)

Exactly, you would use the auto-installer to create a bootable USB key and set your Real-Time PC's BIOS to boot into "Windows/Other OS."  See "part 2" of this KnowledgeBase article for an example of how to boot from a USB key on an RT PC.  The article covers how to do this on a Real-Time PXI, but the steps should be very similar.  4 x 1 GB RAM should be fine, as long as the RAM's integrity is intact.

0 Kudos
Message 4 of 15
(4,839 Views)

Hi, Kyle,

 

Thanks for the info. It seems the problem was solved by setting up the remote PC with an automatic IP address, which kept the data communication running smoothly for almost 2 days without a reboot or crash.  But after a long run, the data connection still stops. The connection has to be "activated" by closing the VI on the remote PC and running it again. Is this normal?

 

Regards,

 

Wei

0 Kudos
Message 5 of 15
(4,819 Views)

Hi, Kyle,

 

Just after I posted the previous reply, the system crashed again after I restarted it. So basically, the crash happens at random times.  I checked the NI website and found a post like this: http://digital.ni.com/public.nsf/websearch/FD24D98FF428F21686256B64007FB6C1?OpenDocument, so I changed the Ethernet card on the real-time target to 100 Mbps half duplex to see if that helps stabilize the system.  Regarding the memory problem, the error log still shows negative memory when the crash happens. But I think the RAM shouldn't be the problem, because the system ran for 2 days without any problem, disconnection, or reboot before I posted the previous reply.  Do you have any ideas or thoughts?

 

Thanks,

 

Wei

0 Kudos
Message 6 of 15
(4,816 Views)

The thing about RAM malfunctions is that they can happen seemingly at random, possibly only when a specific memory location is accessed.  Did the memtest run successfully?  Another thing to do is check the manufacturer's specifications for your RAM timings and make sure your BIOS settings match them.  The default is AUTO, but sometimes this causes RAM instability.

 

Do you happen to have a screenshot of your bluescreen error?  If it automatically restarts after crashing, you can turn that off by doing the following:

 

Right-click “My Computer” and go to “Properties”. Then go to the “Advanced” tab and, under the heading “Startup and Recovery”, press the “Settings” button. Untick the “Automatically Restart” box and press OK.

 

A screenshot of the bluescreen can help us get to the bottom of this.  Thanks so much!

0 Kudos
Message 7 of 15
(4,800 Views)
Hi, Kyle,

Thank you very much for your reply. Regarding the RAM problem, do you think it is a sign of a memory leak? I revised the program running on the RT PC by eliminating the "build array" function to address a possible leak.

Regarding the timing setup you mentioned, can you be more specific? I haven't found any RAM timing option in the BIOS yet.

As for the automatic restart, since the program that restarts automatically is on the RT PC, there is no way to set it up as you suggested.

By the way, have you checked the error log attached to the first post? Besides the RT PC crash error, there is also an error from EXEC-SMP (as recorded in the "labview error log"). Do you have any idea about this? It matters because the RT PC has 4 cores.

Thanks.
0 Kudos
Message 8 of 15
(4,794 Views)

From the crash log you provided, it looks like a thread used in the shared variable engine is crashing. However, the stack trace looks like it is corrupted because the calls do not make sense. Is there any chance you can attach a different one so I can see if it looks different?

 

Are you using the recently added LabVIEW support to pass Vision images directly to shared variables? If so, could you try instead making your shared variable a string and flattening/unflattening on either end? It would be interesting to see if this shows similar behavior.
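
For readers unfamiliar with the idea, flattening just means serializing the image into one byte string on the sending side and rebuilding it on the receiving side. The Python sketch below illustrates the round trip. It is not LabVIEW code and does not reproduce LabVIEW's flattened format: the 8-byte width/height header, the 8-bit grayscale assumption, and the function names are all hypothetical.

```python
import struct

# Hypothetical header: width and height as little-endian unsigned 32-bit ints.
HEADER = struct.Struct("<II")


def flatten_image(width, height, pixels):
    """Serialize an 8-bit grayscale image into a single byte string."""
    if len(pixels) != width * height:
        raise ValueError("pixel buffer does not match width * height")
    return HEADER.pack(width, height) + bytes(pixels)


def unflatten_image(data):
    """Rebuild (width, height, pixels) from a flattened byte string."""
    width, height = HEADER.unpack_from(data, 0)
    pixels = data[HEADER.size:]
    if len(pixels) != width * height:
        raise ValueError("truncated or corrupted image payload")
    return width, height, pixels


if __name__ == "__main__":
    payload = flatten_image(2, 2, [0, 64, 128, 255])
    print(unflatten_image(payload))
```

The length check on the receiving side is the useful part of the experiment: if the payload arrives damaged, the failure shows up as an explicit error in your own code rather than inside the shared variable engine.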

 

The oddities you are seeing with your memory size are, I think, expected with 4GB of RAM on RT. I'm not sure all the various APIs currently handle returning free memory sizes >2GB properly (signed 32-bit variable). As a sanity check, you could try reducing your RAM to 2GB and verify that it does not change the behavior you see...
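
As a rough illustration of the signed 32-bit effect, the Python sketch below reinterprets a byte count as a signed 32-bit integer. It only demonstrates the general wraparound; it makes no claim about which RT API actually does this, and the helper name is made up.

```python
import struct


def as_signed_32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


GIB = 1024 ** 3

if __name__ == "__main__":
    print(as_signed_32(1 * GIB))  # 1073741824 (fits, stays positive)
    print(as_signed_32(3 * GIB))  # -1073741824 (wraps to a negative value)
```

Any size of 2 GiB or more no longer fits in 31 bits, so it is reported as a negative number even though the memory itself is fine.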

 

Eric

 

0 Kudos
Message 9 of 15
(4,787 Views)

Hi, Eric and Kyle,

 

First, thank you so much for your replies.

 

Regarding your first question, I have attached the latest error log so that you can take a look.

 

About the shared variable for vision images: yes, I am using that new type of shared variable for image transfer. But I don't think this is the problem. As you can see in my first post, another user, Tavi, experienced exactly the same problem I am experiencing now.  At that time, this new type of image shared variable had not been released yet, so it shouldn't be the cause.

 

Regarding the RAM, yes, you are exactly right. I took two RAM modules out of the PC, and now when the RT PC crashes, the memory statistics are positive instead of negative (you can see this change in the attached error log).  So I will keep using 2 GB of RAM.

 

But it seems the RAM is not the problem, since even with only 2 GB of RAM the RT PC still crashed once at noon today and rebooted into safe mode. Do you have any thoughts on that?

 

Great thanks,

 

Wei

0 Kudos
Message 10 of 15
(4,773 Views)