10-07-2020 05:19 PM
Hi All, sorry in advance for the cross-post.
After running LV real-time doing some vision analysis and saving fails/passes to an SSD for a few months, we're seeing what we suspect might be drive corruption. We don't have any hard clues, but we've noticed that if we clone the drive to a new drive and boot off that drive, the problems magically go away.
Does this sound familiar to anyone and if so, would you have any recommendations as to where to start digging?
(And if the issue is a file system issue, are there any log files that we can check into?)
Thanks in advance,
Dan
10-07-2020 05:56 PM
@dmccarty wrote:
Hi All, sorry in advance for the cross-post.
After running LV real-time doing some vision analysis and saving fails/passes to an SSD for a few months, we're seeing what we suspect might be drive corruption. We don't have any hard clues, but we've noticed that if we clone the drive to a new drive and boot off that drive, the problems magically go away.
Does this sound familiar to anyone and if so, would you have any recommendations as to where to start digging?
(And if the issue is a file system issue, are there any log files that we can check into?)
Thanks in advance,
Dan
Vision analysis sounds like heavy disk I/O, with lots of disk writes. Each SSD "sector" can only take a finite number of writes. I guess that by cloning it to a new drive you give yourself some breathing room, but eventually you will run into the same issue. I think this is a more likely scenario than LV corrupting your drive.
Have you looked at your SSD's S.M.A.R.T. drive health report?
10-07-2020 06:03 PM
SSDs have a limited number of write (program/erase) cycles on a single cell before it goes bad. In theory a drive should manage this internally and quietly move data from failing cells to healthy ones in the background, but if you've been saving massive amounts of data or buying bargain-basement SSDs, the drive might not have many write cycles available or might be doing a bad job of managing itself.
Do you still have some of these older drives lying around? You can try one of many "drive health checkers" on them to see if they report problems.
Whatever the cause, it seems unlikely that LabVIEW specifically would be the root cause, but it's hard to know for sure.
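To put rough numbers on the write-cycle limit described above, here is a back-of-the-envelope sketch. It uses the common approximation that total host writes are about capacity × rated P/E cycles ÷ write amplification. The function name and every figure in the example (120 GB, 3000 cycles, 50 GB/day, write amplification of 3) are assumptions for illustration only, not values taken from any datasheet or from this system.

```python
def estimate_lifetime_days(capacity_gb: float,
                           pe_cycles: float,
                           host_gb_per_day: float,
                           write_amplification: float = 1.0) -> float:
    """Rough number of days until the flash's rated write cycles are used up.

    Approximation: total host writes ~= capacity * P/E cycles / write amplification.
    Real drives differ (over-provisioning, wear-leveling quality, workload),
    so treat the result as an order-of-magnitude estimate only.
    """
    if min(capacity_gb, pe_cycles, host_gb_per_day, write_amplification) <= 0:
        raise ValueError("all inputs must be positive")
    total_host_writes_gb = capacity_gb * pe_cycles / write_amplification
    return total_host_writes_gb / host_gb_per_day


# Hypothetical example values, chosen only to show the arithmetic.
days = estimate_lifetime_days(capacity_gb=120, pe_cycles=3000,
                              host_gb_per_day=50, write_amplification=3.0)
print(f"Estimated endurance: {days:.0f} days (~{days / 365:.1f} years)")
```

If the estimate for a real workload comes out at months rather than years, wear-out becomes a much more plausible explanation than a software bug.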
10-07-2020 07:30 PM
Thanks for the tip, I don't believe we've tried the SMART tool, but that fits along with one of our working theories that trim isn't being actively used on the drive and the SSD is eventually "unhealthy."
(Speaking of which, does anyone know if NIOS has a trim utility for SSDs?)
I should also mention that the drive isn't completely corrupted; it just seems to cause system errors, like random crashes or bad behavior that has been very difficult to reproduce. Are there any file system logs that NIOS keeps around that might help us see whether this is the culprit?
10-08-2020 08:53 AM
Which controller model? If it is an NI Linux based one, it's pretty standard Linux, so you can use many of the tools that are available for Linux. You just might have to install them first with opkg.
10-08-2020 09:36 AM
These systems of ours go back to the days when NI recommended RTPCs, so it's a pretty custom system. But the drives are Swissbit SSDs and fairly new.
Link: https://www.digikey.com/en/products/detail/swissbit/SFSA120GQ1AA4TO-I-LB-226-STD/9920507
From a Linux shell are there any "SMART"-like packages to display drive health or whatnot that you'd recommend for this sort of thing?
10-08-2020 10:06 AM
First hit on Google "linux s.m.a.r.t status":
Using smartctl to get SMART status information on your hard drives
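For anyone who pulls a drive and checks it on a Linux machine, here is a minimal sketch of reading the report from a script. It assumes smartmontools is installed and uses only the `-H` (health) and `-A` (attributes) options of `smartctl`. The helper names and the device path are hypothetical, and the parser assumes the usual ten-column ATA attribute table; NVMe drives and some vendors print a different layout.

```python
import subprocess


def read_smart_report(device: str) -> str:
    """Run `smartctl -H -A <device>` and return its text output.

    smartctl's exit status is a bitmask and can be non-zero even when the
    report was produced, so the return code is not treated as an error here.
    """
    result = subprocess.run(["smartctl", "-H", "-A", device],
                            capture_output=True, text=True, check=False)
    return result.stdout


def parse_attributes(report: str) -> dict:
    """Parse the ATA attribute table into {name: {value, worst, thresh, raw}}."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            value, worst, thresh = (int(f) for f in fields[3:6])
        except ValueError:
            continue  # not an attribute row after all
        attrs[fields[1]] = {"value": value, "worst": worst,
                            "thresh": thresh, "raw": " ".join(fields[9:])}
    return attrs


def failing_attributes(attrs: dict) -> list:
    """Names of attributes whose normalized value is at or below a non-zero threshold."""
    return [name for name, a in attrs.items()
            if a["thresh"] > 0 and a["value"] <= a["thresh"]]


def summarize(device: str) -> None:
    """Print every parsed attribute, then list any that are at or below threshold."""
    attrs = parse_attributes(read_smart_report(device))
    for name, a in attrs.items():
        print(f"{name:28s} value={a['value']:3d} thresh={a['thresh']:3d} raw={a['raw']}")
    print("At or below threshold:", failing_attributes(attrs) or "none")
```

Calling `summarize("/dev/sdX")` with the real device node (typically as root) prints the table. Wear-related attribute names vary by vendor, so the raw values are worth reading alongside the drive's datasheet.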
10-08-2020 10:06 AM - edited 10-08-2020 10:12 AM
A custom RTPC? Then it is almost certainly Pharlap, and then you are pretty much out of luck. Whatever is contained in a Pharlap installation is pretty much all that you can get. There are no extra utilities to install.
In earlier days you might have been able to buy the Pharlap Development System to get access to extra tools, but that was a very expensive investment and required you to get your hands dirty with low-level programming interfaces. With Pharlap having been definitively discontinued several years ago (~2013), that's not even an option anymore.
That its disk drivers might not be best suited for SSD use is probably a safe assumption.
The next version of Windows 10 will contain built-in SSD performance reports, probably based on accessing the SMART interface. Under Linux there are all kinds of tools and utilities to access various aspects of a SMART drive. I have used CrystalDiskInfo under Windows in the past for this.
Swissbit doesn't mean anything to me. Despite the Swiss-sounding name, it may not be the quality you expect. It mostly depends on the SSD chips used, and there aren't too many manufacturers of these. The various drive manufacturers simply package them in some way and add some controller logic. Part of that controller logic can make a difference, as it implements things like TRIM. But the cell quality in the chips is ultimately responsible for the number of write cycles a cell will survive, and this quality varies wildly. Even Samsung, one of the better suppliers in the market, has several classes of SSD chips that vary a lot in life expectancy and price.
10-08-2020 01:09 PM
@rolfk wrote:
A custom RTPC? Then it is almost certainly Pharlap, and then you are pretty much out of luck. Whatever is contained in a Pharlap installation is pretty much all that you can get. There are no extra utilities to install.
Could OP pull the drive and analyze it in another machine? It sounds like they already have old ones removed from the system.
10-08-2020 02:45 PM
@BertMcMahan wrote:
@rolfk wrote:
A custom RTPC? Then it is almost certainly Pharlap, and then you are pretty much out of luck. Whatever is contained in a Pharlap installation is pretty much all that you can get. There are no extra utilities to install.
Could OP pull the drive and analyze it in another machine? It sounds like they already have old ones removed from the system.
Yes, that is why I mentioned CrystalDiskInfo under Windows. And Bill mentioned smartctl under Unix/Linux.
But it won't change the fact that Pharlap OS isn't really an ideal choice for SSDs. Even Windows needed quite some time to support them properly and, as was shown in a recent patch, still managed to mess that up with one of the latest releases.
By the time SSDs became usable in terms of both reliability and price, Pharlap OS had already been announced to be in maintenance release mode. From that point on you could only get minor updates, and only if you had a valid and ongoing Software Development license contract for it (which NI had). Even documentation was only available under these conditions. But IntervalZero won't support Pharlap ETS forever. That is likely the reason why NI plans to discontinue support for it after LabVIEW 2020.