Re: Disk Status confusion -- MIDRANGE-L

Thanks, everyone. We had one spare drive and did the replacement. The advice that replacement was urgent, because another failure would mean a complete rebuild from the system save, was what lit the fire under me.
We are now back to...

Work with Disk Status

Elapsed time:   00:00:00

           --Protection--
Unit ASP Type Status    Compression
   1    1 DPY   ACTIVE
   2    1 DPY   ACTIVE
   3    1 DPY   ACTIVE
   4    1 DPY   ACTIVE
   5    1 DPY   ACTIVE
   6    1 DPY   ACTIVE
   7    1 DPY   ACTIVE
   8    1 DPY   ACTIVE

and...

Display Disk Configuration Status

         Serial                     Resource                Hot Spare
ASP Unit Number          Type Model Name Status             Protection
   1 Unprotected
        1 Y010D30090TW    198C 099 DMP013     RAID 5/Active           N
        2 Y6800TV1N64J    198C 099 DMP020     RAID 5/Active           N
        3 Y010D3008RDW    198C 099 DMP015     RAID 5/Active           N
        4 Y010D300911A    198C 099 DMP005     RAID 5/Active           N
        5 Y010D3008UC7    198C 099 DMP001     RAID 5/Active           N
        6 Y010D3008R7Y    198C 099 DMP011     RAID 5/Active           N
        7 Y010D3008UBG    198C 099 DMP007     RAID 5/Active           N
        8 Y210W7K0JQ4C    198C 099 DMP009     RAID 5/Active           N
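
For anyone who would rather check a pasted copy of this screen than eyeball it, here is a minimal Python sketch. It assumes the row layout shown above (unit, serial number, type, model, resource name, status, hot spare flag). The function name `find_unhealthy_units` and the regular expression are hypothetical illustrations, not an IBM-supplied tool.

```python
import re

# Matches one unit row of the pasted screen, e.g.
#   "1 Y010D30090TW    198C 099 DMP013     RAID 5/Active           N"
# The column layout is an assumption taken from the listing above.
ROW = re.compile(
    r"^\s*(?P<unit>\d+)\s+(?P<serial>\S+)\s+(?P<type>\S+)\s+(?P<model>\S+)"
    r"\s+(?P<resource>\S+)\s+(?P<status>.+?)\s+(?P<hot_spare>[YN])\s*$"
)


def find_unhealthy_units(screen_text, healthy_status="RAID 5/Active"):
    """Return (unit, resource, status) for every unit row not in healthy_status."""
    unhealthy = []
    for line in screen_text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # titles, column headers, blank lines, the ASP summary line
        if match["status"] != healthy_status:
            unhealthy.append((int(match["unit"]), match["resource"], match["status"]))
    return unhealthy
```

Run against the listing above, it should return an empty list, since every unit reports RAID 5/Active.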

Best Regards,

Thomas Garvey
Corporate Scientist
Unbeaten Path International
630-462-3991
www.unpath.com <http://www.unpath.com/>

On 2/15/2021 3:49 PM, Patrik Schindler wrote:

Hello Thomas,

On 15.02.2021 at 19:31, Thomas Garvey <tgarvey@xxxxxxxxxx> wrote:

Display Device Parity Status

Parity                        Resource                Hot Spare
  Set  ASP Unit  Type  Model  Name      Status        Protection
    1            2BE1  001    DC01      RAID 5        N
         1    2  198C  099    DMP020    Unprotected
         1    1  198C  099    DMP013    Unprotected
         1    4  198C  099    DMP005    Unprotected
         1    5  198C  099    DMP001    Unprotected
         1    8  198C  099    DMP009    Failed
         1    3  198C  099    DMP015    Unprotected
         1    6  198C  099    DMP011    Unprotected
         1    7  198C  099    DMP007    Unprotected

In addition to other people's valid comments…

I had one such occasion some months ago, with an 8203-E4A containing five disks in a RAID 5 set. One was marked as faulty overnight. I know that not every fault is fatal, so out of habit from the PC world, I just re-added that disk (that is, forced a rebuild; I can't recall the precise thing I did in SST). If it really had a media-related problem, the rebuild would have kicked it out again.

The rebuild went through without any problems. That particular disk hasn't caused any trouble for months now.
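
As a rough illustration of why a rebuild onto a single re-added disk can work, while a second failure in the same set cannot be recovered, here is a small Python sketch of the XOR arithmetic behind RAID 5. It is only the arithmetic: real adapters rotate parity across the drives and work on much larger stripes. The names `xor_blocks` and `rebuild_missing` and the four-byte blocks are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe, missing_index):
    """Recompute one lost block of a stripe from all surviving blocks.

    `stripe` holds the data blocks plus the parity block; the entry at
    `missing_index` is ignored and reconstructed from the others.
    """
    survivors = [b for i, b in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


# Toy stripe: four data blocks plus one parity block (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
stripe = data + [xor_blocks(data)]

# Losing any single block is recoverable from the remaining four.
recovered = rebuild_missing(stripe, 2)
```

With two blocks missing, XORing the survivors yields only the XOR of the two lost blocks, not either block itself, which is why a second failure means going back to the save media.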

(From the beginning, there has been a solid backup strategy in place for that machine: monthly option 21 saves, and a daily SAVCHGOBJ to an automatically created ISO image that is FTP'd to a backup server afterwards. The IFS isn't used beyond what the OS itself installed there.)

This outcome matches my decades-old experience with PC servers running Linux (not no-name crap; I'm talking about HPE and IBM, for example). I call these SCSI hiccups (1990s, early 2000s) and, today, SAS hiccups, because nothing ever turns out to be actually broken. The RAID logic doesn't get an answer from the drive in a timely manner and declares it faulty. There are a lot of kernel log entries, but no clear culprit. It happens once every one or two years per machine, depending on I/O load.

Just saying. Your mileage may vary.

:wq! PoC