* failed disks, mapper, and "Invalid argument"
From: David T-G @ 2020-05-20 20:05 UTC (permalink / raw)
To: Linux RAID list
Hi, all --
I have a four-partition RAID5 array of which one disk failed while I was
out of town and a second failed just today. Both failed smartctl tests
by not even starting, although I don't have that captured. Those two
were on a SATA daughtercard, so I swapped them (formerly sde, sdf)
up to the motherboard SATA ports like the other two (still sda, sdb) and
now all are visible and happily pass smartctl checks and generally look
good ... except that my md0 doesn't :-(
I've been through the wiki and other found documentation and have scraped
the archives, but the whole mapper thing is new to me, and I don't know
enough to pin down the error. I've been attempting to fake-build my
array with overlay devices to see how it will do. Please forgive the
long post if it's a bit ridiculous; I wanted to make sure that you have
all information :-)
Here's the array after I swapped ports and booted up:
diskfarm:root:10:~> mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Mon Feb 6 00:56:35 2017
Raid Level : raid5
Used Dev Size : 4294967295
Raid Devices : 4
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Mon May 18 01:10:07 2020
State : active, FAILED, Not Started
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : diskfarm:0 (local to host diskfarm)
UUID : ca7008ef:90693dae:6c231ad7:08b3f92d
Events : 57840
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
- 0 0 1 removed
- 0 0 2 removed
4 8 1 3 active sync /dev/sda1
diskfarm:root:10:~> mdadm --examine /dev/sd[abcd]1 | egrep '/dev|vents'
/dev/sda1:
Events : 57840
/dev/sdb1:
Events : 57840
/dev/sdc1:
Events : 57836
/dev/sdd1:
Events : 48959
I'd say sdd is the former sde that went away first, and sdc, which was
sdf, only just fell over.
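To pin that down without guessing, I believe the superblocks record each
member's slot directly, so something like

  mdadm --examine /dev/sd[abcd]1 | egrep '/dev|Device Role|Events'

should confirm which physical drive holds which role.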
In my first round, I shut down md0
diskfarm:root:12:~> mdadm --stop /dev/md0
mdadm: stopped /dev/md0
diskfarm:root:12:~> cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid5 sdf2[0] sdg2[1] sdh2[3]
1464622080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
and of course it isn't in mdstat any more. Oops. But it's down, so we
won't see any more writes that could be messy.
I whipped up four loop devices and created overlay files
diskfarm:root:13:/mnt/scratch/disks> parallel truncate -s8G overlay-{/} ::: $DEVICES
...
To silence this citation notice: run 'parallel --citation'.
diskfarm:root:13:/mnt/scratch/disks> ls -goh
total 33M
-rw-r--r-- 1 8.0G May 20 14:00 overlay-sda1
-rw-r--r-- 1 8.0G May 20 14:00 overlay-sdb1
-rw-r--r-- 1 8.0G May 20 14:00 overlay-sdc1
-rw-r--r-- 1 8.0G May 20 14:00 overlay-sdd1
-rw-r--r-- 1 11K May 20 13:20 smartctl-a.sda.out
-rw-r--r-- 1 5.3K May 20 13:20 smartctl-a.sdb.out
-rw-r--r-- 1 5.3K May 20 13:20 smartctl-a.sdc.out
-rw-r--r-- 1 5.3K May 20 13:20 smartctl-a.sdd.out
diskfarm:root:13:/mnt/scratch/disks> du -skhc overlay-sd*
8.0M overlay-sda1
8.0M overlay-sdb1
8.0M overlay-sdc1
8.0M overlay-sdd1
32M total
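The losetup/dmsetup step itself didn't make it into my scrollback;
reconstructed from memory against the wiki recipe, it was roughly

  DEVICES='/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1'
  for D in $DEVICES ; do
    B=$(basename $D)
    LOOP=$(losetup -f --show overlay-$B)   # attach the sparse file
    SZ=$(blockdev --getsz $D)              # size in 512-byte sectors
    dmsetup create $B --table "0 $SZ snapshot $D $LOOP P 8"
  done

so take that as a sketch rather than gospel.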
diskfarm:root:13:/mnt/scratch/disks> ls -goh /dev/mapper/*
crw------- 1 10, 236 May 20 08:04 /dev/mapper/control
lrwxrwxrwx 1 7 May 20 14:02 /dev/mapper/sda1 -> ../dm-1
lrwxrwxrwx 1 7 May 20 14:02 /dev/mapper/sdb1 -> ../dm-0
lrwxrwxrwx 1 7 May 20 14:02 /dev/mapper/sdc1 -> ../dm-2
lrwxrwxrwx 1 7 May 20 14:02 /dev/mapper/sdd1 -> ../dm-3
and grabbed my overlays and checked the mapper
diskfarm:root:13:/mnt/scratch/disks> OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
diskfarm:root:13:/mnt/scratch/disks> echo $OVERLAYS
/dev/mapper/sda1 /dev/mapper/sdb1 /dev/mapper/sdc1 /dev/mapper/sdd1
diskfarm:root:13:/mnt/scratch/disks> dmsetup status
sdb1: 0 3518805647 snapshot 16/16777216 16
sdc1: 0 3518805647 snapshot 16/16777216 16
sda1: 0 3518805647 snapshot 16/16777216 16
sdd1: 0 3518805647 snapshot 16/16777216 16
and so far it looks good ... as far as I know :-)
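One sanity check I should probably make explicit: the snapshot lines above
say 3518805647 sectors, and that ought to match the raw partitions exactly,
e.g.

  for B in sda1 sdb1 sdc1 sdd1 ; do
    echo "$B: raw $(blockdev --getsz /dev/$B) vs overlay $(blockdev --getsz /dev/mapper/$B)"
  done

since (I presume) md will refuse a member that is smaller than the size
recorded in its superblock.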
I didn't know if I should try md0, the real array name, or create a new
md1, so I took the safe approach first
diskfarm:root:13:/mnt/scratch/disks> mdadm --assemble --force /dev/md1 $OVERLAYS
mdadm: forcing event count in /dev/mapper/sdc1(2) from 57836 upto 57840
mdadm: clearing FAULTY flag for device 2 in /dev/md1 for /dev/mapper/sdc1
mdadm: Marking array /dev/md1 as 'clean'
mdadm: failed to add /dev/mapper/sdd1 to /dev/md1: Invalid argument
mdadm: failed to add /dev/mapper/sdc1 to /dev/md1: Invalid argument
mdadm: failed to add /dev/mapper/sda1 to /dev/md1: Invalid argument
mdadm: failed to add /dev/mapper/sdb1 to /dev/md1: Invalid argument
mdadm: failed to RUN_ARRAY /dev/md1: Invalid argument
diskfarm:root:13:/mnt/scratch/disks> cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid5 sdf2[0] sdg2[1] sdh2[3]
1464622080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
diskfarm:root:13:/mnt/scratch/disks> mdadm --examine /dev/md1
mdadm: cannot open /dev/md1: No such file or directory
but didn't get to move on to the next wiki step. I crossed my fingers
and tried md0
diskfarm:root:13:/mnt/scratch/disks> mdadm --assemble --force /dev/md0 $OVERLAYS
mdadm: failed to add /dev/mapper/sdd1 to /dev/md0: Invalid argument
mdadm: failed to add /dev/mapper/sdc1 to /dev/md0: Invalid argument
mdadm: failed to add /dev/mapper/sda1 to /dev/md0: Invalid argument
mdadm: failed to add /dev/mapper/sdb1 to /dev/md0: Invalid argument
mdadm: failed to RUN_ARRAY /dev/md0: Invalid argument
diskfarm:root:13:/mnt/scratch/disks> mdadm --assemble --force /dev/md0 --verbose $OVERLAYS
mdadm: looking for devices for /dev/md0
mdadm: /dev/mapper/sda1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/mapper/sdb1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/mapper/sdc1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/mapper/sdd1 is identified as a member of /dev/md0, slot 1.
mdadm: failed to add /dev/mapper/sdd1 to /dev/md0: Invalid argument
mdadm: failed to add /dev/mapper/sdc1 to /dev/md0: Invalid argument
mdadm: failed to add /dev/mapper/sda1 to /dev/md0: Invalid argument
mdadm: failed to add /dev/mapper/sdb1 to /dev/md0: Invalid argument
mdadm: failed to RUN_ARRAY /dev/md0: Invalid argument
diskfarm:root:13:/mnt/scratch/disks> mdadm --detail /dev/md0
mdadm: cannot open /dev/md0: No such file or directory
and STILL got nowhere. It was at this point that I figured I needed to
back away and call for help! I don't want to try rebuilding the actual
array in case it's out of sync and I lose data.
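One thing I haven't done yet is look at the kernel log right after one of
those EINVALs; I'd expect the md driver to log its actual complaint there:

  dmesg | tail -n 30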
Soooooo... There it is. Any suggestions to correct whatever oops I've
made or complete a step I overlooked? Any ideas why my assemble didn't?
TIA & HAND
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
* Re: failed disks, mapper, and "Invalid argument"
From: Wols Lists @ 2020-05-20 23:23 UTC (permalink / raw)
To: David T-G, Linux RAID list
On 20/05/20 21:05, David T-G wrote:
> Hi, all --
>
> I have a four-partition RAID5 array of which one disk failed while I was
> out of town and a second failed just today. Both failed smartctl tests
> by not even starting, although I don't have that captured. Those two
> were on a SATA daughtercard, so I swapped them (formerly sde, sdf)
> up to the motherboard SATA ports like the other two (still sda, sdb) and
> now all are visible and happily pass smartctl checks and generally look
> good ... except that my md0 doesn't :-(
>
> I've been through the wiki and other found documentation and have scraped
> the archives, but the whole mapper thing is new to me, and I don't know
> enough to pin down the error. I've been attempting to fake-build my
> array with overlay devices to see how it will do. Please forgive the
> long post if it's a bit ridiculous; I wanted to make sure that you have
> all information :-)
https://raid.wiki.kernel.org/index.php/Asking_for_help
Hate to say it, but if you've found the wiki, there's an awful lot of
info missing from this post ...
>
> Here's the array after I swapped ports and booted up:
>
> diskfarm:root:10:~> mdadm --detail /dev/md0
> /dev/md0:
> Version : 1.2
> Creation Time : Mon Feb 6 00:56:35 2017
> Raid Level : raid5
> Used Dev Size : 4294967295
> Raid Devices : 4
> Total Devices : 2
> Persistence : Superblock is persistent
>
> Update Time : Mon May 18 01:10:07 2020
> State : active, FAILED, Not Started
> Active Devices : 2
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Name : diskfarm:0 (local to host diskfarm)
> UUID : ca7008ef:90693dae:6c231ad7:08b3f92d
> Events : 57840
>
> Number Major Minor RaidDevice State
> 0 8 17 0 active sync /dev/sdb1
> - 0 0 1 removed
> - 0 0 2 removed
> 4 8 1 3 active sync /dev/sda1
>
>
> diskfarm:root:10:~> mdadm --examine /dev/sd[abcd]1 | egrep '/dev|vents'
> /dev/sda1:
> Events : 57840
> /dev/sdb1:
> Events : 57840
> /dev/sdc1:
> Events : 57836
> /dev/sdd1:
> Events : 48959
>
> I'd say sdd is the former sde that went away first, and sdc, which was
> sdf, only just fell over.
Okay, you DON'T want to include sdd in your attempts - sdc is only 4
events behind so if you can assemble those three, you'll be almost
perfect ...
>
> In my first round, I shut down md0
>
> diskfarm:root:12:~> mdadm --stop /dev/md0
> mdadm: stopped /dev/md0
> diskfarm:root:12:~> cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md127 : active raid5 sdf2[0] sdg2[1] sdh2[3]
> 1464622080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
>
> unused devices: <none>
>
> and of course it isn't in mdstat any more. Oops. But it's down, so we
> won't see any more writes that could be messy.
>
> I whipped up four loop devices and created overlay files
>
> diskfarm:root:13:/mnt/scratch/disks> parallel truncate -s8G overlay-{/} ::: $DEVICES
> ...
> To silence this citation notice: run 'parallel --citation'.
>
> diskfarm:root:13:/mnt/scratch/disks> ls -goh
> total 33M
> -rw-r--r-- 1 8.0G May 20 14:00 overlay-sda1
> -rw-r--r-- 1 8.0G May 20 14:00 overlay-sdb1
> -rw-r--r-- 1 8.0G May 20 14:00 overlay-sdc1
> -rw-r--r-- 1 8.0G May 20 14:00 overlay-sdd1
> -rw-r--r-- 1 11K May 20 13:20 smartctl-a.sda.out
> -rw-r--r-- 1 5.3K May 20 13:20 smartctl-a.sdb.out
> -rw-r--r-- 1 5.3K May 20 13:20 smartctl-a.sdc.out
> -rw-r--r-- 1 5.3K May 20 13:20 smartctl-a.sdd.out
>
> diskfarm:root:13:/mnt/scratch/disks> du -skhc overlay-sd*
> 8.0M overlay-sda1
> 8.0M overlay-sdb1
> 8.0M overlay-sdc1
> 8.0M overlay-sdd1
> 32M total
>
> diskfarm:root:13:/mnt/scratch/disks> ls -goh /dev/mapper/*
> crw------- 1 10, 236 May 20 08:04 /dev/mapper/control
> lrwxrwxrwx 1 7 May 20 14:02 /dev/mapper/sda1 -> ../dm-1
> lrwxrwxrwx 1 7 May 20 14:02 /dev/mapper/sdb1 -> ../dm-0
> lrwxrwxrwx 1 7 May 20 14:02 /dev/mapper/sdc1 -> ../dm-2
> lrwxrwxrwx 1 7 May 20 14:02 /dev/mapper/sdd1 -> ../dm-3
>
> and grabbed my overlays and checked the mapper
>
> diskfarm:root:13:/mnt/scratch/disks> OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
> diskfarm:root:13:/mnt/scratch/disks> echo $OVERLAYS
> /dev/mapper/sda1 /dev/mapper/sdb1 /dev/mapper/sdc1 /dev/mapper/sdd1
> diskfarm:root:13:/mnt/scratch/disks> dmsetup status
> sdb1: 0 3518805647 snapshot 16/16777216 16
> sdc1: 0 3518805647 snapshot 16/16777216 16
> sda1: 0 3518805647 snapshot 16/16777216 16
> sdd1: 0 3518805647 snapshot 16/16777216 16
>
> and so far it looks good ... as far as I know :-)
>
> I didn't know if I should try md0, the real array name, or create a new
> md1, so I took the safe approach first
>
> diskfarm:root:13:/mnt/scratch/disks> mdadm --assemble --force /dev/md1 $OVERLAYS
> mdadm: forcing event count in /dev/mapper/sdc1(2) from 57836 upto 57840
> mdadm: clearing FAULTY flag for device 2 in /dev/md1 for /dev/mapper/sdc1
> mdadm: Marking array /dev/md1 as 'clean'
> mdadm: failed to add /dev/mapper/sdd1 to /dev/md1: Invalid argument
> mdadm: failed to add /dev/mapper/sdc1 to /dev/md1: Invalid argument
> mdadm: failed to add /dev/mapper/sda1 to /dev/md1: Invalid argument
> mdadm: failed to add /dev/mapper/sdb1 to /dev/md1: Invalid argument
> mdadm: failed to RUN_ARRAY /dev/md1: Invalid argument
>
> diskfarm:root:13:/mnt/scratch/disks> cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md127 : active raid5 sdf2[0] sdg2[1] sdh2[3]
> 1464622080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
>
> unused devices: <none>
>
> diskfarm:root:13:/mnt/scratch/disks> mdadm --examine /dev/md1
> mdadm: cannot open /dev/md1: No such file or directory
>
> but didn't get to move on to the next wiki step. I crossed my fingers
> and tried md0
>
> diskfarm:root:13:/mnt/scratch/disks> mdadm --assemble --force /dev/md0 $OVERLAYS
> mdadm: failed to add /dev/mapper/sdd1 to /dev/md0: Invalid argument
> mdadm: failed to add /dev/mapper/sdc1 to /dev/md0: Invalid argument
> mdadm: failed to add /dev/mapper/sda1 to /dev/md0: Invalid argument
> mdadm: failed to add /dev/mapper/sdb1 to /dev/md0: Invalid argument
> mdadm: failed to RUN_ARRAY /dev/md0: Invalid argument
>
> diskfarm:root:13:/mnt/scratch/disks> mdadm --assemble --force /dev/md0 --verbose $OVERLAYS
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/mapper/sda1 is identified as a member of /dev/md0, slot 3.
> mdadm: /dev/mapper/sdb1 is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/mapper/sdc1 is identified as a member of /dev/md0, slot 2.
> mdadm: /dev/mapper/sdd1 is identified as a member of /dev/md0, slot 1.
> mdadm: failed to add /dev/mapper/sdd1 to /dev/md0: Invalid argument
> mdadm: failed to add /dev/mapper/sdc1 to /dev/md0: Invalid argument
> mdadm: failed to add /dev/mapper/sda1 to /dev/md0: Invalid argument
> mdadm: failed to add /dev/mapper/sdb1 to /dev/md0: Invalid argument
> mdadm: failed to RUN_ARRAY /dev/md0: Invalid argument
>
> diskfarm:root:13:/mnt/scratch/disks> mdadm --detail /dev/md0
> mdadm: cannot open /dev/md0: No such file or directory
>
> and STILL got nowhere. It was at this point that I figured I needed to
> back away and call for help! I don't want to try rebuilding the actual
> array in case it's out of sync and I lose data.
>
> Soooooo... There it is. Any suggestions to correct whatever oops I've
> made or complete a step I overlooked? Any ideas why my assemble didn't?
>
What I *always* jump on ...
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
You don't have any of these drives you shouldn't?
I'll let someone else play about with all the device mapper stuff, I'm
only just getting into it, but as I say, drop sdd and you should get
your array back with pretty much no corruption. Adding sdd runs the risk
of corrupting much more ...
Cheers,
Wol
* Re: failed disks, mapper, and "Invalid argument"
From: David T-G @ 2020-05-20 23:53 UTC (permalink / raw)
To: Linux RAID list
Wols, et al --
...and then Wols Lists said...
%
% On 20/05/20 21:05, David T-G wrote:
% >
% > I have a four-partition RAID5 array of which one disk failed while I was
% > out of town and a second failed just today. Both failed smartctl tests
...
% >
% > I've been through the wiki and other found documentation and have scraped
...
% > long post if it's a bit ridiculous; I wanted to make sure that you have
% > all information :-)
%
% https://raid.wiki.kernel.org/index.php/Asking_for_help
Yep. Tried almost all of those things, too. [I don't git much, although
I'd like to in my copious free time, so I didn't bother to suck down
lsdrv and run it.]
%
% Hate to say it, but if you've found the wiki, there's an awful lot of
% info missing from this post ...
I'll take that. I never said I knew what I was doing :-) Aaaaand ...
after 8+ hours at this, now I see
If they don't, post a description of your problem, accompanied by the
output of all those commands, to the mailing list.
down at the very bottom. Yup, I missed it :-)
% >
% > Here's the array after I swapped ports and booted up:
% >
% > diskfarm:root:10:~> mdadm --detail /dev/md0
...
% >
% > diskfarm:root:10:~> mdadm --examine /dev/sd[abcd]1 | egrep '/dev|vents'
...
% >
% > I'd say sdd is the former sde that went away first, and sdc, which
% > was sdf, only just fell over.
%
% Okay, you DON'T want to include sdd in your attempts - sdc is only 4
% events behind so if you can assemble those three, you'll be almost
% perfect ...
The easy answer didn't work :-(
diskfarm:root:13:/mnt/scratch/disks> OVERLAYS='/dev/mapper/sda1 /dev/mapper/sdb1 /dev/mapper/sdc1'
diskfarm:root:13:/mnt/scratch/disks> mdadm --assemble --force /dev/md0 --verbose $OVERLAYS
mdadm: looking for devices for /dev/md0
mdadm: /dev/mapper/sda1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/mapper/sdb1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/mapper/sdc1 is identified as a member of /dev/md0, slot 2.
mdadm: no uptodate device for slot 1 of /dev/md0
mdadm: failed to add /dev/mapper/sdc1 to /dev/md0: Invalid argument
mdadm: failed to add /dev/mapper/sda1 to /dev/md0: Invalid argument
mdadm: failed to add /dev/mapper/sdb1 to /dev/md0: Invalid argument
mdadm: failed to RUN_ARRAY /dev/md0: Invalid argument
It looks
diskfarm:root:13:/mnt/scratch/disks> mdadm --assemble --force /dev/md0 --verbose /dev/sda1 /dev/sdb1 /dev/sdc1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda1 is busy - skipping
mdadm: /dev/sdb1 is busy - skipping
mdadm: /dev/sdc1 is busy - skipping
like the overlay is keeping me from the raw devices, so I'd have to tear
down all of that to try the real thing. I'll hold off on that...
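For the record, I assume the teardown, when I do get there, is just the
reverse of the setup:

  for B in sda1 sdb1 sdc1 sdd1 ; do dmsetup remove $B ; done
  losetup -D    # detach all loop devices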
% >
...
% > I whipped up four loop devices and created overlay files
% >
% > diskfarm:root:13:/mnt/scratch/disks> parallel truncate -s8G overlay-{/} ::: $DEVICES
...
% > and tried md0
% >
% > diskfarm:root:13:/mnt/scratch/disks> mdadm --assemble --force /dev/md0 $OVERLAYS
...
% > and STILL got nowhere. It was at this point that I figured I needed to
% > back away and call for help! I don't want to try rebuilding the actual
% > array in case it's out of sync and I lose data.
% >
% > Soooooo... There it is. Any suggestions to correct whatever oops I've
% > made or complete a step I overlooked? Any ideas why my assemble didn't?
% >
% What I *always* jump on ...
%
% https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
%
% You don't have any of these drives you shouldn't?
These are too old to be SMR, but they are pretty basic:
diskfarm:root:11:/mnt/scratch/disks> for D in sd{a,b,c,d} ; do echo '## parted' ; parted /dev/$D print | egrep "Model|$D" ; echo '## Version' ; smartctl -a /dev/$D | egrep 'Version|SCT' ; echo '## scterc' ; smartctl -l scterc /dev/$D | egrep SCT ; echo '' ; done
## parted
Model: ATA ST4000DM000-1F21 (scsi)
Disk /dev/sda: 4001GB
## Version
Firmware Version: CC52
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
SCT capabilities: (0x1085) SCT Status supported.
SMART Error Log Version: 1
## scterc
SCT Error Recovery Control command not supported
## parted
Model: ATA ST4000DM000-1F21 (scsi)
Disk /dev/sdb: 4001GB
## Version
Firmware Version: CC54
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
SCT capabilities: (0x1085) SCT Status supported.
SMART Error Log Version: 1
## scterc
SCT Error Recovery Control command not supported
## parted
Model: ATA ST4000DM000-1F21 (scsi)
Disk /dev/sdc: 4001GB
## Version
Firmware Version: CC54
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
SCT capabilities: (0x1085) SCT Status supported.
SMART Error Log Version: 1
## scterc
SCT Error Recovery Control command not supported
## parted
Model: ATA ST4000DM000-1F21 (scsi)
Disk /dev/sdd: 4001GB
## Version
Firmware Version: CC54
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
SCT capabilities: (0x1085) SCT Status supported.
SMART Error Log Version: 1
## scterc
SCT Error Recovery Control command not supported
Curiously, querying scterc directly as the wiki instructs says "not
supported", while the general smartctl output still reports "SCT Status
supported". I'm not sure how to interpret that...
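If the timeout mismatch does turn out to matter, my understanding of that
wiki page is that the fallback for drives that can't do scterc is to raise
the kernel's command timeout instead, roughly:

  for D in sda sdb sdc sdd ; do
    echo 180 > /sys/block/$D/device/timeout
  done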
%
% I'll let someone else play about with all the device mapper stuff, I'm
% only just getting into it, but as I say, drop sdd and you should get
% your array back with pretty much no corruption. Adding sdd runs the risk
% of corrupting much more ...
I could believe that; thanks. But we still aren't up on three.
Here is everything from the Asking page:
diskfarm:root:11:/mnt/scratch/disks> for D in sd{a,b,c,d} ; do smartctl --xall /dev/$D >smartctl--xall.$D.out ; done
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.16.5-64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Desktop HDD.15
Device Model: ST4000DM000-1F2168
Serial Number: W300EYNA
LU WWN Device Id: 5 000c50 069a8d76f
Firmware Version: CC52
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Wed May 20 19:43:02 2020 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 602) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 535) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x1085) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 117 072 006 - 166858032
3 Spin_Up_Time PO---- 092 091 000 - 0
4 Start_Stop_Count -O--CK 100 100 020 - 230
5 Reallocated_Sector_Ct PO--CK 099 099 010 - 1496
7 Seek_Error_Rate POSR-- 077 060 030 - 21729719097
9 Power_On_Hours -O--CK 036 036 000 - 56266
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 232
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 096 096 099 NOW 4
187 Reported_Uncorrect -O--CK 001 001 000 - 8955
188 Command_Timeout -O--CK 100 064 000 - 10 50 50
189 High_Fly_Writes -O-RCK 100 100 000 - 0
190 Airflow_Temperature_Cel -O---K 054 044 045 Past 46 (Min/Max 39/46 #2)
191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 167
193 Load_Cycle_Count -O--CK 024 024 000 - 152866
194 Temperature_Celsius -O---K 046 056 000 - 46 (0 17 0 0 0)
197 Current_Pending_Sector -O--C- 089 076 000 - 1944
198 Offline_Uncorrectable ----C- 089 076 000 - 1944
199 UDMA_CRC_Error_Count -OSRCK 200 001 000 - 2012
240 Head_Flying_Hours ------ 100 253 000 - 30059h+12m+46.217s
241 Total_LBAs_Written ------ 100 253 000 - 43386440059
242 Total_LBAs_Read ------ 100 253 000 - 431627548432
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 5 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 SATA NCQ Queued Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x24 GPL R/O 1223 Current Device Internal Status Data log
0x25 GPL R/O 1223 Saved Device Internal Status Data log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa1 GPL,SL VS 20 Device vendor specific log
0xa2 GPL VS 4496 Device vendor specific log
0xa8 GPL,SL VS 129 Device vendor specific log
0xa9 GPL,SL VS 1 Device vendor specific log
0xab GPL VS 1 Device vendor specific log
0xb0 GPL VS 5176 Device vendor specific log
0xbe-0xbf GPL VS 65535 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL,SL VS 10 Device vendor specific log
0xc3 GPL,SL VS 8 Device vendor specific log
0xc4 GPL,SL VS 5 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 8958 (device log contains only the most recent 20 errors)
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 8958 [17] occurred at disk power-on lifetime: 55189 hours (2299 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 4d 52 79 60 00 00 Error: UNC at LBA = 0x14d527960 = 5592217952
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 00 02 c0 00 01 4d 54 95 78 40 00 29d+01:34:04.331 READ FPDMA QUEUED
60 00 00 05 40 00 01 4d 54 90 38 40 00 29d+01:34:04.330 READ FPDMA QUEUED
60 00 00 02 c0 00 01 4d 54 8d 78 40 00 29d+01:34:04.309 READ FPDMA QUEUED
60 00 00 05 40 00 01 4d 54 88 38 40 00 29d+01:34:04.308 READ FPDMA QUEUED
60 00 00 02 c0 00 01 4d 54 85 78 40 00 29d+01:34:04.288 READ FPDMA QUEUED
Error 8957 [16] occurred at disk power-on lifetime: 55189 hours (2299 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 4d 52 73 c0 00 00 Error: UNC at LBA = 0x14d5273c0 = 5592216512
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 00 00 08 00 01 4d 52 73 f0 40 00 29d+01:33:59.553 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 52 73 e8 40 00 29d+01:33:59.543 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 52 73 e0 40 00 29d+01:33:59.532 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 52 73 d8 40 00 29d+01:33:59.532 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 52 73 d0 40 00 29d+01:33:59.523 READ FPDMA QUEUED
Error 8956 [15] occurred at disk power-on lifetime: 55189 hours (2299 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 4d 52 1a 40 00 00 Error: UNC at LBA = 0x14d521a40 = 5592193600
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 00 00 08 00 01 4d 52 1b 30 40 00 29d+01:33:55.394 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 52 1b 28 40 00 29d+01:33:55.393 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 52 1b 20 40 00 29d+01:33:55.383 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 52 1b 18 40 00 29d+01:33:55.383 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 52 1b 10 40 00 29d+01:33:55.383 READ FPDMA QUEUED
Error 8955 [14] occurred at disk power-on lifetime: 55189 hours (2299 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 4d 52 79 60 00 00 Error: UNC at LBA = 0x14d527960 = 5592217952
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 00 05 40 00 01 4d 52 b3 d0 40 00 29d+01:33:46.322 READ FPDMA QUEUED
60 00 00 05 40 00 01 4d 52 ae 90 40 00 29d+01:33:46.086 READ FPDMA QUEUED
60 00 00 05 40 00 01 4d 52 a9 50 40 00 29d+01:33:45.890 READ FPDMA QUEUED
60 00 00 05 40 00 01 4d 52 a4 10 40 00 29d+01:33:45.890 READ FPDMA QUEUED
60 00 00 05 40 00 01 4d 52 9e d0 40 00 29d+01:33:45.506 READ FPDMA QUEUED
Error 8954 [13] occurred at disk power-on lifetime: 55189 hours (2299 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 4d 52 73 c0 00 00 Error: UNC at LBA = 0x14d5273c0 = 5592216512
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 00 05 40 00 01 4d 52 94 50 40 00 29d+01:33:41.717 READ FPDMA QUEUED
60 00 00 05 40 00 01 4d 52 54 50 40 00 29d+01:33:41.717 READ FPDMA QUEUED
60 00 00 02 c0 00 01 4d 52 59 90 40 00 29d+01:33:41.716 READ FPDMA QUEUED
60 00 00 05 40 00 01 4d 52 5c 50 40 00 29d+01:33:41.716 READ FPDMA QUEUED
60 00 00 02 c0 00 01 4d 52 61 90 40 00 29d+01:33:41.716 READ FPDMA QUEUED
Error 8953 [12] occurred at disk power-on lifetime: 55189 hours (2299 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 4d 52 1a 40 00 00 Error: UNC at LBA = 0x14d521a40 = 5592193600
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 00 02 c0 00 01 4d 52 49 90 40 00 29d+01:33:38.068 READ FPDMA QUEUED
60 00 00 05 40 00 01 4d 52 44 50 40 00 29d+01:33:38.068 READ FPDMA QUEUED
60 00 00 02 c0 00 01 4d 52 41 90 40 00 29d+01:33:38.068 READ FPDMA QUEUED
60 00 00 05 40 00 01 4d 52 3c 50 40 00 29d+01:33:38.068 READ FPDMA QUEUED
60 00 00 02 c0 00 01 4d 52 39 90 40 00 29d+01:33:38.068 READ FPDMA QUEUED
Error 8952 [11] occurred at disk power-on lifetime: 55189 hours (2299 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 4d 48 ac b0 00 00 Error: UNC at LBA = 0x14d48acb0 = 5591575728
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 00 00 08 00 01 4d 48 ad a0 40 00 29d+01:33:29.016 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 48 ad 98 40 00 29d+01:33:29.005 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 48 ad 90 40 00 29d+01:33:29.004 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 48 ad 88 40 00 29d+01:33:28.995 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 48 ad 80 40 00 29d+01:33:28.995 READ FPDMA QUEUED
Error 8951 [10] occurred at disk power-on lifetime: 55189 hours (2299 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 4d 48 a4 60 00 00 Error: UNC at LBA = 0x14d48a460 = 5591573600
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 00 00 08 00 01 4d 48 a5 50 40 00 29d+01:33:24.927 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 48 a5 48 40 00 29d+01:33:24.916 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 48 a5 40 40 00 29d+01:33:24.907 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 48 a5 38 40 00 29d+01:33:24.907 READ FPDMA QUEUED
60 00 00 00 08 00 01 4d 48 a5 30 40 00 29d+01:33:24.907 READ FPDMA QUEUED
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 54480 66032
# 2 Short offline Completed: read failure 90% 54456 66032
# 3 Extended offline Completed without error 00% 18918 -
# 4 Short offline Completed without error 00% 18909 -
# 5 Extended captive Completed without error 00% 17667 -
# 6 Short captive Completed without error 00% 17659 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 522 (0x020a)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 45 Celsius
Power Cycle Min/Max Temperature: 39/45 Celsius
Lifetime Min/Max Temperature: 17/55 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Data Table command not supported
SCT Error Recovery Control command not supported
Device Statistics (GP/SMART Log 0x04) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 4 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.16.5-64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Desktop HDD.15
Device Model: ST4000DM000-1F2168
Serial Number: Z3035ZY3
LU WWN Device Id: 5 000c50 07a720d6c
Firmware Version: CC54
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Wed May 20 19:43:03 2020 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 107) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 503) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x1085) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 117 099 006 - 134186720
3 Spin_Up_Time PO---- 092 092 000 - 0
4 Start_Stop_Count -O--CK 100 100 020 - 58
5 Reallocated_Sector_Ct PO--CK 100 100 010 - 0
7 Seek_Error_Rate POSR-- 083 060 030 - 230358180
9 Power_On_Hours -O--CK 068 068 000 - 28032
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 58
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 100 100 099 - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
188 Command_Timeout -O--CK 100 100 000 - 0 0 0
189 High_Fly_Writes -O-RCK 100 100 000 - 0
190 Airflow_Temperature_Cel -O---K 056 047 045 - 44 (Min/Max 38/44)
191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 0
193 Load_Cycle_Count -O--CK 042 042 000 - 117617
194 Temperature_Celsius -O---K 044 053 000 - 44 (0 19 0 0 0)
197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
240 Head_Flying_Hours ------ 100 253 000 - 10752h+43m+59.647s
241 Total_LBAs_Written ------ 100 253 000 - 24192515816
242 Total_LBAs_Read ------ 100 253 000 - 2898959142485
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 5 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 SATA NCQ Queued Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x24 GPL R/O 1223 Current Device Internal Status Data log
0x25 GPL R/O 1223 Saved Device Internal Status Data log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa1 GPL,SL VS 20 Device vendor specific log
0xa2 GPL VS 4496 Device vendor specific log
0xa8 GPL,SL VS 129 Device vendor specific log
0xa9 GPL,SL VS 1 Device vendor specific log
0xab GPL VS 1 Device vendor specific log
0xb0 GPL VS 5176 Device vendor specific log
0xbe-0xbf GPL VS 65535 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL,SL VS 10 Device vendor specific log
0xc3 GPL,SL VS 8 Device vendor specific log
0xc4 GPL,SL VS 5 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 26246 -
# 2 Short offline Completed without error 00% 26222 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 522 (0x020a)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 44 Celsius
Power Cycle Min/Max Temperature: 38/44 Celsius
Lifetime Min/Max Temperature: 19/53 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Data Table command not supported
SCT Error Recovery Control command not supported
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 2) ==
0x01 0x008 4 58 --- Lifetime Power-On Resets
0x01 0x010 4 28032 --- Power-on Hours
0x01 0x018 6 24186897631 --- Logical Sectors Written
0x01 0x020 6 83891648 --- Number of Write Commands
0x01 0x028 6 291514196745 --- Logical Sectors Read
0x01 0x030 6 1032191692 --- Number of Read Commands
0x01 0x038 6 - --- Date and Time TimeStamp
0x03 ===== = = === == Rotating Media Statistics (rev 1) ==
0x03 0x008 4 28032 --- Spindle Motor Power-on Hours
0x03 0x010 4 7604 --- Head Flying Hours
0x03 0x018 4 117617 --- Head Load Events
0x03 0x020 4 0 --- Number of Reallocated Logical Sectors
0x03 0x028 4 0 --- Read Recovery Attempts
0x03 0x030 4 0 --- Number of Mechanical Start Failures
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 0 --- Resets Between Cmd Acceptance and Completion
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 4 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.16.5-64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Desktop HDD.15
Device Model: ST4000DM000-1F2168
Serial Number: Z3035YD9
LU WWN Device Id: 5 000c50 07a7290ae
Firmware Version: CC54
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Wed May 20 19:43:03 2020 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 117) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 497) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x1085) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 119 099 006 - 208585728
3 Spin_Up_Time PO---- 092 092 000 - 0
4 Start_Stop_Count -O--CK 100 100 020 - 58
5 Reallocated_Sector_Ct PO--CK 100 100 010 - 0
7 Seek_Error_Rate POSR-- 084 060 030 - 274639681
9 Power_On_Hours -O--CK 069 069 000 - 28031
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 58
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 100 100 099 - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
188 Command_Timeout -O--CK 100 100 000 - 0 0 0
189 High_Fly_Writes -O-RCK 100 100 000 - 0
190 Airflow_Temperature_Cel -O---K 057 048 045 - 43 (Min/Max 39/43)
191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 0
193 Load_Cycle_Count -O--CK 042 042 000 - 117317
194 Temperature_Celsius -O---K 043 052 000 - 43 (0 19 0 0 0)
197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0
199 UDMA_CRC_Error_Count -OSRCK 200 196 000 - 28238
240 Head_Flying_Hours ------ 100 253 000 - 10748h+14m+07.951s
241 Total_LBAs_Written ------ 100 253 000 - 32001521506
242 Total_LBAs_Read ------ 100 253 000 - 1283277590183
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 5 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 SATA NCQ Queued Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x24 GPL R/O 1223 Current Device Internal Status Data log
0x25 GPL R/O 1223 Saved Device Internal Status Data log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa1 GPL,SL VS 20 Device vendor specific log
0xa2 GPL VS 4496 Device vendor specific log
0xa8 GPL,SL VS 129 Device vendor specific log
0xa9 GPL,SL VS 1 Device vendor specific log
0xab GPL VS 1 Device vendor specific log
0xb0 GPL VS 5176 Device vendor specific log
0xbe-0xbf GPL VS 65535 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL,SL VS 10 Device vendor specific log
0xc3 GPL,SL VS 8 Device vendor specific log
0xc4 GPL,SL VS 5 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 26246 -
# 2 Short offline Completed without error 00% 26222 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 522 (0x020a)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 43 Celsius
Power Cycle Min/Max Temperature: 39/43 Celsius
Lifetime Min/Max Temperature: 19/52 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Data Table command not supported
SCT Error Recovery Control command not supported
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 2) ==
0x01 0x008 4 58 --- Lifetime Power-On Resets
0x01 0x010 4 28031 --- Power-on Hours
0x01 0x018 6 32001349231 --- Logical Sectors Written
0x01 0x020 6 88313724 --- Number of Write Commands
0x01 0x028 6 276446233896 --- Logical Sectors Read
0x01 0x030 6 677150509 --- Number of Read Commands
0x01 0x038 6 - --- Date and Time TimeStamp
0x03 ===== = = === == Rotating Media Statistics (rev 1) ==
0x03 0x008 4 28031 --- Spindle Motor Power-on Hours
0x03 0x010 4 7500 --- Head Flying Hours
0x03 0x018 4 117317 --- Head Load Events
0x03 0x020 4 0 --- Number of Reallocated Logical Sectors
0x03 0x028 4 0 --- Read Recovery Attempts
0x03 0x030 4 0 --- Number of Mechanical Start Failures
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 0 --- Resets Between Cmd Acceptance and Completion
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 4 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.16.5-64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Desktop HDD.15
Device Model: ST4000DM000-1F2168
Serial Number: Z3037GC5
LU WWN Device Id: 5 000c50 07a8050a3
Firmware Version: CC54
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Wed May 20 19:43:03 2020 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 117) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 493) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x1085) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 110 099 006 - 25174232
3 Spin_Up_Time PO---- 092 091 000 - 0
4 Start_Stop_Count -O--CK 100 100 020 - 61
5 Reallocated_Sector_Ct PO--CK 100 100 010 - 0
7 Seek_Error_Rate POSR-- 084 060 030 - 268138154
9 Power_On_Hours -O--CK 069 069 000 - 27636
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 61
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 100 100 099 - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
188 Command_Timeout -O--CK 100 099 000 - 1 1 1
189 High_Fly_Writes -O-RCK 100 100 000 - 0
190 Airflow_Temperature_Cel -O---K 059 052 045 - 41 (Min/Max 38/41)
191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 2
193 Load_Cycle_Count -O--CK 043 043 000 - 114940
194 Temperature_Celsius -O---K 041 048 000 - 41 (0 18 0 0 0)
197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0
199 UDMA_CRC_Error_Count -OSRCK 200 188 000 - 28116
240 Head_Flying_Hours ------ 100 253 000 - 10650h+15m+10.785s
241 Total_LBAs_Written ------ 100 253 000 - 24437975895
242 Total_LBAs_Read ------ 100 253 000 - 1681117138889
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 5 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 SATA NCQ Queued Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x24 GPL R/O 1223 Current Device Internal Status Data log
0x25 GPL R/O 1223 Saved Device Internal Status Data log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa1 GPL,SL VS 20 Device vendor specific log
0xa2 GPL VS 4496 Device vendor specific log
0xa8 GPL,SL VS 129 Device vendor specific log
0xa9 GPL,SL VS 1 Device vendor specific log
0xab GPL VS 1 Device vendor specific log
0xb0 GPL VS 5176 Device vendor specific log
0xbe-0xbf GPL VS 65535 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL,SL VS 10 Device vendor specific log
0xc3 GPL,SL VS 8 Device vendor specific log
0xc4 GPL,SL VS 5 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 26248 -
# 2 Short offline Completed without error 00% 26224 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 522 (0x020a)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 41 Celsius
Power Cycle Min/Max Temperature: 39/41 Celsius
Lifetime Min/Max Temperature: 18/48 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Data Table command not supported
SCT Error Recovery Control command not supported
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 2) ==
0x01 0x008 4 61 --- Lifetime Power-On Resets
0x01 0x010 4 27636 --- Power-on Hours
0x01 0x018 6 24437103178 --- Logical Sectors Written
0x01 0x020 6 78600744 --- Number of Write Commands
0x01 0x028 6 283564947845 --- Logical Sectors Read
0x01 0x030 6 696278294 --- Number of Read Commands
0x01 0x038 6 - --- Date and Time TimeStamp
0x03 ===== = = === == Rotating Media Statistics (rev 1) ==
0x03 0x008 4 27636 --- Spindle Motor Power-on Hours
0x03 0x010 4 7490 --- Head Flying Hours
0x03 0x018 4 114940 --- Head Load Events
0x03 0x020 4 0 --- Number of Reallocated Logical Sectors
0x03 0x028 4 0 --- Read Recovery Attempts
0x03 0x030 4 0 --- Number of Mechanical Start Failures
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 1 --- Resets Between Cmd Acceptance and Completion
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 4 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
diskfarm:root:11:/mnt/scratch/disks> for D in sd{a,b,c,d} ; do mdadm --examine /dev/${D} >mdadm--examine.${D}.out ; mdadm --examine /dev/${D}1 >mdadm--examine.${D}1.out ; done
/dev/sda:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ca7008ef:90693dae:6c231ad7:08b3f92d
Name : diskfarm:0 (local to host diskfarm)
Creation Time : Mon Feb 6 00:56:35 2017
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 7813510799 (3725.77 GiB 4000.52 GB)
Array Size : 11720265216 (11177.32 GiB 12001.55 GB)
Used Dev Size : 7813510144 (3725.77 GiB 4000.52 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=655 sectors
State : clean
Device UUID : f05a143b:50c9b024:36714b9a:44b6a159
Update Time : Mon May 18 01:10:07 2020
Checksum : 48106c75 - correct
Events : 57840
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : A..A ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdb:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ca7008ef:90693dae:6c231ad7:08b3f92d
Name : diskfarm:0 (local to host diskfarm)
Creation Time : Mon Feb 6 00:56:35 2017
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 7813510799 (3725.77 GiB 4000.52 GB)
Array Size : 11720265216 (11177.32 GiB 12001.55 GB)
Used Dev Size : 7813510144 (3725.77 GiB 4000.52 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=655 sectors
State : clean
Device UUID : bbcf5aff:e4a928b8:4fd788c2:c3f298da
Update Time : Mon May 18 01:10:07 2020
Checksum : 49035472 - correct
Events : 57840
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : A..A ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ca7008ef:90693dae:6c231ad7:08b3f92d
Name : diskfarm:0 (local to host diskfarm)
Creation Time : Mon Feb 6 00:56:35 2017
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 7813510799 (3725.77 GiB 4000.52 GB)
Array Size : 11720265216 (11177.32 GiB 12001.55 GB)
Used Dev Size : 7813510144 (3725.77 GiB 4000.52 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=655 sectors
State : clean
Device UUID : c0a32425:2d206e98:78f9c264:d39e9720
Update Time : Mon May 18 01:03:28 2020
Checksum : 374f6d76 - correct
Events : 57836
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : A.AA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0xa
Array UUID : ca7008ef:90693dae:6c231ad7:08b3f92d
Name : diskfarm:0 (local to host diskfarm)
Creation Time : Mon Feb 6 00:56:35 2017
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 7813510799 (3725.77 GiB 4000.52 GB)
Array Size : 11720265216 (11177.32 GiB 12001.55 GB)
Used Dev Size : 7813510144 (3725.77 GiB 4000.52 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Recovery Offset : 210494872 sectors
Unused Space : before=261864 sectors, after=655 sectors
State : clean
Device UUID : a1109a7b:abd58fc5:89313c87:232df49b
Update Time : Sun May 3 23:03:44 2020
Bad Block Log : 512 entries available at offset 264 sectors - bad blocks present.
Checksum : 65408715 - correct
Events : 48959
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
diskfarm:root:11:/mnt/scratch/disks> mdadm --detail /dev/md0 >mdadm--detail.md0.out
/dev/md0:
Version :
Raid Level : raid0
Total Devices : 0
State : inactive
Number Major Minor RaidDevice
diskfarm:root:11:/mnt/scratch/disks> cat /proc/mdstat >mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid5 sdf2[0] sdg2[1] sdh2[3]
1464622080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
diskfarm:root:11:/mnt/scratch/disks> ./lsdrv/lsdrv >lsdrv.out 2>&1
Traceback (most recent call last):
File "./lsdrv/lsdrv", line 423, in <module>
probe_block('/sys/block/'+x)
File "./lsdrv/lsdrv", line 419, in probe_block
probe_block(blkpath+'/'+part)
File "./lsdrv/lsdrv", line 399, in probe_block
blk.FS = "MD %s (%s/%s)%s %s" % (blk.array.md.LEVEL, blk.slave.slot, blk.array.md.raid_disks, peers, blk.slave.state)
AttributeError: 'NoneType' object has no attribute 'LEVEL'
Thank you all SO VERY MUCH. Guide me!
%
% Cheers,
% Wol
%
HANN
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
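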
* Re: failed disks, mapper, and "Invalid argument"
2020-05-20 23:53 ` David T-G
@ 2020-05-21 8:09 ` Wols Lists
2020-05-21 11:01 ` David T-G
2020-05-21 11:01 ` failed disks, mapper, and "Invalid argument" David T-G
2020-05-21 8:13 ` failed disks, mapper, and "Invalid argument" Wols Lists
1 sibling, 2 replies; 24+ messages in thread
From: Wols Lists @ 2020-05-21 8:09 UTC (permalink / raw
To: David T-G, Linux RAID list; +Cc: Phil Turmel
On 21/05/20 00:53, David T-G wrote:
> ## parted
> Model: ATA ST4000DM000-1F21 (scsi)
> Disk /dev/sdd: 4001GB
> ## Version
> Firmware Version: CC54
> ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
> SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
> SCT capabilities: (0x1085) SCT Status supported.
> SMART Error Log Version: 1
> ## scterc
> SCT Error Recovery Control command not supported
>
> Curiously, note that querying just scterc as the wiki instructs says "not
> supported", but a general smartctl query says yes. I'm not sure how to
> interpret this...
Seagate Barracudas :-(
As for smartctl, you're asking two different things. Firstly is SCT
supported (yes). Secondly, is the ERC feature supported (no).
And that second question is the killer. Your drives do not support error
recovery. Plan to replace them with ones that do ASAP!
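(On drives that do support ERC you can query and set it with smartctl --
the values are tenths of a second, so 70 means 7 seconds. A sketch; your
device names will differ:

smartctl -l scterc /dev/sda          # query the current ERC setting
smartctl -l scterc,70,70 /dev/sda    # set 7s read / 7s write recovery limits
)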
I'm currently running on two 3TB Barracudas mirrored. I've finally got
around to building a system with two 4TB Ironwolves to replace them. You
need to think about the same.
In the meantime, make sure you're running Brad's script, and watch out
for any hint of lengthening read/write times. That's unlikely to be why
your overlay drives won't mount - I suspect a problem with loopback, but
I don't know.
What I don't want to advise, but I strongly suspect will work, is to
force-assemble the two good drives and the nearly-good drive. Because it
has no redundancy it won't scramble your data because it can't do a
rebuild, but I would VERY STRONGLY suggest you download lsdrv and get
the output. The whole point of this script is to get the information you
need so that if everything does go pear shaped, you can rebuild the
metadata from first principles. It's easy - git clone, run.
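Something like this, assuming Phil's script is still at its usual GitHub
home (adjust the URL if it has moved):

git clone https://github.com/pturmel/lsdrv
./lsdrv/lsdrv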
Cheers,
Wol
* Re: failed disks, mapper, and "Invalid argument"
2020-05-20 23:53 ` David T-G
2020-05-21 8:09 ` Wols Lists
@ 2020-05-21 8:13 ` Wols Lists
2020-05-21 11:04 ` David T-G
1 sibling, 1 reply; 24+ messages in thread
From: Wols Lists @ 2020-05-21 8:13 UTC (permalink / raw
To: David T-G, Linux RAID list
On 21/05/20 00:53, David T-G wrote:
> diskfarm:root:13:/mnt/scratch/disks> mdadm --assemble --force /dev/md0 --verbose /dev/sda1 /dev/sdb1 /dev/sdc1
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sda1 is busy - skipping
> mdadm: /dev/sdb1 is busy - skipping
> mdadm: /dev/sdc1 is busy - skipping
>
> like the overlay is keeping me from the raw devices, so I'd have to tear
> down all of that to try the real thing. I'll hold off on that...
Did you do an mdadm --stop before trying the force assemble? That
implies to me you've got the remnants of a previous attempt lying around...
Not sure which command it is - "cat /proc/mdstat" maybe, but make sure
ALL your arrays are stopped (unless you know they are running okay)
before you try stuff.
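Roughly this (a sketch -- substitute whatever array names your own
mdstat shows):

cat /proc/mdstat        # lists every array the kernel currently knows about
mdadm --stop /dev/md0   # stop each array you're about to experiment with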
Cheers,
Wol
* Re: failed disks, mapper, and "Invalid argument"
2020-05-21 8:09 ` Wols Lists
@ 2020-05-21 11:01 ` David T-G
2020-05-21 11:55 ` Wols Lists
2020-05-21 11:01 ` failed disks, mapper, and "Invalid argument" David T-G
1 sibling, 1 reply; 24+ messages in thread
From: David T-G @ 2020-05-21 11:01 UTC (permalink / raw
To: Linux RAID list; +Cc: Phil Turmel
Wols, et al --
...and then Wols Lists said...
%
% On 21/05/20 00:53, David T-G wrote:
% > ## parted
% > Model: ATA ST4000DM000-1F21 (scsi)
...
% > SCT capabilities: (0x1085) SCT Status supported.
% > SMART Error Log Version: 1
% > ## scterc
% > SCT Error Recovery Control command not supported
% >
% > Curiously, note that querying just scterc as the wiki instructs says "not
% > supported", but a general smartctl query says yes. I'm not sure how to
% > interpret this...
%
% Seagate Barracudas :-(
Yep. They were good "back in the day" ...
%
% As for smartctl, you're asking two different things. Firstly is SCT
% supported (yes). Secondly, is the ERC feature supported (no).
%
% And that second question is the killer. Your drives do not support error
% recovery. Plan to replace them with ones that do ASAP!
That would be nice. I actually have wanted for quite some time
to grow these from 4T to 8T, but budget hasn't permitted. Got any
particularly-affordable recommendations?
This whole problem sounds familiar to me. I thought that it was possible
to adjust the timeouts on the software side to match the longer disk time
or similar. Of course, I didn't know that I had a real problem in the
first place ... But does that sound familiar to anyone?
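(For what it's worth, the half-remembered trick is probably the wiki's
"timeout mismatch" workaround: on drives without ERC, raise the kernel's
command timeout well past the drive's worst-case internal retry. A
sketch, with my device names:

for d in sda sdb sdc sdd ; do echo 180 > /sys/block/$d/device/timeout ; done

This isn't persistent, so it has to be re-applied on every boot.)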
%
...
%
% In the meantime, make sure you're running Brad's script, and watch out
% for any hint of lengthening read/write times. That's unlikely to be why
% your overlay drives won't mount - I suspect a problem with loopback, but
% I don't know.
I most definitely also want to be able to spot trends to get ahead of
failures. I just don't know for what to look or how to parse it to write
a script that will say "hey, this thingie here is growing, and you said
you cared ...".
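(Something crude along these lines would probably do -- a sketch, the
log path is made up; from the dumps above the counters worth watching
are Reallocated_Sector_Ct, Current_Pending_Sector, and
UDMA_CRC_Error_Count:

for d in sda sdb sdc sdd ; do
  printf '%s %s ' "$(date +%F)" "$d"
  smartctl -A /dev/$d | awk '$2 == "Current_Pending_Sector" { print $NF }'
done >> /var/log/smart-trend.log

Run it daily from cron and alarm on any value that grows.)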
%
% What I don't want to advise, but I strongly suspect will work, is to
% force-assemble the two good drives and the nearly-good drive. Because it
% has no redundancy it won't scramble your data because it can't do a
Should I, then, get rid of the mapper overlay stuff? I tried pointing to
even just three devs and got that they're busy.
% rebuild, but I would VERY STRONGLY suggest you download lsdrv and get
% the output. The whole point of this script is to get the information you
You mean the output that is some error and a few lines of traceback?
Yeah, I saw that, but I don't know how to fix it. Another problem in the
queue.
% need so that if everything does go pear shaped, you can rebuild the
% metadata from first principles. It's easy - git clone, run.
... and then debug ;-)
%
% Cheers,
% Wol
Thanks again & HAND
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
* Re: failed disks, mapper, and "Invalid argument"
2020-05-21 8:09 ` Wols Lists
2020-05-21 11:01 ` David T-G
@ 2020-05-21 11:01 ` David T-G
2020-05-21 11:24 ` David T-G
1 sibling, 1 reply; 24+ messages in thread
From: David T-G @ 2020-05-21 11:01 UTC (permalink / raw
To: Linux RAID list; +Cc: Phil Turmel
Wol, et al --
...and then Wols Lists said...
%
% On 21/05/20 00:53, David T-G wrote:
% > ## parted
% > Model: ATA ST4000DM000-1F21 (scsi)
...
% > SCT capabilities: (0x1085) SCT Status supported.
% > SMART Error Log Version: 1
% > ## scterc
% > SCT Error Recovery Control command not supported
% >
% > Curiously, note that querying just scterc as the wiki instructs says "not
% > supported", but a general smartctl query says yes. I'm not sure how to
% > interpret this...
%
% Seagate Barracudas :-(
Yep. They were good "back in the day" ...
%
% As for smartctl, you're asking two different things. Firstly is SCT
% supported (yes). Secondly, is the ERC feature supported (no).
%
% And that second question is the killer. Your drives do not support error
% recovery. Plan to replace them with ones that do ASAP!
That would be nice. I actually have wanted for quite some time
to grow these from 4T to 8T, but budget hasn't permitted. Got any
particularly-affordable recommendations?
This whole problem sounds familiar to me. I thought that it was possible
to adjust the timeouts on the software side to match the longer disk time
or similar. Of course, I didn't know that I had a real problem in the
first place ... But does that sound familiar to anyone?
%
...
%
% In the meantime, make sure you're running Brad's script, and watch out
% for any hint of lengthening read/write times. That's unlikely to be why
% your overlay drives won't mount - I suspect a problem with loopback, but
% I don't know.
I most definitely also want to be able to spot trends to get ahead of
failures. I just don't know for what to look or how to parse it to write
a script that will say "hey, this thingie here is growing, and you said
you cared ...".
%
% What I don't want to advise, but I strongly suspect will work, is to
% force-assemble the two good drives and the nearly-good drive. Because it
% has no redundancy it won't scramble your data because it can't do a
Should I, then, get rid of the mapper overlay stuff? I tried pointing to
even just three devs and got that they're busy.
% rebuild, but I would VERY STRONGLY suggest you download lsdrv and get
% the output. The whole point of this script is to get the information you
You mean the output that is some error and a few lines of traceback?
Yeah, I saw that, but I don't know how to fix it. Another problem in the
queue.
% need so that if everything does go pear shaped, you can rebuild the
% metadata from first principles. It's easy - git clone, run.
... and then debug ;-)
%
% Cheers,
% Wol
Thanks again & HAND
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
* Re: failed disks, mapper, and "Invalid argument"
2020-05-21 8:13 ` failed disks, mapper, and "Invalid argument" Wols Lists
@ 2020-05-21 11:04 ` David T-G
0 siblings, 0 replies; 24+ messages in thread
From: David T-G @ 2020-05-21 11:04 UTC (permalink / raw
To: Linux RAID list
Wol, et al --
...and then Wols Lists said...
%
% On 21/05/20 00:53, David T-G wrote:
% > diskfarm:root:13:/mnt/scratch/disks> mdadm --assemble --force /dev/md0 --verbose /dev/sda1 /dev/sdb1 /dev/sdc1
% > mdadm: looking for devices for /dev/md0
% > mdadm: /dev/sda1 is busy - skipping
% > mdadm: /dev/sdb1 is busy - skipping
% > mdadm: /dev/sdc1 is busy - skipping
% >
% > like the overlay is keeping me from the raw devices, so I'd have to tear
% > down all of that to try the real thing. I'll hold off on that...
%
% Did you do an mdadm --stop before trying the force assemble? That
% implies to me you've got the remnants of a previous attempt lying around...
Yes, I did. md0 doesn't exist at all on the system at the moment.
%
% Not sure which command it is - "cat /proc/mdstat" maybe, but make sure
% ALL your arrays are stopped (unless you know they are running okay)
% before you try stuff.
The mish-mash array (md127, and no I don't understand how these things
are named!) is fine. The problem array (md0) is on exactly those four
disks (now sda, sdb, sdc, sdd).
%
% Cheers,
% Wol
Thanks again & HAND
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
* Re: failed disks, mapper, and "Invalid argument"
2020-05-21 11:01 ` failed disks, mapper, and "Invalid argument" David T-G
@ 2020-05-21 11:24 ` David T-G
2020-05-21 12:00 ` Wols Lists
0 siblings, 1 reply; 24+ messages in thread
From: David T-G @ 2020-05-21 11:24 UTC (permalink / raw
To: Linux RAID list
Wol, et al --
...and then davidtg-robot@justpickone.org said...
%
% ...and then Wols Lists said...
% %
...
% % What I don't want to advise, but I strongly suspect will work, is to
% % force-assemble the two good drives and the nearly-good drive. Because it
% % has no redundancy it won't scramble your data because it can't do a
%
% Should I, then, get rid of the mapper overlay stuff? I tried pointing to
% even just three devs and got that they're busy.
[snip]
I was thinking of this last night but hesitant, so I went ahead and tried
it this morning. Perhaps my overlay and mapper config was all broken,
because this apparently worked out. Yay, part one.
diskfarm:root:13:/mnt/scratch/disks> parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES
diskfarm:root:13:/mnt/scratch/disks> parallel losetup -d ::: /dev/loop1[01234]
losetup: /dev/loop11: detach failed: No such device or address
losetup: /dev/loop12: detach failed: No such device or address
losetup: /dev/loop13: detach failed: No such device or address
losetup: /dev/loop14: detach failed: No such device or address
This was odd... Yes, I know I listed too many, but I couldn't remember
whether or not I started counting at zero.
diskfarm:root:14:~> ls -goh /dev/loop1?
brw-rw---- 1 7, 11 May 21 07:15 /dev/loop11
brw-rw---- 1 7, 12 May 21 07:15 /dev/loop12
brw-rw---- 1 7, 13 May 21 07:15 /dev/loop13
brw-rw---- 1 7, 14 May 21 07:15 /dev/loop14
diskfarm:root:13:/mnt/scratch/disks> parallel losetup -d ::: /dev/loop1[1234]
losetup: /dev/loop11: detach failed: No such device or address
losetup: /dev/loop12: detach failed: No such device or address
losetup: /dev/loop13: detach failed: No such device or address
losetup: /dev/loop14: detach failed: No such device or address
Even listing only the actual devices didn't seem to help much. Huh?
Never mind; let's move on.
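(The "detach failed" errors just mean those loop devices had nothing
attached any more -- the /dev/loopN nodes stick around either way. The
quick way to see what is actually in use:

losetup -a    # lists only loop devices with a backing file attached
)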
diskfarm:root:13:/mnt/scratch/disks> dmsetup status
No devices found
diskfarm:root:13:/mnt/scratch/disks> mdadm --assemble --force /dev/md0 --verbose /dev/sda1 /dev/sdb1 /dev/sdc1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2.
mdadm: forcing event count in /dev/sdc1(2) from 57836 upto 57840
mdadm: clearing FAULTY flag for device 2 in /dev/md0 for /dev/sdc1
mdadm: Marking array /dev/md0 as 'clean'
mdadm: no uptodate device for slot 1 of /dev/md0
mdadm: added /dev/sdc1 to /dev/md0 as 2
mdadm: added /dev/sda1 to /dev/md0 as 3
mdadm: added /dev/sdb1 to /dev/md0 as 0
mdadm: /dev/md0 has been started with 3 drives (out of 4).
diskfarm:root:13:/mnt/scratch/disks> cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active (auto-read-only) raid5 sdb1[0] sda1[4] sdc1[3]
11720265216 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [U_UU]
md127 : active raid5 sdf2[0] sdg2[1] sdh2[3]
1464622080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
This looks good! No protection, but it functions.
diskfarm:root:13:/mnt/scratch/disks> mount /mnt/4Traid5md
diskfarm:root:13:/mnt/scratch/disks> df -kh !$
df -kh /mnt/4Traid5md
Filesystem Size Used Avail Use% Mounted on
/dev/md0p1 11T 11T 3.7G 100% /mnt/4Traid5md
Sure enough, there it is. Yay.
Now ... What do I do with the last drive? Can I put it back in and let
it catch up, or should it reinitialize and build from scratch?
Thanks again & HAND
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
* Re: failed disks, mapper, and "Invalid argument"
2020-05-21 11:01 ` David T-G
@ 2020-05-21 11:55 ` Wols Lists
2020-05-21 12:30 ` disks & prices plus python (was "Re: failed disks, mapper, and "Invalid argument"") David T-G
0 siblings, 1 reply; 24+ messages in thread
From: Wols Lists @ 2020-05-21 11:55 UTC (permalink / raw
To: David T-G, Linux RAID list; +Cc: Phil Turmel
On 21/05/20 12:01, David T-G wrote:
> Wols, et al --
>
> ...and then Wols Lists said...
> %
> % On 21/05/20 00:53, David T-G wrote:
> % > ## parted
> % > Model: ATA ST4000DM000-1F21 (scsi)
> ...
> % > SCT capabilities: (0x1085) SCT Status supported.
> % > SMART Error Log Version: 1
> % > ## scterc
> % > SCT Error Recovery Control command not supported
> % >
> % > Curiously, note that querying just scterc as the wiki instructs says "not
> % > supported", but a general smartctl query says yes. I'm not sure how to
> % > interpret this...
> %
> % Seagate Barracudas :-(
>
> Yep. They were good "back in the day" ...
>
Still are. Just not for raid ...
>
> %
> % As for smartctl, you're asking two different things. Firstly is SCT
> % supported (yes). Secondly, is the ERC feature supported (no).
> %
> % And that second question is the killer. Your drives do not support error
> % recovery. Plan to replace them with ones that do ASAP!
>
> That would be nice. I actually have wanted for quite some time
> to grow these from 4T to 8T, but budget hasn't permitted. Got any
> particularly-affordable recommendations?
8TB WD Reds are still CMR and okay AT THE MOMENT. I wouldn't trust them
though (or make sure you can RMA them if they've changed!)
I haven't heard of Ironwolves using SMR (yet).
Looking quickly on Amazon
WD Red 8TB £232
Toshiba N300 8TB £239
Seagate Ironwolf 8TB £260
Seagate Ironwolf 8TB Silver £263 (optimised for raid it claims)
WD Red 8TB Pro £270
Seagate Ironwolf 8TB Pro £360
Given that the Red and the N300 are similar in price, I'd go for the
N300. Bear in mind that I *never* see those drives mentioned here, I
really don't know what they're like.
Going up a bit, Ironwolf or Red Pro? My personal preference is Ironwolf.
The Reds were always preferred on the list, but WD have really dropped
the ball with making some of these drives SMR. These SMR drives *don't*
*work* in raid full stop, which is bad seeing as they are marketed as
raid drives! I don't know about Ironwolf Silver, but if it's optimised
for raid the £3 is worth it :-)
Ironwolf Pro? Probably overkill.
On all of these, caveat emptor. I'm in the UK, so if the web page or
marketing blurb says "suitable for raid", then I can RMA them as "unfit
for purpose". I don't know what your legal regime is.
>
> This whole problem sounds familiar to me. I thought that it was possible
> to adjust the timeouts on the software side to match the longer disk time
> or similar. Of course, I didn't know that I had a real problem in the
> first place ... But does that sound familiar to anyone?
>
:-) :-) :-)
>
> %
> ...
> %
> % In the meantime, make sure you're running Brad's script, and watch out
> % for any hint of lengthening read/write times. That's unlikely to be why
> % your overlay drives won't mount - I suspect a problem with loopback, but
> % I don't know.
>
> I most definitely also want to be able to spot trends to get ahead of
> failures. I just don't know for what to look or how to parse it to write
> a script that will say "hey, this thingie here is growing, and you said
> you cared ...".
>
>
> %
> % What I don't want to advise, but I strongly suspect will work, is to
> % force-assemble the two good drives and the nearly-good drive. Because it
> % has no redundancy it won't scramble your data because it can't do a
>
> Should I, then, get rid of the mapper overlay stuff? I tried pointing to
> even just three devs and got that they're busy.
>
>
> % rebuild, but I would VERY STRONGLY suggest you download lsdrv and get
> % the output. The whole point of this script is to get the information you
>
> You mean the output that is some error and a few lines of traceback?
> Yeah, I saw that, but I don't know how to fix it. Another problem in the
> queue.
>
Last time I ran it, it was Python 2.7. I needed to edit the shebang
line. I think Phil's fixed that.
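(i.e. the first line of the script wants to read something like

#!/usr/bin/env python2

if your distro's default python is 3.x -- a sketch, only needed on
affected systems.)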
>
> % need so that if everything does go pear shaped, you can rebuild the
> % metadata from first principles. It's easy - git clone, run.
>
> ... and then debug ;-)
>
>
> %
> % Cheers,
> % Wol
>
>
> Thanks again & HAND
>
> :-D
>
* Re: failed disks, mapper, and "Invalid argument"
2020-05-21 11:24 ` David T-G
@ 2020-05-21 12:00 ` Wols Lists
2020-05-21 12:33 ` re-add syntax (was "Re: failed disks, mapper, and "Invalid argument"") David T-G
0 siblings, 1 reply; 24+ messages in thread
From: Wols Lists @ 2020-05-21 12:00 UTC (permalink / raw
To: David T-G, Linux RAID list
On 21/05/20 12:24, David T-G wrote:
> Sure enough, there it is. Yay.
>
> Now ... What do I do with the last drive? Can I put it back in and let
> it catch up, or should it reinitialize and build from scratch?
Can't remember the syntax, but there's a re-add option. If it can find
and replay a log of failed updates, it will bring the drive straight
back in. Otherwise it will rebuild from scratch.
That's probably the safest way - let mdadm choose the best option.
Oh - and when you get your Ironwolves or whatever, read up on the
replace option. Much the safest option.
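(From the mdadm man page, the shapes are roughly:

mdadm --manage /dev/md0 --re-add /dev/sdd1            # partial resync via the bitmap if possible
mdadm /dev/md0 --replace /dev/sdd1 --with /dev/sde1   # hot-replace; --with names an existing spare

Device names here are illustrative, not gospel.)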
Cheers,
Wol
* Re: disks & prices plus python (was "Re: failed disks, mapper, and "Invalid argument"")
2020-05-21 11:55 ` Wols Lists
@ 2020-05-21 12:30 ` David T-G
2020-05-21 13:07 ` antlists
0 siblings, 1 reply; 24+ messages in thread
From: David T-G @ 2020-05-21 12:30 UTC (permalink / raw
To: Linux RAID list; +Cc: Phil Turmel
Wol, et al --
...and then Wols Lists said...
%
% On 21/05/20 12:01, David T-G wrote:
% >
% > ...and then Wols Lists said...
% > %
...
% > %
% > % Seagate Barracudas :-(
% >
% > Yep. They were good "back in the day" ...
%
% Still are. Just not for raid ...
Oh! Well, that's nice to know. Of course, I had been hoping to move
these out to another system after upgrading to larger, but maybe that's
not an option :-( They are going to be worlds better than the existing
crap drives in there now, though, so here's hoping I can put them to use.
% >
...
% > % recovery. Plan to replace them with ones that do ASAP!
% >
% > That would be nice. I actually have wanted for quite some time
% > to grow these from 4T to 8T, but budget hasn't permitted. Got any
% > particularly-affordable recommendations?
%
% 8TB WD Reds are still CMR and okay AT THE MOMENT. I wouldn't trust them
% though (or make sure you can RMA them if they've changed!)
Thanks!
%
% I haven't heard of Ironwolves using SMR (yet).
%
% Looking quickly on Amazon
% WD Red 8TB £232
% Toshiba N300 8TB £239
% Seagate Ironwolf 8TB £260
% Seagate Ironwolf 8TB Silver £263 (optimised for raid it claims)
% WD Red 8TB Pro £270
% Seagate Ironwolf 8TB Pro £360
Ouch. I sure hope they're cheaper over here! Unfortunately, when I was
shopping I was looking at ... Barracudas :-/
%
% Given that the Red and the N300 are similar in price, I'd go for the
% N300. Bear in mind that I *never* see those drives mentioned here, I
% really don't know what they're like.
Thanks; I'll have a look.
%
...
% >
% > % rebuild, but I would VERY STRONGLY suggest you download lsdrv and get
% > % the output. The whole point of this script is to get the information you
% >
% > You mean the output that is some error and a few lines of traceback?
% > Yeah, I saw that, but I don't know how to fix it. Another problem in the
% > queue.
%
% Last time I ran it, it was Python 2.7. I needed to edit the shebang
% line. I think Phil's fixed that.
I checked and I have 2.7 on this box, so I figure it would work. But I
can barely spell Python, much less understand it.
Thanks again & HAND
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
* re-add syntax (was "Re: failed disks, mapper, and "Invalid argument"")
2020-05-21 12:00 ` Wols Lists
@ 2020-05-21 12:33 ` David T-G
2020-05-21 13:01 ` antlists
0 siblings, 1 reply; 24+ messages in thread
From: David T-G @ 2020-05-21 12:33 UTC (permalink / raw
To: Linux RAID list
...and then Wols Lists said...
%
% On 21/05/20 12:24, David T-G wrote:
% > Sure enough, there it is. Yay.
% >
% > Now ... What do I do with the last drive? Can I put it back in and let
% > it catch up, or should it reinitialize and build from scratch?
%
% Can't remember the syntax, but there's a re-add option. If it can find
% and replay a log of failed updates, it will bring the drive straight
% back in. Otherwise it will rebuild from scratch.
%
% That's probably the safest way - let mdadm choose the best option.
OK; yay. I'm still confused, though, between "add" and "readd". I'll
take any pointers to docs I can get.
%
% Oh - and when you get your Ironwolves or whatever, read up on the
% replace option. Much the safest option.
THAT sounds familiar. Thanks.
%
% Cheers,
% Wol
HAND
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
* Re: re-add syntax (was "Re: failed disks, mapper, and "Invalid argument"")
2020-05-21 12:33 ` re-add syntax (was "Re: failed disks, mapper, and "Invalid argument"") David T-G
@ 2020-05-21 13:01 ` antlists
2020-05-21 13:15 ` re-add syntax David T-G
0 siblings, 1 reply; 24+ messages in thread
From: antlists @ 2020-05-21 13:01 UTC (permalink / raw
To: David T-G, Linux RAID list
On 21/05/2020 13:33, David T-G wrote:
> % Can't remember the syntax, but there's a re-add option. If it can find
> % and replay a log of failed updates, it will bring the drive straight
> % back in. Otherwise it will rebuild from scratch.
> %
> % That's probably the safest way - let mdadm choose the best option.
>
> OK; yay. I'm still confused, though, between "add" and "readd". I'll
> take any pointers to docs I can get.
Add just adds the drive back and rebuilds it.
Readd will play a journal if it can. If it can't, it will fall back and
do an add.
So *you* should choose re-add. Let mdadm choose add if it can't do a re-add.
Cheers,
Wol
* Re: disks & prices plus python (was "Re: failed disks, mapper, and "Invalid argument"")
2020-05-21 12:30 ` disks & prices plus python (was "Re: failed disks, mapper, and "Invalid argument"") David T-G
@ 2020-05-21 13:07 ` antlists
2020-05-21 13:17 ` disks & prices plus python David T-G
0 siblings, 1 reply; 24+ messages in thread
From: antlists @ 2020-05-21 13:07 UTC (permalink / raw
To: David T-G, Linux RAID list
On 21/05/2020 13:30, David T-G wrote:
> % > %
% > % Seagate Barracudas :-(
> % >
> % > Yep. They were good "back in the day" ...
> %
% Still are. Just not for raid ...
>
> Oh! Well, that's nice to know. Of course, I had been hoping to move
> these out to another system after upgrading to larger, but maybe that's
> not an option :-( They are going to be worlds better than the existing
> crap drives in there now, though, so here's hoping I can put them to use.
General advice is don't use them for parity raid - ie 5 or 6! They're
okay (but not advisable) for mirrors.
So if you really want to use them in a raid array, I'd go for a 6TB
raid-10. Okay, you've lost 3TB of disk space, but you've bought a 66%
chance of surviving a 2-disk failure.
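(The shape of it, with stand-in device names:

mdadm --create /dev/md1 --level=10 --raid-devices=4 /dev/sd[wxyz]1
)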
I'm not sure what I'm going to do with mine. I've bought an add-in eSATA
card to go with my eSATA drive bay, so I may well use them as external
backups.
Cheers,
Wol
* Re: re-add syntax
2020-05-21 13:01 ` antlists
@ 2020-05-21 13:15 ` David T-G
2020-05-21 18:07 ` David T-G
0 siblings, 1 reply; 24+ messages in thread
From: David T-G @ 2020-05-21 13:15 UTC (permalink / raw
To: Linux RAID list
Wol, et al (is there anyone else, even??) --
...and then antlists said...
%
% On 21/05/2020 13:33, David T-G wrote:
% >% Can't remember the syntax, but there's a re-add option. If it can find
% >% and replay a log of failed updates, it will bring the drive straight
% >% back in. Otherwise it will rebuild from scratch.
% >%
% >% That's probably the safest way - let mdadm choose the best option.
% >
% >OK; yay. I'm still confused, though, between "add" and "readd". I'll
% >take any pointers to docs I can get.
%
% Add just adds the drive back and rebuilds it.
%
% Readd will play a journal if it can. If it can't, it will fall back
% and do an add.
OK. Sounds good.
%
% So *you* should choose re-add. Let mdadm choose add if it can't do a re-add.
Thanks. Sooooo ... Given this
diskfarm:root:10:~> cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[0] sda1[4] sdc1[3]
11720265216 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [U_UU]
md127 : active raid5 sdf2[0] sdg2[1] sdh2[3]
1464622080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
diskfarm:root:10:~> mdadm --examine /dev/sd[abcd]1 | egrep '/dev/sd|Event'
/dev/sda1:
Events : 57862
/dev/sdb1:
Events : 57862
/dev/sdc1:
Events : 57862
/dev/sdd1:
Events : 48959
does this
mdadm --manage /dev/md0 --re-add /dev/sdd1
look like the right next step?
%
% Cheers,
% Wol
Thanks again & HAND
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
* Re: disks & prices plus python
2020-05-21 13:07 ` antlists
@ 2020-05-21 13:17 ` David T-G
2020-05-21 13:42 ` Wols Lists
0 siblings, 1 reply; 24+ messages in thread
From: David T-G @ 2020-05-21 13:17 UTC (permalink / raw
To: Linux RAID list
Wol --
...and then antlists said...
%
% On 21/05/2020 13:30, David T-G wrote:
% >
% >Oh! Well, that's nice to know. Of course, I had been hoping to move
% >these out to another system after upgrading to larger, but maybe that's
% >not an option :-( They are going to be worlds better than the existing
% >crap drives in there now, though, so here's hoping I can put them to use.
%
% General advice is don't use them for parity raid - ie 5 or 6!
% They're okay (but not advisable) for mirrors.
Hmmm...
%
% So if you really want to use them in a raid array, I'd go for a 6TB
% raid-10. Okay, you've lost 3TB of disk space, but you've bought a
% 66% chance of surviving a 2-disk failure.
Heh. Except that this box has only three ports, so one was going to
become USB storage or some such anyway. But that means I can't have a
RAID10 vol ...
%
% I'm not sure what I'm going to do with mine. I've bought an add-in
% eSATA card to go with my eSATA drive bay, so I may well use them as
% external backups.
Good plan.
%
% Cheers,
% Wol
HAND
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
* Re: disks & prices plus python
2020-05-21 13:17 ` disks & prices plus python David T-G
@ 2020-05-21 13:42 ` Wols Lists
2020-05-21 13:46 ` David T-G
0 siblings, 1 reply; 24+ messages in thread
From: Wols Lists @ 2020-05-21 13:42 UTC (permalink / raw
To: David T-G, Linux RAID list
On 21/05/20 14:17, David T-G wrote:
> % So if you really want to use them in a raid array, I'd go for a 6TB
> % raid-10. Okay, you've lost 3TB of disk space, but you've bought a
> % 66% chance of surviving a 2-disk failure.
>
> Heh. Except that this box has only three ports, so one was going to
> become USB storage or some such anyway. But that means I can't have a
> RAID10 vol ...
Three SATA ports? That sounds very stingy! (Or is it four, but one's the
DVD?)
This is what I've bought for my new PC
https://www.amazon.co.uk/StarTech-com-Port-Express-eSATA-Controller-Silver-Black/dp/B00952N2DQ/ref=pd_ybh_a_47?_encoding=UTF8&psc=1&refRID=2B248DH9763DDGHGN5A6
An expensive gaming mobo, with 6 SATA ports, but if I want NVMe, or a
second graphics card, or whatever whatever, some of the lanes get taken
away. And of course I do want a 2nd graphics card (this machine is
destined to be double-headed, when I can get it to work :-)
And then for my old system, I'm planning to buy
https://www.amazon.co.uk/gp/product/B07T8XNQT6/ref=ox_sc_saved_title_2?smid=A32IGEZ3DX93HZ&psc=1
or something similar. That machine has 5 SATA ports (or only 4 if
PATA-mode is enabled), and I'm planning to stick a bunch of 500GB or 1TB
drives in it for playing with. So especially if I split the 1TB drives
into 500GB partitions, that's some humungous raids for testing with :-)
Cheers,
Wol
* Re: disks & prices plus python
2020-05-21 13:42 ` Wols Lists
@ 2020-05-21 13:46 ` David T-G
0 siblings, 0 replies; 24+ messages in thread
From: David T-G @ 2020-05-21 13:46 UTC (permalink / raw
To: Linux RAID list
Wol, et al --
...and then Wols Lists said...
%
% On 21/05/20 14:17, David T-G wrote:
% >
% > Heh. Except that this box has only three ports, so one was going to
% > become USB storage or some such anyway. But that means I can't have a
% > RAID10 vol ...
%
% Three SATA ports? That sounds very stingy! (Or is it four, but one's the
% DVD?)
[snip]
Nope; it's three. It's an old Acer tower. You wouldn't believe how old
and pieced-together my gear is ... If memory serves, we're pushing 20
years now on diskfarm, and the "new" little system is probably 10.
HAND
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
* Re: re-add syntax
2020-05-21 13:15 ` re-add syntax David T-G
@ 2020-05-21 18:07 ` David T-G
2020-05-21 18:40 ` Roger Heflin
0 siblings, 1 reply; 24+ messages in thread
From: David T-G @ 2020-05-21 18:07 UTC (permalink / raw
To: Linux RAID list
Hi, all --
...and then davidtg-robot@justpickone.org said...
%
% ...and then antlists said...
% %
% % So *you* should choose re-add. Let mdadm choose add if it can't do a re-add.
%
% Thanks. Sooooo ... Given this
...
% does this
%
% mdadm --manage /dev/md0 --re-add /dev/sdd1
%
% look like the right next step?
Perhaps it did, but it wasn't to be:
diskfarm:root:10:~> mdadm --manage /dev/md0 --re-add /dev/sdd1
mdadm: --re-add for /dev/sdd1 to /dev/md0 is not possible
So we'll try "add"
diskfarm:root:10:~> mdadm --manage /dev/md0 --add /dev/sdd1
mdadm: added /dev/sdd1
and now we wait :-)
Thanks again to all
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
* Re: re-add syntax
2020-05-21 18:07 ` David T-G
@ 2020-05-21 18:40 ` Roger Heflin
2020-05-21 22:52 ` David T-G
0 siblings, 1 reply; 24+ messages in thread
From: Roger Heflin @ 2020-05-21 18:40 UTC (permalink / raw
To: David T-G; +Cc: Linux RAID list
For re-add to work the array must have a bitmap, so that mdadm knows
what parts of the disk need updating.
mine looks like this:
md13 : active raid6 sdi3[9] sdf3[12] sdg3[6] sdd3[1] sdc3[5] sdb3[7] sde3[10]
3612623360 blocks super 1.2 level 6, 512k chunk, algorithm 2
[7/7] [UUUUUUU]
bitmap: 0/6 pages [0KB], 65536KB chunk
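If an array doesn't have one, an internal bitmap can be added to a
running array -- a sketch, best done once any rebuild has finished:

mdadm --grow /dev/md0 --bitmap=internal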
On Thu, May 21, 2020 at 1:10 PM David T-G <davidtg-robot@justpickone.org> wrote:
>
> Hi, all --
>
> ...and then davidtg-robot@justpickone.org said...
> %
> % ...and then antlists said...
> % %
> % % So *you* should choose re-add. Let mdadm choose add if it can't do a re-add.
> %
> % Thanks. Sooooo ... Given this
> ...
> % does this
> %
> % mdadm --manage /dev/md0 --re-add /dev/sdd1
> %
> % look like the right next step?
>
> Perhaps it did, but it wasn't to be:
>
> diskfarm:root:10:~> mdadm --manage /dev/md0 --re-add /dev/sdd1
> mdadm: --re-add for /dev/sdd1 to /dev/md0 is not possible
>
> So we'll try "add"
>
> diskfarm:root:10:~> mdadm --manage /dev/md0 --add /dev/sdd1
> mdadm: added /dev/sdd1
>
> and now we wait :-)
>
>
> Thanks again to all
>
> :-D
> --
> David T-G
> See http://justpickone.org/davidtg/email/
> See http://justpickone.org/davidtg/tofu.txt
>
* Re: re-add syntax
2020-05-21 18:40 ` Roger Heflin
@ 2020-05-21 22:52 ` David T-G
2020-05-21 23:17 ` antlists
0 siblings, 1 reply; 24+ messages in thread
From: David T-G @ 2020-05-21 22:52 UTC (permalink / raw
To: Linux RAID list
Roger, et al --
...and then Roger Heflin said...
%
% For re-add to work the array must have a bitmap, so that mdadm knows
% what parts of the disk need updating.
[snip]
Ahhhhh... Thanks!
I've wondered about an internal bitmap vs not. I also wonder how big the
bitmap is and where else I might stick it ...
HANW
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
* Re: re-add syntax
2020-05-21 22:52 ` David T-G
@ 2020-05-21 23:17 ` antlists
2020-05-21 23:53 ` David T-G
0 siblings, 1 reply; 24+ messages in thread
From: antlists @ 2020-05-21 23:17 UTC (permalink / raw
To: David T-G, Linux RAID list
On 21/05/2020 23:52, David T-G wrote:
> Roger, et al --
>
> ...and then Roger Heflin said...
> %
> % For re-add to work the array must have a bitmap, so that mdadm knows
> % what parts of the disk need updating.
> [snip]
>
> Ahhhhh... Thanks!
>
> I've wondered about an internal bitmap vs not. I also wonder how big the
> bitmap is and where else I might stick it ...
>
Bear in mind the bitmap is obsolete ... I need to get my head round it,
but you should upgrade from bitmap to journal ... amongst other things,
this fixes the "raid 5 write hole" - not sure what it is but it seems
inherent in the design of raid 5 that if something goes wrong it is easy
to lose data. Journalling presumably fixes that the same way it fixes
write losses in general ...
(Oh - and if you somehow manage to switch on bitmaps and journals
together the resulting array will refuse to assemble. The current tools
won't let you have both, but older versions can.)
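(If you want to try the journal route, recent mdadm has an --add-journal
option under --manage -- a sketch only, with a made-up journal device;
it needs a dedicated fast device and has version-specific restrictions,
so check your man page first:

mdadm --manage /dev/md0 --add-journal /dev/nvme0n1p1
)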
Cheers,
Wol
* Re: re-add syntax
2020-05-21 23:17 ` antlists
@ 2020-05-21 23:53 ` David T-G
0 siblings, 0 replies; 24+ messages in thread
From: David T-G @ 2020-05-21 23:53 UTC (permalink / raw
To: Linux RAID list
Wol, et al --
...and then antlists said...
%
% On 21/05/2020 23:52, David T-G wrote:
% >
...
% >
% >I've wondered about an internal bitmap vs not. I also wonder how big the
% >bitmap is and where else I might stick it ...
%
% Bear in mind the bitmap is obsolete ... I need to get my head round
% it, but you should upgrade from bitmap to journal ... amongst other
[snip]
Ahhhhh... Very good to know! Thanks.
HANW
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
end of thread
Thread overview: 24+ messages
2020-05-20 20:05 failed disks, mapper, and "Invalid argument" David T-G
2020-05-20 23:23 ` Wols Lists
2020-05-20 23:53 ` David T-G
2020-05-21 8:09 ` Wols Lists
2020-05-21 11:01 ` David T-G
2020-05-21 11:55 ` Wols Lists
2020-05-21 12:30 ` disks & prices plus python (was "Re: failed disks, mapper, and "Invalid argument"") David T-G
2020-05-21 13:07 ` antlists
2020-05-21 13:17 ` disks & prices plus python David T-G
2020-05-21 13:42 ` Wols Lists
2020-05-21 13:46 ` David T-G
2020-05-21 11:01 ` failed disks, mapper, and "Invalid argument" David T-G
2020-05-21 11:24 ` David T-G
2020-05-21 12:00 ` Wols Lists
2020-05-21 12:33 ` re-add syntax (was "Re: failed disks, mapper, and "Invalid argument"") David T-G
2020-05-21 13:01 ` antlists
2020-05-21 13:15 ` re-add syntax David T-G
2020-05-21 18:07 ` David T-G
2020-05-21 18:40 ` Roger Heflin
2020-05-21 22:52 ` David T-G
2020-05-21 23:17 ` antlists
2020-05-21 23:53 ` David T-G
2020-05-21 8:13 ` failed disks, mapper, and "Invalid argument" Wols Lists
2020-05-21 11:04 ` David T-G