11.3U2 Constant disk "non-media errors" (smartctl)
Description
Problem/Justification
Impact
Activity
Alexander Motin August 5, 2020 at 8:07 PM
Closing due to lack of feedback. The issue may be specific to the disk model or firmware.
Alexander Motin June 4, 2020 at 2:20 PM
@Alex Diamantopulo, my guess is that the problem is in requesting the list of log pages. Could you run `sg_logs /dev/da6`? It should issue the same SCSI command, just from a different tool. Does it also fail and increment the error counter? If so, then this drive is just insane: while it is OK for a drive not to support log pages, it is definitely not an error for the host to try to request them.
Roy A Sutton May 13, 2020 at 6:47 PM
@Alex Diamantopulo, apologies... I had no wish or intention to muddy up the original issue. My problem also began with the 11.3 upgrade, and I now notice numerous recorded "Non-medium error count" entries across the affected drives (I am not sure exactly when they were recorded). My errors are definitely not constant over time and are not caused by smartctl. My issue appears repeatably with each attempted pool scrub: the pool otherwise operates normally but fails with R/W CAM errors during a scrub. I will look elsewhere.
Alex Diamantopulo May 13, 2020 at 6:17 PM
@Roy A Sutton, You're having a completely different problem.
My problem is triggered by checking the SMART status of the disks by running:
smartctl -a /dev/da6
Every time smartctl is executed, my disks register one more "non-media error".
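One way to confirm this report is to read the counter twice in a row and compare the values. The following is only a minimal sketch: the helper names `nonmedium_count` and `smartctl_delta` are hypothetical, and it assumes the drive's `smartctl -a` output contains a line of the form `Non-medium error count: N`, as SAS drives normally do. On an affected system the delta would be expected to be at least 1; on an unaffected one it should normally be 0.

```shell
# Hypothetical helper: read `smartctl -a` output on stdin and print the
# numeric value of the "Non-medium error count" line (nothing if absent).
nonmedium_count() {
    awk -F: '/Non-medium error count/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Hypothetical helper: query one device twice and print the difference.
# Needs root and smartmontools; nothing runs until this function is called.
smartctl_delta() {
    dev=$1
    before=$(smartctl -a "$dev" | nonmedium_count)
    after=$(smartctl -a "$dev" | nonmedium_count)
    if [ -z "$before" ] || [ -z "$after" ]; then
        echo "$dev: no non-medium error counter found" >&2
        return 1
    fi
    echo "$dev: before=$before after=$after delta=$((after - before))"
}
```

It could be invoked as `smartctl_delta /dev/da6`. Because the measurement itself uses smartctl, it only shows whether querying the drive changes the counter; it cannot tell which SCSI command is responsible.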
Roy A Sutton May 13, 2020 at 5:17 PM
@Alexander Motin, Here is the result:
$ bash
$ for d in 4 5 6 7; do sg_readcap --16 /dev/da$d; done
Read Capacity results:
Protection: prot_en=0, p_type=0, p_i_exponent=0
Logical block provisioning: lbpme=0, lbprz=0
Last LBA=5860533167 (0x15d50a3af), Number of logical blocks=5860533168
Logical block length=512 bytes
Logical blocks per physical block exponent=0
Lowest aligned LBA=0
Hence:
Device size: 3000592982016 bytes, 2861588.5 MiB, 3000.59 GB, 3.00 TB
Read Capacity results:
Protection: prot_en=0, p_type=0, p_i_exponent=0
Logical block provisioning: lbpme=0, lbprz=0
Last LBA=5860533167 (0x15d50a3af), Number of logical blocks=5860533168
Logical block length=512 bytes
Logical blocks per physical block exponent=0
Lowest aligned LBA=0
Hence:
Device size: 3000592982016 bytes, 2861588.5 MiB, 3000.59 GB, 3.00 TB
Read Capacity results:
Protection: prot_en=0, p_type=0, p_i_exponent=0
Logical block provisioning: lbpme=0, lbprz=0
Last LBA=5860533167 (0x15d50a3af), Number of logical blocks=5860533168
Logical block length=512 bytes
Logical blocks per physical block exponent=0
Lowest aligned LBA=0
Hence:
Device size: 3000592982016 bytes, 2861588.5 MiB, 3000.59 GB, 3.00 TB
Read Capacity results:
Protection: prot_en=0, p_type=0, p_i_exponent=0
Logical block provisioning: lbpme=0, lbprz=0
Last LBA=5860533167 (0x15d50a3af), Number of logical blocks=5860533168
Logical block length=512 bytes
Logical blocks per physical block exponent=0
Lowest aligned LBA=0
Hence:
Device size: 3000592982016 bytes, 2861588.5 MiB, 3000.59 GB, 3.00 TB
Available for other tests.
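For reference, the "Device size" figure in the output above follows from the last LBA and the logical block length: (Last LBA + 1) × block length. A small sketch of that arithmetic, using a hypothetical helper name `device_bytes` and assuming a shell with 64-bit integer arithmetic:

```shell
# Hypothetical helper: device size in bytes from the last LBA and the
# logical block length, as reported by `sg_readcap --16`.
device_bytes() {
    last_lba=$1
    block_len=$2
    # LBAs are zero-based, so the block count is last_lba + 1.
    echo $(( (last_lba + 1) * block_len ))
}

device_bytes 5860533167 512   # prints 3000592982016
```

This matches the 3000592982016 bytes reported for each of the four disks.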
After a recent upgrade from 11.2-U8 to 11.3-U2 we noticed rapid and constant growth of "non-media errors" on all 12 connected disks simultaneously. I tried to replicate the issue on a test server with the same hardware but a fresh installation (no pools), and new disks showed the same behavior.
UPDATE: This happens every time I run smartctl, like: smartctl -a /dev/da6
It affects 11.3 only; there are no such problems on 11.2-U8 or older.
After rolling back from 11.3-U2 to 11.2-U8 errors stopped growing.
Of course, now we can't change the counter on our new disks back to normal values...
Chassis: CSE-846E16-R1200B
Motherboard: X9DRi-F
Backplane: BPN-SAS2-846EL1 24-port 4U SAS2 6Gbps
Controller: 1x LSI00301 (9207-8i)
Disks: MB6000FEDAU (12x6TB SAS)
Please check the attached screenshots for non-media disk errors and other details.
Before the upgrade, all disks had <500 errors.