11.3U2 Constant disk "non-media errors" (smartctl)

Description

After a recent upgrade from 11.2-U8 to 11.3-U2, we noticed rapid and constant growth of "non-media errors" on all 12 connected disks simultaneously. I tried to replicate the issue on a test server with the same hardware but a fresh installation (no pools), and new disks showed the same behavior.

UPDATE: This happens every time I run smartctl, like: smartctl -a /dev/da6
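
One way to confirm the trigger is to read the counter in two consecutive smartctl runs and compare the values. The following is only a minimal sketch: the function names `non_medium_count` and `smart_delta` are hypothetical helpers, and it assumes the counter appears in `smartctl -a` output on a line beginning with "Non-medium error count:", as it does for SCSI/SAS disks.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "Non-medium error count"
# line from `smartctl -a` output supplied on stdin.
# Prints nothing if no such line is present.
non_medium_count() {
    awk -F: '/Non-medium error count/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Hypothetical helper: read the counter in two consecutive smartctl runs
# and report the difference. $1 is the device node, e.g. /dev/da6.
smart_delta() {
    before=$(smartctl -a "$1" | non_medium_count)
    after=$(smartctl -a "$1" | non_medium_count)
    # Default to 0 if the counter line was missing from either run.
    echo "$1: before=${before:-0} after=${after:-0} delta=$(( ${after:-0} - ${before:-0} ))"
}
```

Running `smart_delta /dev/da6` on an idle disk would then show whether a single smartctl invocation is enough to move the counter. A delta of 0 on 11.2-U8 and a nonzero delta on 11.3-U2 would match the behavior described here.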

It affects 11.3 only; there are no such problems on 11.2-U8 or older.

After rolling back from 11.3-U2 to 11.2-U8, the errors stopped growing.
Of course, we now can't reset the counters on our new disks to their previous values...

Chassis: CSE-846E16-R1200B
Motherboard: X9DRi-F
Backplane: BPN-SAS2-846EL1 24-port 4U SAS2 6Gbps
Controller: 1x LSI00301 (9207-8i)
Disks: MB6000FEDAU (12x6TB SAS)

Please check the attached screenshots for non-media disk errors and other details.
Before the upgrade, all disks had <500 errors.

Problem/Justification

None

Impact

None

Activity

Alexander Motin August 5, 2020 at 8:07 PM

Closing due to lack of feedback.  The issue may be specific to the disk model or firmware.

Alexander Motin June 4, 2020 at 2:20 PM

Guessing the problem is in requesting the list of log pages, could you run `sg_logs /dev/da6`? It should be the same SCSI command, just run with a different tool. Will it also fail and increment the error counter? If so, then this drive is just insane: while it is OK for a drive not to support log pages, it is definitely not an error for the host to try to request them.
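
The check requested above could be scripted roughly as follows. This is a sketch under stated assumptions, not a verified procedure: `sg_non_medium_count` and `sg_logs_delta` are hypothetical helper names, the counter is read with `sg_logs --page=0x6` (the Non-medium error log page), and the parser assumes the counter line reads either "Non-medium error count = N" or "Non-medium error count: N", which may differ between sg3_utils versions.

```shell
#!/bin/sh
# Hypothetical helper: extract the counter from `sg_logs --page=0x6`
# output on stdin. Assumes a line such as "Non-medium error count = N"
# or "Non-medium error count: N" (format assumed, may vary by version).
sg_non_medium_count() {
    sed -n 's/.*Non-medium error count[ =:]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: read the counter, request the list of supported
# log pages with plain `sg_logs`, then read the counter again.
# $1 is the device node, e.g. /dev/da6.
sg_logs_delta() {
    before=$(sg_logs --page=0x6 "$1" | sg_non_medium_count)
    sg_logs "$1" >/dev/null 2>&1    # the request for the list of log pages
    after=$(sg_logs --page=0x6 "$1" | sg_non_medium_count)
    # Default to 0 if the counter line was missing from either read.
    echo "$1: before=${before:-0} after=${after:-0} delta=$(( ${after:-0} - ${before:-0} ))"
}
```

With these definitions, `sg_logs_delta /dev/da6` prints the counter before and after the request. Because the two counter reads are themselves log page requests, a nonzero delta only shows that some sg_logs request moved the counter; repeating the test without the middle command would help narrow down which one.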

Roy A Sutton May 13, 2020 at 6:47 PM

Apologies... I have no wish or intention to muddy up the original issue. My problem also began with the 11.3 upgrade, and I now see numerous recorded "Non-medium error count" entries across the affected drives (I'm not exactly sure when they were recorded). My errors are definitely not constant over time and are not caused by smartctl. My issue appears repeatably with each attempted pool scrub: the pool otherwise operates normally but fails with R/W CAM errors during a scrub. I will look elsewhere.

Alex Diamantopulo May 13, 2020 at 6:17 PM

You're having a completely different problem.

My problem is triggered by checking the SMART status of the disks by running:

smartctl -a /dev/da6

Every time I execute smartctl, each of my disks registers one more "non-media error".

Roy A Sutton May 13, 2020 at 5:17 PM

Here is the result:

$ bash
$ for d in 4 5 6 7; do sg_readcap --16 /dev/da$d; done
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Logical block provisioning: lbpme=0, lbprz=0
   Last LBA=5860533167 (0x15d50a3af), Number of logical blocks=5860533168
   Logical block length=512 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned LBA=0
Hence:
   Device size: 3000592982016 bytes, 2861588.5 MiB, 3000.59 GB, 3.00 TB
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Logical block provisioning: lbpme=0, lbprz=0
   Last LBA=5860533167 (0x15d50a3af), Number of logical blocks=5860533168
   Logical block length=512 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned LBA=0
Hence:
   Device size: 3000592982016 bytes, 2861588.5 MiB, 3000.59 GB, 3.00 TB
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Logical block provisioning: lbpme=0, lbprz=0
   Last LBA=5860533167 (0x15d50a3af), Number of logical blocks=5860533168
   Logical block length=512 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned LBA=0
Hence:
   Device size: 3000592982016 bytes, 2861588.5 MiB, 3000.59 GB, 3.00 TB
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Logical block provisioning: lbpme=0, lbprz=0
   Last LBA=5860533167 (0x15d50a3af), Number of logical blocks=5860533168
   Logical block length=512 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned LBA=0
Hence:
   Device size: 3000592982016 bytes, 2861588.5 MiB, 3000.59 GB, 3.00 TB

 available for other tests.. 

Cannot Reproduce
Created April 14, 2020 at 6:06 AM
Updated July 1, 2022 at 4:49 PM
Resolved August 5, 2020 at 8:07 PM