Command timeouts with Seagate Ironwolf 110 SSDs

Description

Motherboard: ASRock C3758D4i-4L
CPU: Intel Atom C3758
RAM: 2x 32 GB
Hard drives: 2x Seagate IronWolf 110 SSDs, 1.92 TB each
Hard disk controllers: onboard SATA first (causing trouble), then LSI-9211 (works)
Network cards: 4x onboard Marvell, appearing as Intel
Power supply is a PC Power & Cooling 500W (overkill, but taken from another box).

No matter which motherboard SATA port I use, FreeNAS won't communicate properly with the Seagate IronWolf 110 SSDs. The drives work perfectly via an LSI HBA, and Samsung 840 EVOs work perfectly when connected to the motherboard SATA ports.

The drive firmware is the latest per Seagate's serial number firmware lookup facility.

I've moved the drives a few times while troubleshooting and producing this output, so the device addresses won't match between 'camcontrol' and the messages thrown when FreeNAS is timing out.

When I boot Ubuntu on the same machine with the pool drives connected to the motherboard SATA ports, it imports the pool and writes files at full speed, without errors or warnings.

freenas# camcontrol devlist
<ATA ZA1920NM10001 011J> at scbus0 target 25 lun 0 (pass0,da0)
<ATA ZA1920NM10001 011J> at scbus0 target 26 lun 0 (pass1,da1)
<ADATA ISMS331-016GMV P0831A> at scbus1 target 0 lun 0 (ada0,pass2)
<AHCI SGPIO Enclosure 2.00 0001> at scbus5 target 0 lun 0 (ses0,pass3)
<Samsung SSD 840 EVO 250GB EXT0DB6Q> at scbus6 target 0 lun 0 (pass5,ada1)
<Samsung SSD 840 EVO 250GB EXT0BB6Q> at scbus7 target 0 lun 0 (pass6,ada2)
<AHCI SGPIO Enclosure 2.00 0001> at scbus11 target 0 lun 0 (ses1,pass4)

freenas# smartctl -x /dev/da0
smartctl 7.0 2018-12-30 r4883 [FreeBSD 12.1-STABLE amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate IronWolf 110 SATA SSD
Device Model: ZA1920NM10001
Serial Number: HKS01KQ0
LU WWN Device Id: 5 000c50 03ea14015
Firmware Version: SF44011J
User Capacity: 1,920,383,410,176 bytes [1.92 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-4, ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Mar 22 22:26:11 2020 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 1 (minimum power consumption with standby)
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Disabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x59) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 30) minutes.
SCT capabilities: (0x103d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 100 100 090 - 0
5 Reallocated_Sector_Ct O-CK 100 100 000 - 0
9 Power_On_Hours O-CK 100 100 000 - 88
12 Power_Cycle_Count O-CK 100 100 000 - 23
100 Flash_GB_Erased O-CK 100 100 000 - 11
102 Lifetime_PS4_Entry_Ct O-CK 100 100 000 - 13
103 Lifetime_PS3_Exit_Ct O-CK 100 100 000 - 9
170 Grown_Bad_Block_Ct O-CK 100 100 000 - 0
171 Program_Fail_Count O-CK 100 100 000 - 0
172 Erase_Fail_Count O-CK 100 100 000 - 0
173 Avg_Program/Erase_Ct O-CK 100 100 000 - 1
174 Unexpected_Pwr_Loss_Ct O-CK 100 100 000 - 19
177 Wear_Range_Delta PO---K 100 100 089 - 0 0 0
183 SATA_Downshift_Count O-CK 100 100 000 - 0x00000000000000
187 Uncorrectable_ECC_Ct O-CK 100 100 000 - 0
194 Temperature_Celsius O--K 030 049 000 - 30 (Min/Max 23/49)
195 RAISE_ECC_Cor_Ct O-CK 100 100 000 - 0
198 Uncor_Read_Error_Ct O-CK 100 100 000 - 0
199 UDMA_CRC_Error_Count O-CK 100 100 000 - 0
230 Drv_Life_Protect_Status PO---K 100 100 091 - 100
231 SSD_Life_Left PO--CK 100 100 010 - 0x00000000646400
232 Available_Reservd_Space POS--K 100 100 003 - 0
233 Lifetime_Wts_To_Flsh_GB O-CK 100 100 000 - 14
241 Lifetime_Wts_Frm_Hst_GB O-CK 100 100 000 - 39
242 Lifetime_Rds_Frm_Hst_GB O-CK 100 100 000 - 0
243 Free_Space OS-K 100 100 003 - 0x07270200218b89

||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x02 SL R/O 16 Comprehensive SMART error log
0x03 GPL R/O 20 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x0a GPL R/W 16 Device Statistics Notification
0x0c GPL R/O 1 Pending Defects log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x12 GPL R/O 1 SATA NCQ Non-Data log
0x13 GPL R/O 1 SATA NCQ Send and Receive log
0x24 GPL R/O 65535 Current Device Internal Status Data log
0x25 GPL R/O 65535 Saved Device Internal Status Data log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa8 SL VS 255 Device vendor specific log
0xb7 GPL VS 1024 Device vendor specific log
0xd4 GPL,SL VS 6 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
0xf7 SL - 2 Reserved
0xf8 SL - 1 Reserved
0xf9 SL - 4 Reserved
0xfa SL - 7 Reserved
0xfb GPL - 65535 Reserved

SMART Extended Comprehensive Error Log Version: 1 (20 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version: 3
SCT Version (vendor specific): 0 (0x0000)
Device State: Active (0)
Current Temperature: 33 Celsius
Power Cycle Min/Max Temperature: 29/35 Celsius
Lifetime Min/Max Temperature: 23/49 Celsius
Specified Max Operating Temperature: 116 Celsius
Under/Over Temperature Limit Count: 0/0
SMART Status: 0xc24f (PASSED)

SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: -10/116 Celsius
Min/Max Temperature Limit: -10/120 Celsius
Temperature History Size (Index): 478 (67)

Index Estimated Time Temperature Celsius
68 2020-03-22 14:29 28 *********
... ..(387 skipped). .. *********
456 2020-03-22 20:57 28 *********
457 2020-03-22 20:58 29 **********
... ..( 10 skipped). .. **********
468 2020-03-22 21:09 29 **********
469 2020-03-22 21:10 30 ***********
... ..( 7 skipped). .. ***********
477 2020-03-22 21:18 30 ***********
0 2020-03-22 21:19 ? -
1 2020-03-22 21:20 30 ***********
... ..( 65 skipped). .. ***********
67 2020-03-22 22:26 30 ***********

SCT Error Recovery Control:
Read: Disabled
Write: Disabled

Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 1) ==
0x01 0x008 4 23 --- Lifetime Power-On Resets
0x01 0x010 4 88 --- Power-on Hours
0x01 0x018 6 82842176 --- Logical Sectors Written
0x01 0x020 6 3160294 --- Number of Write Commands
0x01 0x028 6 103285 --- Logical Sectors Read
0x01 0x030 6 4516 --- Number of Read Commands
0x01 0x038 6 317265966 --- Date and Time TimeStamp
0x01 0x058 2 65447 --- Resource Availability
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 0 --- Resets Between Cmd Acceptance and Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 30 --- Current Temperature
0x05 0x010 1 28 --- Average Short Term Temperature
0x05 0x018 1 - --- Average Long Term Temperature
0x05 0x020 1 49 --- Highest Temperature
0x05 0x028 1 23 --- Lowest Temperature
0x05 0x030 1 32 --- Highest Average Short Term Temperature
0x05 0x038 1 26 --- Lowest Average Short Term Temperature
0x05 0x040 1 - --- Highest Average Long Term Temperature
0x05 0x048 1 - --- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 116 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 -10 --- Specified Minimum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 106 --- Number of Hardware Resets
0x06 0x010 4 97 --- Number of ASR Events
0x06 0x018 4 0 --- Number of Interface CRC Errors
0x07 ===== = = === == Solid State Device Statistics (rev 1) ==
0x07 0x008 1 0 --- Percentage Used Endurance Indicator

|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 2 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 3 Device-to-host register FISes sent due to a COMRESET
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0010 2 0 R_ERR response for host-to-device data FIS, non-CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x0013 2 0 R_ERR response for host-to-device non-data FIS, non-CRC
0x0002 2 0 R_ERR response for data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 0 Non-CRC errors within host-to-device FIS

Mar 18 20:07:16 freenas ZFS: vdev state changed, pool_guid=2550807933382894929 vdev_guid=14584055837392843541
Mar 18 20:07:16 freenas ZFS: vdev state changed, pool_guid=2550807933382894929 vdev_guid=348865377422514070
Mar 18 20:07:46 freenas ahcich12: Timeout on slot 8 port 0
Mar 18 20:07:46 freenas ahcich12: is 00000000 cs 00000100 ss 00000100 rs 00000100 tfd 441 serr 00000000 cmd 00044717
Mar 18 20:07:46 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:07:46 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:07:46 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:08:16 freenas ahcich12: Timeout on slot 18 port 0
Mar 18 20:08:16 freenas ahcich12: is 00000000 cs 00040000 ss 00040000 rs 00040000 tfd 441 serr 00000000 cmd 00045117
Mar 18 20:08:16 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:08:16 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:08:16 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:08:46 freenas collectd[1666]: Traceback (most recent call last):
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 66, in read
temperatures = c.call('disk.temperatures', self.disks, self.powermode, self.smartctl_args)
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 66, in read
temperatures = c.call('disk.temperatures', self.disks, self.powermode, self.smartctl_args)
File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 500, in call
raise CallTimeout("Call timeout")
middlewared.client.client.CallTimeout: Call timeout
Mar 18 20:09:17 freenas ahcich12: Timeout on slot 29 port 0
Mar 18 20:09:17 freenas ahcich12: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd 441 serr 00000000 cmd 00045c17
Mar 18 20:09:47 freenas ahcich12: Timeout on slot 8 port 0
Mar 18 20:09:47 freenas ahcich12: is 00000000 cs 00000100 ss 00000100 rs 00000100 tfd 441 serr 00000000 cmd 00044717
Mar 18 20:09:47 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:09:47 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:09:47 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:10:17 freenas ahcich12: Timeout on slot 16 port 0
Mar 18 20:10:17 freenas ahcich12: is 00000000 cs 00010000 ss 00010000 rs 00010000 tfd 441 serr 00000000 cmd 00044f17
Mar 18 20:10:17 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:10:17 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:10:17 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:10:47 freenas ahcich12: Timeout on slot 24 port 0
Mar 18 20:10:47 freenas ahcich12: is 00000000 cs 01000000 ss 01000000 rs 01000000 tfd 441 serr 00000000 cmd 00045717
Mar 18 20:10:47 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:10:47 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:10:47 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:11:18 freenas ahcich12: Timeout on slot 0 port 0
Mar 18 20:11:18 freenas ahcich12: is 00000000 cs 00000001 ss 00000001 rs 00000001 tfd 441 serr 00000000 cmd 00045f17
Mar 18 20:11:18 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:11:18 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
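Each `Timeout on slot N` line above is followed by a register dump from the AHCI port. The following is a minimal Python sketch for decoding that dump. It assumes, as in FreeBSD's ahci(4) timeout handler, the following register meanings:

- `cs` is the port's Command Issue register (PxCI).
- `ss` is SATA Active (PxSACT).
- `rs` is the driver's own mask of running slots.
- `tfd` is PxTFD, with the ATA status in bits 7:0 and the error in bits 15:8.

The bit names are the standard ATA abbreviations FreeBSD prints, for example `41 (DRDY ERR)`. The helper names are hypothetical.

```python
import re

# ATA status-register bits, in the order FreeBSD prints them.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "SERV"),
               (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR")]
# ATA error-register bits, same naming convention.
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x20, "MC"), (0x10, "IDNF"),
              (0x08, "MCR"), (0x04, "ABRT"), (0x02, "NM"), (0x01, "ILI")]

# Matches the register line that follows "ahcichN: Timeout on slot ...".
REGS = re.compile(
    r"is (?P<is>[0-9a-f]+) cs (?P<cs>[0-9a-f]+) ss (?P<ss>[0-9a-f]+) "
    r"rs (?P<rs>[0-9a-f]+) tfd (?P<tfd>[0-9a-f]+) serr (?P<serr>[0-9a-f]+) "
    r"cmd (?P<cmd>[0-9a-f]+)")


def slots(mask):
    """Command-slot numbers whose bits are set in a 32-bit slot mask."""
    return [n for n in range(32) if mask & (1 << n)]


def bit_names(value, table):
    """Names of the bits set in `value`, in the order given by `table`."""
    return [name for bit, name in table if value & bit]


def decode_timeout_regs(line):
    """Decode an 'is ... cs ... ss ... rs ... tfd ... serr ... cmd ...' line."""
    m = REGS.search(line)
    if m is None:
        raise ValueError("not an ahci timeout register line: %r" % line)
    r = {k: int(v, 16) for k, v in m.groupdict().items()}
    status, error = r["tfd"] & 0xFF, (r["tfd"] >> 8) & 0xFF
    return {
        "issued_slots": slots(r["cs"]),      # assumed: PxCI
        "ncq_active_slots": slots(r["ss"]),  # assumed: PxSACT
        "running_slots": slots(r["rs"]),     # assumed: driver's slot mask
        "status": status,
        "status_bits": bit_names(status, STATUS_BITS),
        "error": error,
        "error_bits": bit_names(error, ERROR_BITS),
        "serr": r["serr"],
    }


if __name__ == "__main__":
    line = ("ahcich12: is 00000000 cs 00000100 ss 00000100 rs 00000100 "
            "tfd 441 serr 00000000 cmd 00044717")
    regs = decode_timeout_regs(line)
    print(regs["issued_slots"], regs["status_bits"], regs["error_bits"])
```

Run against the log above, the set bit in `cs` matches the slot in each `Timeout on slot N` line (for example, `cs 01000000` is slot 24). `tfd 441` decodes to status `DRDY ERR` with error `ABRT`. The 20:09:17 entry (`cs 20000000 ss 00000000`) shows slot 29 issued with no NCQ slot active, so that particular command was apparently not a queued one.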

ON UBUNTU:

My test steps:

  • Reconnected the two IronWolf SSDs to the onboard SATA ports (removed them from the LSI HBA).

  • Booted the same machine into Ubuntu Live.

  • Imported the IronWolf SSD zpool.

  • Wrote 100 MiB of random data to the zpool.

This test passed perfectly, without any complaints.
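The write step can also be reproduced without `dd`. Below is a minimal Python sketch that does so and also spells out the arithmetic behind dd's summary line:

- `bs=1M count=100` is 100 × 1,048,576 = 104,857,600 bytes (100 MiB, about 105 MB).
- 104,857,600 bytes / 2.07261 s ≈ 50.6 MB/s, in the decimal megabytes GNU dd reports.

`write_random_file` and `decimal_mb_per_s` are hypothetical helper names.

```python
import os
import time


def write_random_file(path, block_size=1024 * 1024, count=100, fsync=False):
    """Write `count` blocks of random bytes to `path`, roughly like
    `dd if=/dev/urandom of=path bs=1M count=100`. Returns (bytes, seconds)."""
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(os.urandom(block_size))
        if fsync:
            # Plain dd does not fsync; pass fsync=True to include the flush.
            f.flush()
            os.fsync(f.fileno())
    return block_size * count, time.monotonic() - start


def decimal_mb_per_s(total_bytes, seconds):
    """Throughput in decimal megabytes (10**6 bytes) per second, as dd prints it."""
    return total_bytes / seconds / 1_000_000


if __name__ == "__main__":
    total = 1024 * 1024 * 100  # bs=1M count=100
    print(total, round(total / 1_000_000))             # 104857600 105
    print(round(decimal_mb_per_s(total, 2.07261), 1))  # 50.6
```

Unlike plain `dd`, the helper can fsync before stopping the clock, which gives a more honest number for data that has to reach the disks. Leave `fsync=False` to compare against the dd run in the transcript.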

Code:

ubuntu@ubuntu:~$ cd /mnt
ubuntu@ubuntu:/mnt$ sudo mkdir IronWolf-110-1

Code:

root@ubuntu:/mnt# zpool import Practichem-v4 -f

root@ubuntu:/mnt# zpool status
pool: Practichem-v4
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(5) for details.

scan: none requested
config:
NAME STATE READ WRITE CKSUM
Practichem-v4 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda2 ONLINE 0 0 0
sdb2 ONLINE 0 0 0
errors: No known data errors

root@ubuntu:/mnt# ls
IronWolf-110-1

root@ubuntu:/mnt# ll
total 0
drwxr-xr-x 1 root root 60 Mar 23 14:25 ./
drwxr-xr-x 1 root root 280 Mar 23 14:27 ../
drwxr-xr-x 2 root root 40 Mar 23 14:25 IronWolf-110-1/

root@ubuntu:/mnt# cd IronWolf-110-1/
root@ubuntu:/mnt/IronWolf-110-1# ll
total 0
drwxr-xr-x 2 root root 40 Mar 23 14:25 ./
drwxr-xr-x 1 root root 60 Mar 23 14:25 ../

root@ubuntu:/mnt/IronWolf-110-1# mkdir test
root@ubuntu:/mnt/IronWolf-110-1# ls
test
root@ubuntu:/mnt/IronWolf-110-1# dd if=/dev/urandom of=newfile bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 2.07261 s, 50.6 MB/s

root@ubuntu:/mnt/IronWolf-110-1# ll
total 102400
drwxr-xr-x 3 root root 80 Mar 23 14:30 ./
drwxr-xr-x 1 root root 60 Mar 23 14:25 ../
-rw-r--r-- 1 root root 104857600 Mar 23 14:30 newfile
drwxr-xr-x 2 root root 40 Mar 23 14:29 test/

root@ubuntu:/mnt/IronWolf-110-1# udevadm info --query=all --name=/dev/sda | grep ID_SERIAL
E: ID_SERIAL=ZA1920NM10001_HKS01LDV
E: ID_SERIAL_SHORT=HKS01LDV

Problem/Justification

None

Impact

None

Activity

Nikos August 29, 2020 at 3:24 PM

Hello team,

I just installed an IronWolf 110 SSD (240 GB) as a cache device and got the same result.

I was able to stop those messages with this setting:

vfs.zfs.trim.enabled=0

I assume this disables TRIM on all SSDs. Is it possible to disable it per device?

Regards,

Nicolas
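Nicolas's workaround is consistent with the log in the description. Every timed-out command that CAM prints with an ACB there is `SEND_FPDMA_QUEUED DATA SET MANAGEMENT`, and Data Set Management is the ATA command that carries TRIM.

Below is a minimal Python sketch that decodes those ACB bytes. It assumes the field order FreeBSD's CAM uses when printing them:

- command, features
- LBA low/mid/high, device
- the three extended LBA bytes, features_exp
- count, count_exp

It also assumes that the SEND FPDMA QUEUED subcommand sits in Count bits 12:8. The helper names are hypothetical, not part of any FreeNAS tool.

```python
import re
from collections import Counter

# Field order assumed from FreeBSD CAM's printout of an ATA command ("ACB: ...").
ACB_FIELDS = ("command", "features", "lba_low", "lba_mid", "lba_high", "device",
              "lba_low_exp", "lba_mid_exp", "lba_high_exp", "features_exp",
              "count", "count_exp")

# NCQ ("FPDMA QUEUED") opcodes; these carry the transfer length in Features.
FPDMA_OPCODES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED",
                 0x64: "SEND FPDMA QUEUED", 0x65: "RECEIVE FPDMA QUEUED"}
# SEND FPDMA QUEUED subcommand 0x00 is Data Set Management (used for TRIM).
SEND_SUBCOMMANDS = {0x00: "DATA SET MANAGEMENT"}

ACB_RE = re.compile(r"ACB: ((?:[0-9a-f]{2} ){11}[0-9a-f]{2})")


def decode_acb(acb):
    """Decode the 12 hex bytes printed after 'ACB:' in a CAM message."""
    values = [int(tok, 16) for tok in acb.split()]
    if len(values) != len(ACB_FIELDS):
        raise ValueError("expected 12 bytes, got %d" % len(values))
    f = dict(zip(ACB_FIELDS, values))
    lba = (f["lba_low"] | f["lba_mid"] << 8 | f["lba_high"] << 16
           | f["lba_low_exp"] << 24 | f["lba_mid_exp"] << 32
           | f["lba_high_exp"] << 40)
    name = FPDMA_OPCODES.get(f["command"], "opcode 0x%02x" % f["command"])
    if f["command"] == 0x64:
        # Subcommand sits in Count bits 12:8, i.e. the low bits of count_exp.
        sub = f["count_exp"] & 0x1F
        name += " " + SEND_SUBCOMMANDS.get(sub, "subcommand 0x%02x" % sub)
    length = None
    if f["command"] in FPDMA_OPCODES:
        # 16-bit transfer length in 512-byte blocks, taken from Features.
        length = f["features"] | f["features_exp"] << 8
    return {"name": name, "lba": lba, "transfer_length": length}


def count_commands(log_lines):
    """Tally decoded command names over every log line that carries an ACB."""
    counts = Counter()
    for line in log_lines:
        m = ACB_RE.search(line)
        if m:
            counts[decode_acb(m.group(1))["name"]] += 1
    return counts
```

Applied to the description's log, every ACB decodes to `SEND FPDMA QUEUED DATA SET MANAGEMENT` with an 8-block (8 × 512-byte) range list and an LBA field of 0, since the ranges travel in the data payload. The ACBs in Vagif's log further down decode to ordinary `WRITE FPDMA QUEUED` commands instead (for example, 256 blocks at LBA 0xF5CCF0).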

Vagif Zeynalov August 5, 2020 at 8:38 PM

My problem was completely resolved by switching to an LSI Logic LSI00301 (SAS 9207-8i) controller card.

To me it seems that today's SSDs are so fast that old controllers can't handle them, but I won't claim that as fact; it's purely a guess. 🙂

Alexander Motin August 5, 2020 at 8:14 PM

We have no such SSDs to test with, and we have not received other reports like this.

Alexander Motin August 5, 2020 at 8:14 PM

An increase in "174 Unexpected_Pwr_Loss_Ct" on each timeout screams power supply or cabling issues to me, though only the vendor knows what it really means.

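Alexander's point can be checked on a given machine as follows:

1. Snapshot the `smartctl -x` output for the disk before a burst of timeouts.
2. Snapshot it again afterwards.
3. Compare the raw value of attribute 174.

Below is a minimal Python sketch of that comparison. It assumes either of the two attribute-table layouts smartctl prints:

- the brief `ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE` form shown in the description;
- the older form with hex flags and the raw value in the tenth column.

The helper names are hypothetical.

```python
def parse_attributes(smartctl_text):
    """Map SMART attribute ID -> (name, raw value) from smartctl attribute rows."""
    attrs = {}
    for line in smartctl_text.splitlines():
        parts = line.split()
        if len(parts) < 8 or not parts[0].isdigit():
            continue  # not an attribute row
        # Brief format: ID NAME FLAGS VALUE WORST THRESH FAIL RAW...
        # Legacy format: ID NAME 0xFLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        raw_index = 9 if parts[2].startswith("0x") else 7
        if len(parts) <= raw_index:
            continue
        attrs[int(parts[0])] = (parts[1], " ".join(parts[raw_index:]))
    return attrs


def raw_int(raw):
    """Leading integer of a raw value, e.g. '30 (Min/Max 23/49)' -> 30."""
    first = raw.split()[0]
    return int(first, 16) if first.startswith("0x") else int(first)


def attribute_delta(before_text, after_text, attr_id=174):
    """How much attribute `attr_id` (default 174, Unexpected_Pwr_Loss_Ct) grew."""
    before = raw_int(parse_attributes(before_text)[attr_id][1])
    after = raw_int(parse_attributes(after_text)[attr_id][1])
    return after - before
```

For example, the description's snapshot shows a raw value of 19 for attribute 174. A hypothetical later snapshot showing 20 would give a delta of 1, which is the pattern Vagif reports below.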
Vagif Zeynalov June 6, 2020 at 8:47 AM

Hi guys!

I'm using FreeBSD 12.1-STABLE r361621 GENERIC amd64, not FreeNAS, but the description of this problem exactly matches the issue I've been experiencing with my server since the middle of May.

I had a system pool of two mirrored Samsung SSD 850 PRO 256GB drives for almost 3 years, but sometime in the middle of May one of the SSDs (let's call it SSD#1) started failing with timeouts similar to those described above.

 

May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 f0 cc f5 40 00 00 00 01 00 00
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): CAM status: ATA Status Error
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): ATA status: 41 (DRDY ERR), error: 84 (ICRC ABRT )
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): RES: 41 84 f0 cc f5 00 00 00 00 00 01
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): Retrying command, 3 more tries remain
May 28 21:36:46 media kernel: ahcich20: Timeout on slot 12 port 0
May 28 21:36:46 media kernel: ahcich20: is 00000000 cs 00003000 ss 00003000 rs 00003000 tfd d0 serr 00080800 cmd 0004cc17
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 f0 cc f5 40 00 00 00 01 00 00
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): CAM status: Command timeout
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): Retrying command, 2 more tries remain
May 28 21:37:17 media kernel: ahcich20: Timeout on slot 4 port 0
May 28 21:37:17 media kernel: ahcich20: is 04000000 cs 00000030 ss 00000030 rs 00000030 tfd c0 serr 01880c00 cmd 0004c417
May 28 21:37:17 media kernel: (ada11:ahcich20:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 00 72 fa 40 00 00 00 00 00 00
May 28 21:37:17 media kernel: (ada11:ahcich20:0:0:0): CAM status: Command timeout
May 28 21:37:17 media kernel: (ada11:ahcich20:0:0:0): Retrying command, 3 more tries remain

I was thinking: so this is what your 10-year PRO warranty looks like, Samsung?! 🙂

 

So I ordered another Samsung SSD of the same model (SSD#3) and a Seagate IronWolf 110 SATA SSD (SSD#4), thinking I would make it a three-way mirror, just in case.

When they arrived, I removed the failed SSD#1 and added the new SSD#3 to the pool, and immediately SSD#2 (the old one) started to show the same errors!!!

Then I removed SSD#2, and SSD#3, the brand new one, started to fail!!!! I was very surprised!

I removed all the Samsung SSDs and connected the Seagate SSD#4, and it failed right away! Another surprise!

I realized that it might not be an SSD problem at all. The controller?

The motherboard is an ASUS P8B WS (LGA 1155, Intel C206, ATX, Intel Xeon E3) with 2x SATA 6 Gb/s and 4x SATA 3 Gb/s ports, plus I have two PCI Express SATA III RAID controllers with 4x SATA 6 Gb/s ports each. So I started to experiment!

Long story short, I spent the whole day connecting the Seagate and/or Samsung SSDs to every kind of port, with or without other spinning drives, and everywhere I got the same result: the SSDs went down constantly!!! WHY?! 😞

Finally I found an old 2.5" spinning HDD and made it the system pool.

The conclusion: I can no longer use any SSDs in my server! What happened?! The last time I had updated the system was almost a year ago, so I updated it again over the weekend, but nothing changed!

What I noticed is that after every set of timeout errors, SMART attribute 174 Unexpected_Pwr_Loss_Ct was incremented. So it sounds like the disk was somehow being turned off and on?

What else could be wrong?

  • CPU

  • Memory

  • Video card

  • Motherboard

    • Internal controllers

  • External controllers

  • Power supply

  • Cables

Given that my server is 10+ years old, I suspect there is some problem with the power supply, and perhaps SSDs are more sensitive to it than HDDs?

Any thoughts? 😉

How old is your computer?

====
My setup

 

# camcontrol devlist
<ST3000VN007-2E4166 SC60> at scbus0 target 0 lun 0 (pass0,ada0)
<ST3000DM001-1ER166 CC25> at scbus1 target 0 lun 0 (pass1,ada1)
<ST3000DM001-1ER166 CC25> at scbus2 target 0 lun 0 (pass2,ada2)
<ST3000DM008-2DM166 CC26> at scbus3 target 0 lun 0 (pass3,ada3)
<Marvell Console 1.01> at scbus7 target 0 lun 0 (pass4)
<ST3000VN007-2E4166 SC60> at scbus8 target 0 lun 0 (pass5,ada4)
<ST3000VN007-2E4166 SC60> at scbus9 target 0 lun 0 (pass6,ada5)
<ST3000VN007-2E4166 SC60> at scbus10 target 0 lun 0 (pass7,ada6)
<ZA480NM10001 SF44011J> at scbus11 target 0 lun 0 (pass8,ada7)
<Marvell Console 1.01> at scbus15 target 0 lun 0 (pass9)
<ST3000VN007-2E4166 SC60> at scbus18 target 0 lun 0 (pass10,ada8)
<ST3000VN007-2E4166 SC60> at scbus19 target 0 lun 0 (pass11,ada9)
<TOSHIBA MK3265GSX H GJ001Q> at scbus20 target 0 lun 0 (pass12,ada10)
<ST3000VN007-2E4166 SC60> at scbus21 target 0 lun 0 (pass13,ada11)
<AHCI SGPIO Enclosure 2.00 0001> at scbus22 target 0 lun 0 (ses0,pass14)

# dmesg
---<<BOOT>>---
Copyright (c) 1992-2020 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.1-STABLE r361621 GENERIC amd64
FreeBSD clang version 10.0.0 (git@github.com:llvm/llvm-project.git llvmorg-10.0.0-0-gd32170dbd5b)
VT(vga): resolution 640x480
CPU: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz (3300.10-MHz K8-class CPU)
  Origin="GenuineIntel" Id=0x306a9 Family=0x6 Model=0x3a Stepping=9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory = 34359738368 (32768 MB)
avail memory = 33342418944 (31797 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads
random: unblocking device.
ioapic0 <Version 2.0> irqs 0-23 on motherboard
Launching APs: 1 6 5 7 3 4 2
Timecounter "TSC-low" frequency 1650049448 Hz quality 1000
random: entropy device external interface
000.000017 [4336] netmap_init netmap: loaded module
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0xffffffff81108780, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
kbd1 at kbdmux0
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
acpi0: <ALASKA A M I> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0
atrtc0: Warning: Couldn't map I/O.
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> port 0xe000-0xe07f mem 0xf6000000-0xf6ffffff,0xe0000000-0xefffffff,0xf0000000-0xf1ffffff irq 16 at device 0.0 on pci1
vgapci0: Boot video device
hdac0: <NVIDIA (0x0be3) HDA Controller> mem 0xf7080000-0xf7083fff irq 17 at device 0.1 on pci1
pcib2: <ACPI PCI-PCI bridge> irq 19 at device 6.0 on pci0
pci2: <ACPI PCI bus> on pcib2
ahci0: <Marvell 88SE9230 AHCI SATA controller> port 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem 0xf7410000-0xf74107ff irq 19 at device 0.0 on pci2
ahci0: AHCI v1.20 with 8 6Gbps ports, Port Multiplier not supported
ahci0: quirks=0x900<NOBSYRES,ALTSIG>
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ahcich6: <AHCI channel> at channel 6 on ahci0
ahcich7: <AHCI channel> at channel 7 on ahci0
pci0: <simple comms> at device 22.0 (no driver attached)
ehci0: <Intel Cougar Point USB 2.0 controller> mem 0xf7504000-0xf75043ff irq 16 at device 26.0 on pci0
usbus0: EHCI version 1.0
usbus0 on ehci0
usbus0: 480Mbps High Speed USB v2.0
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pci3: <ACPI PCI bus> on pcib3
ahci1: <Marvell 88SE9230 AHCI SATA controller> port 0xc050-0xc057,0xc040-0xc043,0xc030-0xc037,0xc020-0xc023,0xc000-0xc01f mem 0xf7310000-0xf73107ff irq 16 at device 0.0 on pci3
ahci1: AHCI v1.20 with 8 6Gbps ports, Port Multiplier not supported
ahci1: quirks=0x900<NOBSYRES,ALTSIG>
ahcich8: <AHCI channel> at channel 0 on ahci1
ahcich9: <AHCI channel> at channel 1 on ahci1
ahcich10: <AHCI channel> at channel 2 on ahci1
ahcich11: <AHCI channel> at channel 3 on ahci1
ahcich12: <AHCI channel> at channel 4 on ahci1
ahcich13: <AHCI channel> at channel 5 on ahci1
ahcich14: <AHCI channel> at channel 6 on ahci1
ahcich15: <AHCI channel> at channel 7 on ahci1
pcib4: <ACPI PCI-PCI bridge> irq 17 at device 28.5 on pci0
pci4: <ACPI PCI bus> on pcib4
em0: <Intel(R) PRO/1000 Network Connection> port 0xb000-0xb01f mem 0xf7200000-0xf721ffff,0xf7220000-0xf7223fff irq 17 at device 0.0 on pci4
em0: Using 1024 TX descriptors and 1024 RX descriptors
em0: Using 2 RX queues 2 TX queues
em0: Using MSI-X interrupts with 3 vectors
em0: Ethernet address: ac:22:0b:88:93:f8
em0: netmap queues/slots: TX 2/1024, RX 2/1024
pcib5: <ACPI PCI-PCI bridge> irq 19 at device 28.7 on pci0
pci5: <ACPI PCI bus> on pcib5
xhci0: <ASMedia ASM1042 USB 3.0 controller> mem 0xf7100000-0xf7107fff irq 19 at device 0.0 on pci5
xhci0: 32 bytes context size, 32-bit DMA
xhci0: Unable to map MSI-X table
usbus1 on xhci0
usbus1: 5.0Gbps Super Speed USB v3.0
ehci1: <Intel Cougar Point USB 2.0 controller> mem 0xf7503000-0xf75033ff irq 23 at device 29.0 on pci0
usbus2: EHCI version 1.0
usbus2 on ehci1
usbus2: 480Mbps High Speed USB v2.0
pcib6: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci6: <ACPI PCI bus> on pcib6
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
ahci2: <Intel Cougar Point AHCI SATA controller> port 0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem 0xf7502000-0xf75027ff irq 19 at device 31.2 on pci0
ahci2: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich16: <AHCI channel> at channel 0 on ahci2
ahcich17: <AHCI channel> at channel 1 on ahci2
ahcich18: <AHCI channel> at channel 2 on ahci2
ahcich19: <AHCI channel> at channel 3 on ahci2
ahcich20: <AHCI channel> at channel 4 on ahci2
ahcich21: <AHCI channel> at channel 5 on ahci2
ahciem0: <AHCI enclosure management bridge> on ahci2
acpi_button0: <Power Button> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
acpi_tz1: <Thermal Zone> on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbdc0: non-PNP ISA device will be removed from GENERIC in FreeBSD 12.
est0: <Enhanced SpeedStep Frequency Control> on cpu0
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
hdacc0: <NVIDIA GT21x HDA CODEC> at cad 0 on hdac0
hdaa0: <NVIDIA GT21x Audio Function Group> at nid 1 on hdacc0
pcm0: <NVIDIA GT21x (HDMI/DP 8ch)> at nid 5 on hdaa0
hdacc1: <NVIDIA GT21x HDA CODEC> at cad 1 on hdac0
hdaa1: <NVIDIA GT21x Audio Function Group> at nid 1 on hdacc1
pcm1: <NVIDIA GT21x (HDMI/DP 8ch)> at nid 5 on hdaa1
hdacc2: <NVIDIA GT21x HDA CODEC> at cad 2 on hdac0
hdaa2: <NVIDIA GT21x Audio Function Group> at nid 1 on hdacc2
pcm2: <NVIDIA GT21x (HDMI/DP 8ch)> at nid 5 on hdaa2
hdacc3: <NVIDIA GT21x HDA CODEC> at cad 3 on hdac0
hdaa3: <NVIDIA GT21x Audio Function Group> at nid 1 on hdacc3
pcm3: <NVIDIA GT21x (HDMI/DP 8ch)> at nid 5 on hdaa3
ugen1.1: <0x1b21 XHCI root HUB> at usbus1
ugen2.1: <Intel EHCI root HUB> at usbus2
Trying to mount root from zfs:system []...
Root mount waiting for: CAM usbus0 usbus1 usbus2
uhub0: <0x1b21 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus1
ugen0.1: <Intel EHCI root HUB> at usbus0
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2
uhub2: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0
uhub0: 4 ports with 4 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
Root mount waiting for: CAM usbus0 usbus2
ugen2.2: <vendor 0x8087 product 0x0024> at usbus2
uhub3 on uhub1
uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus2
ugen0.2: <vendor 0x8087 product 0x0024> at usbus0
uhub4 on uhub2
uhub4: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus0
ahcich11: stopping AHCI engine failed
Root mount waiting for: CAM usbus0 usbus2
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada0: Serial Number Z6A01XC7
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 2861588MB (5860533168 512 byte sectors)
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <ST3000DM001-1ER166 CC25> ACS-2 ATA SATA 3.x device
ada1: Serial Number Z500ZQB6
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 2861588MB (5860533168 512 byte sectors)
ada1: quirks=0x1<4K>
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <ST3000DM001-1ER166 CC25> ACS-2 ATA SATA 3.x device
ada2: Serial Number Z500XXQR
ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 2861588MB (5860533168 512 byte sectors)
ada2: quirks=0x1<4K>
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <ST3000DM008-2DM166 CC26> ACS-2 ATA SATA 3.x device
ada3: Serial Number Z5039HYW
ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 2861588MB (5860533168 512 byte sectors)
ada3: quirks=0x1<4K>
ada4 at ahcich8 bus 0 scbus8 target 0 lun 0
ada4: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada4: Serial Number Z731JAB7
ada4: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 2861588MB (5860533168 512 byte sectors)
ada5 at ahcich9 bus 0 scbus9 target 0 lun 0
ada5: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada5: Serial Number Z731JH0N
ada5: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 2861588MB (5860533168 512 byte sectors)
ada6 at ahcich10 bus 0 scbus10 target 0 lun 0
ada6: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada6: Serial Number Z6A0NQBA
ada6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada6: Command Queueing enabled
ada6: 2861588MB (5860533168 512 byte sectors)
ada7 at ahcich11 bus 0 scbus11 target 0 lun 0
ada7: <ZA480NM10001 SF44011J> ACS-4 ATA SATA 3.x device
ada7: Serial Number HKQ01ZML
ada7: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada7: Command Queueing enabled
ada7: 457862MB (937703088 512 byte sectors)
ada8 at ahcich18 bus 0 scbus18 target 0 lun 0
ada8: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada8: Serial Number Z6A01XKX
ada8: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada8: Command Queueing enabled
ada8: 2861588MB (5860533168 512 byte sectors)
ada9 at ahcich19 bus 0 scbus19 target 0 lun 0
ada9: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada9: Serial Number Z6A01V4G
ada9: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada9: Command Queueing enabled
ada9: 2861588MB (5860533168 512 byte sectors)
ada10 at ahcich20 bus 0 scbus20 target 0 lun 0
ada10: <TOSHIBA MK3265GSX H GJ001Q> ATA8-ACS SATA 1.x device
ada10: Serial Number 6093C319T
ada10: 150.000MB/s transfers (SATA 1.x, UDMA5, PIO 8192bytes)
ada10: Command Queueing enabled
ada10: 305245MB (625142448 512 byte sectors)
ada11 at ahcich21 bus 0 scbus21 target 0 lun 0
ada11: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada11: Serial Number Z6A01XS7
ada11: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada11: Command Queueing enabled
ada11: 2861588MB (5860533168 512 byte sectors)
ses0 at ahciem0 bus 0 scbus22 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device
pass4 at ahcich7 bus 0 scbus7 target 0 lun 0
pass4: <Marvell Console 1.01> Removable Processor SCSI device
pass4: Serial Number HKDP221516WL
pass4: 150.000MB/s transfers (SATA 1.x, UDMA4, ATAPI 12bytes, PIO 8192bytes)
pass9 at ahcich15 bus 0 scbus15 target 0 lun 0
pass9: <Marvell Console 1.01> Removable Processor SCSI device
ses0: ada8,pass10 in 'Slot 02', SATA Slot: scbus18 target 0
pass9: Serial Number HKDP221516WL
ses0: ada9,pass11 in 'Slot 03', SATA Slot: scbus19 target 0
pass9: 150.000MB/s transfers (SATA 1.x, UDMA4, ATAPI 12bytes, PIO 8192bytes)
ses0: ada10,pass12 in 'Slot 04', SATA Slot: scbus20 target 0
ses0: ada11,pass13 in 'Slot 05', SATA Slot: scbus21 target 0
uhub4: 6 ports with 6 removable, self powered
uhub3: 8 ports with 8 removable, self powered
ugen0.3: <vendor 0x0b38 product 0x0010> at usbus0
ukbd0 on uhub4
ukbd0: <vendor 0x0b38 product 0x0010, class 0/0, rev 1.10/1.02, addr 3> on usbus0

 

Cannot Reproduce

Details

Created March 23, 2020 at 3:27 PM
Updated July 1, 2022 at 4:50 PM
Resolved August 5, 2020 at 8:14 PM