Command timeouts with Seagate Ironwolf 110 SSDs
Description
Problem/Justification
Impact
Activity
Nikos August 29, 2020 at 3:24 PM (edited)
Hello team,
I just installed an IronWolf 110 SSD (240GB) as a cache device and got the same result.
I was able to stop those messages with this option:
vfs.zfs.trim.enabled=0
I assume this disables TRIM operations on SSD devices; is it possible to do this per device?
Regards,
Nicolas
Vagif Zeynalov August 5, 2020 at 8:38 PM
@Alexander Motin my problem was completely resolved by switching to an LSI Logic LSI00301 SAS 9207-8i controller card.
To me it seems that SSDs are now so fast that old controllers can't handle them. But I won't state that as fact; it's just a guess.
Alexander Motin August 5, 2020 at 8:14 PM
We have no such SSDs to try, nor have we received other reports like this.
Alexander Motin August 5, 2020 at 8:14 PM
@Vagif Zeynalov An increase in "174 Unexpected_Pwr_Loss_Ct" on every timeout screams power supply or cabling issues to me, though only the vendor knows what it really means.
Vagif Zeynalov June 6, 2020 at 8:47 AM (edited)
Hi guys!
I'm using FreeBSD 12.1-STABLE r361621 GENERIC amd64, not FreeNAS, but the description of this problem exactly matches the issue I've been experiencing with my server since the middle of May.
I had a system pool with two mirrored Samsung SSD 850 PRO 256GB drives for almost 3 years, but in the middle of May one of the SSDs (let's call it SSD#1) started failing with timeouts similar to those described in this ticket:
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 f0 cc f5 40 00 00 00 01 00 00
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): CAM status: ATA Status Error
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): ATA status: 41 (DRDY ERR), error: 84 (ICRC ABRT )
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): RES: 41 84 f0 cc f5 00 00 00 00 00 01
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): Retrying command, 3 more tries remain
May 28 21:36:46 media kernel: ahcich20: Timeout on slot 12 port 0
May 28 21:36:46 media kernel: ahcich20: is 00000000 cs 00003000 ss 00003000 rs 00003000 tfd d0 serr 00080800 cmd 0004cc17
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 f0 cc f5 40 00 00 00 01 00 00
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): CAM status: Command timeout
May 28 21:36:46 media kernel: (ada11:ahcich20:0:0:0): Retrying command, 2 more tries remain
May 28 21:37:17 media kernel: ahcich20: Timeout on slot 4 port 0
May 28 21:37:17 media kernel: ahcich20: is 04000000 cs 00000030 ss 00000030 rs 00000030 tfd c0 serr 01880c00 cmd 0004c417
May 28 21:37:17 media kernel: (ada11:ahcich20:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 00 72 fa 40 00 00 00 00 00 00
May 28 21:37:17 media kernel: (ada11:ahcich20:0:0:0): CAM status: Command timeout
May 28 21:37:17 media kernel: (ada11:ahcich20:0:0:0): Retrying command, 3 more tries remain
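For anyone reading these lines later, here is a small sketch of how the two hex bytes in the "ATA status: 41 ... error: 84" line map to the labels the kernel prints. Only DRDY, ERR, ICRC and ABRT are confirmed by the log above; the other bit names are assumed from the usual ATA register layout.

```python
# Bit names for the ATA status and error registers, index = bit number.
# Only DRDY/ERR and ICRC/ABRT appear in the log; the rest are assumed
# from the conventional ATA layout.
STATUS_BITS = ["ERR", "IDX", "CORR", "DRQ", "SERV", "DF", "DRDY", "BSY"]
ERROR_BITS = ["ILI", "NM", "ABRT", "MCR", "IDNF", "MC", "UNC", "ICRC"]


def decode_bits(value, names):
    """Return the names of the set bits, most significant bit first."""
    return [names[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


status = decode_bits(0x41, STATUS_BITS)
error = decode_bits(0x84, ERROR_BITS)
print("status:", " ".join(status))
print("error: ", " ".join(error))
```

ICRC is an interface CRC error, meaning the corruption was detected on the SATA link and not in the flash. That would fit a cabling, controller or power cause at least as well as a failing drive.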
I thought: so this is what your 10-year PRO warranty looks like, Samsung?!
So I ordered another Samsung SSD of the same model (SSD#3) and a Seagate IronWolf 110 SATA SSD (SSD#4), thinking I'd make it a three-way mirror, just in case.
When they arrived I removed the failed SSD#1 and added the new SSD#3 to the pool, and immediately SSD#2 (the old one) started to show the same errors!!!
Then I removed SSD#2, and SSD#3, the brand new one, started to fail!!!! I was very surprised!
I removed all the Samsung SSDs and connected the Seagate SSD#4, and it failed right away! Another surprise!
I realized that it might not be an SSD problem at all. Controller?
The motherboard is an ASUS P8B WS (LGA 1155, Intel C206, ATX, Intel Xeon E3) with 2x SATA 6Gb/s and 4x SATA 3Gb/s ports, plus I have two PCI Express SATA III RAID controllers with 4x SATA 6Gb/s ports each. So I started to experiment!
Long story short, I spent the whole day connecting the Seagate and/or Samsung SSDs to every kind of port, with and without the other spinning drives, and everywhere I got the same result: the SSDs went down constantly!!! WHY?!
Finally I found an old 2" spinner HDD and made it the system pool.
The conclusion: I can no longer use any SSDs on my server! What happened?! The last time I had updated the system was almost a year ago, so I updated it again over the weekend, without any change!
What I noticed is that after every set of timeout errors the SMART value 174 Unexpected_Pwr_Loss_Ct was incremented. So it sounds like the disk was somehow being turned off and on?
What else could be wrong?
CPU
Memory
Video card
Motherboard
Internal controllers
External controllers
Power supply
Cables
Bearing in mind that my server is 10+ years old, I suspect there is some problem with the power supply, and perhaps SSDs are more sensitive to it than HDDs?
Any thoughts?
@Nick DeMarco how old is your computer?
====
My setup
# camcontrol devlist
<ST3000VN007-2E4166 SC60> at scbus0 target 0 lun 0 (pass0,ada0)
<ST3000DM001-1ER166 CC25> at scbus1 target 0 lun 0 (pass1,ada1)
<ST3000DM001-1ER166 CC25> at scbus2 target 0 lun 0 (pass2,ada2)
<ST3000DM008-2DM166 CC26> at scbus3 target 0 lun 0 (pass3,ada3)
<Marvell Console 1.01> at scbus7 target 0 lun 0 (pass4)
<ST3000VN007-2E4166 SC60> at scbus8 target 0 lun 0 (pass5,ada4)
<ST3000VN007-2E4166 SC60> at scbus9 target 0 lun 0 (pass6,ada5)
<ST3000VN007-2E4166 SC60> at scbus10 target 0 lun 0 (pass7,ada6)
<ZA480NM10001 SF44011J> at scbus11 target 0 lun 0 (pass8,ada7)
<Marvell Console 1.01> at scbus15 target 0 lun 0 (pass9)
<ST3000VN007-2E4166 SC60> at scbus18 target 0 lun 0 (pass10,ada8)
<ST3000VN007-2E4166 SC60> at scbus19 target 0 lun 0 (pass11,ada9)
<TOSHIBA MK3265GSX H GJ001Q> at scbus20 target 0 lun 0 (pass12,ada10)
<ST3000VN007-2E4166 SC60> at scbus21 target 0 lun 0 (pass13,ada11)
<AHCI SGPIO Enclosure 2.00 0001> at scbus22 target 0 lun 0 (ses0,pass14)
# dmesg
---<<BOOT>>---
Copyright (c) 1992-2020 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.1-STABLE r361621 GENERIC amd64
FreeBSD clang version 10.0.0 (git@github.com:llvm/llvm-project.git llvmorg-10.0.0-0-gd32170dbd5b)
VT(vga): resolution 640x480
CPU: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz (3300.10-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x306a9 Family=0x6 Model=0x3a Stepping=9
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x7fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
AMD Features2=0x1<LAHF>
Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
XSAVE Features=0x1<XSAVEOPT>
VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
TSC: P-state invariant, performance statistics
real memory = 34359738368 (32768 MB)
avail memory = 33342418944 (31797 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads
random: unblocking device.
ioapic0 <Version 2.0> irqs 0-23 on motherboard
Launching APs: 1 6 5 7 3 4 2
Timecounter "TSC-low" frequency 1650049448 Hz quality 1000
random: entropy device external interface
000.000017 [4336] netmap_init netmap: loaded module
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0xffffffff81108780, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
kbd1 at kbdmux0
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
acpi0: <ALASKA A M I> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0
atrtc0: Warning: Couldn't map I/O.
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> port 0xe000-0xe07f mem 0xf6000000-0xf6ffffff,0xe0000000-0xefffffff,0xf0000000-0xf1ffffff irq 16 at device 0.0 on pci1
vgapci0: Boot video device
hdac0: <NVIDIA (0x0be3) HDA Controller> mem 0xf7080000-0xf7083fff irq 17 at device 0.1 on pci1
pcib2: <ACPI PCI-PCI bridge> irq 19 at device 6.0 on pci0
pci2: <ACPI PCI bus> on pcib2
ahci0: <Marvell 88SE9230 AHCI SATA controller> port 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem 0xf7410000-0xf74107ff irq 19 at device 0.0 on pci2
ahci0: AHCI v1.20 with 8 6Gbps ports, Port Multiplier not supported
ahci0: quirks=0x900<NOBSYRES,ALTSIG>
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ahcich6: <AHCI channel> at channel 6 on ahci0
ahcich7: <AHCI channel> at channel 7 on ahci0
pci0: <simple comms> at device 22.0 (no driver attached)
ehci0: <Intel Cougar Point USB 2.0 controller> mem 0xf7504000-0xf75043ff irq 16 at device 26.0 on pci0
usbus0: EHCI version 1.0
usbus0 on ehci0
usbus0: 480Mbps High Speed USB v2.0
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pci3: <ACPI PCI bus> on pcib3
ahci1: <Marvell 88SE9230 AHCI SATA controller> port 0xc050-0xc057,0xc040-0xc043,0xc030-0xc037,0xc020-0xc023,0xc000-0xc01f mem 0xf7310000-0xf73107ff irq 16 at device 0.0 on pci3
ahci1: AHCI v1.20 with 8 6Gbps ports, Port Multiplier not supported
ahci1: quirks=0x900<NOBSYRES,ALTSIG>
ahcich8: <AHCI channel> at channel 0 on ahci1
ahcich9: <AHCI channel> at channel 1 on ahci1
ahcich10: <AHCI channel> at channel 2 on ahci1
ahcich11: <AHCI channel> at channel 3 on ahci1
ahcich12: <AHCI channel> at channel 4 on ahci1
ahcich13: <AHCI channel> at channel 5 on ahci1
ahcich14: <AHCI channel> at channel 6 on ahci1
ahcich15: <AHCI channel> at channel 7 on ahci1
pcib4: <ACPI PCI-PCI bridge> irq 17 at device 28.5 on pci0
pci4: <ACPI PCI bus> on pcib4
em0: <Intel(R) PRO/1000 Network Connection> port 0xb000-0xb01f mem 0xf7200000-0xf721ffff,0xf7220000-0xf7223fff irq 17 at device 0.0 on pci4
em0: Using 1024 TX descriptors and 1024 RX descriptors
em0: Using 2 RX queues 2 TX queues
em0: Using MSI-X interrupts with 3 vectors
em0: Ethernet address: ac:22:0b:88:93:f8
em0: netmap queues/slots: TX 2/1024, RX 2/1024
pcib5: <ACPI PCI-PCI bridge> irq 19 at device 28.7 on pci0
pci5: <ACPI PCI bus> on pcib5
xhci0: <ASMedia ASM1042 USB 3.0 controller> mem 0xf7100000-0xf7107fff irq 19 at device 0.0 on pci5
xhci0: 32 bytes context size, 32-bit DMA
xhci0: Unable to map MSI-X table
usbus1 on xhci0
usbus1: 5.0Gbps Super Speed USB v3.0
ehci1: <Intel Cougar Point USB 2.0 controller> mem 0xf7503000-0xf75033ff irq 23 at device 29.0 on pci0
usbus2: EHCI version 1.0
usbus2 on ehci1
usbus2: 480Mbps High Speed USB v2.0
pcib6: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci6: <ACPI PCI bus> on pcib6
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
ahci2: <Intel Cougar Point AHCI SATA controller> port 0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem 0xf7502000-0xf75027ff irq 19 at device 31.2 on pci0
ahci2: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich16: <AHCI channel> at channel 0 on ahci2
ahcich17: <AHCI channel> at channel 1 on ahci2
ahcich18: <AHCI channel> at channel 2 on ahci2
ahcich19: <AHCI channel> at channel 3 on ahci2
ahcich20: <AHCI channel> at channel 4 on ahci2
ahcich21: <AHCI channel> at channel 5 on ahci2
ahciem0: <AHCI enclosure management bridge> on ahci2
acpi_button0: <Power Button> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
acpi_tz1: <Thermal Zone> on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbdc0: non-PNP ISA device will be removed from GENERIC in FreeBSD 12.
est0: <Enhanced SpeedStep Frequency Control> on cpu0
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
hdacc0: <NVIDIA GT21x HDA CODEC> at cad 0 on hdac0
hdaa0: <NVIDIA GT21x Audio Function Group> at nid 1 on hdacc0
pcm0: <NVIDIA GT21x (HDMI/DP 8ch)> at nid 5 on hdaa0
hdacc1: <NVIDIA GT21x HDA CODEC> at cad 1 on hdac0
hdaa1: <NVIDIA GT21x Audio Function Group> at nid 1 on hdacc1
pcm1: <NVIDIA GT21x (HDMI/DP 8ch)> at nid 5 on hdaa1
hdacc2: <NVIDIA GT21x HDA CODEC> at cad 2 on hdac0
hdaa2: <NVIDIA GT21x Audio Function Group> at nid 1 on hdacc2
pcm2: <NVIDIA GT21x (HDMI/DP 8ch)> at nid 5 on hdaa2
hdacc3: <NVIDIA GT21x HDA CODEC> at cad 3 on hdac0
hdaa3: <NVIDIA GT21x Audio Function Group> at nid 1 on hdacc3
pcm3: <NVIDIA GT21x (HDMI/DP 8ch)> at nid 5 on hdaa3
ugen1.1: <0x1b21 XHCI root HUB> at usbus1
ugen2.1: <Intel EHCI root HUB> at usbus2
Trying to mount root from zfs:system []...
Root mount waiting for: CAM usbus0 usbus1 usbus2
uhub0: <0x1b21 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus1
ugen0.1: <Intel EHCI root HUB> at usbus0
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2
uhub2: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0
uhub0: 4 ports with 4 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
Root mount waiting for: CAM usbus0 usbus2
ugen2.2: <vendor 0x8087 product 0x0024> at usbus2
uhub3 on uhub1
uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus2
ugen0.2: <vendor 0x8087 product 0x0024> at usbus0
uhub4 on uhub2
uhub4: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus0
ahcich11: stopping AHCI engine failed
Root mount waiting for: CAM usbus0 usbus2
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada0: Serial Number Z6A01XC7
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 2861588MB (5860533168 512 byte sectors)
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <ST3000DM001-1ER166 CC25> ACS-2 ATA SATA 3.x device
ada1: Serial Number Z500ZQB6
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 2861588MB (5860533168 512 byte sectors)
ada1: quirks=0x1<4K>
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <ST3000DM001-1ER166 CC25> ACS-2 ATA SATA 3.x device
ada2: Serial Number Z500XXQR
ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 2861588MB (5860533168 512 byte sectors)
ada2: quirks=0x1<4K>
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <ST3000DM008-2DM166 CC26> ACS-2 ATA SATA 3.x device
ada3: Serial Number Z5039HYW
ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 2861588MB (5860533168 512 byte sectors)
ada3: quirks=0x1<4K>
ada4 at ahcich8 bus 0 scbus8 target 0 lun 0
ada4: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada4: Serial Number Z731JAB7
ada4: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 2861588MB (5860533168 512 byte sectors)
ada5 at ahcich9 bus 0 scbus9 target 0 lun 0
ada5: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada5: Serial Number Z731JH0N
ada5: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 2861588MB (5860533168 512 byte sectors)
ada6 at ahcich10 bus 0 scbus10 target 0 lun 0
ada6: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada6: Serial Number Z6A0NQBA
ada6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada6: Command Queueing enabled
ada6: 2861588MB (5860533168 512 byte sectors)
ada7 at ahcich11 bus 0 scbus11 target 0 lun 0
ada7: <ZA480NM10001 SF44011J> ACS-4 ATA SATA 3.x device
ada7: Serial Number HKQ01ZML
ada7: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada7: Command Queueing enabled
ada7: 457862MB (937703088 512 byte sectors)
ada8 at ahcich18 bus 0 scbus18 target 0 lun 0
ada8: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada8: Serial Number Z6A01XKX
ada8: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada8: Command Queueing enabled
ada8: 2861588MB (5860533168 512 byte sectors)
ada9 at ahcich19 bus 0 scbus19 target 0 lun 0
ada9: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada9: Serial Number Z6A01V4G
ada9: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada9: Command Queueing enabled
ada9: 2861588MB (5860533168 512 byte sectors)
ada10 at ahcich20 bus 0 scbus20 target 0 lun 0
ada10: <TOSHIBA MK3265GSX H GJ001Q> ATA8-ACS SATA 1.x device
ada10: Serial Number 6093C319T
ada10: 150.000MB/s transfers (SATA 1.x, UDMA5, PIO 8192bytes)
ada10: Command Queueing enabled
ada10: 305245MB (625142448 512 byte sectors)
ada11 at ahcich21 bus 0 scbus21 target 0 lun 0
ada11: <ST3000VN007-2E4166 SC60> ACS-2 ATA SATA 3.x device
ada11: Serial Number Z6A01XS7
ada11: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada11: Command Queueing enabled
ada11: 2861588MB (5860533168 512 byte sectors)
ses0 at ahciem0 bus 0 scbus22 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device
pass4 at ahcich7 bus 0 scbus7 target 0 lun 0
pass4: <Marvell Console 1.01> Removable Processor SCSI device
pass4: Serial Number HKDP221516WL
pass4: 150.000MB/s transfers (SATA 1.x, UDMA4, ATAPI 12bytes, PIO 8192bytes)
pass9 at ahcich15 bus 0 scbus15 target 0 lun 0
pass9: <Marvell Console 1.01> Removable Processor SCSI device
ses0: ada8,pass10 in 'Slot 02', SATA Slot: scbus18 target 0
pass9: Serial Number HKDP221516WL
ses0: ada9,pass11 in 'Slot 03', SATA Slot: scbus19 target 0
pass9: 150.000MB/s transfers (SATA 1.x, UDMA4, ATAPI 12bytes, PIO 8192bytes)
ses0: ada10,pass12 in 'Slot 04', SATA Slot: scbus20 target 0
ses0: ada11,pass13 in 'Slot 05', SATA Slot: scbus21 target 0
uhub4: 6 ports with 6 removable, self powered
uhub3: 8 ports with 8 removable, self powered
ugen0.3: <vendor 0x0b38 product 0x0010> at usbus0
ukbd0 on uhub4
ukbd0: <vendor 0x0b38 product 0x0010, class 0/0, rev 1.10/1.02, addr 3> on usbus0
Motherboard ASRock C3758D4i-4L
CPU: Intel Atom C3758
RAM: 2x 32 GB
Hard drives: 2x Seagate Ironwolf 110 SSDs, 1.92T each
Hard disk controllers: Onboard first (causing troubles), then LSI-9211 (works)
Network cards: 4x onboard Marvell, appearing as Intel
Power supply is a PC Power & Cooling 500W (overkill, but taken from another box).
No matter which motherboard SATA port I use, FreeNAS won't communicate properly with the Seagate Ironwolf 110 SSDs. The drives work perfectly via an LSI HBA. Samsung 840 EVOs work perfectly when connected to the motherboard SATA ports.
The drive firmware is the latest per Seagate's serial number firmware lookup facility.
I've moved the drives a few times while troubleshooting and producing this output, so the device addresses won't match between 'camcontrol' and the messages thrown when FreeNAS is timing out.
Booting Ubuntu, with the pool drives connected to the motherboard SATA ports, the OS can import the pool and write files at speed, and without errors or warnings.
freenas# camcontrol devlist
<ATA ZA1920NM10001 011J> at scbus0 target 25 lun 0 (pass0,da0)
<ATA ZA1920NM10001 011J> at scbus0 target 26 lun 0 (pass1,da1)
<ADATA ISMS331-016GMV P0831A> at scbus1 target 0 lun 0 (ada0,pass2)
<AHCI SGPIO Enclosure 2.00 0001> at scbus5 target 0 lun 0 (ses0,pass3)
<Samsung SSD 840 EVO 250GB EXT0DB6Q> at scbus6 target 0 lun 0 (pass5,ada1)
<Samsung SSD 840 EVO 250GB EXT0BB6Q> at scbus7 target 0 lun 0 (pass6,ada2)
<AHCI SGPIO Enclosure 2.00 0001> at scbus11 target 0 lun 0 (ses1,pass4)
freenas# smartctl -x /dev/da0
smartctl 7.0 2018-12-30 r4883 [FreeBSD 12.1-STABLE amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate IronWolf 110 SATA SSD
Device Model: ZA1920NM10001
Serial Number: HKS01KQ0
LU WWN Device Id: 5 000c50 03ea14015
Firmware Version: SF44011J
User Capacity: 1,920,383,410,176 bytes [1.92 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-4, ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Mar 22 22:26:11 2020 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 1 (minimum power consumption with standby)
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Disabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x59) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 30) minutes.
SCT capabilities: (0x103d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 100 100 090 - 0
5 Reallocated_Sector_Ct O-CK 100 100 000 - 0
9 Power_On_Hours O-CK 100 100 000 - 88
12 Power_Cycle_Count O-CK 100 100 000 - 23
100 Flash_GB_Erased O-CK 100 100 000 - 11
102 Lifetime_PS4_Entry_Ct O-CK 100 100 000 - 13
103 Lifetime_PS3_Exit_Ct O-CK 100 100 000 - 9
170 Grown_Bad_Block_Ct O-CK 100 100 000 - 0
171 Program_Fail_Count O-CK 100 100 000 - 0
172 Erase_Fail_Count O-CK 100 100 000 - 0
173 Avg_Program/Erase_Ct O-CK 100 100 000 - 1
174 Unexpected_Pwr_Loss_Ct O-CK 100 100 000 - 19
177 Wear_Range_Delta PO---K 100 100 089 - 0 0 0
183 SATA_Downshift_Count O-CK 100 100 000 - 0x00000000000000
187 Uncorrectable_ECC_Ct O-CK 100 100 000 - 0
194 Temperature_Celsius O--K 030 049 000 - 30 (Min/Max 23/49)
195 RAISE_ECC_Cor_Ct O-CK 100 100 000 - 0
198 Uncor_Read_Error_Ct O-CK 100 100 000 - 0
199 UDMA_CRC_Error_Count O-CK 100 100 000 - 0
230 Drv_Life_Protect_Status PO---K 100 100 091 - 100
231 SSD_Life_Left PO--CK 100 100 010 - 0x00000000646400
232 Available_Reservd_Space POS--K 100 100 003 - 0
233 Lifetime_Wts_To_Flsh_GB O-CK 100 100 000 - 14
241 Lifetime_Wts_Frm_Hst_GB O-CK 100 100 000 - 39
242 Lifetime_Rds_Frm_Hst_GB O-CK 100 100 000 - 0
243 Free_Space OS-K 100 100 003 - 0x07270200218b89
_ K auto-keep
__ C event count
___ R error rate
____ S speed/performance
_____ O updated online
______ P prefailure warning
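Since attribute 174 comes up repeatedly in this ticket, here is a minimal sketch for pulling raw values out of an attribute table like the one above, so two snapshots taken before and after a burst of timeouts can be compared. The function name is made up for illustration, and the parser assumes whitespace-separated rows in the ID / NAME / FLAGS / VALUE / WORST / THRESH / FAIL / RAW_VALUE order shown here.

```python
def parse_smart_attributes(text):
    """Map attribute ID -> (name, raw value string) for each table row."""
    attrs = {}
    for line in text.splitlines():
        # Split into at most 8 fields so a multi-word raw value stays intact.
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[0].isdigit():
            attrs[int(fields[0])] = (fields[1], fields[7].strip())
    return attrs


# Three rows copied from the smartctl output above.
sample = """\
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
12 Power_Cycle_Count O-CK 100 100 000 - 23
174 Unexpected_Pwr_Loss_Ct O-CK 100 100 000 - 19
194 Temperature_Celsius O--K 030 049 000 - 30 (Min/Max 23/49)
"""

attrs = parse_smart_attributes(sample)
for attr_id in (12, 174):
    name, raw = attrs[attr_id]
    print(attr_id, name, raw)
```

On this drive the table shows 19 unexpected power losses against 23 power cycles in 88 power-on hours, so most power cycles so far were recorded as unexpected.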
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x02 SL R/O 16 Comprehensive SMART error log
0x03 GPL R/O 20 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x0a GPL R/W 16 Device Statistics Notification
0x0c GPL R/O 1 Pending Defects log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x12 GPL R/O 1 SATA NCQ Non-Data log
0x13 GPL R/O 1 SATA NCQ Send and Receive log
0x24 GPL R/O 65535 Current Device Internal Status Data log
0x25 GPL R/O 65535 Saved Device Internal Status Data log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa8 SL VS 255 Device vendor specific log
0xb7 GPL VS 1024 Device vendor specific log
0xd4 GPL,SL VS 6 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
0xf7 SL - 2 Reserved
0xf8 SL - 1 Reserved
0xf9 SL - 4 Reserved
0xfa SL - 7 Reserved
0xfb GPL - 65535 Reserved
SMART Extended Comprehensive Error Log Version: 1 (20 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 0 (0x0000)
Device State: Active (0)
Current Temperature: 33 Celsius
Power Cycle Min/Max Temperature: 29/35 Celsius
Lifetime Min/Max Temperature: 23/49 Celsius
Specified Max Operating Temperature: 116 Celsius
Under/Over Temperature Limit Count: 0/0
SMART Status: 0xc24f (PASSED)
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: -10/116 Celsius
Min/Max Temperature Limit: -10/120 Celsius
Temperature History Size (Index): 478 (67)
Index Estimated Time Temperature Celsius
68 2020-03-22 14:29 28 *********
... ..(387 skipped). .. *********
456 2020-03-22 20:57 28 *********
457 2020-03-22 20:58 29 **********
... ..( 10 skipped). .. **********
468 2020-03-22 21:09 29 **********
469 2020-03-22 21:10 30 ***********
... ..( 7 skipped). .. ***********
477 2020-03-22 21:18 30 ***********
0 2020-03-22 21:19 ? -
1 2020-03-22 21:20 30 ***********
... ..( 65 skipped). .. ***********
67 2020-03-22 22:26 30 ***********
SCT Error Recovery Control:
Read: Disabled
Write: Disabled
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 1) ==
0x01 0x008 4 23 — Lifetime Power-On Resets
0x01 0x010 4 88 — Power-on Hours
0x01 0x018 6 82842176 — Logical Sectors Written
0x01 0x020 6 3160294 — Number of Write Commands
0x01 0x028 6 103285 — Logical Sectors Read
0x01 0x030 6 4516 — Number of Read Commands
0x01 0x038 6 317265966 — Date and Time TimeStamp
0x01 0x058 2 65447 — Resource Availability
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 — Number of Reported Uncorrectable Errors
0x04 0x010 4 0 — Resets Between Cmd Acceptance and Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 30 — Current Temperature
0x05 0x010 1 28 — Average Short Term Temperature
0x05 0x018 1 - — Average Long Term Temperature
0x05 0x020 1 49 — Highest Temperature
0x05 0x028 1 23 — Lowest Temperature
0x05 0x030 1 32 — Highest Average Short Term Temperature
0x05 0x038 1 26 — Lowest Average Short Term Temperature
0x05 0x040 1 - — Highest Average Long Term Temperature
0x05 0x048 1 - — Lowest Average Long Term Temperature
0x05 0x050 4 0 — Time in Over-Temperature
0x05 0x058 1 116 — Specified Maximum Operating Temperature
0x05 0x060 4 0 — Time in Under-Temperature
0x05 0x068 1 -10 — Specified Minimum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 106 — Number of Hardware Resets
0x06 0x010 4 97 — Number of ASR Events
0x06 0x018 4 0 — Number of Interface CRC Errors
0x07 ===== = = === == Solid State Device Statistics (rev 1) ==
0x07 0x008 1 0 — Percentage Used Endurance Indicator
_ C monitored condition met
__ D supports DSN
___ N normalized value
Pending Defects log (GP Log 0x0c)
No Defects Logged
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 2 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 3 Device-to-host register FISes sent due to a COMRESET
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0010 2 0 R_ERR response for host-to-device data FIS, non-CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x0013 2 0 R_ERR response for host-to-device non-data FIS, non-CRC
0x0002 2 0 R_ERR response for data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 0 Non-CRC errors within host-to-device FIS
Mar 18 20:07:16 freenas ZFS: vdev state changed, pool_guid=2550807933382894929 vdev_guid=14584055837392843541
Mar 18 20:07:16 freenas ZFS: vdev state changed, pool_guid=2550807933382894929 vdev_guid=348865377422514070
Mar 18 20:07:46 freenas ahcich12: Timeout on slot 8 port 0
Mar 18 20:07:46 freenas ahcich12: is 00000000 cs 00000100 ss 00000100 rs 00000100 tfd 441 serr 00000000 cmd 00044717
Mar 18 20:07:46 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:07:46 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:07:46 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:08:16 freenas ahcich12: Timeout on slot 18 port 0
Mar 18 20:08:16 freenas ahcich12: is 00000000 cs 00040000 ss 00040000 rs 00040000 tfd 441 serr 00000000 cmd 00045117
Mar 18 20:08:16 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:08:16 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:08:16 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:08:46 freenas collectd[1666]: Traceback (most recent call last):
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 66, in read
temperatures = c.call('disk.temperatures', self.disks, self.powermode, self.smartctl_args)
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 66, in read
temperatures = c.call('disk.temperatures', self.disks, self.powermode, self.smartctl_args)
File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 500, in call
raise CallTimeout("Call timeout")
middlewared.client.client.CallTimeout: Call timeout
Mar 18 20:09:17 freenas ahcich12: Timeout on slot 29 port 0
Mar 18 20:09:17 freenas ahcich12: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd 441 serr 00000000 cmd 00045c17
Mar 18 20:09:47 freenas ahcich12: Timeout on slot 8 port 0
Mar 18 20:09:47 freenas ahcich12: is 00000000 cs 00000100 ss 00000100 rs 00000100 tfd 441 serr 00000000 cmd 00044717
Mar 18 20:09:47 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:09:47 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:09:47 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:10:17 freenas ahcich12: Timeout on slot 16 port 0
Mar 18 20:10:17 freenas ahcich12: is 00000000 cs 00010000 ss 00010000 rs 00010000 tfd 441 serr 00000000 cmd 00044f17
Mar 18 20:10:17 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:10:17 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:10:17 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:10:47 freenas ahcich12: Timeout on slot 24 port 0
Mar 18 20:10:47 freenas ahcich12: is 00000000 cs 01000000 ss 01000000 rs 01000000 tfd 441 serr 00000000 cmd 00045717
Mar 18 20:10:47 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:10:47 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
Mar 18 20:10:47 freenas (ada4:ahcich12:0:0:0): Retrying command
Mar 18 20:11:18 freenas ahcich12: Timeout on slot 0 port 0
Mar 18 20:11:18 freenas ahcich12: is 00000000 cs 00000001 ss 00000001 rs 00000001 tfd 441 serr 00000000 cmd 00045f17
Mar 18 20:11:18 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:11:18 freenas (ada4:ahcich12:0:0:0): CAM status: Command timeout
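A rough sketch for summarizing a log like the one above: it reports the gap between successive timeouts, which command timed out, and whether the reported slot is set in the "cs" word. Reading "cs" as a per-slot bitmask is an assumption inferred from the numbers in this log (slot 8 with cs 00000100, slot 18 with cs 00040000). The regular expressions and the year argument are illustrative only.

```python
import re
from collections import Counter
from datetime import datetime

TIMEOUT_RE = re.compile(
    r"^(\w{3} +\d+ [\d:]{8}) \S+ (?:kernel: )?(ahcich\d+): "
    r"Timeout on slot (\d+) port \d+")
REGS_RE = re.compile(r"(ahcich\d+): is [0-9a-f]+ cs ([0-9a-f]+) ss")
CMD_RE = re.compile(r"\): ([A-Z_ ]+)\. ACB: ([0-9a-f]{2})")


def summarize(log_text, year=2020):
    """Return (gaps in seconds, slot-bit checks, timed-out command counts)."""
    times, slot_in_cs, commands = [], [], Counter()
    pending_slot = None
    for line in log_text.splitlines():
        m = TIMEOUT_RE.match(line)
        if m:
            # Syslog lines carry no year, so one is supplied by the caller.
            stamp = f"{year} {m.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
            pending_slot = int(m.group(3))
            continue
        m = REGS_RE.search(line)
        if m and pending_slot is not None:
            cs = int(m.group(2), 16)
            slot_in_cs.append(bool(cs & (1 << pending_slot)))
            pending_slot = None
            continue
        m = CMD_RE.search(line)
        if m:
            commands[m.group(1)] += 1
    gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
    return gaps, slot_in_cs, commands


# Lines copied from the FreeNAS log above.
SAMPLE = """\
Mar 18 20:07:46 freenas ahcich12: Timeout on slot 8 port 0
Mar 18 20:07:46 freenas ahcich12: is 00000000 cs 00000100 ss 00000100 rs 00000100 tfd 441 serr 00000000 cmd 00044717
Mar 18 20:07:46 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:08:16 freenas ahcich12: Timeout on slot 18 port 0
Mar 18 20:08:16 freenas ahcich12: is 00000000 cs 00040000 ss 00040000 rs 00040000 tfd 441 serr 00000000 cmd 00045117
Mar 18 20:08:16 freenas (ada4:ahcich12:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 08 00 00 00 40 00 00 00 00 00 00
Mar 18 20:09:17 freenas ahcich12: Timeout on slot 29 port 0
Mar 18 20:09:17 freenas ahcich12: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd 441 serr 00000000 cmd 00045c17
"""

gaps, slot_in_cs, commands = summarize(SAMPLE)
print("gaps (s):", gaps)
print("slot bit set in cs:", slot_in_cs)
print("timed-out commands:", dict(commands))
```

Every command that times out in this log is SEND_FPDMA_QUEUED DATA SET MANAGEMENT, which is the queued TRIM request, and the timeouts arrive roughly 30 seconds apart.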
ON UBUNTU:
My test steps:
Reconnect the two Ironwolf SSDs to the onboard SATA ports (removing them from the LSI HBA).
Boot the same machine into Ubuntu Live.
Import the Ironwolf SSD zpool.
Write 100 MB of random data to the zpool.
This test passed perfectly, without any complaints.
Code:
ubuntu@ubuntu:~$ cd /mnt
ubuntu@ubuntu:/mnt$ sudo mkdir IronWolf-110-1
Code:
root@ubuntu:/mnt# zpool import Practichem-v4 -f
root@ubuntu:/mnt# zpool status
pool: Practichem-v4
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(5) for details.
scan: none requested
config:
NAME STATE READ WRITE CKSUM
Practichem-v4 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda2 ONLINE 0 0 0
sdb2 ONLINE 0 0 0
errors: No known data errors
root@ubuntu:/mnt# ls
IronWolf-110-1
root@ubuntu:/mnt# ll
total 0
drwxr-xr-x 1 root root 60 Mar 23 14:25 ./
drwxr-xr-x 1 root root 280 Mar 23 14:27 ../
drwxr-xr-x 2 root root 40 Mar 23 14:25 IronWolf-110-1/
root@ubuntu:/mnt# cd IronWolf-110-1/
root@ubuntu:/mnt/IronWolf-110-1# ll
total 0
drwxr-xr-x 2 root root 40 Mar 23 14:25 ./
drwxr-xr-x 1 root root 60 Mar 23 14:25 ../
root@ubuntu:/mnt/IronWolf-110-1# mkdir test
root@ubuntu:/mnt/IronWolf-110-1# ls
test
root@ubuntu:/mnt/IronWolf-110-1# dd if=/dev/urandom of=newfile bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 2.07261 s, 50.6 MB/s
root@ubuntu:/mnt/IronWolf-110-1# ll
total 102400
drwxr-xr-x 3 root root 80 Mar 23 14:30 ./
drwxr-xr-x 1 root root 60 Mar 23 14:25 ../
-rw-r--r-- 1 root root 104857600 Mar 23 14:30 newfile
drwxr-xr-x 2 root root 40 Mar 23 14:29 test/
root@ubuntu:/mnt/IronWolf-110-1# udevadm info --query=all --name=/dev/sda | grep ID_SERIAL
E: ID_SERIAL=ZA1920NM10001_HKS01LDV
E: ID_SERIAL_SHORT=HKS01LDV
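As a quick sanity check on the dd figure in the session above, the reported rate can be reproduced from the file size and elapsed time. This is plain arithmetic on numbers already shown, assuming dd reports decimal megabytes.

```python
bytes_copied = 104_857_600  # size of "newfile" in the listing (100 MiB)
elapsed_s = 2.07261         # elapsed time reported by dd

# dd prints decimal megabytes per second (1 MB = 1,000,000 bytes).
mb_per_s = bytes_copied / elapsed_s / 1_000_000
print(f"{mb_per_s:.1f} MB/s")
```

Reading from /dev/urandom may itself be the bottleneck here, so this run says more about the absence of errors and timeouts under Ubuntu than about the drives' top write speed.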