Thanks for using the TrueNAS Community Edition issue tracker! TrueNAS Enterprise users receive direct support for their reports from our support portal.

Crash Reboot after import pool

Description

System was running fine. After a few days I found it to be unreachable.

Console showed it hung just after listing the drives and USB ports.

After a forced reset it appears to boot up properly, but every time, just after it imports the disks/pool, it hits a kernel panic and reboots itself.

It looks similar to NAS-108257 and NAS-107953, but I don't have an encrypted pool. I wondered whether the issue was caused by a bunch of client NFS machines all hitting it as soon as it becomes available again, causing it to crash immediately, but unplugging the network cables makes no difference.

If I unplug all the disks of the pool, then it boots up fine.

The crash message, which is identical every time, is attached.

For searching purposes (I used OCR on a screenshot rather than typing this out manually; I think it's correct) it includes:

{{KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015d6c5910
vpanic() at vpanic+0x17b/frame 0xfffffe015d6c5960
spl_panic() at spl_panic+0x3a/frame 0xfffffe015d6c59c0
avl_add() at avl_add+0x156/frame 0xfffffe015d6c5a00
dsl_livelist_iterate() at dsl_livelist_iterate+0xb0/frame 0xfffffe015d6c5a60
bpobj_iterate_blkptrs() at bpobj_iterate_blkptrs+0xff/frame 0xfffffe015d6c5b00
bpobj_iterate_impl() at bpobj_iterate_impl+0x14d/frame 0xfffffe015d6c5bd0
dsl_process_sub_livelist() at dsl_process_sub_livelist+0x5c/frame 0xfffffe015d6c
spa_livelist_condense_cb() at spa_livelist_condense_cb+0xa1/frame 0xfffffe015d6c
zthr_procedure() at zthr_procedure+0x94/frame 0xfffffe015d6c5cf0
fork_exit() at fork_exit+0x7e/frame 0xfffffe015d6c5d30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015d6c5d30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 38 tid 102647 ]
Stopped at kdb_enter+0x37: movq $0,0x164afc6(%rip)
db:0:kdb.enter.default> write cn_mute
cn_mute 0 0x1}}

Hardware is a Dell R710 with 128GB RAM (memtested fine) and 2x X5670 CPUs. Ubuntu 20.04 is installed on the bare metal, using KVM to boot a VM that has raw access to partitions on an SSD to boot from, PCI passthrough for 2 NICs (which are an LACP bond), and an LSI HBA controller with 5x 3TB disks attached.
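
For context, a minimal sketch of that kind of KVM invocation is below; the device path, memory size, and PCI addresses are made up for illustration and are not taken from this report.

# Hypothetical example of booting a VM from a raw SSD partition with
# two NICs and an HBA passed through via vfio-pci:
qemu-system-x86_64 -enable-kvm -m 65536 -smp 12 \
  -drive file=/dev/sda3,format=raw,if=virtio \
  -device vfio-pci,host=0000:05:00.0 \
  -device vfio-pci,host=0000:05:00.1 \
  -device vfio-pci,host=0000:03:00.0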

It had been working fine for the past few weeks, and now it's just stuck in this boot loop.

The only thing I changed recently was setting the pool to sync=disabled, as the disk performance was abysmal (2MB/sec writes when trying to copy ISOs, etc.).
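
For reference, that change is a ZFS property set along these lines (the pool name "tank" is a placeholder, not the actual pool from this report):

# Check the current sync setting on the pool's root dataset
zfs get sync tank

# Disable synchronous writes
zfs set sync=disabled tank

# Revert to the default behaviour later
zfs set sync=standard tank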

I tried booting up a different VM using the same disks, and the result was the same.

Problem/Justification

None

Impact

None

Activity

Alexander Motin, January 31, 2022 at 8:16 PM

As you have shown, ZFS suggests using the `-F` flag to import the pool at some older transaction group, losing a few seconds of data.  That is what I'd do.
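
For anyone searching later, a recovery import of that kind looks roughly like this, run from a shell while the pool is exported ("tank" is a placeholder pool name):

# Dry run: report what a rewind/recovery import would have to discard
zpool import -F -n tank

# If that looks acceptable, do the recovery import, rolling back to an
# earlier transaction group (the last few seconds of writes are lost)
zpool import -F tank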

Richard Kurka, January 31, 2022 at 7:35 PM

I have now tried again after updating to U7 (which includes OpenZFS 2.0.6). When I try to import via the UI, I get an I/O error with these details:

 


Error: concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 94, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 45, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 979, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 371, in import_pool
    self.logger.error(
  File "libzfs.pyx", line 391, in libzfs.ZFS._exit_
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 365, in import_pool
    zfs.import_pool(found, new_name or found.name, options, any_host=any_host)
  File "libzfs.pyx", line 1095, in libzfs.ZFS.import_pool
  File "libzfs.pyx", line 1123, in libzfs.ZFS.__import_pool
libzfs.ZFSException: I/O error
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 367, in run
    await self.future
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 403, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 975, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py", line 1421, in import_pool
    await self.middleware.call('zfs.pool.import_pool', pool['guid'], {
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1256, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1221, in _call
    return await self._call_worker(name, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1227, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1154, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1128, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
libzfs.ZFSException: ('I/O error',)


 

When I try to import via the shell I use "zpool import -o readonly=on -f Poolname" and get this back:

[attached command output; preview unavailable]

 

It would be great to get a little guidance on how to proceed from here, please. I am not sure whether the new release didn't solve the issue or whether I damaged my pool through so many attempts to get it imported again. To me it looks like the second.

And if you don't have much time here, even a few words on what I should try would be great. I will look up any details in the forums.

Cheers

Alexander Motin, November 17, 2021 at 7:35 PM

Merged.

Alexander Motin, September 14, 2021 at 4:38 PM (edited)

I've noticed identical tickets in upstream OpenZFS: https://github.com/openzfs/zfs/issues/11480 and https://github.com/openzfs/zfs/issues/12559 . The problem is triggered by dedup, which I see you also have enabled.  The fix has been backported and will be included in the upcoming OpenZFS 2.0.6 release, which should be in time for TrueNAS 12.0-U7.
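
To check whether deduplication is in play on a given pool, something like the following should show it ("tank" is a placeholder pool name):

# Pool-wide dedup ratio (1.00x means dedup has never stored duplicate blocks)
zpool list -o name,size,alloc,dedupratio tank

# Per-dataset dedup property and where it was set
zfs get -r -o name,value,source dedup tank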

Henrik, April 14, 2021 at 7:52 AM

But I am able to mount the pool r/o.
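
A read-only import like that is usually the first step to copying the data somewhere safe before trying anything riskier; a rough sketch ("tank" and the destination path are placeholders):

# Import the pool read-only so nothing on the disks is modified
zpool import -o readonly=on -f tank

# With the datasets mounted (under /mnt on TrueNAS CORE), copy the data
# off to other storage, e.g. with rsync
rsync -aHAX /mnt/tank/ /path/to/backup/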

Complete

Details

Impact

Medium

Created December 22, 2020 at 8:49 PM
Updated July 1, 2022 at 3:30 PM
Resolved November 17, 2021 at 7:35 PM
