Thanks for using the TrueNAS Community Edition issue tracker! TrueNAS Enterprise users receive direct support for their reports from our support portal.

Crash Reboot after import pool

Description

System was running fine. After a few days I found it to be unreachable.

Console showed it hung just after listing the drives and USB ports.

After a forced reset it appears to boot up properly, but every time, just after it imports the disks/pool, it hits a kernel panic and reboots itself.

It looks similar to NAS-108257 and NAS-107953, but I don't have an encrypted pool. I wondered whether the issue was caused by a bunch of client NFS machines all hitting it as soon as it becomes available again, causing it to crash immediately, but unplugging the network cables makes no difference.

If I unplug all the disks of the pool, then it boots up fine.

The crash message, which is identical every time, is attached.

For searching purposes (I used OCR on a screenshot rather than typing this out manually; I think it's correct) it includes:

{{KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015d6c5910
vpanic() at vpanic+0x17b/frame 0xfffffe015d6c5960
spl_panic() at spl_panic+0x3a/frame 0xfffffe015d6c59c0
avl_add() at avl_add+0x156/frame 0xfffffe015d6c5a00
dsl_livelist_iterate() at dsl_livelist_iterate+0xb0/frame 0xfffffe015d6c5a60
bpobj_iterate_blkptrs() at bpobj_iterate_blkptrs+0xff/frame 0xfffffe015d6c5b00
bpobj_iterate_impl() at bpobj_iterate_impl+0x14d/frame 0xfffffe015d6c5bd0
dsl_process_sub_livelist() at dsl_process_sub_livelist+0x5c/frame 0xfffffe015d6c
spa_livelist_condense_cb() at spa_livelist_condense_cb+0xa1/frame 0xfffffe015d6c
zthr_procedure() at zthr_procedure+0x94/frame 0xfffffe015d6c5cf0
fork_exit() at fork_exit+0x7e/frame 0xfffffe015d6c5d30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015d6c5d30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 38 tid 102647 ]
Stopped at kdb_enter+0x37: movq $0,0x164afc6(%rip)
db:0:kdb.enter.default> write cn_mute
cn_mute 0 0x1}}

Hardware is a Dell R710 with 128GB RAM (memtested fine) and 2x X5670 CPUs. Ubuntu 20.04 is installed on the bare metal, using KVM to boot a VM that has raw access to partitions on an SSD to boot from, PCI passthrough for 2 NICs (which are an LACP bond), and an LSI HBA controller with 5x 3TB disks attached.
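
For context, a minimal sketch of that kind of KVM invocation is below; the device path, memory size, and PCI addresses are made up for illustration and are not taken from this report.

# Hypothetical example of booting a VM from a raw SSD partition with
# two NICs and an HBA passed through via vfio-pci:
qemu-system-x86_64 -enable-kvm -m 65536 -smp 12 \
  -drive file=/dev/sda3,format=raw,if=virtio \
  -device vfio-pci,host=0000:05:00.0 \
  -device vfio-pci,host=0000:05:00.1 \
  -device vfio-pci,host=0000:03:00.0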

It had been working fine for the past few weeks, and now it's just stuck in this boot loop.

The only thing I changed recently was setting the pool to sync=disabled, as the disk performance was abysmal (2MB/sec writes when trying to copy ISOs, etc.).
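
For reference, that change is a ZFS property set along these lines (the pool name "tank" is a placeholder, not the actual pool from this report):

# Check the current sync setting on the pool's root dataset
zfs get sync tank

# Disable synchronous writes
zfs set sync=disabled tank

# Revert to the default behaviour later
zfs set sync=standard tank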

I tried booting up a different VM using the same disks, and the result was the same.

Problem/Justification

None

Impact

None

Activity

Alexander Motin, January 31, 2022 at 8:16 PM

As you have shown, ZFS suggests using the `-F` flag to import the pool at some older transaction group, losing a few seconds of data.  That is what I'd do.
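
For anyone searching later, a recovery import of that kind looks roughly like this, run from a shell while the pool is exported ("tank" is a placeholder pool name):

# Dry run: report what a rewind/recovery import would have to discard
zpool import -F -n tank

# If that looks acceptable, do the recovery import, rolling back to an
# earlier transaction group (the last few seconds of writes are lost)
zpool import -F tank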

Richard Kurka, January 31, 2022 at 7:35 PM

I have now tried again after updating to U7 (which includes OpenZFS 2.0.6). When I try to import via the UI, I get an I/O error with these details:

 


Error: concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 94, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 45, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 979, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 371, in import_pool
    self.logger.error(
  File "libzfs.pyx", line 391, in libzfs.ZFS._exit_
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 365, in import_pool
    zfs.import_pool(found, new_name or found.name, options, any_host=any_host)
  File "libzfs.pyx", line 1095, in libzfs.ZFS.import_pool
  File "libzfs.pyx", line 1123, in libzfs.ZFS.__import_pool
libzfs.ZFSException: I/O error
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 367, in run
    await self.future
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 403, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 975, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py", line 1421, in import_pool
    await self.middleware.call('zfs.pool.import_pool', pool['guid'], {
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1256, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1221, in _call
    return await self._call_worker(name, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1227, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1154, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1128, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
libzfs.ZFSException: ('I/O error',)


 

When I try to import via the shell I use "zpool import -o readonly=on -f Poolname" and get this back:

[attached command output; preview unavailable]

 

It would be great to get a little guidance on how to proceed from here, please. I am not sure whether the new release didn't solve the issue or whether I damaged my pool through so many attempts to get it imported again. To me it looks like the second.

And if you don't have much time here, even a few words on what I should try would be great. I will look up any details in the forums.

Cheers

Alexander Motin, November 17, 2021 at 7:35 PM

Merged.

Alexander Motin, September 14, 2021 at 4:38 PM (edited)

I've noticed identical tickets in upstream OpenZFS: https://github.com/openzfs/zfs/issues/11480 and https://github.com/openzfs/zfs/issues/12559 . The problem is triggered by dedup, which I see you also have enabled.  The fix has been backported and will be included in the upcoming OpenZFS 2.0.6 release, which should be in time for TrueNAS 12.0-U7.
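
To check whether deduplication is in play on a given pool, something like the following should show it ("tank" is a placeholder pool name):

# Pool-wide dedup ratio (1.00x means dedup has never stored duplicate blocks)
zpool list -o name,size,alloc,dedupratio tank

# Per-dataset dedup property and where it was set
zfs get -r -o name,value,source dedup tank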

Henrik, April 14, 2021 at 7:52 AM

But I am able to mount the pool r/o.
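
A read-only import like that is usually the first step to copying the data somewhere safe before trying anything riskier; a rough sketch ("tank" and the destination path are placeholders):

# Import the pool read-only so nothing on the disks is modified
zpool import -o readonly=on -f tank

# With the datasets mounted (under /mnt on TrueNAS CORE), copy the data
# off to other storage, e.g. with rsync
rsync -aHAX /mnt/tank/ /path/to/backup/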

Complete

Details

Impact

Medium

Created December 22, 2020 at 8:49 PM
Updated July 1, 2022 at 3:30 PM
Resolved November 17, 2021 at 7:35 PM
