Encrypted ZFS Receive / ZFS Destroy causes kernel panic
Description
Problem/Justification
Impact
Activity

Automation for Jira July 20, 2023 at 7:50 PM
This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.

TempleHasFallen July 20, 2023 at 3:17 PM
Sadly, I’m unable to reproduce the issue in CORE again. I initially wiped the pool and started a replication without the source dataset properties and with an encryption passphrase set on the target. After your comment I enabled the debug kernel and tried a new replication with native ZFS encryption, but there were no more kernel panics. I tried a variety of datasets with a variety of block sizes, but no crashes on CORE for now.
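For context, a rough CLI sketch of the two replication styles described above; pool, dataset, and host names are placeholders, and the actual replications were configured through the TrueNAS replication UI rather than run as raw commands:

# Variant 1: plain send (no source properties) into a passphrase-encrypted target,
# so the data is re-encrypted with the target dataset's key:
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/backup
zfs send source/data@snap | ssh core-host zfs receive tank/backup/data

# Variant 2: raw send preserving the source's native ZFS encryption
# (the combination that originally produced the kernel panics):
zfs send -w source/data@snap | ssh core-host zfs receive tank/backup/data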

Alexander Motin July 20, 2023 at 2:37 PM
In the debug provided from TrueNAS-13.0-U5.2 I see two different kinds of kernel panics:

VERIFY3(0 == dmu_object_set_blocksize(rwa->os, drro->drr_object, drro->drr_blksz, drro->drr_indblkshift, tx)) failed (0 == 45)

and

Solaris(panic): zfs: adding existent segment to range tree (offset=33259113000 size=1000)
The first one is probably triggered by the change of indirect block size in TrueNAS SCALE 22.12.3, used as the replication source here. It is a known issue, with the patch just waiting to be merged into the next releases, see .
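For reference, a minimal sketch of how the block sizes involved could be inspected on the replication source; the dataset name is a placeholder, not taken from this ticket:

# Logical record size and encryption settings of the source dataset:
zfs get recordsize,encryption source/data
# zdb's per-object listing includes the indirect block size (the "iblk" column),
# which is what the receive side tries to apply via dmu_object_set_blocksize():
zdb -dd source/data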
So far the second panic has been reported only from TrueNAS SCALE. This is the first time I have seen it reported for CORE, though replicating from SCALE, which makes it interesting. It makes me think it may have the same trigger, or even be a consequence of the same bug, just hitting a different one of the two scenarios I have identified there. Unfortunately, the double-free panic probably happens much later than the actual problem, so it is hard to trace it back to the source. While this is not possible on SCALE 22.12 right now, on CORE you have the opportunity to enable the debug kernel in settings and reboot with it; that may give us earlier kernel panics, closer to the origin of the problem. It may either tell us it is the same issue or give us some additional input.
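As a quick sanity check, a sketch of confirming that the debug kernel is actually in use after the reboot; the exact kernel ident string is build-specific and not taken from this ticket:

# On FreeBSD-based CORE, print the kernel ident and full version string after
# rebooting; a debug build is normally distinguishable from the default ident:
uname -i
uname -a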
tuxsudo July 18, 2023 at 4:04 AM (edited)
I’ve recently observed this exact same issue, with similar errors/panics to those in the first GitHub issue link. I am attempting to do a full system replication over SSH from TrueNAS → TrueNAS. I was bashing my head for a day thinking it was some hardware I was using (an old AMD FX system), but after swapping the hardware out for a newer Ryzen system, I ran into the same problem. I also installed TrueNAS on a new boot drive and re-imported the config; same issue. I went back to 22.12.2 and was still running into this issue. Not sure what’s been screwed up, because I think I’ve done this before on 22.12.2, IIRC. All I want to do is have a backup, lmao. (Also doing replication from scratch.)
Even doing a local replication from one pool to another on an entirely separate system running 22.12.3.1 causes it to panic and reboot. This is extremely detrimental.

Bonnie Follweiler July 17, 2023 at 12:58 PM
Thank you for your ticket submission.
I have moved this ticket into our queue for review.
An engineering representative will update with any further questions or details in the near future.
Details
Since updating to 22.12.3.2, replications seem to cause a kernel panic.
This is both on a local replication (NVMe to HDD, SCALE 22.12.3.2) and a remote replication (22.12.3.2 to CORE 13.0-U5.2).
This is with “replication from scratch” enabled.
Additionally, attempting to manually delete a ZFS dataset also crashes the server.
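For illustration, a minimal sketch of the two operations that trigger the panic, with placeholder pool and dataset names (the actual replication task was configured through the TrueNAS UI):

# Local replication from the NVMe pool to the HDD pool, preserving native
# encryption with a raw recursive send:
zfs snapshot -r nvme/data@backup
zfs send -w -R nvme/data@backup | zfs receive -d hdd/backup
# Manually deleting a dataset, which also crashes the server:
zfs destroy -r hdd/backup/data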
This seems to be related to the following issue:
and possibly this PR: