zfs receive locks the pool temporarily, making nfsd hang and not serve data

Description

Hi,

On 13.1-RELEASE-p7, close to the completion of an incremental “zfs receive” into a read-only dataset, the underlying pool is temporarily locked for reads. This in turn puts nfsd into the “zfs teardown” state, making it unable to serve data from the locked pool. Moreover, nfsd cannot serve data from any other pool either, likely because its worker threads are starved by the requests blocked on the hanging pool.

This issue periodically makes the entire NFS service unresponsive. For us, this is about 30 seconds every 10 minutes, as we are replicating one dataset with 10-minute snapshots.

The impact of the issue is visible in network I/O and CPU utilization, which drop periodically as snapshots are received (please see the attached screenshot). It can also be reproduced by reading a file from a dataset while sending an incremental snapshot or even doing “zfs rollback”.
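The read-while-receiving reproduction described above could be instrumented with a small probe like the one below. This is only a sketch under assumptions: the function name `measure_read_stalls`, its parameters, and the 1-second threshold are hypothetical and are not part of the original report. It repeatedly opens and reads a file and records every read that takes longer than the threshold. Reads served from a client-side cache may hide stalls, so results should be treated as indicative only.

```python
import os
import time


def measure_read_stalls(path, duration_s=60.0, interval_s=0.5,
                        threshold_s=1.0, chunk_size=4096):
    """Probe `path` repeatedly and return a list of
    (seconds_since_start, read_latency_seconds) for every slow read."""
    stalls = []
    begin = time.monotonic()
    while time.monotonic() - begin < duration_s:
        t0 = time.monotonic()
        # Reopen the file on every pass so each probe exercises
        # the full open/read/close path rather than a held descriptor.
        fd = os.open(path, os.O_RDONLY)
        try:
            os.read(fd, chunk_size)
        finally:
            os.close(fd)
        latency = time.monotonic() - t0
        if latency >= threshold_s:
            stalls.append((round(t0 - begin, 3), round(latency, 3)))
        time.sleep(interval_s)
    return stalls
```

Running it against a file on the affected dataset for longer than one replication cycle (for example, `measure_read_stalls("/mnt/pool/dataset/file", duration_s=900.0)`, where the path is a placeholder) should, if the report is accurate, show a slow read around the time each incremental receive completes.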

Looking at the FreeBSD source, we suspect this behavior was introduced by zfs_teardown_lock_t in commit 926ad187fdb32b15a57180d314ad4094560a9e9b.

Problem/Justification

None

Impact

None

Attachments

1

Activity

Automation for Jira December 5, 2023 at 2:13 PM

This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.

Bonnie Follweiler December 5, 2023 at 2:13 PM

At this time there is insufficient information to proceed with the investigation. If at any time additional debugging information is supplied this ticket may be reopened for evaluation.

Michelle Johnson November 22, 2023 at 12:22 PM

Thank you for your report!

Please use the link in the system-generated message below to attach a system debug file. Link to this ticket after you upload the file and before you click Save.

To generate a debug file on TrueNAS CORE, log in to the TrueNAS web interface, go to System > Advanced, then click Save Debug and wait for the file to download to your local system.

Olgun A. November 22, 2023 at 2:31 AM

Correction to the description: the source in question is FreeBSD, not OpenBSD.

Automation for Jira November 22, 2023 at 2:28 AM

Thank you for submitting this TrueNAS Bug Report! So that we can quickly investigate your issue, please attach a Debug file and any other information related to this issue through our secure and private upload service below. Debug files can be generated in the UI by navigating to System -> Advanced -> Save Debug.

https://ixsystems.atlassian.net/servicedesk/customer/portal/15/group/37/create/153

Need additional information

Details

Impact

High

Created November 22, 2023 at 2:28 AM
Updated December 5, 2023 at 2:13 PM
Resolved December 5, 2023 at 2:13 PM