Client NFS mounts go stale after reboot when filesystems are exported using LDAP netgroups.

Description

I’ve seen several cases where rebooting a SCALE server has caused the client NFS mounts to go stale. I’ve mainly seen this when rebooting after an upgrade, but have also seen it on ordinary reboots. This causes multiple problems, especially when the files are VM images for virtual machines. Recovery requires killing the virtual machines and force-remounting the NFS filesystems.

In tracing this, it looks like a race condition that occurs when the NFS exports are made with LDAP netgroup entries. During bootup, nfs-service.service is started before nslcd starts, so the netgroups are not available. Trying to add a systemd After: nfs-service.service to nslcd results in an ordering cycle, causing nslcd to not be started at all. This is because nslcd is an LSB service and has an After dependency on remote-fs.target.

Also, looking at syslog (not seen in the console log), nfs-service is actually started twice. It is initially started before nslcd, then nslcd is started, then nfs-service is stopped and started again. I believe the stale filesystem only occurs when the client checks the NFS connection during the first start of nfs-service.

Possible solutions:

  1. Make nslcd start before the first nfs-service start.

  2. Eliminate the first nfs-service start.
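
To illustrate the idea behind solution 1, here is a minimal Python sketch of a pre-start check that blocks until a netgroup can actually be resolved. It is not how TrueNAS handles this. The function names, the timeout values, and the netgroup name `vmhosts` are all hypothetical, and it assumes that `getent netgroup NAME` exits with status 0 once nslcd can answer the lookup.

```python
import subprocess
import time
from typing import Callable


def netgroup_resolves(name: str) -> bool:
    """Return True if `getent netgroup NAME` exits with status 0."""
    try:
        result = subprocess.run(
            ["getent", "netgroup", name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # getent is not installed; treat the netgroup as unresolved.
        return False
    return result.returncode == 0


def wait_for_netgroup(
    name: str,
    timeout: float = 60.0,
    interval: float = 2.0,
    check: Callable[[str], bool] = netgroup_resolves,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the netgroup resolves or the timeout expires.

    Returns True as soon as one lookup succeeds, False on timeout.
    `check` and `sleep` are injectable so the loop can be exercised
    without a real LDAP server.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check(name):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # "vmhosts" is a placeholder netgroup name, not one from this report.
    ok = wait_for_netgroup("vmhosts", timeout=30.0)
    raise SystemExit(0 if ok else 1)
```

A script like this could, in principle, be run as a step before the NFS server starts, so that exports are only processed once netgroup lookups succeed. Whether that is appropriate depends on how the service ordering is set up, which this sketch does not address.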

Thanks,

Problem/Justification

None

Impact

None

Activity

Automation for Jira October 30, 2023 at 2:57 PM

This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.

Mark Grimes October 30, 2023 at 2:56 PM

Reproduction of this could sometimes appear consistent and at other times appear not reproducible. With enough reboots, I did find that NFS could start before the network subsystem was up, which would result in a failure to get netgroup information from the LDAP server.

The changes in address this issue.

Mark Grimes July 11, 2023 at 11:23 PM

Might be related to fsid. Will investigate.

Michelle Johnson February 20, 2023 at 1:35 PM

Thank you for your report!

This issue ticket is now in the queue for review.

Automation for Jira February 18, 2023 at 8:17 PM

Thank you for submitting this TrueNAS Bug Report! So that we can quickly investigate your issue, please attach a Debug file and any other information related to this issue through our secure and private upload service below. Debug files can be generated in the UI by navigating to System -> Advanced -> Save Debug.

https://ixsystems.atlassian.net/servicedesk/customer/portal/15/group/37/create/153

Complete

Details

Impact

Medium

Ready For Review?

True

Created February 18, 2023 at 8:17 PM
Updated October 30, 2023 at 2:57 PM
Resolved October 30, 2023 at 2:57 PM