Active Directory causing stopped SMB service on DC failover + WBC_ERR_WINBIND_NOT_AVAILABLE
Description
Problem/Justification
The problem to be solved is that one AD domain controller failure, with another online DC available, should not result in a state where the SMB service has to be manually restarted on the TrueNAS machine.
Impact
None
Attachments
1
Assignee: Andrew Walker
Reporter: Julian
Created last week
Updated 3 days ago
I have a setup with two Windows Server 2025 domain controllers and a TrueNAS machine called “vertex”.
The “primary” domain controller “prime-win” is at 192.168.1.13 → set as the primary DNS in TN
The secondary domain controller “vertex-win” is at 192.168.1.12 → set as the secondary DNS in TN
The “vertex” TrueNAS machine can reach both just fine and AD replication is working successfully between the two domain controllers. TrueNAS can also join the domain just fine.
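For anyone reproducing this, a quick way to confirm basic reachability of both DCs is a TCP probe of the usual AD-related ports. This is only a minimal sketch: the port list is my assumption of what a domain member typically needs, the helper names are made up for illustration, and a successful TCP connect only shows that the port accepts connections, not that the service behind it is healthy.

```python
import socket

# Assumed set of well-known TCP ports a domain member usually talks to on a DC.
AD_PORTS = {"dns": 53, "kerberos": 88, "ldap": 389, "smb": 445}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def reachability(hosts, ports=AD_PORTS, probe=port_open):
    """Map each host to a {service_name: reachable} dict using the given probe."""
    return {
        host: {name: probe(host, port) for name, port in ports.items()}
        for host in hosts
    }
```

Calling reachability(["192.168.1.13", "192.168.1.12"]) from the TrueNAS shell would return one entry per DC with a True/False value for each port.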
At this point, running "directory_service activedirectory domain_info" via the CLI returned the LDAP server name and KDC server pointing to the name/IP of “prime-win”.
Now, if the primary DC “prime-win” goes offline for a few minutes (e.g. for Windows Updates), I get a WBC_ERR_WINBIND_NOT_AVAILABLE error from “vertex” and the SMB service is turned off (“Stopped”) after a couple of minutes. The AD service is marked as “Faulted” at that point. This is a screenshot of the alert: [screenshot]
After a few more minutes (here: 7 minutes after the initial alert), the above WBC_ERR_WINBIND_NOT_AVAILABLE error is cleared automatically and AD shows as “Healthy” again. However, the SMB service is still stopped and needs to be started manually again. This is unexpected.
Now, running "domain_info" via the CLI returned the LDAP server name and KDC server pointing to the name/IP of “vertex-win”.
I’d also think that the WBC_ERR_WINBIND_NOT_AVAILABLE error should not happen and/or that “recovery/failover” shouldn’t take 7 minutes.
I guess some expected behaviors could be (1) to keep the SMB service running during the “faulted” AD state (mostly everything should be cached?), or (2) restart the SMB service automatically afterwards, or (3) not even cause this long faulted state (of 7 minutes) in the first place.
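Until this is fixed, option (2) could be approximated with a small watchdog run periodically on the TrueNAS host (for example from a cron job). The following is only a sketch under assumptions, not how the middleware works internally: it assumes the midclt client is available, that "directoryservices.get_state" returns a dict with an "activedirectory" key, that "service.query" reports the SMB service under the name "cifs" with a "state" field, and that "service.start" accepts that name. These method names and return shapes may differ between releases and should be verified before use. The function names are hypothetical.

```python
import json
import subprocess


def should_start_smb(ad_state, smb_state):
    """Start SMB only when AD has recovered and SMB is still stopped."""
    return ad_state == "HEALTHY" and smb_state == "STOPPED"


def midclt(*args):
    """Run `midclt call <args...>` and decode its JSON output."""
    out = subprocess.run(
        ["midclt", "call", *args],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def poll_once(call=midclt):
    """Check AD and SMB state once; start SMB if needed. Returns True if started."""
    # ASSUMPTION: method name and return shape; verify on your release.
    ad_state = call("directoryservices.get_state")["activedirectory"]
    # ASSUMPTION: the SMB service is listed under the name "cifs".
    smb = call("service.query", json.dumps([["service", "=", "cifs"]]))
    smb_state = smb[0]["state"]
    if should_start_smb(ad_state, smb_state):
        call("service.start", "cifs")
        return True
    return False
```

The decision is kept in should_start_smb so it never starts SMB while AD is still faulted. The sketch cannot tell a deliberate manual stop from the failover-induced stop, so it would also restart an SMB service that an admin stopped on purpose.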
I’ll attach two debug files: The first one was taken during the “AD fault” with one domain controller down and the SMB service already shut down after I got the WBC_ERR_WINBIND_NOT_AVAILABLE error. The second file was taken after AD “failed over” and was “Healthy” again with the above alert cleared, but the SMB service still shut down.
(There’ll probably be a lot of noise in the logs before, as I did some testing and also left and re-joined the domain at some points to verify the issue. You can see the timestamp of the alert (“2025-03-18 05:10:39”) in the screenshot. Everything around that timestamp should be “good”.)
(On another note, I’m often still seeing winbind core dumps when restarting TrueNAS with Active Directory set up. Guessing it’s still the same issue as . Would love to have more information on that as well.)