Replication resume token is not updated without data writes

Description

I’m getting

CRITICAL Replication "truenas-home push hosts" failed: Replication has stuck..

I have tried numerous times, but the replication never finishes. I don’t have clear steps to reproduce it. It only happens on one of my servers, with one single dataset. Logs attached.

Problem/Justification

None

Impact

None

Attachments

3
  • 18 Jan 2024, 11:20 AM
  • 18 Jan 2024, 08:54 AM
  • 13 Jan 2024, 09:03 AM

Activity

Bug Clerk April 2, 2024 at 7:00 PM

This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.

Alexander Motin April 2, 2024 at 7:00 PM

We've merged the patch into upcoming SCALE 24.04 and Core 13.3 releases.

Alexander Motin February 23, 2024 at 7:54 PM

I was able to reproduce the scenario, and this slightly hackish patch fixes it for me: https://github.com/openzfs/zfs/pull/15927. Let's see what the community thinks about it.

Marco February 13, 2024 at 5:51 PM

I’m glad you found the issue! I’ve now disabled atime, as I don’t need it. I haven’t touched this setting in years. I may have run a find command on the filesystem, thereby touching every file. I don’t know for sure, but that would be an explanation. However, having atime enabled should not break replication; I suppose that’s what you’re fixing right now. Let me know if you need additional information or if I should try something else. Thanks for taking the time to look into this.

Alexander Motin February 13, 2024 at 5:40 PM

I see the problem. ZFS updates receive_resume_token only when it receives some data. But the stream in this case includes no data writes at all. Without token updates, to a third-party observer it looks like the replication is stuck, and after a restart it starts from the beginning again. I need to look for another good point to update the token.
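
The behaviour described above can be illustrated with a toy model. This is only a sketch and not ZFS code: the `Record`, `ToyReceiver`, `token_history`, and `looks_stuck` names are hypothetical, the token is reduced to a single object number, and the `update_on_metadata` flag merely stands in for "some other point to update the token". It does not describe what the actual patch does.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Record:
    """One record in a simplified send stream (hypothetical model)."""
    kind: str       # "write" carries data; "object" is metadata only (e.g. an atime change)
    object_id: int


class ToyReceiver:
    """Toy receiver that tracks a resume token while consuming a stream."""

    def __init__(self, update_on_metadata: bool = False) -> None:
        # Assumption for illustration: when True, the token also advances
        # on metadata-only records, not just on data writes.
        self.update_on_metadata = update_on_metadata
        self.resume_token: Optional[int] = None

    def receive(self, record: Record) -> None:
        # Models the reported behaviour: only data writes move the token.
        if record.kind == "write" or self.update_on_metadata:
            self.resume_token = record.object_id


def token_history(receiver: ToyReceiver, stream: Iterable[Record]) -> List[Optional[int]]:
    """Return the token value an observer would see after each record."""
    history = []
    for record in stream:
        receiver.receive(record)
        history.append(receiver.resume_token)
    return history


def looks_stuck(history: List[Optional[int]]) -> bool:
    """An observer that only watches the token sees no progress if it never changes."""
    return len(set(history)) <= 1


# A stream with metadata changes only, such as many atime updates and no data writes.
metadata_only = [Record("object", i) for i in range(1, 6)]

print(looks_stuck(token_history(ToyReceiver(), metadata_only)))                         # True
print(looks_stuck(token_history(ToyReceiver(update_on_metadata=True), metadata_only)))  # False
```

In the first case the receiver processes every record, yet the token never changes, so a watchdog that polls the token concludes that the replication is stuck, and a restart has nothing to resume from.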

Complete

Details

Impact

Medium

Created January 13, 2024 at 9:02 AM
Updated May 2, 2024 at 1:41 PM
Resolved April 2, 2024 at 7:00 PM