Thanks for using the TrueNAS Community Edition issue tracker! TrueNAS Enterprise users receive direct support for their reports from our support portal.

Destination dataset already exists and is it's own encryption root.

Description

Yet another replication error. After running for two weeks, right when it is finishing, this error occurs:

Task State
Error
Destination dataset 'vol_1/backups/remote_sea-file-01.btsys.org/photos' already exists and is it's own encryption root. This configuration is not supported yet. If you want to replicate into an encrypted dataset, please, encrypt it's parent dataset.
Logs
[2021/02/20 04:13:50] INFO [Thread-89] [zettarepl.paramiko.replication_task__task_2] Connected (version 2.0, client OpenSSH_8.2-hpn14v15)
[2021/02/20 04:13:51] INFO [Thread-89] [zettarepl.paramiko.replication_task__task_2] Authentication (publickey) successful!
[2021/02/20 04:13:54] ERROR [replication_task__task_2] [zettarepl.replication.run] For task 'task_2' non-recoverable replication error ReplicationError("Destination dataset 'vol_1/backups/remote_sea-file-01.btsys.org/photos' already exists and is it's own encryption root. This configuration is not supported yet. If you want to replicate into an encrypted dataset, please, encrypt it's parent dataset.")

The source is encrypted, as is the destination. This hasn't been an issue for a dozen other datasets, but this one refuses to finish replicating because of this.
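For reference, `zfs get -r encryptionroot` on the destination pool shows each dataset's encryption root, and the error fires when an already-existing destination dataset is its own root. A minimal sketch of that condition (illustrative only, not zettarepl's actual code; the `own_encryption_roots` helper and the `vol_1` ancestor root in the example are assumptions):

```python
def own_encryption_roots(encroots):
    """Given a mapping of dataset -> encryptionroot (as reported by
    `zfs get -r encryptionroot`), return the datasets that are their
    own encryption root and would therefore trigger this error when
    they already exist as a replication destination."""
    return [ds for ds, root in encroots.items() if ds == root]

# Example using the dataset from this report ('vol_1' as the
# inherited root of the parent is an assumption):
encroots = {
    "vol_1/backups/remote_sea-file-01.btsys.org": "vol_1",
    "vol_1/backups/remote_sea-file-01.btsys.org/photos":
        "vol_1/backups/remote_sea-file-01.btsys.org/photos",
}
print(own_encryption_roots(encroots))  # flags only the 'photos' dataset
```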

Problem/Justification

None

Impact

None

Activity


Josh Wisely March 5, 2021 at 8:22 PM
Edited

You're correct that I did turn on "Synchronize Destination Snapshots With Source" on the last run, but that was only after the replication had already failed multiple times, which seems to have led to the error in the subject.

 

However, your assertion that the root failure was the lack of a common snapshot seems suspect. I see the message "No incremental base on dataset", but I assert that is incorrect and that a common base did exist.

Some interesting notes:

  1. The Save Pending Snapshots option was selected for this job.

  2. The named snapshot "auto_20210214_0252_Hourly_sea-file-01", which is the snapshot that was being replicated, still existed on the source when the job ended in error claiming there was no common base.

  3. I find no destroy snapshot messages for the snapshot that was being used as the base.

  4. While likely unrelated, there are repeated "unhandled exception in dataset size observer" errors in the logs.
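The incremental base in question is simply the newest snapshot present on both source and destination. A hedged sketch of that selection (my understanding of the behavior, not zettarepl's actual implementation), which is why I'd expect a common base to be found as long as the snapshot still existed on both sides:

```python
def newest_common_snapshot(source_snaps, dest_snaps):
    """source_snaps is ordered oldest-to-newest (as with
    `zfs list -t snapshot -s creation`); return the newest snapshot
    name present on both sides, or None if there is no common base."""
    dest = set(dest_snaps)
    for name in reversed(source_snaps):
        if name in dest:
            return name
    return None
```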

 

Here's the series of events:

  • 2021/02/14 02:52:00 - The snapshot "auto_20210214_0252_Hourly_sea-file-01" is created.

  • 2021/02/21 03:27:51 - The job starts replicating the snapshot "auto_20210214_0252_Hourly_sea-file-01"

  • The job gets restarted multiple times due to other issues with replication (mainly that all replication hangs at some point and the source system has to be rebooted).

    • [2021/02/23 00:00:01]

    • [2021/02/24 07:07:50]

    • [2021/02/25 05:19:15]

    • [2021/02/27 07:33:39]

    • [2021/02/28 02:52:22]

    • [2021/02/28 03:48:38]

    • [2021/02/28 07:08:04]

  • [2021/03/03 15:12:25] - The job says it finished, but the UI showed the job running until ~ 20:30

  • [2021/03/03 20:30:00] Approx - The UI shows the job changing from Running to Error, but no log messages are logged.

    • The base snapshot still existed at this time.

  • [2021/03/03 20:40:02] - I try to run the job manually and start getting errors about no common base.

  • [2021/03/03 20:40:49] - I change the job to enable "Synchronize Destination Snapshots With Source" and try again and start getting errors about the encrypted parent.

 

It seems there are at least 2 main issues here:

  1. When "Synchronize Destination Snapshots With Source" is on, and the root is encrypted, and the job determines it's necessary to replicate a new base, an error is thrown. This seems like what you fixed.

  2. When a job has been running for a long time, with multiple restarts, it finishes in error without logging a message. It claims there is no common base even though one exists.

 

To make looking through the logs easier, I've attached a few more files that contain a subset of the lines from the main logs.

 

Note the snapshot the job was using as a base had a lifespan shorter than how long the job took to run the first time, but the snapshot was still there at the end. To remove any shenanigans that might be happening with the snapshot, I'll create one that lives even longer and try this again. It will take about a week to run.

Vladimir Vinogradenko March 5, 2021 at 9:38 AM

We'll fix this bug, but the real issue with your replication is:

[2021/02/20 00:10:02] WARNING [replication_task__task_2] [zettarepl.replication.run] No incremental base for replication task 'task_2' on dataset 'vol_1/photos', destroying all destination snapshots
[2021/02/20 00:10:02] INFO [replication_task__task_2] [zettarepl.snapshot.destroy] On <Shell(<SSH Transport(rep-sea-file-01@192.168.24.50)>)> for dataset 'vol_1/backups/remote_sea-file-01.btsys.org/photos' destroying snapshots {'auto-20210115-000000-Daily-sea-file-01'}

Your source did not have any common snapshots with the destination at that moment, and you had the "Synchronize Destination Snapshots With Source" option enabled (which we don't recommend); that's why all destination snapshots were destroyed (we'll be destroying the entire destination dataset now).
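The behavior described above can be summarized as a decision sketch (a hedged reconstruction based on this comment, not zettarepl's actual code; the `plan` function and its return strings are hypothetical):

```python
def plan(source_snaps, dest_snaps, dest_exists, sync_destination):
    """Rough sketch of the replication plan implied by this comment."""
    common = set(source_snaps) & set(dest_snaps)
    if common:
        return "incremental send from newest common snapshot"
    if not dest_exists:
        return "full send into new dataset"
    if sync_destination:
        # Previously: destroy all destination snapshots, then full send.
        # Per this comment, the entire destination dataset will now be
        # destroyed instead.
        return "destroy destination, then full send"
    raise RuntimeError("no incremental base and destination is not empty")
```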


Created February 20, 2021 at 4:47 AM
Updated July 1, 2022 at 2:48 PM
Resolved March 5, 2021 at 11:22 AM
