k3s fails to come up after a reboot
Activity

Waqar Ahmed December 1, 2021 at 4:41 PM
I think I follow you. So when you replace the ix-apps dataset entirely, you get into the same state where the pod fails to go away, resulting in k3s refusing to start. That's different from the problem I was referring to in my last comment. However, this is just the docker zfs driver problem, where it is unable to keep the mapping between docker image/container layers and zfs datasets in sync, which results in what you are seeing. We are working on removing the zfs driver entirely, and once some upstream issues related to zfs/overlayfs are resolved, we can land those changes.
I'm meanwhile closing this ticket in favor of [linked ticket], as that would be responsible for fixing/tracking the upstream issues. Thank you for clarifying!
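For reference, a minimal way to inspect the layer-to-dataset mapping described above, assuming the zfs storage driver and using the dead container and dataset names that appear in the description below (treat both IDs as examples):

# Show which zfs dataset backs the container's root filesystem
docker inspect -f '{{ json .GraphDriver }}' bd92e73f3fde
# Then check whether that dataset still exists on the pool
zfs list Tank-02/ix-applications/docker/3072bcd3675c6870dfe5bace524a17fcf13bfa18bd6b3f88d1700cff9855a75a
# "dataset does not exist" here, while docker still references it, is the stale mapping

When the two disagree, the zfs driver cannot destroy the container's root filesystem, which is exactly the failure shown in the description below.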

Cody McCuistion December 1, 2021 at 3:21 PM
Hey Waqar. I have no idea what happens under the hood at 40% of chart deletion to create this condition, so I can't reproduce the original problem in the original fashion. All I did was install several charts via the GUI, then decided to remove ix-chart_praqmanetworkmultitool via the GUI, and as I said it stalled on removal at 40% for several days until eventually the machine had to be rebooted. No backup or restore had taken place before this. All I did was back up the ix-applications dataset before I did any troubleshooting of k3s, which means that after fixing k3s by manually creating the named datasets exactly in the format it was complaining about being missing, if I restore that ix-applications dataset I can reintroduce the problem and fix it again by repeating the same steps. But whatever caused the chart deletion to stall at 40%, which essentially broke k3s entirely, is what I see as the real issue/bug, and I would expect that logs of some sort would show what took place there. I do not know enough about this k3s implementation to provide much more than that.
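For anyone trying to trace the original 40% stall, a hedged starting point for the logs mentioned above, assuming a TrueNAS SCALE layout (the log paths are assumptions, not confirmed by this ticket):

# k3s daemon output
tail -n 200 /var/log/k3s_daemon.log
# middleware log, which runs the chart deletion job
tail -n 200 /var/log/middlewared.log
# recent events in the app's namespace, taken from the container name in the description below
k3s kubectl get events -n ix-praqmanetworkmultitool --sort-by=.lastTimestamp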

Waqar Ahmed December 1, 2021 at 12:49 PM
ping

Waqar Ahmed November 28, 2021 at 12:57 PM
We have a known issue, fixed in master, where if a backup of the ix-applications dataset was created on a different pool and then selected for use, it would error out with k3s not starting. If that's the issue you are able to reproduce, then it has been fixed. Otherwise, could you kindly lay out your reproduction steps, i.e. how/where the backup is being generated and how the system is being instructed to use it, so that I can retrace them in that order to reproduce? Thanks!

Cody McCuistion November 26, 2021 at 4:44 PM
Hello Waqar, and thanks for the response. I don't know that I could reproduce the initial issue, as it started with a pod/app failing to delete, stuck at 40% for several days until an eventual reboot occurred, at which point this issue cropped up. I do know I can recreate the scenario where k3s fails to start, simply by placing a backup of the ix-applications dataset back in place, at which point k3s will fail to start again until that pod is removed using the same method I previously described. So depending on which piece of this you are trying to understand: I don't know how to recreate the app failing to remove, but I can recreate k3s failing to start.
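The ticket does not spell out how the backup was taken, but a minimal zfs-level sketch of a backup/restore cycle like the one described, assuming the Tank-02 pool named in the error below and a hypothetical backup pool and snapshot name, would be:

# take a recursive snapshot of the applications dataset tree
zfs snapshot -r Tank-02/ix-applications@pre-troubleshooting
# replicate it to another pool (BackupPool is hypothetical)
zfs send -R Tank-02/ix-applications@pre-troubleshooting | zfs recv -F BackupPool/ix-applications-backup
# later, replicate it back over the live tree to "place the backup back in place"
zfs send -R BackupPool/ix-applications-backup@pre-troubleshooting | zfs recv -F Tank-02/ix-applications

Restoring the tree this way would also bring back the stale container state, which matches the report that k3s fails to start again after the restore.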
Description
Several pods were deployed on the system via the apps GUI. The system failed to remove the pod/app/container for k8s_ix-chart_praqmanetworkmultitool: removal stalled at 40% for several days, and the app eventually disappeared from the GUI but remained running as a pod on the system. Several reboots occurred during this timeframe with no ill effects; a reboot today, however, resulted in k3s no longer starting. I determined the following issue was occurring:
kubelet.go:1391] "Failed to start ContainerManager" err="failed to build map of initial containers from runtime: no PodsandBox found with Id 'b9b6bfa7f828bb508143b1029a24d8a02231604f8cae49ac35730ff66f83a4f0'"
I determined which container this is with this command:
docker ps -a --filter "label=io.kubernetes.sandbox.id=b9b6bfa7f828bb508143b1029a24d8a02231604f8cae49ac35730ff66f83a4f0"
CONTAINER ID   IMAGE          COMMAND                  CREATED      STATUS   PORTS   NAMES
bd92e73f3fde   baf9c5720a1d   "/bin/sh /docker/ent…"   7 days ago   Dead             k8s_ix-chart_praqmanetworkmultitool-ix-chart-69c89565ff-dw9kq_ix-praqmanetworkmultitool_8fd6c932-1149-4e3c-90f4-b05e191bf253_0
Attempting to remove or force-remove the container fails:
docker rm k8s_ix-chart_praqmanetworkmultitool-ix-chart-69c89565ff-dw9kq_ix-praqmanetworkmultitool_8fd6c932-1149-4e3c-90f4-b05e191bf253_0 --force
Error response from daemon: container bd92e73f3fde095f39df9c9c8c70927489b6e027595d9be782307fcc30b76aa5: driver "zfs" failed to remove root filesystem: exit status 1: "/usr/sbin/zfs zfs destroy -r Tank-02/ix-applications/docker/3072bcd3675c6870dfe5bace524a17fcf13bfa18bd6b3f88d1700cff9855a75a" => cannot open 'Tank-02/ix-applications/docker/3072bcd3675c6870dfe5bace524a17fcf13bfa18bd6b3f88d1700cff9855a75a': dataset does not exist
It appears I cannot force-remove the container because its backing dataset has already been removed while the container reference remains.
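Putting the workaround from Cody's comments above into concrete commands, a hedged sketch (the dataset and container IDs are the ones from this ticket; the k3s service name is an assumption):

# recreate the dataset the zfs driver expects, so its destroy call has something to remove
zfs create -p Tank-02/ix-applications/docker/3072bcd3675c6870dfe5bace524a17fcf13bfa18bd6b3f88d1700cff9855a75a
# the force-remove should now go through
docker rm --force bd92e73f3fde
# restart k3s so kubelet can rebuild its map of initial containers
systemctl restart k3s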