
Close stdin/stdout/stderr

Description

24.04 PR: https://github.com/truenas/zettarepl/pull/288

This PR fixes a fatal memory leak in our TrueNAS CORE setup. I am not 100% sure why it works, so let me explain how I arrived at it.

The TrueNAS server ran out of its 128GB of memory on a monthly basis, fixable only by a reboot. I finally checked it mid-month and noticed a huge `zettarepl` process.

Digging in, zettarepl would start leaking memory some hours after a restart – but not due to any particular job (so there probably is a kind of race involved).

I noticed that whenever the leaking started, `zettarepl` would grow from five to seven permanent threads. The two extra threads were named like `Thread-17428` and `retention.close_shell`.
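For anyone wanting to check for the same symptom, here is a minimal sketch of comparing thread names between two points in time. It assumes you can run code inside the Python process in question, which is not necessarily how the observation above was made. The helper names `thread_names` and `new_threads` are made up for illustration.

```python
import threading
from collections import Counter


def thread_names():
    """Return the names of all threads currently alive in this process."""
    return sorted(t.name for t in threading.enumerate())


def new_threads(before, after):
    """Return names that appear in `after` more often than in `before`."""
    extra = Counter(after) - Counter(before)
    return sorted(extra.elements())


if __name__ == "__main__":
    baseline = thread_names()

    # Simulate a lingering helper thread (hypothetical stand-in for a stuck close).
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, name="retention.close_shell")
    worker.start()

    print(new_threads(baseline, thread_names()))

    stop.set()
    worker.join()
```

Taking one snapshot shortly after a restart and another once memory starts growing would show whether extra threads are sticking around.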

Hmm, what the heck is a "close shell" thread ... well, it turns out that it was introduced in #166 to fix a replication hang reported in https://ixsystems.atlassian.net/browse/NAS-110234. It seems like the hang was concealed rather than actually fixed, and leaving the thread around somehow causes the memory leak (the "somehow" is the part where I am not 100% sure why).

Anyway, I searched for issues with Paramiko hanging on close and ended up at https://github.com/paramiko/paramiko/issues/2075#issuecomment-1178468092 which states:

> `stdin`, `stdout` and `stderr` objects from `SSHClient.exec_command()` should be closed before closing an instance of `SSHClient` if you invoked `SSHClient.exec_command()`.

Looking for `std*.close()` in zettarepl I didn't find much and what I did find was rarely called.

So on a hunch, I added explicit closing of all the fds and hooray, no more leaks.
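A minimal sketch of the pattern from the Paramiko comment quoted above, not the actual zettarepl change: a context manager that closes all three stream objects before the caller closes the client. The name `exec_and_close` is hypothetical, and `client` is only assumed to behave like `paramiko.SSHClient`, whose `exec_command()` returns a `(stdin, stdout, stderr)` tuple.

```python
from contextlib import contextmanager


@contextmanager
def exec_and_close(client, command):
    """Run `command` and always close its three stream objects afterwards.

    `client` is assumed to behave like paramiko.SSHClient: exec_command()
    returns a (stdin, stdout, stderr) tuple of closeable file-like objects.
    """
    stdin, stdout, stderr = client.exec_command(command)
    try:
        yield stdin, stdout, stderr
    finally:
        # Close every stream, even if one of the close() calls raises.
        for stream in (stdin, stdout, stderr):
            try:
                stream.close()
            except Exception:
                pass


# Hypothetical usage with an already connected paramiko.SSHClient `client`:
#
#     with exec_and_close(client, "zfs list") as (_, stdout, _):
#         output = stdout.read()
#     client.close()  # the streams are already closed at this point
```

Wrapping the cleanup in `finally` means the streams are closed on error paths too, which matters if the leak only shows up under some timing-dependent condition.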

(I think #166 can be reverted after this change, but I have no way to verify the original report. So to keep the PR in "it can't hurt" territory, I didn't include that cleanup.)

Problem/Justification

None

Impact

None

Complete

Created November 23, 2023 at 1:01 PM
Updated February 6, 2024 at 4:35 PM
Resolved November 24, 2023 at 11:34 AM