fix exorbitant middlewared service memory usage

Description

PR: https://github.com/truenas/middleware/pull/8197

The fact that we even keep an in-memory log of websocket messages makes me squirm in my seat. The "debuggability" it provides doesn't outweigh the problems it introduces. I've found 3 major ones.

1. because we put every websocket result in this deque, a reference to it is held, which prevents its memory from being reclaimed. The ONLY time anything is "reclaimed" is when the deque is full and a new entry evicts the oldest one, and even then total usage only goes down if the evicted entry is larger than the one being inserted, so this gets out of hand quite quickly
2. we were storing the non-serialized results, so the memory usage was, in theory, considerably larger than it should be
3. we set the deque size to 1000 entries, which is kind of insane; this allows memory usage to potentially grow exponentially
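To illustrate problem 1, here's a minimal Python sketch (not the actual middleware code; `Result` and `retained_after_call` are hypothetical names) showing that a `collections.deque` with `maxlen` keeps a result alive after the caller has dropped it, until enough newer entries push it out:

```python
import gc
import weakref
from collections import deque


class Result(dict):
    """Hypothetical stand-in for a websocket result (a dict subclass so it can be weakly referenced)."""


def retained_after_call(maxlen: int, calls_after: int) -> bool:
    """Return True if the first logged result is still alive after `calls_after` more calls."""
    messages = deque(maxlen=maxlen)  # hypothetical stand-in for the websocket message log
    first = Result(payload="x" * 1024 * 1024)  # a large (~1MB) result
    probe = weakref.ref(first)  # lets us check liveness without keeping it alive
    messages.append(first)
    del first  # the caller is done with the result...
    for i in range(calls_after):
        messages.append(Result(payload=str(i)))
    gc.collect()
    return probe() is not None  # ...but the deque may still be holding it
```

`retained_after_call(1000, 999)` returns `True` and `retained_after_call(1000, 1000)` returns `False`: with 1000 slots, a result lingers for 1000 more calls, no matter how big it is.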

This immediately starts eating memory on large systems (systems with hundreds of hard drives) because a websocket call is made to `disk.temperatures` every 5 seconds. On a system with lots of hard drives, the result is quite large. While a single entry isn't that big a deal, 1000 of them is.
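As a rough sketch of the scaling, assuming (hypothetically) that a `disk.temperatures` result is a dict with one entry per disk and that it's the only call being logged: one call every 5 seconds fills 1000 slots after 1000 × 5 s = 5000 s (about 83 minutes), and from then on the deque holds 1000 copies. JSON length is only a proxy here; the raw Python objects the old code kept are typically larger.

```python
import json
from collections import deque


def fake_temperatures(n_disks: int) -> dict:
    # Hypothetical result shape: one temperature per disk. The real
    # disk.temperatures payload may differ; only the scaling matters here.
    return {f"sd{i}": 35 for i in range(n_disks)}


def retained_json_bytes(n_disks: int, polls: int, maxlen: int) -> int:
    """Serialized size of everything the log still holds after `polls` calls."""
    log = deque(maxlen=maxlen)
    for _ in range(polls):
        log.append(fake_temperatures(n_disks))  # a fresh result object per poll
    return sum(len(json.dumps(entry)) for entry in log)
```

For example, `retained_json_bytes(500, 2000, 1000)` is exactly 20 times `retained_json_bytes(500, 2000, 50)`, since every retained entry is the same size.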

Furthermore, the webUI team uses this, and we capture it in the debug by running `core.get_websocket_messages`. HOWEVER, when you call that method via websocket, it returns the entire contents of the `deque` and then turns around and stores that result in the `deque` as well, so each new entry contains everything that was logged before it.
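Here's a minimal sketch of that feedback loop, assuming (as the old code did) that raw results are stored and that the call which returns the log is itself logged; `record` and `get_websocket_messages` are hypothetical stand-ins, not the middleware API. In memory the nested entries are shared references, but every serialization (for the debug or the websocket response) expands all of them:

```python
import json
from collections import deque

# Hypothetical stand-in for the middleware's in-memory websocket message log.
messages: deque = deque(maxlen=1000)


def record(method: str, result) -> None:
    """Store a call's raw (non-serialized) result, as the old code did."""
    messages.append({"method": method, "result": result})


def get_websocket_messages() -> list:
    """Hypothetical analogue of core.get_websocket_messages: return the whole log..."""
    result = list(messages)
    # ...and, because it is itself a websocket call, log its own result too.
    record("core.get_websocket_messages", result)
    return result


def serialized_sizes(calls: int) -> list:
    """JSON size of each get_websocket_messages response over `calls` calls."""
    messages.clear()
    record("disk.temperatures", {"sda": 35})
    return [len(json.dumps(get_websocket_messages())) for _ in range(calls)]
```

Each response serializes to more than twice the size of the previous one, because the entry logged for the previous call contains the entire log before it.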

To fix this, I've done two things:

1. shrunk the deque to 50 entries; there's no reason to have 1000
2. if the serialized string of the result is larger than 1MB, then instead of storing the result we replace it with a string letting the end user know it was too large to store
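A minimal sketch of both changes, assuming `json.dumps` as the serializer and 1MB meaning 1024 × 1024 bytes. The names and placeholder text are hypothetical, and this version also stores the serialized string instead of the raw object (per problem 2), which the PR may or may not do:

```python
import json
from collections import deque

MAX_MESSAGES = 50               # deque size from the fix (was 1000)
MAX_RESULT_BYTES = 1024 * 1024  # 1MB threshold; MiB vs MB is an assumption

# Hypothetical stand-in for the websocket message log.
messages: deque = deque(maxlen=MAX_MESSAGES)


def log_message(method: str, result) -> None:
    """Log a call, replacing oversized results with a short placeholder."""
    # json.dumps escapes non-ASCII by default, so len() equals the byte count.
    serialized = json.dumps(result)
    if len(serialized) > MAX_RESULT_BYTES:
        stored = f"result of {method} omitted: serialized size exceeds 1MB"
    else:
        stored = serialized  # keep the compact string, not the raw object
    messages.append({"method": method, "result": stored})
```

Since the deque now holds at most 50 entries of at most ~1MB each, its worst case is bounded at roughly 50MB instead of growing with whatever the largest results happen to be.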

These fixes resolve the massive memory usage seen on a system with 20k snapshots when `zfs.snapshot.query` is called multiple times (before the fix, calling it 5 times grew the parent process to ~1.4GB of resident memory).

Problem/Justification

None

Impact

None

Activity

Bug Clerk February 4, 2022 at 12:52 AM

13.0 PR: (not relevant)

Complete

Details

Time remaining

0m

Created February 2, 2022 at 12:24 PM
Updated July 11, 2022 at 4:43 PM
Resolved February 4, 2022 at 12:56 AM