After controller error, some alerts are stuck in time and never update
Description
Activity
Bug Clerk December 11, 2024 at 6:11 PM
This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.
Marshalleq December 9, 2024 at 6:17 PM
I did not realise I could dismiss them! I have now found where to do that, and indeed there were a lot there. I suspect this will solve the issue. I will know when it emails me that the alerts have been cleared, which would be tomorrow, I think. Thanks!
William Gryzbowski December 9, 2024 at 1:54 PM
SMART issues are events that happen at a specific time. They won't go away automatically; you have to dismiss them.
Are you saying they come back after you dismiss them?
Bug Clerk December 5, 2024 at 9:17 PM
Thanks for your submission! This is in our queue to review now. An engineering representative will update with any further questions or details in the near future.
Marshalleq December 3, 2024 at 7:08 PM
Done.

In short, I had what seems to have been an overheating problem with an LSI controller, which I eventually replaced, along with creating a brand new pool on brand new disks. However, some issues persisted: seeing pools that were no longer there (couldn’t export them, couldn’t delete them), receiving alerts about those pools, and continually receiving SMART errors by email (pasted below). I resolved some of the issues by doing a backup, wiping the boot pool and restoring. However, this SMART issue seems to persist. I have run smartctl in a terminal both to read what the alert says is unreadable and to test a drive that is said to be untestable; both seemingly succeeded.
Therefore, I assume TrueNAS stores something somewhere for the purpose of sending me this alert email, and that it needs a kick in the pants. Can you advise what I need to do to get accurate alerts again?
BTW, some alerts are accurate. New ones are, e.g. a replication task is accurately reporting failures. So it’s not the whole alerting system that is frozen.
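
As the comments on this ticket note, SMART alerts record point-in-time events and stay active until they are dismissed, which the reporter did through the UI. For anyone who prefers the shell, the following is a minimal sketch of listing alerts that are still undismissed. It assumes the `midclt` client is available on the TrueNAS host, that the middleware exposes `alert.list` and `alert.dismiss`, and that each alert carries `uuid`, `klass`, `formatted` and `dismissed` fields. None of this is confirmed in the ticket, so check it against your version before relying on it.

```python
import json
import subprocess


def undismissed_alerts(alerts, keyword="SMART"):
    """Return alerts that are not dismissed and mention `keyword`.

    `alerts` is a list of dicts. The field names used here (`dismissed`,
    `klass`, `formatted`) are assumptions about the alert.list output.
    """
    matches = []
    for alert in alerts:
        if alert.get("dismissed"):
            continue
        # Search both the alert class name and its human-readable text.
        haystack = f"{alert.get('klass', '')} {alert.get('formatted', '')}"
        if keyword.lower() in haystack.lower():
            matches.append(alert)
    return matches


def fetch_alerts():
    """Fetch all alerts as parsed JSON (assumes `midclt` is on PATH)."""
    result = subprocess.run(
        ["midclt", "call", "alert.list"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def dismiss(uuid):
    """Dismiss one alert by UUID (assumes `alert.dismiss` takes a UUID)."""
    subprocess.run(["midclt", "call", "alert.dismiss", uuid], check=True)


if __name__ == "__main__":
    for alert in undismissed_alerts(fetch_alerts()):
        print(alert.get("uuid"), alert.get("formatted"))
        # Review the list first, then dismiss deliberately:
        # dismiss(alert["uuid"])
```

The dismiss call is left commented out on purpose, so that the stale alerts can be reviewed before anything is cleared.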