RAIDZ expansion caused miscalculated available storage.

Description

As written in

https://forums.truenas.com/t/24-10-rc2-raidz-expansion-caused-miscalculated-available-storage/15358/7

I was forced to extend my pool twice due to hardware limitations, and now have a 6x18TB (16TiB each) RAIDZ2 vdev, which reports only 47TiB of storage. My vdev, however, should have at least 64TiB of storage (16TiB x 4 data disks).
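The reporter's expected capacity can be sketched with simple arithmetic (assumptions: 6 disks in RAIDZ2, so 2 parity disks, each drive contributing roughly 16TiB, ignoring metadata and slop-space overhead):

```python
# Naive usable-capacity estimate for a RAIDZ2 vdev
# (hypothetical illustration, not ZFS's actual accounting).
disks = 6
parity = 2       # RAIDZ2 keeps two disks' worth of parity
disk_tib = 16    # an 18 TB drive is roughly 16 TiB

expected_tib = (disks - parity) * disk_tib
print(expected_tib)  # 64 -- versus the ~47 TiB actually reported
```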

According to the outputs of zpool list and zfs list, this isn't an issue with the UI, as ZFS itself reports only 47TiB.

Session ID: 67cbc2ce-bb70-4f33-6ee9-d63ddd3bfeab

Problem/Justification

None

Impact

None

Activity


Bug Clerk October 16, 2024 at 3:15 PM

This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.

Alexander Motin October 16, 2024 at 3:15 PM

I’ve dug into the code and it appears to be a known implementation limitation: RAIDZ will always report available space based on its original pre-expansion parity ratio. A quote from the commit message:

And a quote from the code:

So in the end there should be no space leak, and as you fill the pool further you should see AVAIL approach zero only when there is really nothing left, but it will shrink/grow proportionally slower than you write/delete data.
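The behavior described above can be sketched as a toy model (this is an illustration of the accounting, not actual ZFS code; the 80.7T raw-free figure comes from the debug stats quoted in the earlier comment in this thread):

```python
# Toy model of post-expansion space accounting (hypothetical, not ZFS code):
# reported AVAIL keeps using the ORIGINAL pre-expansion parity ratio,
# while writes actually consume raw space at the better post-expansion ratio.
old_ratio = 2 / 4   # 4-wide RAIDZ2: 2 data disks out of 4
new_ratio = 4 / 6   # 6-wide RAIDZ2: 4 data disks out of 6

raw_free = 80.7                 # TiB of raw free space after expansion
avail = raw_free * old_ratio    # reported AVAIL: ~40.35 TiB

# Writing 10 TiB of logical data consumes raw space at the NEW ratio
# (15 TiB raw), but reported AVAIL only drops by 15 * old_ratio = 7.5 TiB,
# i.e. slower than the 10 TiB actually written.
written = 10.0
raw_free -= written / new_ratio
avail_after = raw_free * old_ratio
print(avail, avail_after)
```

This is why AVAIL shrinks more slowly than the data you write: each raw block consumed is still discounted at the old, worse parity ratio.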

Alexander Motin October 16, 2024 at 2:27 PM

Looking into the debug, I see the following pool stats:

, which makes total sense.  In the dataset stats I see:

where 8.47T of USED matches 17.5T of raw pool usage for the original 4-wide RAIDZ2 topology. Why AVAIL is reported as 39.0T I don't know for sure yet. In a perfect world my expectation would be ~53.8T ((98.2 - 17.5) * 4/6), but I have had no time to look closely at that math yet. 39.0T could be the AVAIL if all newly available raw pool capacity were used with the old space efficiency ((98.2 - 17.5) * 2/4 = 40.35T). So I can speculate that it might be right (in case ZFS estimates future space efficiency based on the existing one), and in that case it will improve as more data is written. But I need to look deeper into the code to confirm how it works.
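The two estimates in the comment above can be reproduced directly (values in the "T" units reported by zfs; 98.2T raw pool size and 17.5T raw used are taken from the quoted debug stats):

```python
# Reproducing the two AVAIL estimates from the comment above.
raw_free = 98.2 - 17.5  # ~80.7T of raw space left in the pool

# Perfect-world AVAIL at the post-expansion efficiency
# (6-wide RAIDZ2: 4 data disks out of 6):
perfect = raw_free * 4 / 6
print(round(perfect, 1))   # ~53.8

# AVAIL if the remaining raw capacity were consumed at the OLD
# 4-wide RAIDZ2 efficiency (2 data disks out of 4):
old_eff = raw_free * 2 / 4
print(round(old_eff, 2))   # 40.35 -- close to the observed 39.0T
```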

Hittsy October 14, 2024 at 3:38 PM
Edited

Thank you for the concern about the parity ratio, but I already ran the zfs-inplace-rebalancing script, which recovered about 2TiB of used capacity.

I also have only 8TiB of data; I doubt that any parity-ratio mismatch on 8TiB could lead to an entire 16TiB's worth of capacity disappearing.

Finally, the lost capacity is reflected in the “usable capacity” figure on my storage page, not in the used or available capacity metrics (which seem to be the only ones affected by a parity-ratio mismatch).

William Gryzbowski October 14, 2024 at 1:01 PM

Did you read the warning during expansion that the parity ratio is unchanged, and that to get full capacity you need to rewrite all the data?

Behaves as Intended

Details


Created October 11, 2024 at 8:35 AM
Updated October 30, 2024 at 2:18 PM
Resolved October 16, 2024 at 3:15 PM