Description
Steps to Reproduce
Expected Result
Actual Result
Environment
Hardware Health
Error Message (if applicable)
Attachments
Activity
Dink Nasty February 12, 2024 at 4:07 AM
I have tried expanding the partitions manually using parted, but the way the TrueNAS replace operation formatted the drive, it seems to have created the zfs data partition first and then added a second partition after it. I am not able to expand the data partition because I get an error along the lines of “can’t have overlapping partitions.” I don’t understand why the replace operation would create them in this order when my original disks were formatted with the swap(?) partition first and the zfs data partition after it.
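For reference, the kind of inspection and attempted grow described above looks roughly like this at the shell (a sketch only; /dev/sdX and the partition number are placeholders, not the actual devices on this system):

  # Show the partition layout and where the free space sits
  lsblk -o NAME,SIZE,TYPE,FSTYPE,PARTUUID /dev/sdX
  parted /dev/sdX unit s print free

  # Growing with parted only works if the free space sits immediately after
  # the zfs data partition; with another partition in the way, this fails
  # with the "overlapping partitions" error mentioned above
  parted /dev/sdX resizepart 1 100%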
I guess I will have to give manual replacement a try in CLI.
See my previous post for image of lsblk output.
Thanks all for your help.
Dan Brown February 11, 2024 at 10:54 PM (Edited)
Jason, you could always adjust the partitions manually. See:
Or do the drive replacement manually at the CLI; see:
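For anyone following along, a manual CLI replacement is broadly along these lines (a sketch under assumptions: the pool name, device names, and the single-data-partition layout are placeholders, and the partition scheme should be matched to the other members of the vdev):

  # Give the new disk a fresh GPT with one zfs data partition spanning the drive
  sgdisk -Z /dev/sdX
  sgdisk -n 1:0:0 -t 1:BF01 /dev/sdX
  partprobe /dev/sdX

  # Look up the new partition's partuuid, then swap it in for the old member
  lsblk -o NAME,SIZE,PARTUUID /dev/sdX
  zpool replace <pool> <old-member-or-guid> /dev/disk/by-partuuid/<new-partuuid>
  zpool status <pool>    # watch the resilver complete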
Jason DeWeese February 11, 2024 at 10:13 PM (Edited)
Is there another, more manual workaround we can use for now, or are we just stuck at the lower capacity? I also thought I saw the PR tagged with the SCALE-23.10.2 release on GitHub. Will we need to upgrade to 24 to get this fixed? Thank you
Vladimir Vinogradenko February 11, 2024 at 9:35 PM
This behavior will only be changed in 24.10. Since 24.10, every time you replace a drive, it will be formatted up to its maximum capacity (not the lowest drive’s capacity in the pool, as it was before). So you’ll need to replace one of your 6TB drives with a 3TB drive, then replace it back, then repeat this for all 6TB drives.
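In outline, that per-drive round trip looks something like the following (a sketch only; pool and device names are placeholders, each resilver has to finish before the next step, and on 24.10+ the replace-back would normally go through the TrueNAS replace workflow so the new partition is created at full size):

  # Repeat for each 6TB member, one at a time
  zpool replace <pool> <6tb-member> <temporary-3tb-disk>
  zpool status <pool>      # wait for the resilver to finish
  zpool replace <pool> <temporary-3tb-disk> <6tb-disk>
  zpool status <pool>      # wait again, then move on to the next drive

  # Afterwards, check that the vdev picked up the extra capacity
  zpool list -v <pool>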
Dink Nasty February 10, 2024 at 11:02 PM
Sorry for my ignorance but I’m a bit confused on how this fix will be implemented.
Running TrueNAS-SCALE-23.10.0.1
I replaced 6x3TB drives with 6x6TB drives and am having this issue. Will I have to switch back to my 3TB drives, resilvering one at a time, install 23.10.2, and then resilver back to my 6TB drives? Or will I be able to keep my 6TB drives in, update TrueNAS, and hit the Expand button in the GUI to fix the partitions?
Thanks in advance.
I have encountered what I believe to be a bug when attempting to grow a vdev by replacing the existing drives with larger versions. I’ve documented the saga in the forums and have solicited help, but haven’t made any progress. You can find that post here, but I’ll reproduce the salient points directly in the bug.
The system in question is home-built. Supermicro X11SSH-LN4F motherboard, E3-1275v6 processor, 64GB ECC RAM, LSI 9211-8i HBA in IT mode. This system and the drives have been incrementally upgraded over the years, with the storage pool’s history dating back to FreeNAS 9. The storage pool in question, named Tier3, consists of 6 3TB SATA drives connected to the motherboard’s SATA controller and 6 4TB SAS drives connected to the 9211, each configured as a RAID-Z2 and then aggregated into a single pool. There’s nothing overwhelmingly unique about this pool - it is currently sitting at around 93% usage (hangs head in shame).
After getting a couple of SMART errors on one of the 3TB drives and realizing that those drives are pushing 10 years old with more than 80K hours, I figured it was time to fix my capacity issue and proactively replace drives. I purchased 6 20TB Exos drives for this undertaking. The drives were tested using the usual process - SMART short, SMART conveyance, badblocks, and then SMART long. No errors were noted after the process completed (almost 10 days later). I then set about removing one drive at a time, installing a new drive, and resilvering, repeating for a total of 6 times. I then eagerly checked my pool capacity to find that… there was no change. I confirmed that autoexpand was on - it was.
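For context, the checks at that point were along these lines (a sketch; Tier3 is the pool named in this report, /dev/sdX is a placeholder, and the badblocks flags shown are one common destructive burn-in choice rather than the exact invocation used):

  # Burn-in per new drive (the badblocks pass is a destructive write test)
  smartctl -t short /dev/sdX
  smartctl -t conveyance /dev/sdX
  badblocks -wsv /dev/sdX
  smartctl -t long /dev/sdX

  # After the last resilver: is auto-expansion on, and is untapped space reported?
  zpool get autoexpand Tier3
  zpool list -v Tier3    # per-vdev SIZE and EXPANDSZ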
I figured something was confused and I would manually extend via the GUI. That failed abruptly with this error:
The pool remained OK until I rebooted - at that point, I lost a drive. The partuuid still exists, so I tried a zpool online, which failed. I went through a complete wipe to zero of the confused drive, did a replace and resilver, and tried several other things suggested in various forum posts - offlining each drive and then onlining it with the -e flag, exporting and importing the pool, etc. Each step is documented in the forum post I referenced above, along with data from lsblk, zpool list, etc. Nothing has improved matters.
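For reference, those recovery attempts map to roughly these commands (a sketch; device identifiers are placeholders, the zero-wipe is destructive, and export/import would normally be driven from the TrueNAS UI rather than raw zpool):

  # Try to bring the missing member back online and expand it in place
  zpool online -e Tier3 /dev/disk/by-partuuid/<partuuid>

  # Offline/online each member with -e, one at a time, only while the pool is healthy
  zpool offline Tier3 <member>
  zpool online -e Tier3 <member>

  # Export and re-import the pool
  zpool export Tier3
  zpool import Tier3

  # Zero the confused drive, repartition it as in a manual replacement,
  # then replace and resilver it again
  dd if=/dev/zero of=/dev/sdX bs=1M status=progress
  zpool replace Tier3 <old-member-guid> /dev/disk/by-partuuid/<new-partuuid>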
I did observe that the partition on the drive that goes missing after hitting Expand does get properly grown to fill the drive and still shows as a zfs member. I suspect the issue is somewhere in the partition resize process.
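A quick way to double-check that observation (a sketch; the device and partition names are placeholders):

  # Confirm the resized partition fills the disk and still carries ZFS metadata
  lsblk -o NAME,SIZE,FSTYPE,PARTUUID /dev/sdX
  blkid /dev/sdX1
  zdb -l /dev/sdX1    # prints the vdev labels if the member is still intact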
At this point, I have ordered an additional 6 drives. My intention, unless anyone has a better idea, is to bring those new drives online after another testing/burn-in process, create a new pool, replicate my data over, destroy the old pool and wipe the drives, then bring the 6 currently-in-use 20TB drives up as a second vdev in the new pool. This should (I hope) resolve the issue but does result in quite a bit of time consumed.
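In outline, that migration could look something like this (a sketch under assumptions: NewPool and the disk identifiers are placeholders, a recursive snapshot of everything on Tier3 is assumed, and pool creation and the later vdev add would normally go through the TrueNAS UI so it handles partitioning):

  # Replicate the old pool into the new one
  zfs snapshot -r Tier3@migrate
  zfs send -R Tier3@migrate | zfs receive -F NewPool

  # After verifying the copy, retire the old pool and reuse its disks as a second vdev
  zpool destroy Tier3
  zpool add NewPool raidz2 <disk1> <disk2> <disk3> <disk4> <disk5> <disk6>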
If this reaches someone in the near future, I’ll gladly provide any additional information or debugging that I can.