Backups of the XFS file system from CentOS 7.8 and Amazon Linux 2 doesn't mount #59

e-kov · 2020-10-30T17:39:24Z

Preconditions:
A CentOS 7 or Amazon Linux 2 machine whose root volume is formatted with XFS.

Steps to reproduce:

Make a snapshot:

sudo elioctl setup-snapshot /dev/vda1 /.elastio 0

Mount snapshot device:

sudo mount /dev/elastio-snap0 /mnt/

The mount fails:

mount: /mnt: wrong fs type, bad option, bad superblock on /dev/elastio-snap0, missing codepage or helper program, or other error.

Make a copy of the snapshot device:

sudo dd if=/dev/elastio-snap0 of=/home/elastio/big_vol/restore/dd_snap.img bs=1M

Bind a loop device to the backup file:

sudo losetup --find --show ~/big_vol/restore/dd_snap.img 
/dev/loop0
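
Before mounting, a quick sanity check on the copied image can help separate "no superblock at all" from a later problem such as an unreplayable log. An XFS primary superblock starts with the magic bytes "XFSB" at offset 0. A minimal sketch, assuming the image path from the dd step; the helper name has_xfs_magic is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first 4 bytes of the given file or
# device are the XFS superblock magic "XFSB".
has_xfs_magic() {
    [ "$(dd if="$1" bs=4 count=1 2>/dev/null)" = "XFSB" ]
}

# Usage against the image created by dd (path taken from the step above):
#   has_xfs_magic /home/elastio/big_vol/restore/dd_snap.img && echo "XFS magic present"
```

A passing check only shows that the primary superblock header is in place; it says nothing about the state of the log.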

Mount loop device of the backup:

sudo mount /dev/loop0 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
sudo mount -t xfs -o ro,norecovery /dev/loop0 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.

Mount throws the same error regardless of the mount options.
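
The message printed by mount is generic, so the kernel log is usually the place to look for the specific XFS reason. A minimal sketch for filtering those lines, assuming the relevant kernel messages contain the string "XFS"; the helper name xfs_kernel_messages is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: keep only XFS-related lines from kernel log text
# supplied on stdin.
xfs_kernel_messages() {
    grep -i 'xfs' || true   # finding no matching lines is not treated as an error
}

# Typical use right after a failed mount (reading the kernel log may need root):
#   sudo dmesg | xfs_kernel_messages | tail -n 20
```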

Check file system in the loop device:

sudo xfs_repair -n /dev/loop0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_fdblocks 4030093, counted 4036625
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 3
        - agno = 2
        - agno = 0
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

xfs_repair always reports a wrong sb_fdblocks count. In this particular case the rest of the output is fairly clean. From time to time it also reports disconnected inodes, disconnected buckets, a corrupted superblock, or a missing secondary superblock.
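
To compare runs, the size of the sb_fdblocks mismatch can be pulled out of the xfs_repair output. A minimal sketch, assuming the line format shown in the output above ("sb_fdblocks N, counted M"); the helper name fdblocks_delta is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read xfs_repair output on stdin and print how far the
# superblock free-block counter is from the counted value (counted - recorded).
fdblocks_delta() {
    awk '/^sb_fdblocks [0-9]+, counted [0-9]+/ {
        recorded = $2
        sub(/,$/, "", recorded)   # drop the trailing comma after the first number
        print $4 - recorded
    }'
}

# Usage:
#   sudo xfs_repair -n /dev/loop0 2>&1 | fdblocks_delta
```

For the run above this prints 6532 (4036625 - 4030093), meaning the superblock counter records 6532 fewer free blocks than xfs_repair counted.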

Try to repair filesystem:

sudo xfs_repair /dev/loop0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

It says to mount the file system, but the mount command failed before, and trying it again still fails.

Run xfs_repair -L as a last resort, then run xfs_repair again without options:

[elastio@amazon2-amd64-gpt_xfs ~]$ sudo xfs_repair -L /dev/loop0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_fdblocks 4030093, counted 4036625
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (20:2281) is ahead of log (1:2).
Format log to cycle 23.
done
[elastio@amazon2-amd64-gpt_xfs ~]$ sudo xfs_repair /dev/loop0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

The file system seems to be repaired.

Try to mount it again:

[elastio@amazon2-amd64-gpt_xfs ~]$ sudo mount -t xfs -o ro,norecovery /dev/loop0 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
[elastio@amazon2-amd64-gpt_xfs ~]$ sudo mount -t xfs /dev/loop0 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
[elastio@amazon2-amd64-gpt_xfs ~]$ sudo mount /dev/loop0 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.

Expected result:

Mount shouldn't fail on the snapshot device (step 2) or on the backup file bound as a loop device (step 5). The file system check shouldn't complain about a corrupted superblock or secondary superblock (step 6).


freeze_super does much more work than freeze_bdev: it syncs the filesystem and waits for pending writes. On XFS v5, freeze_bdev doesn't freeze the super block. Used freeze_super instead of freeze_bdev for any filesystem on all kernels where this function is present. Fixes #59
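
Since the fix is tied to XFS v5, it can be useful to confirm which on-disk format the source volume uses. XFS v5 file systems have metadata CRCs enabled, which xfs_info reports as crc=1. A minimal sketch, assuming the root volume is mounted at /; the helper name is_xfs_v5 is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read xfs_info output on stdin and succeed if metadata
# CRCs are enabled (crc=1), which indicates the v5 on-disk format.
is_xfs_v5() {
    grep -Eq 'crc=1([^0-9]|$)'
}

# Usage on the machine being snapshotted:
#   xfs_info / | is_xfs_v5 && echo "XFS v5 (CRC enabled)"
```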

e-kov added the bug label Oct 30, 2020

e-kov self-assigned this Oct 30, 2020

anelson added the project/data-plane label Nov 1, 2020

anelson added this to the Sprint 001 milestone Nov 2, 2020

e-kov mentioned this issue Nov 13, 2020

Fix fs corrupt on XFS v5 #62

Merged

e-kov closed this as completed in #62 Nov 13, 2020

e-kov mentioned this issue Nov 16, 2020

XFS logs aren't consistent in a snapshot from CentOS 7.8 and Amazon Linux 2 #63

Open
