Rocky Linux does not start after migrate2linux (sysroot on software RAID 1) #163

Open
andreabravetti opened this issue Feb 25, 2022 · 10 comments


@andreabravetti

I recently migrated a CentOS 8 server with software RAID 1.

The migrate2rocky script ran without errors, but after the reboot the server failed to start: the switch-root service failed and I ended up in the recovery console.

I managed to start the server with this:

mdadm --assemble --scan
mount /dev/md3 /sysroot
logout

After this the server starts normally and everything works.
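Before assembling anything by hand, it can help to confirm from the recovery console which arrays the kernel already considers active. The following is a minimal sketch, not part of the original report: the function name `active_md_arrays` is hypothetical, and it assumes the usual `/proc/mdstat` layout in which each array line starts with the device name followed by `: active` or `: inactive`.

```shell
# List md arrays that the kernel reports as active.
# Reads /proc/mdstat by default; another file may be passed as $1,
# which is convenient for trying the function on saved output.
active_md_arrays() {
    # Array lines look like: "md3 : active raid1 sdb3[1] sda3[0]"
    awk '$1 ~ /^md/ && $2 == ":" && $3 == "active" { print $1 }' "${1:-/proc/mdstat}"
}
```

Calling `active_md_arrays` with no argument prints one device name per line. If it prints nothing, no array has been assembled yet, which matches the situation where `mdadm --assemble --scan` is needed.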

I have no proof, but it seems to me that md devices are now named differently than on CentOS, and I can't understand why.

I'm going to try to reproduce this problem on a VM to collect more details.

@pajamian
Collaborator

Let me know what you find out. This will likely at least be a candidate for a note in the "known issues" section of the README file, that is if it's not feasible to fix it directly in the script itself.

@komitov

komitov commented Feb 25, 2022

This should fix this in the script: #162

@andreabravetti
Author

I failed to reproduce the problem on a test VM.

The problem is still present on the server I have, where I need to assemble and mount the RAID at every boot, but on the new host everything works properly and the system boots without errors.

On the old production server I have:

Feb 20 20:21:39 sun systemd[1]: Starting Switch Root...
Feb 20 20:21:39 sun systemctl[1256]: Failed to switch root: Specified switch root path '/sysroot' does not seem to be an OS tree. os-release file is missing.
Feb 20 20:21:39 sun systemd[1]: initrd-switch-root.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 20:21:39 sun systemd[1]: initrd-switch-root.service: Failed with result 'exit-code'.
Feb 20 20:21:39 sun systemd[1]: Failed to start Switch Root.

On the console I see no running md devices at all.
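The "os-release file is missing" message is consistent with nothing being mounted on /sysroot, since an empty mount point has no os-release file. A rough way to test a directory from the recovery console is sketched below. The function name `looks_like_os_tree` is hypothetical, and the check only approximates what systemd does by looking in the two standard os-release locations.

```shell
# Return 0 if the directory given as $1 contains an os-release file in
# one of the two standard locations, non-zero otherwise.
# This is an approximation of the check behind the
# "does not seem to be an OS tree" error, not systemd's own code.
looks_like_os_tree() {
    [ -e "$1/etc/os-release" ] || [ -e "$1/usr/lib/os-release" ]
}
```

For example, `looks_like_os_tree /sysroot && echo ok` should print `ok` only once the root filesystem has actually been mounted there.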

Steps to (try to) reproduce:

Before you can install CentOS 8 you need a VM with two disks, on which you create the RAID before installing.

Start the installer, press Ctrl+Alt+F2 to get a shell, then partition the disks with fdisk and create the array with mdadm:

For simplicity, make an identical layout on the two disks:

Device     Boot   Start      End  Sectors Size Id Type
/dev/sda1  *       2048  2099199  2097152   1G 83 Linux
/dev/sda2       2099200  6293503  4194304   2G 82 Linux swap / Solaris
/dev/sda3       6293504 41943039 35649536  17G fd Linux raid autodetect

Then create the raid:

mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3

Reboot, otherwise the installer may see the old disk layout or the wrong /dev/md/3 size.

Restart the installer.

Choose the custom installation destination and use the existing /dev/md/3 for root (you will see it as "Unknown/Unknown/3"): choose reformat, ext4 and /. For /boot and swap you can use the standalone partitions created earlier.

Install CentOS on top of it, with root on /dev/md/3 (note that on CentOS it will be called "/dev/md/3").

Now you have a working CentOS with root on raid:

[user@cotto2 ~]$ mount |grep md3
/dev/md3 on / type ext4 (rw,relatime,seclabel)

Let's migrate to Rocky:

curl https://raw.githubusercontent.com/rocky-linux/rocky-tools/main/migrate2rocky/migrate2rocky.sh -o migrate2rocky.sh
chmod u+x migrate2rocky.sh
sudo ./migrate2rocky.sh -r

After some time:

Done, please reboot your system.

This is almost the same thing I did some years ago on the production server, except that it has many more partitions.

@andreabravetti
Author

> This should fix this in the script: #162

Great, but it is still not merged. I don't understand why I'm failing to reproduce it.

@komitov

komitov commented Feb 25, 2022

Could you please try it to check if it works for you?
The only difference is that it runs "grub2-mkconfig -o /boot/grub2/grub.cfg" and this should fix the boot issue.

@andreabravetti
Author

I just noticed that on the old server, installed at the time of CentOS 8.0, I have this:

admin@sun:~$ grep rd /etc/default/grub
GRUB_CMDLINE_LINUX="biosdevname=0 crashkernel=auto nomodeset rd.auto=1 consoleblank=0"

Can rd.auto=1 be the cause of the problem?

I don't remember adding this option manually years ago.

On the new VM just installed with CentOS 8.5 and then migrated to Rocky I have rd.md.uuid=2421ed30:2315ef0b:861800b6:f99953d0 instead.
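To compare the two hosts more easily, the rd.* options can be pulled out of the GRUB defaults file on their own. This is only a sketch under assumptions: the function name `rd_options` is hypothetical, and it expects the options to sit on a single double-quoted `GRUB_CMDLINE_LINUX=` line, as in the output above.

```shell
# Print each rd.* option found on the GRUB_CMDLINE_LINUX line, one per line.
# Reads /etc/default/grub by default; another file may be passed as $1.
rd_options() {
    grep '^GRUB_CMDLINE_LINUX=' "${1:-/etc/default/grub}" |
        sed -e 's/^GRUB_CMDLINE_LINUX=//' -e 's/"//g' |
        tr ' ' '\n' |
        grep '^rd\.'
}
```

Run on the old server this would print `rd.auto=1`, and on the new VM the `rd.md.uuid=...` entry. Note that this file only holds the defaults; the options the running kernel was actually booted with are in /proc/cmdline.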

@andreabravetti
Author

> Could you please try it to check if it works for you?
> The only difference is that it runs "grub2-mkconfig -o /boot/grub2/grub.cfg" and this should fix the boot issue.

Yes, I'm going to try it as soon as possible, but not right now.

Maybe late in the weekend (it's a production server).

@andreabravetti
Author

> Could you please try it to check if it works for you?
> The only difference is that it runs "grub2-mkconfig -o /boot/grub2/grub.cfg" and this should fix the boot issue.

I can confirm it fixed my boot issue.

Now the server boots normally.

Thank you!

@sstonemen

We run more than 50 servers with AMD EPYC CPUs. All of them use software RAID and CentOS 8.5, and in every migration test the servers crashed or hung after the reboot. Until now our solution was a fresh install.
Your hint to run grub2-mkconfig after the migrate2rocky.sh script solved our big problem. Thank you.

grub2-mkconfig -o /boot/grub2/grub.cfg
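When repeating this on many servers, it may be worth keeping a copy of the old configuration first. The sketch below wraps the command above with a backup step and a dry-run switch. The names `run`, `regenerate_grub_cfg` and `DRY_RUN` are hypothetical, and the default path is the one quoted in this thread; on UEFI systems the file typically lives under /boot/efi/EFI/ instead, so check the path before using it.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Back up the existing GRUB configuration, then regenerate it.
# $1 is the configuration path (assumed default: /boot/grub2/grub.cfg).
regenerate_grub_cfg() {
    cfg="${1:-/boot/grub2/grub.cfg}"
    run cp -p "$cfg" "$cfg.bak" &&
        run grub2-mkconfig -o "$cfg"
}
```

Running `DRY_RUN=1 regenerate_grub_cfg` first shows the two commands without touching anything; running it as root without `DRY_RUN` performs the backup and the regeneration.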

@komitov

komitov commented May 25, 2022

This pull request fixes exactly this issue: #162
