Error “unknown filesystem” in GRUB, no matter what I do
https://askubuntu.com/questions/1566355/error-unknown-filesystem-in-grub-no-matter-what-i-do
Two days ago I ran “apt upgrade” on an Ubuntu 24.04 machine to get a kernel update, and as usual a reboot was suggested. However, the machine (in a data center far away) did not come back up, and using a remote console I found that GRUB seems to have some kind of issue:
(Screenshot of the remote console: https://imgur.com/a/X6vrel0. Its contents are transcribed below.)
GNU GRUB version 2.12
Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists possible
device or file completions. To enable less(1)-like paging, "set
pager=1". ESC exits at any time.
grub> set pager=1
grub> lsmod
error: unknown filesystem
grub> ls
error: unknown filesystem
grub> lspci
error: unknown filesystem
grub> _
How did an “apt upgrade” get my GRUB into a state where it cannot even perform an “ls”? And more importantly, how do I get GRUB out of that state?
(This question clearly differs from Grub rescue - error: unknown filesystem, because whatever is going on here obviously cannot be solved by running “ls”: as shown above, many basic GRUB commands respond only with an error message. The only exceptions are set, insmod, and recordfail. I’ve tried recreating the steps that grub.cfg lists for the default “Ubuntu” menu entry, but every command that is not an insmod is met with “error: unknown filesystem”. I’m not sure how to proceed from here.)
This answer recommends using GRUB’s auto-completion to list stuff, and I can indeed get GRUB to list partitions of the installed disks:
grub> ls (hd0,
Possible partitions are:
Partition hd0,gpt1: No known filesystem detected - Partition start at 2048KiB - Total size 33554432KiB
Partition hd0,gpt2: No known filesystem detected - Partition start at 33556480KiB - Total size 1048576KiB
Partition hd0,gpt3: No known filesystem detected - Partition start at 34605056KiB - Total size 2111832064KiB
Partition hd0,gpt4: No known filesystem detected - Partition start at 2146437120KiB - Total size 1760581447.5KiB
Partition hd0,gpt5: No known filesystem detected - Partition start at 1024KiB - Total size 1024KiB
grub> ls (hd0,gpt
The ext2 module has been inserted (insmod ext2 did not show an error), but GRUB still does not recognize the filesystem on any of the partitions, and it should recognize at least three of them! hd0,2 is /boot, which is ext3 (I think?), and hd0,3 and hd0,4 are / and /home, respectively, which are also ext3 or ext4. (I honestly don’t care which, as it should be irrelevant here. What matters is that GRUB should be able to recognize these filesystems, since it can handle all of them.) Just as before, running the actual ls command results in “error: unknown filesystem”, even on the device it is supposed to read its configuration from!
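For context on what “recognize” means here: ext2, ext3 and ext4 share one superblock layout, and the primary superblock starts 1024 bytes into the filesystem, with the 16-bit little-endian magic number 0xEF53 stored 56 bytes into it. The following is a minimal Python sketch, assuming you can read the raw devices from the rescue system; has_ext_magic and probe are hypothetical helper names, not part of GRUB or any tool. GRUB's real ext2 driver checks more than the magic (for example feature flags), so a positive result here does not prove GRUB will accept the filesystem.

```python
import struct

# ext2/3/4 keep their primary superblock 1024 bytes into the filesystem;
# the 16-bit s_magic field sits 56 bytes into that superblock (little-endian).
SUPERBLOCK_OFFSET = 1024
MAGIC_FIELD_OFFSET = 56
EXT_MAGIC = 0xEF53


def has_ext_magic(data: bytes) -> bool:
    """Return True if `data`, read from the start of a filesystem, has the ext magic."""
    pos = SUPERBLOCK_OFFSET + MAGIC_FIELD_OFFSET
    if len(data) < pos + 2:
        return False  # too short to contain a superblock
    (magic,) = struct.unpack_from("<H", data, pos)
    return magic == EXT_MAGIC


def probe(path: str, offset: int = 0) -> bool:
    """Read a device or image file starting at `offset` bytes and check for ext magic."""
    with open(path, "rb") as f:
        f.seek(offset)
        return has_ext_magic(f.read(SUPERBLOCK_OFFSET + 1024))
```

Usage, as root in the rescue system: probe("/dev/md1") and probe("/dev/md2"). On a raw RAID member partition the filesystem does not necessarily start at byte 0 (with md metadata 1.x it begins at the member's data offset, which mdadm --examine reports), which is what the offset parameter is for.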
Okay, more details: there are two identically sized disks with 5 partitions each, and four of those are RAID-1’ed into four md devices, 0 to 3 (swap, /boot, /, /home). (The fifth is a “BIOS boot” partition for GRUB’s second-stage loader.)
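The offsets in the tab-completion listing are consistent with that layout. As a quick sanity check, here is a small Python sketch (parse_listing and is_contiguous are hypothetical helper names) that parses the lines GRUB printed, converts the sizes to GiB, and checks that each partition begins exactly where the previous one ends. By start offset, that gives the 1 MiB gpt5 (BIOS boot), then 32 GiB (gpt1, presumably swap), 1 GiB (gpt2, /boot), 2014 GiB (gpt3, /) and roughly 1679 GiB (gpt4, /home).

```python
import re

# The partition lines GRUB printed for `ls (hd0,` in the question.
LISTING = """\
Partition hd0,gpt1: No known filesystem detected - Partition start at 2048KiB - Total size 33554432KiB
Partition hd0,gpt2: No known filesystem detected - Partition start at 33556480KiB - Total size 1048576KiB
Partition hd0,gpt3: No known filesystem detected - Partition start at 34605056KiB - Total size 2111832064KiB
Partition hd0,gpt4: No known filesystem detected - Partition start at 2146437120KiB - Total size 1760581447.5KiB
Partition hd0,gpt5: No known filesystem detected - Partition start at 1024KiB - Total size 1024KiB
"""

LINE_RE = re.compile(
    r"Partition (\S+): .*? - Partition start at ([\d.]+)KiB - Total size ([\d.]+)KiB"
)
KIB_PER_GIB = 1024 * 1024


def parse_listing(text: str) -> list[tuple[str, float, float]]:
    """Return (name, start_kib, size_kib) for every partition line in `text`."""
    return [(name, float(start), float(size)) for name, start, size in LINE_RE.findall(text)]


def is_contiguous(parts: list[tuple[str, float, float]]) -> bool:
    """True if, ordered by start offset, each partition begins where the previous one ends."""
    ordered = sorted(parts, key=lambda p: p[1])
    return all(prev[1] + prev[2] == nxt[1] for prev, nxt in zip(ordered, ordered[1:]))


parts = parse_listing(LISTING)
for name, start, size in sorted(parts, key=lambda p: p[1]):
    print(f"{name}: starts at {start / 1024:.0f} MiB, size {size / KIB_PER_GIB:.3f} GiB")
print("contiguous:", is_contiguous(parts))
```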
The grub.cfg contains the following lines:
set root='mduuid/c35208cb3d5eb477ed4dad145decf0cb'
search --no-floppy --fs-uuid --set=root d6db4f8c-c03a-4948-b07f-52116b495cf2
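Roughly speaking, set root=... only provides a starting value, and search --fs-uuid --set=root then overwrites root with a device whose filesystem carries the given UUID; if no readable device matches, root keeps its previous value. The following is a deliberately simplified Python model of that behaviour, not GRUB's actual implementation (search_fs_uuid is a hypothetical name). In it, a device whose filesystem GRUB cannot identify contributes no UUID and therefore can never match.

```python
from typing import Dict, Optional


def search_fs_uuid(devices: Dict[str, Optional[str]], wanted: str, root: str) -> str:
    """Simplified model of `search --no-floppy --fs-uuid --set=root <wanted>`.

    `devices` maps a GRUB device name to the filesystem UUID GRUB can read there,
    or None where GRUB reports "unknown filesystem". Returns the new value of root.
    """
    for name, fs_uuid in devices.items():
        if fs_uuid is not None and fs_uuid.lower() == wanted.lower():
            return name  # first match wins in this model
    return root  # no match: root keeps the value from `set root=...`


# Hypothetical view of the devices from GRUB's side; md2's UUID is a placeholder,
# since the question does not give it.
devices = {
    "mduuid/c35208cb3d5eb477ed4dad145decf0cb": None,  # md1 (/boot): unknown filesystem
    "mduuid/a404642c5311376de00e5317dbd699bd": "placeholder-uuid-of-md2",  # md2 (/)
}
root = search_fs_uuid(
    devices,
    wanted="d6db4f8c-c03a-4948-b07f-52116b495cf2",
    root="mduuid/c35208cb3d5eb477ed4dad145decf0cb",
)
print(root)  # mduuid/c35208cb3d5eb477ed4dad145decf0cb
```

Under this model, if GRUB cannot read md1's filesystem, root simply stays at the mduuid value from the set root=... line.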
The first ID corresponds to an MD device, according to ls -l /dev/disk/by-id:
lrwxrwxrwx 1 root root 9 May 1 20:08 md-uuid-c35208cb:3d5eb477:ed4dad14:5decf0cb -> ../../md1
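The two spellings differ only in formatting: GRUB's mduuid/ name is the same 128-bit array UUID as the md-uuid-... symlink, written as 32 hex digits without the colons. A small Python sketch (md_uuid_to_grub and parse_by_id_line are hypothetical helper names) that turns such a by-id symlink line into the GRUB device name plus the md device it points to:

```python
def md_uuid_to_grub(linux_uuid: str) -> str:
    """Convert a colon-separated md array UUID into GRUB's mduuid/... device name."""
    hex_digits = linux_uuid.replace(":", "").lower()
    if len(hex_digits) != 32 or any(c not in "0123456789abcdef" for c in hex_digits):
        raise ValueError(f"not an md array UUID: {linux_uuid!r}")
    return "mduuid/" + hex_digits


def parse_by_id_line(line: str) -> tuple[str, str]:
    """Map one `ls -l /dev/disk/by-id` md-uuid symlink line to (GRUB name, md device)."""
    link_part, target = line.split(" -> ")
    link_name = link_part.split()[-1]  # e.g. md-uuid-c35208cb:...
    prefix = "md-uuid-"
    if not link_name.startswith(prefix):
        raise ValueError(f"not an md-uuid symlink: {link_name!r}")
    return md_uuid_to_grub(link_name[len(prefix):]), target.rsplit("/", 1)[-1]


line = ("lrwxrwxrwx 1 root root 9 May 1 20:08 "
        "md-uuid-c35208cb:3d5eb477:ed4dad14:5decf0cb -> ../../md1")
print(parse_by_id_line(line))
# ('mduuid/c35208cb3d5eb477ed4dad145decf0cb', 'md1')
```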
The second ID corresponds to the filesystem inside the device, according to blkid:
/dev/md1: UUID="d6db4f8c-c03a-4948-b07f-52116b495cf2" BLOCK_SIZE="4096" TYPE="ext3"
(These results were obtained from a rescue system running on the machine.)
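The same cross-check as a Python sketch (parse_blkid_line and searched_uuid are hypothetical helper names) makes explicit that the UUID searched for in grub.cfg belongs to the ext3 filesystem on md1, i.e. /boot. It assumes the UUID is the last argument of the search line, as in the grub.cfg quoted above.

```python
import shlex


def parse_blkid_line(line: str) -> tuple[str, dict[str, str]]:
    """Split one line of `blkid` output into (device, {TAG: value})."""
    device, _, rest = line.partition(": ")
    tags = dict(token.split("=", 1) for token in shlex.split(rest))  # shlex drops the quotes
    return device, tags


def searched_uuid(grub_line: str) -> str:
    """Return the UUID argument of a `search ... --fs-uuid ...` line (assumed to be last)."""
    tokens = shlex.split(grub_line)
    if tokens[:1] != ["search"] or "--fs-uuid" not in tokens:
        raise ValueError("not a search --fs-uuid line")
    return tokens[-1]


blkid_out = '/dev/md1: UUID="d6db4f8c-c03a-4948-b07f-52116b495cf2" BLOCK_SIZE="4096" TYPE="ext3"'
cfg_line = "search --no-floppy --fs-uuid --set=root d6db4f8c-c03a-4948-b07f-52116b495cf2"

device, tags = parse_blkid_line(blkid_out)
print(device, tags["TYPE"], tags["UUID"] == searched_uuid(cfg_line))  # /dev/md1 ext3 True
```

On the command line, blkid -s UUID -o value /dev/md1 prints just the UUID if you only need that one value.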
Now, on the GRUB shell, I can see the following:
grub> ls (mduuid/c35208cb3d5eb477ed4dad145decf0cb)/
error: unknown filesystem
grub> ls (mduuid/a404642c5311376de00e5317dbd699bd)/
lost+found/ boot/ home/ sys/ tmp/ srv/ media/ usr/ root/ lib64 [abbreviated because I don’t want to type it all]
grub> ls (md/2)/
lost+found/ boot/ home/ sys/ tmp/ srv/ media/ usr/ root/ lib64 [same as above]
grub> ls (md/1)/
error: unknown filesystem
Clearly GRUB has not completely forgotten how to read filesystems: it can show the contents of md/2 (which is /) without any issues, but for some reason md/1 is a problem. I can boot into the rescue system and mount md1 without any problems, and fsck.ext3 reports zero problems, even when forcing the check.
Also, as requested, I tried to find out what $root and $prefix are set to, but GRUB cuts off the output:
grub> $root
error: can't find command 'mduuid/c35208cb3d5eb'.
grub> $prefix
error: can't find command '(mduuid/c35208cb3d5e'.
I am also 99% confident that EFI is not a factor here: when the disks are attached to a QEMU VM (a service offered by the DC operator), the system actually boots up, and efibootmgr basically refuses to do anything, as if the system had not been booted via EFI.
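A more direct check than efibootmgr: the Linux kernel only creates /sys/firmware/efi when it was started by UEFI firmware, so testing for that directory shows how the currently running system was started (here the QEMU VM or the rescue system, which is not necessarily how the bare-metal machine boots). A minimal Python sketch; booted_via_efi is a hypothetical helper name:

```python
import os


def booted_via_efi(sysfs_root: str = "/sys") -> bool:
    """Return True if the running Linux kernel was started by UEFI firmware.

    The kernel creates <sysfs>/firmware/efi only on EFI boots; on a legacy BIOS
    boot (the mode a "BIOS boot" GPT partition is meant for) the directory is absent.
    """
    return os.path.isdir(os.path.join(sysfs_root, "firmware", "efi"))


print("booted via EFI:", booted_via_efi())
```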