* [gentoo-user] Fun with mdadm (Software RAID)
From: Alan Mackenzie @ 2024-12-20 10:47 UTC
To: gentoo-user

Hello, Gentoo.

After having got the syslinux boot manager working well, I lost the root partition on my newer machine. I spent the entire evening yesterday trying to get it back again, with various expedients for recovering ext4 partitions from backup superblocks, and so on.

It wasn't until the middle of the night that it dawned on me what had happened, and I immediately got up and had it fixed within twenty minutes.

The cause was me booting up the machine with a rescue disk. This assembled my RAID partitions /dev/md127 and /dev/md126 reversed, but also wrote those wrong identifiers, 126 and 127, into the "preferred minor" field of the partitions' super blocks. In essence, they got swapped.

Hence trying to boot up into my normal system, /dev/md126, the root partition, was an unformatted empty space on the SSD.

I don't blame the rescue disk for this occurrence. For some reason, when the kernel assembles /dev/md devices, it only seems to pay attention to the "preferred minor" fields when they are wrong. :-( mdadm appears to write the "preferred minor" fields at random when assembling the RAID arrays. I don't think it should, unless explicitly asked.

There is an argument to mdadm which specifies the writing of these fields. In fact I used this to effect a repair, ironically enough, from the rescue disk booted with the option to suppress the automatic assembly of the arrays.

Just for the record, all my RAID arrays have metadata version 0.90, the (old fashioned) one that allows auto-assembly by the kernel without the need of an initramfs.

The moral of the story: if your system uses software RAID, be careful indeed before you boot up with a rescue disk.

-- 
Alan Mackenzie (Nuremberg, Germany).
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: karl @ 2024-12-20 14:50 UTC
To: gentoo-user

Alan Mackenzie:
...
> The cause was me booting up the machine with a rescue disk. This
> assembled my RAID partitions /dev/md127 and /dev/md126 reversed, but
> also wrote those wrong identifiers, 126 and 127, into the "preferred
> minor" field of the partitions' super blocks. In essence, they got
> swapped.
...
> Just for the record, all my RAID arrays have metadata version 0.90, the
> (old fashioned) one that allows auto-assembly by the kernel without the
> need of an initramfs.
>
> The moral of the story: if your system uses software RAID, be careful
> indeed before you boot up with a rescue disk.

So, why don't you simply add
 "root=902 md=2,/dev/sda2,/dev/sdb2"
or similar to your boot loader kernel command line?

///

And... what is the need for dynamic minors now when dev_t is 32 bits:

$ grep dev_t /Net/git/linux-stable/include/linux/types.h
typedef u32 __kernel_dev_t;
typedef __kernel_dev_t dev_t;
$

and we have 20-bit minors:

$ grep -A1 MINORBITS /Net/git/linux-stable/include/linux/kdev_t.h
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)
#define MAJOR(dev) ((unsigned int) ((dev) >> MINORBITS))
#define MINOR(dev) ((unsigned int) ((dev) & MINORMASK))
#define MKDEV(ma,mi) (((ma) << MINORBITS) | (mi))

Regards,
/Karl Hammar
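For readers who want to see what the macros quoted above do with a value like "root=902", here is a minimal userspace sketch. The macro bodies are copied from the grep output in the message; decode_root_hex() is a hypothetical helper name, and its bit layout (the old 8-bit major/minor encoding, extended) is an assumption about how the kernel reads a bare hexadecimal root= value, so check it against your own kernel tree before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copies of the kernel-internal macros quoted in the message. */
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)
#define MAJOR(dev) ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev) ((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi) (((uint32_t)(ma) << MINORBITS) | (mi))

/* 12 bits of major plus 20 bits of minor must fill a 32-bit dev_t. */
static_assert(12 + MINORBITS == 32, "dev_t is assumed to be 32 bits wide");

/*
 * Hypothetical helper. ASSUMPTION: a bare hex root= value uses the
 * externally visible encoding, with the major in bits 8..19 and the
 * minor split between bits 0..7 and bits 20..31.
 */
static uint32_t decode_root_hex(uint32_t value)
{
    unsigned int major = (value & 0xfff00) >> 8;
    unsigned int minor = (value & 0xff) | ((value >> 12) & 0xfff00);

    return MKDEV(major, minor);
}
```

Under that assumption, decode_root_hex(0x902) yields major 9 (the md driver) and minor 2, which is /dev/md2 and so matches the "md=2,..." half of the suggested command line.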
* Re: [gentoo-user] Fun with mdadm (Software RAID) 2024-12-20 14:50 ` karl @ 2024-12-20 15:28 ` Alan Mackenzie 2024-12-20 17:44 ` karl 0 siblings, 1 reply; 23+ messages in thread From: Alan Mackenzie @ 2024-12-20 15:28 UTC (permalink / raw To: gentoo-user Hello, Karl. On Fri, Dec 20, 2024 at 15:50:53 +0100, karl@aspodata.se wrote: > Alan Mackenzie: > ... > > The cause was me booting up the machine with a rescue disk. This > > assembled my RAID partitions /dev/md127 and /dev/md126 reversed, but > > also wrote those wrong identifiers, 126 and 127, into the "preferred > > minor" field of the partitions' super blocks. In essence, they got > > swapped. > ... > > Just for the record, all my RAID arrays have metadata version 0.90, the > > (old fashioned) one that allows auto-assembly by the kernel without the > > need of an initramfs. > > The moral of the story: if your system uses software RAID, be careful > > indeed before you boot up with a rescue disk. > So, why don't you simple add "root=902 md=2,/dev/sda2,/dev/sdb2" or similar to > your boot loader kernel command line ? Because I didn't know about it. I found out about it this morning, and immediately tested it by setting up an "md=126,/dev/nvme0n1p4,/dev/nvme1n1p4" on the kernel command line, using the rescue disk to make the "preferred minor"s wrong, and testing it. It worked! If I understand things correctly, with this mechanism one can have the kernel assemble the RAID arrays at boot up time with a modern metadata, but still without needing the initramfs. My arrays are still at metadata 0.90. > /// > And... what is the need for dynamic minors now when dev_t is 32bits: Dynamic minors? I don't think I follow you, here. 
> $ grep dev_t /Net/git/linux-stable/include/linux/types.h > typedef u32 __kernel_dev_t; > typedef __kernel_dev_t dev_t; > $ > and we have 20 bits minors: > $ grep -A1 MINORBITS /Net/git/linux-stable/include/linux/kdev_t.h > #define MINORBITS 20 > #define MINORMASK ((1U << MINORBITS) - 1) > #define MAJOR(dev) ((unsigned int) ((dev) >> MINORBITS)) > #define MINOR(dev) ((unsigned int) ((dev) & MINORMASK)) > #define MKDEV(ma,mi) (((ma) << MINORBITS) | (mi)) > Regards, > /Karl Hammar -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: karl @ 2024-12-20 17:44 UTC
To: gentoo-user

Alan Mackenzie:
> On Fri, Dec 20, 2024 at 15:50:53 +0100, karl@aspodata.se wrote:
...
> Because I didn't know about it. I found out about it this morning, and
> immediately tested it by setting up an
> "md=126,/dev/nvme0n1p4,/dev/nvme1n1p4" on the kernel command line, using
> the rescue disk to make the "preferred minor"s wrong, and testing it.
> It worked!
>
> If I understand things correctly, with this mechanism one can have the
> kernel assemble the RAID arrays at boot up time with a modern metadata,
> but still without needing the initramfs. My arrays are still at
> metadata 0.90.

Please tell if you make booting with metadata 1.2 work.
I haven't tested that.

///

...
> > And... what is the need for dynamic minors now when dev_t is 32bits:
> Dynamic minors? I don't think I follow you, here.

If you partition the md device, the partitions will get a device with a
dynamic minor.

# mdadm -C /dev/md11 -n 1 -l 1 --force /dev/sdc2
# mdadm -C /dev/md10 -n 1 -l 1 -e 0 --force /dev/sdc1
... create partitions
# fdisk -l /dev/md10
...
Device      Boot Start    End Sectors  Size Id Type
/dev/md10p1       2048  22527   20480   10M 83 Linux
/dev/md10p2      22528 192383  169856 82.9M 83 Linux
# fdisk -l /dev/md11
...
Device      Boot  Start     End Sectors Size Id Type
/dev/md11p1        2048  206847  204800 100M 83 Linux
/dev/md11p2      206848 1757183 1550336 757M 83 Linux
# cat /sys/block/md10/md10p1/dev
259:0
# cat /sys/block/md10/md10p2/dev
259:1
# cat /sys/block/md11/md11p1/dev
259:2
# cat /sys/block/md11/md11p2/dev
259:3

$ grep -A2 '259 block' /Net/git/linux-stable/Documentation/admin-guide/devices.txt
 259 block      Block Extended Major
                  Used dynamically to hold additional partition minor
                  numbers and allow large numbers of partitions per device

So booting from a partition of an md device (as /) might be hit and miss
unless you use some initramfs magic.

Regards,
/Karl Hammar
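To check from a program whether a partition landed on the dynamically allocated major shown above, one can parse the "major:minor" text that sysfs exposes. The following is only a sketch: parse_devno() and uses_extended_major() are hypothetical helper names, and the value 259 is taken from the devices.txt excerpt in the message.

```c
#include <assert.h>
#include <stdio.h>

/* Major number of the "Block Extended Major" quoted from devices.txt. */
#define BLOCK_EXT_MAJOR 259

/*
 * Parse "MAJOR:MINOR" as read from /sys/block/<disk>/<partition>/dev.
 * A single trailing newline is tolerated. Returns 0 on success and
 * -1 on malformed input.
 */
static int parse_devno(const char *text, unsigned int *major,
                       unsigned int *minor)
{
    char trailing;
    int fields;

    assert(text != NULL && major != NULL && minor != NULL);
    fields = sscanf(text, "%u:%u%c", major, minor, &trailing);
    if (fields == 2)
        return 0;
    if (fields == 3 && trailing == '\n')
        return 0;
    return -1;
}

/* Returns 1 if the device sits on major 259, 0 if not, -1 on bad input. */
static int uses_extended_major(const char *text)
{
    unsigned int major, minor;

    if (parse_devno(text, &major, &minor) != 0)
        return -1;
    return major == BLOCK_EXT_MAJOR;
}
```

Fed the strings from the transcript, "259:0" through "259:3" all report the extended major, whereas a whole md array such as 9:126 does not. That is the distinction behind the warning about using a partitioned md device as the root.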
* Re: [gentoo-user] Fun with mdadm (Software RAID) 2024-12-20 17:44 ` karl @ 2024-12-20 20:19 ` Alan Mackenzie 2024-12-20 20:38 ` Hoël Bézier ` (3 more replies) 2024-12-22 12:02 ` Wols Lists 1 sibling, 4 replies; 23+ messages in thread From: Alan Mackenzie @ 2024-12-20 20:19 UTC (permalink / raw To: gentoo-user Hello, Karl. On Fri, Dec 20, 2024 at 18:44:53 +0100, karl@aspodata.se wrote: > Alan Mackenzie: > > On Fri, Dec 20, 2024 at 15:50:53 +0100, karl@aspodata.se wrote: > ... > > Because I didn't know about it. I found out about it this morning, and > > immediately tested it by setting up an > > "md=126,/dev/nvme0n1p4,/dev/nvme1n1p4" on the kernel command line, using > > the rescue disk to make the "preferred minor"s wrong, and testing it. > > It worked! > > If I understand things correctly, with this mechanism one can have the > > kernel assemble the RAID arrays at boot up time with a modern metadata, > > but still without needing the initramfs. My arrays are still at > > metadata 0.90. > Please tell if you make booting with metadata 1.2 work. > I havn't tested that. I've just tried it, with metadata 1.2, and it doesn't work. I got error messages at boot up to the effect that the component partitions were lacking valid version 0.0 super blocks. People without initramfs appear not to be in the sights of the maintainers of this software. They could so easily have made the assembly of metadata 1.2 components on the kernel command line work. :-( By the way, do you know an easy way for copying an entire filesystem, such as the root system, but without copying other systems mounted in it? I tried for some while with rsync and various combinations of find's and xargs's, and in the end booted up into the rescue disc to do it. I shouldn't have to do that. > /// > ... > > > And... what is the need for dynamic minors now when dev_t is 32bits: > > Dynamic minors? I don't think I follow you, here. 
> If you partition the md device, the partitions will get a device with a > dynamic minor. > # mdadm -C /dev/md11 -n 1 -l 1 --force /dev/sdc2 > # mdadm -C /dev/md10 -n 1 -l 1 -e 0 --force /dev/sdc1 > ... create partitions > # fdisk -l /dev/md10 > ... > Device Boot Start End Sectors Size Id Type > /dev/md10p1 2048 22527 20480 10M 83 Linux > /dev/md10p2 22528 192383 169856 82.9M 83 Linux > # fdisk -l /dev/md11 > ... > Device Boot Start End Sectors Size Id Type > /dev/md11p1 2048 206847 204800 100M 83 Linux > /dev/md11p2 206848 1757183 1550336 757M 83 Linux > # cat /sys/block/md10/md10p1/dev > 259:0 > # cat /sys/block/md10/md10p2/dev > 259:1 > # cat /sys/block/md11/md11p1/dev > 259:2 > # cat /sys/block/md11/md11p2/dev > 259:3 > $ grep -A2 '259 block' /Net/git/linux-stable/Documentation/admin-guide/devices.txt > 259 block Block Extended Major > Used dynamically to hold additional partition minor > numbers and allow large numbers of partitions per device > So, to boot to a md device partition (as /) might be a hit and miss > unless you use some initramfs magic. OK, thanks for the explanation. My root partition is an entire device, /dev/md126. I've only had problems with it when accidents have happened, like yesterday evening. > Regards, > /Karl Hammar -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: Hoël Bézier @ 2024-12-20 20:38 UTC
To: gentoo-user

On Fri, Dec 20, 2024 at 08:19:55 +0000, Alan Mackenzie wrote:
>By the way, do you know an easy way for copying an entire filesystem,
>such as the root system, but without copying other systems mounted in
>it? I tried for some while with rsync and various combinations of
>find's and xargs's, and in the end booted up into the rescue disc to do
>it. I shouldn't have to do that.

rsync -x / /some-other-place

From man rsync:
    --one-file-system, -x    don’t cross filesystem boundaries
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: Alan Mackenzie @ 2024-12-20 20:53 UTC
To: gentoo-user

Hello, Hoël.

On Fri, Dec 20, 2024 at 21:38:42 +0100, Hoël Bézier wrote:
> On Fri, Dec 20, 2024 at 08:19:55 +0000, Alan Mackenzie wrote:
> >By the way, do you know an easy way for copying an entire filesystem,
> >such as the root system, but without copying other systems mounted in
> >it? I tried for some while with rsync and various combinations of
> >find's and xargs's, and in the end booted up into the rescue disc to do
> >it. I shouldn't have to do that.

> rsync -x / /some-other-place

> From man rsync:
>     --one-file-system, -x    don’t cross filesystem boundaries

Thanks! I'll remember that. For some reason I didn't find it when searching the rsync man page.

-- 
Alan Mackenzie (Nuremberg, Germany).
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: karl @ 2024-12-20 22:02 UTC
To: gentoo-user

Alan Mackenzie:
...
> By the way, do you know an easy way for copying an entire filesystem,
> such as the root system, but without copying other systems mounted in
> it? I tried for some while with rsync and various combinations of
> find's and xargs's, and in the end booted up into the rescue disc to do
> it. I shouldn't have to do that.

rsync as other people have suggested.
There is also
 cp -x
 dump/restore
 find -xdev
etc.

You can also do it by accessing the /dev/-file like
 dd if=source of=dest (cp works here also but dd is more the norm).

///

When something is mounted on a mount point, the files below the
mount point are hidden and the mounted filesystem is available
instead. Do you want to copy those hidden files also?

Regards,
/Karl Hammar
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: Frank Steinmetzger @ 2024-12-30 4:08 UTC
To: gentoo-user

On Fri, Dec 20, 2024 at 11:02:55PM +0100, karl@aspodata.se wrote:
> Alan Mackenzie:
> ...
> > By the way, do you know an easy way for copying an entire filesystem,
> > such as the root system, but without copying other systems mounted in
> > it? I tried for some while with rsync and various combinations of
> > find's and xargs's, and in the end booted up into the rescue disc to do
> > it. I shouldn't have to do that.
>
> rsync as other people have suggested.
> There is also
> cp -x
> dump/restore
> find -xdev
> etc.
>
> You can also do it by accessing the /dev/-file like
> dd if=source of=dest (cp works here also but dd is more the norm).
>
> ///
>
> When something is mounted on a mount point, the files below the
> mount point are hidden and the mounted filesystem is available
> instead. Do you want to copy those hidden files also?

To get around this, I usually bind-mount the filesystem to another directory first. I usually only do this when I’m dealing with /, as my FS structure is not complex:

mount --bind / /mnt/bind
rsync -axAHX /mnt/bind/ /path/to/destination/

(-x is not needed then, but it’s part of muscle memory)

-- 
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.

Keyboard not connected, press F1 to continue.
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: karl @ 2024-12-20 22:02 UTC
To: gentoo-user

Alan Mackenzie:
> On Fri, Dec 20, 2024 at 18:44:53 +0100, karl@aspodata.se wrote:
...
> > Please tell if you make booting with metadata 1.2 work.
> > I haven't tested that.
>
> I've just tried it, with metadata 1.2, and it doesn't work. I got error
> messages at boot up to the effect that the component partitions were
> lacking valid version 0.0 super blocks.
>
> People without initramfs appear not to be in the sights of the
> maintainers of this software. They could so easily have made the
> assembly of metadata 1.2 components on the kernel command line work.
> :-(
...

The command-line handling and auto mounting seem to be handled in files
like (depending on the kernel version, I guess):
 drivers/md/md-autodetect.c
 init/do_mounts_md.c
You can find the correct file with
 find <kernel top dir> -type f -name \*.c | xargs grep MD_AUTODETECT

The problem might be that in format 1.2 the superblock is at 4K from
the start. Could format 1.1 (where the superblock is at the start) work?

Regards,
/Karl Hammar
* Re: [gentoo-user] Fun with mdadm (Software RAID) 2024-12-20 22:02 ` karl @ 2024-12-21 12:43 ` Alan Mackenzie 2024-12-21 16:36 ` Alan Mackenzie 2024-12-22 12:16 ` Wols Lists 0 siblings, 2 replies; 23+ messages in thread From: Alan Mackenzie @ 2024-12-21 12:43 UTC (permalink / raw To: gentoo-user Hello, Karl. On Fri, Dec 20, 2024 at 23:02:58 +0100, karl@aspodata.se wrote: > Alan Mackenzie: > > On Fri, Dec 20, 2024 at 18:44:53 +0100, karl@aspodata.se wrote: > ... > > > Please tell if you make booting with metadata 1.2 work. > > > I havn't tested that. > > I've just tried it, with metadata 1.2, and it doesn't work. I got error > > messages at boot up to the effect that the component partitions were > > lacking valid version 0.0 super blocks. > > People without initramfs appear not to be in the sights of the > > maintainers of this software. They could so easily have made the > > assembly of metadata 1.2 components on the kernel command line work. > > :-( > ... > The cmd line handling and auto mounting seems to be handled in files > like (depending of kernel version I guess): > drivers/md/md-autodetect.c > init/do_mounts_md.c > you can find the correct file with > find <kernel top dir> -type f -name \*.c | xargs grep MD_AUTODETECT The pertinent functions are mainly in drivers/md/md-autodetect.c and md.c (same directory). It seems that nowhere does this code try the different metadata formats in turn, using the first valid one that it finds. Instead, it expects the metadata format to be passed in as an argument to whatever needs it. For the md kernel parameter to be able to load metadata versions 1.[012], the parameter definition would have to be enhanced, somehow. Something like: md=124,1.2,/dev/nvme0n1p6,/dev/nvme1n1p6 ^^^^ , where the extra bit is optional. This enhancement would not be difficult. The trouble is more political. I think this code is maintained by RedHat. 
RedHat's customers all use initramfs, so they probably think everybody else should, too, hence would be unwilling to enhance it for a small group of Gentooers. > The problem might be that in format 1.2, the superblock is at 4K from > start, could format 1.1 (where the superblock is at start) work ? This doesn't seem to be the problem. The 0.90 superblock is right at the end of the partition, for example. There are two functions in md.c, super_90_load and super_1_load which read and verify the super block of the given metadata type. Despite the 0.90 format being "deprecated", it doesn't appear to be in any danger. It was in a deprecated state in 2010, when I started using RAID, and I think the maintainers realise that to phase 0.90 out would cause a lot of pain and protest. The main limitation with 0.90 that I can see is its restriction to 2^32 512-byte blocks per component device. This is the 2 terabyte limitation, which isn't a problem for me at the moment, but might be for other people with enormous drives. Nevertheless, I might make the above enhancement, just because. > Regards, > /Karl Hammar -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 23+ messages in thread
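The superblock positions discussed above, and the size ceiling Alan mentions for 0.90, can be put into numbers. This is a sketch under stated assumptions: md_sb_start() and md090_limit_bytes() are hypothetical helper names, and the offsets are my recollection of what super_90_load and super_1_load in the kernel compute, so verify them against drivers/md/md.c and the md superblock header in your own tree.

```c
#include <assert.h>
#include <stdint.h>

/* 0.90 reserves a 64 KiB slot at the end of the device (128 sectors). */
#define MD_RESERVED_SECTORS (64 * 1024 / 512)

/*
 * Start of the superblock in 512-byte sectors for a component device
 * of dev_sectors sectors. Returns UINT64_MAX for an unknown format.
 * ASSUMPTION: the offsets follow the kernel's md code as I recall it.
 */
static uint64_t md_sb_start(int major_version, int minor_version,
                            uint64_t dev_sectors)
{
    assert(dev_sectors >= MD_RESERVED_SECTORS);

    if (major_version == 0 && minor_version == 90)
        /* Round down to a 64 KiB boundary, then step back one slot. */
        return (dev_sectors & ~(uint64_t)(MD_RESERVED_SECTORS - 1))
               - MD_RESERVED_SECTORS;

    if (major_version == 1) {
        switch (minor_version) {
        case 0: /* near the end: 8 KiB back, rounded down to 4 KiB */
            return (dev_sectors - 16) & ~(uint64_t)7;
        case 1: /* at the very start */
            return 0;
        case 2: /* 4 KiB from the start */
            return 8;
        }
    }
    return UINT64_MAX;
}

/* The ceiling quoted in the message: 2^32 blocks of 512 bytes each. */
static uint64_t md090_limit_bytes(void)
{
    return (UINT64_C(1) << 32) * 512;
}
```

The last function is plain arithmetic: 2^32 blocks times 2^9 bytes is 2^41 bytes, which is 2 TiB, the figure given in the message for each component device.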
* Re: [gentoo-user] Fun with mdadm (Software RAID) 2024-12-21 12:43 ` Alan Mackenzie @ 2024-12-21 16:36 ` Alan Mackenzie 2024-12-21 16:45 ` karl 2024-12-22 12:16 ` Wols Lists 1 sibling, 1 reply; 23+ messages in thread From: Alan Mackenzie @ 2024-12-21 16:36 UTC (permalink / raw To: gentoo-user Hello again, Karl. On Sat, Dec 21, 2024 at 12:43:50 +0000, Alan Mackenzie wrote: > On Fri, Dec 20, 2024 at 23:02:58 +0100, karl@aspodata.se wrote: > > Alan Mackenzie: > > > On Fri, Dec 20, 2024 at 18:44:53 +0100, karl@aspodata.se wrote: > > ... > > > > Please tell if you make booting with metadata 1.2 work. > > > > I havn't tested that. > > > I've just tried it, with metadata 1.2, and it doesn't work. I got error > > > messages at boot up to the effect that the component partitions were > > > lacking valid version 0.0 super blocks. > > > People without initramfs appear not to be in the sights of the > > > maintainers of this software. They could so easily have made the > > > assembly of metadata 1.2 components on the kernel command line work. > > > :-( > > ... I've now got working code which assembles a metadata 1.2 RAID array at boot time. The syntax needed on the command line is, again, md=124,1.2,/dev/nvme0n1p6,/dev/nvme1n1p6 .. In place of 1.2 can be any of 0.90, 1.0, 1.1, though I haven't tested it with anything but 1.2 as yet. > The pertinent functions are mainly in drivers/md/md-autodetect.c and > md.c (same directory). Actually, just in md-autodetect.c. [ .... ] > Nevertheless, I might make the above enhancement, just because. Done. > > Regards, > > /Karl Hammar -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [gentoo-user] Fun with mdadm (Software RAID) 2024-12-21 16:36 ` Alan Mackenzie @ 2024-12-21 16:45 ` karl 2024-12-21 16:58 ` Alan Mackenzie 0 siblings, 1 reply; 23+ messages in thread From: karl @ 2024-12-21 16:45 UTC (permalink / raw To: gentoo-user Alan Mackenzie: ... > I've now got working code which assembles a metadata 1.2 RAID array at > boot time. The syntax needed on the command line is, again, > > md=124,1.2,/dev/nvme0n1p6,/dev/nvme1n1p6 > > .. In place of 1.2 can be any of 0.90, 1.0, 1.1, though I haven't tested > it with anything but 1.2 as yet. ... Fun! Which kernel, can you send a patch ? Regards, /Karl Hammar ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: Alan Mackenzie @ 2024-12-21 16:58 UTC
To: gentoo-user

[-- Attachment #1: Type: text/plain, Size: 627 bytes --]

Hello, Karl.

On Sat, Dec 21, 2024 at 17:45:13 +0100, karl@aspodata.se wrote:
> Alan Mackenzie:
> ...
> > I've now got working code which assembles a metadata 1.2 RAID array at
> > boot time. The syntax needed on the command line is, again,

> >     md=124,1.2,/dev/nvme0n1p6,/dev/nvme1n1p6

> > .. In place of 1.2 can be any of 0.90, 1.0, 1.1, though I haven't tested
> > it with anything but 1.2 as yet.
> ...

> Fun! Which kernel, can you send a patch ?

6.6.62. Patch enclosed. It should apply cleanly from the directory
..../drivers/md.

Have fun!

> Regards,
> /Karl Hammar

-- 
Alan Mackenzie (Nuremberg, Germany).

[-- Attachment #2: diff.20241221b.diff --]
[-- Type: text/plain, Size: 1469 bytes --]

diff --git a/drivers/md/md-autodetect.c b/drivers/md/md-autodetect.c
index b2a00f213c2c..2cd347108284 100644
--- a/drivers/md/md-autodetect.c
+++ b/drivers/md/md-autodetect.c
@@ -124,6 +124,17 @@ static void __init md_setup_drive(struct md_setup_args *args)
 	struct mddev *mddev;
 	int err = 0, i;
 	char name[16];
+	int major_version = 0, minor_version = 90;
+	char *pp;
+	static struct {
+		char *metadata;
+		int major_version;
+		int minor_version;
+	} metadata_table[] =
+		{{"0.90", 0, 90},
+		 {"1.0", 1, 0},
+		 {"1.1", 1, 1},
+		 {"1.2", 1, 2}};
 
 	if (args->partitioned) {
 		mdev = MKDEV(mdp_major, args->minor << MdpMinorShift);
@@ -133,6 +144,21 @@ static void __init md_setup_drive(struct md_setup_args *args)
 		sprintf(name, "md%d", args->minor);
 	}
 
+	pp = strchr(devname, ',');
+	if (pp)
+	{
+		*pp = 0;
+		for (i = 1; i < ARRAY_SIZE(metadata_table); i++)
+			if (!strcmp(devname, metadata_table[i].metadata))
+			{
+				major_version = metadata_table[i].major_version;
+				minor_version = metadata_table[i].minor_version;
+				devname = pp + 1;
+				break;
+			}
+		*pp = ',';
+	}
+
 	for (i = 0; i < MD_SB_DISKS && devname != NULL; i++) {
 		struct kstat stat;
 		char *p;
@@ -183,6 +209,8 @@ static void __init md_setup_drive(struct md_setup_args *args)
 		goto out_unlock;
 	}
 
+	ainfo.major_version = major_version;
+	ainfo.minor_version = minor_version;
 	if (args->level != LEVEL_NONE) {
 		/* non-persistent */
 		ainfo.level = args->level;
* Re: [gentoo-user] Fun with mdadm (Software RAID) 2024-12-21 16:58 ` Alan Mackenzie @ 2024-12-22 13:08 ` Alan Mackenzie 0 siblings, 0 replies; 23+ messages in thread From: Alan Mackenzie @ 2024-12-22 13:08 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 1168 bytes --] Hello again! On Sat, Dec 21, 2024 at 16:58:59 +0000, Alan Mackenzie wrote: > Hello, Karl. > On Sat, Dec 21, 2024 at 17:45:13 +0100, karl@aspodata.se wrote: > > Alan Mackenzie: > > ... > > > I've now got working code which assembles a metadata 1.2 RAID array at > > > boot time. The syntax needed on the command line is, again, > > > md=124,1.2,/dev/nvme0n1p6,/dev/nvme1n1p6 > > > .. In place of 1.2 can be any of 0.90, 1.0, 1.1, though I haven't tested > > > it with anything but 1.2 as yet. > > ... > > Fun! Which kernel, can you send a patch ? > 6.6.62. Patch enclosed. It should apply cleanly from the directory > ..../drivers/md. There was an error in yesterday's patch. For some reason I can't fathom, I'd started a loop with for (i = 1; ....) in place of the correct for (i = 0; ....) .. The consequence was that the driver would not recognise "0.90" when given explicitly in the kernel command line, for example as md=126,0.90,/dev/nvme0n1p4,/dev/nvme1n1p4 .. Please use the enclosed patch in place of that patch from yesterday. Thanks! > Have fun! > > Regards, > > /Karl Hammar -- Alan Mackenzie (Nuremberg, Germany). 
[-- Attachment #2: diff.20241222.diff --]
[-- Type: text/plain, Size: 1473 bytes --]

diff --git a/drivers/md/md-autodetect.c b/drivers/md/md-autodetect.c
index b2a00f213c2c..6bd6e9177969 100644
--- a/drivers/md/md-autodetect.c
+++ b/drivers/md/md-autodetect.c
@@ -124,6 +124,17 @@ static void __init md_setup_drive(struct md_setup_args *args)
 	struct mddev *mddev;
 	int err = 0, i;
 	char name[16];
+	int major_version = 0, minor_version = 90;
+	char *pp;
+	static struct {
+		char *metadata;
+		int major_version;
+		int minor_version;
+	} metadata_table[] =
+		{{"0.90", 0, 90},
+		 {"1.0", 1, 0},
+		 {"1.1", 1, 1},
+		 {"1.2", 1, 2}};
 
 	if (args->partitioned) {
 		mdev = MKDEV(mdp_major, args->minor << MdpMinorShift);
@@ -133,6 +144,21 @@ static void __init md_setup_drive(struct md_setup_args *args)
 		sprintf(name, "md%d", args->minor);
 	}
 
+	pp = strchr(devname, ',');
+	if (pp)
+	{
+		*pp = 0;
+		for (i = 0; i < ARRAY_SIZE(metadata_table); i++)
+			if (!strcmp(devname, metadata_table[i].metadata))
+			{
+				major_version = metadata_table[i].major_version;
+				minor_version = metadata_table[i].minor_version;
+				devname = pp + 1;
+				break;
+			}
+		*pp = ',';
+	}
+
 	for (i = 0; i < MD_SB_DISKS && devname != NULL; i++) {
 		struct kstat stat;
 		char *p;
@@ -183,6 +209,8 @@ static void __init md_setup_drive(struct md_setup_args *args)
 		goto out_unlock;
 	}
 
+	ainfo.major_version = major_version;
+	ainfo.minor_version = minor_version;
 	if (args->level != LEVEL_NONE) {
 		/* non-persistent */
 		ainfo.level = args->level;
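The parsing step added by the patch can be tried outside the kernel. What follows is a userspace re-creation and not the kernel code itself: split_md_metadata() is a hypothetical helper name, it assumes its input is the part of the md= value after the array number (as devname is in the patch), and unlike the patch it leaves the input string unmodified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct md_metadata {
    const char *name;
    int major_version;
    int minor_version;
};

/* Same four entries as metadata_table[] in the patch. */
static const struct md_metadata metadata_table[] = {
    {"0.90", 0, 90},
    {"1.0", 1, 0},
    {"1.1", 1, 1},
    {"1.2", 1, 2},
};

/*
 * If devname starts with a known metadata version followed by a comma,
 * store that version and return a pointer to the device list after it.
 * Otherwise store the default (0.90) and return devname unchanged.
 */
static const char *split_md_metadata(const char *devname,
                                     int *major_version, int *minor_version)
{
    const char *comma;
    size_t i, len;

    assert(devname != NULL && major_version != NULL && minor_version != NULL);
    *major_version = 0;
    *minor_version = 90;

    comma = strchr(devname, ',');
    if (comma == NULL)
        return devname;
    len = (size_t)(comma - devname);

    /* Start at index 0 so that an explicit "0.90" is recognised too. */
    for (i = 0; i < sizeof metadata_table / sizeof metadata_table[0]; i++) {
        const char *name = metadata_table[i].name;

        if (strlen(name) == len && memcmp(devname, name, len) == 0) {
            *major_version = metadata_table[i].major_version;
            *minor_version = metadata_table[i].minor_version;
            return comma + 1;
        }
    }
    return devname;
}
```

Given "1.2,/dev/nvme0n1p6,/dev/nvme1n1p6" the helper reports version 1.2 and returns the two device names. Given a list with no version prefix, such as "/dev/sda2,/dev/sdb2", it falls back to 0.90 and returns the list untouched, which mirrors the optional nature of the new field.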
* Re: [gentoo-user] Fun with mdadm (Software RAID) 2024-12-21 12:43 ` Alan Mackenzie 2024-12-21 16:36 ` Alan Mackenzie @ 2024-12-22 12:16 ` Wols Lists 1 sibling, 0 replies; 23+ messages in thread From: Wols Lists @ 2024-12-22 12:16 UTC (permalink / raw To: gentoo-user On 21/12/2024 12:43, Alan Mackenzie wrote: > , where the extra bit is optional. This enhancement would not be > difficult. The trouble is more political. I think this code is > maintained by RedHat. RedHat's customers all use initramfs, so they > probably think everybody else should, too, hence would be unwilling to > enhance it for a small group of Gentooers. Let's blame RedHat again ... I think you're wrong. There's a fair few SUSE people in there. The person who did nearly all of the heavy lifting before he stepped down was SuSE. A lot of the "senior" team (just a couple of people, as per normal) are Far Eastern, I'm not sure of their company affiliation. About the only person I'm confident IS RedHat is the guy maintaining mdadm, which is not mdraid (it's the management tool, not the "do the work" tool). Cheers, Wol ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: Wols Lists @ 2024-12-22 12:08 UTC
To: gentoo-user

On 20/12/2024 20:19, Alan Mackenzie wrote:
> I've just tried it, with metadata 1.2, and it doesn't work. I got error
> messages at boot up to the effect that the component partitions were
> lacking valid version 0.0 super blocks.
>
> People without initramfs appear not to be in the sights of the
> maintainers of this software. They could so easily have made the
> assembly of metadata 1.2 components on the kernel command line work.
> :-(

No they couldn't. Not if they wanted (at the time) a kernel small enough to boot successfully ...

Making each disk write go identically to two disks (your basic 0.9 mirror) is pretty simple, and also extremely error prone. Making mdraid robust with all the other features of an enterprise "protect your data" system is a lot more work.

mdraid has probably just protected my data - dunno what triggered it, but I lost a disk and it just got rebuilt in the background without me doing a thing ...

>
> By the way, do you know an easy way for copying an entire filesystem,
> such as the root system, but without copying other systems mounted in
> it? I tried for some while with rsync and various combinations of
> find's and xargs's, and in the end booted up into the rescue disc to do
> it. I shouldn't have to do that.

Provided it's read-only (so yes, if it's the root I might well use a rescue disk) I'd use dd. That's assuming a fairly small root that's fairly full; it's rather wasteful if it's not ...

Cheers,
Wol
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: Wols Lists @ 2024-12-22 12:02 UTC
To: gentoo-user

On 20/12/2024 17:44, karl@aspodata.se wrote:
>> If I understand things correctly, with this mechanism one can have the
>> kernel assemble the RAID arrays at boot up time with a modern metadata,
>> but still without needing the initramfs.  My arrays are still at
>> metadata 0.90.
> Please tell if you make booting with metadata 1.2 work.
> I haven't tested that.

It is NOT supported. The kernel has no code to do so, you need an
initramfs. That said, nowadays I believe you can actually load the
initramfs into the kernel so it's one monolithic blob ...

By the way, as to the other point of putting /dev/sda etc on the kernel
command line, it's the kernel that's messing up and scrambling which
physical disk is which logical sda sdb et al device, so explicitly
specifying that will have exactly NO effect when your hardware/software
combo changes again.

I guess it was the fact that your rescue disk booted from CDROM or
whatever made THAT sda, and pushed the other disks out of the way.

sda, sdb, sdc et al are allocated AT RANDOM by the kernel. It just so
happens that the "seed" rarely changes, so in normal use the same values
happen to get chosen every time - until something DOES change, and then
you wonder why everything falls over. The same is also true of md127,
md126 et al. If your raid counts up from md1, md2 etc then those I
believe are stable, but I haven't seen them for pretty much the entire
time I've been involved in mdraid (maybe a decade or so?)

You need to use those UUID/GUID things. I know it's a hassle finding out
whether it's a guid or a uuid, and what it is, and all that crud, but
trust me they don't change, you can shuffle your disks, stick in another
SATA card, move it from SATA to USB (BAD move - don't even think of it
!!!), and the system will still find the correct disk.

Cheers,
Wol
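To make the "finding out what it is" step above a little more concrete, here is a hedged sketch of how the identifiers can be looked up. The commands in the comments are standard tools; the helper `array_uuid` is a hypothetical convenience that assumes input in the `ARRAY <device> ... UUID=<value>` shape printed by `mdadm --detail --scan`. Note that an md array UUID is not the same thing as the UUID of the filesystem living on the array.

```shell
#!/bin/sh
# Commands commonly used to look the identifiers up (some need root):
#   blkid                     # filesystem UUIDs and labels per block device
#   ls -l /dev/disk/by-uuid   # udev symlinks: UUID -> current kernel name
#   mdadm --detail --scan     # one ARRAY line per running md array

# array_uuid DEVICE
# Read `mdadm --detail --scan` style lines on stdin and print the UUID
# recorded for DEVICE.  Prints nothing if DEVICE is not listed.
array_uuid() {
    awk -v dev="$1" '
        $1 == "ARRAY" && $2 == dev {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^UUID=/) { sub(/^UUID=/, "", $i); print $i }
        }'
}

# Hypothetical usage:
#   mdadm --detail --scan | array_uuid /dev/md126
```

A filesystem UUID reported by blkid can then be used in /etc/fstab as `UUID=<value>` in place of the device path, and an array UUID can be used on an ARRAY line in mdadm.conf.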
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: Alan Mackenzie @ 2024-12-22 13:43 UTC
To: gentoo-user

Hello, Wol.

On Sun, Dec 22, 2024 at 12:02:49 +0000, Wols Lists wrote:
> On 20/12/2024 17:44, karl@aspodata.se wrote:
> >> If I understand things correctly, with this mechanism one can have the
> >> kernel assemble the RAID arrays at boot up time with a modern metadata,
> >> but still without needing the initramfs.  My arrays are still at
> >> metadata 0.90.

> > Please tell if you make booting with metadata 1.2 work.
> > I haven't tested that.

> It is NOT supported. The kernel has no code to do so, you need an
> initramfs. That said, nowadays I believe you can actually load the
> initramfs into the kernel so it's one monolithic blob ...

With my patch from yesterday (corrected today), you can indeed instruct
the kernel to assemble RAID devices with metadata 1.2.  It wasn't a
difficult patch by any means.  One wonders why the md kernel team hadn't
done it a long time ago.

Initramfs's are ugly, ungainly things, often many times larger than the
kernel itself, and appear not to have been well thought out.  They are
surely a source of complication and error, and are best avoided, if
possible.  I've never actually built one myself, and will go to some
lengths, like hacking the kernel, to avoid it.

> By the way, as to the other point of putting /dev/sda etc on the kernel
> command line, it's the kernel that's messing up and scrambling which
> physical disk is which logical sda sdb et al device, so explicitly
> specifying that will have exactly NO effect when your hardware/software
> combo changes again.

/dev/sda (or, in my case, /dev/nvme0n1), etc. don't, in my experience,
get scrambled by the kernel.  They're plugged into the same sockets on
the motherboard from day to day, so unless you're physically inserting
or removing them, you won't have trouble.

> I guess it was the fact your rescue disk booted from CDROM or whatever
> made THAT sda, and pushed the other disks out of the way.

No, you've misunderstood my situation.  What got scrambled by the rescue
disc was the assignment of /dev/md127 and /dev/md126.  This has been
solved by explicitly specifying the assignment with md parameters in the
kernel command line.  So now my system boots just fine, even after the
assignment of the devices (the "preferred-minor" field in the MD
superblock) has been scrambled by the rescue disk.

> sda, sdb, sdc et al are allocated AT RANDOM by the kernel.

Only in the sense that it may be difficult on a new machine to predict
in advance which physical HDD becomes which sdx.  As I said, the
assignment of physical drives to logical devices is repeatable, and
doesn't change from day to day.

> It just so happens that the "seed" rarely changes, so in normal use
> the same values happen to get chosen every time - until something DOES
> change, and then you wonder why everything falls over. The same is
> also true of md127, md126 et al. If your raid counts up from md1, md2
> etc then those I believe are stable, but I haven't seen them for
> pretty much the entire time I've been involved in mdraid (maybe a
> decade or so?)

> You need to use those UUID/GUID things. I know it's a hassle finding out
> whether it's a guid or a uuid, and what it is, and all that crud, but
> trust me they don't change, you can shuffle your disks, stick in another
> SATA card, move it from SATA to USB (BAD move - don't even think of it
> !!!), and the system will still find the correct disk.

The trouble being that a kernel command line, or /etc/fstab, using lots
of these is not human readable, and hence is at the edge of
unmaintainability.  This maintenance difficulty surely outweighs the
rare situation where the physical->logical assignment changes due to a
broken drive.  That's what we've got rescue disks for.

> Cheers,
> Wol

-- 
Alan Mackenzie (Nuremberg, Germany).
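The "md parameters in the kernel command line" mentioned above follow the form given in the kernel's md admin guide for arrays with persistent superblocks: `md=<md device no.>,dev0,dev1,...,devn`. The exact command line used in this thread is not shown, so the following is only an illustrative sketch: `md_param` is a hypothetical helper, and the partition names are invented.

```shell
#!/bin/sh
# md_param MINOR DEVICE...
# Build an "md=" kernel command line parameter naming the component
# partitions of /dev/md<MINOR>, in the form
#   md=<md device no.>,dev0,dev1,...,devn
md_param() {
    minor=$1
    shift
    param="md=$minor"
    for dev in "$@"; do
        param="$param,$dev"
    done
    printf '%s\n' "$param"
}

# Hypothetical layout: two NVMe drives, one mirror on partition 4 and a
# second mirror on partition 5.
#   md_param 126 /dev/nvme0n1p4 /dev/nvme1n1p4
#   md_param 127 /dev/nvme0n1p5 /dev/nvme1n1p5
```

The resulting strings would be appended to the kernel command line in the boot loader configuration, next to something like `root=/dev/md126`. As discussed in the thread, an unpatched kernel is only expected to assemble 0.90-metadata arrays this way.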
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: Peter Humphrey @ 2024-12-22 15:29 UTC
To: gentoo-user

On Sunday 22 December 2024 13:43:08 GMT Alan Mackenzie wrote:

> The trouble [is] that a kernel command line, or /etc/fstab, using lots
> of these is not human readable, and hence is at the edge of
> unmaintainability.  This maintenance difficulty surely outweighs the
> rare situation where the physical->logical assignment changes due to a
> broken drive.  That's what we've got rescue disks for.

Hear, hear! I never could understand why everyone seems to want to jump
onto that band-wagon.

-- 
Regards,
Peter.
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: Wols Lists @ 2024-12-22 16:53 UTC
To: gentoo-user

On 22/12/2024 15:29, Peter Humphrey wrote:
> On Sunday 22 December 2024 13:43:08 GMT Alan Mackenzie wrote:
>
>> The trouble [is] that a kernel command line, or /etc/fstab, using lots
>> of these is not human readable, and hence is at the edge of
>> unmaintainability.  This maintenance difficulty surely outweighs the
>> rare situation where the physical->logical assignment changes due to a
>> broken drive.  That's what we've got rescue disks for.
>
> Hear, hear! I never could understand why everyone seems to want to jump onto
> that band-wagon.
>

I have no problem with you saying all this long guid crap makes stuff
unreadable (and yes, I agree, unreadable and unmaintainable aren't that
far different) BUT

> surely outweighs the rare situation where the physical->logical
> assignment changes

THAT DEPENDS ON YOUR HARDWARE!

For normal consumer grade hardware, I agree. I've never known it change
unless I've been mucking about with add-in SATA, PATA, whatever cards.

BUT. Especially on big server-grade hardware, where there's lots of trip
switches so stuff doesn't all power up in one huge spike (and I've
worked with such), different parts of the system come up in a completely
random order, and drives re-order themselves pretty much every single
boot!

So yes, with our consumer hardware I'd agree with you. But the people
paying big bills for reliable top-range hardware would wonder what
you're smoking!

Cheers,
Wol
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: Alan Mackenzie @ 2024-12-22 20:05 UTC
To: gentoo-user

Hello, Wol.

On Sun, Dec 22, 2024 at 16:53:17 +0000, Wols Lists wrote:
> On 22/12/2024 15:29, Peter Humphrey wrote:
> > On Sunday 22 December 2024 13:43:08 GMT Alan Mackenzie wrote:

> >> The trouble [is] that a kernel command line, or /etc/fstab, using lots
> >> of these is not human readable, and hence is at the edge of
> >> unmaintainability.  This maintenance difficulty surely outweighs the
> >> rare situation where the physical->logical assignment changes due to a
> >> broken drive.  That's what we've got rescue disks for.

> > Hear, hear! I never could understand why everyone seems to want to jump onto
> > that band-wagon.

> I have no problem with you saying all this long guid crap makes stuff
> unreadable (and yes, I agree, unreadable and unmaintainable aren't that
> far different) BUT

> > surely outweighs the rare situation where the physical->logical
> > assignment changes

> THAT DEPENDS ON YOUR HARDWARE!

> For normal consumer grade hardware, I agree. I've never known it change
> unless I've been mucking about with add-in SATA, PATA, whatever cards.

This is the desirable state of affairs.

> BUT. Especially on big server-grade hardware, where there's lots of trip
> switches so stuff doesn't all power up in one huge spike (and I've
> worked with such), different parts of the system come up in a completely
> random order, and drives re-order themselves pretty much every single boot!

So all this 32 hex digit UUID stuff is a workaround for the
unpredictability of server hardware.  What seems to be missing is a way
of associating a given disk socket on the motherboard with /dev/sda.
Instead we have to put up with "content addressing".

> So yes, with our consumer hardware I'd agree with you. But the people
> paying big bills for reliable top-range hardware would wonder what
> you're smoking!

I think any system admins reading this would long for the predictability
of "consumer hardware", having too often been confronted with
indistinguishable 32 hex digit identifiers.  I would imagine it quite
likely that the said admins have written scripts to make this more
manageable.

> Cheers,
> Wol

-- 
Alan Mackenzie (Nuremberg, Germany).
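On the wish above for tying a device name to a physical socket: on systems running udev there is normally a /dev/disk/by-path directory whose symlink names are derived from the bus position of each disk, and a /dev/disk/by-label directory for filesystem labels, which are set by the admin and are readable. The sketch below is the kind of small script an admin might write to see these mappings; `list_links` is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# list_links [DIR]
# For every symlink in DIR (default /dev/disk/by-path), print its name
# and the device node it currently resolves to.
list_links() {
    dir=${1:-/dev/disk/by-path}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

# Hypothetical usage:
#   list_links                      # bus position -> kernel name
#   list_links /dev/disk/by-label   # filesystem label -> kernel name
#
# A related overview in one table:
#   lsblk -o NAME,SIZE,FSTYPE,LABEL,UUID,MOUNTPOINT
```

A label can stand in for a UUID in /etc/fstab as `LABEL=<name>`, which keeps the file readable while still not depending on whether a disk shows up as sda or sdb.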
* Re: [gentoo-user] Fun with mdadm (Software RAID)
From: Steven Lembark @ 2024-12-25 21:16 UTC
To: gentoo-user; +Cc: lembark

> I think any system admins reading this would long for the
> predictability of "consumer hardware", having too often been
> confronted with indistinguishable 32 hex digit identifiers.  I would
> imagine it quite likely that the said admins have written scripts to
> make this more manageable.

Simple fix: use LVM, let it deal with the UUID.

At that point the PV's get UUID's, the VG's get UUID's, the LV's
get UUID's and you never have to type or see or use them.

Snippet from my /etc/fstab:

    /dev/vg00/root      /           xfs ...
    /dev/vg00/var       /var        xfs ...
    /dev/vg00/var-tmp   /var/tmp    xfs ...

This is basically the same fstab on my server & notebook, and it
hasn't changed in the transitions from ATA to SATA to SCSI to SAS
to NVME.

If you want mirroring then either create a mirror with mdadm and
use it as a PV -- kernel will auto-start the mirror, vgscan will
find it, and Voilà!, it's up -- or use -m2 and mirror/stripe/
RAID5/whatever using lvcreate to spread the data across whatever
you like.

Here I have two nvme's (used to be scsi, then sas) which are
mirrored for vg00 w/ the root, var, home filesystems, and another
that's striped for /var/tmp and other scratch spaces.

This gives an overview:

https://speakerdeck.com/lembark/its-only-logical-lvm-for-linux

-- 
Steven Lembark
Workhorse Computing
lembark@wrkhors.com
+1 888 359 3508
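A rough sketch of the lvcreate route described above, for readers who have not set up LVM before. This is not Steven's actual setup: the partition names, the 20G size and the `lvm_mirror_plan` and `run` helpers are all assumptions for illustration, and the script only prints the commands unless RUN=1 is set, because they need root and overwrite real disks. One detail worth checking against lvcreate(8): `-m` counts additional copies, so `-m1` already gives a two-way mirror.

```shell
#!/bin/sh
# run CMD...
# Print the command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# lvm_mirror_plan PV1 PV2
# Turn two partitions into physical volumes, group them as vg00, and
# create a mirrored logical volume named "root" with an XFS filesystem.
lvm_mirror_plan() {
    run pvcreate "$1" "$2"
    run vgcreate vg00 "$1" "$2"
    # -m1 = one extra copy of the data, i.e. two images in total.
    run lvcreate --type raid1 -m1 -L 20G -n root vg00
    run mkfs.xfs /dev/vg00/root
}

# Hypothetical usage (prints four commands, changes nothing):
#   lvm_mirror_plan /dev/nvme0n1p2 /dev/nvme1n1p2
```

The resulting volume is then referred to as /dev/vg00/root, matching the style of the fstab snippet above, regardless of which kernel names the underlying disks receive.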