public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] New system, systemd, and dm-integrity
@ 2020-05-15  8:55 Wols Lists
  2020-05-15 10:19 ` Neil Bothwick
  0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2020-05-15  8:55 UTC (permalink / raw
  To: gentoo-user

I'm finally building my new system, but I'm pretty certain I'll need
some advice to get it to boot. As you might guess from the subject the
"problem" is dm-integrity.

I'm using openSUSE as my host system, which I used to set up the disk(s).

So currently I have

sdb
--> sdb3
   --> dm-integrity
      --> md-raid
         --> lvm
            --> root

And my root partition is on lvm. Currently I have a custom systemd
config file that sets up dm-integrity. How do I add this to the gentoo
initramfs? Without it raid won't recognise the disk, so there'll be no
root partition to switch to at boot.
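
For reference, the unit does something along these lines (a sketch from
memory rather than the exact file; device names and ordering targets are
illustrative):

  # /etc/systemd/system/integritysetup-sdb3.service (illustrative)
  [Unit]
  Description=Open dm-integrity mapping on /dev/sdb3
  DefaultDependencies=no
  Before=local-fs-pre.target
  After=systemd-udev-settle.service

  [Service]
  Type=oneshot
  RemainAfterExit=yes
  ExecStart=/sbin/integritysetup open /dev/sdb3 int-sdb3

  [Install]
  WantedBy=sysinit.target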

Plan for the future is to add dm-integrity recognition to upstream
mdadm, but for that I need my new system, so I can demote my old
system to a test-bed.

Cheers,
Wol


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-user] New system, systemd, and dm-integrity
  2020-05-15  8:55 [gentoo-user] New system, systemd, and dm-integrity Wols Lists
@ 2020-05-15 10:19 ` Neil Bothwick
  2020-05-15 10:20   ` Neil Bothwick
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Bothwick @ 2020-05-15 10:19 UTC (permalink / raw
  To: gentoo-user


On Fri, 15 May 2020 09:55:57 +0100, Wols Lists wrote:

> So currently I have
> 
> sdb
> --> sdb3
>    --> dm-integrity
>       --> md-raid
>          --> lvm
>             --> root  
> 
> And my root partition is on lvm. Currently I have a custom systemd
> config file that sets up dm-integrity. How do I add this to the gentoo
> initramfs? Without it raid won't recognise the disk, so there'll be no
> root partition to switch to at boot.

How are you generating the initramfs? If you use dracut, there are
options you can add to its config directory, such as install_items to
make sure your service files are included.
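
For example, a drop-in along these lines ought to do it (file, unit and
path names are just illustrative, untested):

  # /etc/dracut.conf.d/dm-integrity.conf
  install_items+=" /etc/systemd/system/integritysetup-sdb3.service /sbin/integritysetup "
  add_drivers+=" dm-integrity "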


-- 
Neil Bothwick

Last words of a Windows user: - Where do I have to click now? - There?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-user] New system, systemd, and dm-integrity
  2020-05-15 10:19 ` Neil Bothwick
@ 2020-05-15 10:20   ` Neil Bothwick
  2020-05-15 11:16     ` antlists
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Bothwick @ 2020-05-15 10:20 UTC (permalink / raw
  To: gentoo-user


On Fri, 15 May 2020 11:19:06 +0100, Neil Bothwick wrote:

> How are you generating the initramfs? If you use dracut, there are
> options you can add to its config directory, such as install_items to
> make sure your service files are included.

Or you can create a custom module, they are just shell scripts. I recall
reading a blog post by Rich on how to do this a few years ago.


-- 
Neil Bothwick

If the cops arrest a mime, do they tell her she has the right to remain
silent?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-user] New system, systemd, and dm-integrity
  2020-05-15 10:20   ` Neil Bothwick
@ 2020-05-15 11:16     ` antlists
  2020-05-15 11:30       ` Rich Freeman
  0 siblings, 1 reply; 9+ messages in thread
From: antlists @ 2020-05-15 11:16 UTC (permalink / raw
  To: gentoo-user

On 15/05/2020 11:20, Neil Bothwick wrote:
> On Fri, 15 May 2020 11:19:06 +0100, Neil Bothwick wrote:
> 
>> How are you generating the initramfs? If you use dracut, there are
>> options you can add to its config directory, such as install_items to
>> make sure your service files are included.
> 
I presume I'll be using dracut ...

> Or you can create a custom module, they are just shell scripts. I recall
> reading a blog post by Rich on how to do this a few years ago.
> 
> 
My custom module calls a shell script, so it shouldn't be that hard from 
what you say. I then need to make sure the program it invokes 
(integritysetup) is in the initramfs?

Cheers,
Wol


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-user] New system, systemd, and dm-integrity
  2020-05-15 11:16     ` antlists
@ 2020-05-15 11:30       ` Rich Freeman
  2020-05-15 13:18         ` antlists
  0 siblings, 1 reply; 9+ messages in thread
From: Rich Freeman @ 2020-05-15 11:30 UTC (permalink / raw
  To: gentoo-user

On Fri, May 15, 2020 at 7:16 AM antlists <antlists@youngman.org.uk> wrote:
>
> On 15/05/2020 11:20, Neil Bothwick wrote:
> >
> > Or you can create a custom module, they are just shell scripts. I recall
> > reading a blog post by Rich on how to do this a few years ago.
> >
> My custom module calls a shell script, so it shouldn't be that hard from
> what you say. I then need to make sure the program it invokes
> (integritysetup) is in the initramfs?

The actual problem that this module solves is no doubt long solved
upstream, but here is the blog post on dracut modules (which is fairly
well-documented in the official docs as well):
https://rich0gentoo.wordpress.com/2012/01/21/a-quick-dracut-module/

Basically you have a shell script that tells dracut, when it builds the
initramfs, to include whatever you need in it.  Then you have the phase
hooks that actually run whatever you need to run at the appropriate
time during boot (presumably before the mdadm stuff runs).

My example doesn't install any external programs, but there is a
simple syntax for that.
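
For the dm-integrity case a rough sketch might look something like this
(untested; the module name, device names and choice of hook point are
assumptions you'd want to check against the dracut docs):

  # /usr/lib/dracut/modules.d/90integrity/module-setup.sh
  #!/bin/bash

  check() {
      # only include this module when explicitly requested
      return 255
  }

  depends() {
      echo dm mdraid
  }

  installkernel() {
      instmods dm-integrity
  }

  install() {
      # pull the userspace tool into the initramfs
      inst_multiple integritysetup
      # run our hook before the root filesystem gets mounted
      inst_hook pre-mount 10 "$moddir/integrity-open.sh"
  }

  # /usr/lib/dracut/modules.d/90integrity/integrity-open.sh
  #!/bin/sh
  # open the integrity mapping(s) so mdadm can assemble the array on top
  integritysetup open /dev/sdb3 int-sdb3
  mdadm --assemble --scan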

If your module is reasonably generic you could probably get upstream
to merge it as well.

Good luck with it, and I'm curious as to how you like this setup vs
something more "conventional" like zfs/btrfs.  I'm using single-volume
zfs for integrity for my lizardfs chunkservers and it strikes me that
maybe dm-integrity could accomplish the same goal with perhaps better
performance (and less kernel fuss).  I'm not sure I'd want to replace
more general-purpose zfs with this, though the flexibility of
lvm+mdadm is certainly attractive.

-- 
Rich


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-user] New system, systemd, and dm-integrity
  2020-05-15 11:30       ` Rich Freeman
@ 2020-05-15 13:18         ` antlists
  2020-05-15 13:49           ` Rich Freeman
  0 siblings, 1 reply; 9+ messages in thread
From: antlists @ 2020-05-15 13:18 UTC (permalink / raw
  To: gentoo-user

On 15/05/2020 12:30, Rich Freeman wrote:
> On Fri, May 15, 2020 at 7:16 AM antlists <antlists@youngman.org.uk> wrote:
>>
>> On 15/05/2020 11:20, Neil Bothwick wrote:
>>>
>>> Or you can create a custom module, they are just shell scripts. I recall
>>> reading a blog post by Rich on how to do this a few years ago.
>>>
>> My custom module calls a shell script, so it shouldn't be that hard from
>> what you say. I then need to make sure the program it invokes
>> (integritysetup) is in the initramfs?
> 
> The actual problem that this module solves is no doubt long solved
> upstream, but here is the blog post on dracut modules (which is fairly
> well-documented in the official docs as well):
> https://rich0gentoo.wordpress.com/2012/01/21/a-quick-dracut-module/

I don't think it is ... certainly I'm not aware of anything other than 
LUKS that uses dm-integrity, and LUKS sets it up itself.
> 
> Basically you have a shell script that tells dracut, when it builds the
> initramfs, to include whatever you need in it.  Then you have the phase
> hooks that actually run whatever you need to run at the appropriate
> time during boot (presumably before the mdadm stuff runs).
> 
> My example doesn't install any external programs, but there is a
> simple syntax for that.
> 
> If your module is reasonably generic you could probably get upstream
> to merge it as well.

No. Like LUKS, I intend to merge the code into mdadm and let the raid 
side handle it. If mdadm detects a dm-integrity/raid setup, it'll set up 
dm-integrity and then recurse to set up raid.
> 
> Good luck with it, and I'm curious as to how you like this setup vs
> something more "conventional" like zfs/btrfs.  I'm using single-volume
> zfs for integrity for my lizardfs chunkservers and it strikes me that
> maybe dm-integrity could accomplish the same goal with perhaps better
> performance (and less kernel fuss).  I'm not sure I'd want to replace
> more general-purpose zfs with this, though the flexibility of
> lvm+mdadm is certainly attractive.
> 
openSUSE is my only experience of btrfs. And it hasn't been nice. When 
it goes wrong it's nasty. Plus only raid 1 really works - I've heard 
that 5 and 6 have design flaws which mean it will be very hard to get
them to work properly. I've never met zfs.

As the linux raid wiki says (I wrote it :-) do you want the complexity 
of a "do it all" filesystem, or the abstraction of dedicated layers?

The big problem that md-raid has is that it has no way of detecting or 
dealing with corruption underneath. Hence me wanting to put dm-integrity 
underneath, because that's dedicated to detecting corruption. So if 
something goes wrong, the raid gets a read error and sorts it out.

Then lvm provides the snapshotting, sort-of-backups, etc.
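
Roughly, building such a stack looks like this (a sketch only - device
names, raid level and sizes are illustrative, not my exact commands):

  integritysetup format /dev/sdb3
  integritysetup open /dev/sdb3 int-sdb3
  integritysetup format /dev/sdc3
  integritysetup open /dev/sdc3 int-sdc3
  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/mapper/int-sdb3 /dev/mapper/int-sdc3
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -L 50G -n root vg0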

But like all these things, it's learning that's the big problem. With my 
main system, I don't want to experiment. My first gentoo system was an 
Athlon Thunderbird (K7) on ext. The next one is my current Athlon X3
mirrored across two 3TB drives. Now I'm throwing dm-integrity and lvm 
into the mix with two 4TB drives. So I'm going to try and learn KVM ... :-)

Cheers,
Wol


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-user] New system, systemd, and dm-integrity
  2020-05-15 13:18         ` antlists
@ 2020-05-15 13:49           ` Rich Freeman
  2020-05-15 15:27             ` Wols Lists
  0 siblings, 1 reply; 9+ messages in thread
From: Rich Freeman @ 2020-05-15 13:49 UTC (permalink / raw
  To: gentoo-user

On Fri, May 15, 2020 at 9:18 AM antlists <antlists@youngman.org.uk> wrote:
>
> On 15/05/2020 12:30, Rich Freeman wrote:
> > The actual problem that this module solves is no doubt long solved
> > upstream, but here is the blog post on dracut modules (which is fairly
> > well-documented in the official docs as well):
> > https://rich0gentoo.wordpress.com/2012/01/21/a-quick-dracut-module/
>
> I don't think it is ... certainly I'm not aware of anything other than
> LUKS that uses dm-integrity, and LUKS sets it up itself.

I was referring to my specific problem in the blog article with mdadm
not actually detecting drives for whatever reason.

I no longer use md-raid on any of my systems so I can't vouch for
whether it is still an issue, but something like that was probably
fixed somewhere.

> > If your module is reasonably generic you could probably get upstream
> > to merge it as well.
>
> No. Like LUKS, I intend to merge the code into mdadm and let the raid
> side handle it. If mdadm detects a dm-integrity/raid setup, it'll set up
> dm-integrity and then recurse to set up raid.

Seems reasonable enough, though you could probably argue for
separation of concerns to do it in dracut.  In any case, I do suspect
the dracut folks would consider such a use case valid for inclusion in
the default package if you do want to have a module for it.

> openSUSE is my only experience of btrfs. And it hasn't been nice. When
> it goes wrong it's nasty. Plus only raid 1 really works - I've heard
> that 5 and 6 have design flaws which mean it will be very hard to get
> them to work properly.

Yeah, I moved away from btrfs as well for the same reasons.  I got
into it years ago thinking that it was still a bit unpolished but
seemed to be rapidly gaining traction.  For whatever reason they never
got regressions under control and I got burned more than once by it.
I did keep backups but restoration is of course painful.

> I've never met zfs.

So, compared to what you're doing I could see the following advantages:

1.  All the filesystem-layer stuff which obviously isn't in-scope for
the lower layers, including snapshots (obviously those can be done
with lvm but it is a bit cleaner at the filesystem level).  I'd argue
that some of this stuff isn't as flexible as with btrfs but it will be
far superior to something like ext4 on top of what you're doing.

2.  No RAID write-hole.  I'd think that your solution with the
integrity layer would detect corruption resulting from the write hole,
but I don't think it could prevent it, since a RAID stripe is still
overwritten in place.  But, I've never had a conversation with an
md-raid developer so perhaps you have a more educated view on the
matter.

3.  COW offers some of the data-integrity benefits of full data
journaling without the performance costs of this.  On the other hand
it probably is not going to perform as well as overwriting in place
without any data journaling.  In theory this is more of a
filesystem-level feature though.

4.  In the future COW with zfs could probably enable better
performance on SSD/SMR with TRIM by structuring writes to consolidate
free blocks into erase zones.  However, as far as I'm aware that is a
theoretical future benefit and not anything available, and I have no
idea if anybody is working on that.  This sort of benefit would
require the vertical integration that zfs uses.

In general zfs is much more stable than btrfs and far less likely to
eat your data.  And FWIW I did once (many years ago) have
ext4+lvm+mdadm eat my data - I think it was due to some kind of lvm
metadata corruption or something like that, because basically an fsck
on one ext4 partition scrambled a different ext4 partition, which
obviously should not be possible if lvm is working right.  I have no
idea what the root cause of that was - could have been bad RAM or
something which of course can mess up anything short of a distributed
filesystem with integrity checking above the host level (which, IMO,
most of the solutions don't do as well as they could).

One big disadvantage with zfs is that it is far less flexible at the
physical layer.  You can add the equivalent of LVM PVs, and you can
expand a PV, but you can't remove a PV in anything but the latest
version of zfs, and I think there are some limitations around how this
works.  You can't reshape the equivalent of an mdadm array, but you
can replace a drive in an array and grow an array if all the
underlying devices have enough space.  You can add/remove mirrors from
the equivalent of a raid1 to go freely from no redundancy to any
multiplicity you wish.  Striped arrays are basically fixed in layout
once created.
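
For the mirror case that's something like this (pool and device names are
made up, purely to illustrate the commands):

  zpool attach tank sda sdb    # turn a single disk into a two-way mirror
  zpool detach tank sdb        # drop back to a single disk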

> As the linux raid wiki says (I wrote it :-) do you want the complexity
> of a "do it all" filesystem, or the abstraction of dedicated layers?

Yeah, it is a well-established argument and has some merit.

I'm not sure I'd go this route for my regular hosts since zfs works
reasonably well (though your solution is more flexible than zfs).

However, I might evaluate how dm-integrity plus ext4 (maybe with LVM
in-between) works on my lizardfs chunkservers.  These have redundancy
above the host level, but I do want integrity checking for static data
issues, and I'm not sure that lizardfs provides any guarantees here
(plus having it at the host level would probably perform better
anyway).  If the integrity layer returned an io error lizardfs would
just overwrite the impacted files in-place most likely, so there would
be no reads from the impacted block until it was rewritten which
presumably would clear the integrity error.

That said, I'm not sure that lizardfs even overwrites anything
in-place in normal use so it might not make any difference vs zfs.  It
breaks all data into "chunks" and I'd think that if data were
overwritten in place at the filesystem level it probably would end up
in a new chunk, with the old one garbage collected if it were not
snapshotted.

-- 
Rich


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-user] New system, systemd, and dm-integrity
  2020-05-15 13:49           ` Rich Freeman
@ 2020-05-15 15:27             ` Wols Lists
  2020-05-15 15:36               ` Rich Freeman
  0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2020-05-15 15:27 UTC (permalink / raw
  To: gentoo-user

On 15/05/20 14:49, Rich Freeman wrote:
> On Fri, May 15, 2020 at 9:18 AM antlists <antlists@youngman.org.uk> wrote:
>>
>> On 15/05/2020 12:30, Rich Freeman wrote:

I've snipped it, but I can't imagine dracut/mdadm having the problems
you describe today - there are too many systems out there that boot from
lvm/mdadm. My problem is I'm adding dm-integrity to the mix ...
> 
> So, compared to what you're doing I could see the following advantages:
> 
> 1.  All the filesystem-layer stuff which obviously isn't in-scope for
> the lower layers, including snapshots (obviously those can be done
> with lvm but it is a bit cleaner at the filesystem level).  I'd argue
> that some of this stuff isn't as flexible as with btrfs but it will be
> far superior to something like ext4 on top of what you're doing.
> 
> 2.  No RAID write-hole.  I'd think that your solution with the
> integrity layer would detect corruption resulting from the write hole,
> but I don't think it could prevent it, since a RAID stripe is still
> overwritten in place.  But, I've never had a conversation with an
> md-raid developer so perhaps you have a more educated view on the
> matter.

I don't know that it would. The write hole is where each block is intact
in itself, but not all of them make it to disk, so data and parity no
longer agree. That said, the write hole has been pretty much fixed now -
I think newer raids can use journalling, which deals with it. That's
certainly been discussed on the list.
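
For raid5/6 the journal is a device you hand to mdadm at creation time -
something like this, untested and with made-up device names:

  mdadm --create /dev/md0 --level=5 --raid-devices=3 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 --write-journal /dev/nvme0n1p1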
> 
> 3.  COW offers some of the data-integrity benefits of full data
> journaling without the performance costs of this.  On the other hand
> it probably is not going to perform as well as overwriting in place
> without any data journaling.  In theory this is more of a
> filesystem-level feature though.
> 
> 4.  In the future COW with zfs could probably enable better
> performance on SSD/SMR with TRIM by structuring writes to consolidate
> free blocks into erase zones.  However, as far as I'm aware that is a
> theoretical future benefit and not anything available, and I have no
> idea if anybody is working on that.  This sort of benefit would
> require the vertical integration that zfs uses.
> 
> In general zfs is much more stable than btrfs and far less likely to
> eat your data.  And FWIW I did once (many years ago) have
> ext4+lvm+mdadm eat my data - I think it was due to some kind of lvm
> metadata corruption or something like that, because basically an fsck
> on one ext4 partition scrambled a different ext4 partition, which
> obviously should not be possible if lvm is working right.  I have no
> idea what the root cause of that was - could have been bad RAM or
> something which of course can mess up anything short of a distributed
> filesystem with integrity checking above the host level (which, IMO,
> most of the solutions don't do as well as they could).
> 
> One big disadvantage with zfs is that it is far less flexible at the
> physical layer.  You can add the equivalent of LVM PVs, and you can
> expand a PV, but you can't remove a PV in anything but the latest
> version of zfs, and I think there are some limitations around how this
> works.  You can't reshape the equivalent of an mdadm array, but you
> can replace a drive in an array and grow an array if all the
> underlying devices have enough space.  You can add/remove mirrors from
> the equivalent of a raid1 to freely go between no-redundancy to any
> multiplicity you wish.  Striped arrays are basically fixed in layout
> once created.
> 
>> As the linux raid wiki says (I wrote it :-) do you want the complexity
>> of a "do it all" filesystem, or the abstraction of dedicated layers?
> 
> Yeah, it is a well-established argument and has some merit.
> 
> I'm not sure I'd go this route for my regular hosts since zfs works
> reasonably well (though your solution is more flexible than zfs).
> 
> However, I might evaluate how dm-integrity plus ext4 (maybe with LVM
> in-between) works on my lizardfs chunkservers.  These have redundancy
> above the host level, but I do want integrity checking for static data
> issues, and I'm not sure that lizardfs provides any guarantees here
> (plus having it at the host level would probably perform better
> anyway).  If the integrity layer returned an io error lizardfs would
> just overwrite the impacted files in-place most likely, so there would
> be no reads from the impacted block until it was rewritten which
> presumably would clear the integrity error.
> 
> That said, I'm not sure that lizardfs even overwrites anything
> in-place in normal use so it might not make any difference vs zfs.  It
> breaks all data into "chunks" and I'd think that if data were
> overwritten in place at the filesystem level it probably would end up
> in a new chunk, with the old one garbage collected if it were not
> snapshotted.
> 
The crucial point here is that dm-integrity protects against something
*outside* your stack trashing part of the disk. If something came along
and wrote randomly to /dev/sda, then when my filesystem tried to
retrieve a file, dm-integrity would cause sda to return a read error,
raid would say "oops", read it from sdb, and rewrite sda.
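
You can even provoke it deliberately to watch that happen - roughly like
this, and obviously only on a scratch array (device names and offsets
made up):

  # scribble over part of the raw partition, underneath dm-integrity
  dd if=/dev/urandom of=/dev/sdb3 bs=4k count=1 seek=123456 conv=notrunc
  # scrub the array; md hits the read error and repairs from the mirror
  echo check > /sys/block/md0/md/sync_action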

It won't protect against corruption in the stack itself, I don't think,
because if the data is corrupt when it hits dm-integrity's write path,
of course all the CRCs etc. will be correct.

Anyways, for a bit of info and a cookbook on dm-integrity, take a look
at https://raid.wiki.kernel.org/index.php/System2020

Cheers,
Wol



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gentoo-user] New system, systemd, and dm-integrity
  2020-05-15 15:27             ` Wols Lists
@ 2020-05-15 15:36               ` Rich Freeman
  0 siblings, 0 replies; 9+ messages in thread
From: Rich Freeman @ 2020-05-15 15:36 UTC (permalink / raw
  To: gentoo-user

On Fri, May 15, 2020 at 11:27 AM Wols Lists <antlists@youngman.org.uk> wrote:
>
> The crucial point here is that dm-integrity protects against something
> *outside* your stack trashing part of the disk. If something came along
> and wrote randomly to /dev/sda, then when my filesystem tried to
> retrieve a file, dm-integrity would cause sda to return a read error,
> raid would say "oops", read it from sdb, and rewrite sda.

Yup.  I understand what it does.

Main reason to use it would be if it performs better than zfs, which
offers the same guarantees.  That is why I was talking about whether
lizardfs does overwrites in-place or not - that is where I'd expect
zfs to potentially have performance issues.

Most of the protections in lizardfs happen above the single-host level
- I can smash a single host with a hammer while it is operating and it
shouldn't cause more than a very slight delay and trigger a rebalance.
That also helps protect against failure modes like HBA failures that
could take out multiple disks at once (though to be fair you can
balance your mirrors across HBAs if you have more than one).

-- 
Rich


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread

Thread overview: 9+ messages
2020-05-15  8:55 [gentoo-user] New system, systemd, and dm-integrity Wols Lists
2020-05-15 10:19 ` Neil Bothwick
2020-05-15 10:20   ` Neil Bothwick
2020-05-15 11:16     ` antlists
2020-05-15 11:30       ` Rich Freeman
2020-05-15 13:18         ` antlists
2020-05-15 13:49           ` Rich Freeman
2020-05-15 15:27             ` Wols Lists
2020-05-15 15:36               ` Rich Freeman
