public inbox for gentoo-dev@lists.gentoo.org
From: Richard Yao <ryao@gentoo.org>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] Re: rfc: Does OpenRC really need mount-ro
Date: Wed, 17 Feb 2016 09:24:44 -0500
Message-ID: <4800E3E6-4D70-4DF8-8F40-705C6B77882B@gentoo.org>
In-Reply-To: <pan$5775c$d5db16fe$36a3366$a29b8c4d@cox.net>



> On Feb 16, 2016, at 9:20 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> 
> William Hubbs posted on Tue, 16 Feb 2016 12:41:29 -0600 as excerpted:
> 
>> What I'm trying to figure out is, what to do about re-mounting file
>> systems read-only.
>> 
>> How does systemd do this? I didn't find an equivalent of the mount-ro
>> service there.
> 
> For quite some time now, systemd has actually had a mechanism whereby the 
> main systemd process reexecs (with a pivot-root) the initr* systemd and 
> returns control to it during the shutdown process, thereby allowing a 
> more controlled shutdown than traditional init systems because the final 
> stages are actually running from the virtual-filesystem of the initr*, 
> such that after everything running on the main root is shutdown, the main 
> root itself can actually be unmounted, not just mounted read-only, 
> because there is literally nothing running on it any longer.
> 
> There's still a fallback to read-only mounting if an initr* isn't used or 
> if reinvoking the initr* version fails for some reason, but with an 
> initr*, when everything's working properly, while there are still some 
> bits of userspace running, they're no longer actually running off of the 
> main root, so main root can actually be unmounted much like any other 
> filesystem.

Systemd installs that go back into the initramfs at shutdown are rare: the initramfs has to use a hook to tell systemd to re-exec it, and very few configurations do that. Even fewer of those that do actually need it.
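
For reference, the hook appears to be nothing more than an executable left at /run/initramfs/shutdown, going by the InitrdInterface page linked further down in the quoted text. Here is a minimal sketch in C of that check. The function name is mine, and the path is passed as a parameter only to keep the sketch testable, so please do not read it as systemd's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Path documented in systemd's InitrdInterface page. */
#define INITRAMFS_SHUTDOWN_HELPER "/run/initramfs/shutdown"

/* Hypothetical helper: true if `helper` exists and is executable, i.e. the
 * initramfs left a shutdown program behind for systemd to pivot back to. */
bool initramfs_wants_shutdown_pivot(const char *helper)
{
    assert(helper);
    return access(helper, X_OK) == 0;
}
```

On an install where nothing populated that path, initramfs_wants_shutdown_pivot(INITRAMFS_SHUTDOWN_HELPER) returns false and the read-only remount fallback Duncan describes is what remains.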

The biggest user of that mechanism of which I am aware is ZFS on EL/Fedora when booted with Dracut. It does not need it; it was only implemented because someone who did not understand how ZFS was designed to integrate with the boot and startup processes thought it was a good idea.

As it turns out, that behavior actually breaks the mechanism intended to make multipath sane: it marks the pool in a way that tells every system with access to the disks that the pool is not going to be used by anyone, even though it will be used on the next boot. If one of those systems imports the pool and the original system then boots, the pool can be damaged beyond repair.

Thankfully, no one seems to boot EL/Fedora systems off ZFS pools in multipath environments. The code to hook into this special behavior will be removed in the future, but that is a low priority: none of the developers' employers care about it, and because the possibility that the mechanism would save someone from data loss is almost negligible, none of us has made it enough of a priority to spend our free time on it.

> The process is explained a bit better in the copious blogposted systemd 
> documentation.  Let's see if I can find a link...
> 
> OK, this isn't where I originally read about it, which IIRC was aimed 
> more at admins, while this is aimed at initr* devs, but that's probably a 
> good thing as it includes more specific detail...
> 
> https://www.freedesktop.org/wiki/Software/systemd/InitrdInterface/
> 
> And here's some more, this time in the storage daemon controlled root and 
> initr* context...
> 
> https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/
> 
> 
> But... all that doesn't answer the original question directly, does it?  
> Where there's no return to initr*, how /does/ systemd handle read-only 
> mounting?
> 
> First, the nice ascii-diagram flow charts in the bootup (7) manpage may 
> be useful, in particular here, the shutdown diagram (tho IDK if you can 
> find such things useful or not??).
> 
> https://www.freedesktop.org/software/systemd/man/bootup.html
> 
> Here's the shutdown diagram described in words:
> 
> Initial shutdown is via two targets (as opposed to specific services), 
> shutdown.target, which conflicts with all (normal) system services 
> thereby shutting them down, and umount.target, which conflicts with file 
> mounts, swaps, cryptsetup device, etc.  Here, we're obviously interested 
> in umount.target.  Then after those two targets are reached, various low 
> level services are run or stopped, in order to reach final.target.  
> After final.target, the appropriate systemd-(reboot|poweroff|halt|kexec) 
> service is run, to hit the ultimate (reboot|poweroff|halt|kexec).target, 
> which of course is never actually evaluated, since the service actually 
> does the intended action.
> 
> The primary takeaway is that you might not be finding a specific systemd 
> remount-ro service, because it might be a target, defined in terms of 
> conflicts with mount units, etc, rather than a specific service.
> 
> Neither shutdown.target nor umount.target have any wants or requires by 
> default, but the various normal services and mount units conflict with 
> them, either via default or specifically, so are shut down before the 
> target can be reached.
> 
> final.target has the After=shutdown.target umount.target setting, so 
> won't be reached until they are reached.
> 
> The respective (reboot|poweroff|halt|kexec).target units Requires= and 
> After= their respective systemd-*.service units, and reboot and poweroff 
> (but not halt and kexec) have 30-minute timeouts after which they run 
> reboot-force or poweroff-force, respectively.
> 
> The respective systemd-(reboot|poweroff|halt|kexec).service units 
> Requires= and After= shutdown.target, umount.target and final.target, all 
> three, so won't be run until those complete.  They simply 
> ExecStart=/usr/bin/systemctl --force their respective actions.
> 
> And here's what the systemd.special (7) manpage says about umount.target:
> 
>  umount.target
>    A special target unit that umounts all mount and automount points
>    on system shutdown.
> 
>    Mounts that shall be unmounted on system shutdown shall add
>    Conflicts dependencies to this unit for their mount unit,
>    which is implicitly done when DefaultDependencies=yes is set
>    (the default).
> 
> But that /still/ doesn't reveal what actually does the remount-ro, as 
> opposed to umount.  I don't see that either, at the unit level, nor do I 
> see anything related to it in for instance my auto-generated from fstab 
> /run/systemd/generator/-.mount file or in the systemd-fstab-generator 
> (8) manpage.
> 
> Thus I must conclude that it's actually resolved in the mount-unit 
> conflicts handling in systemd's source code, itself.
> 
> And indeed... in systemd's tarball, we see in src/core/umount.c, in 
> mount_points_list_umount...
> 
> That the function actually remounts /everything/ (well, everything not in 
> a container) read-only, before actually trying to umount them.  Indention 
> restandardized on two-space here, to avoid unnecessary wrapping as 
> posted.  This is from systemd-228:
> 
> static int mount_points_list_umount(MountPoint **head, bool *changed,
>                                     bool log_error) {
>  MountPoint *m, *n;
>  int n_failed = 0;
> 
>  assert(head);
> 
>  LIST_FOREACH_SAFE(mount_point, m, n, *head) {
> 
>    /* If we are in a container, don't attempt to
>       read-only mount anything as that brings no real
>       benefits, but might confuse the host, as we remount
>       the superblock here, not the bind mound. */
>    if (detect_container() <= 0)  {
>      _cleanup_free_ char *options = NULL;
>      /* MS_REMOUNT requires that the data parameter
>       * should be the same from the original mount
>       * except for the desired changes. Since we want
>       * to remount read-only, we should filter out
>       * rw (and ro too, because it confuses the kernel) */
>      (void) fstab_filter_options(m->options, "rw\0ro\0", NULL, NULL,
>                                  &options);
> 
>      /* We always try to remount directories read-only
>       * first, before we go on and umount them.
>       *
>       * Mount points can be stacked. If a mount
>       * point is stacked below / or /usr, we
>       * cannot umount or remount it directly,
>       * since there is no way to refer to the
>       * underlying mount. There's nothing we can do
>       * about it for the general case, but we can
>       * do something about it if it is aliased
>       * somehwere else via a bind mount. If we
>       * explicitly remount the super block of that
>       * alias read-only we hence should be
>       * relatively safe regarding keeping the fs we
>       * can otherwise not see dirty. */
>      log_info("Remounting '%s' read-only with options '%s'.",
>               m->path, options);
>      (void) mount(NULL, m->path, NULL, MS_REMOUNT|MS_RDONLY, options);
>    }
> 
>    /* Skip / and /usr since we cannot unmount that
>     * anyway, since we are running from it. They have
>     * already been remounted ro. */
>    if (path_equal(m->path, "/")
> #ifndef HAVE_SPLIT_USR
>      || path_equal(m->path, "/usr")
> #endif
>    )
>      continue;
> 
>    /* Trying to umount. We don't force here since we rely
>        * on busy NFS and FUSE file systems to return EBUSY
>        * until we closed everything on top of them. */
>    log_info("Unmounting %s.", m->path);
>    if (umount2(m->path, 0) == 0) {
>      if (changed)
>        *changed = true;
> 
>      mount_point_free(head, m);
>    } else if (log_error) {
>      log_warning_errno(errno, "Could not unmount %s: %m", m->path);
>      n_failed++;
>    }
>  }
> 
>  return n_failed;
> }
> 
> 
> So the short answer ultimately is... Systemd has a single umount 
> function, which first does remount-ro, so it's actually remounting 
> (nearly) everything read-only, then tries umount.
> 
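
To make the two steps concrete, here is a much-reduced sketch in C of what the quoted function does for a single mount point: strip rw/ro from the original options, remount read-only, then try a plain umount. drop_rw_ro() is my own simplified stand-in for systemd's fstab_filter_options() (it only matches the bare words "rw" and "ro"), the 256-byte buffer is an arbitrary assumption, and the whole thing is Linux-only because it relies on mount(2) and umount2(2):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/* Copy the comma-separated options in `in` to `out`, dropping the bare
 * "rw" and "ro" entries. Returns 0 on success, -1 if `out` is too small. */
int drop_rw_ro(const char *in, char *out, size_t out_size)
{
    assert(in);
    assert(out);
    if (out_size == 0)
        return -1;

    size_t used = 0;
    out[0] = '\0';

    while (*in) {
        size_t len = strcspn(in, ",");  /* length of the current option */
        bool is_rw_ro = len == 2 &&
            (strncmp(in, "rw", 2) == 0 || strncmp(in, "ro", 2) == 0);

        if (len > 0 && !is_rw_ro) {
            size_t need = len + (used > 0 ? 1 : 0);  /* option plus separator */
            if (used + need + 1 > out_size)
                return -1;
            if (used > 0)
                out[used++] = ',';
            memcpy(out + used, in, len);
            used += len;
            out[used] = '\0';
        }

        in += len;
        if (*in == ',')
            in++;
    }
    return 0;
}

/* Remount `path` read-only (best effort, as in the quoted code), then try
 * to unmount it. "/" and "/usr" are only remounted, never unmounted. */
int remount_ro_then_umount(const char *path, const char *options)
{
    char filtered[256];  /* arbitrary size chosen for this sketch */

    if (drop_rw_ro(options, filtered, sizeof filtered) < 0)
        return -1;

    (void) mount(NULL, path, NULL, MS_REMOUNT | MS_RDONLY, filtered);

    if (strcmp(path, "/") == 0 || strcmp(path, "/usr") == 0)
        return 0;

    return umount2(path, 0);  /* no MNT_FORCE, so busy mounts return EBUSY */
}
```

For example, drop_rw_ro("rw,noatime,ro", buf, sizeof buf) leaves "noatime" in buf, which is the string that would then be handed to mount(2) alongside MS_REMOUNT|MS_RDONLY.
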
> 
> Meanwhile, (semi-)answering the elsewhere implied question of why only 
> Linux needs the mount-ro service...  I'm no BSD expert, but in my 
> wanderings I came across a remark that they didn't need it, because their 
> kernel reboot/halt/poweroff routines have a built-in kernelspace sync-and-
> remount-ro routine for anything that can't be unmounted, which Linux 
> lacks.  They obviously consider this a Linux deficiency, but while I've 
> not come across the Linux reason for not doing it, an educated guess is 
> that it's considered putting policy into the kernel, and that's 
> considered a no-no, policy is userspace; the kernel simply enforces it as 
> directed (which is why kernel 2.4's devfs was removed for 2.6, to be 
> replaced with the userspace-based udev).  Additionally, not kernel-
> forcing the remount-ro bit does give developers a way to test results of 
> an uncontrolled shutdown, say on a specific testing filesystem only, 
> without exposing the rest of the system, which can still be shut down 
> normally, to it.
> 
> So on Linux userspace must do the final umounts and force-read-onlys, 
> because unlike the BSDs, the Linux kernel doesn't have builtin routines 
> that automatically force it, regardless of userspace.
> 
> But as others have said, on Linux the remount-ro is _definitely_ 
> required, and "bad things _will_ happen" if it's not done.  (Just how bad 
> depends on the filesystem and its mount options, and hardware, among 
> other things.)
> 
> 
> Finally, one more thing to mention.  On systems with magic-srq in the 
> kernel...
> 
> echo 0x30 > /proc/sys/kernel/sysrq
> 
> ... enables the sync (0x10) and remount-readonly (0x20) functions.  (Of 
> course only do this at shutdown/reboot, as you don't want to disturb the 
> user's configured srq defaults in normal runtime.)
> 
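
The 0x30 above is just the two documented bits OR-ed together. A small sketch in C of how such a mask can be interpreted follows; the enum and function names are local to this example, not a userspace API. One caveat, if I read the kernel's sysrq documentation correctly: this mask gates keyboard-invoked SysRq, while writes by root to /proc/sysrq-trigger are honored regardless, so changing the mask may not be needed for the trigger-file approach.

```c
#include <assert.h>
#include <stdbool.h>

/* Local names for two documented bits of /proc/sys/kernel/sysrq. */
enum {
    SYSRQ_BIT_SYNC       = 0x10, /* 's': emergency sync */
    SYSRQ_BIT_REMOUNT_RO = 0x20  /* 'u': emergency remount read-only */
};

static_assert((SYSRQ_BIT_SYNC | SYSRQ_BIT_REMOUNT_RO) == 0x30,
              "0x30 enables exactly sync and remount read-only");

/* True if `mask` permits the function identified by `function_bit`.
 * A mask of 1 enables every function; 0 disables them all. */
bool sysrq_mask_allows(unsigned int mask, unsigned int function_bit)
{
    if (mask == 1)
        return true;
    return (mask & function_bit) != 0;
}
```
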
> You can then force emergency sync (s) and remount-read-only (u) with...
> 
> echo s > /proc/sysrq-trigger
> echo u > /proc/sysrq-trigger
> 
> As that's kernel emergency priority, it should force-sync and force 
> everything readonly (and quiesce mid-layer block devices such as md 
> and dm), even if it would normally refuse to do so due to files open for 
> writing.  You might consider something like that as a fallback, if normal 
> mount-readonly fails.  Of course it won't work if magic-srq functionality 
> isn't built into the kernel, but then you're no worse off than before, 
> and are far better off on kernels where it's supported, so it's certainly 
> worth considering. =:^)
> 
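
If a fallback like this were wanted from C rather than shell, it could look roughly like the sketch below. The function names are hypothetical and the trigger path is a parameter so the code can be exercised against an ordinary file. As far as I know, both actions are queued by the kernel rather than completed before the write returns, so a caller would probably want a short delay before powering off.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define SYSRQ_TRIGGER "/proc/sysrq-trigger"

/* Write one SysRq command character to `trigger`. Returns 0 or -1. */
int sysrq_send(const char *trigger, char command)
{
    assert(trigger);

    int fd = open(trigger, O_WRONLY);
    if (fd < 0)
        return -1;  /* e.g. SysRq support is not built into the kernel */

    ssize_t written = write(fd, &command, 1);
    int close_rc = close(fd);

    return (written == 1 && close_rc == 0) ? 0 : -1;
}

/* Fallback: emergency sync ('s'), then emergency remount read-only ('u'). */
int sysrq_sync_and_remount_ro(const char *trigger)
{
    if (sysrq_send(trigger, 's') < 0)
        return -1;
    return sysrq_send(trigger, 'u');
}
```

A shutdown path would call sysrq_sync_and_remount_ro(SYSRQ_TRIGGER) only after the normal remount had failed, and treat a -1 return as "no worse off than before".
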
> -- 
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
> 
> 

