From mboxrd@z Thu Jan 1 00:00:00 1970
From: Richard Yao
Subject: Re: [gentoo-dev] Re: rfc: Does OpenRC really need mount-ro
Date: Wed, 17 Feb 2016 09:24:44 -0500
To: gentoo-dev@lists.gentoo.org
Reply-To: gentoo-dev@lists.gentoo.org
Message-Id: <4800E3E6-4D70-4DF8-8F40-705C6B77882B@gentoo.org>
References: <20160216180533.GB1450@whubbs1.gaikai.biz> <20160216184129.GB1704@whubbs1.gaikai.biz>

> On Feb 16, 2016, at 9:20 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>
> William Hubbs posted on Tue, 16 Feb 2016 12:41:29 -0600 as excerpted:
>
>> What I'm trying to figure out is, what to do about re-mounting file
>> systems read-only.
>>
>> How does systemd do this? I didn't find an equivalent of the mount-ro
>> service there.
>
> For quite some time now, systemd has actually had a mechanism whereby the
> main systemd process reexecs (with a pivot-root) the initr* systemd and
> returns control to it during the shutdown process, thereby allowing a
> more controlled shutdown than traditional init systems because the final
> stages are actually running from the virtual-filesystem of the initr*,
> such that after everything running on the main root is shut down, the main
> root itself can actually be unmounted, not just mounted read-only,
> because there is literally nothing running on it any longer.
>
> There's still a fallback to read-only mounting if an initr* isn't used or
> if reinvoking the initr* version fails for some reason, but with an
> initr*, when everything's working properly, while there are still some
> bits of userspace running, they're no longer actually running off of the
> main root, so main root can actually be unmounted much like any other
> filesystem.

Systemd installs that go back into the initramfs at shutdown are rare, because there is a hook for the initramfs to tell systemd that it should re-exec it, and very few configurations do that. Even fewer of those that do it actually need it.

The biggest user of that mechanism of which I am aware is ZFS on EL/Fedora when booted with Dracut. It does not need it; the only reason it was implemented was that someone who did not understand how ZFS was designed to integrate with the boot and startup processes thought it was a good idea.

As it turns out, that behavior actually breaks the mechanism intended to make multipath sane: it marks the pool in such a way that all systems with access to the disks are told that a pool that will in fact be used on next boot is not going to be used by anyone. If one of them imports it and the system then boots, the pool can be damaged beyond repair.
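(Aside, on the hook mentioned a couple of paragraphs up: per the systemd initrd interface, an initramfs that wants to be returned to leaves an executable `shutdown` binary in /run/initramfs, and systemd pivots into it at the very end of shutdown if it is present. A rough sketch of that check follows; the path argument exists only so the logic can be exercised outside a real shutdown, and the function name is made up for illustration.)

```shell
# Sketch of the signal the initramfs leaves for systemd's shutdown code:
# an executable /run/initramfs/shutdown means "return to the initramfs
# instead of just remounting the root read-only".
will_return_to_initramfs() {
    # $1 lets the check be tested against an arbitrary path;
    # the real path is /run/initramfs/shutdown.
    hook="${1:-/run/initramfs/shutdown}"
    if [ -x "$hook" ]; then
        echo yes
    else
        echo no
    fi
}

will_return_to_initramfs   # "no" on most installs
```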
Thankfully, no one seems to boot EL/Fedora systems off ZFS pools in multipath environments. The code to hook into this special behavior will be removed in the future, but that is a low priority: none of the developers' employers care about it, and the almost negligible chance that the mechanism would save someone from data loss does not justify any of us spending our free time on it.

> The process is explained a bit better in the copious blogposted systemd
> documentation. Let's see if I can find a link...
>
> OK, this isn't where I originally read about it, which IIRC was aimed
> more at admins, while this is aimed at initr* devs, but that's probably a
> good thing as it includes more specific detail...
>
> https://www.freedesktop.org/wiki/Software/systemd/InitrdInterface/
>
> And here's some more, this time in the storage daemon controlled root and
> initr* context...
>
> https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/
>
>
> But... all that doesn't answer the original question directly, does it?
> Where there's no return to initr*, how /does/ systemd handle read-only
> mounting?
>
> First, the nice ascii-diagram flow charts in the bootup (7) manpage may
> be useful, in particular here, the shutdown diagram (tho IDK if you can
> find such things useful or not??).
>
> https://www.freedesktop.org/software/systemd/man/bootup.html
>
> Here's the shutdown diagram described in words:
>
> Initial shutdown is via two targets (as opposed to specific services),
> shutdown.target, which conflicts with all (normal) system services
> thereby shutting them down, and umount.target, which conflicts with file
> mounts, swaps, cryptsetup devices, etc. Here, we're obviously interested
> in umount.target. Then after those two targets are reached, various
> low-level services are run or stopped, in order to reach final.target.
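(To make the conflict wiring Duncan describes concrete: for an ordinary service, DefaultDependencies=yes adds the shutdown conflict implicitly, so an explicit equivalent would look roughly like the sketch below. The unit and binary names are made up for illustration.)

```ini
[Unit]
Description=Example daemon
# These two lines are what DefaultDependencies=yes (the default)
# adds implicitly, making the unit stop before shutdown.target:
Conflicts=shutdown.target
Before=shutdown.target

[Service]
ExecStart=/usr/bin/example-daemon
```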
> After final.target, the appropriate systemd-(reboot|poweroff|halt|kexec)
> service is run, to hit the ultimate (reboot|poweroff|halt|kexec).target,
> which of course is never actually evaluated, since the service actually
> does the intended action.
>
> The primary takeaway is that you might not be finding a specific systemd
> remount-ro service, because it might be a target, defined in terms of
> conflicts with mount units, etc, rather than a specific service.
>
> Neither shutdown.target nor umount.target have any wants or requires by
> default, but the various normal services and mount units conflict with
> them, either via default or specifically, so are shut down before the
> target can be reached.
>
> final.target has the After=shutdown.target umount.target setting, so
> won't be reached until they are reached.
>
> The respective (reboot|poweroff|halt|kexec).target units Requires= and
> After= their respective systemd-*.service units, and reboot and poweroff
> (but not halt and kexec) have 30-minute timeouts after which they run
> reboot-force or poweroff-force, respectively.
>
> The respective systemd-(reboot|poweroff|halt|kexec).service units
> Requires= and After= shutdown.target, umount.target and final.target, all
> three, so won't be run until those complete. They simply
> ExecStart=/usr/bin/systemctl --force their respective actions.
>
> And here's what the systemd.special (7) manpage says about umount.target:
>
> umount.target
>     A special target unit that umounts all mount and automount points
>     on system shutdown.
>
>     Mounts that shall be unmounted on system shutdown shall add
>     Conflicts dependencies to this unit for their mount unit,
>     which is implicitly done when DefaultDependencies=yes is set
>     (the default).
>
> But that /still/ doesn't reveal what actually does the remount-ro, as
> opposed to umount.
> I don't see that either, at the unit level, nor do I
> see anything related to it in, for instance, my auto-generated-from-fstab
> /run/systemd/generators/-.mount file or in the systemd-fstab-generator
> (8) manpage.
>
> Thus I must conclude that it's actually resolved in the mount-unit
> conflicts handling in systemd's source code, itself.
>
> And indeed... in systemd's tarball, we see in src/core/umount.c, in
> mount_points_list_umount...
>
> That the function actually remounts /everything/ (well, everything not in
> a container) read-only, before actually trying to umount them.
> Indentation restandardized on two-space here, to avoid unnecessary
> wrapping as posted. This is from systemd-228:
>
> static int mount_points_list_umount(MountPoint **head, bool *changed, bool log_error) {
>   MountPoint *m, *n;
>   int n_failed = 0;
>
>   assert(head);
>
>   LIST_FOREACH_SAFE(mount_point, m, n, *head) {
>
>     /* If we are in a container, don't attempt to
>      * read-only mount anything as that brings no real
>      * benefits, but might confuse the host, as we remount
>      * the superblock here, not the bind mount. */
>     if (detect_container() <= 0) {
>       _cleanup_free_ char *options = NULL;
>       /* MS_REMOUNT requires that the data parameter
>        * should be the same from the original mount
>        * except for the desired changes. Since we want
>        * to remount read-only, we should filter out
>        * rw (and ro too, because it confuses the kernel) */
>       (void) fstab_filter_options(m->options, "rw\0ro\0", NULL, NULL, &options);
>
>       /* We always try to remount directories read-only
>        * first, before we go on and umount them.
>        *
>        * Mount points can be stacked. If a mount
>        * point is stacked below / or /usr, we
>        * cannot umount or remount it directly,
>        * since there is no way to refer to the
>        * underlying mount. There's nothing we can do
>        * about it for the general case, but we can
>        * do something about it if it is aliased
>        * somewhere else via a bind mount. If we
>        * explicitly remount the super block of that
>        * alias read-only we hence should be
>        * relatively safe regarding keeping the fs we
>        * can otherwise not see dirty. */
>       log_info("Remounting '%s' read-only with options '%s'.", m->path, options);
>       (void) mount(NULL, m->path, NULL, MS_REMOUNT|MS_RDONLY, options);
>     }
>
>     /* Skip / and /usr since we cannot unmount that
>      * anyway, since we are running from it. They have
>      * already been remounted ro. */
>     if (path_equal(m->path, "/")
> #ifndef HAVE_SPLIT_USR
>         || path_equal(m->path, "/usr")
> #endif
>        )
>       continue;
>
>     /* Trying to umount. We don't force here since we rely
>      * on busy NFS and FUSE file systems to return EBUSY
>      * until we closed everything on top of them. */
>     log_info("Unmounting %s.", m->path);
>     if (umount2(m->path, 0) == 0) {
>       if (changed)
>         *changed = true;
>
>       mount_point_free(head, m);
>     } else if (log_error) {
>       log_warning_errno(errno, "Could not unmount %s: %m", m->path);
>       n_failed++;
>     }
>   }
>
>   return n_failed;
> }
>
>
> So the short answer ultimately is... Systemd has a single umount
> function, which first does remount-ro, so it's actually remounting
> (nearly) everything read-only, then tries umount.
>
>
> Meanwhile, (semi-)answering the elsewhere implied question of why only
> Linux needs the mount-ro service... I'm no BSD expert, but in my
> wanderings I came across a remark that they didn't need it, because their
> kernel reboot/halt/poweroff routines have a built-in kernelspace
> sync-and-remount-ro routine for anything that can't be unmounted, which
> Linux lacks.
> They obviously consider this a Linux deficiency, but while I've
> not come across the Linux reason for not doing it, an educated guess is
> that it's considered putting policy into the kernel, and that's
> considered a no-no; policy is userspace, and the kernel simply enforces
> it as directed (which is why kernel 2.4's devfs was removed for 2.6, to
> be replaced with the userspace-based udev). Additionally, not
> kernel-forcing the remount-ro bit does give developers a way to test the
> results of an uncontrolled shutdown, say on a specific testing filesystem
> only, without exposing the rest of the system, which can still be shut
> down normally, to it.
>
> So on Linux userspace must do the final umounts and force-read-onlys,
> because unlike the BSDs, the Linux kernel doesn't have builtin routines
> that automatically force it, regardless of userspace.
>
> But as others have said, on Linux the remount-ro is _definitely_
> required, and "bad things _will_ happen" if it's not done. (Just how bad
> depends on the filesystem and its mount options, and hardware, among
> other things.)
>
>
> Finally, one more thing to mention. On systems with magic SysRq in the
> kernel...
>
> echo 0x30 > /proc/sys/kernel/sysrq
>
> ... enables the sync (0x10) and remount-readonly (0x20) functions. (Of
> course only do this at shutdown/reboot, as you don't want to disturb the
> user's configured SysRq defaults in normal runtime.)
>
> You can then force emergency sync (s) and remount-read-only (u) with...
>
> echo s > /proc/sysrq-trigger
> echo u > /proc/sysrq-trigger
>
> As that's kernel emergency priority, it should force-sync and force
> everything readonly (and quiesce mid-layer block devices such as md
> and dm), even if it would normally refuse to do so due to files open for
> writing. You might consider something like that as a fallback, if normal
> mount-readonly fails.
> Of course it won't work if magic SysRq functionality
> isn't built into the kernel, but then you're no worse off than before,
> and are far better off on kernels where it's supported, so it's certainly
> worth considering. =:^)
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master." Richard Stallman
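A fallback along the lines Duncan suggests could be sketched as below. This is illustrative, not a tested shutdown path; the control and trigger paths are parameters only so the logic can be exercised against scratch files instead of the real /proc, and the function name is made up.

```shell
# Hypothetical last-resort helper for an init script: enable only the
# sync (0x10) and remount-ro (0x20) SysRq functions, then fire them.
force_remount_ro() {
    sysrq_ctl="${1:-/proc/sys/kernel/sysrq}"
    sysrq_trigger="${2:-/proc/sysrq-trigger}"
    # If the kernel lacks magic SysRq, the trigger file is absent.
    [ -w "$sysrq_trigger" ] || return 1
    echo 0x30 > "$sysrq_ctl"      # allow sync + remount-ro only
    echo s > "$sysrq_trigger"     # emergency sync
    echo u > "$sysrq_trigger"     # emergency remount read-only
}
```

A mount-ro replacement would presumably call this only after the normal `mount -o remount,ro` path had already failed.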