public inbox for gentoo-amd64@lists.gentoo.org
* [gentoo-amd64] Systemd migration: opinion and questions
@ 2015-02-24 20:15 Marc Joliet
From: Marc Joliet @ 2015-02-24 20:15 UTC
  To: gentoo-amd64

Hi list,

(Normally I ask these types of questions to gentoo-user, but thought I'd try
here for once, especially given the ease with which systemd flamewars erupt on
gentoo-user, compared to the much more civil discussions I've seen here.)

So, at night on Saturday, 14th February, I decided, on a whim, to install
systemd on my Uni laptop.  Everything went well, save for some remaining
questions (see below).  The Gentoo systemd wiki entry was very helpful.

[ It's really strange how that works: I just got the sudden urge to try systemd
  out, for no real reason at all, and couldn't stop myself.  Weird... ]

Heh... I was originally going to send this *before* migrating my desktop, but
it turns out I liked systemd enough to want to switch my desktop to it before I
managed to finish writing this.  And holy crap, Duncan was right: systemd is
*so* *fast* on an SSD, it's just not funny.  It takes *3 seconds* after the
kernel boots for me to get a login screen (for a total of about 5-6 seconds
after the boot loader).  I barely get a chance to see systemd's messages, they
just start zipping by, and then *wham*: login screen.  I am *amazed*!

*ahem*

Anyway, I thought it might be interesting to report other things I did and what
I like about systemd so far.  Due to the length and additional context I'll
split the rest of this email into sections for better readability.  I apologise
for the length; in my defence, this is a fundamental system change, and I
wanted to offer enough context for everything.

== Additional changes on the laptop ==

I replaced laptop-mode-tools with tlp from the overlay of the same name (not
directly available via layman, but there's a bug on b.g.o that links to the
GitHub repo, which has installation instructions).  Since I controlled the
backlight with lmt, and tlp doesn't support that, I had to find a replacement
for that feature.  I ultimately decided on power-backlight, which I found via
the Arch Linux wiki.  I wrote an ebuild and put it in my personal overlay
(http://sourceforge.net/p/mjolietoverlay/code/ci/master/tree/app-laptop/power-backlight/),
and so far everything is working fine.

This might seem like a pointless change, but Arch switched to it, too, and
doesn't even package lmt anymore (albeit only since late 2014), so it seems
like the way forward.

== Additional changes on both systems ==

I uninstalled acpid, since the Gentoo Wiki page on systemd mentions that it
might not be necessary (see the question below).  Being able to do this was
another reason I switched to tlp, since it doesn't depend on acpid.  So far I
have yet to notice anything amiss.

I also uninstalled tmpwatch, since systemd has a built-in service for doing the
same thing (systemd-tmpfiles).
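
For illustration, a tmpfiles.d(5) entry that ages out a directory might look
roughly like this.  The file name and the path are hypothetical, and this is
an untested sketch rather than a recommended configuration:

    # /etc/tmpfiles.d/example-cache.conf  (hypothetical)
    # Type  Path                Mode  UID   GID   Age
    d       /var/cache/example  0755  root  root  14d

The periodic cleanup is then driven by systemd-tmpfiles-clean.timer rather
than by a cron job.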

== Things I have *not* gotten rid of (yet) ==

Fcron is still around, mainly because packages might rely on it being there
(e.g., man-db and mlocate install files there), but also because I haven't
researched systemd timers yet.

I plan on uninstalling syslog-ng, but haven't done so yet.  I simply feel
better waiting a bit, even though I don't run it anymore.  Man, I feel silly
after typing that...

== Network migration on my desktop ==

My desktop has a somewhat more complicated network setup than the laptop (which
uses NetworkManager).  The wan0 network device is no problem, but I also have a
bridge with one physical device connected to it, and one dummy device.

Thus, I had to migrate my netifrc based configuration.  Due to word of mouth, I
decided on netctl.  I originally ignored systemd-networkd because I keep
hearing that it's only for simple networks, but after looking at the man page
it appears that it can do (almost) everything that I needed it to, although I'm
not sure about dummy device support.  I need that for MATLAB, which stupidly
requires the presence of an interface named eth0, whose MAC address it verifies
during its licence check.  This became a problem after renaming my network
devices following the news entry 2013-03-29-udev-upgrade ("Upgrading udev to
version >=200").

What surprised me was that netifrc doesn't seem to integrate dummy devices
properly, i.e., it doesn't seem to be possible to rename them in
/etc/conf.d/net directly.  I implemented that via an appropriate call to ip in
an /etc/local.d file, and had "net.eth0" depend on "local".  So /etc/conf.d/net
looked like this:

    config_eth0="172.16.1.1/24"
    mac_eth0="00:18:f3:97:17:72"
    rc_net_eth0_provide="!net"
    rc_net_eth0_need="local"

And the local.d script executed

    ip link set dev dummy0 name eth0

In comparison, the netctl configuration has everything in one place:

    # cat /etc/netctl/dummy
    Description='A dummy interface (for MATLAB)'
    Interface=eth0
    Connection=dummy
    IP=static
    Address=172.16.1.1
    IPCustom=( 'link set dev eth0 address 00:18:f3:97:17:72' )

It also takes care of loading the dummy kernel module, so I don't need an
/etc/modules-load.d/ entry, whereas with netifrc/OpenRC I needed to load it
manually via /etc/conf.d/modules (though maybe that has changed?).
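
For comparison, a dummy interface could presumably be described to
systemd-networkd along these lines.  This is an untested sketch that assumes
a systemd version whose systemd.netdev(5) lists Kind=dummy; the file names
are made up, while the addresses are the ones from the netifrc setup above:

    # /etc/systemd/network/10-matlab.netdev  (hypothetical name)
    [NetDev]
    Name=eth0
    Kind=dummy
    MACAddress=00:18:f3:97:17:72

    # /etc/systemd/network/10-matlab.network  (hypothetical name)
    [Match]
    Name=eth0

    [Network]
    Address=172.16.1.1/24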

== Stuff I liked ==

If you want to skip this, the questions are in the next section.

=== Pleasant surprises ===

Some surprising things worked out of the box, for example all /etc/local.d/
files are dynamically converted to units at run-time and executed.  I hardly
had to migrate anything.  The one exception was chrony, where I had to override
the service file to add extra command line flags that don't have corresponding
configuration file entries.
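
Such an override can be written as a drop-in instead of a full copy of the
unit.  A minimal sketch, assuming the unit is called chronyd.service and
using -r and -s purely as placeholder flags; the real ExecStart= line should
be copied from the shipped unit (e.g., via "systemctl cat chronyd.service")
before appending anything:

    # /etc/systemd/system/chronyd.service.d/flags.conf  (hypothetical name)
    [Service]
    # An empty assignment clears the ExecStart= inherited from the unit.
    ExecStart=
    # Placeholder flags; substitute the ones that are actually needed.
    ExecStart=/usr/sbin/chronyd -r -s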

On my desktop I had to add network dependencies to some socket units so that
they wouldn't start before the network they're supposed to listen on is fully
configured (see the question below).  Otherwise, it just had a few more things
that needed doing, but that is the nature of running services on it (dovecot,
postfix, and samba).

=== The Journal ===

I like the journal.  No, sorry, I *really* *really* like the journal. I like
the filters you can apply (e.g., limiting the output by unit via the -u flag),
the priority system, the automatic use of ACLs so that you can see your own
user's logs. I also think that it's a nice detail that "systemctl status
<unit>" will show the last lines of log output. This actually helped me at
work :-) .

=== Socket activation ===

I especially learned to like it after reading the early "systemd for admins"
articles, which made the implications of the design more clear.  I definitely
like the ability to have a service start on-demand instead of spending 99.99%
of its time idle.  Finally, I also like the simplicity of the C API from
systemd/sd-daemon.h.
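
To make the idea concrete, here is a rough sketch of a socket/service pair.
All names, the port, and the binary path are invented for illustration, and
the daemon is assumed to accept the listening socket from systemd (e.g., via
sd_listen_fds()) instead of binding it itself:

    # /etc/systemd/system/example.socket  (hypothetical)
    [Socket]
    ListenStream=127.0.0.1:12345

    [Install]
    WantedBy=sockets.target

    # /etc/systemd/system/example.service  (hypothetical)
    [Service]
    ExecStart=/usr/local/bin/example-daemon

With only the socket unit enabled, systemd holds the port open and starts
example.service on the first incoming connection.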

=== Miscellanea ===

Some things work better, e.g., closing the laptop's lid suspends the laptop
automatically without me having to configure anything.  My desktop is *almost*
there, see the questions below.
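
The lid handling appears to come from systemd-logind.  As a sketch, these are
the relevant logind.conf(5) settings, shown with what should be their default
values, so none of them ought to need setting explicitly:

    # /etc/systemd/logind.conf
    [Login]
    HandlePowerKey=poweroff
    HandleSuspendKey=suspend
    HandleHibernateKey=hibernate
    HandleLidSwitch=suspend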

Systemd can restart itself, e.g., for upgrades.  Nice!

Systemd-analyze is really nice.

Systemd exposes various system features that require extra work to set up
otherwise.  The example that made me think of this is "systemctl kexec", but
from reading Lennart Poettering's blog series, this is one of the goals of
systemd.  I can definitely sympathise with that.

== Questions ==

=== ACPID: needed or not? ===

Does acpid provide anything that systemd does not, and if so, what kind of
"conflicts" might I see?  The Gentoo Wiki page says that acpid is likely not
needed.

FWIW, I already unmerged it and have not noticed any missing functionality,
even after over a week of regular usage.  So I'm tending towards "no, not
needed".

=== Timers ===

Can a systemd timer depend on a mount point such that it waits until the mount
point exists before running?  Or will it fail after a timeout?  I want to
research this myself, but haven't gotten around to it yet.

The problem I have is that my external HDD does not come up properly on cold
boot, so that I have to unplug it and plug it back in in order for the kernel
to fully initialise it and for it to mount, which is problematic for my backup
fcron jobs, since they have &bootrun set. This means that the backup script
will fail, unless I'm a) fast enough at re-plugging the HDD, and b) fast enough
at logging in (so that my automounter mounts the HDD).  I will then have to
manually re-run it (i.e., fcrondyn -x "run <ID>") or wait for the next time
it's supposed to run.

Naturally, I would like a more robust system than that, and hope that systemd
timers can make my life easier here.
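
A rough, untested sketch of what such a setup might look like.  The unit
names, the script path, and the mount point /mnt/backup are all hypothetical,
and it assumes that systemd knows a mount unit for that path (e.g., from an
fstab entry); how it interacts with a desktop automounter is an open question:

    # /etc/systemd/system/backup.service  (hypothetical)
    [Unit]
    Description=Backup to external HDD
    # Pulls in the mount unit(s) for this path and orders the service after them.
    RequiresMountsFor=/mnt/backup

    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/backup.sh

    # /etc/systemd/system/backup.timer  (hypothetical)
    [Unit]
    Description=Daily backup

    [Timer]
    OnCalendar=daily
    # Catch up on a run that was missed while the machine was off.
    Persistent=true

    [Install]
    WantedBy=timers.target

Persistent=true is the closest counterpart to fcron's &bootrun here.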

=== User units ===

I would like to convert some programs I start in .xprofile to units that are
started by my user's systemd instance.  I started off with mpd, but it doesn't
start automatically ("systemctl --user start mpd" works fine, though), even
though it's enabled:

    % systemctl --user status mpd
    ● mpd.service - Music Player Daemon
       Loaded: loaded (/usr/lib64/systemd/system/mpd.service; enabled)
       Active: active (running) since Di 2015-02-24 19:39:46 CET; 1h 6min ago
     Main PID: 1091 (mpd)
       CGroup: /user.slice/user-1000.slice/user@1000.service/mpd.service
               └─1091 /usr/bin/mpd --no-daemon

    Feb 24 19:39:46 marcec systemd[384]: Started Music Player Daemon.
    [...]

Also:

    % tree .config/systemd/
    .config/systemd/
    └── user
        ├── mpd.service -> /usr/lib64/systemd/system/mpd.service
        └── multi-user.target.wants
            └── mpd.service -> /home/marcec/.config/systemd/user/mpd.service

Is the symlink the problem?  Do I have to create an actual file?  Is the
target.wants wrong?
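
One thing that may be worth trying (an untested guess): the user instance is
not expected to start multi-user.target, so a unit meant to come up with the
session would hook into default.target instead.  A minimal sketch of a real
file in place of the symlink, with ExecStart= taken from the status output
above:

    # ~/.config/systemd/user/mpd.service
    [Unit]
    Description=Music Player Daemon (user instance)

    [Service]
    ExecStart=/usr/bin/mpd --no-daemon

    [Install]
    WantedBy=default.target

Running "systemctl --user reenable mpd" afterwards should recreate the
symlink under default.target.wants.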

=== Suspend on the desktop ===

Like I mentioned above, my Desktop can *almost* suspend reliably, after trying
it out once every year or two (it's over 8 years old).  Mostly it would just
not wake back up.  The latest status (before systemd) was that the kernel
crashed after waking up (but I think that was a known bug that was fixed in the
meantime).

Now with systemd, it wakes back up properly, but one of the soundcard drivers
(ice1724) is apparently unreliable, so that sound stops working after wakeup.
Plus, access to the soundcard was blocked by rtkit.  I believe what one would
normally do is unload the module before suspending, but the only way I could
find to tell systemd to do that requires creating a file in /usr (as done here,
for example: http://forums.fedoraforum.org/showthread.php?t=294065).  Is there
a better way?
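
One alternative that avoids /usr is a regular unit in /etc/systemd/system
that is tied to sleep.target.  The following is an untested sketch; the unit
name is made up, and snd_ice1724 is assumed to be the module name for the
ice1724 driver:

    # /etc/systemd/system/unload-ice1724.service  (hypothetical name)
    [Unit]
    Description=Unload the ice1724 sound driver around suspend
    Before=sleep.target
    StopWhenUnneeded=yes

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    # Runs before the system goes to sleep.
    ExecStart=/sbin/modprobe -r snd_ice1724
    # Runs when the unit is stopped again after resume.
    ExecStop=/sbin/modprobe snd_ice1724

    [Install]
    WantedBy=sleep.target

The idea is that ExecStart= runs on the way into suspend, and ExecStop= runs
once sleep.target is stopped after wakeup.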

=== Dovecot and socket activation ===

Dovecot gave me problems with socket activation; I kept getting errors like the
following:

    Feb 20 22:58:19 marcec dovecot[6500]: master: Panic: io_add(0x1) called twice fd=4, callback=0x40a970 -> 0x40a970
    Feb 20 22:58:19 marcec dovecot[6500]: master: Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x6accf) [0x7f42b9300ccf] -> /usr/lib64/dovecot/libdovecot.so.0(i_syslog_fatal_handler

I don't know who's at fault here, systemd or dovecot, but I had to simplify the
configuration a bit:

    diff --git a/dovecot/conf.d/10-master.conf b/dovecot/conf.d/10-master.conf
    index ddbde28..c0ca867 100644
    --- a/dovecot/conf.d/10-master.conf
    +++ b/dovecot/conf.d/10-master.conf
    @@ -16,11 +16,11 @@
     
     service imap-login {
       inet_listener imap {
    -    address = ::1, 127.0.0.1, 172.16.0.1
    +    address = ::1, 172.16.0.1
         port = 10087
       }
       inet_listener imaps {
    -    address = ::1, 127.0.0.1, 172.16.0.1
    +    address = ::1, 172.16.0.1
         port = 10887
         #ssl = yes
       }
    diff --git a/systemd/system/dovecot.socket.d/sockets.conf b/systemd/system/dovecot.socket.d/sockets.conf
    index bcc29c6..368d461 100644
    --- a/systemd/system/dovecot.socket.d/sockets.conf
    +++ b/systemd/system/dovecot.socket.d/sockets.conf
    @@ -1,8 +1,6 @@
     [Socket]
     ListenStream=
    -ListenStream=127.0.0.1:10087
     ListenStream=172.16.0.1:10087
     ListenStream=[::1]:10087
    -ListenStream=127.0.0.1:10887
     ListenStream=172.16.0.1:10887
     ListenStream=[::1]:10887

Note that it also worked if I removed 127.0.0.1 from imap, and 172.16.0.1 from
imaps (or, IIRC, vice versa).  I'm fine with the configuration being like this,
but I'm nonetheless baffled by the necessity.  I couldn't even find a bug
report or ML post.

=== Depending on a specific network interface ===

Some socket units failed to start at first, due to "resource" errors.  So I
made them depend on netctl@bridge via *.d/requires.conf files like so:

    [Unit]
    Requires=netctl@bridge.service
    After=netctl@bridge.service

That fixed the errors, but is it the correct way to depend on that interface
(ignoring the fact that I could have put symlinks at the right place instead)?

=== dmesg and syslog ===

On the laptop I didn't immediately stop running syslog-ng.  Following a thread
on gentoo-user about the journal complaining about messages not being forwarded
to syslog-ng, I got curious and checked the laptop today.

What I did was run

    diff -U8 <(zcat /var/log/messages-*) <(journalctl)

and look for differences in the time frame where both were running (note
that /var/log/messages is empty, so I only checked the rotated logs).

First of all, while I clearly remember seeing similar messages about "missing
messages", I couldn't find any in either the journal or the syslog output.  Maybe
I mixed them up with "Forward time jump detected" messages from chrony?  But
those are too rare, unless I noticed one right after resuming from suspend and
got mixed up then?  I think that I would like confirmation that those messages
are actually logged properly, and where.

However, I did find something unexpected.  I haven't looked deeply yet, but I
noticed some dmesg entries not ending up in the journal (i.e., a bunch of "-"
lines in-between the "+" lines in the diff output). Did I do something wrong, or
is this known behaviour when running the journal and syslog-ng simultaneously?
I'll look more closely tomorrow, but wondered if anybody here noticed anything
similar.

== The End ==

*phew*

Of course, some of the complications will go away once I've gotten around to
turning my desktop back into a pure desktop and not the desktop/server/router
hybrid it is now, but that will have to wait.

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

* [gentoo-amd64] Re: Systemd migration: opinion and questions
@ 2015-02-25 11:04 Duncan
From: Duncan @ 2015-02-25 11:04 UTC
  To: gentoo-amd64

Marc Joliet posted on Tue, 24 Feb 2015 21:15:45 +0100 as excerpted:

> Like I mentioned above, my Desktop can *almost* suspend reliably,
> after trying it out once every year or two (it's over 8 years old).
> Mostly it would just not wake back up.  The latest status (before
> systemd) was that the kernel crashed after waking up (but I think
> that was a known bug that was fixed in the meantime).

FWIW... for readers that might be considering multi-device filesystems
(like btrfs can do if so configured) or mdraid/dmraid/etc type
multi-device backed filesystems...

What I've found with suspend/hibernate over the years and on multiple 
systems, is that while at least one or the other generally works,
that's on the condition of no multi-device mdraid, dmraid, btrfs, etc.
Once you get more than one physical device backing a filesystem, be
that mdraid/dmraid/etc, or a direct multi-device filesystem such as
btrfs, suspend/hibernate, or more precisely, the resume, becomes
problematic, because invariably, one device lags the others in
resuming, and the kernel apparently doesn't know how to properly wait
for multiple devices to all resume and stabilize at once.

The result is that the system resumes, but one or more physical devices
underlying that multi-device filesystem often stabilize more slowly than
the others and get dropped from the raid or whatever.

If it's a raid0, without redundancy, that can mean you just lost it and 
everything on it, period.  With other raid types it's not so bad, but
it does often mean either manual missing-device delete and re-add
(mdraid), or (on btrfs) a quick reboot as btrfs becomes unstable when a
device drops and the system will often freeze or livelock if you don't,
and then a scrub after the reboot brings the device back in, to sync
the updates back to the device that was dropped and brought back in.

So... after finding that about half the time after a suspend or
hibernate and resume cycle you have to either reboot or do manual
system maintenance anyway, pretty soon you learn not to bother with
suspend/hibernate in the first place, and simply shut down, and restart
from power-off when you'd otherwise resume.

Back on spinning rust, this used to annoy me greatly, as the boot time 
wasn't bad, but it did mean starting with an empty cache, and losing
that several gigs of cache of much slower spinning rust at the reboot
was /painful/.

While reasonably fast SSDs are still slower than cache, the difference
is 2-3 orders of magnitude smaller, and losing the cache on reboot
isn't the big deal it once was.  Between that and the fact that systemd
bootup is so fast on ssd, full shutdown and restart isn't such a big
deal these days.

But it sure would be nice if the kernel could learn to handle resume
from suspend or hibernate much like it does bootup, using bootwait or
similar kernel commandline option not just at boot, but at resume as
well, so systems that can and do /boot/ multiple devices just fine,
can /resume/ them just fine as well.  Then we'd not have to worry about
such problems.  Oh, well... maybe someday...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

