public inbox for gentoo-hardened@lists.gentoo.org
* [gentoo-hardened] Weird coincidental PAX crashes
@ 2014-05-09 15:15 Michael Orlitzky
  2014-05-09 15:29 ` Mark Gomersbach
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Orlitzky @ 2014-05-09 15:15 UTC
  To: gentoo-hardened

Last week, the LMTP daemon on our mail server (HP DL360 G6) crashed.
People noticed that the mail stopped coming in, so I SSHed in to check
on it, and there were some weird traces in the dmesg. While trying to
investigate, I noticed some more badness:

  # emerge -1 openntpd
  Calculating dependencies... done!

  >>> Verifying ebuild manifests
  Killed

At that point I'm thinking, "hardware problem, there goes the weekend."
Most of my tools are committing suicide so I surrender and reboot. The
thing comes up fine and has been working ever since.

Today another of our machines, a web server (HP DL360 G5?), does the
same thing. The nightly log report was empty because no syslog daemon
was running. This morning dmesg shows:

> [Fri May  9 11:00:42 2014] PAX: refcount overflow detected in: syslog-ng:21823, uid/euid: 0/0
> [Fri May  9 11:00:42 2014] CPU: 2 PID: 21823 Comm: syslog-ng Not tainted 3.11.7-hardened-r1 #1
> [Fri May  9 11:00:42 2014] task: ffff8802cffca080 ti: ffff8802cffca488 task.ti: ffff8802cffca488
> [Fri May  9 11:00:42 2014] RIP: 0010:[<ffffffff810e311e>]  [<ffffffff810e311e>] 0xffffffff810e311e
> [Fri May  9 11:00:42 2014] RSP: 0018:ffff880416f21c78  EFLAGS: 00000a96
> [Fri May  9 11:00:42 2014] RAX: ffff88041f0048a0 RBX: ffff88041a1edf00 RCX: 0000000040276333
> [Fri May  9 11:00:42 2014] RDX: 0000000040276332 RSI: 0000000000000000 RDI: ffff88041d858720
> [Fri May  9 11:00:42 2014] RBP: 0000000000000008 R08: 0000000000010bc0 R09: ffff88042fb10bc0
> [Fri May  9 11:00:42 2014] R10: 8000000000000000 R11: ffffea000fec3040 R12: ffff88041f0048a0
> [Fri May  9 11:00:42 2014] R13: ffff88026628ef00 R14: ffff88041d858720 R15: ffff88041a1edf10
> [Fri May  9 11:00:42 2014] FS:  0000000000000000(0000) GS:ffff88042fb00000(0000) knlGS:0000000000000000
> [Fri May  9 11:00:42 2014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Fri May  9 11:00:42 2014] CR2: 0000035fb5abf850 CR3: 000000000138a000 CR4: 00000000000006b0
> [Fri May  9 11:00:42 2014] Stack:
> [Fri May  9 11:00:42 2014]  0000000000000000 ffffffff818dde60 ffff8804140ac100 ffff8802cffca570
> [Fri May  9 11:00:42 2014]  ffff8802cffca080 ffff880416eb4200 ffff8802cffca080 ffffffff81052750
> [Fri May  9 11:00:42 2014]  0000000000000000 0000000000000001 ffff88038e6260d8 ffff8802cffca598
> [Fri May  9 11:00:42 2014] Call Trace:
> [Fri May  9 11:00:42 2014]  [<ffffffff81052750>] ? 0xffffffff81052750
> [Fri May  9 11:00:42 2014]  [<ffffffff81036e10>] ? 0xffffffff81036e10
> [Fri May  9 11:00:42 2014]  [<ffffffff810371e8>] ? 0xffffffff810371e8
> [Fri May  9 11:00:42 2014]  [<ffffffff810449cc>] ? 0xffffffff810449cc
> [Fri May  9 11:00:42 2014]  [<ffffffff8100241f>] ? 0xffffffff8100241f
> [Fri May  9 11:00:42 2014]  [<ffffffff81002a89>] ? 0xffffffff81002a89
> [Fri May  9 11:00:42 2014]  [<ffffffff8137c212>] ? 0xffffffff8137c212
> [Fri May  9 11:00:42 2014] Code: e9 68 fd 01 00 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 8b 7b 10 48 8b 40 30 f0 ff 88 30 01 00 00 71 09 f0 ff 80 30 01 00 00 cd 04 <0f> b7 00 89 c2 66 81 e2 00 b0 66 81 fa 00 20 0f 84 53 ff ff ff 
> [Fri May  9 11:00:42 2014] PAX: refcount overflow detected in: syslog-ng:21823, uid/euid: 0/0
> [Fri May  9 11:00:42 2014] CPU: 2 PID: 21823 Comm: syslog-ng Not tainted 3.11.7-hardened-r1 #1
> [Fri May  9 11:00:42 2014] task: ffff8802cffca080 ti: ffff8802cffca488 task.ti: ffff8802cffca488
> [Fri May  9 11:00:42 2014] RIP: 0010:[<ffffffff810e311e>]  [<ffffffff810e311e>] 0xffffffff810e311e
> [Fri May  9 11:00:42 2014] RSP: 0018:ffff880416f21c78  EFLAGS: 00000a96
> [Fri May  9 11:00:42 2014] RAX: ffff88041f0048a0 RBX: ffff88041a1edc00 RCX: 0000000040c384f8
> [Fri May  9 11:00:42 2014] RDX: 0000000040c384f7 RSI: 0000000000000000 RDI: ffff88041d858720
> [Fri May  9 11:00:42 2014] RBP: 0000000000000008 R08: 0000000000010b60 R09: ffff88042fb10b60
> [Fri May  9 11:00:42 2014] R10: 8000000000000000 R11: ffffea000f26a840 R12: ffff88041f0048a0
> [Fri May  9 11:00:42 2014] R13: ffff88026628e000 R14: ffff88041d858720 R15: ffff88041a1edc10
> [Fri May  9 11:00:42 2014] FS:  0000000000000000(0000) GS:ffff88042fb00000(0000) knlGS:0000000000000000
> [Fri May  9 11:00:42 2014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Fri May  9 11:00:42 2014] CR2: 0000035fb5abf850 CR3: 000000000138a000 CR4: 00000000000006b0
> [Fri May  9 11:00:42 2014] Stack:
> [Fri May  9 11:00:42 2014]  0000000000000000 ffffffff818dde60 ffff88041a1ed400 ffff8802cffca570
> [Fri May  9 11:00:42 2014]  ffff8802cffca080 ffff880416eb4200 ffff8802cffca080 ffffffff81052750
> [Fri May  9 11:00:42 2014]  0000000000000000 0000000000000001 ffff88038e6260d8 ffff8802cffca598
> [Fri May  9 11:00:42 2014] Call Trace:
> [Fri May  9 11:00:42 2014]  [<ffffffff81052750>] ? 0xffffffff81052750
> [Fri May  9 11:00:42 2014]  [<ffffffff81036e10>] ? 0xffffffff81036e10
> [Fri May  9 11:00:42 2014]  [<ffffffff810371e8>] ? 0xffffffff810371e8
> [Fri May  9 11:00:42 2014]  [<ffffffff810449cc>] ? 0xffffffff810449cc
> [Fri May  9 11:00:42 2014]  [<ffffffff8100241f>] ? 0xffffffff8100241f
> [Fri May  9 11:00:42 2014]  [<ffffffff81002a89>] ? 0xffffffff81002a89
> [Fri May  9 11:00:42 2014]  [<ffffffff8137c212>] ? 0xffffffff8137c212
> [Fri May  9 11:00:42 2014] Code: e9 68 fd 01 00 0f 1f 84 00 00 00 00 00 48 8b 43 18 48 8b 7b 10 48 8b 40 30 f0 ff 88 30 01 00 00 71 09 f0 ff 80 30 01 00 00 cd 04 <0f> b7 00 89 c2 66 81 e2 00 b0 66 81 fa 00 20 0f 84 53 ff ff ff


And things are segfaulting randomly. These machines have been running
3.11.7-hardened-r1 since 2014-01-03 without issue until now -- all of
our servers have. So the timing seems a little coincidental.

If it's not hardware (two different machines...), does this look like a
kernel bug? Should I upgrade over the weekend and pray?


* Re: [gentoo-hardened] Weird coincidental PAX crashes
@ 2014-05-15 13:48 PaX Team
  2014-05-15 15:22 ` Michael Orlitzky
  0 siblings, 1 reply; 13+ messages in thread
From: PaX Team @ 2014-05-15 13:48 UTC
  To: gentoo-hardened

[-- Attachment #1: Mail message body --]
[-- Type: text/plain, Size: 1630 bytes --]

On 9 May 2014 at 11:15, Michael Orlitzky wrote:

> > [Fri May  9 11:00:42 2014] PAX: refcount overflow detected in: syslog-ng:21823, uid/euid: 0/0

this is the key message, the REFCOUNT feature triggered as it detected
an overflow somewhere.
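
As a rough illustration of what "detected an overflow" means here, the
sketch below models a 32-bit signed counter that refuses to wrap around
and raises an error instead. It is a toy model only, not PaX's
implementation; the names CheckedCounter and RefcountOverflow are made
up for the example. The Code: bytes in the original report look
consistent with this shape (a locked decrement, a jno over a locked
increment that undoes it, then int $4), but that is a reading of the
dump, not something the thread confirms.

```python
INT_MAX = 2**31 - 1
INT_MIN = -(2**31)


class RefcountOverflow(Exception):
    """Hypothetical stand-in for the trap raised on overflow."""


def _wrap32(n):
    """Wrap an integer the way a 32-bit signed register would."""
    return (n + 2**31) % 2**32 - 2**31


class CheckedCounter:
    """Toy 32-bit signed counter that reports overflow instead of wrapping."""

    def __init__(self, value=0):
        self.value = value

    def inc(self):
        new = _wrap32(self.value + 1)
        if new < self.value:
            # The increment wrapped from INT_MAX to INT_MIN:
            # keep the old value and report the event.
            raise RefcountOverflow("increment overflowed")
        self.value = new

    def dec(self):
        new = _wrap32(self.value - 1)
        if new > self.value:
            # The decrement wrapped from INT_MIN to INT_MAX:
            # keep the old value and report the event.
            raise RefcountOverflow("decrement overflowed")
        self.value = new
```

The point of the design is that the counter never holds a wrapped
value: the failed operation is not applied, and the caller is told
about it at once.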

> > [Fri May  9 11:00:42 2014] CPU: 2 PID: 21823 Comm: syslog-ng Not tainted 3.11.7-hardened-r1 #1

as a sidenote, this is a very old kernel and it's quite possible that
it's a false positive that we have since fixed.

> > [Fri May  9 11:00:42 2014] Call Trace:
> > [Fri May  9 11:00:42 2014]  [<ffffffff81052750>] ? 0xffffffff81052750
> > [Fri May  9 11:00:42 2014]  [<ffffffff81036e10>] ? 0xffffffff81036e10
> > [Fri May  9 11:00:42 2014]  [<ffffffff810371e8>] ? 0xffffffff810371e8
> > [Fri May  9 11:00:42 2014]  [<ffffffff810449cc>] ? 0xffffffff810449cc
> > [Fri May  9 11:00:42 2014]  [<ffffffff8100241f>] ? 0xffffffff8100241f
> > [Fri May  9 11:00:42 2014]  [<ffffffff81002a89>] ? 0xffffffff81002a89
> > [Fri May  9 11:00:42 2014]  [<ffffffff8137c212>] ? 0xffffffff8137c212

unfortunately the backtrace is not usable as is due to lack of symbols.
if you still have the original vmlinux around (or can reproduce it with
all the debug symbols) then i can take a look and perhaps figure out
where the refcount overflow was detected (and whether it was a false
positive or not).
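
A minimal sketch of the symbol lookup described above, assuming a
System.map-style listing ("address type name" per line) from the exact
same kernel build is available. It pulls the bracketed addresses out of
a dmesg trace and maps each to the nearest preceding symbol. The
function names are invented for the example, and the symbol names in
the usage note are placeholders, not real symbols from this kernel.

```python
import bisect
import re

# Matches the bracketed 64-bit addresses in trace lines,
# e.g. "[<ffffffff81052750>]".
ADDR_RE = re.compile(r"\[<([0-9a-f]{16})>\]")


def trace_addresses(dmesg_text):
    """Return distinct bracketed addresses in order of first appearance."""
    seen = []
    for match in ADDR_RE.finditer(dmesg_text):
        addr = int(match.group(1), 16)
        if addr not in seen:
            seen.append(addr)
    return seen


def load_symbols(system_map_text):
    """Parse 'address type name' lines into a sorted (address, name) list."""
    symbols = []
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(addr, symbols):
    """Return 'name+0xoffset' for the nearest symbol at or below addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return f"{name}+{addr - start:#x}"
```

For example, with a two-line map containing the placeholder entries
"ffffffff81052000 T placeholder_a" and "ffffffff81053000 T
placeholder_b", resolve(0xffffffff81052750, ...) yields
"placeholder_a+0x750". With a vmlinux that carries debug info, binutils'
addr2line -e vmlinux <address> can go further and report file and line.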

> If it's not hardware (two different machines...), does this look like a
> kernel bug? Should I upgrade over the weekend and pray?

it's not a hardware issue (at least not directly) but a software one and
regardless of it being perhaps a false positive you should be using a newer
kernel that we support ;).


* Re: [gentoo-hardened] Weird coincidental PAX crashes
@ 2014-05-15 13:49 PaX Team
  0 siblings, 0 replies; 13+ messages in thread
From: PaX Team @ 2014-05-15 13:49 UTC
  To: gentoo-hardened

[-- Attachment #1: Mail message body --]
[-- Type: text/plain, Size: 403 bytes --]

On 13 May 2014 at 15:39, Joshua Kinard wrote:

> For me, I never had an actual oops.  Just a note in dmesg that pax was
> killing command-line processes at random.  Running services didn't seem to
> be affected, but I could go run grep or something and it'd just abruptly
> terminate.

when PaX kills something there are always logs about the reason, so
you could post those and CC me on the bugs.
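
One way to collect those logs before posting them, sketched under the
assumption that the messages look like the "PAX: ... in: comm:pid,
uid/euid: N/N" line quoted earlier in this thread. Other PaX messages
may be formatted differently, so lines that do not match are kept
verbatim and not dropped. The function name pax_events is made up for
the example.

```python
import re

# Pattern for the one message format that appears in this thread, e.g.
# "PAX: refcount overflow detected in: syslog-ng:21823, uid/euid: 0/0".
PAX_RE = re.compile(
    r"PAX: (?P<reason>.+?) in: (?P<comm>\S+):(?P<pid>\d+), "
    r"uid/euid: (?P<uid>\d+)/(?P<euid>\d+)"
)


def pax_events(dmesg_text):
    """Return one dict per dmesg line that mentions PAX:.

    Lines in the known format are split into fields; anything else is
    returned as {"raw": line} so no evidence is lost.
    """
    events = []
    for line in dmesg_text.splitlines():
        if "PAX:" not in line:
            continue
        match = PAX_RE.search(line)
        events.append(match.groupdict() if match else {"raw": line.strip()})
    return events
```

Feeding it the saved output of dmesg gives a short list that is easier
to attach to a bug than the full log.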



end of thread

Thread overview: 13+ messages
2014-05-09 15:15 [gentoo-hardened] Weird coincidental PAX crashes Michael Orlitzky
2014-05-09 15:29 ` Mark Gomersbach
2014-05-09 15:39   ` Michael Orlitzky
2014-05-09 17:46     ` "Tóth Attila"
2014-05-10 11:14       ` Joshua Kinard
2014-05-10 11:39         ` Michael Orlitzky
2014-05-10 13:43           ` Anthony G. Basile
2014-05-13 19:39             ` Joshua Kinard
2014-05-15 13:11               ` Anthony G. Basile
2014-05-10 16:37     ` Mark Gomersbach
  -- strict thread matches above, loose matches on Subject: below --
2014-05-15 13:48 PaX Team
2014-05-15 15:22 ` Michael Orlitzky
2014-05-15 13:49 PaX Team
