* [gentoo-server] Server goes down twice in two days, looking for input
@ 2005-09-22 14:34 fire-eyes
2005-09-22 15:46 ` Yogesh Sharma
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: fire-eyes @ 2005-09-22 14:34 UTC
To: gentoo-server
Hi there!
I have had our gentoo server go down twice in under two days. I am
currently trying to figure out what is happening.
Facts:
- Dual PIII 933 MHz system (ServerWorks OSB4)
- 3.5GB RAM
- 2.6.11.2-grsec-20050614 kernel (self rolled)
- SCSI: Adaptec AIC-7892P, 32MB cache
+ Disks
+ For Operating System
- 2x IBM DDYS-T09170N SCSI U160 10KRPM 9.1GB in a RAID1, 1x of the
same for hotspare
+ For storage etc
- 3x IBM IC35L036UWD210-0 SCSI U160 10KRPM
- 1x IBM DDYS-T36950N SCSI U160 10KRPM
- In a RAID5
Tuesday afternoon, I was informed that there might be problems with this
server. I had just been working on it via shell. I went back, and found
it unresponsive.
I went into the server room, only to catch it ending a reboot and being
almost totally back up. It behaved the rest of the day. I was not able
to find any indications of problems in the logs.
Wednesday evening, I was again working on the system via ssh, and it
stopped responding. I got into the server room fast enough this time. I
tried to log in as root, and could not. I could type the username, but
upon hitting enter, nothing happened. That was true for any console.
I have syslogd sending *.* to console 10, so flipping over there, I saw
nothing out of the ordinary. The last log entry, at the time I noticed it
stop responding, was a run-of-the-mill firewall log.
After a few more minutes, the system was completely unresponsive, save
for SysRq. I Synced, tErmed, Synced again, remounted everything
read-only, and forced it to reboot.
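For reference, the same emergency sequence (S, E, S, U, B) can also be sent from a root shell through /proc/sysrq-trigger when the console keyboard is unusable, provided the kernel has SysRq support enabled. A minimal Python sketch, assuming a Linux 2.6 kernel with CONFIG_MAGIC_SYSRQ; the helper name is hypothetical, and it defaults to a dry run because the final `b` reboots immediately without unmounting:

```python
import time

# Letters written to /proc/sysrq-trigger, in the order used above:
# s = sync dirty buffers, e = send SIGTERM to all processes except init,
# s = sync again, u = remount all filesystems read-only, b = reboot now.
EMERGENCY_SEQUENCE = ["s", "e", "s", "u", "b"]


def sysrq_emergency_reboot(trigger_path="/proc/sysrq-trigger",
                           pause=2.0, dry_run=True):
    """Send the sync/term/sync/remount/reboot sequence via SysRq.

    Hypothetical helper. With dry_run=True (the default) nothing is
    written; the function only returns the letters it would send.
    """
    sent = []
    for letter in EMERGENCY_SEQUENCE:
        if not dry_run:
            # Requires root; each write triggers one SysRq action.
            with open(trigger_path, "w") as trigger:
                trigger.write(letter)
            # Give sync/remount time to finish before the next step.
            time.sleep(pause)
        sent.append(letter)
    return sent
```

Calling `sysrq_emergency_reboot()` with no arguments only lists the steps; pass `dry_run=False` as root to actually trigger them.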
Again I was not able to find any logs indicating any errors at all.
I see only two possibilities. The first is that I was goofing with Samba at
various points on both days. However, Samba was not running either time
the system went down.
The other, more interesting one is that both times the system
went down, I was creating a tar.bz2 out of a kernel source tree. The problems
happened well after I had started the archive jobs.
Wondering about the disks, I threw smartctl -a at both of the arrays (sda,
sdb), which didn't show anything out of the ordinary.
However, when I run smartctl -t offline, -t short, or -t long on sda or
sdb, it immediately reports failure on STDOUT. I find this odd, because I have
run these tests in the past. Granted, that was on a different kernel,
which I no longer have around.
Here is an example:
# smartctl -t short /dev/sda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Short Background Self Test Failed
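A minimal Python sketch for scripting this check, so results can be compared across kernels. It assumes smartctl is on the PATH and that a failure is reported with wording like the line above ("... Self Test Failed"); wording may differ between smartmontools versions, and the helper names are hypothetical:

```python
import subprocess


def run_smartctl(*args):
    """Run smartctl with the given arguments and return its stdout text."""
    result = subprocess.run(["smartctl", *args],
                            capture_output=True, text=True)
    # smartctl's exit status is a bitmask, so a non-zero code is not
    # raised as an error here; the caller inspects the text instead.
    return result.stdout


def self_test_failed(output):
    """Return True if smartctl output contains a 'Self Test ... Failed' line."""
    for line in output.splitlines():
        if "Self Test" in line and "Failed" in line:
            return True
    return False
```

For example, `self_test_failed(run_smartctl("-t", "short", "/dev/sda"))` would return True for the output shown above.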
Looking at the logs, including dmesg, I don't see anything strange.
I am worried by the smartctl results; however, I realize there is a small
possibility they are due to kernel changes.
Any ideas out there? Thank you for reading this! I *LOVE* Gentoo in
production.
--
gentoo-server@gentoo.org mailing list
* Re: [gentoo-server] Server goes down twice in two days, looking for input
2005-09-22 14:34 [gentoo-server] Server goes down twice in two days, looking for input fire-eyes
@ 2005-09-22 15:46 ` Yogesh Sharma
2005-09-22 15:54 ` fire-eyes
2005-09-22 15:57 ` A. Khattri
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Yogesh Sharma @ 2005-09-22 15:46 UTC
To: gentoo-server
Which network card do you have?
* Re: [gentoo-server] Server goes down twice in two days, looking for input
2005-09-22 15:46 ` Yogesh Sharma
@ 2005-09-22 15:54 ` fire-eyes
2005-09-22 16:25 ` Yogesh Sharma
0 siblings, 1 reply; 13+ messages in thread
From: fire-eyes @ 2005-09-22 15:54 UTC
To: gentoo-server
Yogesh Sharma wrote:
> Which network card do you have?
>
I have two cards: #1 is onboard, #2 is an add-on.
1) Intel EEPro 100
2) Broadcom BCM5701 gigabit
* Re: [gentoo-server] Server goes down twice in two days, looking for input
2005-09-22 14:34 [gentoo-server] Server goes down twice in two days, looking for input fire-eyes
2005-09-22 15:46 ` Yogesh Sharma
@ 2005-09-22 15:57 ` A. Khattri
2005-09-22 16:07 ` fire-eyes
2005-09-22 17:01 ` Robert Larson
2005-09-23 16:14 ` Sven Vermeulen
3 siblings, 1 reply; 13+ messages in thread
From: A. Khattri @ 2005-09-22 15:57 UTC
To: gentoo-server
On Thu, 22 Sep 2005, fire-eyes wrote:
> After a few more minutes, the system was completely unresponsive, save
> for SysRq. I Synced, tErmed, Synced again, remounted everything
> read-only and forced it to reboot.
>
> Again I was not able to find any logs indicating any errors at all.
I'm thinking this might be a hardware issue or maybe a heat issue: a fan
running too slow or failed completely, or maybe a bad RAM stick (we
had a server with a bad RAM stick that kept rebooting).
Throw in a LiveCD and run memtest.
* Re: [gentoo-server] Server goes down twice in two days, looking for input
2005-09-22 15:57 ` A. Khattri
@ 2005-09-22 16:07 ` fire-eyes
0 siblings, 0 replies; 13+ messages in thread
From: fire-eyes @ 2005-09-22 16:07 UTC
To: gentoo-server
A. Khattri wrote:
> On Thu, 22 Sep 2005, fire-eyes wrote:
>
>
>>After a few more minutes, the system was completely unresponsive, save
>>for SysRq. I Synced, tErmed, Synced again, remounted everything
>>read-only and forced it to reboot.
>>
>>Again I was not able to find any logs indicating any errors at all.
>
>
> I'm thinking this might be a hardware issue or maybe a heat issue: a fan
> running too slow or failed completely, or maybe a bad RAM stick (we
> had a server with a bad RAM stick that kept rebooting).
>
> Throw in a LiveCD and run the memtest.
>
>
Yep, I plan on it, if I can ever get downtime. At this rate, they
won't have a choice.
I wouldn't think it's overheating, but you never know. It's in a room
air-conditioned to about 62F.
* Re: [gentoo-server] Server goes down twice in two days, looking for input
2005-09-22 15:54 ` fire-eyes
@ 2005-09-22 16:25 ` Yogesh Sharma
2005-09-22 16:29 ` fire-eyes
0 siblings, 1 reply; 13+ messages in thread
From: Yogesh Sharma @ 2005-09-22 16:25 UTC
To: gentoo-server
fire-eyes wrote:
>I have two cards. #1 is an onboard, #2 is an addon.
>
>1) Intel EEPro 100
>2) Broadcom BCM5701 gigabit
>
>
Maybe it is not related to your issue, but I have an IBM A-PRO with a
Broadcom card. While using the tg3 driver, the server stopped responding a
couple of times. After switching to the BCM57xx driver, it has been running fine.
* Re: [gentoo-server] Server goes down twice in two days, looking for input
2005-09-22 16:25 ` Yogesh Sharma
@ 2005-09-22 16:29 ` fire-eyes
2005-09-22 16:53 ` Yogesh Sharma
0 siblings, 1 reply; 13+ messages in thread
From: fire-eyes @ 2005-09-22 16:29 UTC
To: gentoo-server
Yogesh Sharma wrote:
> fire-eyes wrote:
>
>
>>I have two cards. #1 is an onboard, #2 is an addon.
>>
>>1) Intel EEPro 100
>>2) Broadcom BCM5701 gigabit
>>
>>
>
> Maybe it is not related to your issue, but I have an IBM A-PRO with a
> Broadcom card. While using the tg3 driver, the server stopped responding a
> couple of times. After switching to the BCM57xx driver, it has been running fine.
That's curious. I am using the tg3 driver. I wasn't aware there was
another; I'll have to dig around in the kernel config.
I've also come across additional info: there is a Windows server on
this network with the same card. It has experienced similar issues.
If it is related to the network card, I'd be surprised; it's been in
there over four months. And I've been on this kernel for about three months.
I'll keep this in mind, thanks!
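On a 2.6 kernel, one way to confirm which driver (tg3 or another) is actually bound to each interface is the sysfs symlink /sys/class/net/<interface>/device/driver. A minimal sketch, assuming sysfs is mounted at /sys; the helper names are hypothetical and interface names like eth0/eth1 are examples, not taken from this server:

```python
import os


def bound_driver(interface, sysfs_root="/sys"):
    """Return the driver name (e.g. 'tg3') bound to a network interface.

    Reads the symlink <sysfs_root>/class/net/<interface>/device/driver,
    whose last path component is the driver name. Returns None for
    interfaces with no backing device (such as 'lo').
    """
    link = os.path.join(sysfs_root, "class", "net", interface,
                        "device", "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


def all_bound_drivers(sysfs_root="/sys"):
    """Map every interface under <sysfs_root>/class/net to its driver."""
    net_dir = os.path.join(sysfs_root, "class", "net")
    return {name: bound_driver(name, sysfs_root)
            for name in sorted(os.listdir(net_dir))}
```

On a box like this one, `all_bound_drivers()` would be expected to show the Broadcom interface against `tg3` until the driver is switched.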
* Re: [gentoo-server] Server goes down twice in two days, looking for input
2005-09-22 16:29 ` fire-eyes
@ 2005-09-22 16:53 ` Yogesh Sharma
2005-09-22 16:58 ` fire-eyes
0 siblings, 1 reply; 13+ messages in thread
From: Yogesh Sharma @ 2005-09-22 16:53 UTC
To: gentoo-server
fire-eyes wrote:
>That's curious. I am using the tg3 driver. I wasn't aware there was
>another, I'll have to dig around in the kernel config.
>
>I've also come across additional info, there is a windows server on
>this network, with the same card. It has experienced similar issues.
>
>If it is related to the network card I'd be surprised, it's been in
>there over four months. And I've been on this kernel for about 3 months.
>
>I'll keep this in mind, thanks!
>
>
>
If you have a 32-bit processor:
# ACCEPT_KEYWORDS="~x86" emerge bcm570x
If you have amd64:
# ACCEPT_KEYWORDS="~amd64" emerge bcm570x
Then, in either case, load the module at boot:
# echo bcm5700 >>/etc/modules.autoload.d/kernel-2.6
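The ~x86 versus ~amd64 choice above follows the machine architecture as reported by `uname -m`. A minimal sketch of that mapping using Python's platform.machine(); the helper is hypothetical and covers only the two cases named here, returning None for anything else rather than guessing:

```python
import platform

# Gentoo testing keywords for the two cases mentioned above.
# Only 32-bit x86 (i386..i686) and x86_64 are mapped; other
# architectures deliberately return None.
_KEYWORDS = {
    "i386": "~x86", "i486": "~x86", "i586": "~x86", "i686": "~x86",
    "x86_64": "~amd64",
}


def testing_keyword(machine=None):
    """Return the ~arch keyword for a `uname -m` string, or None."""
    if machine is None:
        machine = platform.machine()
    return _KEYWORDS.get(machine)
```

For the dual PIII described at the start of the thread, `uname -m` typically reports i686, so the ~x86 line would apply.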
* Re: [gentoo-server] Server goes down twice in two days, looking for input
2005-09-22 16:53 ` Yogesh Sharma
@ 2005-09-22 16:58 ` fire-eyes
0 siblings, 0 replies; 13+ messages in thread
From: fire-eyes @ 2005-09-22 16:58 UTC
To: gentoo-server
Yogesh Sharma wrote:
>>If it is related to the network card I'd be surprised, it's been in
>>there over four months. And I've been on this kernel for about 3 months.
>>
>>I'll keep this in mind, thanks!
>>
>>
>>
>
>
> If you have 32 bit processor
> # ACCEPT_KEYWORDS="~x86" emerge bcm570x
>
> if you have amd64
> # ACCEPT_KEYWORDS="~amd64" emerge bcm570x
>
> # echo bcm5700 >>/etc/modules.autoload.d/kernel-2.6
Okay, so it is not in the kernel, only external? It shows as masked here. I
can unmask it; I'm just curious.
* Re: [gentoo-server] Server goes down twice in two days, looking for input
2005-09-22 14:34 [gentoo-server] Server goes down twice in two days, looking for input fire-eyes
2005-09-22 15:46 ` Yogesh Sharma
2005-09-22 15:57 ` A. Khattri
@ 2005-09-22 17:01 ` Robert Larson
2005-09-22 17:57 ` fire-eyes
2005-09-23 16:14 ` Sven Vermeulen
3 siblings, 1 reply; 13+ messages in thread
From: Robert Larson @ 2005-09-22 17:01 UTC
To: gentoo-server
> Tuesday afternoon, I was informed that there might be problems with this
> server. I had just been working on it via shell. I went back, and found
> it unresponsive.
>
> I went into the server room, only to catch it ending a reboot and being
> almost totally back up. It behaved the rest of the day. I was not able
> to find any indications of problems in the logs.
>
This may be a long shot, but is it possible the BIOS battery is dying or is
already dead?
Also, another thing worth checking, and the main reason I'm responding: have you
considered the possibility of an intrusion? I have seen things go awry
before with the addition of a kernel rootkit, and it might be worth looking
into just in case. Probably not the case, but better safe than sorry.
This would help, though I couldn't seem to pull it up just now:
http://www.rookit.org
Of course, there are some portage pkgs:
app-forensics/rkhunter
app-forensics/chkrootkit
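A minimal sketch for running chkrootkit from a script (for example from cron) and surfacing only the suspicious lines. It assumes chkrootkit is installed and on the PATH, and that positive findings contain the word INFECTED, as chkrootkit's reports typically do; the helper names are hypothetical:

```python
import subprocess


def suspicious_lines(report):
    """Return report lines that flag a possible infection.

    Matching is case-sensitive, so chkrootkit's usual lowercase
    'not infected' lines are not reported.
    """
    return [line.strip() for line in report.splitlines()
            if "INFECTED" in line]


def run_chkrootkit():
    """Run chkrootkit (needs root) and return only its suspicious lines."""
    result = subprocess.run(["chkrootkit"], capture_output=True, text=True)
    return suspicious_lines(result.stdout)
```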
HTH,
Robert
* Re: [gentoo-server] Server goes down twice in two days, looking for input
2005-09-22 17:01 ` Robert Larson
@ 2005-09-22 17:57 ` fire-eyes
0 siblings, 0 replies; 13+ messages in thread
From: fire-eyes @ 2005-09-22 17:57 UTC
To: gentoo-server
Robert Larson wrote:
> This may be a long shot, but is it possible the BIOS battery is dying or is
> already dead?
Hadn't thought of that, I'll check it.
> Also, another thing worth checking and the main reason I respond is, have you
> considered the possibility of an intrusion? I have seen things go awry
> before with the addition of a kernel rootkit, and it might be worth looking
> into just in case. Probably not the case, but, better safe than sorry.
Yes, I considered that. I have seen no signs of such activity. Of
course, that doesn't mean it hasn't happened.
> Of course, there are some portage pkgs:
> app-forensics/rkhunter
> app-forensics/chkrootkit
I use those :) They are nice tools.
Thanks for the reply!
* Re: [gentoo-server] Server goes down twice in two days, looking for input
2005-09-22 14:34 [gentoo-server] Server goes down twice in two days, looking for input fire-eyes
` (2 preceding siblings ...)
2005-09-22 17:01 ` Robert Larson
@ 2005-09-23 16:14 ` Sven Vermeulen
2005-09-23 17:48 ` fire-eyes
3 siblings, 1 reply; 13+ messages in thread
From: Sven Vermeulen @ 2005-09-23 16:14 UTC
To: gentoo-server
On Thu, Sep 22, 2005 at 10:34:06AM -0400, fire-eyes wrote:
> I have had our gentoo server go down twice in under two days. I am
> currently trying to figure out what is happening.
What did you do before this behavior started? Any kernel upgrades? You might
want to consider downgrading and checking if that fixes the problem.
Try running some system monitoring processes in the future that monitor,
amongst other things, CPU usage and I/O. That might help locate the source
of the problems.
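A minimal sketch of such a monitor, assuming Linux /proc (load averages from /proc/loadavg and the procs_running/procs_blocked counters in /proc/stat, both present in 2.6 kernels). It appends one line per sample and calls fsync, so the last samples before a hard hang have the best chance of reaching disk; the log path, interval, and helper names are arbitrary, hypothetical choices:

```python
import os
import time


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg."""
    return tuple(float(value) for value in text.split()[:3])


def parse_proc_counts(text):
    """Return (procs_running, procs_blocked) from /proc/stat contents."""
    counts = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key in ("procs_running", "procs_blocked"):
            counts[key] = int(value)
    return counts.get("procs_running"), counts.get("procs_blocked")


def sample_once(proc_root="/proc"):
    """Read one sample and format it as a single log line."""
    with open(os.path.join(proc_root, "loadavg")) as f:
        load1, load5, load15 = parse_loadavg(f.read())
    with open(os.path.join(proc_root, "stat")) as f:
        running, blocked = parse_proc_counts(f.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return (f"{stamp} load={load1:.2f},{load5:.2f},{load15:.2f} "
            f"running={running} blocked={blocked}")


def monitor(log_path, interval=10.0, samples=None, proc_root="/proc"):
    """Append samples to log_path, fsyncing each line; None = run forever."""
    taken = 0
    with open(log_path, "a") as log:
        while samples is None or taken < samples:
            log.write(sample_once(proc_root) + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk immediately
            taken += 1
            if samples is None or taken < samples:
                time.sleep(interval)
```

A steadily rising `blocked` count (processes stuck in uninterruptible I/O wait) just before a hang would point toward the disk or controller side rather than the network.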
Wkr,
Sven Vermeulen
--
Gentoo Foundation Trustee | http://foundation.gentoo.org
Gentoo Documentation Project Lead | http://www.gentoo.org/proj/en/gdp
Gentoo Council Member
The Gentoo Project <<< http://www.gentoo.org >>>
* Re: [gentoo-server] Server goes down twice in two days, looking for input
2005-09-23 16:14 ` Sven Vermeulen
@ 2005-09-23 17:48 ` fire-eyes
0 siblings, 0 replies; 13+ messages in thread
From: fire-eyes @ 2005-09-23 17:48 UTC
To: gentoo-server
Sven Vermeulen wrote:
> On Thu, Sep 22, 2005 at 10:34:06AM -0400, fire-eyes wrote:
>
>>I have had our gentoo server go down twice in under two days. I am
>>currently trying to figure out what is happening.
>
>
> What did you do before this behavior started? Any kernel upgrades? You might
> want to consider downgrading and checking if that fixes the problem.
>
> Try running some system monitoring processes in the future that monitor,
> amongst other things, CPU usage and I/O. That might help locate the source
> of the problems.
There was a kernel upgrade, but that was over three months ago. I have
tried to replicate the situation (it went down both times when I was
creating a tar.bz2), without luck.
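Since both crashes coincided with building a tar.bz2 of a kernel source tree, a reproduction attempt can be scripted to repeat that workload, ideally while a monitor is logging. A minimal sketch using Python's tarfile module in "w:bz2" mode as a stand-in for the original `tar` run (not identical to it); the source path, work directory, iteration count, and helper names are placeholders, not values from this thread:

```python
import os
import tarfile
import time


def make_tarball(source_dir, dest_path):
    """Create a bzip2-compressed tarball of source_dir; return its size."""
    name = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(dest_path, "w:bz2") as archive:
        archive.add(source_dir, arcname=name)
    return os.path.getsize(dest_path)


def stress(source_dir, workdir, iterations=10):
    """Repeat the archive job, timing each run; return the timings."""
    timings = []
    for i in range(iterations):
        dest = os.path.join(workdir, f"stress-{i}.tar.bz2")
        start = time.monotonic()
        make_tarball(source_dir, dest)
        timings.append(time.monotonic() - start)
        os.remove(dest)  # keep disk usage bounded between runs
    return timings
```

Sharply growing per-run times before a hang would be worth correlating with the monitor's log.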
I'm thinking of installing snmpd and then perhaps Cacti or similar on
another system to poll and keep histories of various data. That's a
good idea.
Thread overview: 13+ messages
2005-09-22 14:34 [gentoo-server] Server goes down twice in two days, looking for input fire-eyes
2005-09-22 15:46 ` Yogesh Sharma
2005-09-22 15:54 ` fire-eyes
2005-09-22 16:25 ` Yogesh Sharma
2005-09-22 16:29 ` fire-eyes
2005-09-22 16:53 ` Yogesh Sharma
2005-09-22 16:58 ` fire-eyes
2005-09-22 15:57 ` A. Khattri
2005-09-22 16:07 ` fire-eyes
2005-09-22 17:01 ` Robert Larson
2005-09-22 17:57 ` fire-eyes
2005-09-23 16:14 ` Sven Vermeulen
2005-09-23 17:48 ` fire-eyes