public inbox for gentoo-server@lists.gentoo.org
* [gentoo-server] Server goes down twice in two days, looking for input
From: fire-eyes @ 2005-09-22 14:34 UTC
  To: gentoo-server

Hi there!

I have had our gentoo server go down twice in under two days. I am
currently trying to figure out what is happening.

Facts:
- Dual PIII 933 MHz system (ServerWorks OSB4)
- 3.5GB RAM
- 2.6.11.2-grsec-20050614 kernel (self rolled)
- SCSI: Adaptec AIC-7892P, 32MB cache
+ Disks
 + For Operating System
  - 2x IBM DDYS-T09170N SCSI U160 10KRPM 9.1GB in a RAID1, 1x of the
 same for hotspare
 + For storage etc
  - 3x IBM IC35L036UWD210-0 SCSI U160 10KRPM
  - 1x IBM DDYS-T36950N SCSI U160 10KRPM
  - In a RAID5

Tuesday afternoon, I was informed that there might be problems with this
server. I had just been working on it via shell. I went back, and found
it unresponsive.

I went into the server room, only to catch it ending a reboot and being
almost totally back up. It behaved the rest of the day. I was not able
to find any indications of problems in the logs.

Wednesday evening, I was again working on the system via ssh, and it
stopped responding. I got into the server room fast enough this time. I
tried to log in as root, and could not. I could type the username, but
upon hitting enter, nothing happened. That was true for any console.

I have syslogd output *.* to console 10, so I flipped over there and saw
nothing out of the ordinary. The last log entry, at the time I noticed it
stop responding, was a simple run-of-the-mill firewall message.

After a few more minutes, the system was completely unresponsive, save
for SysRq. I Synced, tErmed, Synced again, remounted everything
read-only and forced it to reboot.
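
For reference, the same key sequence can also be sent without the
keyboard, through the kernel's SysRq trigger file. This is only a minimal
sketch: it assumes a root shell is still reachable (which was not the
case here) and that magic SysRq is enabled in the kernel. The function
name and the SYSRQ_TRIGGER and SYSRQ_DRY_RUN variables are made up for
this example, and it defaults to a dry run that only prints the keys.

```shell
#!/bin/sh
# Sketch only: replay Sync, tErm, Sync, remount read-only (U), reBoot
# through the SysRq trigger file instead of Alt+SysRq+<key>.
# SYSRQ_TRIGGER and SYSRQ_DRY_RUN are hypothetical knobs for this example.
sysrq_sequence() {
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
    for key in s e s u b; do
        if [ "${SYSRQ_DRY_RUN:-1}" = "1" ]; then
            # Default: just show what would be sent, touch nothing.
            echo "dry-run: $key"
        else
            # Real run (needs root): hand one key to the kernel, then
            # pause briefly so the action can complete before the next.
            echo "$key" > "$trigger"
            sleep 2
        fi
    done
}

# Dry run (safe):        sysrq_sequence
# Real run (reboots!):   SYSRQ_DRY_RUN=0 sysrq_sequence
```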

Again I was not able to find any logs indicating any errors at all.

I see only two possible causes. The first is that I was goofing with
samba at various points on both days. However, samba was not running
either time the system went down.

The other, more interesting one, is that both times the system went
down, I was creating a tar.bz2 out of a kernel source. The problems
happened well after I had started the tar runs.

Wondering about disks, I threw smartctl -a at both of the arrays (sda,
sdb), which didn't show anything out of the ordinary.

However, when I run smartctl -t offline, -t short, or -t long on sda or
sdb, it immediately reports a failure on STDOUT. I find this odd, because
I have run these tests in the past. Granted, that was on a different
kernel, which I no longer have around.

Here is an example:

# smartctl -t short /dev/sda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Short Background Self Test Failed

I don't see anything strange in the logs, including dmesg.

I am worried by the smartctl results; however, I realize there is a small
possibility that it's due to kernel changes.
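
One way to narrow this down might be to look at smartctl's exit status
right after the failing command, since the smartctl man page documents it
as a bitmask. The sketch below decodes that bitmask. The function name is
made up for this example, the bit meanings are paraphrased from the man
page, and I have not checked that version 5.33 sets all of them, so treat
it as an assumption rather than a definitive tool.

```shell
#!/bin/sh
# Sketch only: print which bits are set in a smartctl exit status.
# decode_smartctl_status is a hypothetical helper; the bit meanings are
# paraphrased from the smartctl man page ("RETURN VALUES").
decode_smartctl_status() {
    status="$1"
    bit=0
    for meaning in \
        "command line did not parse" \
        "device open failed" \
        "a SMART command to the disk failed" \
        "SMART status reports DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test one bit at a time, lowest bit first.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $meaning"
        fi
        bit=$((bit + 1))
    done
}

# Possible usage on the affected box (as root):
#   smartctl -t short /dev/sda; decode_smartctl_status $?
#   smartctl -l selftest /dev/sda    # self-test log
#   smartctl -l error /dev/sda       # error log
```

If only bit 2 comes back set, that would point at the self-test command
itself being rejected rather than at a problem the disk has recorded.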

Any ideas out there? Thank you for reading this! I *LOVE* Gentoo in
production.
-- 
gentoo-server@gentoo.org mailing list


