From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from pigeon.gentoo.org ([69.77.167.62] helo=lists.gentoo.org)
	by finch.gentoo.org with esmtp (Exim 4.60)
	(envelope-from <gentoo-user+bounces-78180-garchives=archives.gentoo.org@lists.gentoo.org>)
	id 1JjIXi-0002L4-E3
	for garchives@archives.gentoo.org; Tue, 08 Apr 2008 18:27:38 +0000
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id 6C4EAE039F;
	Tue,  8 Apr 2008 18:27:35 +0000 (UTC)
Received: from duke.localdomain (p78-102.acedsl.com [66.114.78.102])
	by pigeon.gentoo.org (Postfix) with ESMTP id 204EFE039F
	for <gentoo-user@lists.gentoo.org>; Tue,  8 Apr 2008 18:27:35 +0000 (UTC)
Received: from [127.0.0.1] (duke.wrkhors.com [127.0.0.1])
	by duke.localdomain (Postfix) with ESMTP id AA26228D675
	for <gentoo-user@lists.gentoo.org>; Tue,  8 Apr 2008 14:19:46 -0400 (EDT)
Message-ID: <47FBB742.2050806@wrkhors.com>
Date: Tue, 08 Apr 2008 14:19:46 -0400
From: Steven Lembark <lembark@wrkhors.com>
Organization: Workhorse Computing
User-Agent: Thunderbird 2.0.0.9 (X11/20071212)
Precedence: bulk
List-Post: <mailto:gentoo-user@lists.gentoo.org>
List-Help: <mailto:gentoo-user+help@lists.gentoo.org>
List-Unsubscribe: <mailto:gentoo-user+unsubscribe@lists.gentoo.org>
List-Subscribe: <mailto:gentoo-user+subscribe@lists.gentoo.org>
List-Id: Gentoo Linux mail <gentoo-user.gentoo.org>
X-BeenThere: gentoo-user@lists.gentoo.org
Reply-to: gentoo-user@lists.gentoo.org
MIME-Version: 1.0
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Re: Emergency shutdown, how to?
References: <47EC9F50.5070503@bellsouth.net>	 <200804021549.15550.dirk.heinrichs.ext@nsn.com>	 <3297039.CL3N7pcUCC@schmarck.cn>	 <200804021628.29507.dirk.heinrichs.ext@nsn.com>	 <28748152.K45aiMzFyV@michael-schmarck.my-fqdn.de>	 <20080402184829.1c6d2b9c@zaphod.digimed.co.uk>	 <5bdc1c8b0804041405u6f3fdef1r802963828f3bf8c5@mail.gmail.com>	 <47F7AC03.6010101@wrkhors.com> <1207530457.15340.11.camel@orpheus>	 <47FA59DA.9090404@wrkhors.com> <1207610982.15340.50.camel@orpheus>
In-Reply-To: <1207610982.15340.50.camel@orpheus>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Archives-Salt: 94cc1b71-b226-457c-ad2e-dbe504ef0fcf
X-Archives-Hash: f4e63b67daf7638b0544f864e8f56008


> I agree that your script is nice and simple, and hence less prone to
> errors.  I coded mine in c++ because I use it not only for a machine
> type watchdog, but also a task based watchdog that reboots the machine
> based on certain tasks living or not.  Each task has to register with
> the watchdog server and continually tell the server they're alive, or
> reboot!  But that's a story for another thread...

    #!/path/to/perl

    use strict;

    use Sys::Syslog;

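    use IO::Handle;

    open my $fh, '>', '/dev/watchdog'
    or die "/dev/watchdog: $!";

    # unbuffered, so each poke reaches the device
    # right away instead of sitting in perl's buffer.

    $fh->autoflush( 1 );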

    # if any of these go away we need to notice it.
    # ok... you'll notice the first one anyway.

    my @watchz
    = qw
    (
        init
        ntpd
        apache
        /opt/sybase/ASE-12_5/bin/dataserver
    );

    # wd timeout / 2, or 1 for minimum sleep
    # (avoid usleep: too much overhead).

    my $cycle   = 15;

    # get the syslog handle

    openlog blah blah blah
    or die 'Et tu, syslog?';

    CYCLE:
    for(;;)
    {
        sleep ( $cycle - ( time % $cycle ) );

        # split and args vary by O/S, this works on linux.

        my @procz   = map { ( split /\s+/, $_, 6 )[5] } qx( ps a );

        my %chechz  = ();

        @chechz{ @watchz }  = ();

        delete @chechz{ @procz };

        if( %chechz )
        {
            # oops, current proc's don't include the
            # list of processes being watched.
            #
            # this can happen twice in a w/d interval
            # before the system goes down.

            my $nastygram
            = join "\t", "Missing proc's:", keys %chechz;

            syslog LOG_CRIT | LOG_FOO, '%s', $nastygram;

            next CYCLE

            # alternative here is to close $fh here and
            # bounce the system immediately, the
            # approach of looping allows an
            # intentional restart of the service
            # (in less than 1 w/d cycle) w/o bouncing the box.
        }

        # if the proc check got this far then the w/d
        # file gets poked and we live for another loop.

        print $fh "\n";
    }

    # this isn't a module

    0

    __END__
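
The two bits of arithmetic in the loop (sleeping up to the next
multiple of the cycle, and the hash-slice trick for finding watched
names that are not running) can be tried on their own without
touching /dev/watchdog. This is only a sketch: the sub names
seconds_to_next_tick and missing_procs and the sample lists are made
up for illustration, and the process list is hard-coded rather than
parsed from ps output.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Seconds to sleep so that the next wake-up lands on a
# multiple of $cycle. The result is between 1 and $cycle,
# never 0, so the loop cannot spin.

sub seconds_to_next_tick
{
    my ( $now, $cycle ) = @_;

    return $cycle - ( $now % $cycle );
}

# Names in @$watch that do not show up in @$running.

sub missing_procs
{
    my ( $watch, $running ) = @_;

    my %missing = ();

    # every watched name becomes a key...

    @missing{ @$watch } = ();

    # ...and the ones that are running get dropped.

    delete @missing{ @$running };

    return sort keys %missing;
}

# hypothetical sample data, not real ps output.

my @watch   = qw( init ntpd apache );
my @running = qw( init apache bash );

print join( "\t", missing_procs( \@watch, \@running ) ), "\n";

print seconds_to_next_tick( 100, 15 ), "\n";
```

With the sample lists above the first print shows ntpd, since it is
the only watched name absent from the running list, and the second
shows 5, since 100 is 10 seconds past a multiple of 15. Anything left
in the hash after the delete is a process that went away.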

-- 
Steven Lembark                                            85-09 90th St.
Workhorse Computing                                 Woodhaven, NY, 11421
lembark@wrkhors.com                                      +1 888 359 3508