From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from lists.gentoo.org ([140.105.134.102] helo=robin.gentoo.org) by nuthatch.gentoo.org with esmtp (Exim 4.43) id 1Duv9M-0000iI-5A for garchives@archives.gentoo.org; Tue, 19 Jul 2005 16:40:56 +0000
Received: from robin.gentoo.org (localhost [127.0.0.1]) by robin.gentoo.org (8.13.4/8.13.4) with SMTP id j6JGdHmj010180; Tue, 19 Jul 2005 16:39:17 GMT
Received: from smtp.gentoo.org (smtp.gentoo.org [134.68.220.30]) by robin.gentoo.org (8.13.4/8.13.4) with ESMTP id j6JGbeOW006135 for ; Tue, 19 Jul 2005 16:37:40 GMT
Received: from 179.mupb.nyrk.nycenycp.dsl.att.net ([12.98.143.179] helo=hercules.magbank.com) by smtp.gentoo.org with esmtp (Exim 4.43) id 1Duv79-0004D4-Aj for gentoo-dev@lists.gentoo.org; Tue, 19 Jul 2005 16:38:39 +0000
Content-class: urn:content-classes:message
Subject: [gentoo-dev] init script guidelines
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-dev@gentoo.org
Reply-to: gentoo-dev@lists.gentoo.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C58C80.D2B0E2FC"
Date: Tue, 19 Jul 2005 12:42:16 -0400
Message-ID: <16CC9569DA3E4D41A1D4BC25D7B5A16A473A7C@hercules.magbank.com>
X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: init script guidelines
Thread-Index: AcWMgNJVv8vCasn7QY6UK2n8OqwfiQ==
From: "Eric Brown"
To:
X-Archives-Salt: 403f728a-f634-4f7f-9210-c37a1404c4bd
X-Archives-Hash: 56b6320aa1609a4edf8bd89be3e9369f

This is a multi-part message in MIME format.

------_=_NextPart_001_01C58C80.D2B0E2FC
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Services that use Gentoo init scripts often report a status of
[started] or [OK] even though they have failed to start. The most recent
bug like this that I've found is with snort.
If you have a bad rule, snort will initialize, the rc-scripts will give
it an [OK] status, and then it will die once it parses the rules.

The real problem is not that the daemons don't return errors, but that
our init scripts do not make reasonable attempts to verify service
startup. If a Gentoo init script claims that a service started, it
should make an effort to check that the processes are actually running
shortly after the script is run, even if start-stop-daemon says the
parent process initialized. Relying on the return value of
start-stop-daemon is simply insufficient for some services.

I am aware that there are services that can monitor the status of other
services (app-admin/mon?), but I think this issue is a little different.
If an ebuild developer is aware that an error condition can commonly
occur shortly after a daemon initializes, why not attempt to catch those
errors? Most of them could probably be caught by simply checking whether
the process is still running shortly after the script is run.

I propose increasing developer awareness of this problem, perhaps
through some formal guidelines for ebuild developers. At the very least,
I would like to see these bugs acknowledged on bugs.gentoo.org instead
of getting the same old "upstream, it's not our fault" response. We are
responsible for our init scripts, and they are important to our users.

I have two ideas for the actual implementation:

1) Some kind of check() function in the init.d script, or a generic
check() function that just checks with ps | grep. This would typically
be called after having the init script sleep for a certain amount of
time.

2) Some kind of special init script that checks registered daemons after
all services have started (i.e. it depends on all daemons, or they are
listed in its config file).
With this scheme we could avoid excessive sleeping during startup (to
keep it fast), and perhaps even keep using service-specific check()
functions.

Does anyone else think this idea is worth looking into?
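
Idea 1 could be sketched roughly as follows. This is only an
illustration, not existing Gentoo code: check_started and its two
arguments (a PID file path and a delay in seconds) are hypothetical, and
the sketch assumes the daemon writes a PID file. It also swaps the
ps | grep check for kill -0 on the recorded PID, which avoids matching
unrelated processes by name.

```shell
#!/bin/sh
# Hypothetical helper, not part of Gentoo's rc-scripts: succeed only if the
# process recorded in a PID file is still alive after a short delay.
# Arguments (assumptions for this sketch):
#   $1  path to the daemon's PID file
#   $2  seconds to wait before checking (defaults to 2)
check_started() {
    pidfile=$1
    delay=${2:-2}

    # Give the daemon time to parse its configuration and possibly die.
    sleep "$delay"

    # A missing or empty PID file means the daemon never got that far.
    [ -s "$pidfile" ] || return 1
    pid=$(cat "$pidfile") || return 1

    # kill -0 sends no signal; it only tests whether the process exists
    # and whether we would be allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}
```

A start() function could call this right after start-stop-daemon returns
and pass the exit status to eend, so that a daemon which dies while
parsing its rules is reported as failed. The delay is the cost that
idea 2 tries to avoid by deferring the checks until all services are up.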
------_=_NextPart_001_01C58C80.D2B0E2FC--
--
gentoo-dev@gentoo.org mailing list