From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
        by finch.gentoo.org (Postfix) with ESMTP id 11C9213877A
        for ; Mon, 4 Aug 2014 20:40:04 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
        by pigeon.gentoo.org (Postfix) with SMTP id 7C697E0933;
        Mon, 4 Aug 2014 20:39:54 +0000 (UTC)
Received: from mail-we0-f169.google.com (mail-we0-f169.google.com [74.125.82.169])
        (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
        (No client certificate requested)
        by pigeon.gentoo.org (Postfix) with ESMTPS id 5291AE0882
        for ; Mon, 4 Aug 2014 20:39:53 +0000 (UTC)
Received: by mail-we0-f169.google.com with SMTP id u56so8223932wes.28
        for ; Mon, 04 Aug 2014 13:39:51 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20120113;
        h=message-id:date:from:user-agent:mime-version:to:subject:references
         :in-reply-to:content-type:content-transfer-encoding;
        bh=C8gn94DShkxtVSmntBf9IghOG++20wuhP/bhRRvDQGw=;
        b=xtREJTXPU4PQ5BGabQ30450m3QSmGUPurLzjsQcI4SdMdzPk1AI9bpVXd8t8aRbWpR
         7Hi7eYqiC0i2Sdf1hXeS3tLfZPLxA8fuseamVDFBt9pcmDDMEfTS9sgrQJYqhQTMw/rB
         qkgaZRxMb2t9eZ7uRp6lwmo+kT2Iq5URTpXfa0nDjoROsB/JHaYV4ykszleojQ0OOP0p
         RV5pAVXHJkAO48eoUk3kkp00+0tiU2IgWEAtX44/0bYfOabNgS8huZG3NIjvp1x1ACHq
         pAV1grlCI/JVEqJa6jTWfb7wCbidN/lCbPZMLVLyx5BsNsvhZCgH+p/fweFFaHQfMYin
         eElQ==
X-Received: by 10.180.104.163 with SMTP id gf3mr32879549wib.24.1407184791116;
        Mon, 04 Aug 2014 13:39:51 -0700 (PDT)
Received: from [172.20.0.41] (105-237-190-183.access.mtnbusiness.co.za.
        [105.237.190.183]) by mx.google.com with ESMTPSA id ch5sm45983003wjb.18.2014.08.04.13.39.49
        for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Mon, 04 Aug 2014 13:39:50 -0700 (PDT)
Message-ID: <53DFEF61.9060604@gmail.com>
Date: Mon, 04 Aug 2014 22:38:57 +0200
From: Alan McKinnon
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-user@lists.gentoo.org
Reply-to: gentoo-user@lists.gentoo.org
MIME-Version: 1.0
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Re: Recommendations for scheduler
References: <53DBCF34.6060601@gmail.com> <10589ff2-a642-4951-955d-339d475ccaad@email.android.com> <4871526.Mj2HT7lMQH@andromeda> <53DF8C2D.5070402@gmail.com> <506301c4-0106-4ee6-b532-12d08b7a1ce1@email.android.com>
In-Reply-To: <506301c4-0106-4ee6-b532-12d08b7a1ce1@email.android.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Archives-Salt: 70816b34-17d5-436c-9ed3-52aa4a189137
X-Archives-Hash: 7981f2ca6e2b71f4038e20004703ee69

On 04/08/2014 21:46, J. Roeleveld wrote:
> On 4 August 2014 15:35:41 CEST, Alan McKinnon wrote:
>> On 04/08/2014 15:31, Martin Vaeth wrote:
>>> J. Roeleveld wrote:
>>>>>
>>>>> So you have a command which might break due to hardware error
>>>>> and cannot be rerun. I cannot see how any general-purpose scheduler
>>>>> might help you here: You either need to be able to split your command
>>>>> into several (sequential) commands or you need something adapted
>>>>> for your particular command.
>>>>
>>>> A general-purpose scheduler can work, as they do exist.
>>>
>>> I doubt that they can solve your problem.
>>> Let me repeat: You have a single program which accesses the database
>>> in a complex way and somewhere in the course of accessing it, the
>>> machine (or program) crashes.
>>> No general-purpose program can recover from this: You need
>>> particular knowledge of the database and the program if you even
>>> want to have a *chance* to recover from such a situation.
>>> A program with such a particular knowledge can hardly be called
>>> "general-purpose".
>>
>>
>> Joost,
>>
>> Either make the ETL tool pick up where it stopped and continue as it is
>> the only one that knows what it was doing and how far it got. Or, wrap the
>> entire script in a single transaction.
>
> Alan,
>
> That would be the ideal solution.

You have the same concerns I do - how do you make a transaction around
500 million rows? So I asked the in-house expert - Mrs Alan :-)

> However, a single transaction dealing with around 500,000,000 rows will get me shot by the DBAs :)
> (Never mind that the performance of this will be such that having it all done by an office full of secretaries might be quicker.)

She reckons an ETL job *must* be self-contained; if it isn't then it's
broken by design. It must be idempotent too, which can be as simple as
"Truncate, Load, Commit".

> Having the ETL process clever enough to be able to pick up from any point requires a degree of forward thinking and planning that is never done in real life.
> I would love to design it like that as it isn't too difficult. But I always get brought into these projects when implementing these structures will require a full rewrite and getting the original architects to admit their design can't be made restartable without human intervention.

I agree with that design actually - it's the job of the hardware and OS
guys to make stuff reliable that the application layer can rely on. When
a SAN connection goes away, it usually comes back and the app layer just
carries on (never mind that it retried 100 times meanwhile).

Sometimes this doesn't work out. The easiest, cheapest and quickest way
to handle it is to just restart the whole job from the beginning.
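
For illustration, here is a minimal sketch of the "Truncate, Load, Commit" idea combined with restarting the whole job from the beginning. It is only an assumption of how such a job could look, not anyone's actual ETL code: the table name `target`, the functions `load_job` and `run_with_restart`, and the choice of Python's sqlite3 module are all made up for the example. SQLite has no TRUNCATE, so a plain DELETE stands in for it, and one big transaction like this is only sensible for small loads, not for 500 million rows.

```python
import sqlite3


def load_job(conn, rows):
    """Hypothetical idempotent load: empty the table, load, commit."""
    # Using the connection as a context manager commits on success and
    # rolls back if anything inside raises, so a failed run leaves the
    # previous contents of the table untouched.
    with conn:
        conn.execute("DELETE FROM target")  # stand-in for TRUNCATE
        conn.executemany(
            "INSERT INTO target (id, value) VALUES (?, ?)", rows
        )


def run_with_restart(conn, rows, attempts=3):
    """Restart the whole job from the beginning when it fails."""
    for attempt in range(1, attempts + 1):
        try:
            load_job(conn, rows)
            return attempt  # number of attempts that were needed
        except sqlite3.Error:
            if attempt == attempts:
                raise  # give up and let a human look at it


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")

rows = [(1, "a"), (2, "b"), (3, "c")]
first_attempts = run_with_restart(conn, rows)
second_attempts = run_with_restart(conn, rows)  # rerun, same end state

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)
```

Because every run empties the table inside the same transaction as the load, a run that dies halfway leaves the old contents in place, and a rerun always ends in the same state. The restart logic lives in the job itself; the scheduler only has to start it.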
This offends the engineer in us sometimes, but it really is the best way
and all of Unix is built on this very idea :-)

If the SAN goes away too often and it causes issues, then maybe the best
approach is to get the SAN and facilities guys to get their act together.

> At which point the business simply says it is acceptable to have people do a manual rollback and restart the schedules from wherever it went wrong.

Exactly. One of the few cases where business has the correct idea.
There's only so many pennies to spend and so many dollars to be
delivered.

>
> I'm sure your wife has similar experiences as this is why these projects are always late to deliver and over budget.

She says her projects are subject to the same universal inviolate rule
as mine: time and cost is always best engineering estimate times pi.

We learn to deal with it.

Which brings us back to Martin's initial statement: a scheduler cannot
deal with any of this, the job itself must. It's an unpredictable event
and schedulers can only deal with predictable events.

-- 
Alan McKinnon
alan.mckinnon@gmail.com