public inbox for gentoo-user@lists.gentoo.org
From: "J. Roeleveld" <joost@antarean.org>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Re: Recommendations for scheduler
Date: Mon, 04 Aug 2014 12:40:12 +0200
Message-ID: <4871526.Mj2HT7lMQH@andromeda>
In-Reply-To: <slrnltun2t.8io.martin@bois.imp.fu-berlin.de>

On Monday, August 04, 2014 10:11:41 AM Martin Vaeth wrote:
> J. Roeleveld <joost@antarean.org> wrote:
> > These schedules then also can't be restarted from the beginning
> > when they stop halfway through without risking massive consistency
> > problems in the final data.
> 
> So you have a command which might break due to hardware error
> and cannot be rerun. I cannot see how any general-purpose scheduler
> might help you here: You either need to be able to split your command
> into several (sequential) commands or you need something adapted
> for your particular command.

A general-purpose scheduler can work; such schedulers do exist, but they come
with a price tag. In the OSS world there is none, to my knowledge.
Yours seems to be the most promising, as the missing features look like they
shouldn't be too difficult to add.

The commands are relatively simple, but they deal with large amounts of data.
I am talking about ETL processes that, due to the amount of data being
processed, can easily take several hours per step.
If the database or the ETL process crashes during one of these steps, the
work done by that step needs to be rolled back to a point from which it can
safely be restarted.
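
To illustrate what I mean by rolling back to a restartable point, here is a
rough sketch in Python. It is not taken from any existing scheduler; the names
(run_pipeline, the checkpoint file, the rollback callbacks) are made up for
illustration, and it assumes each step can supply its own rollback routine.
Completed steps are recorded in a checkpoint file, and a step that was
interrupted is rolled back before it is run again:

```python
import json
import os
import tempfile

def load_state(path):
    """Return the saved checkpoint, or a fresh one if none exists."""
    if not os.path.exists(path):
        return {"done": [], "in_progress": None}
    with open(path) as f:
        return json.load(f)

def save_state(path, state):
    """Write the checkpoint atomically: temp file, fsync, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)

def run_pipeline(steps, path):
    """steps: list of (name, run, rollback) tuples, executed in order."""
    state = load_state(path)
    rollbacks = {name: rollback for name, _, rollback in steps}
    # A step still marked in_progress was interrupted: undo its partial work.
    if state["in_progress"] is not None:
        rollbacks[state["in_progress"]]()
        state["in_progress"] = None
        save_state(path, state)
    for name, run, _ in steps:
        if name in state["done"]:
            continue  # finished in an earlier run, do not repeat it
        state["in_progress"] = name
        save_state(path, state)
        run()
        state["done"].append(name)
        state["in_progress"] = None
        save_state(path, state)
```

Calling run_pipeline again with the same checkpoint file after a crash skips
the finished steps and restarts at the interrupted one. The hard part in
practice is the rollback routine itself, which this sketch simply assumes.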

I am not talking about simple schedules related to day-to-day maintenance of a 
few servers.

> > And then multiple of those starting at random times with
> > occasionally a whole bunch of the same schedule put into the
> > queue with dependencies to the previous run.
> 
> That's not a problem. Only if the granularity of one command is
> not fine enough, it becomes a problem.

If nothing goes wrong, it can all be stuck into a single script and the end
result will be the same. The problems start because the real world is not 100%
reliable.

> > If, during that time, one of the machines has a hardware failure
> > or the scheduling process crashes on one or more of the servers,
> > the last state needs to be recoverable.
> 
> One must distinguish two cases:
> 
> 1. The machine running "schedule-server" has a hardware failure.
>    (Let us assume that "schedule-server" does not have a software failure -
>    otherwise, you have problems anyway.)
> 2. Some other machine has a hardware failure.
> 
> Case 2. is not bad (as concerns the scheduling): Of course, the
> machine will not report that it completed the job, and you will
> have to think how to complete the job. But it is clear that in
> such exceptional cases you have to interfere manually in some sense.

Agreed, and this happens more often than you might think.

> In order to deal with case 1., you can regularly (e.g. each minute)
> dump the output of "schedule list" (possibly suppressing non-important
> data through the options to keep it short).

Or all the necessary information is kept in sync on persistent storage. This
would also allow easy fail-over if the master schedule node fails: a second
machine could quickly take over.
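
As a rough sketch of that idea (Python again, with SQLite standing in for
whatever shared persistent storage would really be used; the table layout and
function names are my own assumptions, not anything "schedule" provides):
every state change is committed before the scheduler acts on it, so a second
machine opening the same store sees which jobs were running when the first
one died.

```python
import sqlite3

def open_store(path):
    """Open (or create) the job-state store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " name TEXT PRIMARY KEY,"
        " status TEXT NOT NULL)"  # one of: queued, running, done
    )
    conn.commit()
    return conn

def set_status(conn, name, status):
    """Record a state change; the transaction commits before we return."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO jobs (name, status) VALUES (?, ?)",
            (name, status),
        )

def recover(conn):
    """What a standby node would read after taking over."""
    rows = conn.execute(
        "SELECT name, status FROM jobs ORDER BY name"
    ).fetchall()
    interrupted = [name for name, status in rows if status == "running"]
    pending = [name for name, status in rows if status == "queued"]
    return interrupted, pending
```

The interrupted jobs are the ones that need a decision (roll back, rerun, or
manual intervention); the pending ones can simply be queued again.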

> One could add a logging option to decrease the possible race of 1 minute,
> but in case of hardware failure a possible race cannot be excluded anyway.
> 
> In case 1. you manually have to re-queue the jobs and think what to do
> with the already started jobs. However, I cannot imagine that this
> occurs so frequently that this exceptional case becomes something
> one should seriously think about.

As I mentioned above, with BI infrastructure (large databases, complex ETL
processes, interactive report services, ...), the scheduler is busy 24/7. The
number of tasks, schedules, dependencies, states, ... that need to be kept
track of can easily lead to unforeseen issues and bugs.

