* [gentoo-user] Recommendations for scheduler
From: Alan McKinnon @ 2014-08-01 17:32 UTC
To: gentoo-user

Hi,

Up-front disclaimer: Mostly [OT] post. But at least I'll test drive it on Gentoo before putting it in production :-)

New job, new environment. Existing persons suffer from 5-year-old-with-a-hammer syndrome and assume cron is the solution to all ills. Result: a towering edifice of cron jobs that may or may not clobber each other's work, may or may not work at all, and implement no error handling at all. But my god, can they spew out mail from STDOUT.

But cron has only one event trigger: wall-clock time. And it's a very blunt weapon. I'm looking for recommendations of alternative schedulers that satisfy real-world business needs that need some other event trigger than what the time is right now.

For those familiar with it, I'm looking for something with the useful feature set, without the useless features and without the price tag of ControlM.

Anyone care to share experiences?

--
Alan McKinnon
alan.mckinnon@gmail.com
* Re: [gentoo-user] Recommendations for scheduler
From: Сергей @ 2014-08-01 17:49 UTC
To: gentoo-user

For example, in a crontab */3 means "every three hours/minutes/etc".

2014-08-01 21:32 GMT+04:00 Alan McKinnon <alan.mckinnon@gmail.com>:
> But cron has only one event trigger: wall-clock time. And it's a very
> blunt weapon.
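To make the step syntax concrete: `/N` selects every N-th value of a field's allowed range, starting from the first value of that range (or of the `a-b` range it is attached to). A minimal Python sketch; the helper name `expand_cron_field` is hypothetical and it handles only a subset of crontab syntax (no month/day names or vendor extensions):

```python
from typing import List


def expand_cron_field(field: str, lo: int, hi: int) -> List[int]:
    """Return the values matched by one crontab field, e.g. '*/3'.

    Supports '*', plain numbers, 'a-b' ranges, '/step' on '*' or on a
    range, and comma-separated lists. Names (jan, mon) and other
    implementation-specific extensions are not handled.
    """
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"bad step in {part!r}")
        if base == "*":
            start, end = lo, hi  # whole allowed range of the field
        elif "-" in base:
            first, last = base.split("-", 1)
            start, end = int(first), int(last)
        elif step_text:
            raise ValueError(f"step on a single value is not supported: {part!r}")
        else:
            start = end = int(base)
        if not lo <= start <= end <= hi:
            raise ValueError(f"{part!r} is outside {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return sorted(values)
```

So `*/3` in the hour field (0-23) matches 0, 3, ..., 21, while in the day-of-month field (1-31) it matches 1, 4, ..., 31: at a month boundary the gap is not always three days, which is one more way wall-clock rules get blunt.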
* Re: [gentoo-user] Recommendations for scheduler
From: Сергей @ 2014-08-01 17:50 UTC
To: gentoo-user

You can also have a look at anacron.
* Re: [gentoo-user] Recommendations for scheduler
From: Alan McKinnon @ 2014-08-01 19:10 UTC
To: gentoo-user

On 01/08/2014 19:50, Сергей wrote:
> You can also have a look at anacron.

Unfortunately, anacron doesn't suit my needs at all. Here's how anacron works: this bunch of jobs will all happen today, regardless of what time it is. That's not what I need; I need something that has very little to do with time. Example:

1. Start backup job on db server A
2. When complete, copy backup to server B and do a test import
3. If import succeeds, move backup to permanent storage and log the fact
4. If import fails, raise an alert and trigger the whole cycle to start again at 1

Meanwhile,

1. All servers are regularly doing apt-get update, downloading .debs and applying security packages. Delay this on the db server if a backup is in progress.

Meanwhile there is the regular Friday 5am code-publish cycle and month-end finance runs - this is a DevOps environment.

Yes, I know I can hack something together with bash scripts and cron with a truly insane number of flag files. But this doesn't work for sane definitions of work involving other people. I can't expect my support crew to read bash scripts they found in crontabs and figure out what they mean. They need a picture that shows what will happen when and what the environment looks like.

So basically I need something to replace bash and cron the same way puppet replaces scp and for loops.

--
Alan McKinnon
alan.mckinnon@gmail.com
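The backup workflow above is a dependency chain with a retry loop, not a set of clock times. A minimal Python sketch of that control flow, assuming each step is supplied as a callable; the names `run_backup_chain`, `backup`, `copy_and_test_import`, `archive` and `alert` are hypothetical, and the `max_cycles` cap is an addition so that a permanently broken import cannot loop forever:

```python
import logging
from typing import Callable

log = logging.getLogger("backup-chain")


def run_backup_chain(
    backup: Callable[[], str],
    copy_and_test_import: Callable[[str], bool],
    archive: Callable[[str], None],
    alert: Callable[[str], None],
    max_cycles: int = 3,
) -> bool:
    """Run backup -> test import -> archive, restarting on a failed import.

    Each step starts only after the previous one has returned. A failed
    test import raises an alert and restarts the cycle from step 1, at
    most max_cycles times. Returns True once a backup has been verified.
    """
    for cycle in range(1, max_cycles + 1):
        dump = backup()                    # step 1: backup on db server A
        if copy_and_test_import(dump):     # step 2: copy to B, test import
            archive(dump)                  # step 3: move to permanent storage
            log.info("cycle %d: %s verified and archived", cycle, dump)
            return True
        alert(f"cycle {cycle}: test import of {dump} failed")  # step 4
    return False


if __name__ == "__main__":
    outcomes = iter([False, True])  # first import fails, second succeeds
    ok = run_backup_chain(
        backup=lambda: "dump.sql",
        copy_and_test_import=lambda dump: next(outcomes),
        archive=lambda dump: print("archived", dump),
        alert=print,
    )
    print("verified:", ok)
```

This only captures ordering and the failure path; what a real scheduler adds on top is persistent state, execution across hosts, and the picture of the whole thing that the support crew needs.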
* Re: [gentoo-user] Recommendations for scheduler
From: Bruce Schultz @ 2014-08-03 9:27 UTC
To: gentoo-user

On 2 August 2014 5:10:43 AM AEST, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
>Meanwhile there is the regular Friday 5am code-publish cycle and
>month-end finance runs - this is a DevOps environment.

I'm not sure if it's quite what you have in mind, and it comes with a bit of a steep learning curve, but cfengine might fit the bill.

http://cfengine.com

Bruce
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
* Re: [gentoo-user] Recommendations for scheduler
From: Alan McKinnon @ 2014-08-03 12:08 UTC
To: gentoo-user

On 03/08/2014 11:27, Bruce Schultz wrote:
> I'm not sure if it's quite what you have in mind, and it comes with a bit of a steep learning curve, but cfengine might fit the bill.
>
> http://cfengine.com

Hi Bruce,

Thanks for the reply.

I only worked with cfengine once, briefly, years ago, and we quickly decided to roll our own deployment solution to solve that very specific vertical problem.

Isn't cfengine a deployment framework, similar in ideals to puppet and chef?

I don't want to deploy code or manage state, I want to run code (backups, database maintenance, repair of dodgy data in databases and code publish in a devops environment).

--
Alan McKinnon
alan.mckinnon@gmail.com
* Re: [gentoo-user] Recommendations for scheduler
From: Bruce Schultz @ 2014-08-04 3:07 UTC
To: gentoo-user

On 3 August 2014 10:08:39 PM AEST, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
>Isn't cfengine a deployment framework, similar in ideals to puppet and
>chef?
>
>I don't want to deploy code or manage state, I want to run code
>(backups, database maintenance, repair of dodgy data in databases and
>code publish in a devops environment)

Cfengine can run arbitrary commands at scheduled times, so it is capable of replacing cron. It also has package management built in for your package updates.

It is in the same vein as chef & puppet, but "deployment framework" is not the way I would describe it. Deployment is only a subset of what you can do with it.

Cfengine3 was a major rewrite of version 2. The community edition is open source and should be available in Debian. The Gentoo ebuild is a bit out of date currently. It also comes as a supported enterprise version which adds some sort of framework around the core - I've never personally looked into the enterprise features though.

Bruce
--
:B
* [gentoo-user] Re: Recommendations for scheduler
From: James @ 2014-08-01 18:17 UTC
To: gentoo-user

Alan McKinnon <alan.mckinnon <at> gmail.com> writes:

> New job, new environment. Existing persons suffer from
> 5-year-old-with-a-hammer syndrome and assume cron is the solution to all
> ills. Result: a towering edifice of cron jobs that may or may not
> clobber each other's work, may or may not work at all, and implement no
> error handling at all. But my god, can they spew out mail from STDOUT.

Sounds like a department full of computer scientists I inherited a few decades ago...........

I know nothing about chronos, but I find it an interesting read....ymmv.

http://nerds.airbnb.com/introducing-chronos/
http://airbnb.github.io/chronos/
https://github.com/airbnb/chronos

cheers mate!

James
* Re: [gentoo-user] Re: Recommendations for scheduler
From: Alan McKinnon @ 2014-08-01 19:19 UTC
To: gentoo-user

On 01/08/2014 20:17, James wrote:
> Sounds like a department full of computer scientists I inherited a few
> decades ago...........

I've met folks like that....
Brilliant in their chosen field but completely useless outside it? The kind of fellows who see nothing wrong with eating a barbecued steak with a spoon because they can get a result?

> I know nothing about chronos, but I find it an interesting read....ymmv.
>
> http://nerds.airbnb.com/introducing-chronos/
> http://airbnb.github.io/chronos/
> https://github.com/airbnb/chronos

Aaaaaaaah, now this sounds like something I can use. Proper dependency chains, RESTful JSON interface so the devs can write code to drive it in automation.

Good find, thanks!

--
Alan McKinnon
alan.mckinnon@gmail.com
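Since Chronos is driven through a REST/JSON interface, the backup chain could be expressed as one time-triggered job plus dependent jobs. A hedged Python sketch that builds (and can optionally post) the job definitions: the endpoint paths `/scheduler/iso8601` and `/scheduler/dependency` and the fields `name`, `command`, `schedule`, `epsilon`, `owner` and `parents` are as recalled from the Chronos README and may differ between releases; the host, script paths and owner address are placeholders, not real values.

```python
import json
import urllib.request

# Placeholders: point these at your own Chronos instance and team address.
CHRONOS_URL = "http://chronos.example.internal:8080"
OWNER = "ops@example.com"


def scheduled_job(name: str, command: str, schedule: str) -> dict:
    """Time-triggered job; `schedule` is an ISO 8601 repeating interval."""
    return {"name": name, "command": command, "schedule": schedule,
            "epsilon": "PT15M",  # how late a missed run may still start
            "owner": OWNER}


def dependent_job(name: str, command: str, parents: list) -> dict:
    """Job triggered by its parent jobs completing, not by the clock."""
    return {"name": name, "command": command, "parents": parents,
            "epsilon": "PT15M", "owner": OWNER}


def build_backup_chain() -> list:
    # Parents come first so they exist before their dependents are submitted.
    return [
        scheduled_job("db-backup", "/usr/local/bin/db-backup.sh",
                      "R/2014-08-04T02:00:00Z/P1D"),  # daily from 02:00 UTC
        dependent_job("db-test-import", "/usr/local/bin/test-import.sh",
                      ["db-backup"]),
        dependent_job("db-archive", "/usr/local/bin/archive-backup.sh",
                      ["db-test-import"]),
    ]


def submit(job: dict, base_url: str = CHRONOS_URL) -> int:
    """POST one job definition; dependent jobs use a different endpoint."""
    path = "/scheduler/dependency" if "parents" in job else "/scheduler/iso8601"
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Only print the payloads; call submit(job) once CHRONOS_URL is real.
    for job in build_backup_chain():
        print(json.dumps(job))
```

The "raise an alert and start again" branch is not modelled here; how Chronos handles retries and notifications should be checked in its documentation for the version you install.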
* Re: [gentoo-user] Re: Recommendations for scheduler
From: covici @ 2014-08-01 19:35 UTC
To: gentoo-user

Alan McKinnon <alan.mckinnon@gmail.com> wrote:
> Aaaaaaaah, now this sounds like something I can use. Proper dependency
> chains, RESTful JSON interface so the devs can write code to drive it in
> automation.
>
> Good find, thanks!

Unless I am missing something, chronos is not in the tree at all.

--
Your life is like a penny. You're going to lose it. The question is:
How do you spend it?

John Covici
covici@ccs.covici.com
* Re: [gentoo-user] Re: Recommendations for scheduler
From: Alan McKinnon @ 2014-08-02 9:18 UTC
To: gentoo-user

On 01/08/2014 21:35, covici@ccs.covici.com wrote:
> Unless I am missing something, chronos is not in the tree at all.

Correct, it isn't in the tree. But there's nothing stopping me from getting it in there.

--
Alan McKinnon
alan.mckinnon@gmail.com
* Re: [gentoo-user] Re: Recommendations for scheduler
From: J. Roeleveld @ 2014-08-02 13:34 UTC
To: gentoo-user

On Saturday, August 02, 2014 11:18:32 AM Alan McKinnon wrote:
> On 01/08/2014 21:35, covici@ccs.covici.com wrote:
> > Unless I am missing something, chronos is not in the tree at all.
>
> Correct, it isn't in the tree. But there's nothing stopping me from
> getting it in there.

Neither are the dependencies.

If you get it to work, don't forget to write a nice how-to, as from what I found online, the documentation is incomplete and out of date.

--
Joost
* Re: [gentoo-user] Re: Recommendations for scheduler
From: J. Roeleveld @ 2014-08-01 21:17 UTC
To: gentoo-user

On 1 August 2014 20:17:05 CEST, James <wireless@tampabay.rr.com> wrote:
>I know nothing about chronos, but I find it an interesting read....ymmv.
>
>http://nerds.airbnb.com/introducing-chronos/
>http://airbnb.github.io/chronos/
>https://github.com/airbnb/chronos

Looks interesting. Apart from it requiring a clustered environment (Mesos). Unless I misunderstand the part where it says it runs on top of Mesos?

--
Joost
* [gentoo-user] Re: Recommendations for scheduler
From: Martin Vaeth @ 2014-08-01 21:02 UTC
To: gentoo-user

Alan McKinnon <alan.mckinnon@gmail.com> wrote:
>
> But cron has only one event trigger: wall-clock time. And it's a very
> blunt weapon. I'm looking for recommendations of alternative schedulers
> that satisfy real-world business needs that need some other event
> trigger than what the time is right now.

I had a similar need recently, and since the discussion in

https://forums.gentoo.org/viewtopic-t-992780-highlight-.html

had led to nothing satisfactory for me, I have written a scheduler tool which serves my needs (which might very well differ from yours...).

The corresponding tool is still in beta testing phase:
https://github.com/vaeth/schedule/

You can install it from the mv overlay (available via layman).
* Re: [gentoo-user] Re: Recommendations for scheduler
From: J. Roeleveld @ 2014-08-01 21:22 UTC
To: gentoo-user

On 1 August 2014 23:02:11 CEST, Martin Vaeth <martin@mvath.de> wrote:
>The corresponding tool is still in beta testing phase:
>https://github.com/vaeth/schedule/

Going to have a look at this soon.
What are the features it currently has already and what are you planning on adding?

--
Joost
* [gentoo-user] Re: Recommendations for scheduler
From: Martin Vaeth @ 2014-08-01 22:06 UTC
To: gentoo-user

J. Roeleveld <joost@antarean.org> wrote:
>>https://github.com/vaeth/schedule/
>
> What are the features it currently has already

This is hard to answer, since at first glance the whole thing does not even look like a scheduler: it looks more like a means to communicate with some server. But after the discussions in the Gentoo forums, it became clear, to my surprise, that this is all that is needed for the use cases I had in mind: the "real" scheduler driving the whole thing can be a tiny script (in shell or any other language) which just communicates with that server.

To understand whether this can solve your problems, it is probably best to look at the examples in the README (and/or the Gentoo forum discussion mentioned earlier).

> and what are you planning on adding?

Since it is sufficient for my purposes, I am currently not planning to add anything (except possibly bug fixes, or if I run into a problem which I cannot solve with it). Patches for extensions are welcome, of course. (Suggestions without patches are also welcome, but my time is currently very limited, and I do not make any promises.)
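The split described above, a server that only records job state plus tiny driver scripts that decide when to run, can be illustrated in-process with Python threads. This is not the interface of vaeth/schedule (its real commands are in its README); `JobBoard`, `finish`, `wait_for` and `run` are hypothetical names, and in a real deployment the board would be a separate server process that shell scripts talk to.

```python
import threading
from typing import Callable, Dict, Optional, Sequence


class JobBoard:
    """Hypothetical stand-in for the coordination server: it runs nothing,
    it only records which jobs finished (and how) and lets callers wait."""

    def __init__(self) -> None:
        self._status: Dict[str, int] = {}  # job name -> exit status
        self._cond = threading.Condition()

    def finish(self, name: str, status: int) -> None:
        with self._cond:
            self._status[name] = status
            self._cond.notify_all()  # wake every driver waiting on any job

    def wait_for(self, names: Sequence[str],
                 timeout: Optional[float] = None) -> bool:
        """Block until all `names` have finished; True only if all exited 0."""
        with self._cond:
            finished = self._cond.wait_for(
                lambda: all(n in self._status for n in names), timeout)
            return bool(finished) and all(self._status[n] == 0 for n in names)


def run(board: JobBoard, name: str, after: Sequence[str],
        action: Callable[[], int]) -> None:
    """A tiny 'driver script': wait for dependencies, then run or skip."""
    if after and not board.wait_for(after):
        board.finish(name, 1)  # a dependency failed: mark this job failed too
        return
    board.finish(name, action())
```

The scheduling policy lives entirely in the drivers (`after=[...]`), which is why such a server need not look like a scheduler at all.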
* Re: [gentoo-user] Re: Recommendations for scheduler
From: Alan McKinnon @ 2014-08-02 9:27 UTC
To: gentoo-user

On 01/08/2014 23:02, Martin Vaeth wrote:
> I had a similar need recently, and since the discussion in
>
> https://forums.gentoo.org/viewtopic-t-992780-highlight-.html

Interesting thread :-)

Conceptually, your needs are the same as mine - sequence defined by something other than wall-clock time. The responders there show the same thing I experience - tunnel vision with regard to cron. Sysadmins are used to cron, and sadly most of us want to ram a purely cron-based solution into places where it most certainly does not belong. Business rules very seldom fit easily into a cron model; they usually rely on a defined sequence.

> The corresponding tool is still in beta testing phase:
> https://github.com/vaeth/schedule/

Nice, thanks for the link :-)
Now I have two projects to evaluate.

--
Alan McKinnon
alan.mckinnon@gmail.com
* Re: [gentoo-user] Recommendations for scheduler
From: J. Roeleveld @ 2014-08-01 21:13 UTC
To: gentoo-user

On 1 August 2014 19:32:36 CEST, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
>For those familiar with it, I'm looking for something with the useful
>feature set, without the useless features and without the price tag of
>ControlM
>
>Anyone care to share experiences?

I'm also looking for a free alternative.
At most of my clients, I see Tivoli Workload Scheduler (TWS) being used a lot.

It has most of what you want from an intelligent multi-host scheduler. Unfortunately, it also comes with a corresponding price tag.

If anyone knows of an open source project with comparable features, please let me know.
Failing this, it is on my list to start writing one myself when I get some spare time.

--
Joost
* Re: [gentoo-user] Recommendations for scheduler
From: Alan McKinnon @ 2014-08-02 9:33 UTC
To: gentoo-user

On 01/08/2014 23:13, J. Roeleveld wrote:
> I'm also looking for a free alternative.
> At most of my clients, I see Tivoli Workload Scheduler (TWS) being used a lot.
>
> It has most of what you want from an intelligent multi-host scheduler. Unfortunately, it also comes with a corresponding price tag.

I have an unusual boss. He's a business owner and quite naturally profit-driven. He also employs smart people and expects us to maintain systems in-house.

He's also a zealous FLOSS fan.

So when I present him a price tag for software his first question is always "is there any free as in freedom software suited for the job?"

I'm still trying to wrap my brains around dealing with a boss that thinks like this :-)

--
Alan McKinnon
alan.mckinnon@gmail.com
* Re: [gentoo-user] Recommendations for scheduler 2014-08-02 9:33 ` Alan McKinnon @ 2014-08-02 13:31 ` J. Roeleveld 2014-08-02 14:03 ` Alan McKinnon 2014-08-03 7:50 ` Martin Vaeth 2014-08-03 13:02 ` [gentoo-user] " Tanstaafl 1 sibling, 2 replies; 52+ messages in thread From: J. Roeleveld @ 2014-08-02 13:31 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 2657 bytes --] On Saturday, August 02, 2014 11:33:30 AM Alan McKinnon wrote: > On 01/08/2014 23:13, J. Roeleveld wrote: > > On 1 August 2014 19:32:36 CEST, Alan McKinnon <alan.mckinnon@gmail.com> wrote: > >> Hi, > >> > >> Up-front disclaimer: Mostly [OT] post. But at least I'll test drive it > >> on Gentoo before putting it in production :-) > >> > >> New job, new environment. Existing persons suffer from > >> 5-year-old-with-a-hammer syndrome and assume cron is the solution to > >> all > >> ills. Result: a towering edifice of cron jobs that may or may not > >> clobber each other's work, may or may not work at all, and implement no > >> error handling at all. But my god, can they spew out mail from STOUT > >> > >> > >> But cron has only one event trigger: wall-clock time. And it's a very > >> blunt weapon. I'm looking for recommendations of alternative schedulers > >> that satisfy real-world business needs that need some other event > >> trigger than what the time is right now. > >> > >> For those familiar with it, I'm looking for something with the useful > >> feature set, without the useless features and without the price tag of > >> ControlM > >> > >> Anyone care to share experiences? > > > > I'm also looking for a free alternative. > > At most of my clients, I see Tivoli Workload Scheduler (TWS) being used a > > lot. > > > > It has most things what you want from an intelligent multi host scheduler. > > Unfortunately, it also comes with a corresponding price tag. > I have an unusual boss. He's a business owner and quite naturally > profit-driven. 
He also employs smart people and expects us to maintain > systems in-house. > > He's also a zealous FLOSS fan. > > So when I present him a price tag for software his first question is > always "is there any free as in freedom software suited for the job?" Depends on the specific requirements. If you want: - time based start of a schedule - dependencies in said schedules and between schedules which can delay the actual start - stop of schedule if error occurs - ability to restart schedule from crashed point - have schedules operate over multiple machines (eg. part run on database, some on a compute-cluster, some other bit making nice graphs and printing it,...) Then you might be out of luck. If anyone has something that is already going along these lines, please let me know. I am more than willing to spend time and effort to assist in the development. Doing a project like that on my own in my extremely limited free time is not really an option. > I'm still trying to wrap my brains around dealing with a boss that > thinks like this :-) Hehe :) -- Joost
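Joost's dependency, stop-on-error and restart-from-crash requirements largely come down to dependency bookkeeping. The following is a minimal Python sketch of that bookkeeping, not taken from any tool mentioned in this thread. The job names and the `deps` layout (job name mapped to the set of prerequisite jobs, with every job present as a key) are hypothetical. `execution_plan` uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). `ready_jobs` reports which jobs may start, given which jobs have finished or failed.

```python
from graphlib import TopologicalSorter


def execution_plan(deps):
    """Return one valid start order; deps maps job -> set of prerequisite jobs.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


def ready_jobs(deps, finished, failed):
    """Jobs not yet run whose prerequisites have all finished successfully.

    A job that depends (directly or transitively) on a failed job never
    becomes ready, so that branch of the schedule stops.
    """
    done = set(finished) | set(failed)
    return sorted(
        job for job, prerequisites in deps.items()
        if job not in done and set(prerequisites) <= set(finished)
    )


if __name__ == "__main__":
    # Hypothetical schedule: extract -> load -> graphs -> print, plus cleanup.
    deps = {
        "extract": set(),
        "load": {"extract"},
        "cleanup": {"extract"},
        "graphs": {"load"},
        "print": {"graphs"},
    }
    print(execution_plan(deps))
    print(ready_jobs(deps, finished=["extract"], failed=["load"]))
```

Jobs downstream of a failure never become ready, so only that branch stops while independent branches such as `cleanup` continue. Restarting after a crash then amounts to reloading the recorded `finished` list and calling `ready_jobs` again.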
* Re: [gentoo-user] Recommendations for scheduler 2014-08-02 13:31 ` J. Roeleveld @ 2014-08-02 14:03 ` Alan McKinnon 2014-08-02 16:53 ` [gentoo-user] " James 2014-08-03 7:50 ` Martin Vaeth 1 sibling, 1 reply; 52+ messages in thread From: Alan McKinnon @ 2014-08-02 14:03 UTC To: gentoo-user On 02/08/2014 15:31, J. Roeleveld wrote: > Depends on the specific requirements. > > If you want: > > - time based start of a schedule > > - dependencies in said schedules and between schedules which can delay > the actual start > > - stop of schedule if error occurs > > - ability to restart schedule from crashed point > > - have schedules operate over multiple machines (eg. part run on > database, some on a compute-cluster, some other bit making nice graphs > and printing it,...) > > > > Then you might be out of luck. > > If anyone has something that is already going along these lines, please > let me know. I am more than willing to spend time and effort to assist > in the development. Doing a project like that on my own in my extremely > limited free time is not really an option. > Well, we've found 2 projects that at least in part seek to achieve our general goals - Chronos and Martin's new project. Why don't we both fool around with them for a bit and get a sense of what it will take to add features etc? Then we can meet back here and discuss. Always better to build on an existing foundation. -- Alan McKinnon alan.mckinnon@gmail.com
* [gentoo-user] Re: Recommendations for scheduler 2014-08-02 14:03 ` Alan McKinnon @ 2014-08-02 16:53 ` James 2014-08-03 7:23 ` Joost Roeleveld 0 siblings, 1 reply; 52+ messages in thread From: James @ 2014-08-02 16:53 UTC To: gentoo-user Alan McKinnon <alan.mckinnon <at> gmail.com> writes: > Well, we've found 2 projects that at least in part seek to achieve our > general goals - Chronos and Martin's new project. > Why don't we both fool around with them for a bit and get a sense of > what it will take to add features etc? Then we can meet back here and > discuss. Always better to build on an existing foundation. Mesos looks promising for a variety of (Apache) reasons. Some key technologies folks may want to google that are related: Quincy (fair scheduler) Chronos (scheduler) Hadoop (scheduler) HDFS (clustered file system) http://gpo.zugaina.org/sys-cluster/apache-hadoop-common Zookeeper (fault tolerance) Spark (optimized for iterative jobs where a dataset is reused in many parallel operations: advanced math/science and many other apps) https://spark.apache.org/ Dryad Torque MPICH2 MPI Globus Toolkit mesos_tech_report.pdf It looks as though Amazon, Google, Facebook and many others large in the Cluster/Cloud arena are using Mesos......? So let's all post what we find, particularly in overlays. hth, James
* Re: [gentoo-user] Re: Recommendations for scheduler 2014-08-02 16:53 ` [gentoo-user] " James @ 2014-08-03 7:23 ` Joost Roeleveld 2014-08-03 12:16 ` Alan McKinnon 2014-08-05 19:57 ` James 0 siblings, 2 replies; 52+ messages in thread From: Joost Roeleveld @ 2014-08-03 7:23 UTC To: gentoo-user On Saturday 02 August 2014 16:53:26 James wrote: > Alan McKinnon <alan.mckinnon <at> gmail.com> writes: > > Well, we've found 2 projects that at least in part seek to achieve our > > general goals - Chronos and Martin's new project. > > Why don't we both fool around with them for a bit and get a sense of > > what it will take to add features etc? Then we can meet back here and > > discuss. Always better to build on an existing foundation. > > Mesos looks promising for a variety of (Apache) reasons. Some key > technologies folks may want to google that are related: > > Quincy (fair scheduler) > Chronos (scheduler) > Hadoop (scheduler) Hadoop is not a scheduler. It's a framework for a Big Data clustered database. > HDFS (clustered file system) Unless it's changed recently, it's not suitable for anything other than Hadoop and contains a single point of failure. > http://gpo.zugaina.org/sys-cluster/apache-hadoop-common > > Zookeeper (fault tolerance) > Spark (optimized for iterative jobs where a dataset is reused in many > parallel operations: advanced math/science and many other apps) > https://spark.apache.org/ > > Dryad Torque MPICH2 MPI > Globus Toolkit > > mesos_tech_report.pdf > > It looks as though Amazon, Google, Facebook and many others > large in the Cluster/Cloud arena are using Mesos......? > > So let's all post what we find, particularly in overlays. Unless you are dealing with Big Data projects, like Google, Facebook, Amazon, big banks,... you don't have much use for those projects. Mesos looks like a nice project, just like Hadoop and related are also nice. But for most people, they are as useful as using Exalytics.
A scheduler should not have a large set of dependencies that you wouldn't use otherwise. That makes Chronos a non-option to me. Martin's project looks promising, but doesn't store the schedules internally. For repeating schedules, like what Alan was describing, you need to put those into scripts and start those from an existing cron. Of the 2, I think improving Martin's project is the most likely option for me as it doesn't have additional dependencies and seems to be easily implemented. -- Joost
* Re: [gentoo-user] Re: Recommendations for scheduler 2014-08-03 7:23 ` Joost Roeleveld @ 2014-08-03 12:16 ` Alan McKinnon 2014-08-03 13:33 ` J. Roeleveld 2014-08-05 19:57 ` James 1 sibling, 1 reply; 52+ messages in thread From: Alan McKinnon @ 2014-08-03 12:16 UTC To: gentoo-user On 03/08/2014 09:23, Joost Roeleveld wrote: > On Saturday 02 August 2014 16:53:26 James wrote: >> Alan McKinnon <alan.mckinnon <at> gmail.com> writes: >>> Well, we've found 2 projects that at least in part seek to achieve our >>> general goals - Chronos and Martin's new project. >>> Why don't we both fool around with them for a bit and get a sense of >>> what it will take to add features etc? Then we can meet back here and >>> discuss. Always better to build on an existing foundation. >> >> Mesos looks promising for a variety of (Apache) reasons. Some key >> technologies folks may want to google that are related: >> >> Quincy (fair scheduler) >> Chronos (scheduler) >> Hadoop (scheduler) > > Hadoop is not a scheduler. It's a framework for a Big Data clustered database. > >> HDFS (clustered file system) > > Unless it's changed recently, it's not suitable for anything other than Hadoop and > contains a single point of failure. > >> http://gpo.zugaina.org/sys-cluster/apache-hadoop-common >> >> Zookeeper (fault tolerance) >> Spark (optimized for iterative jobs where a dataset is reused in many >> parallel operations: advanced math/science and many other apps) >> https://spark.apache.org/ >> >> Dryad Torque MPICH2 MPI >> Globus Toolkit >> >> mesos_tech_report.pdf >> >> It looks as though Amazon, Google, Facebook and many others >> large in the Cluster/Cloud arena are using Mesos......? >> >> So let's all post what we find, particularly in overlays. > > Unless you are dealing with Big Data projects, like Google, Facebook, Amazon, > big banks,... you don't have much use for those projects.
My wife works in Big Data for real; she and Joost speak the same language, I don't :-) She reckons Big Data is like teenage sex - everyone says they are doing it and no-one really does ;-D > Mesos looks like a nice project, just like Hadoop and related are also nice. > But for most people, they are as useful as using Exalytics. A bit OT, but it might be worthwhile for interested persons to get good ebuilds going for these projects. Someone will use it on Gentoo, and it will add value to the project, much like gems and other business-oriented packages do. > > A scheduler should not have a large set of dependencies that you wouldn't use > otherwise. That makes Chronos a non-option to me. > > Martin's project looks promising, but doesn't store the schedules internally. > For repeating schedules, like what Alan was describing, you need to put those > into scripts and start those from an existing cron. Sounds like a small feature-add. If Martin did his groundwork correctly[1] then the core logic will work and it's just a case of adding some persistence and loading the data back in on demand. > Of the 2, I think improving Martin's project is the most likely option for me > as it doesn't have additional dependencies and seems to be easily implemented. Don't forget Martin is the guy who does eix. Street cred? check Knows Gentoo? check [1] I only say it this way as I haven't evaluated his code at all yet, so I have no idea how far Martin has taken it. -- Alan McKinnon alan.mckinnon@gmail.com
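The "adding some persistence" Alan describes for repeating schedules could look roughly like the Python sketch below. It is not based on Martin's code, which nobody in the thread had evaluated at this point. The JSON layout and the field names `command`, `interval` (seconds) and `last_run` (epoch seconds) are hypothetical. The file is first written under a temporary name and then moved into place with `os.replace`, so on POSIX a crash during the write leaves the previous copy intact.

```python
import json
import os
import tempfile
import time


def save_schedules(path, schedules):
    """Atomically write the list of schedule dicts to path as JSON."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(schedules, fh, indent=2)
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_schedules(path):
    """Return the saved schedules, or an empty list on the very first start."""
    if not os.path.exists(path):
        return []
    with open(path) as fh:
        return json.load(fh)


def due(schedules, now=None):
    """Schedules whose interval has elapsed since their last run."""
    now = time.time() if now is None else now
    return [s for s in schedules if now - s["last_run"] >= s["interval"]]
```

After a restart, the server would call `load_schedules`, run whatever `due` returns, update `last_run`, and call `save_schedules` again. That would make an external cron entry for repeating jobs unnecessary.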
* Re: [gentoo-user] Re: Recommendations for scheduler 2014-08-03 12:16 ` Alan McKinnon @ 2014-08-03 13:33 ` J. Roeleveld 0 siblings, 0 replies; 52+ messages in thread From: J. Roeleveld @ 2014-08-03 13:33 UTC To: gentoo-user On Sunday, August 03, 2014 02:16:37 PM Alan McKinnon wrote: > On 03/08/2014 09:23, Joost Roeleveld wrote: > > On Saturday 02 August 2014 16:53:26 James wrote: > >> Alan McKinnon <alan.mckinnon <at> gmail.com> writes: <snipped> > > Unless you are dealing with Big Data projects, like Google, Facebook, > > Amazon, big banks,... you don't have much use for those projects. > > My wife works in Big Data for real; she and Joost speak the same > language, I don't :-) > She reckons Big Data is like teenage sex - everyone says they are doing > it and no-one really does ;-D I know a few companies that actually do use it. But the biggest issue with the whole "Big Data" thing is that no one really agrees on what it actually is. > > Mesos looks like a nice project, just like Hadoop and related are also > > nice. But for most people, they are as useful as using Exalytics. > > A bit OT, but it might be worthwhile for interested persons to get good > ebuilds going for these projects. Someone will use it on Gentoo, and it > will add value to the project, much like gems and other > business-oriented packages do. I agree, but just for implementing a decent scheduler, I still think it's overkill. > > A scheduler should not have a large set of dependencies that you wouldn't > > use otherwise. That makes Chronos a non-option to me. > > > > Martin's project looks promising, but doesn't store the schedules > > internally. For repeating schedules, like what Alan was describing, you > > need to put those into scripts and start those from an existing cron. > > Sounds like a small feature-add.
If Martin did his groundwork > correctly[1] then the core logic will work and it's just a case of > adding some persistence and loading the data back in on demand. The code looks clean and I think it shouldn't be too much work to add it. > > Of the 2, I think improving Martin's project is the most likely option for > > me as it doesn't have additional dependencies and seems to be easily > > implemented. > Don't forget Martin is the guy who does eix. > Street cred? check > Knows Gentoo? check > > [1] I only say it this way as I haven't evaluated his code at all yet, so I > have no idea how far Martin has taken it. The code is clean and does what Martin says it does. -- Joost
* [gentoo-user] Re: Recommendations for scheduler 2014-08-03 7:23 ` Joost Roeleveld 2014-08-03 12:16 ` Alan McKinnon @ 2014-08-05 19:57 ` James 2014-08-05 20:43 ` J. Roeleveld 1 sibling, 1 reply; 52+ messages in thread From: James @ 2014-08-05 19:57 UTC To: gentoo-user Joost Roeleveld <joost <at> antarean.org> writes: > > Mesos looks promising for a variety of (Apache) reasons. Some key > > technologies folks may want to google that are related: > > > > Quincy (fair scheduler) > > Chronos (scheduler) > > Hadoop (scheduler) > > Hadoop is not a scheduler. It's a framework for a Big Data clustered > database. > > HDFS (clustered file system) > Unless it's changed recently, it's not suitable for anything other than Hadoop > and contains a single point of failure. I'm curious to learn more about this 'single point of failure'. Can you be more specific or provide links? On this resource: http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html the section on JournalNode machines talks about surviving faults: "increase the number of failures the system can tolerate, you should run an odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally." > > > http://gpo.zugaina.org/sys-cluster/apache-hadoop-common > > > > Zookeeper (fault tolerance) > > Spark (optimized for iterative jobs where a dataset is reused in many > > parallel operations: advanced math/science and many other apps) > > https://spark.apache.org/ > > > > Dryad Torque MPICH2 MPI > > Globus Toolkit > > > > mesos_tech_report.pdf > > > > It looks as though Amazon, Google, Facebook and many others > > large in the Cluster/Cloud arena are using Mesos......? > > > > So let's all post what we find, particularly in overlays. > > Unless you are dealing with Big Data projects, like Google, Facebook, Amazon, big banks,...
you don't have much use for those projects. Many scientific applications are using the cluster (cloud) or big data approach to all sorts of problems. Furthermore, as GPUs and the new ARM systems with dozens and dozens of CPU cores inside one computer become readily available, the cluster-cloud (big data) approach will become much more pervasive in the next few years, imho. http://blog.rescale.com/reservoir-simulation-moves-to-the-cloud/ There are thousands of small companies needing reservoir simulation, not to mention the millions of folks working on carbon sequestration..... Anything to do with biological or chemical science is using or moving to the Cloud-Clustered world. For me, a Cluster is just a cloud managed internally, rather than outsourced to others; ymmv. > Mesos looks like a nice project, just like Hadoop and related are also > nice. But for most people, they are as useful as using Exalytics. I'm not excited about an Oracle solution to anything. Many of the folks I know consult on moving technologies away from Oracle's sphere of influence, not limited to MySQL; ymmv. I know of one very large communications company that went broke and had to merge because of those ridiculous Oracle fees. Caveat emptor; long live PostgreSQL. > A scheduler should not have a large set of dependencies that you wouldn't > use otherwise. That makes Chronos a non-option to me. Those other technologies are often useful to folks who would be attracted to something like Chronos. > Martin's project looks promising, but doesn't store the schedules > internally. For repeating schedules, like what Alan was describing, you > need to put those into scripts and start those from an existing cron. > Of the 2, I think improving Martin's project is the most likely option > for me as it doesn't have additional dependencies and seems to be > easily implemented. > Joost Understood. Like others, I'll be curious to follow what develops out of Martin's work.
For me, Chronos, Mesos and the other aforementioned technologies look to be more viable, particularly if one is preparing for a clustered world with CPUs, GPUs, SoCs and ARM machines distributed about the Ethernet as resources to be scheduled and utilized in a variety of schema. It's the quest for one-infrastructure to solve many problems where scenarios compete. Big data is not the only reason for cloud-clusters. Theoretically, (Clustered) systems can have a far greater resource utilization of networked resources than traditional (distributed) approaches. I grant you that this is a work in progress, but I personally know of dozens of mathematically complex distributed systems that are migrating to the clustered approach rather than something custom or traditionally distributed. Granted, Cloud <--> Clustered <--> Distributed are all overlapping approaches to big problems. I do appreciate the candor of this thread. James
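The (N - 1) / 2 figure quoted from the Hadoop documentation in James's message follows from majority quorums. An edit must be acknowledged by a majority of the N JournalNodes, which is floor(N/2) + 1 of them, so at most N - (floor(N/2) + 1) nodes can be down. The short Python check below (the function name is ours) shows why the documentation recommends odd counts: going from three to four JournalNodes does not raise the number of failures tolerated.

```python
def tolerated_failures(n_journal_nodes: int) -> int:
    """JournalNodes that may fail while a majority quorum can still be formed."""
    if n_journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    quorum = n_journal_nodes // 2 + 1  # majority needed to accept an edit
    return n_journal_nodes - quorum    # always equals (N - 1) // 2


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} JournalNodes: quorum {n // 2 + 1}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Three, five and seven JournalNodes tolerate one, two and three failures. Four and six tolerate no more than three and five do, so the extra machine buys nothing.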
* Re: [gentoo-user] Re: Recommendations for scheduler 2014-08-05 19:57 ` James @ 2014-08-05 20:43 ` J. Roeleveld 2014-08-05 21:29 ` Alan McKinnon 2014-08-06 8:29 ` Peter Humphrey 0 siblings, 2 replies; 52+ messages in thread From: J. Roeleveld @ 2014-08-05 20:43 UTC To: gentoo-user On 5 August 2014 21:57:56 CEST, James <wireless@tampabay.rr.com> wrote: >Joost Roeleveld <joost <at> antarean.org> writes: > > >> > Mesos looks promising for a variety of (Apache) reasons. Some key >> > technologies folks may want to google that are related: >> > >> > Quincy (fair scheduler) >> > Chronos (scheduler) >> > Hadoop (scheduler) >> >> Hadoop is not a scheduler. It's a framework for a Big Data clustered >> database. > >> > HDFS (clustered file system) >> Unless it's changed recently, it's not suitable for anything other than >Hadoop >> and contains a single point of failure. > >I'm curious to learn more about this 'single point of failure'. >Can >you be more specific or provide links? > >On this resource: > >http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html > >the section on JournalNode machines talks about surviving faults: > >"increase the number of failures the system can tolerate, you should >run an >odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when running with N >JournalNodes, the system can tolerate at most (N - 1) / 2 failures and >continue to function normally." Just read that part. It looks like they have partly solved it since 2.2. The problem lies with the NameNodes. Prior to 2.2, you only had one. If that one dies, you lose the entire cluster. If that one is unrecoverable, you lose all the data. Since 2.2, you can configure a standby NameNode. However, it still requires a manual restart. Considering that Hadoop most often runs on old machines, the chances of hardware failure are higher than for clusters using newer hardware.
I'm not sure how other cluster FSs deal with this, but I consider it a design flaw if the loss of a single machine in a 100+ node cluster leaves the entire cluster in a broken state. It's like running a single RAID5 with 100+ drives. Anyone stupid enough to do that deserves to lose their data. >> > http://gpo.zugaina.org/sys-cluster/apache-hadoop-common >> > >> > Zookeeper (fault tolerance) >> > Spark (optimized for iterative jobs where a dataset is reused in >many >> > parallel operations: advanced math/science and many other apps) >> > https://spark.apache.org/ >> > >> > Dryad Torque MPICH2 MPI >> > Globus Toolkit >> > >> > mesos_tech_report.pdf >> > >> > It looks as though Amazon, Google, Facebook and many others >> > large in the Cluster/Cloud arena are using Mesos......? >> > >> > So let's all post what we find, particularly in overlays. >> >> Unless you are dealing with Big Data projects, like Google, Facebook, >Amazon, big banks,... you don't have much use for those projects. > >Many scientific applications are using the cluster (cloud) or big data >approach to all sorts of problems. Furthermore, as GPUs and the new >ARM systems with dozens and dozens of CPU cores inside one computer >become >readily available, the cluster-cloud (big data) approach will become >much >more pervasive in the next few years, imho. > >http://blog.rescale.com/reservoir-simulation-moves-to-the-cloud/ > >There are thousands of small companies needing reservoir simulation, >not to >mention the millions of folks working on carbon sequestration..... >Anything to do with biological or chemical science is using or moving >to the Cloud-Clustered world. For me, a Cluster is just a cloud >managed internally, rather than outsourced to others; ymmv. My apologies. I forgot about scientific research. But that was mostly because they have been dealing with really large datasets and correspondingly large compute clusters for decades.
The term Big Data is generally applied to financial and social media data. >> Mesos looks like a nice project, just like Hadoop and related are >also >> nice. But for most people, they are as useful as using Exalytics. > >I'm not excited about an Oracle solution to anything. Many of the folks >I know consult on moving technologies away from Oracle's sphere of >influence, >not limited to MySQL; ymmv. I know of one very large communications >company >that went broke and had to merge because of those ridiculous Oracle >fees. >Caveat emptor; long live PostgreSQL. I'd be interested in the name of that company, even off-list. And I definitely agree. PostgreSQL is often a valid alternative. Unfortunately, it is rarely possible to use it as a back end for enterprise software, as these packages are all designed to be used with databases from the usual suspects (Oracle, IBM, Microsoft, ...). The same goes for OSS projects. The developers are often unable to properly code the SQL layer and end up simply using MySQL and its broken SQL implementation. >> A scheduler should not have a large set of dependencies that you >wouldn't >> use otherwise. That makes Chronos a non-option to me. > >Those other technologies are often useful to folks who would be >attracted to >something like Chronos. If you already use Mesos, using Chronos makes sense. If you're only interested in a scheduler, installing Mesos just to use Chronos doesn't make sense. >> Martin's project looks promising, but doesn't store the schedules >> internally. For repeating schedules, like what Alan was describing, >you >> need to put those into scripts and start those from an existing cron. >> Of the 2, I think improving Martin's project is the most likely >option >> for me as it doesn't have additional dependencies and seems to be >> easily implemented. >> Joost > >Understood. >Like others, I'll be curious to follow what develops out of Martin's >work. I believe Martin's scheduler will be very valuable. Even for me.
I am very likely going to start using this for some of my regular maintenance activities on the home network. But as the rest of the thread shows, I wouldn't be able to use it as a scheduler for large projects where the schedules can get very complex very quickly. The type of scheduler needed for these requires a different approach, which would be overkill for the home network environment where Martin's excels. >For me, Chronos, Mesos and the other aforementioned technologies look to >be >more viable; particularly if one is preparing for a clustered world >with >CPUs, GPUs, SoCs and ARM machines distributed about the Ethernet as >resources to be scheduled and utilized in a variety of schema. It's the >quest for one-infrastructure to solve many problems where scenarios >compete. I fully agree; see my comment above, where I state that Chronos makes sense when Mesos does as well. >Big data is not the only reason for cloud-clusters. Theoretically, >(Clustered) systems can have a far greater resource utilization of >networked >resources than traditional (distributed) approaches. I grant you that >this >is a work in progress, but I personally know of dozens of >mathematically >complex distributed systems that are migrating to the clustered >approach >rather than something custom or traditionally distributed. I still remember running seti@home and similar programs in the past. Those were large clusters, but with a very badly designed network. There is a use case for large, well-integrated clusters, for loosely coupled clusters and for big machines. Hence the difference between horizontal (many machines) and vertical (one really big machine) clustering. Vertical clustering only has clustering between different processes. >Granted, Cloud <--> Clustered <--> Distributed are all overlapping >approaches >to big problems. I do appreciate the candor of this thread. They are. It started with distributed computing in a lab, then moved onto the internet.
Then people started to build a mini internet out of a lot of old computers, and clusters were born. Then that ended up back on the internet, with clusters being made accessible online. And this is what is considered to be "The Cloud". If you take the general definition of The Cloud, which is along the lines of "being able to access your data anywhere using any device", then running your own server and accessing the data on it from anywhere with internet access, using your laptop, smartphone, tablet, ..., means you are using the cloud. If anyone is actually planning to implement Mesos and Chronos on Gentoo, I would be interested in joining the effort, as it does sound like fun. I just don't have the time to do a lot of work on that at the moment. -- Joost -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
* Re: [gentoo-user] Re: Recommendations for scheduler 2014-08-05 20:43 ` J. Roeleveld @ 2014-08-05 21:29 ` Alan McKinnon 2014-08-06 8:29 ` Peter Humphrey 1 sibling, 0 replies; 52+ messages in thread From: Alan McKinnon @ 2014-08-05 21:29 UTC To: gentoo-user On 05/08/2014 22:43, J. Roeleveld wrote: > I believe Martin's scheduler will be very valuable. Even for me. > I am very likely going to start using this for some of my regular maintenance activities on the home network. > > But as the rest of the thread shows, I wouldn't be able to use it as a scheduler for large projects where the schedules can get very complex very quickly. Martin will be happy to know I think his work will fit my needs just nicely :-) -- Alan McKinnon alan.mckinnon@gmail.com
* Re: [gentoo-user] Re: Recommendations for scheduler 2014-08-05 20:43 ` J. Roeleveld 2014-08-05 21:29 ` Alan McKinnon @ 2014-08-06 8:29 ` Peter Humphrey 2014-08-06 10:26 ` J. Roeleveld 1 sibling, 1 reply; 52+ messages in thread From: Peter Humphrey @ 2014-08-06 8:29 UTC To: gentoo-user On Tuesday 05 August 2014 22:43:42 J. Roeleveld wrote: > I still remember running seti@home and similar programs in the past. Those > were large clusters, but with a very badly designed network. Was that in the days before BOINC, Joost? Do you think it's any better now? I run 5 BOINC projects here in the same general area as SETI. They seem to work all right, except for getting changes in what they call computing preferences propagated around the projects. (Just an aside - I don't want to hijack this interesting thread.) -- Regards Peter
* Re: [gentoo-user] Re: Recommendations for scheduler 2014-08-06 8:29 ` Peter Humphrey @ 2014-08-06 10:26 ` J. Roeleveld 0 siblings, 0 replies; 52+ messages in thread From: J. Roeleveld @ 2014-08-06 10:26 UTC To: gentoo-user On Wednesday, August 06, 2014 09:29:53 AM Peter Humphrey wrote: > On Tuesday 05 August 2014 22:43:42 J. Roeleveld wrote: > > I still remember running seti@home and similar programs in the past. Those > > were large clusters, but with a very badly designed network. > > Was that in the days before BOINC, Joost? Do you think it's any better now? > I run 5 BOINC projects here in the same general area as SETI. They seem to > work all right, except for getting changes in what they call computing > preferences propagated around the projects. > > (Just an aside - I don't want to hijack this interesting thread.) Yes, I did it for a short period sometime in 1999. It worked all right; I just meant that running it on thousands of personal computers using dial-up to the internet is a "badly designed network" for a cluster. -- Joost
* [gentoo-user] Re: Recommendations for scheduler 2014-08-02 13:31 ` J. Roeleveld 2014-08-02 14:03 ` Alan McKinnon @ 2014-08-03 7:50 ` Martin Vaeth 2014-08-03 8:06 ` J. Roeleveld 1 sibling, 1 reply; 52+ messages in thread From: Martin Vaeth @ 2014-08-03 7:50 UTC To: gentoo-user J. Roeleveld <joost@antarean.org> wrote: > > Depends on the specific requirements. > If you want: In a sense, most of what you require can be done with the "schedule" tool I mentioned, although perhaps not in the way you expected. I have reordered your points for a clearer explanation: > - have schedules operate over multiple machines (eg. part run on > database, some on a compute-cluster, some other bit making nice graphs > and printing it,...) Since "schedule" can use TCP for communication, this should not be a problem if you let "schedule-server" listen on all interfaces (export SCHEDULE_SERVER_OPTS=-a0.0.0.0). For the actual scheduling you must set up your machines accordingly: on one machine, queue the task doing the database access you want (with "schedule -a[serveraddress] queue command_to_access_database"), and do the same on the other machines. Of course, ssh or anything else can be used to do this without physically accessing the machines. Then, on one machine (not necessarily the server's), you run an appropriate "driver" script. > - time based start of a schedule > - dependencies in said schedules and between schedules which can delay > the actual start > - stop of schedule if error occurs None of this is a problem, since the "driver" script is just a shell script which calls "schedule" to start the tasks, wait for them to finish and/or check their exit status. This is perhaps inconvenient but has the advantage of being absolutely flexible: you can use all the usual Linux tools like "sleep" (or at or cron) to get any delays you want, do tests more powerful than checking the exit status, etc.
> - ability to restart schedule from crashed point Running not-yet-started jobs after a crash is not a problem: you just edit your "driver" script appropriately and restart it. Jobs that were already running need to be re-queued if they should run again.
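Martin's "driver" is a plain shell script around his `schedule` client. The Python sketch below only mirrors that control flow and does not talk to `schedule` at all. The thread shows only the `queue` subcommand, so no other client calls are assumed. Each stage is a hypothetical name plus a list of commands started in parallel; in a multi-machine setup these might be `ssh` invocations. The driver can sleep before a stage, stops at the first stage with a non-zero exit status, and records finished stages in a JSON state file so that a rerun after a crash skips them. As Martin notes, work that was already running when the crash happened is not recorded, and this driver therefore runs it again.

```python
import json
import os
import subprocess
import sys
import tempfile
import time


def run_stage(commands):
    """Start every command of one stage at once, then wait for all of them."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


def drive(stages, state_file, delays=None):
    """Run (name, [argv, ...]) stages in order.

    Returns the name of the first failing stage, or None if all succeeded.
    Stage names already listed in state_file are skipped, which is how a
    rerun resumes after a crash. delays maps a stage name to the number of
    seconds to sleep before that stage starts.
    """
    delays = delays or {}
    finished = []
    if os.path.exists(state_file):
        with open(state_file) as fh:
            finished = json.load(fh)
    for name, commands in stages:
        if name in finished:
            continue  # completed on an earlier run
        time.sleep(delays.get(name, 0))
        if any(code != 0 for code in run_stage(commands)):
            return name  # stop here; later stages are never started
        finished.append(name)
        with open(state_file, "w") as fh:
            json.dump(finished, fh)  # checkpoint after each successful stage
    return None


if __name__ == "__main__":
    ok = [sys.executable, "-c", "print('step done')"]
    with tempfile.TemporaryDirectory() as tmp:
        state = os.path.join(tmp, "driver-state.json")
        print(drive([("extract", [ok, ok]), ("report", [ok])], state))
```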
* Re: [gentoo-user] Re: Recommendations for scheduler 2014-08-03 7:50 ` Martin Vaeth @ 2014-08-03 8:06 ` J. Roeleveld 2014-08-03 12:10 ` Martin Vaeth 0 siblings, 1 reply; 52+ messages in thread From: J. Roeleveld @ 2014-08-03 8:06 UTC To: gentoo-user On Sunday, August 03, 2014 07:50:57 AM Martin Vaeth wrote: > J. Roeleveld <joost@antarean.org> wrote: > > Depends on the specific requirements. > > > If you want: > In a sense, most of what you require can be done with the "schedule" tool I > mentioned, although perhaps not in the way you expected. I agree, based on a quick look. > I have reordered your points for a clearer explanation: <snipped explanation> A useful addition to your schedule-tool would be to store the scripts in a way that makes editing simpler, and to add an editing tool that makes this process easier. Also add monitoring (email alerts, webpage, front-end) to check the status of all the batch jobs. I might be mistaken, but I think the server keeps the entire queue in-memory and when the process dies, the status is lost? Or is it kept somewhere? -- Joost
* [gentoo-user] Re: Recommendations for scheduler
From: Martin Vaeth @ 2014-08-03 12:10 UTC
To: gentoo-user

J. Roeleveld <joost@antarean.org> wrote:
>
> A useful addition to your schedule tool would be to store the
> scripts in a way that makes editing simpler

Since it is an arbitrary script in an arbitrary language,
I think doing this is outside the scope of this project.
In most cases where I have used it so far, 1-2 more or less complex
lines (maybe a few more if they were not complex) in an interactive
zsh were enough, and these are simple enough to edit in zsh;
i.e. I did not even write a script "file" in the classical sense.

> I might be mistaken, but I think the server keeps the entire
> queue in memory, and when the process dies, the status is lost?

Yes, the server process must not die.

If it dies, not only is the queue lost, but the waiting processes
(that is: queued but not yet started) cannot be reached anymore either:
these waiting processes do not have their own TCP socket but just
keep their established connection to the server's socket until
the server tells them through this connection to start or to cancel;
if this connection is lost, the waiting processes die.
What else could they reasonably do?

The already started processes have a unique ID (which encodes the
server process): they re-establish the connection to report their
exit status under this ID. If the server is stopped,
they cannot report this status, of course; moreover,
a new server does not know their IDs either and thus will ignore these
"status reports".

Maybe this "protocol" is not the cleverest solution, but it is
one which could be implemented without much overhead:
mainly, I was after a "quick" solution which works well enough
for me. If the server has no bugs, why should it die?
Moreover, if the server dies for some strange reason, it is probably
safer to re-queue the jobs anyway.
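Under the ID scheme described above, each started job's ID encodes the server instance that issued it. A restarted server therefore ignores reports it cannot match. This can be modelled in a few lines. It is a toy sketch, not the tool's actual protocol or ID format. The `Server` class, the random `instance` token (standing in for the encoded server process) and the `"<instance>:<n>"` layout are all assumptions.

```python
import itertools
import uuid
from typing import Dict, Optional


class Server:
    """Toy in-memory scheduler server: all job state is lost with the object."""

    def __init__(self) -> None:
        self.instance = uuid.uuid4().hex  # hypothetical stand-in for the server process
        self._counter = itertools.count(1)
        self.status: Dict[str, Optional[int]] = {}

    def start_job(self) -> str:
        job_id = f"{self.instance}:{next(self._counter)}"
        self.status[job_id] = None  # running, no exit status yet
        return job_id

    def report(self, job_id: str, exit_status: int) -> bool:
        """Accept an exit status only for a job this server instance started."""
        instance, _, _ = job_id.partition(":")
        if instance != self.instance or job_id not in self.status:
            return False  # unknown ID: ignored, as a restarted server would do
        self.status[job_id] = exit_status
        return True
```

Because all job state lives in the `Server` object, losing that object loses the queue. A fresh instance rejects reports from jobs started before the crash.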
* Re: [gentoo-user] Re: Recommendations for scheduler
From: J. Roeleveld @ 2014-08-03 13:36 UTC
To: gentoo-user

On Sunday, August 03, 2014 12:10:49 PM Martin Vaeth wrote:
> J. Roeleveld <joost@antarean.org> wrote:
> > A useful addition to your schedule tool would be to store the
> > scripts in a way that makes editing simpler
>
> Since it is an arbitrary script in an arbitrary language,
> I think doing this is outside the scope of this project.
> In most cases where I have used it so far, 1-2 more or less complex
> lines (maybe a few more if they were not complex) in an interactive
> zsh were enough, and these are simple enough to edit in zsh;
> i.e. I did not even write a script "file" in the classical sense.
>
> > I might be mistaken, but I think the server keeps the entire
> > queue in memory, and when the process dies, the status is lost?
>
> Yes, the server process must not die.
>
> If it dies, not only is the queue lost, but the waiting processes
> (that is: queued but not yet started) cannot be reached anymore either:
> these waiting processes do not have their own TCP socket but just
> keep their established connection to the server's socket until
> the server tells them through this connection to start or to cancel;
> if this connection is lost, the waiting processes die.
> What else could they reasonably do?
>
> The already started processes have a unique ID (which encodes the
> server process): they re-establish the connection to report their
> exit status under this ID. If the server is stopped,
> they cannot report this status, of course; moreover,
> a new server does not know their IDs either and thus will ignore these
> "status reports".
>
> Maybe this "protocol" is not the cleverest solution, but it is
> one which could be implemented without much overhead:
> mainly, I was after a "quick" solution which works well enough
> for me. If the server has no bugs, why should it die?
> Moreover, if the server dies for some strange reason, it is probably
> safer to re-queue the jobs anyway.

With the kind of schedules I am working with (and I believe Alan will
also end up with), restarting the whole process from the start can
lead to issues.
Finding out how far the process got before the service crashed can
become rather complex.

--
Joost
* Re: [gentoo-user] Re: Recommendations for scheduler
From: Alan McKinnon @ 2014-08-03 20:04 UTC
To: gentoo-user

On 03/08/2014 15:36, J. Roeleveld wrote:
>> Maybe this "protocol" is not the cleverest solution, but it is
>> one which could be implemented without much overhead:
>> mainly, I was after a "quick" solution which works well enough
>> for me. If the server has no bugs, why should it die?
>> Moreover, if the server dies for some strange reason, it is probably
>> safer to re-queue the jobs anyway.
> With the kind of schedules I am working with (and I believe Alan will
> also end up with), restarting the whole process from the start can
> lead to issues.
> Finding out how far the process got before the service crashed can
> become rather complex.

Yes, very much so. My first concern is the database cleanups: without
scheduler guarantees I'd need transactions in MySQL.

--
Alan McKinnon
alan.mckinnon@gmail.com
* Re: [gentoo-user] Re: Recommendations for scheduler
From: J. Roeleveld @ 2014-08-03 20:23 UTC
To: gentoo-user

On Sunday, August 03, 2014 10:04:50 PM Alan McKinnon wrote:
> On 03/08/2014 15:36, J. Roeleveld wrote:
> >> Maybe this "protocol" is not the cleverest solution, but it is
> >> one which could be implemented without much overhead:
> >> mainly, I was after a "quick" solution which works well enough
> >> for me. If the server has no bugs, why should it die?
> >> Moreover, if the server dies for some strange reason, it is probably
> >> safer to re-queue the jobs anyway.
> >
> > With the kind of schedules I am working with (and I believe Alan will
> > also end up with), restarting the whole process from the start can
> > lead to issues.
> > Finding out how far the process got before the service crashed can
> > become rather complex.
>
> Yes, very much so. My first concern is the database cleanups: without
> scheduler guarantees I'd need transactions in MySQL.

Or you migrate to PostgreSQL, but that is OT :)

--
Joost
* Re: [gentoo-user] Re: Recommendations for scheduler
From: Alan McKinnon @ 2014-08-03 20:57 UTC
To: gentoo-user

On 03/08/2014 22:23, J. Roeleveld wrote:
> On Sunday, August 03, 2014 10:04:50 PM Alan McKinnon wrote:
>> On 03/08/2014 15:36, J. Roeleveld wrote:
>>>> Maybe this "protocol" is not the cleverest solution, but it is
>>>> one which could be implemented without much overhead:
>>>> mainly, I was after a "quick" solution which works well enough
>>>> for me. If the server has no bugs, why should it die?
>>>> Moreover, if the server dies for some strange reason, it is probably
>>>> safer to re-queue the jobs anyway.
>>>
>>> With the kind of schedules I am working with (and I believe Alan will
>>> also end up with), restarting the whole process from the start can
>>> lead to issues.
>>> Finding out how far the process got before the service crashed can
>>> become rather complex.
>>
>> Yes, very much so. My first concern is the database cleanups: without
>> scheduler guarantees I'd need transactions in MySQL.
>
> Or you migrate to PostgreSQL, but that is OT :)

Maybe, but also valid :-)

I took one look at the schemas here and wondered "Why MySQL? This is
Postgres territory". It's a case of LAMP tunnel vision.

--
Alan McKinnon
alan.mckinnon@gmail.com
* Re: [gentoo-user] Re: Recommendations for scheduler
From: J. Roeleveld @ 2014-08-03 21:10 UTC
To: gentoo-user

On Sunday, August 03, 2014 10:57:06 PM Alan McKinnon wrote:
> On 03/08/2014 22:23, J. Roeleveld wrote:
> > On Sunday, August 03, 2014 10:04:50 PM Alan McKinnon wrote:
> >> On 03/08/2014 15:36, J. Roeleveld wrote:
> >>>> Maybe this "protocol" is not the cleverest solution, but it is
> >>>> one which could be implemented without much overhead:
> >>>> mainly, I was after a "quick" solution which works well enough
> >>>> for me. If the server has no bugs, why should it die?
> >>>> Moreover, if the server dies for some strange reason, it is probably
> >>>> safer to re-queue the jobs anyway.
> >>>
> >>> With the kind of schedules I am working with (and I believe Alan will
> >>> also end up with), restarting the whole process from the start can
> >>> lead to issues.
> >>> Finding out how far the process got before the service crashed can
> >>> become rather complex.
> >>
> >> Yes, very much so. My first concern is the database cleanups: without
> >> scheduler guarantees I'd need transactions in MySQL.
> >
> > Or you migrate to PostgreSQL, but that is OT :)
>
> Maybe, but also valid :-)
>
> I took one look at the schemas here and wondered "Why MySQL? This is
> Postgres territory". It's a case of LAMP tunnel vision.

That, and people who start with LAMP don't learn SQL.
This leads to code that is nearly impossible to port to a different
database, and when people are actually willing to do all the work to
get the SQL working on any database, the projects involved refuse the
patches.

--
Joost
* [gentoo-user] Re: Recommendations for scheduler
From: Martin Vaeth @ 2014-08-04 8:41 UTC
To: gentoo-user

J. Roeleveld <joost@antarean.org> wrote:
>
> With the kind of schedules I am working with (and I believe Alan will
> also end up with), restarting the whole process from the start can
> lead to issues.
> Finding out how far the process got before the service crashed can
> become rather complex.

I am not sure whether I understand this correctly:
schedule has no problem displaying which tasks have finished, have
failed or are still running at any time.
Of course, a finer granularity than tasks is not possible ("how far
has a certain task got?"), because this would require knowledge about
the task and how to check it: you need to be able to split your tasks
into more shell commands to make a finer granularity available for
"schedule".

You can just rerun your "driving" script, with the effect that the
tasks which have already finished/failed will not actually be
restarted; instead, they behave as if they finished immediately and
report that they are finished/failed.
(If you plan to do this, I would suggest scheduling things like
"sleep" as separate tasks, too, and not building them into the
"driving" script.)

If there is an unexpected problem, and e.g. you want to re-run a failed
task anyway, you can just re-queue your new task in the same place as
the previous task, e.g.

    schedule remove jobnr
    schedule -j jobnr queue command to do your task

Then the old job (and its state) is replaced by the newly queued job,
and your (unchanged) driving script will start it instead of assuming
that the job has already finished.
In order to avoid races, I would recommend doing the above only while
your driving script is not running (e.g., you can suspend it with
Ctrl-Z if you have written it in (...) or if it is really a
"classical" script, and then continue it with "fg"; or you can even
stop it completely with Ctrl-C and re-run it, depending on what you
want): the problem is that between the above two commands the jobs
after "jobnr" are renumbered.
Alternatively, you can insert your new job at the end of the job list
and then use something like (untested)

    schedule -jjobnr insert 0 jobnr+1:-1
    schedule remove 0

to re-sort your job list: the "insert" is race-free, and having added
a job at the end for some time will hopefully not disturb anything.
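The race described above comes from positional job numbers. After `schedule remove jobnr`, every later job moves down by one until the replacement is queued, so a driver running in between would look up the wrong job. The sketch below models the job list as a plain Python list with 0-based numbering. It illustrates only the renumbering, not the real "schedule" command semantics. The `remove` and `queue_at` helpers are hypothetical.

```python
from typing import List


def remove(jobs: List[str], jobnr: int) -> None:
    """Delete job `jobnr`; every later job is renumbered one position down."""
    del jobs[jobnr]


def queue_at(jobs: List[str], jobnr: int, command: str) -> None:
    """Queue `command` as job `jobnr`; later jobs move one position up again."""
    jobs.insert(jobnr, command)


jobs = ["extract", "load (failed)", "report"]
remove(jobs, 1)
print(jobs[1])  # prints: report  (a driver checking job 1 now sees the wrong job)
queue_at(jobs, 1, "load (retry)")
print(jobs)     # prints: ['extract', 'load (retry)', 'report']
```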
* Re: [gentoo-user] Re: Recommendations for scheduler
From: J. Roeleveld @ 2014-08-04 9:02 UTC
To: gentoo-user

On 4 August 2014 10:41:04 CEST, Martin Vaeth <martin@mvath.de> wrote:
>J. Roeleveld <joost@antarean.org> wrote:
>>
>> With the kind of schedules I am working with (and I believe Alan will
>> also end up with), restarting the whole process from the start can
>> lead to issues.
>> Finding out how far the process got before the service crashed can
>> become rather complex.
>
>I am not sure whether I understand this correctly:

The schedules I am used to dealing with easily span 8-14 hours,
occasionally even over a week.
These schedules then also can't be restarted from the beginning when
they stop halfway through without risking massive consistency problems
in the final data.

And then there are multiple of those starting at random times,
occasionally with a whole bunch of the same schedule put into the queue
with dependencies on the previous run.

If, during that time, one of the machines has a hardware failure or the
scheduling process crashes on one or more of the servers, the last
state needs to be recoverable.
If you have to clean up the environment and bring it back to a state
where you can restart the schedules, it saves time if you know which
commands and tasks were actually running at the time.

For this, the schedules, queues and current state for each node need
to be stored on persistent storage.

Hope this clarifies it all a bit.

--
Joost
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
* [gentoo-user] Re: Recommendations for scheduler
From: Martin Vaeth @ 2014-08-04 10:11 UTC
To: gentoo-user

J. Roeleveld <joost@antarean.org> wrote:
>
> These schedules then also can't be restarted from the beginning
> when they stop halfway through without risking massive consistency
> problems in the final data.

So you have a command which might break due to a hardware error
and cannot be rerun. I cannot see how any general-purpose scheduler
might help you here: you either need to be able to split your command
into several (sequential) commands or you need something adapted
for your particular command.

> And then there are multiple of those starting at random times,
> occasionally with a whole bunch of the same schedule put into the
> queue with dependencies on the previous run.

That's not a problem. It only becomes a problem if the granularity of
one command is not fine enough.

> If, during that time, one of the machines has a hardware failure
> or the scheduling process crashes on one or more of the servers,
> the last state needs to be recoverable.

One must distinguish two cases:

1. The machine running "schedule-server" has a hardware failure.
   (Let us assume that "schedule-server" does not have a software
   failure; otherwise, you have problems anyway.)
2. Some other machine has a hardware failure.

Case 2 is not bad (as far as the scheduling is concerned): of course,
the machine will not report that it completed the job, and you will
have to think about how to complete the job. But it is clear that in
such exceptional cases you have to intervene manually in some sense.

In order to deal with case 1, you can regularly (e.g. every minute)
dump the output of "schedule list" (possibly suppressing unimportant
data through the options to keep it short).
One could add a logging option to shrink the window of up to 1 minute
in which changes can be lost, but in case of hardware failure such a
window cannot be excluded entirely anyway.

In case 1 you have to re-queue the jobs manually and think about what
to do with the already started jobs. However, I cannot imagine that
this occurs so frequently that this exceptional case becomes something
one should seriously think about.
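The suggestion above, dumping the output of "schedule list" every minute, is easy to script. The Python sketch assumes only that `schedule list` prints the job list to stdout; it makes no assumption about the output format. The snapshot path and the 60-second interval are illustrative. The sketch writes to a temporary file and then renames it with `os.replace`, so a crash mid-write never leaves a half-written snapshot behind.

```python
import os
import subprocess
import tempfile
import time
from typing import Sequence


def write_atomically(path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, fsync it, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX (same filesystem)
    except BaseException:
        os.unlink(tmp)
        raise


def snapshot_loop(path: str, interval: float = 60.0,
                  command: Sequence[str] = ("schedule", "list"),
                  iterations: int = -1) -> None:
    """Save the command's output to `path` every `interval` seconds.

    A negative `iterations` loops forever; readers always see a complete snapshot.
    """
    while iterations != 0:
        result = subprocess.run(list(command), capture_output=True, check=True)
        write_atomically(path, result.stdout)
        iterations -= 1
        if iterations != 0:
            time.sleep(interval)
```

For example, run `snapshot_loop("/var/tmp/schedule-list.txt")` on the server machine. The path is only an example.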
* Re: [gentoo-user] Re: Recommendations for scheduler
From: J. Roeleveld @ 2014-08-04 10:40 UTC
To: gentoo-user

On Monday, August 04, 2014 10:11:41 AM Martin Vaeth wrote:
> J. Roeleveld <joost@antarean.org> wrote:
> > These schedules then also can't be restarted from the beginning
> > when they stop halfway through without risking massive consistency
> > problems in the final data.
>
> So you have a command which might break due to a hardware error
> and cannot be rerun. I cannot see how any general-purpose scheduler
> might help you here: you either need to be able to split your command
> into several (sequential) commands or you need something adapted
> for your particular command.

A general-purpose scheduler can work; such schedulers do exist
(with a price tag). In the OSS world there is, to my knowledge, none.
Yours seems the most promising, as it looks like the missing features
shouldn't be too difficult to add.

The commands are relatively simple, but they deal with large amounts
of data. I am talking about ETL processes that, due to the amount of
data being processed, can easily take several hours per step.
If, during one of these steps, the database or ETL process suffers a
crash, the activities of the ETL process need to be rolled back to the
point where you can restart it.

I am not talking about simple schedules related to day-to-day
maintenance of a few servers.

> > And then there are multiple of those starting at random times,
> > occasionally with a whole bunch of the same schedule put into the
> > queue with dependencies on the previous run.
>
> That's not a problem. It only becomes a problem if the granularity of
> one command is not fine enough.

If nothing goes wrong, it can all be put into a single script and the
end result will be the same. Problems start because the real world is
not 100% reliable.

> > If, during that time, one of the machines has a hardware failure
> > or the scheduling process crashes on one or more of the servers,
> > the last state needs to be recoverable.
>
> One must distinguish two cases:
>
> 1. The machine running "schedule-server" has a hardware failure.
>    (Let us assume that "schedule-server" does not have a software
>    failure; otherwise, you have problems anyway.)
> 2. Some other machine has a hardware failure.
>
> Case 2 is not bad (as far as the scheduling is concerned): of course,
> the machine will not report that it completed the job, and you will
> have to think about how to complete the job. But it is clear that in
> such exceptional cases you have to intervene manually in some sense.

Agreed; this happens more often than you might think.

> In order to deal with case 1, you can regularly (e.g. every minute)
> dump the output of "schedule list" (possibly suppressing unimportant
> data through the options to keep it short).

Or all the necessary information is kept in sync on persistent storage.
This would then also allow easy fail-over if the master schedule node
fails: a second machine could quickly take over.

> One could add a logging option to shrink the window of up to 1 minute
> in which changes can be lost, but in case of hardware failure such a
> window cannot be excluded entirely anyway.
>
> In case 1 you have to re-queue the jobs manually and think about what
> to do with the already started jobs. However, I cannot imagine that
> this occurs so frequently that this exceptional case becomes something
> one should seriously think about.

As I mentioned above, with BI infrastructure (large databases, complex
ETL processes, interactive report services, ...), the scheduler is
busy 24/7. The number of tasks, schedules, dependencies, states, ...
that need to be kept track of can easily lead to unforeseen issues
and bugs.
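The alternative proposed above is to keep the scheduler state in sync on persistent storage. This is commonly done with an append-only journal. Each state change is appended and fsynced before the scheduler acts on it, and recovery replays the journal. Below is a minimal Python sketch. The `Journal` class and its one-JSON-object-per-line format with `job` and `state` fields are assumptions for illustration, not a feature of "schedule".

```python
import json
import os
from typing import Dict


class Journal:
    """Append-only log of job state changes; replaying it rebuilds the last known state."""

    def __init__(self, path: str) -> None:
        self.path = path

    def record(self, job: str, state: str) -> None:
        # One JSON object per line; fsync so the entry survives a crash of process or host.
        with open(self.path, "a") as f:
            f.write(json.dumps({"job": job, "state": state}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self) -> Dict[str, str]:
        """Return the latest recorded state of every job (later entries win)."""
        states: Dict[str, str] = {}
        if not os.path.exists(self.path):
            return states
        with open(self.path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn last line from a crash mid-write: ignore it
                states[entry["job"]] = entry["state"]
        return states
```

After a crash, jobs whose last recorded state is still `running` are exactly the ones whose outcome must be checked by hand.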
* [gentoo-user] Re: Recommendations for scheduler
From: Martin Vaeth @ 2014-08-04 13:31 UTC
To: gentoo-user

J. Roeleveld <joost@antarean.org> wrote:
>>
>> So you have a command which might break due to a hardware error
>> and cannot be rerun. I cannot see how any general-purpose scheduler
>> might help you here: you either need to be able to split your command
>> into several (sequential) commands or you need something adapted
>> for your particular command.
>
> A general-purpose scheduler can work; such schedulers do exist

I doubt that they can solve your problem.
Let me repeat: you have a single program which accesses the database
in a complex way, and somewhere in the course of accessing it, the
machine (or program) crashes.
No general-purpose program can recover from this: you need
particular knowledge of the database and the program if you even
want to have a *chance* to recover from such a situation.
A program with such particular knowledge can hardly be called
"general-purpose".

> If, during one of these steps, the database or ETL process suffers a
> crash, the activities of the ETL process need to be rolled back to
> the point where you can restart it.

I agree, but to do this you need particular knowledge of the database
and your tasks, which is far beyond the job of a scheduler.
As someone in this thread already mentioned, your problem needs to be
solved at the level of the database (using snapshot capabilities etc.).

>> In order to deal with case 1, you can regularly (e.g. every minute)
>> dump the output of "schedule list" (possibly suppressing unimportant
>> data through the options to keep it short).
>
> Or all the necessary information is kept in sync on persistent storage.
> This would then also allow easy fail-over if the master schedule node
> fails

No, it wouldn't, since jobs which have just finished and want to report
their status cannot do so when there is no server.
You would need a rather involved protocol to deal with such situations
dynamically. It can certainly be done, but it is not something which
can easily be "added" as a feature: if this is required, it has to be
the fundamental concept from the very beginning, and everything else
has to follow from this primary aim. You would need something other
than plain TCP sockets, to start with; something like "dbus over IP",
with servers being able to announce their new presence, etc.
* Re: [gentoo-user] Re: Recommendations for scheduler
From: Alan McKinnon @ 2014-08-04 13:35 UTC
To: gentoo-user

On 04/08/2014 15:31, Martin Vaeth wrote:
> J. Roeleveld <joost@antarean.org> wrote:
>>>
>>> So you have a command which might break due to a hardware error
>>> and cannot be rerun. I cannot see how any general-purpose scheduler
>>> might help you here: you either need to be able to split your command
>>> into several (sequential) commands or you need something adapted
>>> for your particular command.
>>
>> A general-purpose scheduler can work; such schedulers do exist
>
> I doubt that they can solve your problem.
> Let me repeat: you have a single program which accesses the database
> in a complex way, and somewhere in the course of accessing it, the
> machine (or program) crashes.
> No general-purpose program can recover from this: you need
> particular knowledge of the database and the program if you even
> want to have a *chance* to recover from such a situation.
> A program with such particular knowledge can hardly be called
> "general-purpose".

Joost,

Either make the ETL tool pick up where it stopped and continue, as it
is the only thing that knows what it was doing and how far it got, or
wrap the entire script in a single transaction.

--
Alan McKinnon
alan.mckinnon@gmail.com
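The second option above, one transaction around the whole job, makes the work all-or-nothing. The sketch uses Python's standard `sqlite3` module only as a stand-in for MySQL or PostgreSQL. The `live` and `archive` tables, the `archive_old_rows` function and its `simulate_crash` flag are hypothetical. A crash between the two statements rolls back the copy, so neither table is left half-updated.

```python
import sqlite3


def archive_old_rows(conn: sqlite3.Connection, cutoff: int,
                     simulate_crash: bool = False) -> None:
    """Move rows with day < cutoff from `live` to `archive` as one atomic unit."""
    with conn:  # COMMIT if the block succeeds, ROLLBACK if anything raises
        conn.execute("INSERT INTO archive SELECT * FROM live WHERE day < ?", (cutoff,))
        if simulate_crash:  # hypothetical test hook standing in for a real failure
            raise RuntimeError("simulated crash between the two statements")
        conn.execute("DELETE FROM live WHERE day < ?", (cutoff,))
```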
* Re: [gentoo-user] Re: Recommendations for scheduler
From: J. Roeleveld @ 2014-08-04 19:46 UTC
To: gentoo-user

On 4 August 2014 15:35:41 CEST, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
>On 04/08/2014 15:31, Martin Vaeth wrote:
>> J. Roeleveld <joost@antarean.org> wrote:
>>>>
>>>> So you have a command which might break due to a hardware error
>>>> and cannot be rerun. I cannot see how any general-purpose scheduler
>>>> might help you here: you either need to be able to split your
>>>> command into several (sequential) commands or you need something
>>>> adapted for your particular command.
>>>
>>> A general-purpose scheduler can work; such schedulers do exist
>>
>> I doubt that they can solve your problem.
>> Let me repeat: you have a single program which accesses the database
>> in a complex way, and somewhere in the course of accessing it, the
>> machine (or program) crashes.
>> No general-purpose program can recover from this: you need
>> particular knowledge of the database and the program if you even
>> want to have a *chance* to recover from such a situation.
>> A program with such particular knowledge can hardly be called
>> "general-purpose".
>
>
>Joost,
>
>Either make the ETL tool pick up where it stopped and continue, as it
>is the only thing that knows what it was doing and how far it got, or
>wrap the entire script in a single transaction.

Alan,

That would be the ideal solution.
However, a single transaction dealing with around 500,000,000 rows will
get me shot by the DBAs :)
(Never mind that the performance of this would be such that having it
all done by an office full of secretaries might be quicker.)

Making the ETL process clever enough to pick up from any point requires
a degree of forward thinking and planning that is never done in real
life.
I would love to design it like that, as it isn't too difficult. But I
always get brought into these projects at a point where implementing
these structures would require a full rewrite, and where the original
architects would have to admit that their design can't be made
restartable without human intervention.

At which point the business simply says it is acceptable to have people
do a manual rollback and restart the schedules from wherever it went
wrong.

I'm sure your wife has similar experiences, as this is why these
projects are always late to deliver and over budget.

--
Joost
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
* Re: [gentoo-user] Re: Recommendations for scheduler
From: Alan McKinnon @ 2014-08-04 20:38 UTC
To: gentoo-user

On 04/08/2014 21:46, J. Roeleveld wrote:
> On 4 August 2014 15:35:41 CEST, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
>> On 04/08/2014 15:31, Martin Vaeth wrote:
>>> J. Roeleveld <joost@antarean.org> wrote:
>>>>>
>>>>> So you have a command which might break due to a hardware error
>>>>> and cannot be rerun. I cannot see how any general-purpose scheduler
>>>>> might help you here: you either need to be able to split your
>>>>> command into several (sequential) commands or you need something
>>>>> adapted for your particular command.
>>>>
>>>> A general-purpose scheduler can work; such schedulers do exist
>>>
>>> I doubt that they can solve your problem.
>>> Let me repeat: you have a single program which accesses the database
>>> in a complex way, and somewhere in the course of accessing it, the
>>> machine (or program) crashes.
>>> No general-purpose program can recover from this: you need
>>> particular knowledge of the database and the program if you even
>>> want to have a *chance* to recover from such a situation.
>>> A program with such particular knowledge can hardly be called
>>> "general-purpose".
>>
>>
>> Joost,
>>
>> Either make the ETL tool pick up where it stopped and continue, as it
>> is the only thing that knows what it was doing and how far it got, or
>> wrap the entire script in a single transaction.
>
> Alan,
>
> That would be the ideal solution.

You have the same concerns I do: how do you make a transaction around
500 million rows? So I asked the in-house expert, Mrs Alan :-)

> However, a single transaction dealing with around 500,000,000 rows will
> get me shot by the DBAs :)
> (Never mind that the performance of this would be such that having it
> all done by an office full of secretaries might be quicker.)

She reckons an ETL job *must* be self-contained; if it isn't, then it's
broken by design. It must be idempotent too, which can be as simple as
"Truncate, Load, Commit".

> Making the ETL process clever enough to pick up from any point requires
> a degree of forward thinking and planning that is never done in real
> life.
> I would love to design it like that, as it isn't too difficult. But I
> always get brought into these projects at a point where implementing
> these structures would require a full rewrite, and where the original
> architects would have to admit that their design can't be made
> restartable without human intervention.

I agree with that design actually: it's the job of the hardware and OS
guys to make the things the application layer relies on reliable. When
a SAN connection goes away, it usually comes back and the app layer just
carries on (never mind that it retried 100 times meanwhile).

Sometimes this doesn't work out. The easiest, cheapest and quickest way
to handle it is to just restart the whole job from the beginning. This
offends the engineer in us sometimes, but it really is the best way and
all of Unix is built on this very idea :-)

If the SAN goes away too often and it causes issues, then maybe the best
approach is to get the SAN and facilities guys to get their act together.

> At which point the business simply says it is acceptable to have people
> do a manual rollback and restart the schedules from wherever it went
> wrong.

Exactly. One of the few cases where business has the correct idea.
There are only so many pennies to spend and so many dollars to be
delivered.

> I'm sure your wife has similar experiences, as this is why these
> projects are always late to deliver and over budget.

She says her projects are subject to the same universal, inviolate rule
as mine: time and cost are always the best engineering estimate times pi.

We learn to deal with it.

Which brings us back to Martin's initial statement: a scheduler cannot
deal with any of this; the job itself must. It's an unpredictable
event, and schedulers can only deal with predictable events.

--
Alan McKinnon
alan.mckinnon@gmail.com
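The "Truncate, Load, Commit" pattern makes a load idempotent. Running it twice, or re-running it after a crash, leaves the table exactly as a single successful run would. Below is a minimal sketch, with `sqlite3` again as a stand-in database and a hypothetical `staging_sales` table. SQLite has no TRUNCATE statement, so an unqualified DELETE inside the same transaction plays that role here. Real databases differ: in MySQL, for instance, TRUNCATE TABLE causes an implicit commit, so it cannot be rolled back together with the load.

```python
import sqlite3
from typing import Iterable, Tuple


def truncate_load_commit(conn: sqlite3.Connection,
                         rows: Iterable[Tuple[int, float]]) -> None:
    """Replace the table's entire contents in one transaction; safe to re-run."""
    with conn:  # commit on success, roll back on any error
        conn.execute("DELETE FROM staging_sales")  # the "truncate" step
        conn.executemany(
            "INSERT INTO staging_sales (id, amount) VALUES (?, ?)", rows)
```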
* Re: [gentoo-user] Re: Recommendations for scheduler 2014-08-04 20:38 ` Alan McKinnon @ 2014-08-05 11:42 ` J. Roeleveld 0 siblings, 0 replies; 52+ messages in thread From: J. Roeleveld @ 2014-08-05 11:42 UTC (permalink / raw To: gentoo-user On Monday, August 04, 2014 10:38:57 PM Alan McKinnon wrote: > On 04/08/2014 21:46, J. Roeleveld wrote: > > On 4 August 2014 15:35:41 CEST, Alan McKinnon <alan.mckinnon@gmail.com> > >> Either make the ETL tool pick up where it stopped and continue, as it is > >> the only one that knows what it was doing and how far it got. Or, wrap the > >> entire script in a single transaction. > > > > Alan, > > > > That would be the ideal solution. > > You have the same concerns I do - how do you make a transaction around > 500 million rows? So I asked the in-house expert - Mrs Alan :-) Have a very large temporary tablespace on the database server. > > However, a single transaction dealing with around 500,000,000 rows will > > get me shot by the DBAs :) (Never mind that the performance of this will > > be such that having it all done by an office full of secretaries might be > > quicker.) > She reckons an ETL job *must* be self-contained; if it isn't, then it's > broken by design. It must be idempotent too, which can be as simple as > "Truncate, Load, Commit". Most common tactic (done by humans): - delete from <target table> where INS_PCS_ID = <crashed run-id>; - update <target table> set VLD_TO = null where UPD_PCS_ID = <crashed run-id>; Then, restart the crashed run-id. For this, you need to know which command failed, so you know where to find the actual run-id you need to roll back. > > Having the ETL process clever enough to be able to pick up from any point > > requires a degree of forward thinking and planning that is never done in > > real life. I would love to design it like that as it isn't too difficult.
> > But I always get brought into these projects when implementing these > > structures will require a full rewrite and getting the original > > architects to admit their design can't be made restartable without human > > intervention. > I agree with that design actually - it's the job of the hardware and OS > guys to make stuff reliable enough that the application layer can rely on it. When > a SAN connection goes away, it usually comes back and the app layer just > carries on (never mind that it retried 100 times meanwhile). Yes, until you find out the clustered FS being used causes the crashes... (Yes, been in that situation...) > Sometimes this doesn't work out. The easiest, cheapest and quickest way > to handle it is to just restart the whole job from the beginning. This > offends the engineer in us sometimes, but it really is the best way and > all of Unix is built on this very idea :-) Which is generally done, usually requiring a manual clean-up prior to restart. If done properly, the ETL process has the capability to roll back the failed run prior to redoing it. This, however, requires extensive planning and design at the initial implementation phase. > If the SAN goes away too often and it causes issues, then maybe the best > approach is to get the SAN and facilities guys to get their act together. Instead of finger-pointing. > > At which point the business simply says it is acceptable to have people do > > a manual rollback and restart the schedules from wherever it went wrong. > Exactly. One of the few cases where business has the correct idea. > There are only so many pennies to spend and so many dollars to be delivered. Nightly processes that fail and then have to wait for the day shift to arrive often cost the business more because the reports are delayed. > > I'm sure your wife has similar experiences as this is why these projects > > are always late to deliver and over budget.
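The manual clean-up Joost lists (delete what the crashed run inserted, reopen what it end-dated) is the kind of rollback an ETL process could do for itself. A sketch assuming Python's sqlite3 and a hypothetical table `target`; only the column names INS_PCS_ID, UPD_PCS_ID and VLD_TO come from the thread:

```python
import sqlite3


def roll_back_run(conn: sqlite3.Connection, run_id: int) -> None:
    """Undo what a crashed ETL run did, following the two statements in the thread.

    `target` is a placeholder table name; INS_PCS_ID / UPD_PCS_ID hold the run
    that inserted / last updated a row, and VLD_TO is the row's end-of-validity.
    """
    with conn:  # both statements succeed together or not at all
        # Rows the crashed run inserted are removed outright.
        conn.execute("DELETE FROM target WHERE INS_PCS_ID = ?", (run_id,))
        # Rows the crashed run closed off are made current again.
        conn.execute("UPDATE target SET VLD_TO = NULL WHERE UPD_PCS_ID = ?", (run_id,))
```

Like the SQL in the thread, this leaves UPD_PCS_ID as it is; whether it should also be cleared depends on the schema.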
> She says her projects are subject to the same universal inviolate rule > as mine: > > time and cost is always best engineering estimate times pi "Overhead, testing, maintenance, ...", yes, it all adds up. > We learn to deal with it. Which brings us back to Martin's initial > statement: a scheduler cannot deal with any of this, the job itself > must. It's an unpredictable event and schedulers can only deal with > predictable events. True, but keeping the schedules and state stored in a way that makes it easy to find out how far the whole process got makes recovery simpler. Otherwise it's often quicker to simply roll back the entire schedule and restart, even if only the last 2 of the 50 commands have yet to run. -- Joost ^ permalink raw reply [flat|nested] 52+ messages in thread
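One way to keep schedule state so that it is easy to see how far the whole process got is a checkpoint file written after every step, so a rerun skips what already finished. A sketch in Python; the JSON state file and the step names are assumptions for illustration, not any particular scheduler's format:

```python
import json
import os
from pathlib import Path
from typing import Callable


def run_with_checkpoints(steps: list[tuple[str, Callable[[], None]]], state_file: Path) -> None:
    """Run named steps in order, recording each finished step in `state_file`.

    On a rerun, steps already recorded are skipped, so only the remaining ones
    (say, the last 2 of 50) execute. An exception stops the run at that step.
    """
    done: list[str] = json.loads(state_file.read_text()) if state_file.exists() else []
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier run
        action()  # if this raises, the step is not recorded as done
        done.append(name)
        # Write a temp file, then rename over the old one, so a process crash
        # mid-write leaves the previous state file intact.
        tmp = state_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(done))
        os.replace(tmp, state_file)
```

Starting a completely fresh run is then just deleting the state file.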
* Re: [gentoo-user] Re: Recommendations for scheduler 2014-08-04 13:31 ` Martin Vaeth 2014-08-04 13:35 ` Alan McKinnon @ 2014-08-04 19:54 ` J. Roeleveld 2014-08-05 6:33 ` Martin Vaeth 1 sibling, 1 reply; 52+ messages in thread From: J. Roeleveld @ 2014-08-04 19:54 UTC (permalink / raw To: gentoo-user On 4 August 2014 15:31:40 CEST, Martin Vaeth <martin@mvath.de> wrote: >J. Roeleveld <joost@antarean.org> wrote: >>> >>> So you have a command which might break due to hardware error >>> and cannot be rerun. I cannot see how any general-purpose scheduler >>> might help you here: You either need to be able to split your command >>> into several (sequential) commands or you need something adapted >>> for your particular command. >> >> A general-purpose scheduler can work, as they do exist. > >I doubt that they can solve your problem. >Let me repeat: You have a single program which accesses the database >in a complex way and somewhere in the course of accessing it, the >machine (or program) crashes. >No general-purpose program can recover from this: You need >particular knowledge of the database and the program if you even >want to have a *chance* to recover from such a situation. >A program with such a particular knowledge can hardly be called >"general-purpose". The scheduler needs to be able to show which process failed/didn't finish. Then humans need to ensure that part finishes/reruns properly. Then humans need to be able to mark the failed process as succeeded, at which point the scheduler continues with the schedule(s). >> If, during one of these steps, the database or ETL process suffers a >> crash, the activities of the ETL process need to be rolled back to >> the point where you can restart it. > >I agree, but you need particular knowledge of the database and >your tasks to do this which is far beyond the job of a scheduler.
>As already mentioned by someone in this thread, your problem needs >to be solved on the level of the database (using >snapshot capabilities etc.) Or human intervention, which requires a clear indication of where it went wrong and a simple action to continue the schedule from where it stopped, after the humans have solved the issues and ensured consistency. >>> In order to deal with case 1., you can regularly (e.g. each minute) >>> dump the output of "schedule list" (possibly suppressing non-important >>> data through the options to keep it short). >> >> Or all the necessary information is kept in sync on persistent storage. >> This would then also allow easy fail-over if the master-schedule-node >> fails > >No, it wouldn't, since jobs just finishing and wanting to report their >status cannot do this when there is no server. You would need a rather >involved protocol to deal with such situations dynamically. >It can certainly be done, but it is not something which can >easily be "added" as a feature: If this is required, it has to be the >fundamental concept from the very beginning and everything else has to >follow this first aim. You need different protocols than TCP sockets, >to start with; something like "dbus over IP" with servers being able >to announce their new presence, etc. I think it's doable with standard networking protocols. But, either you have a master server which controls everything. Or you have a master process which has failover functionality using classical distributed software techniques. These emails are actually quite useful as I am getting a clear picture in my head of how I could approach this properly. Thanks, Joost -- Sent from my Android device with K-9 Mail. Please excuse my brevity. ^ permalink raw reply [flat|nested] 52+ messages in thread
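The workflow Joost describes (show which job failed, let humans fix it, let them mark it as succeeded, then continue) fits in a few lines. A toy, single-machine sketch in Python; the class and method names are hypothetical and say nothing about how "schedule" or any commercial tool models this:

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


class Schedule:
    """Toy sequential schedule: stops at the first failure and waits for an operator."""

    def __init__(self, jobs: dict[str, Callable[[], None]]) -> None:
        self.jobs = jobs  # dicts keep insertion order, used here as the run order
        self.status = {name: Status.PENDING for name in jobs}

    def run(self) -> None:
        for name, action in self.jobs.items():
            if self.status[name] is Status.SUCCEEDED:
                continue  # already done, or marked done by a human
            try:
                action()
            except Exception:
                self.status[name] = Status.FAILED
                return  # halt: later jobs are assumed to depend on this one
            self.status[name] = Status.SUCCEEDED

    def failed(self) -> list[str]:
        """The job(s) that need human attention."""
        return [name for name, s in self.status.items() if s is Status.FAILED]

    def mark_succeeded(self, name: str) -> None:
        """Operator override after the issue has been fixed by hand."""
        self.status[name] = Status.SUCCEEDED
```

After an operator has cleaned up and called `mark_succeeded("load")`, calling `run()` again continues with the jobs after "load" instead of starting over.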
* [gentoo-user] Re: Recommendations for scheduler 2014-08-04 19:54 ` J. Roeleveld @ 2014-08-05 6:33 ` Martin Vaeth 2014-08-05 11:32 ` J. Roeleveld 0 siblings, 1 reply; 52+ messages in thread From: Martin Vaeth @ 2014-08-05 6:33 UTC (permalink / raw To: gentoo-user J. Roeleveld <joost@antarean.org> wrote: >> >>No, it wouldn't, since jobs just finishing and wanting to report their >>status cannot do this when there is no server. You would need a rather >>involved protocol to deal with such situations dynamically. >>It can certainly be done, but it is not something which can >>easily be "added" as a feature: If this is required, it has to be the >>fundamental concept from the very beginning and everything else has to >>follow this first aim. You need different protocols than TCP sockets, >>to start with; something like "dbus over IP" with servers being able >>to announce their new presence, etc. > > I think it's doable with standard networking protocols. Yes, you can "tunnel" such a protocol over existing protocols, but "essentially" you must use a different one. Unless you want a static setup (use server A; if that fails, use server B, and server A reports everything to server B), it cannot be done simply with only one port open on the server: The client also needs a port open to be informed about the "current" server. Even worse, you need a "daemon" running for each client to handle this port. In such a case, you might make each client its own server, by spreading all changes to all clients immediately. > But, either you have a master server which controls everything. > Or you have a master process which has failover functionality > using classical distributed software techniques. This summarizes it quite well. The concept of my "schedule" is to follow the first path (with the advantage of being simple, having only one part, clients do nothing while their "task" is running).
If you want to follow the latter, you need a rather different CLI and a different protocol - which is practically everything "schedule" consists of; so it is probably simpler to rewrite this from scratch. As I said: It is not a "feature" you can easily add later on; it is a fundamental decision you must make from the very beginning. When you are at it, you should probably also encrypt the communication and establish methods for authentication, which is also something I currently omitted in "schedule" for simplicity (although this might be easier to add later on). ^ permalink raw reply [flat|nested] 52+ messages in thread
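Martin's "static setup" (use server A; if that fails, use server B) is the easy case, because the client only ever connects out. A sketch of the client side in Python with the standard socket module; the addresses and the plain-bytes message format are assumptions, and keeping server B's state in sync with server A is not shown:

```python
import socket
from typing import Optional


def report_status(message: bytes, servers: list[tuple[str, int]], timeout: float = 2.0) -> tuple[str, int]:
    """Send a job-status message to the first reachable server in a fixed list.

    Returns the address that took the message; raises ConnectionError if none did.
    """
    last_error: Optional[OSError] = None
    for address in servers:  # e.g. [server_a, server_b]: a fixed preference order
        try:
            with socket.create_connection(address, timeout=timeout) as sock:
                sock.sendall(message)
                return address
        except OSError as exc:  # refused, unreachable, timed out, ...
            last_error = exc
    raise ConnectionError(f"no scheduler server reachable (last error: {last_error})")
```

The dynamic case Martin warns about (clients learning which server is "current") is precisely what this sketch avoids by hard-coding the order.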
* Re: [gentoo-user] Re: Recommendations for scheduler 2014-08-05 6:33 ` Martin Vaeth @ 2014-08-05 11:32 ` J. Roeleveld 2014-08-08 23:21 ` Martin Vaeth 0 siblings, 1 reply; 52+ messages in thread From: J. Roeleveld @ 2014-08-05 11:32 UTC (permalink / raw To: gentoo-user On Tuesday, August 05, 2014 06:33:59 AM Martin Vaeth wrote: > J. Roeleveld <joost@antarean.org> wrote: > >>No, it wouldn't, since jobs just finishing and wanting to report their > >>status cannot do this when there is no server. You would need a rather > >>involved protocol to deal with such situations dynamically. > >>It can certainly be done, but it is not something which can > >>easily be "added" as a feature: If this is required, it has to be the > >>fundamental concept from the very beginning and everything else has to > >>follow this first aim. You need different protocols than TCP sockets, > >>to start with; something like "dbus over IP" with servers being able > >>to announce their new presence, etc. > >> > > I think it's doable with standard networking protocols. > > Yes, you can "tunnel" such a protocol over existing protocols, > but "essentially" you must use a different one. > Unless you want a static setup (use server A; if that fails, use > server B, and server A reports everything to server B), > it cannot be done simply with only > one port open on the server: The client also needs a port open > to be informed about the "current" server. Even worse, you need > a "daemon" running for each client to handle this port. > In such a case, you might make each client its own server, > by spreading all changes to all clients immediately. Not necessarily: the client listens on a port and the server connects to the clients it maintains. It then also knows when a client is dead and that the corresponding jobs have an issue. > > But, either you have a master server which controls everything.
> > Or you have a master process which has failover functionality > > using classical distributed software techniques. > > This summarizes it quite well. > The concept of my "schedule" is to follow the first path (with the > advantage of being simple, having only one part, clients do nothing > while their "task" is running). > If you want to follow the latter, you need a rather different CLI > and a different protocol - which is practically everything "schedule" > consists of; so it is probably simpler to rewrite this from scratch. > As I said: It is not a "feature" you can easily add later on; it is a > fundamental decision you must make from the very beginning. > When you are at it, you should probably also encrypt the communication > and establish methods for authentication, which is also something > I currently omitted in "schedule" for simplicity (although this might > be easier to add later on). I agree. "schedule" is good for most uses we might encounter. For the business case I have, I will need to write something myself. Thanks to this discussion we've been having, I now have a much better idea of how to approach this project. For that I am very thankful. -- Joost ^ permalink raw reply [flat|nested] 52+ messages in thread
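Joost's alternative, where each client listens on a port and the server connects to the clients it maintains, lets the server notice dead clients by itself. A minimal polling sketch in Python; the client names, addresses and the job-to-client mapping are hypothetical, and a refused or timed-out connection is simply treated as "dead":

```python
import socket


def find_dead_clients(clients: dict[str, tuple[str, int]], timeout: float = 2.0) -> set[str]:
    """Try to connect to each client's listening port; unreachable clients count as dead."""
    dead: set[str] = set()
    for name, address in clients.items():
        try:
            with socket.create_connection(address, timeout=timeout):
                pass  # connection accepted: something is listening on that port
        except OSError:  # refused, unreachable, timed out, ...
            dead.add(name)
    return dead


def jobs_with_issues(running: dict[str, str], dead: set[str]) -> list[str]:
    """Given job name -> client name, return the jobs whose client is dead."""
    return sorted(job for job, client in running.items() if client in dead)
```

A successful connect only shows that something is listening; a real client daemon would presumably also be asked about the state of its jobs.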
* [gentoo-user] Re: Recommendations for scheduler 2014-08-05 11:32 ` J. Roeleveld @ 2014-08-08 23:21 ` Martin Vaeth 0 siblings, 0 replies; 52+ messages in thread From: Martin Vaeth @ 2014-08-08 23:21 UTC (permalink / raw To: gentoo-user > On Tuesday, August 05, 2014 06:33:59 AM Martin Vaeth wrote: > >> When you are at it you should probably also encrypt the communication schedule-0.15 is finally able to use encryption, hence the current mild security risks will practically vanish, even if listening to a world-wide port. schedule-1.0 will probably soon be ready with encryption strengthened even more. ^ permalink raw reply [flat|nested] 52+ messages in thread
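The thread mentions authentication as well as encryption. As an illustration only (this is not how "schedule" does it), a shared-secret HMAC tag lets the receiver reject messages from anyone who does not know the secret. A Python sketch using the standard hmac and hashlib modules; the 32-byte tag prefix is an assumed wire format:

```python
import hashlib
import hmac

TAG_LEN = 32  # length of a SHA-256 digest in bytes


def sign(secret: bytes, message: bytes) -> bytes:
    """Prefix the message with its HMAC-SHA256 tag."""
    tag = hmac.new(secret, message, hashlib.sha256).digest()
    return tag + message


def verify(secret: bytes, packet: bytes) -> bytes:
    """Return the message if its tag matches the shared secret, else raise ValueError."""
    tag, message = packet[:TAG_LEN], packet[TAG_LEN:]
    expected = hmac.new(secret, message, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched via timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return message
```

This authenticates but does not encrypt, and it does nothing against replayed messages; for encryption, wrapping the socket in TLS (Python's ssl module) is the usual route.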
* Re: [gentoo-user] Recommendations for scheduler 2014-08-02 9:33 ` Alan McKinnon 2014-08-02 13:31 ` J. Roeleveld @ 2014-08-03 13:02 ` Tanstaafl 1 sibling, 0 replies; 52+ messages in thread From: Tanstaafl @ 2014-08-03 13:02 UTC (permalink / raw To: gentoo-user On 8/2/2014 5:33 AM, Alan McKinnon <alan.mckinnon@gmail.com> wrote: > I have an unusual boss. He's a business owner and quite naturally > profit-driven. He also employs smart people and expects us to maintain > systems in-house. > > He's also a zealous FLOSS fan. > > So when I present him a price tag for software his first question is > always "is there any free as in freedom software suited for the job?" > > I'm still trying to wrap my brains around dealing with a boss that > thinks like this:-) I am *sooooooooooo* jealous... ;) ^ permalink raw reply [flat|nested] 52+ messages in thread