On Thu, Jan 10, 2008 at 02:59:27PM +0100, Jos Houtman wrote:
> For my master thesis I took up a project that requires mapping of a number of
> statically defined parallel jobs into a more dynamic environment that allows
> better scaling.
> The situation as described below led me to believe a cluster or distributed
> queue (DrQueue?) solution is necessary. For the situation see [situation] at
> the end of this email.

Off the top of my head, many of your requirements are available in two totally
different apps:
- Gearman, written by Brad Fitzpatrick @ LiveJournal. Mainly Perl, though I
  think there are other interfaces to it as well.
- Torque/PBS - somewhat less of a fit; I'm not certain about running perpetual
  jobs.

You may also need some degree of STONITH to make sure a job runs only once in
the node-failure case. (Say the job manager crashes: the job is still running,
but you have no control over it. You need to zap it hard.)

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85