* [gentoo-cluster] examples of (large) Gentoo clusters
@ 2006-12-01 23:16 Bryan Green
2006-12-02 3:48 ` Donnie Berkholz
2006-12-02 14:40 ` Nick Anderson
0 siblings, 2 replies; 29+ messages in thread
From: Bryan Green @ 2006-12-01 23:16 UTC (permalink / raw
To: gentoo-cluster
Hello all,
I am looking for something of a survey of examples of Gentoo-driven clusters
out there. If such a survey has been done, perhaps someone could point me to it.
But I would like to hear from others on the list about their clusters.
I am in the process of advocating for using Gentoo on a new cluster that we
will be building. The cluster will be a "hyperwall", meaning that each node
will have graphics, forming a grid of displays for multi-parameter,
multi-dimensional scientific visualization. There will also be several disk
servers which will run Suse in order to get Lustre support (Lustre support
on the client side will be OS-neutral when the current beta is officially
released). In addition to graphics, the nodes will also be used for compute
jobs (scientific), and may serve as a testbed for a production scientific
computing environment.
In the process of making my case, I've been asked what other examples there
are of large Gentoo clusters. This cluster will be 128 nodes (dual socket,
dual or quad core). Of particular interest are production and/or scientific
environments - not so much database clusters, though all examples are of
interest. Use of MPI is particularly relevant. Graphics clusters are also
of interest of course.
I'd be grateful for any feedback I get from others on the list about the
clusters they maintain or use, and perhaps some comments about the efficacy
of Gentoo in an environment where stability is very important, and how
system administration compares to administration of a Suse or Redhat cluster.
Thanks,
-bryan
--
gentoo-cluster@gentoo.org mailing list
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-01 23:16 [gentoo-cluster] examples of (large) Gentoo clusters Bryan Green
@ 2006-12-02 3:48 ` Donnie Berkholz
2006-12-02 13:54 ` Hanni Ali
2006-12-02 17:57 ` Bryan Green
2006-12-02 14:40 ` Nick Anderson
1 sibling, 2 replies; 29+ messages in thread
From: Donnie Berkholz @ 2006-12-02 3:48 UTC (permalink / raw
To: gentoo-cluster
Bryan Green wrote:
> Hello all,
>
> I am looking for something of a survey of examples of Gentoo-driven clusters
> out there. If such a survey has been done, perhaps someone point me to it.
http://www.gentoo.org/proj/en/cluster/#doc_chap2
> But I would like to hear from others on the list about their clusters.
>
> I am in the process of advocating for using Gentoo on a new cluster that we
> will be building. The cluster will be a "hyperwall", meaning that each node
> will have graphics, forming a grid of displays for multi-parameter,
> multi-dimensional scientific visualization. There will also be several disk
> servers which will run Suse in order to get Lustre support (Lustre support
> on the client side will be OS-neutral when the current beta is officially
> released). In addition to graphics, the nodes will also be used for compute
> jobs (scientific), and may serve as a testbed for a production scientific
> computing environment.
Joel Martin has previously posted Lustre ebuilds to the list (for both
client and server, I think). You may be interested. We'll want to get
them into portage at some point, so there's no requirement that you use
Suse server-side.
> I'd be grateful for any feedback I get from others on the list about the
> clusters they maintain or use, and perhaps some comments about the efficacy
> of Gentoo in an environment where stability is very important, and how
> system administration compares to administration of a Suse or Redhat cluster.
The main difference is that, since we're "live," you need to consider
how you want to deal with upgrades. You may wish to pick a static
portage tree, import it into some sort of version control, and
selectively import changes you want (probably just security bumps, which
you can find using the wonderful glsa-check tool from gentoolkit).
I've got a glsa-check wrapper that I use to make things a little easier,
which shows and optionally applies applicable updates. I attached it.
Thanks,
Donnie
[-- Attachment #2: glsa-apply.sh --]
[-- Type: application/x-shellscript, Size: 1033 bytes --]
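The attached glsa-apply.sh is not reproduced in this archive. As a rough illustration of the idea described above (show the applicable GLSAs, optionally apply them), here is a minimal sketch in Python. It is not Donnie's script. It assumes that `glsa-check -t all` prints one GLSA ID per line on stdout and that `-p` and `-f` mean pretend and fix, as in gentoolkit's glsa-check. The function names and the `run` parameter are hypothetical and exist only to make the sketch testable.

```python
import subprocess


def affected_glsas(run=subprocess.run):
    """Return the GLSA IDs that `glsa-check -t all` reports for this system."""
    # check=False: glsa-check may exit non-zero when the system is affected.
    result = run(["glsa-check", "-t", "all"],
                 capture_output=True, text=True, check=False)
    ids = []
    for line in result.stdout.splitlines():
        line = line.strip()
        # GLSA IDs look like 200612-03 (YYYYMM-NN); ignore any other output.
        if len(line) == 9 and line[6] == "-" and line.replace("-", "").isdigit():
            ids.append(line)
    return ids


def apply_glsas(ids, pretend=True, run=subprocess.run):
    """Show (pretend=True) or apply (pretend=False) the fix for each GLSA."""
    flag = "-p" if pretend else "-f"
    for glsa in ids:
        run(["glsa-check", flag, glsa], check=False)
```

On a node, `apply_glsas(affected_glsas())` would only show what would be merged; passing `pretend=False` applies the fixes. Keeping "pretend" as the default fits a cluster where stability matters more than being current.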
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-02 3:48 ` Donnie Berkholz
@ 2006-12-02 13:54 ` Hanni Ali
2006-12-02 18:12 ` Bryan Green
2006-12-02 17:57 ` Bryan Green
1 sibling, 1 reply; 29+ messages in thread
From: Hanni Ali @ 2006-12-02 13:54 UTC (permalink / raw
To: gentoo-cluster
Hi Bryan,
I run a startup that provides Gentoo-based clusters for a wide variety of
applications. I find Gentoo far simpler to maintain and run, because Portage
simplifies maintenance so much.
> The cluster will be a "hyperwall", meaning that each node
> will have graphics, forming a grid of displays for multi-parameter,
> multi-dimensional scientific visualization.
Sounds fascinating. I do hope you will report how it goes and what method
you use to achieve this.
> the nodes will also be used for compute
> jobs (scientific), and may serve as a testbed for a production scientific
> computing environment.
In my experience MPI works very well on Gentoo.
> I'd be grateful for any feedback I get from others on the list about the
> clusters they maintain or use, and perhaps some comments about the efficacy
> of Gentoo in an environment where stability is very important, and how
> system administration compares to administration of a Suse or Redhat cluster.
I always find system administration far easier with Gentoo, but with respect
to clusters I think the architecture of the cluster is as important as the OS.
I always recommend diskless nodes, although local disks for replication etc.
are fine. Keeping the important parts central and providing an image for the
nodes to boot reduces the chance of stray mistakes. This also allows you to
improve stability by testing a new image on one node before deploying it
across the cluster.
KlustOS (our OS), which is Gentoo-based, has been designed to scale to
hundreds if not thousands of nodes, and I believe Gentoo is more than capable
of running large production clusters stably. Donnie's advice about security
updates in combination with a testing image is useful as well.
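
The "test a new image on one node before deploying across the cluster" approach described above can be sketched as a staged rollout. This is only an illustration, not how KlustOS works. The function names, the batch size of 32, and the `boot_image` and `healthy` callbacks are all hypothetical and would be supplied by whatever provisioning and monitoring tools a site already uses.

```python
def rollout_stages(nodes, canary_count=1, batch_size=32):
    """Split nodes into a canary stage followed by fixed-size batches."""
    if canary_count < 1 or batch_size < 1:
        raise ValueError("canary_count and batch_size must be positive")
    stages = [nodes[:canary_count]]
    rest = nodes[canary_count:]
    for i in range(0, len(rest), batch_size):
        stages.append(rest[i:i + batch_size])
    return stages


def deploy(nodes, boot_image, healthy, **kwargs):
    """Boot each stage on the new image; stop at the first unhealthy stage.

    Returns (nodes updated successfully, the stage that failed or []).
    """
    done = []
    for stage in rollout_stages(nodes, **kwargs):
        for node in stage:
            boot_image(node)
        if not all(healthy(node) for node in stage):
            return done, stage
        done.extend(stage)
    return done, []
```

For a 128-node cluster with the defaults, this yields one canary node followed by batches of 32, 32, 32 and 31. A bad image therefore affects a single node before anything else is touched.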
Hanni
--
E-mail: hanni.ali@gmail.com
Mobile: 07985580147
Website: www.ainkaboot.co.uk
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-01 23:16 [gentoo-cluster] examples of (large) Gentoo clusters Bryan Green
2006-12-02 3:48 ` Donnie Berkholz
@ 2006-12-02 14:40 ` Nick Anderson
1 sibling, 0 replies; 29+ messages in thread
From: Nick Anderson @ 2006-12-02 14:40 UTC (permalink / raw
To: gentoo-cluster
Unfortunately we haven't done a Gentoo cluster, though with Portage
maintenance would be easy. I did want to comment on your processor choice. If
you choose quad core, your only option currently is Intel Clovertown. These
machines are power hungry. I would recommend waiting for the quad-core Opteron
chips that will be coming out in '07. If you're going dual core, I would
recommend Opterons. They don't require the fully buffered DIMMs, which in
our testing seem to draw about 15 W per DIMM. The other thing to consider is
the characteristics of your code: what we have seen from the Intel CPUs is
that jobs run quite well if they are serial, but for parallel
jobs the Opterons still win out.
Just some food for thought ....
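
To put the 15 W per DIMM figure in context, here is a rough worked estimate for the 128-node cluster from the original post. The 8 DIMMs per node is an assumed value, not something stated in the thread. The result is the total FB-DIMM draw, not the saving against other memory types, which also draw power.

```python
def fbdimm_power_watts(nodes, dimms_per_node, watts_per_dimm=15.0):
    """Total memory power draw, using the ~15 W per FB-DIMM figure above."""
    return nodes * dimms_per_node * watts_per_dimm


# 128 nodes (from the original post) with an ASSUMED 8 DIMMs per node.
total = fbdimm_power_watts(nodes=128, dimms_per_node=8)
print(f"{total / 1000:.2f} kW")  # prints "15.36 kW"
```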
On Friday 01 December 2006 17:16, Bryan Green wrote:
> Hello all,
>
> I am looking for something of a survey of examples of Gentoo-driven
> clusters out there. If such a survey has been done, perhaps someone point
> me to it. But I would like to hear from others on the list about their
> clusters.
>
> I am in the process of advocating for using Gentoo on a new cluster that we
> will be building. The cluster will be a "hyperwall", meaning that each
> node will have graphics, forming a grid of displays for multi-parameter,
> multi-dimensional scientific visualization. There will also be several
> disk servers which will run Suse in order to get Lustre support (Lustre
> support on the client side will be OS-neutral when the current beta is
> officially released). In addition to graphics, the nodes will also be used
> for compute jobs (scientific), and may serve as a testbed for a production
> scientific computing environment.
>
> In the process of making my case, I've been asked what other examples there
> are of large Gentoo clusters. This cluster will be 128 nodes (dual socket,
> dual or quad core). Of particular interest are production and/or
> scientific environments - not so much database clusters, though all
> examples are of interest. Use of MPI is particularly relevant. Graphics
> clusters are also of interest of course.
>
> I'd be grateful for any feedback I get from others on the list about the
> clusters they maintain or use, and perhaps some comments about the efficacy
> of Gentoo in an environment where stability is very important, and how
> system administration compares to administration of a Suse or Redhat
> cluster.
>
> Thanks,
> -bryan
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-02 3:48 ` Donnie Berkholz
2006-12-02 13:54 ` Hanni Ali
@ 2006-12-02 17:57 ` Bryan Green
2006-12-02 19:51 ` Philipp Riegger
2006-12-03 3:49 ` Donnie Berkholz
1 sibling, 2 replies; 29+ messages in thread
From: Bryan Green @ 2006-12-02 17:57 UTC (permalink / raw
To: gentoo-cluster
Donnie Berkholz writes:
> Bryan Green wrote:
> > I am looking for something of a survey of examples of Gentoo-driven clusters
> > out there. If such a survey has been done, perhaps someone point me to it.
>
> http://www.gentoo.org/proj/en/cluster/#doc_chap2
Whoa, I don't know how I missed that page. Thanks!
>
> Joel Martin has previously posted Lustre ebuilds to the list (for both
> client and server, I thinkg). You may be interested. We'll want to get
> them into portage at some point, so there's no requirement that you use
> Suse server-side.
Yes, I actually used those ebuilds to test Lustre on our "mini" 3x3
hyperwall which runs Gentoo. I was able to get it working, but over here
they want the supported, released version, whereas those ebuilds are for the
beta. I tried to install the released version, but eventually ran into
problems. Also, since getting support from CFS is a requirement, that
restricts the OS choice to specific versions of Suse or Redhat.
The beta was originally supposed to become stable and supported by January, but
that has now apparently been pushed out to March. :( The only chance of putting
Gentoo on the nodes of this cluster is if we can decide to go with the
version that is still currently in beta. This is because the beta, version
1.6, has a "patchless client", and so CFS is agnostic about OS on the client
side. For the server side, support=Suse as far as anyone I've talked to is
concerned.
>
> > I'd be grateful for any feedback I get from others on the list about the
> > clusters they maintain or use, and perhaps some comments about the efficacy
> > of Gentoo in an environment where stability is very important, and how
> > system administration compares to administration of a Suse or Redhat cluster.
>
> The main difference is that, since we're "live," you need to consider
> how you want to deal with upgrades. You may wish to pick a static
> portage tree, import it into some sort of version control, and
> selectively import changes you want (probably just security bumps, which
> you can find using the wonderful glsa-check tool from gentoolkit).
>
> I've got a glsa-check wrapper that I use to make things a little easier,
> which shows and optionally applies applicable updates. I attached it.
>
I'd be very interested in the different approaches here. I had thought about a
static portage tree, but that left the problem of getting needed updates,
especially GLSAs. Your suggested approach sounds very interesting.
How big of an extra administrative burden does that create? Maintaining our
own version-controlled portage tree might be a hard sell. Thanks for the
script - I'll take a look at it. Is there any documentation out there about
a static portage tree?
-bryan
P.S.: I checked out SiCortex at SC06, and talked to one of the guys there.
It's definitely Gentoo. It sounds like they are a bunch of Gentoo
enthusiasts, actually.
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-02 13:54 ` Hanni Ali
@ 2006-12-02 18:12 ` Bryan Green
0 siblings, 0 replies; 29+ messages in thread
From: Bryan Green @ 2006-12-02 18:12 UTC (permalink / raw
To: gentoo-cluster
"Hanni Ali" writes:
>
> The cluster will be a "hyperwall", meaning that each node
> > > will have graphics, forming a grid of displays for multi-parameter,
> > > multi-dimensional scientific visualization.
>
>
> Sounds fascinating I do hope you will report how it goes and what method
> you use to achieve this.
We already have a 3x3 hyperwall running Gentoo, but that's a lot simpler than
what we will be building. If you are interested, our mini got a little
publicity a year ago in the Gentoo Weekly Newsletter:
http://www.gentoo.org/news/en/gwn/20051205-newsletter.xml#doc_chap2
We also just had a paper published in IEEE Transactions on Visualization and
Computer Graphics. The graphics cluster described is our 7x7, which is
rather old and still running Fedora Core 2.
http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=4015457
The new cluster will be designed to do more of what is described in that
paper.
-bryan
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-02 17:57 ` Bryan Green
@ 2006-12-02 19:51 ` Philipp Riegger
2006-12-03 3:49 ` Donnie Berkholz
1 sibling, 0 replies; 29+ messages in thread
From: Philipp Riegger @ 2006-12-02 19:51 UTC (permalink / raw
To: gentoo-cluster
On Dec 2, 2006, at 7:57 PM, Bryan Green wrote:
> I'd very interested in the different approaches here. I had thought about a
> static portage tree, but that left the problem of getting needed updates,
> especially GLSA's. Your suggested approach sounds very interesting.
> How big of an extra administrative burden does that create? Maintaining our
> own version controlled portage tree might be a hard sell. Thanks for the
> script - I'll take a look at it. Is there any documentation out there about
> a static portage tree?
On gentoo-dev there is a discussion going on about a sort of Gentoo
stable tree. Chris Gianelloni (if I remember correctly) stated
that he wanted to create a 2007.1 tree with the 2007.1 release and
only put security fixes, and the packages those fixes require, into it.
To be clear: he wants to take a snapshot of the tree when
2007.1 is released and then maintain it as described above.
_But_ there are about 50 more unread messages from that thread in my
mailbox, so this might not be true anymore. Look at the gentoo-dev
archives.
Philipp
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-02 17:57 ` Bryan Green
2006-12-02 19:51 ` Philipp Riegger
@ 2006-12-03 3:49 ` Donnie Berkholz
2006-12-04 18:15 ` Bryan Green
1 sibling, 1 reply; 29+ messages in thread
From: Donnie Berkholz @ 2006-12-03 3:49 UTC (permalink / raw
To: gentoo-cluster
Bryan Green wrote:
> Yes, I actually used those ebuilds to test Lustre on our "mini" 3x3
> hyperwall which runs Gentoo. I was able to get it working, but over here
> they want the supported, released version, whereas those ebuilds are for the
> beta. I tried to install the released version, but eventually ran into
> problems. Also, since getting support from CFS is a requirement, that
> restricts the OS choice to specific versions of Suse or Redhat.
I guess that means we should get in touch with them to get on the
supported systems list. =)
> I'd very interested in the different approaches here. I had thought about a
> static portage tree, but that left the problem of getting needed updates,
> especially GLSA's. Your suggested approach sounds very interesting.
> How big of an extra administrative burden does that create? Maintaining our
> own version controlled portage tree might be a hard sell. Thanks for the
> script - I'll take a look at it. Is there any documentation out there about
> a static portage tree?
The OSL (Open Source Lab), which hosts much of the Gentoo infrastructure
and runs a lot of other projects on Gentoo boxes, takes a similar
approach to what I mentioned above. I think you already know Corey
Shields, so you could ask him about it.
You may also want to take a look at
http://article.gmane.org/gmane.linux.gentoo.devel/43984 -- it's from one
of our developers who's deployed fairly decent-sized clusters.
Thanks,
Donnie
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-03 3:49 ` Donnie Berkholz
@ 2006-12-04 18:15 ` Bryan Green
2006-12-04 20:53 ` Donnie Berkholz
0 siblings, 1 reply; 29+ messages in thread
From: Bryan Green @ 2006-12-04 18:15 UTC (permalink / raw
To: gentoo-cluster
Donnie Berkholz writes:
> Bryan Green wrote:
> > Yes, I actually used those ebuilds to test Lustre on our "mini" 3x3
> > hyperwall which runs Gentoo. I was able to get it working, but over here
> > they want the supported, released version, whereas those ebuilds are for the
> > beta. I tried to install the released version, but eventually ran into
> > problems. Also, since getting support from CFS is a requirement, that
> > restricts the OS choice to specific versions of Suse or Redhat.
>
> I guess that means we should get in touch with them to get on the
> supported systems list. =)
Sounds like a fine idea to me. :)
I talked to someone there at SC06, but they did not sound terribly open to
the idea of directing precious resources in that direction. But it would be
great to convince them otherwise.
-bryan
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-04 18:15 ` Bryan Green
@ 2006-12-04 20:53 ` Donnie Berkholz
2006-12-04 22:35 ` Hanni Ali
2006-12-04 23:00 ` Bryan Green
0 siblings, 2 replies; 29+ messages in thread
From: Donnie Berkholz @ 2006-12-04 20:53 UTC (permalink / raw
To: gentoo-cluster
Bryan Green wrote:
> Donnie Berkholz writes:
>> I guess that means we should get in touch with them to get on the
>> supported systems list. =)
>
> Sounds like a fine idea to me. :)
> I talked to someone there at SC06, but they did not sound terribly open to
> the idea of directing precious resources in that direction. But it would be
> great to convince them otherwise.
We would probably need to show that there are a decent number of places
that would like to (or already do) use Lustre on Gentoo, and would be
interested in paid support. We've now got at least you and SiCortex
using it, but not sure about interest in support from SiCortex. Anyone else?
Thanks,
Donnie
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-04 20:53 ` Donnie Berkholz
@ 2006-12-04 22:35 ` Hanni Ali
2006-12-04 23:00 ` Bryan Green
1 sibling, 0 replies; 29+ messages in thread
From: Hanni Ali @ 2006-12-04 22:35 UTC (permalink / raw
To: gentoo-cluster
Ainkaboot and our customers would probably be interested in Lustre and paid
support.
Hanni
On 04/12/06, Donnie Berkholz <dberkholz@gentoo.org> wrote:
>
> Bryan Green wrote:
> > Donnie Berkholz writes:
> >> I guess that means we should get in touch with them to get on the
> >> supported systems list. =)
> >
> > Sounds like a fine idea to me. :)
> > > I talked to someone there at SC06, but they did not sound terribly open to
> > > the idea of directing precious resources in that direction. But it would be
> > > great to convince them otherwise.
>
> We would probably need to show that there are a decent number of places
> that would like to (or already do) use Lustre on Gentoo, and would be
> interested in paid support. We've now got at least you and SiCortex
> using it, but not sure about interest in support from SiCortex. Anyone else?
>
> Thanks,
> Donnie
--
E-mail: hanni.ali@gmail.com
Mobile: 07985580147
Website: www.ainkaboot.co.uk
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-04 20:53 ` Donnie Berkholz
2006-12-04 22:35 ` Hanni Ali
@ 2006-12-04 23:00 ` Bryan Green
2006-12-04 23:55 ` Daniel van Ham Colchete
1 sibling, 1 reply; 29+ messages in thread
From: Bryan Green @ 2006-12-04 23:00 UTC (permalink / raw
To: gentoo-cluster
Donnie Berkholz writes:
> Bryan Green wrote:
> > Donnie Berkholz writes:
> >> I guess that means we should get in touch with them to get on the
> >> supported systems list. =)
> >
> > Sounds like a fine idea to me. :)
> > I talked to someone there at SC06, but they did not sound terribly open to
> > the idea of directing precious resources in that direction. But it would be
> > great to convince them otherwise.
>
> We would probably need to show that there are a decent number of places
> that would like to (or already do) use Lustre on Gentoo, and would be
> interested in paid support. We've now got at least you and SiCortex
> using it, but not sure about interest in support from SiCortex. Anyone else?
>
I wonder... They are going to be OS agnostic on the client side when 1.6
comes out, because of the "patchless client", i.e. the kernel on the client
side does not need to be patched.
On the server side, what is missing is a patched gentoo-sources or
vanilla-sources kernel. But we know that there is a lustre-kernel ebuild
out there. Depending on the issues involved, getting them to support Gentoo
may just be a matter of getting them to support the lustre-kernel package.
-bryan
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-04 23:00 ` Bryan Green
@ 2006-12-04 23:55 ` Daniel van Ham Colchete
2006-12-05 3:55 ` Donnie Berkholz
2006-12-05 13:18 ` John R. Dunning
0 siblings, 2 replies; 29+ messages in thread
From: Daniel van Ham Colchete @ 2006-12-04 23:55 UTC (permalink / raw
To: gentoo-cluster
I studied Lustre a little last week and, regarding the MDSs and
OSSs, I came up with one reason for them not to support Lustre on
Gentoo: Lustre uses a lot of kernel features that, if not enabled, will
cause the kernel to crash.
I didn't find any documentation explaining those features, but I could
make a list of the obvious ones: LVM, DM, ext3, ...
I think that even they can't make a list of all those features, which
is why they have to make Lustre available mainly on pre-compiled /
pre-configured kernels. And, thank God, Gentoo doesn't have a
predefined kernel, although having one would make it easy for them to change
and distribute it.
What do you think about my idea?
But that leads to a more generic question: if Linux is always Linux
(the kernel), and the distro is only a way to organize packages, files
and init scripts, why would anyone need to restrict open source
software to a distro? If my first assumption is right, the quick (but
not necessarily well thought out) answer would be: lack of knowledge.
Best,
Daniel Colchete
-
On 12/4/06, Bryan Green <bgreen@nas.nasa.gov> wrote:
> I wonder... They are going to be OS agnostic on the client side when 1.6
> comes out, because of the "patchless client", i.e. the kernel on the client
> side does not need to be patched.
> On the server side, what is missing is a patched gentoo-sources or
> vanilla-sources kernel. But we know that there is a lustre-kernel ebuild
> out there. Depending on the issues involved, getting them to support Gentoo
> may just be a matter of getting them to support the lustre-kernel package.
>
> -bryan
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-04 23:55 ` Daniel van Ham Colchete
@ 2006-12-05 3:55 ` Donnie Berkholz
2006-12-05 13:18 ` John R. Dunning
1 sibling, 0 replies; 29+ messages in thread
From: Donnie Berkholz @ 2006-12-05 3:55 UTC (permalink / raw
To: gentoo-cluster
Daniel van Ham Colchete wrote:
> I studied Lustre last week a little bit and, talking about MDSs and
> OSSs, I came with one reason for them not to make Lustre to support
> Gentoo: Lustre uses a lot of kernel features that if not enabled will
> cause the kernel to crash.
>
> I didn't find any documentation explaning those features but I could
> make a list of the orbivious ones: LVM, DM, ext3, ...
>
> I think that even they can't make a list of all those features, that
> is why they have to make Lustre available mainly on pre-compiled /
> pre-configured kernels. And, thank God, Gentoo doesn't have a
> predefined kernel. Although that would make easy for them to change
> and distribute it.
>
> What do you think about my ideia?
It really shouldn't be that difficult to add in features until it stops
crashing, then specify those features as dependencies in the kernel
build system.
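
As a rough sketch of what "specify those features as dependencies" could look like in practice, the snippet below checks a kernel `.config` for a list of required options. The list is only an assumption based on the features guessed at above (LVM, DM, ext3), mapped to the Kconfig symbols `CONFIG_MD`, `CONFIG_BLK_DEV_DM` and `CONFIG_EXT3_FS`. It is not a documented Lustre requirement, and the function names are hypothetical.

```python
# ASSUMED option list for illustration only; not taken from Lustre docs.
REQUIRED = ("CONFIG_MD", "CONFIG_BLK_DEV_DM", "CONFIG_EXT3_FS")


def parse_kconfig(text):
    """Parse kernel .config text into a dict of option -> value."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        # Unset options appear as comments: "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            opts[key] = value
    return opts


def missing_options(text, required=REQUIRED):
    """Return the required options that are neither built in (y) nor modular (m)."""
    opts = parse_kconfig(text)
    return [name for name in required if opts.get(name) not in ("y", "m")]
```

Running `missing_options(open("/usr/src/linux/.config").read())` before building would report the options still to be enabled. The path is the conventional Gentoo kernel source location and may differ on a given system.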
> But that leads to a more generic question: if Linux is always Linux
> (the kernel), and the distro is only a way to organize packages, files
> and init scripts, why would anyone need restrict an open source
> software to a distro? If my first assumption is right, the quicky (but
> not necessarily well thought) answer would be: lack of knowledge.
Sure. If they offer to support Lustre on a distribution, they need to be
able to fix problems on that distribution. That means being aware of
possible distribution-specific interactions that could cause issues and
also knowing how to deal with them as well as reproduce them locally.
Thanks,
Donnie
* [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-04 23:55 ` Daniel van Ham Colchete
2006-12-05 3:55 ` Donnie Berkholz
@ 2006-12-05 13:18 ` John R. Dunning
2006-12-05 16:25 ` Bryan Green
2006-12-05 21:15 ` Daniel van Ham Colchete
1 sibling, 2 replies; 29+ messages in thread
From: John R. Dunning @ 2006-12-05 13:18 UTC (permalink / raw
To: gentoo-cluster
From: "Daniel van Ham Colchete" <daniel.colchete@gmail.com>
Date: Mon, 4 Dec 2006 21:55:12 -0200
[...]
Lustre uses a lot of kernel features that if not enabled will
cause the kernel to crash.
[...]
I don't think that's true.
I've been running lustre on assorted kernels, mostly under gentoo dists, for
some months, and found that once you get past the installation issues, it's
pretty trouble free.
Now, note the caveats there: The installation issues are non-trivial, mostly
because lustre is very intrusive into the vfs layer. This causes no end of
headaches integrating with various other peoples' kernel patches, due to
collisions with the other peoples' patches to vfs. That statement is as true
of gentoo as it is of any kernel other than the few for which they supply
canned patchsets. But I've never seen anything in there that constitutes using
a kernel feature which causes the kernel to crash if not enabled. The closest
thing I've seen to that is if you muff the patch merging and end up with an
inconsistent patchset, that generally leads to a crash :-}
Lustre 1.6 (at least the client end) doesn't even really *require* all those
kernel patches, i.e. they do support the idea of a patchless client. The issue
is that lustre changes the logic involved in various kinds of fs operations,
including anything related to lookups, so as to short-circuit much of the work
involved when it figures out that it can do so. Running the client without
the patches will work, but it won't give you the performance that you'd get
with the patches. So odds are anybody who's interested in running lustre in
the first place probably wants the patches too.
Lustre also is not restricted to precompiled kernels; their build script
contains patchsets for things other than their recommended redhat and suse
ones. We routinely compile it up for all kinds of experimental kernels with
no trouble. Again, once you've gotten over the hurdle of getting the kernel
patches integrated, the rest of it behaves reasonably well. The reason CFS
advocates the small number of kernels they do is because they know what a pain
it is to draw outside the lines, and they try to steer people away from that.
We at SiCortex are planning on rolling out a Gentoo-based cluster that depends
heavily on Lustre, so we've spent a fair bit of time banging on it. I'm
pretty sure we understand it at this point. We'll know for sure soon :-}
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-05 13:18 ` John R. Dunning
@ 2006-12-05 16:25 ` Bryan Green
2006-12-05 21:15 ` Daniel van Ham Colchete
1 sibling, 0 replies; 29+ messages in thread
From: Bryan Green @ 2006-12-05 16:25 UTC (permalink / raw
To: gentoo-cluster
"John R. Dunning" writes:
>
> Lustre 1.6 (at least the client end) doesn't even really *require* all those
> kernel patches, ie they do support the idea of a patchless client. The issue
> is that lustre changes the logic involved in various kinds of fs operations,
> including anything related to lookups, so as to short-circuit much of the work
> involved when it figures out that it can do so. Running the client without
> the patches will work, but it won't give you the performance that you'd get
> with the patches. So odds are anybody who's interested in running lustre in
> the first place probably wants the patches too.
I hadn't realized that the patchless client was potentially
lower-performance than a patched client. Are you sure about that?
How much of a difference do you think it is?
Are you using version 1.6 or 1.4?
>
> We at sicortex are planning on rolling out a gentoo-based cluster that depends
> heavily on lustre, so we've spent a fair bit of time banging on it. I'm
> pretty sure we understand it at this point. We'll know for sure soon :-}
Do you get support from CFS? It seems pretty clear that you do not.
What kernel versions do you use?
-bryan
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-05 13:18 ` John R. Dunning
2006-12-05 16:25 ` Bryan Green
@ 2006-12-05 21:15 ` Daniel van Ham Colchete
2006-12-05 21:22 ` Bryan Green
2006-12-05 21:28 ` John R. Dunning
1 sibling, 2 replies; 29+ messages in thread
From: Daniel van Ham Colchete @ 2006-12-05 21:15 UTC (permalink / raw
To: gentoo-cluster
On 12/5/06, John R. Dunning <jrd@sicortex.com> wrote:
> From: "Daniel van Ham Colchete" <daniel.colchete@gmail.com>
> Date: Mon, 4 Dec 2006 21:55:12 -0200
> [...]
> Lustre uses a lot of kernel features that if not enabled will
> cause the kernel to crash.
> [...]
> I don't think that's true.
>
> I've been running lustre on assorted kernels, mostly under gentoo dists, for
> some months, and found that once you get past the installation issues, it's
> pretty trouble free.
>
> Now, note the caveats there: The installation issues are non-trivial, mostly
> because lustre is very intrusive into the vfs layer. This causes no end of
> headaches integrating with various other peoples' kernel patches, due to
> collisions with the other peoples' patches to vfs. That statement is as true
> of gentoo as it is of any kernel other than the few for which they supply
> canned patchsets. But I've never seen anything in there that constitutes using
> a kernel feature which causes the kernel to crash if not enabled. The closest
> thing I've seen to that is if you muff the patch merging and end up with an
> inconsistent patchset, that generally leads to a crash :-}
Well, my first Lustre test was crashing on every 'write' operation.
Then I enabled LVM and it worked. I'm using only the vanilla 2.6.12.6
kernel with the latest 1.4 release.
I have another machine with the same kernel that crashes every time I
try to use Lustre over the network, either as a client or as a server.
Locally it works perfectly. But I'm still trying to learn it and I
think I still have to spend plenty of time studying it :-).
> Lustre 1.6 (at least the client end) doesn't even really *require* all those
> kernel patches, ie they do support the idea of a patchless client.
That's a very good point.
> We at sicortex are planning on rolling out a gentoo-based cluster that depends
> heavily on lustre, so we've spent a fair bit of time banging on it. I'm
> pretty sure we understand it at this point. We'll know for sure soon :-}
Question: would you use Lustre 1.6 now or you would wait until the
official version is out?
Question: do you expect an upgrade incompatibility between the current
1.6 beta and next betas or the official version?
Best regards,
Daniel Colchete
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-05 21:15 ` Daniel van Ham Colchete
@ 2006-12-05 21:22 ` Bryan Green
2006-12-05 21:28 ` John R. Dunning
1 sibling, 0 replies; 29+ messages in thread
From: Bryan Green @ 2006-12-05 21:22 UTC (permalink / raw
To: gentoo-cluster
"Daniel van Ham Colchete" writes:
>
> Question: do you expect an upgrade incompatibility between the current
> 1.6 beta and next betas or the official version?
According to the person that I talked to at CFS, the beta is pretty much
finished, and they are waiting for tester feedback before releasing it as
1.6. However, I just learned that they pushed the release back from January
to March, so perhaps that means another beta release in between. I don't
know.
-bryan
* [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-05 21:15 ` Daniel van Ham Colchete
2006-12-05 21:22 ` Bryan Green
@ 2006-12-05 21:28 ` John R. Dunning
2006-12-07 0:33 ` Bryan Green
1 sibling, 1 reply; 29+ messages in thread
From: John R. Dunning @ 2006-12-05 21:28 UTC (permalink / raw
To: gentoo-cluster
    From: "Daniel van Ham Colchete" <daniel.colchete@gmail.com>
    Date: Tue, 5 Dec 2006 19:15:49 -0200
    Well, my first Lustre test was crashing on every 'write' operation.
    Then I enabled LVM and it worked. I'm using only the vanilla 2.6.12.6
    kernel with the latest 1.4 release.
I'd say something's mangled in your kernel/patches. Perhaps due to 1.4; I went
to 1.6 as soon as I was able to, and have no experience with the latest and
greatest 1.4.
    Question: would you use Lustre 1.6 now or you would wait until the
    official version is out?
If I had to ship today, I'd probably ship the 1.6b5 code. I find lustre 1.4
much more of a headache to configure and manage. Thankfully, I don't have to
ship today; I expect by the time I do, cfs will have released the real 1.6
code.
    Question: do you expect an upgrade incompatibility between the current
    1.6 beta and next betas or the official version?
What variety of incompatibility? On-disk format? On-the-wire format?
Something else? The short answer is no, in general the cfs guys seem to do a
pretty good job at making that stuff backward compatible. Having said that,
there was some kind of an incompatibility between 1.6b4 and 1.6b5. So I guess
they don't get it right all the time :-}
The slightly longer answer is "ask cfs". I believe the answer you'll get is
that they claim compatibility for one prior release, and that they make no
claims about compatibility of beta code with anything else.
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-05 21:28 ` John R. Dunning
@ 2006-12-07 0:33 ` Bryan Green
2006-12-07 13:12 ` John R. Dunning
0 siblings, 1 reply; 29+ messages in thread
From: Bryan Green @ 2006-12-07 0:33 UTC (permalink / raw
To: gentoo-cluster
"John R. Dunning" writes:
> From: "Daniel van Ham Colchete" <daniel.colchete@gmail.com>
> Date: Tue, 5 Dec 2006 19:15:49 -0200
>
> Question: would you use Lustre 1.6 now or you would wait until the
> official version is out?
>
> If I had to ship today, I'd probably ship the 1.6b5 code. I find lustre 1.4
> much more of a headache to configure and manage. Thankfully, I don't have to
> ship today; I expect by the time I do, cfs will have released the real 1.6
> code.
It is encouraging to hear that you are willing to base a product on Lustre
1.6. Are you by any chance willing to share some of your knowledge about
installing Lustre on Gentoo with others? :) Perhaps I could make
self-support an option, if it looked like it would be reliable.
-bryan
* [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-07 0:33 ` Bryan Green
@ 2006-12-07 13:12 ` John R. Dunning
2006-12-08 3:56 ` Bryan Green
0 siblings, 1 reply; 29+ messages in thread
From: John R. Dunning @ 2006-12-07 13:12 UTC (permalink / raw
To: gentoo-cluster
    From: Bryan Green <bgreen@nas.nasa.gov>
    Date: Wed, 06 Dec 2006 16:33:12 -0800
    "John R. Dunning" writes:
    > From: "Daniel van Ham Colchete" <daniel.colchete@gmail.com>
    > Date: Tue, 5 Dec 2006 19:15:49 -0200
    >
    > Question: would you use Lustre 1.6 now or you would wait until the
    > official version is out?
    >
    > If I had to ship today, I'd probably ship the 1.6b5 code. I find lustre 1.4
    > much more of a headache to configure and manage. Thankfully, I don't have to
    > ship today; I expect by the time I do, cfs will have released the real 1.6
    > code.
    It is encouraging to hear that you are willing to base a product on Lustre
    1.6.
There are problems either way, but based on my experience, I believe 1.6 is a
better choice, at least for the kind of situation I'm expecting to see.
That's based partly on the fact that in my testing I've seen a pretty small
quotient of out-and-out bugs (though there are a couple which are pretty
annoying) and partly on the fact that configuration and management-wise, 1.6
is way easier to deal with. Part of what I expect will be happening in
deployments is to be building lustrefs's on the fly, under control of some
kind of configurator thingie. For that kind of task, 1.4 would be much more
difficult to deal with.
We have a test gentoo cluster system which runs with lustre as its rootfs. It
essentially "just works". I've run numerous benchmarks and tests on it,
including bonnie, iozone, ltp, and assorted bits of application code; for the
most part it's been trouble-free, and the performance is generally pretty
good. There are a few areas where, due to the properties of lustre, things
run unexpectedly slow, but for my purposes, they're all things that can be
lived with. What I conclude from all that is that it's good enough for me to
consider shipping it as part of a product while still being able to sleep at
night :-}
    Are you by any chance willing to share some of your knowledge about
    installing Lustre on Gentoo with others? :)
Sure.
Are you worrying about the kernel patching and other software installation
issues, or about how to set up the fs itself once you've got the software
together?
Very briefly, the kernel-patching issue is an ongoing headache. Lustre
patches vfs in non-trivial ways. Unfortunately, everybody else does too. It
becomes a fairly ugly patch-merging problem. If you want, I can detail the
process I've settled on for coming up with a kernel patchset, but you won't
like it. There are similar issues around ldiskfs and other bits, but they're
simpler, at least by comparison.
Once the software is installed, configuring the fs goes pretty much by the
book. mkfs.lustre, mount -t lustre, lfs, and lctl are your friends. You'll
have some work to do deciding what your architecture is, in terms of how many
OSTs of what type, what's the interconnect topology which will get you the
best throughput etc, but there aren't really any landmines in there. I've
only worked with the failover stuff a small amount, so can't really say a lot
about that, but the time I did play with it, it seemed to work as advertised.
If you are looking for more detail on something specific, I'm happy to say
what little I know about it.
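As a rough illustration of the "by the book" sequence mentioned above, the
sketch below assembles the commands for a minimal 1.6 setup: one combined
MGS/MDT, a couple of OSTs, and a client mount. It is only a sketch under
assumptions: the filesystem name, device paths, node name (mds1@tcp0) and
mount points are made-up placeholders, the helper function is hypothetical,
and the mkfs.lustre flags shown (--fsname, --mgs, --mdt, --ost, --mgsnode)
should be checked against the manual for the release in use. Nothing is
executed; the commands are only printed.

```python
def lustre_setup_commands(fsname, mgs_nid, mdt_dev, ost_devs, client_mount):
    """Return argv lists for a small Lustre 1.6 style setup (nothing is run).

    All names passed in are placeholders chosen by the caller.
    """
    cmds = []
    # Format and mount a combined MGS/MDT on the metadata server.
    cmds.append(["mkfs.lustre", f"--fsname={fsname}", "--mgs", "--mdt", mdt_dev])
    cmds.append(["mount", "-t", "lustre", mdt_dev, "/mnt/mdt"])
    # Format and mount one OST per block device, each pointed at the MGS.
    for index, dev in enumerate(ost_devs):
        cmds.append(["mkfs.lustre", f"--fsname={fsname}", "--ost",
                     f"--mgsnode={mgs_nid}", dev])
        cmds.append(["mount", "-t", "lustre", dev, f"/mnt/ost{index}"])
    # Finally, the client mounts the filesystem by MGS node and fs name.
    cmds.append(["mount", "-t", "lustre", f"{mgs_nid}:/{fsname}", client_mount])
    return cmds


if __name__ == "__main__":
    # Hypothetical example values; substitute real devices and node names.
    for argv in lustre_setup_commands("testfs", "mds1@tcp0", "/dev/sda1",
                                      ["/dev/sdb1", "/dev/sdc1"], "/mnt/testfs"):
        print(" ".join(argv))
```

Keeping the commands as data rather than running them directly fits the
"configurator" idea: the same list can be reviewed, logged, or dispatched to
the right node by whatever tool drives the deployment.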
    Perhaps I could make
    self-support an option, if it looked like it would be reliable.
Well, obviously, you should test the bejeezus out of your configuration before
you declare open season on it. So far I haven't found reason to believe
lustre is substantially worse than any of the other open-source software
packages which are used in production situations. I think that constitutes a
qualified "yes" :-}
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-07 13:12 ` John R. Dunning
@ 2006-12-08 3:56 ` Bryan Green
2006-12-08 14:18 ` John R. Dunning
0 siblings, 1 reply; 29+ messages in thread
From: Bryan Green @ 2006-12-08 3:56 UTC (permalink / raw
To: gentoo-cluster
"John R. Dunning" writes:
> From: Bryan Green <bgreen@nas.nasa.gov>
>
> It is encouraging to hear that you are willing to base a product on Lustre
> 1.6.
>
> There are problems either way, but based on my experience, I believe 1.6 is a
> better choice, at least for the kind of situation I'm expecting to see.
> That's based partly on the fact that in my testing I've seen a pretty small
> quotient of out-and-out bugs (though there are a couple which are pretty
> annoying) and partly on the fact that configuration and management-wise, 1.6
> is way easier to deal with. Part of what I expect will be happening in
> deployments is to be building lustrefs's on the fly, under control of some
> kind of configurator thingie. For that kind of task, 1.4 would be much more
> difficult to deal with.
>
From my limited experience with 1.6, and even more limited experience with 1.4, I
wholeheartedly agree with your assessment. Version 1.4 looks like a real headache
to configure. By comparison, 'mount -t lustre' pretty much characterizes the
simplicity of 1.6.
>
> Are you by any chance willing to share some of your knowledge about
> installing Lustre on Gentoo with others? :)
>
> Sure.
>
> Are you worrying about the kernel patching and other software installation
> issues, or about how to set up the fs itself once you've got the software
> together?
Kernel patching. For software installation, the lustre ebuild that was put on
this list recently seemed to do the trick for me, and setup was pretty easy.
I was able to patch the kernel, but the server was somewhat unstable. Actually,
my memory is hazy. I used the 'lustre-sources' ebuild, which effectively packaged
up the patches. It was a 2.6.15 kernel. I also tried to make a custom kernel for
lustre 1.4, but ultimately hit too many roadblocks. I did learn a bit about how
to use 'quilt' though.
>
> Very briefly, the kernel-patching issue is an ongoing headache. Lustre
> patches vfs in non-trivial ways. Unfortunately, everybody else does too. It
> becomes a fairly ugly patch-merging problem. If you want, I can detail the
> process I've settled on for coming up with a kernel patchset, but you won't
> like it. There are similar issues around ldiskfs and other bits, but they're
> simpler, at least by comparison.
I'd be interested in some of the details - off-list if that is more appropriate,
though it might be of interest to others on the list as well. Once you download a
1.6 beta, how do you produce a kernel for Gentoo? Do you patch a gentoo-sources
kernel, a vanilla-sources kernel, or something else? The ideal would perhaps be
to have a 'lustre-sources' ebuild in the gentoo-science overlay. :)
>
> Perhaps I could make
> self-support an option, if it looked like it would be reliable.
>
> Well, obviously, you should test the bejeezus out of your configuration before
> you declare open season on it. So far I haven't found reason to believe
> lustre is substantially worse than any of the other open-source software
> packages which are used in production situations. I think that constitutes a
> qualified "yes" :-}
Are you considering getting support from CFS at some point? Sorry, you don't have
to answer if that is a sensitive question. But part of this thread has been the
topic of encouraging CFS to support Gentoo. Interestingly, my colleague, who is
in charge of installing Lustre (1.4) on our test system, is talking to CFS about
supporting a vanilla kernel configuration. The reason? We can't make the system
stable with a SLES kernel. It was stable for a long time with Gentoo. Now they
seem to have gotten it stable with SLES plus a vanilla 2.6.19 kernel (which of
course does not have the Lustre patches). So they want Suse to provide a newer
SLES kernel with the Lustre patches, and CFS to support that configuration.
-bryan
* [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-08 3:56 ` Bryan Green
@ 2006-12-08 14:18 ` John R. Dunning
2006-12-08 17:15 ` Bryan Green
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: John R. Dunning @ 2006-12-08 14:18 UTC (permalink / raw
To: gentoo-cluster
    From: Bryan Green <bgreen@nas.nasa.gov>
    Date: Thu, 07 Dec 2006 19:56:46 -0800
    [...]
    By comparison, 'mount -t lustre' pretty much characterizes the
    simplicity of 1.6.
Agreed.
    > Are you worrying about the kernel patching and other software installation
    > issues, or about how to set up the fs itself once you've got the software
    > together?
    Kernel patching. For software installation, the lustre ebuild that was put on
    this list recently seemed to do the trick for me, and setup was pretty easy.
Yeah, I think that ebuild came from us.
    I was able to patch the kernel, but the server was somewhat unstable.
Do you remember how it was unstable? That's the kind of thing I'd very much
like to understand, as we're proposing to depend heavily on it. If there are
issues, whether specifically tied to our patches or not, I'd love to know
about them.
    Actually,
    my memory is hazy. I used the 'lustre-sources' ebuild, which effectively packaged
    up the patches. It was a 2.6.15 kernel. I also tried to make a custom kernel for
    lustre 1.4, but ultimately hit too many roadblocks. I did learn a bit about how
    to use 'quilt' though.
Hmmm. Maybe not. Our stuff ditches quilt.
    >
    > Very briefly, the kernel-patching issue is an ongoing headache. Lustre
    > patches vfs in non-trivial ways. Unfortunately, everybody else does too. It
    > becomes a fairly ugly patch-merging problem. If you want, I can detail the
    > process I've settled on for coming up with a kernel patchset, but you won't
    > like it. There are similar issues around ldiskfs and other bits, but they're
    > simpler, at least by comparison.
    I'd be interested in some of the details - off-list if that is more appropriate,
    though it might be of interest to others on the list as well. Once you download a
    1.6 beta, how do you produce a kernel for Gentoo? Do you patch a gentoo-sources
    kernel, a vanilla-sources kernel, or something else? The ideal would perhaps be
    to have a 'lustre-sources' ebuild in the gentoo-science overlay. :)
We can start here and if people get sick of hearing about it, take it
someplace else.
The approach taken by most of the patches in lustre/kernel_patches/patches is,
for any particular base kernel, go through and add the datastructures and
logic to implement the lustre-specific functionality, which involves changes
to core vfs datastructures, sometimes changes in locking strategy, changes to
arglists etc. They generally start with RHEL or SLES kernels. There are a
couple of problems for the rest of us with that; (a) the RHEL and SLES kernels
tend to be a bit antiquated, and (b) the vendors also tend to make quite free
with the patches to core datastructures. Some of the latter is actually due
to the former; because they're using antique kernels, but they want some bits
of the latest and greatest fixes, they selectively import more modern stuff
as their own vendor patches.
The result of all this is the layer of patches to implement lustre
functionality, when viewed from the point of view of an unpatched kernel,
makes no sense at all. If you try to install such a patchset on a vanilla-ish
kernel, even if you get the right base version, you'll get tons of rejects,
and when you look at them, it's obvious that they depend on stuff that's not
there.
The way I settled on getting to a patchset which doesn't depend on all kinds
of RHEL or SLES was to essentially build a RHEL (fc5, if I recall) kernel,
then "subtract out" the RHEL-ness, then take the resultant kernel and diff it
against a virgin one. That description covers a multitude of sins.
Subtracting out the RHEL-ness (by essentially doing patch -R, then cleaning up
the mess) has the inverse of many of the same problems that you get trying to
patch lustre on top of a vanilla kernel; arglists don't match etc. The only
piece of good news is that you at that point have three datapoints to work
with; vanilla, RHEL, and RHEL+lustre, so it's rather easier (though not
exactly easy) to divine what the intention of the lustre patches is, and work
out how to do the analogous thing without RHEL. Even at that, I had to be
wary of some bits of code which disappeared in the RHEL transition, but came
back when I backed out RHEL, which needed to be given the same treatment as
other analogous bits of code which were still there.
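One way to picture the three-datapoint comparison is as a triage pass over the
three trees (vanilla, vendor, vendor+lustre) before any hand-merging starts.
The sketch below is purely illustrative and is not the tooling described in
this message: the function name, the category names, and the toy file contents
are all hypothetical. It only sorts files into those lustre touched on its
own, those where lustre and the vendor both made changes (the likely source of
rejects), and those only the vendor changed.

```python
def classify_files(vanilla, vendor, vendor_lustre):
    """Triage files across three kernel trees.

    Each argument maps a relative path to that file's text in one tree.
    A path missing from a tree is treated as "no such file" (None).
    """
    paths = set(vanilla) | set(vendor) | set(vendor_lustre)
    report = {"lustre_only": [], "collision": [], "vendor_only": []}
    for path in sorted(paths):
        # Did the vendor patches change this file relative to vanilla?
        vendor_changed = vanilla.get(path) != vendor.get(path)
        # Did the lustre patches change it relative to the vendor tree?
        lustre_changed = vendor.get(path) != vendor_lustre.get(path)
        if lustre_changed and vendor_changed:
            report["collision"].append(path)    # needs hand analysis
        elif lustre_changed:
            report["lustre_only"].append(path)  # likely portable as-is
        elif vendor_changed:
            report["vendor_only"].append(path)  # vendor-ness to subtract out
    return report


if __name__ == "__main__":
    # Toy stand-ins for real source trees.
    vanilla = {"fs/namei.c": "A", "fs/open.c": "B", "mm/slab.c": "C"}
    vendor = {"fs/namei.c": "A+v", "fs/open.c": "B", "mm/slab.c": "C+v"}
    vendor_lustre = {"fs/namei.c": "A+v+l", "fs/open.c": "B+l", "mm/slab.c": "C+v"}
    print(classify_files(vanilla, vendor, vendor_lustre))
```

The "collision" bucket is where the real work is: those are the files where
the intent of the lustre change has to be worked out and redone against the
vanilla code by hand.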
The bottom line is that you have to understand enough about what the lustre
patches are accomplishing that you can come up with analogous patches for the
kernel of your choice, which happens not to be one of the ones cfs ships
patchsets for.
The first time I did all that stuff it took something like 3-4 weeks, with
numerous false starts. The most recent time I did it, it was something like a
couple of days, though that's misleading, because it was very close to the
previous version I was upgrading from. If I had to do it today, starting from
scratch, I'd estimate 3-5 days.
You'll note that nowhere in that set of stuff did I utter the word "gentoo".
The kernel we're using is not really a gentoo kernel. We're mips-based, so
we're starting from something that's perilously close to the mainline
linux-mips kernel, then building it up from there. Thankfully, the linux-mips
guys don't go in for heavy-duty patching of non-platform-related stuff, so
from the point of view of adding lustre to it, it's virtually identical to a
vanilla kernel.org kernel. I believe we may have pulled in a small number of
the gentoo kernel patches, but I'm not the kernel wizard, so don't know off
the top of my head. From my point of view, it looks vanilla.
It's not clear to me how you'd go about making lustre installation work in a
more gentoo-ish kind of way, at least not without a very large amount of work.
I guess I think the most likely path forward would be to work with cfs to try
to get them to support more vanilla kernels, then try to work on the rest of
the gentoo kernel patches to make them fit better. Unfortunately, I suspect
that that still isn't going to be easy, as you've got the classic
patch-collision problem happening all over the place. I suspect that
following that approach would end up with two parallel streams of patches, one
for lustre kernels and one for non-lustre kernels. Unless you can get the
gentoo community to roll lustre in as a standard part of the gentoo patchset.
That probably requires that somebody do a lustre patchset for every kernel
version. Unlikely.
You could, of course, invert the problem and layer lustre on top, but until
such time as gentoo is much more prevalent, I doubt you'll get cfs to do that,
which means that somebody in the gentoo community gets signed up for the task
of re-doing the process I outlined above, for every gentoo kernel which comes
down the pike. I'm not holding my breath for that one either.
A longer term solution is to do some combination of remodularizing vfs and
recasting the lustre stuff so as to depend less on getting its fingers into
the guts. I once spent some time looking into that, and I do believe it's
possible, but it would take some work, and would really need to be done in
concert with the rest of the core kernel guys, and I ran out of time to pursue
it. In the meantime, the more the gentoo community can resist the temptation
to patch the kernel (at least the vfs parts of it), the easier it will be to
add lustre.
Separate from the core kernel patching issues (Hah! you thought I was done,
didn't you?) there's stuff around ldiskfs. The strategy used by lustre is to
grab a copy of ext3, cart it off to the side, change all the names, insert a
few other strategically placed patches, and call it ldiskfs. That then
becomes the basic facility by which actual bits are stored on block devices.
The issue there is roughly similar to the core kernel, but not as severe, ie
any given patchset depends heavily on which specific version of ext3 you
started from. Update the kernel, and if it contained fixes to ext3, you've
got a problem.
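The "change all the names" step can be pictured as a case-preserving rename
over a copy of the ext3 sources. The sketch below is only an illustration of
that idea, assuming a simple textual substitution; it is not the script the
lustre build actually uses, and the function names are hypothetical. It also
makes the fragility visible: the rename is mechanical, but the patches applied
afterwards depend on the exact ext3 text they were written against.

```python
import re

# Match "ext3" in any capitalisation (ext3_mkdir, EXT3_SUPER_MAGIC, ...).
_EXT3 = re.compile(r"ext3", re.IGNORECASE)


def _swap(match):
    """Pick the replacement spelling that matches the original's case."""
    word = match.group(0)
    if word.isupper():
        return "LDISKFS"
    if word[0].isupper():
        return "Ldiskfs"
    return "ldiskfs"


def ext3_to_ldiskfs(source_text):
    """Return source_text with every ext3 identifier prefix renamed."""
    return _EXT3.sub(_swap, source_text)


if __name__ == "__main__":
    sample = "int ext3_mkdir(struct inode *dir);\n#define EXT3_SUPER_MAGIC 0xEF53\n"
    print(ext3_to_ldiskfs(sample))
```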
In practice, this issue tends to be swamped by the core kernel one, ie getting
lustre going on a specific kernel binds you so tightly to that kernel that you
don't have to worry too much about changing ext3. But at such time as the
kernel integration issue becomes easier to deal with, this one will have to be
addressed as well. My preferred solution would be to simply snag a copy of
ext3 that works, do the foozling once, then make that code be a permanent part
of the lustre distribution, rather than relying on constructing it on the
fly. But that's up to cfs.
So anyhow, the short answer is that there's no real rocket science involved in
getting lustre to work on a gentoo system, but it does take some work, and if
you do it the way I did it, you end up with a system which is more constrained
than a normal gentoo system, because you're no longer free to update the
kernel using the stock tools. For us it's not a huge deal, but I suspect that
some of the gentoo community will balk at that.
    [...]
    Are you considering getting support from CFS at some point?
We are working with cfs. That doesn't mean they're doing all our work for us
:-}
Honestly, a big part of it is just plain old market sensitivity. Cfs is
paying attention to where their bread and butter is. So far, that's not
gentoo. Perhaps if sicortex is wildly successful we'll be able to change that
equation :-}
    Sorry, you don't have
    to answer if that is a sensitive question. But part of this thread has been the
    topic of encouraging CFS to support Gentoo. Interestingly, my colleague, who is
    in charge of installing Lustre (1.4) on our test system, is talking to CFS about
    supporting a vanilla kernel configuration. The reason? We can't make the system
    stable with a SLES kernel. It was stable for a long time with Gentoo.
I have not observed stability problems; it pretty much just works. If you can
say any more about what issues you ran across, I'd love to hear it.
    Now they
    seem to have gotten it stable with SLES plus a vanilla 2.6.19 kernel (which of
    course does not have the Lustre patches). So they want Suse to provide a newer
    SLES kernel with the Lustre patches, and CFS to support that configuration.
Well, ok, I dunno what to tell you about working with the vendors on that
one.
We actually did consider running RHEL or SLES kernels, but remember we're
mips, and looking at the state of the mips support in those kernels, it was
not a pretty picture. We also didn't really want to be in the game of having
that much of a frankenstein system. So our approach has boiled down to
1. Stick close to vanilla
2. Make mips work
3. Do whatever we need to do to make lustre layer on top of that
Based on what you've said, I wouldn't fool around with SLES, I'd just figure
out what close-to-vanilla kernel you want to start from (picking one you think
you can live with for a while) and do some part of what I described above.
You might have a somewhat easier time of it if you started with 2.6.18, as I
believe there's a cfs-supplied patchset for that one. If you want to start
from a gentoo 2.6.18 one, I suspect your task will be to start with vanilla,
make that work, then work out how to re-apply the gentoo patches. Re getting
cfs to help, my bet would be that you'll have an easier time getting the
gentoo community to create patches that are amenable to going on top of a
lustre-ized vanilla kernel (and relying on cfs to support vanilla kernels)
than you will getting cfs to generate patches to go on top of gentoo. If you
watch the lustre lists, you'll see more people asking for vanilla than are
asking for gentoo.
Under no circumstances would I advocate getting a kernel working at some
level, then trying to use the kernel.org patches, or anybody else's, to move
it forward. I tried that a few times, and while I actually did find a couple
of combinations that worked, most of the ones I tried blew up in my face.
It's the same problem; there's all kinds of activity going on in vfs. I hope
that situation doesn't continue indefinitely, but that's the way it seems to
be right now.
I've gone on long enough for now. Feel free to dig deeper if you dare :-}
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-08 14:18 ` John R. Dunning
@ 2006-12-08 17:15 ` Bryan Green
2006-12-11 13:38 ` John R. Dunning
2006-12-08 18:29 ` Donnie Berkholz
2006-12-11 19:44 ` Bryan Green
2 siblings, 1 reply; 29+ messages in thread
From: Bryan Green @ 2006-12-08 17:15 UTC (permalink / raw
To: gentoo-cluster
"John R. Dunning" writes:
> From: Bryan Green <bgreen@nas.nasa.gov>
>
> I was able to patch the kernel, but the server was somewhat unstable.
>
> Do you remember how it was unstable? That's the kind of thing I'd very much
> like to understand, as we're proposing to depend heavily on it. If there are
> issues, whether specifically tied to our patches or not, I'd love to know
> about them.
I remember the system was stable until we tried to shut it down. It would lock up while
shutting down, possibly while unmounting filesystems. I also did not get to do extensive
testing of the system, so I don't know if it would have been stable under real use of the
Lustre filesystem.
> I also tried to make a custom kernel for
> lustre 1.4, but ultimately hit too many roadblocks. I did learn a bit about how
> to use 'quilt' though.
>
> Hmmm. Maybe not. Our stuff ditches quilt.
I just used quilt when working with 1.4. I did not have an ebuild for that.
-bryan
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-08 14:18 ` John R. Dunning
2006-12-08 17:15 ` Bryan Green
@ 2006-12-08 18:29 ` Donnie Berkholz
2006-12-08 18:52 ` Bryan Green
2006-12-08 19:10 ` John R. Dunning
2006-12-11 19:44 ` Bryan Green
2 siblings, 2 replies; 29+ messages in thread
From: Donnie Berkholz @ 2006-12-08 18:29 UTC (permalink / raw
To: gentoo-cluster
John R. Dunning wrote:
> From: Bryan Green <bgreen@nas.nasa.gov>
> A longer term solution is to do some combination of remodularizing vfs and
> recasting the lustre stuff so as to depend less on getting its fingers into
> the guts. I once spent some time looking into that, and I do believe it's
> possible, but it would take some work, and would really need to be done in
> concert with the rest of the core kernel guys, and I ran out of time to pursue
> it. In the meantime, the more the gentoo community can resist the temptation
> to patch the kernel (at least the vfs parts of it), the easier it will be to
> add lustre.
> Based on what you've said, I wouldn't fool around with SLES, I'd just figure
> out what close-to-vanilla kernel you want to start from (picking one you think
> you can live with for a while) and do some part of what I described above.
> You might have a somewhat easier time of it if you started with 2.6.18, as I
> believe there's a cfs-supplied patchset for that one. If you want to start
> from a gentoo 2.6.18 one, I suspect your task will be to start with vanilla,
> make that work, then work out how to re-apply the gentoo patches. Re getting
> cfs to help, my bet would be that you'll have an easier time getting the
> gentoo community to create patches that are amenable to going on top of a
> lustre-ized vanilla kernel (and relying on cfs to support vanilla kernels)
> than you will getting cfs to generate patches to go on top of gentoo. If you
> watch the lustre lists, you'll see more people asking for vanilla than are
> asking for gentoo.
I've just got a couple comments on this.
* The gentoo-sources patches are almost all upstream and based on the
W.X.Y.Z "stable release" patchsets, except for 2-3 cases that probably
aren't relevant to clusters. As a result, Gentoo folks would be
reasonably well off just running vanilla-sources, which groups them in
with everyone else wanting Lustre on a vanilla kernel.
* Normally I would recommend hardened-sources for anything resembling a
server, but you should have all your nodes and file servers blocked off
from the Internet anyway so that's a non-issue.
Thanks,
Donnie
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-08 18:29 ` Donnie Berkholz
@ 2006-12-08 18:52 ` Bryan Green
2006-12-08 19:10 ` John R. Dunning
1 sibling, 0 replies; 29+ messages in thread
From: Bryan Green @ 2006-12-08 18:52 UTC (permalink / raw
To: gentoo-cluster
Donnie Berkholz writes:
>
> * The gentoo-sources patches are almost all upstream and based on the
> W.X.Y.Z "stable release" patchsets, except for 2-3 cases that probably
> aren't relevant to clusters. As a result, Gentoo folks would be
> reasonably well off just running vanilla-sources, which groups them in
> with everyone else wanting Lustre on a vanilla kernel.
Yes, using vanilla-sources definitely sounds like the way forward.
(More comments to come... I'm still digesting)
-bryan
* [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-08 18:29 ` Donnie Berkholz
2006-12-08 18:52 ` Bryan Green
@ 2006-12-08 19:10 ` John R. Dunning
1 sibling, 0 replies; 29+ messages in thread
From: John R. Dunning @ 2006-12-08 19:10 UTC (permalink / raw
To: gentoo-cluster
    From: Donnie Berkholz <dberkholz@gentoo.org>
    Date: Fri, 08 Dec 2006 10:29:00 -0800
    [...]
    I've just got a couple comments on this.
    * The gentoo-sources patches are almost all upstream and based on the
    W.X.Y.Z "stable release" patchsets, except for 2-3 cases that probably
    aren't relevant to clusters. As a result, Gentoo folks would be
    reasonably well off just running vanilla-sources, which groups them in
    with everyone else wanting Lustre on a vanilla kernel.
Fair enough. If the solution to how to run lustre on a gentoo system starts
with "run a vanilla kernel, not a gentoo kernel", that makes things
considerably easier. It doesn't make all the problems go away, but at least
you've removed one of the uglier dimensions from the task :-}
* Normally I would recommend hardened-sources for anything resembling a
server, but you should have all your nodes and file servers blocked off
from the Internet anyway so that's a non-issue.
Yes. I expect that most of our machines will not be having ports open on the
big-I internet, at least in the early days. Later, that may change, and those
network-security patches will be of more immediate interest to us.
Of somewhat more interest, even early on, are patches for file-system
security. Last I paid attention to it, the plan was to pull in that class of
patch on an ad-hoc basis as we decide they're justified.
--
gentoo-cluster@gentoo.org mailing list
* [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-08 17:15 ` Bryan Green
@ 2006-12-11 13:38 ` John R. Dunning
0 siblings, 0 replies; 29+ messages in thread
From: John R. Dunning @ 2006-12-11 13:38 UTC
To: gentoo-cluster
From: Bryan Green <bgreen@nas.nasa.gov>
Date: Fri, 08 Dec 2006 09:15:54 -0800
"John R. Dunning" writes:
> From: Bryan Green <bgreen@nas.nasa.gov>
>
> I was able to patch the kernel, but the server was somewhat unstable.
>
> Do you remember how it was unstable? That's the kind of thing I'd very much
> like to understand, as we're proposing to depend heavily on it. If there are
> issues, whether specifically tied to our patches or not, I'd love to know
> about them.
I remember the system was stable until we tried to shut it down.
Ah. I have observed lustre to get cranky when you try to boot the system out
from under it. In particular, if you go to all the servers and /sbin/shutdown
without first shutting down lustre, I've seen it hang. Given the nature of
lustre, that didn't surprise me a lot :-} The times I've shut down lustre in
the correct order (shut down clients, then oss's, then mds, then mgs) it's
always behaved itself.
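As a rough sketch of that ordering (not a real Lustre tool; the role labels and host names below are made up for illustration, and only the clients, OSSes, MDS, MGS sequence comes from the description above), the shutdown sequence could be computed like this:

```python
# Shutdown order as described above: clients first, then OSSes, then MDS, then MGS.
# The role labels are hypothetical names chosen for this sketch.
SHUTDOWN_ORDER = ("client", "oss", "mds", "mgs")


def shutdown_plan(nodes):
    """Return host names sorted into a safe Lustre shutdown order.

    `nodes` maps a host name to one of the role labels in SHUTDOWN_ORDER.
    Hosts sharing a role are ordered alphabetically so the plan is stable.
    """
    rank = {role: position for position, role in enumerate(SHUTDOWN_ORDER)}
    unknown = sorted({role for role in nodes.values() if role not in rank})
    if unknown:
        raise ValueError(f"unknown Lustre role(s): {unknown}")
    return sorted(nodes, key=lambda host: (rank[nodes[host]], host))


if __name__ == "__main__":
    # Hypothetical cluster layout, for illustration only.
    cluster = {
        "mgs0": "mgs",
        "node02": "client",
        "oss1": "oss",
        "mds0": "mds",
        "node01": "client",
        "oss0": "oss",
    }
    for step, host in enumerate(shutdown_plan(cluster), start=1):
        print(f"{step}. stop Lustre on {host} ({cluster[host]})")
```

The function only produces the order; actually stopping Lustre on each host, and only afterwards calling /sbin/shutdown, is left to whatever mechanism the site already uses.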
It would lock up while shutting down, possibly while unmounting filesystems.
I also did not get to do extensive testing of the system, so I don't know if
it would have been stable under real use of the Lustre filesystem.
Ok. Like I said, I've found a few bugs, but I've never seen it act unstable
in real use.
> I also tried to make a custom kernel for
> lustre 1.4, but ultimately hit too many roadblocks. I did learn a bit about how
> to use 'quilt' though.
>
> Hmmm. Maybe not. Our stuff ditches quilt.
I just used quilt when working with 1.4. I did not have an ebuild for that.
I could never get quilt to work, so I just ditched it. You don't need it anyhow:
if you run ./configure blah-blah --disable-quilt, it works fine. I imagine if you
were doing core development on lustre, in particular trying to actually build
the large collection of patches they ship with it, quilt would be handy, but
for just trying to get the kernel patched, my scripts skip it.
--
gentoo-cluster@gentoo.org mailing list
* Re: [gentoo-cluster] examples of (large) Gentoo clusters
2006-12-08 14:18 ` John R. Dunning
2006-12-08 17:15 ` Bryan Green
2006-12-08 18:29 ` Donnie Berkholz
@ 2006-12-11 19:44 ` Bryan Green
2 siblings, 0 replies; 29+ messages in thread
From: Bryan Green @ 2006-12-11 19:44 UTC
To: gentoo-cluster
"John R. Dunning" writes:
>
> We are working with cfs. That doesn't mean they're doing all our work for us
> :-}
>
> Honestly, a big part of it is just plain old market sensitivity. Cfs is
> paying attention to where their bread and butter is. So far, that's not
> gentoo. Perhaps if sicortex is wildly successful we'll be able to change that
> equation :-}
>
> ...
>
> So our approach has boiled down to
>
> 1. Stick close to vanilla
> 2. Make mips work
> 3. Do whatever we need to do to make lustre layer on top of that
>
> Based on what you've said, I wouldn't fool around with SLES, I'd just figure
> out what close-to-vanilla kernel you want to start from (picking one you think
> you can live with for a while) and do some part of what I described above.
> You might have a somewhat easier time of it if you started with 2.6.18, as I
> believe there's a cfs-supplied patchset for that one.
Thank you for your extensive feedback. :)
I think you've done a pretty good job of showing the complications involved in putting together
your own lustre kernel. It sounds gnarly. What I take away from this is the need for a vanilla
kernel patchset from CFS, preferably for 2.6.18 or higher. If there is already at least the
basis for a vanilla 2.6.18 patchset in the current beta, that could be the starting point for an
ebuild that would be of use for a while.
It's been a while since I last tried running Lustre. Perhaps it is time I tried building a 2.6.18
lustre-ized kernel. It depends on how good the patchset provided by CFS is. I don't have the
bandwidth to go through the extensive process that you did to get a working kernel.
-bryan
--
gentoo-cluster@gentoo.org mailing list
Thread overview: 29+ messages
2006-12-01 23:16 [gentoo-cluster] examples of (large) Gentoo clusters Bryan Green
2006-12-02 3:48 ` Donnie Berkholz
2006-12-02 13:54 ` Hanni Ali
2006-12-02 18:12 ` Bryan Green
2006-12-02 17:57 ` Bryan Green
2006-12-02 19:51 ` Philipp Riegger
2006-12-03 3:49 ` Donnie Berkholz
2006-12-04 18:15 ` Bryan Green
2006-12-04 20:53 ` Donnie Berkholz
2006-12-04 22:35 ` Hanni Ali
2006-12-04 23:00 ` Bryan Green
2006-12-04 23:55 ` Daniel van Ham Colchete
2006-12-05 3:55 ` Donnie Berkholz
2006-12-05 13:18 ` John R. Dunning
2006-12-05 16:25 ` Bryan Green
2006-12-05 21:15 ` Daniel van Ham Colchete
2006-12-05 21:22 ` Bryan Green
2006-12-05 21:28 ` John R. Dunning
2006-12-07 0:33 ` Bryan Green
2006-12-07 13:12 ` John R. Dunning
2006-12-08 3:56 ` Bryan Green
2006-12-08 14:18 ` John R. Dunning
2006-12-08 17:15 ` Bryan Green
2006-12-11 13:38 ` John R. Dunning
2006-12-08 18:29 ` Donnie Berkholz
2006-12-08 18:52 ` Bryan Green
2006-12-08 19:10 ` John R. Dunning
2006-12-11 19:44 ` Bryan Green
2006-12-02 14:40 ` Nick Anderson