From: "Brian Budge"
To: gentoo-cluster@lists.gentoo.org
Date: Wed, 2 Jan 2008 16:53:10 -0800
Subject: Re: [gentoo-cluster] openib, no /dev/infiniband
Message-ID: <5b7094580801021653u3394be30q824c6d8ff9aeca7b@mail.gmail.com>
In-Reply-To: <20080103004149.2CEA62391C5@ece06.nas.nasa.gov>

Hi Bryan -

Thanks!  I inserted the rules, and restarted udev.  The permissions were wrong on the components, but I manually changed them and it works!

Thanks again for your help,
  Brian
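
For anyone who hits the same permissions problem, here is a rough sketch of how the manual check and fix could be scripted. The message does not say which permissions were set, so the expected mode of 0666 is an assumption taken from the MODE values in the rules file quoted below; the function names are made up for illustration. Note that the umad and issm rules carry no MODE, so 0666 may not be the right expectation for those nodes.

```python
import os
import stat

def wrong_modes(directory, expected=0o666):
    """Return {name: current permission bits} for entries whose bits differ from expected."""
    mismatched = {}
    for name in sorted(os.listdir(directory)):
        mode = stat.S_IMODE(os.stat(os.path.join(directory, name)).st_mode)
        if mode != expected:
            mismatched[name] = mode
    return mismatched

def fix_modes(directory, expected=0o666):
    """chmod every mismatched entry to expected and return the names that were changed."""
    changed = list(wrong_modes(directory, expected))
    for name in changed:
        os.chmod(os.path.join(directory, name), expected)
    return changed
```

Run as root, `fix_modes('/dev/infiniband')` would be the scripted equivalent of the manual change. A change made this way probably does not survive a reboot, since udev recreates the nodes, so getting the MODE right in the rules file is likely the more durable fix.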

On Jan 2, 2008 4:41 PM, Bryan Green <bryan.d.green@nasa.gov> wrote:
"Brian Budge" writes:
> Hi Bryan -
>
> I don't seem to have a 40-ib.rules in any of my /etc/udev/rules.d on any
> node.

Aha.  That file is part of sys-cluster/openib-drivers, which you don't have
installed.  You can use the infiniband support that is part of the kernel,
but the driver versions won't match your openib userspace software versions
(not necessarily a problem), and you'll be missing the startup scripts.

Older versions of the files are installed by the openib-files ebuild.  That
ebuild is currently incompatible with openib-userspace, though it perhaps
shouldn't be.  I guess I could fix that so you could install openib-files.
But if possible, I'd recommend turning off the kernel's built-in drivers and
emerging openib-drivers.

In the meantime, just installing '40-ib.rules' might help, but I'm not
sure.

####  /etc/udev/rules.d/40-ib.rules  ####
KERNEL=="umad*", NAME="infiniband/%k"
KERNEL=="issm*", NAME="infiniband/%k"
KERNEL=="ucm*", NAME="infiniband/%k", MODE="0666"
KERNEL=="uverbs*", NAME="infiniband/%k", MODE="0666"
KERNEL=="uat", NAME="infiniband/%k", MODE="0666"
KERNEL=="ucma", NAME="infiniband/%k", MODE="0666"
KERNEL=="rdma_cm", NAME="infiniband/%k", MODE="0666"
########
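
To make the effect of these rules concrete, the sketch below parses them and works out which node path and mode a given kernel device name would get. It is a simplified model, not how udev itself evaluates rules: it takes the first matching rule only, treats `%k` as the kernel name, and assumes nodes live under `/dev`. `parse_rules` and `resolve` are hypothetical helper names.

```python
import fnmatch
import re

# The rules from 40-ib.rules above, copied verbatim.
RULES = '''
KERNEL=="umad*", NAME="infiniband/%k"
KERNEL=="issm*", NAME="infiniband/%k"
KERNEL=="ucm*", NAME="infiniband/%k", MODE="0666"
KERNEL=="uverbs*", NAME="infiniband/%k", MODE="0666"
KERNEL=="uat", NAME="infiniband/%k", MODE="0666"
KERNEL=="ucma", NAME="infiniband/%k", MODE="0666"
KERNEL=="rdma_cm", NAME="infiniband/%k", MODE="0666"
'''

# Matches KEY=="value" (a match condition) or KEY="value" (an assignment).
PAIR = re.compile(r'(\w+)(==|=)"([^"]*)"')

def parse_rules(text):
    """Turn each rule line into a dict such as {'KERNEL': 'umad*', 'NAME': ...}."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        rules.append({key: value for key, _op, value in PAIR.findall(line)})
    return rules

def resolve(kernel_name, rules):
    """Return (node path, mode or None) from the first rule matching kernel_name."""
    for rule in rules:
        if fnmatch.fnmatchcase(kernel_name, rule['KERNEL']):
            path = '/dev/' + rule['NAME'].replace('%k', kernel_name)
            return path, rule.get('MODE')
    return None
```

With these rules, `resolve('uverbs0', parse_rules(RULES))` gives `('/dev/infiniband/uverbs0', '0666')`, while `resolve('umad0', ...)` gives a mode of `None`: the umad and issm rules set no MODE, so those nodes get whatever default udev applies.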

> My /sys/class/infiniband directory contains mthca0, which contains:
> > ls -la /sys/class/infiniband/mthca0/
> total 0
> drwxr-xr-x 3 root root    0 Jan  2 20:54 .
> drwxr-xr-x 3 root root    0 Jan  2 20:54 ..
> -r--r--r-- 1 root root 4096 Jan  2 21:07 board_id
> lrwxrwxrwx 1 root root    0 Jan  3 00:01 device -> ../../../devices/pci0000:20/0000:20:0a.0/0000:21:00.0
> -r--r--r-- 1 root root 4096 Jan  2 21:07 fw_ver
> -r--r--r-- 1 root root 4096 Jan  2 21:07 hca_type
> -r--r--r-- 1 root root 4096 Jan  2 21:07 hw_rev
> -rw-r--r-- 1 root root 4096 Jan  2 21:07 node_desc
> -r--r--r-- 1 root root 4096 Jan  2 21:07 node_guid
> -r--r--r-- 1 root root 4096 Jan  2 21:06 node_type
> drwxr-xr-x 3 root root    0 Jan  2 21:07 ports
> lrwxrwxrwx 1 root root    0 Jan  3 00:01 subsystem -> ../../../class/infiniband
> -r--r--r-- 1 root root 4096 Jan  2 21:07 sys_image_guid
> --w------- 1 root root 4096 Jan  2 20:54 uevent

Is this what is shown on the node that does not have '/dev/infiniband'?

What about '/sys/class/infiniband_verbs/'?
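
The two questions above come down to comparing what the kernel exposes in sysfs with what udev created in /dev. A small sketch of that comparison, which can be run on each node, follows. The `ib_status` name and the `root` parameter (there only so the function can be pointed at a test directory) are illustrative assumptions.

```python
import os

def ib_status(root='/'):
    """Report which InfiniBand sysfs classes and device directories exist under root.

    Each value is a sorted list of entries, or None if the directory is missing.
    """
    def entries(*parts):
        path = os.path.join(root, *parts)
        return sorted(os.listdir(path)) if os.path.isdir(path) else None

    return {
        'sys/class/infiniband': entries('sys', 'class', 'infiniband'),
        'sys/class/infiniband_verbs': entries('sys', 'class', 'infiniband_verbs'),
        'dev/infiniband': entries('dev', 'infiniband'),
    }

if __name__ == '__main__':
    for path, names in ib_status().items():
        print('/' + path, '->', 'missing' if names is None else names)
```

If the sysfs classes are populated but `/dev/infiniband` is reported missing, that points at the udev rules rather than at the kernel drivers.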

--
gentoo-cluster@gentoo.org mailing list