From: "Brian Budge"
To: gentoo-cluster@lists.gentoo.org
Date: Wed, 2 Jan 2008 16:06:45 -0800
Subject: Re: [gentoo-cluster] openib, no /dev/infiniband
Message-ID: <5b7094580801021606y6a65804ck115731926f2ba0a8@mail.gmail.com>
In-Reply-To: <20080102221153.1DE4E2391C5@ece06.nas.nasa.gov>

Hi Bryan -

I don't seem to have a 40-ib.rules in any of my /etc/udev/rules.d on any node.
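
Without a 40-ib.rules file (and unless some other rules file covers these devices), udev has no instruction to create nodes under /dev/infiniband. One rough way to see which nodes the kernel expects is to read the `dev` attribute (major:minor) that sysfs exposes for each character device. The sketch below assumes the class directories are named infiniband_verbs, infiniband_mad and infiniband_cm, which may vary with kernel version; `check_ib_nodes` is a hypothetical helper name, not part of any package.

```shell
#!/bin/sh
# Sketch: list the character devices advertised in sysfs and report whether
# a matching node exists in the given /dev directory.
# Assumption: the sysfs class directory names below match the running kernel.

# check_ib_nodes SYSFS_CLASS_ROOT DEV_DIR
check_ib_nodes() {
    class_root=$1
    dev_dir=$2
    for class in infiniband_verbs infiniband_mad infiniband_cm; do
        for entry in "$class_root/$class"/*; do
            # Only entries with a readable "dev" attribute are char devices.
            [ -r "$entry/dev" ] || continue
            name=${entry##*/}
            majmin=$(cat "$entry/dev")      # format is "major:minor"
            major=${majmin%%:*}
            minor=${majmin##*:}
            if [ -c "$dev_dir/$name" ]; then
                echo "ok $name ($major:$minor)"
            else
                echo "missing $name: mknod $dev_dir/$name c $major $minor"
            fi
        done
    done
}

# Run against the live system only if sysfs is mounted.
if [ -d /sys/class ]; then
    check_ib_nodes /sys/class /dev/infiniband
fi
```

The function only prints the mknod command for a missing node; it does not create anything. Comparing its output on a working node and on the failing node shows whether the two kernels advertise the same devices.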

My /sys/class/infiniband directory contains mthca0, which contains:
> ls -la /sys/class/infiniband/mthca0/
total 0
drwxr-xr-x 3 root root    0 Jan  2 20:54 .
drwxr-xr-x 3 root root    0 Jan  2 20:54 ..
-r--r--r-- 1 root root 4096 Jan  2 21:07 board_id
lrwxrwxrwx 1 root root    0 Jan  3 00:01 device -> ../../../devices/pci0000:20/0000:20:0a.0/0000:21:00.0
-r--r--r-- 1 root root 4096 Jan  2 21:07 fw_ver
-r--r--r-- 1 root root 4096 Jan  2 21:07 hca_type
-r--r--r-- 1 root root 4096 Jan  2 21:07 hw_rev
-rw-r--r-- 1 root root 4096 Jan  2 21:07 node_desc
-r--r--r-- 1 root root 4096 Jan  2 21:07 node_guid
-r--r--r-- 1 root root 4096 Jan  2 21:06 node_type
drwxr-xr-x 3 root root    0 Jan  2 21:07 ports
lrwxrwxrwx 1 root root    0 Jan  3 00:01 subsystem -> ../../../class/infiniband
-r--r--r-- 1 root root 4096 Jan  2 21:07 sys_image_guid
--w------- 1 root root 4096 Jan  2 20:54 uevent
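
Since the listing shows only file names, a small script that prints the attribute values makes it easier to compare the failing node with a working one. This is a minimal sketch; `dump_hca_attrs` is a hypothetical helper name, and the only path it relies on is the mthca0 directory shown above.

```shell
#!/bin/sh
# Sketch: print "attribute=value" for each readable regular file in an
# HCA's sysfs directory, so the output can be diffed between nodes.

# dump_hca_attrs HCA_SYSFS_DIR
dump_hca_attrs() {
    hca_dir=$1
    for attr in "$hca_dir"/*; do
        # Skip directories (ports) and symlinks to directories (device, subsystem).
        [ -f "$attr" ] && [ -r "$attr" ] || continue
        # Skip files that cannot actually be read (for example uevent).
        value=$(cat "$attr" 2>/dev/null) || continue
        printf '%s=%s\n' "${attr##*/}" "$value"
    done
}

# Run against the live system only if the HCA directory exists.
if [ -d /sys/class/infiniband/mthca0 ]; then
    dump_hca_attrs /sys/class/infiniband/mthca0
fi
```

Saving the output to a file on each node and running diff on the files would show differences in fw_ver, hca_type, or board_id, if there are any.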

I don't have any ib modules loaded on any node.  All of the InfiniBand support is built into the kernel rather than built as modules:

CONFIG_INFINIBAND=y
CONFIG_INFINIBAND_USER_MAD=y
CONFIG_INFINIBAND_USER_ACCESS=y
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_MTHCA=y
CONFIG_INFINIBAND_MTHCA_DEBUG=y
# CONFIG_INFINIBAND_IPATH is not set
CONFIG_INFINIBAND_AMSO1100=y
# CONFIG_INFINIBAND_AMSO1100_DEBUG is not set
CONFIG_MLX4_INFINIBAND=y
CONFIG_INFINIBAND_IPOIB=y
# CONFIG_INFINIBAND_IPOIB_CM is not set
CONFIG_INFINIBAND_IPOIB_DEBUG=y
# CONFIG_INFINIBAND_IPOIB_DEBUG_DATA is not set
# CONFIG_INFINIBAND_SRP is not set
# CONFIG_INFINIBAND_ISER is not set
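
Because lsmod only reports loadable modules, options set to `=y` never show up there. The sketch below summarizes which InfiniBand options in a kernel config file are built in, modular, or disabled. The function name `ib_config_summary` and the config path /usr/src/linux/.config are assumptions; the config for the running kernel may live elsewhere.

```shell
#!/bin/sh
# Sketch: classify the InfiniBand-related options in a kernel config file.
# Assumption: the file uses the usual "CONFIG_X=y", "CONFIG_X=m" and
# "# CONFIG_X is not set" line formats.

# ib_config_summary CONFIG_FILE
ib_config_summary() {
    config_file=$1
    # Match both set options and "is not set" comments.
    grep -E '^(# )?CONFIG_(MLX4_)?INFINIBAND' "$config_file" |
    while read -r line; do
        case $line in
            *=y)
                echo "built-in ${line%=y}" ;;
            *=m)
                echo "module ${line%=m}" ;;
            *"is not set")
                name=${line#"# "}
                echo "disabled ${name% is not set}" ;;
        esac
    done
}

# Run against the assumed config location only if it is readable.
if [ -r /usr/src/linux/.config ]; then
    ib_config_summary /usr/src/linux/.config
fi
```

Running this on every node and comparing the results is one way to confirm that the nodes really were built from the same configuration.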


Thanks,
  Brian

On Jan 2, 2008 2:11 PM, Bryan Green <bryan.d.green@nasa.gov> wrote:
> "Brian Budge" writes:
> >
> > Hi all -
> >
> > I'm new to infiniband and still getting my feet wet.  I am administering a very
> > small cluster of 5 nodes, and have recently installed infiniband HCAs.  I
> > have the infiniband modules built into the kernel, and I am using the
> > openib-userspace package in the gentoo-science overlay.
> >
> > The strange thing with my situation is that I have infiniband working with
> > openmpi on 4 of my 5 nodes, but the 5th one is a mystery.
> >
> > All 4 working nodes have a /dev/infiniband directory that looks roughly like
> > this:
> >
> > crw-rw---- 1 root root 231,  64 Dec 31 09:13 issm0
> > crw-rw-rw- 1 root root 231, 224 Dec 31 09:13 ucm0
> > crw-rw---- 1 root root 231,   0 Dec 31 09:13 umad0
> > crw-rw-rw- 1 root root 231, 192 Dec 31 09:13 uverbs0
> >
> >
> > But the 5th node doesn't, which could indicate the problem (it isn't
> > completely the problem, as I tried making those nodes myself to match, but
> > it doesn't help).  I'm just not sure what the difference is, because I
> > installed them all the same way, they all have the same hardware, and they
> > are all running the same kernel.
>
> The '/dev/infiniband' subdir is created by the udev rules in
> '/etc/udev/rules.d/40-ib.rules'
>
> Does the '/sys/class/infiniband' directory exist?
> If so, what does it contain?  What loaded modules with an 'ib_' prefix does
> lsmod report?
>
> -bryan

--
gentoo-cluster@gentoo.org mailing list

