From: "Brian Budge" <brian.budge@gmail.com>
To: gentoo-cluster@lists.gentoo.org
Date: Wed, 2 Jan 2008 16:06:45 -0800
Subject: Re: [gentoo-cluster] openib, no /dev/infiniband

Hi Bryan -

I don't seem to have a 40-ib.rules file in /etc/udev/rules.d on any node.

My /sys/class/infiniband directory contains mthca0, whose contents are:

    > ls -la /sys/class/infiniband/mthca0/
    total 0
    drwxr-xr-x 3 root root    0 Jan  2 20:54 .
    drwxr-xr-x 3 root root    0 Jan  2 20:54 ..
    -r--r--r-- 1 root root 4096 Jan  2 21:07 board_id
    lrwxrwxrwx 1 root root    0 Jan  3 00:01 device -> ../../../devices/pci0000:20/0000:20:0a.0/0000:21:00.0
    -r--r--r-- 1 root root 4096 Jan  2 21:07 fw_ver
    -r--r--r-- 1 root root 4096 Jan  2 21:07 hca_type
    -r--r--r-- 1 root root 4096 Jan  2 21:07 hw_rev
    -rw-r--r-- 1 root root 4096 Jan  2 21:07 node_desc
    -r--r--r-- 1 root root 4096 Jan  2 21:07 node_guid
    -r--r--r-- 1 root root 4096 Jan  2 21:06 node_type
    drwxr-xr-x 3 root root    0 Jan  2 21:07 ports
    lrwxrwxrwx 1 root root    0 Jan  3 00:01 subsystem -> ../../../class/infiniband
    -r--r--r-- 1 root root 4096 Jan  2 21:07 sys_image_guid
    --w------- 1 root root 4096 Jan  2 20:54 uevent

I don't have any ib modules loaded at all on any node.
All of the InfiniBand drivers are built into the kernel rather than as modules:

    CONFIG_INFINIBAND=y
    CONFIG_INFINIBAND_USER_MAD=y
    CONFIG_INFINIBAND_USER_ACCESS=y
    CONFIG_INFINIBAND_USER_MEM=y
    CONFIG_INFINIBAND_ADDR_TRANS=y
    CONFIG_INFINIBAND_MTHCA=y
    CONFIG_INFINIBAND_MTHCA_DEBUG=y
    # CONFIG_INFINIBAND_IPATH is not set
    CONFIG_INFINIBAND_AMSO1100=y
    # CONFIG_INFINIBAND_AMSO1100_DEBUG is not set
    CONFIG_MLX4_INFINIBAND=y
    CONFIG_INFINIBAND_IPOIB=y
    # CONFIG_INFINIBAND_IPOIB_CM is not set
    CONFIG_INFINIBAND_IPOIB_DEBUG=y
    # CONFIG_INFINIBAND_IPOIB_DEBUG_DATA is not set
    # CONFIG_INFINIBAND_SRP is not set
    # CONFIG_INFINIBAND_ISER is not set

Thanks,
Brian

On Jan 2, 2008 2:11 PM, Bryan Green <bryan.d.green@nasa.gov> wrote:
> "Brian Budge" writes:
> >
> > Hi all -
> >
> > I'm new to infiniband and still getting my feet wet. I am admining a very
> > small cluster of 5 nodes, and have recently installed infiniband HCAs. I
> > have the infiniband modules built into the kernel, and I am using the
> > openib-userspace package in the gentoo-science overlay.
> >
> > The strange thing with my situation is that I have infiniband working with
> > openmpi on 4 of my 5 nodes, but the 5th one is a mystery.
> >
> > All 4 working nodes have a /dev/infiniband directory that looks roughly like
> > this:
> >
> > crw-rw----  1 root root 231,  64 Dec 31 09:13 issm0
> > crw-rw-rw-  1 root root 231, 224 Dec 31 09:13 ucm0
> > crw-rw----  1 root root 231,   0 Dec 31 09:13 umad0
> > crw-rw-rw-  1 root root 231, 192 Dec 31 09:13 uverbs0
> >
> > But the 5th node doesn't, which could indicate the problem (it isn't the
> > whole problem, though: I tried creating those nodes myself to match, and
> > it didn't help). I'm just not sure what the difference is, because I
> > installed them all the same way, they all have the same hardware, and they
> > are all running the same kernel.
>
> The '/dev/infiniband' subdir is created by the udev rules in
> '/etc/udev/rules.d/40-ib.rules'
>
> Does the '/sys/class/infiniband' directory exist?
> If so, what does it contain? What loaded modules with an 'ib_' prefix does
> lsmod report?
>
> -bryan
>
> --
> gentoo-cluster@gentoo.org mailing list
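Bryan's two checks (is /sys/class/infiniband populated, and did udev create the nodes under /dev/infiniband?) can be scripted so that the fifth node's answers can be compared side by side with a working node's. The following is a minimal Python sketch, not an existing tool: the function names are hypothetical, and the expected major/minor numbers are simply copied from the working-node /dev/infiniband listing quoted above. It only reads what is present and never creates device nodes. Since hand-made nodes reportedly did not help on the fifth node, an all-"ok" result would not settle the problem on its own.

```python
import os
import stat
from pathlib import Path

# Nodes seen on the four working nodes: name -> (major, minor).
# Values are copied from the quoted listing; they are an assumption for other setups.
EXPECTED_NODES = {
    "umad0": (231, 0),
    "issm0": (231, 64),
    "uverbs0": (231, 192),
    "ucm0": (231, 224),
}


def list_hcas(sys_root="/sys/class/infiniband"):
    """Return the HCA names registered in sysfs (e.g. ['mthca0']), or [] if none."""
    root = Path(sys_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir())


def check_dev_nodes(dev_root="/dev/infiniband"):
    """Map each expected node to 'ok', 'missing', 'not a char device' or 'wrong numbers'."""
    report = {}
    for name, (major, minor) in EXPECTED_NODES.items():
        path = Path(dev_root) / name
        try:
            st = os.stat(path)
        except FileNotFoundError:
            report[name] = "missing"
            continue
        if not stat.S_ISCHR(st.st_mode):
            report[name] = "not a char device"
        elif (os.major(st.st_rdev), os.minor(st.st_rdev)) != (major, minor):
            report[name] = "wrong numbers"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    print("HCAs in sysfs:", list_hcas() or "none")
    for node, status in check_dev_nodes().items():
        print(f"/dev/infiniband/{node}: {status}")
```

Running it on one working node and on the fifth node and diffing the output would show whether the difference lies in sysfs registration or only in the device nodes.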
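Because every enabled InfiniBand option in the configuration above is =y rather than =m, the drivers are compiled into the kernel image, and lsmod (which formats /proc/modules, i.e. loadable modules only) will show no ib_ entries even on the working nodes. An empty lsmod is therefore expected everywhere and does not by itself single out the fifth node. What can be compared is the configuration each node actually booted. The sketch below is a minimal Python illustration under stated assumptions: parse_config and diff_configs are hypothetical names, and it assumes each node's config text is available, for example from /proc/config.gz if the kernel was built with CONFIG_IKCONFIG_PROC, or from the .config used for the build.

```python
import re

# "CONFIG_FOO=y", "CONFIG_FOO=m", 'CONFIG_FOO="text"' and "# CONFIG_FOO is not set"
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(a, b, prefixes=("CONFIG_INFINIBAND", "CONFIG_MLX4")):
    """Return {option: (value_in_a, value_in_b)} for differing options with the given prefixes.

    Options present in only one config are reported with 'absent' on the other side.
    """
    keys = {k for k in a.keys() | b.keys() if k.startswith(prefixes)}
    return {
        k: (a.get(k, "absent"), b.get(k, "absent"))
        for k in sorted(keys)
        if a.get(k, "absent") != b.get(k, "absent")
    }
```

For example, diff_configs(parse_config(working_text), parse_config(node5_text)) returns an empty dict when the InfiniBand options match, which would be consistent with all five nodes running the same kernel; working_text and node5_text stand for the config text read from each node.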