public inbox for gentoo-cluster@lists.gentoo.org
* [gentoo-cluster] openib, no /dev/infiniband
@ 2008-01-02 21:39 Brian Budge
  2008-01-02 22:11 ` Bryan Green
  0 siblings, 1 reply; 5+ messages in thread
From: Brian Budge @ 2008-01-02 21:39 UTC
  To: gentoo-cluster

Hi all -

I'm new to InfiniBand and still getting my feet wet. I am administering a very
small cluster of 5 nodes and have recently installed InfiniBand HCAs. I
have the InfiniBand modules built into the kernel, and I am using the
openib-userspace package from the gentoo-science overlay.

The strange thing about my situation is that InfiniBand works with
Open MPI on 4 of my 5 nodes, but the 5th one is a mystery.

All 4 working nodes have a /dev/infiniband directory that looks roughly like
this:

crw-rw---- 1 root root 231,  64 Dec 31 09:13 issm0
crw-rw-rw- 1 root root 231, 224 Dec 31 09:13 ucm0
crw-rw---- 1 root root 231,   0 Dec 31 09:13 umad0
crw-rw-rw- 1 root root 231, 192 Dec 31 09:13 uverbs0
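
One way to compare a working node with the broken one is to check each entry
under /dev/infiniband against the major:minor numbers in the listing above.
The sketch below assumes Python is available on the nodes; the EXPECTED table
and the function name check_ib_nodes are hypothetical, and the numbers are
copied from that single listing, so a different kernel may assign other minors.

```python
import os
import stat

# Expected (major, minor) numbers, copied from the listing on a working node.
# Assumption: these match your kernel; verify against a working node first.
EXPECTED = {
    "issm0": (231, 64),
    "ucm0": (231, 224),
    "umad0": (231, 0),
    "uverbs0": (231, 192),
}


def check_ib_nodes(dev_dir="/dev/infiniband"):
    """Return a status string for each expected device node under dev_dir."""
    report = {}
    for name, expected in EXPECTED.items():
        path = os.path.join(dev_dir, name)
        try:
            st = os.stat(path)
        except FileNotFoundError:
            report[name] = "missing"
            continue
        if not stat.S_ISCHR(st.st_mode):
            report[name] = "not a character device"
            continue
        actual = (os.major(st.st_rdev), os.minor(st.st_rdev))
        if actual != expected:
            report[name] = f"wrong numbers {actual[0]}:{actual[1]}"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for node, status in check_ib_nodes().items():
        print(f"{node}: {status}")
```

Running it on every node and diffing the output shows at a glance whether a
hand-made node has the wrong type or numbers.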


But the 5th node has no /dev/infiniband directory, which could point to the
problem. It can't be the whole story, though: I tried creating those device
nodes by hand to match, and it didn't help. I'm not sure what the difference
is, because I installed all the nodes the same way, they all have the same
hardware, and they all run the same kernel.

All 5 nodes have the same thing in the /sys/class/infiniband directory.
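
/sys/class/infiniband lists the HCA registered by its low-level driver
(mthca0 here), so it can look identical on every node even if something
differs in the userspace verbs interface. On the Linux kernels I'm familiar
with, ib_uverbs registers /sys/class/infiniband_verbs/uverbsN, each with a
`dev` file (the major:minor pair) and an `ibdev` file (the HCA name), and udev
creates /dev/infiniband/uverbsN from that entry. Treat those paths as an
assumption to check against your kernel. Comparing this directory between a
working node and burn-3 shows whether the kernel side is registered and which
numbers a hand-made uverbs0 node would need. The function name uverbs_devices
is hypothetical.

```python
import os


def uverbs_devices(sys_dir="/sys/class/infiniband_verbs"):
    """Map each uverbsN entry to (ibdev name, "major:minor") read from sysfs."""
    result = {}
    if not os.path.isdir(sys_dir):
        # Directory absent: the verbs interface is not registered at all.
        return result
    for entry in sorted(os.listdir(sys_dir)):
        if not entry.startswith("uverbs"):
            continue  # skip class-level files such as abi_version
        base = os.path.join(sys_dir, entry)
        try:
            with open(os.path.join(base, "ibdev")) as f:
                ibdev = f.read().strip()
            with open(os.path.join(base, "dev")) as f:
                devnum = f.read().strip()
        except OSError:
            continue  # incomplete entry; ignore it
        result[entry] = (ibdev, devnum)
    return result


if __name__ == "__main__":
    devices = uverbs_devices()
    if not devices:
        print("no uverbs entries registered")
    for name, (ibdev, devnum) in devices.items():
        print(f"{name}: ibdev={ibdev} dev={devnum}")
```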

Here's the mpirun command I am trying, with its output:

mpirun -np 2 -mca btl self,openib -machinefile burn_machine_file ./loadtest
[burn-3][0,1,1][btl_openib_component.c:437:init_one_hca] error obtaining
device context for mthca0 errno says No such file or directory

--------------------------------------------------------------------------
WARNING: There were errors during IB HCA initialization on host 'burn-3'.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There is at least on IB HCA found on host 'burn-3', but there is
no active ports detected. This is most certainly not what you wanted.
Check your cables and SM configuration.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
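
The "no active ports" warning may simply follow from the first error: if the
device context cannot be opened, Open MPI has no way to query the ports. Port
state can still be read independently of MPI from sysfs, where each HCA's
ports/<n>/state file holds a line such as "4: ACTIVE". A port that stays in
INIT rather than ACTIVE usually means no subnet manager (such as opensm) has
configured it yet. A minimal Python sketch, with port_states as a hypothetical
function name:

```python
import os


def port_states(sys_dir="/sys/class/infiniband"):
    """Return {(hca, port): state} read from <hca>/ports/<n>/state in sysfs."""
    states = {}
    if not os.path.isdir(sys_dir):
        return states
    for hca in sorted(os.listdir(sys_dir)):
        ports_dir = os.path.join(sys_dir, hca, "ports")
        if not os.path.isdir(ports_dir):
            continue
        # Port directories are numbered 1, 2, ...; ignore anything else.
        ports = sorted((p for p in os.listdir(ports_dir) if p.isdigit()), key=int)
        for port in ports:
            try:
                with open(os.path.join(ports_dir, port, "state")) as f:
                    raw = f.read().strip()  # e.g. "4: ACTIVE"
            except OSError:
                continue
            # Keep only the name after the numeric code, e.g. "ACTIVE".
            states[(hca, int(port))] = raw.split(":", 1)[-1].strip()
    return states


if __name__ == "__main__":
    for (hca, port), state in port_states().items():
        print(f"{hca} port {port}: {state}")
```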

Any help would be appreciated!  Thanks.

  Brian

Thread overview: 5+ messages
2008-01-02 21:39 [gentoo-cluster] openib, no /dev/infiniband Brian Budge
2008-01-02 22:11 ` Bryan Green
2008-01-03  0:06   ` Brian Budge
2008-01-03  0:41     ` Bryan Green
2008-01-03  0:53       ` Brian Budge
