From: "Brian Budge" <brian.budge@gmail.com>
To: gentoo-cluster@lists.gentoo.org
Subject: [gentoo-cluster] openib, no /dev/infiniband
Date: Wed, 2 Jan 2008 13:39:37 -0800
Message-ID: <5b7094580801021339n22db7c35y8580642c784d2c17@mail.gmail.com>
Hi all -
I'm new to InfiniBand and still getting my feet wet. I administer a very
small cluster of 5 nodes and have recently installed InfiniBand HCAs. I
have the InfiniBand modules built into the kernel, and I am using the
openib-userspace package from the gentoo-science overlay.
The strange thing is that I have InfiniBand working with Open MPI on 4 of
my 5 nodes, but the 5th one is a mystery.
All 4 working nodes have a /dev/infiniband directory that looks roughly like
this:
crw-rw---- 1 root root 231, 64 Dec 31 09:13 issm0
crw-rw-rw- 1 root root 231, 224 Dec 31 09:13 ucm0
crw-rw---- 1 root root 231, 0 Dec 31 09:13 umad0
crw-rw-rw- 1 root root 231, 192 Dec 31 09:13 uverbs0
But the 5th node doesn't have one, which could point to the problem. It
isn't the whole problem, though: I tried creating those device nodes myself
to match, and it didn't help. I'm just not sure what the difference is,
because I installed them all the same way, they all have the same hardware,
and they are all running the same kernel.
All 5 nodes have the same contents in the /sys/class/infiniband directory.
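To compare the failing node against a working one, a small shell check run on each machine can help. This is only a sketch: the function name check_ib_nodes is made up for illustration, and the four node names are simply the ones listed above from the working nodes. It reports whether each one exists as a character device (what `[ -c ]` tests), not whether its major/minor numbers are correct.

```sh
# Hypothetical helper: report which of the device nodes seen on the working
# nodes (issm0, ucm0, umad0, uverbs0) are missing or not character devices.
# Usage: check_ib_nodes [dir]   (dir defaults to /dev/infiniband)
check_ib_nodes() {
    dir=${1:-/dev/infiniband}
    missing=0
    for node in issm0 ucm0 umad0 uverbs0; do
        if [ -c "$dir/$node" ]; then
            echo "ok      $dir/$node"
        else
            echo "MISSING $dir/$node"
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero if any expected node is absent.
    [ "$missing" -eq 0 ]
}
```

Running `check_ib_nodes` on the failing host and on a working one prints one line per expected node, so the two outputs can be compared side by side.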
Here's the mpirun command I am trying, with its output:
mpirun -np 2 -mca btl self,openib -machinefile burn_machine_file ./loadtest
[burn-3][0,1,1][btl_openib_component.c:437:init_one_hca] error obtaining
device context for mthca0 errno says No such file or directory
--------------------------------------------------------------------------
WARNING: There were errors during IB HCA initialization on host 'burn-3'.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There is at least on IB HCA found on host 'burn-3', but there is
no active ports detected. This is most certainly not what you wanted.
Check your cables and SM configuration.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
Any help would be appreciated! Thanks.
Brian