Hi all - I'm new to InfiniBand and still getting my feet wet. I am administering a very small cluster of 5 nodes, and have recently installed InfiniBand HCAs. I have the InfiniBand modules built into the kernel, and I am using the openib-userspace package in the gentoo-science overlay.

The strange thing with my situation is that I have InfiniBand working with Open MPI on 4 of my 5 nodes, but the 5th one is a mystery. All 4 working nodes have a /dev/infiniband directory that looks roughly like this:

crw-rw---- 1 root root 231,  64 Dec 31 09:13 issm0
crw-rw-rw- 1 root root 231, 224 Dec 31 09:13 ucm0
crw-rw---- 1 root root 231,   0 Dec 31 09:13 umad0
crw-rw-rw- 1 root root 231, 192 Dec 31 09:13 uverbs0

But the 5th node doesn't, which may point to the problem (though it isn't the whole story: I tried creating those device nodes myself to match, and it didn't help). I'm just not sure what the difference is, because I installed them all the same way, they all have the same hardware, and they are all running the same kernel. All 5 nodes have the same contents in the /sys/class/infiniband directory.

Here's the mpirun command I am trying, followed by its output:

mpirun -np 2 -mca btl self,openib -machinefile burn_machine_file ./loadtest

[burn-3][0,1,1][btl_openib_component.c:437:init_one_hca] error obtaining device context for mthca0 errno says No such file or directory
--------------------------------------------------------------------------
WARNING: There were errors during IB HCA initialization on host
'burn-3'.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There is at least on IB HCA found on host 'burn-3', but there is
no active ports detected. This is most certainly not what you wanted.
Check your cables and SM configuration.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------

Any help would be appreciated!

Thanks.
Brian