In my MPI code I assign a GPU to each MPI process with:
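#ifdef _OPENACC
! Initialize the OpenACC runtime and report what it sees
call acc_init(acc_device_default)
dtype = acc_get_device_type()
numdevices = acc_get_num_devices(acc_device_nvidia)
print *, "device type=", dtype
print *, "mpi rank = ", MyId
print *, "# devices on my node = ", numdevices
! Round-robin the ranks over the node's GPUs; skip this when none
! are found, since mod(MyId, 0) is undefined
if (numdevices > 0) then
   mydevice = mod(MyId, numdevices)
   call acc_set_device_num(mydevice, acc_device_nvidia)
end if
#endif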
At run time, the print statements show that no GPUs are detected (I run on 8 nodes, 1 MPI process per node):
device type= 0
mpi rank = 0
# devices on my node = 0
device type= 0
mpi rank = 4
# devices on my node = 0
device type= 0
mpi rank = 5
# devices on my node = 0
etc.
However, pgaccelinfo has no trouble finding the GPU:
-bash-3.2$ pgaccelinfo
CUDA Driver Version: 5050
NVRM version: NVIDIA UNIX x86_64 Kernel Module 319.23 Thu May 16 19:36:02 PDT 2013
Device Number: 0
Device Name: Tesla C1060
Device Revision Number: 1.3
Global Memory Size: 4294770688
Number of Multiprocessors: 30
Number of Cores: 240
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 16384
Registers per Block: 16384
Warp Size: 32
Maximum Threads per Block: 512
Maximum Block Dimensions: 512, 512, 64
Maximum Grid Dimensions: 65535 x 65535 x 1
Maximum Memory Pitch: 2147483647B
Texture Alignment: 256B
Clock Rate: 1296 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: No
ECC Enabled: No
Memory Clock Rate: 800 MHz
Memory Bus Width: 512 bits
Max Threads Per SMP: 1024
Async Engines: 1
Unified Addressing: No
Initialization time: 657481 microseconds
Current free memory: 4237299456
Upload time (4MB): 1153 microseconds ( 726 ms pinned)
Download time: 1053 microseconds ( 772 ms pinned)
Upload bandwidth: 3637 MB/sec (5777 MB/sec pinned)
Download bandwidth: 3983 MB/sec (5433 MB/sec pinned)
Removing the assignment code altogether, since there is only 1 GPU per node anyway, still fails to detect the GPU:
call to cuInit returned error 100: No device
Are there any common causes of this sort of behavior? I remembered to put "use openacc" in my code this time.