Hi,
I'm testing MPI and CUDA performance on RHEL 7 cluster nodes, each of which has four P100 cards. One odd issue is that the GPUs are not detected, but only when I submit an MPI job that spans multiple nodes. For example,
bsub -q devgpu.q -m "devicegpu01" -n 2 mpirun my_program
works as expected, but the equivalent two-node command
bsub -q devgpu.q -m "devicegpu01 devicegpu02" -R "span[ptile=1]" -n 2 mpirun my_program
fails with cudaErrorNoDevice errors.
Since my other programs that don't use CUDA run fine with both commands, I suspect something is wrong with my CUDA setup. Does anyone have suggestions?
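cudaErrorNoDevice means the CUDA runtime in that process found no usable GPU. One assumption worth checking is whether the environment each rank sees differs between the single-node and two-node runs, for example CUDA_VISIBLE_DEVICES being unset, empty, or pointing at an invalid index on the second node. The sketch below is a hypothetical per-rank diagnostic (the function name report_gpu_env and file name check_gpu_env.sh are illustrative, not part of any tool); it prints the host, CUDA_VISIBLE_DEVICES, any /dev/nvidia device nodes, and nvidia-smi's GPU list if available.

```shell
#!/bin/bash
# check_gpu_env.sh: hypothetical per-rank GPU environment report (a sketch, not a fix).
report_gpu_env() {
  local host found=0 dev
  host=$(uname -n)
  # ${VAR-default} prints <unset> only when the variable is truly unset;
  # an empty value prints as "CUDA_VISIBLE_DEVICES=", which also hides all GPUs.
  echo "$host CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES-<unset>}"
  # Check whether the NVIDIA device nodes exist on this host.
  for dev in /dev/nvidia[0-9]*; do
    if [ -e "$dev" ]; then
      echo "$host device node: $dev"
      found=1
    fi
  done
  [ "$found" -eq 1 ] || echo "$host: no /dev/nvidia[0-9]* device nodes"
  # List the GPUs the driver reports, if nvidia-smi is on this rank's PATH.
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L 2>&1 | sed "s|^|$host |"
  else
    echo "$host: nvidia-smi not on PATH"
  fi
}

report_gpu_env
```

After `chmod +x check_gpu_env.sh`, it could be launched the same way as the failing job, e.g. `bsub -q devgpu.q -m "devicegpu01 devicegpu02" -R "span[ptile=1]" -n 2 mpirun ./check_gpu_env.sh`, and its output compared against the single-node run.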
Thanks.