I have a setup with one S1070 and one GTX 280, which makes 5 devices. As I am not the only person using the system, I decided to switch all devices to compute-exclusive mode. I leave nvidia-smi running in the background to ensure the setting is kept. All tools tell me that the cards are in compute-exclusive mode. In addition, if I now run more than 5 applications in parallel (or set one card to prohibited), the additional applications fail as expected. So far, so good.
Now my problem is the result of cudaGetDevice( int* devId ). If I explicitly choose a device, it works as expected, but in the case where the driver assigns a GPU it always reports that I am running on device 0. This cannot actually be the case, as otherwise the applications shouldn't fail when I overcommit. Now I am wondering: is cudaGetDevice(…) broken in this case, or am I doing something wrong?
Below is an outline of my code:
	// CUDA initialization
	cudaError_t cudaErr;
	int devId = -1;
	if( vm.count( "device" ) )
	{
		// select user-specified device
		devId = vm[ "device" ].as< unsigned int >();

		cudaErr = cudaSetDevice( devId );
		if( cudaErr )
			throw CudaError( "Failed to initialize device", cudaErr );
	}
	else
	{
		// force CUDA to select a device (just so we can be sure the queried
		// device is the one we run on)
		cudaErr = cudaSetValidDevices( 0, 0 );
		if( cudaErr )
			throw CudaError( "Failed to initialize device", cudaErr );
	}

	// check which device we got
	cudaErr = cudaGetDevice( &devId );
	if( cudaErr )
		throw CudaError( "Failed to retrieve used device", cudaErr );

	cudaDeviceProp props;
	cudaErr = cudaGetDeviceProperties( &props, devId );
	if( cudaErr )
		throw CudaError( "Failed to get device properties", cudaErr );

	cout << "Running on device " << devId << ": " << props.name << endl;

	// do your work on the GPU
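One thing I have been experimenting with (a sketch of a workaround, not a confirmed fix): as far as I can tell, the runtime may defer context creation until the first call that actually needs a context, and cudaGetDevice() can report the default device 0 until then; cudaSetValidDevices() only restricts the candidate list. Forcing context creation first, e.g. with the common cudaFree( 0 ) idiom, before querying might make the reported device match the one actually acquired:

	#include <cstdio>
	#include <cuda_runtime.h>

	int main()
	{
		// Assumption: no cudaSetDevice() call, so in compute-exclusive
		// mode the driver picks a free device when the context is created.

		// Force context creation before querying; cudaFree( 0 ) touches
		// the context without allocating or freeing anything.
		cudaError_t err = cudaFree( 0 );
		if( err != cudaSuccess )
		{
			fprintf( stderr, "Context creation failed: %s\n",
			         cudaGetErrorString( err ) );
			return 1;
		}

		// Now the context exists, so this should name the real device.
		int devId = -1;
		err = cudaGetDevice( &devId );
		if( err != cudaSuccess )
			return 1;

		cudaDeviceProp props;
		err = cudaGetDeviceProperties( &props, devId );
		if( err != cudaSuccess )
			return 1;

		printf( "Running on device %d: %s\n", devId, props.name );
		return 0;
	}

If that is the actual behaviour, it would also explain why the extra processes still fail on overcommit: the exclusive-mode arbitration happens at context creation, independently of what cudaGetDevice() reports beforehand.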
For testing I use the following command line:
for i in 1 2 3 4 5; do
./gpu_run &
done;