Compute-exclusive mode and cudaGetDevice(...): always claims to be running on device 0

theMarix · July 20, 2009, 1:28pm

I have a setup with one S1070 and one GTX 280, which makes 5 devices. As I am not the only person using the system, I decided to switch all devices to compute-exclusive mode. I leave nvidia-smi running in the background to ensure the setting is kept. All tools tell me that the cards are in compute-exclusive mode. In addition, if I now run more than 5 applications in parallel (or set one card to prohibited), the additional applications fail as expected. So far, so good.

Now my problem is the result of cudaGetDevice( int* devid ). If I explicitly choose a device, it works as expected, but when the driver assigns a GPU, it always reports device 0. Of course this cannot be the case, as otherwise the applications wouldn't fail when I overcommit. Now I am wondering: is cudaGetDevice(…) broken in this case, or am I doing something wrong?

Below is an outline of my code:

	// CUDA initialization
	int devId = -1;
	if( vm.count( "device" ) )
	{
		// select user specified device
		devId = vm[ "device" ].as< unsigned int >();

		cudaErr = cudaSetDevice( devId );
		if( cudaErr )
			throw CudaError( "Failed to initialize device", cudaErr );
	}
	else
	{
		// force CUDA to select a device (just so we can be sure the queried
		// device is the one we run on)
		cudaErr = cudaSetValidDevices( 0, 0 );
		if( cudaErr )
			throw CudaError( "Failed to initialize device", cudaErr );
	}

	// check which device we got
	cudaErr = cudaGetDevice( &devId );
	if( cudaErr )
		throw CudaError( "Failed to retrieve used device", cudaErr );

	cudaDeviceProp props;
	cudaErr = cudaGetDeviceProperties( &props, devId );
	if( cudaErr )
		throw CudaError( "Failed to get device properties", cudaErr );

	cout << "Running on device " << devId << ": " << props.name << endl;

	// do your work on the GPU

For testing I use the following command line:

for i in 1 2 3 4 5; do
  ./gpu_run &
done;

theMarix · July 22, 2009, 9:44am

I just upgraded to CUDA 2.3 and the problem persists.

seckaka · July 27, 2009, 4:55am

I think this may help you

https://www.wiki.ed.ac.uk/display/ecdfwiki/…-Exclusive+Mode

theMarix · July 27, 2009, 7:49am

Thanks for the pointer. However, your example shows the same problem: when no device is selected, the run is obviously executing on device 1, yet it reports device 0. Well, at least it seems I am doing everything correctly. Thanks.

Sarnath · July 27, 2009, 8:58am

It is possible that the device number reported by cudaGetDevice() is a logical number in this case and actually represents physical device 1…

I'm just guessing here.

Try printing the device name, properties, etc.

Also, check the time to completion; that will give you a clue.

MisterAnderson42 · July 27, 2009, 12:31pm

There is no concept of a “logical device” vs “physical device” in CUDA. cudaGetDevice will identify the actual device number in use as consistently listed by all other CUDA commands.

theMarix, I’m not sure what is going on in your case, but I certainly do not see the behavior that you do.

Code (devtest.cu)

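#include <iostream>
#include <unistd.h>	// for sleep()

using namespace std;

int main()
	{
	// allocate a small buffer; this is the first runtime call that needs a context
	int *d_ptr;
	cudaError_t error = cudaMalloc((void**)&d_ptr, sizeof(int));
	if (error != cudaSuccess)
		{
		cout << cudaGetErrorString(error) << endl;
		return 1;
		}

	// ask which device this process is running on
	int dev;
	cudaGetDevice(&dev);
	cout << "Running on device " << dev << endl;

	// keep the device busy for about 10 seconds so the runs overlap
	int left = sleep(10);
	while (left > 0)
		left = sleep(left);

	cudaFree(d_ptr);
	return 0;
	}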

Execution on my 9800 GX2 system:

$ for i in 1 2 3; do
> ./devtest & done;
[1] 2195
[2] 2196
[3] 2197
user@host ~/cuda_test $ Running on device 0
Running on device 1
no CUDA-capable device is available

MisterAnderson42 · July 27, 2009, 12:37pm

Now I see what you are doing. You are calling cudaGetDevice before a context is initialized. If I put the cudaGetDevice before the cudaMalloc in my code, I get the same behavior as you.

It is unfortunate that the documentation does not mention this behavior for cudaGetDevice().
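The explanation above can be illustrated with a small toy model in plain C++, the host language of the code in this thread. Everything in it is hypothetical: ExclusiveDevicePool, ToyRuntime, getDevice and createContext are illustrative names, not CUDA API, and the model only mimics the behaviour reported here (a device query that does not create a context, and exclusive device assignment at context creation). In real code the corresponding fix is to make a context-creating call before cudaGetDevice, such as the cudaMalloc in devtest.cu or the commonly used cudaFree(0).

```cpp
#include <cassert>
#include <vector>

// TOY MODEL, not the CUDA runtime or driver: all names are hypothetical and
// only mimic the behaviour described in this thread.

// GPUs in compute-exclusive mode: each device is held by at most one context.
class ExclusiveDevicePool {
public:
    explicit ExclusiveDevicePool(int count) : busy_(count, false) {}

    // Hands out the lowest-numbered free device, or -1 when every device is
    // taken (compare "no CUDA-capable device is available").
    int acquire() {
        for (int i = 0; i < static_cast<int>(busy_.size()); ++i) {
            if (!busy_[i]) {
                busy_[i] = true;
                return i;
            }
        }
        return -1;
    }

private:
    std::vector<bool> busy_;
};

// Per-process runtime state with lazy context creation.
class ToyRuntime {
public:
    explicit ToyRuntime(ExclusiveDevicePool& pool) : pool_(pool) {}

    // Stands in for cudaGetDevice(): reports the stored device number and
    // never creates a context, so before initialization it returns 0.
    int getDevice() const { return device_; }

    // Stands in for the first call that needs a context (the cudaMalloc in
    // devtest.cu): the exclusive device is assigned only at this point.
    // Returns false when no device is free.
    bool createContext() {
        if (hasContext_) return true;
        int dev = pool_.acquire();
        if (dev < 0) return false;
        device_ = dev;
        hasContext_ = true;
        return true;
    }

private:
    ExclusiveDevicePool& pool_;
    int device_ = 0;          // default device number before any context exists
    bool hasContext_ = false;
};

// Usage sketch: three "processes" on a two-GPU machine, as in the 9800 GX2 run.
inline void demo() {
    ExclusiveDevicePool pool(2);
    ToyRuntime first(pool), second(pool), third(pool);

    bool ok = first.createContext();
    assert(ok && first.getDevice() == 0);

    assert(second.getDevice() == 0);        // queried too early: default value only
    ok = second.createContext();
    assert(ok && second.getDevice() == 1);  // the device it actually holds

    ok = third.createContext();
    assert(!ok);                            // every device is already taken
    (void)ok;
}
```

The model simply hands out the lowest free device; the real driver's choice may differ. The point it shows is the ordering: the device query only becomes meaningful once the context exists.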
