Quick question on cudaSetDevice(): it does not work in my case.

Hi guys,

I have three GPUs detected in my system (well, essentially two cards: a Tesla C1060 and a GTX 295). Running "./deviceQuery" reports the stats for the three GPUs in the following order:

Device 0: “GeForce GTX 295”,

Device 1: “Tesla C1060”,

Device 2: “GeForce GTX 295”.

I wrote the following code to select the device manually; however, it just doesn't take effect. It always chooses Device 0 no matter what I pass to cudaSetDevice(). Do you guys know what happened here? I guess I must have screwed up the code somewhere. :(

Thanks a lot for the help!


[codebox]// Number of devices
int device_Count;
cudaGetDeviceCount(&device_Count);
printf("\n\nDevice Numbers: %d\n\n", device_Count);

// Device Selection, Device 1: Tesla C1060
int device = 1;
cudaSetDevice(device);

// Current Device Detection
cudaDeviceProp deviceProp;
cudaGetDeviceProperties(&deviceProp, device);
printf("Using device %d: %s \n", device, deviceProp.name);[/codebox]

Output of the above code:

[codebox]Device Numbers: 3

Using device 0: GeForce GTX 295 [/codebox]

Just to let you know, my system is running Ubuntu 9.04, CUDA 2.3, driver version 190.18.

Is device an int?

Two diagnostics.

First, you’re ignoring all error returns. Maybe a context was already created and cudaSetDevice is returning an error. Without a complete program posted there’s no way for us to tell.
Doing something like a CUDA memory allocation will initialize a new context on the default device (0) even if you didn’t explicitly call cudaSetDevice first; later cudaSetDevice calls would then fail with a cudaErrorSetOnActiveProcess return, which you’re currently ignoring.

Put an error check on every single cuda call.

Second, print out the device number you get from cudaGetDevice().
It should be 1. I suspect it’s 0, which would explain why it reports the GTX 295. Why it would be 0 is a good question (perhaps explained by the previous point), but note that cudaGetDeviceProperties simply returns the info for whatever device number you pass in, so printing deviceProp.name doesn’t prove which device is actually active.
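The error-checking advice above can be sketched as a small wrapper macro (a minimal sketch assuming the standard CUDA runtime API; the CUDA_CHECK name is made up, not from this thread):

[codebox]#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call so failures surface immediately
// with a file/line location instead of being silently ignored.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err));     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Usage:
//   CUDA_CHECK(cudaSetDevice(1));
//   CUDA_CHECK(cudaGetDeviceProperties(&deviceProp, 1));[/codebox]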

Hi SPWorley,

Thanks for the quick reply! Prompted by your points, I looked over my program again, and you are right: cudaSetDevice() does return an error! More detailed code is attached below; let me describe it a little. What I am doing is comparing the conjugate gradient (CG) algorithm using cblas on the host against cublas on the device. I have removed the code that has little to do with my problem (reading in the data on the host, for example).



[codebox]// Program main
int main(void)
{
    // ... (declarations and reading in the host data omitted) ...

    // create and start timer
    float timeCost1;
    cudaEvent_t start, end;
    cudaEventCreate(&start);
    cudaEventCreate(&end);

    cudaEventRecord(start, 0);

    /////////////////////// Part 1: CG using Host //////////////////////
    resulty = CG_Blas(A, btemp, xtemp, p, r, temp, row, col, TOL, maxit);

    cudaEventRecord(end, 0);
    cudaEventSynchronize(end);
    cudaEventElapsedTime(&timeCost1, start, end);

    printf("cblas result: %f , Time cost: %f(ms), GFlop/Sec: %f \n\n",
           cblas_snrm2(row, resulty, 1), timeCost1, flopCount(row, 21, timeCost1*1e-3));

    //////////////////// Part 2: CG using Device //////////////////////
    cudaError_t status;
    int device_Count;
    int device = 1;      // Tesla C1060
    cudaDeviceProp deviceProp;

    cudaGetDeviceCount(&device_Count);
    printf("\n\nDevice Numbers: %d\n\n", device_Count);

    // Device Selection
    status = cudaSetDevice(device);
    if (status != cudaSuccess) {
        printf("!!!! Set Device error\n");
        return EXIT_FAILURE;
    }

    printf("device %d \n", device);

    cudaGetDeviceProperties(&deviceProp, device);
    printf("Using device %d: %s \n", device, deviceProp.name);

    // ... (CG on the device using cublas follows) ...[/codebox]


The output corresponding to "Part 2: CG using Device" is:

[codebox]Device Numbers: 3

!!!! Set Device error[/codebox]

You have probably noticed that, when I time the host part, I create cudaEvents, which I guess must have initialized the context on device 0! So I moved the cudaSetDevice() line to the very beginning of the whole program, and then everything works just fine!
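In other words, the fix is just ordering: select the device before anything that touches the runtime (a sketch of the corrected structure, not my full program):

[codebox]int main(void)
{
    // Select the device FIRST, before any cudaEvent/cudaMalloc call
    // can implicitly create a context on device 0.
    cudaSetDevice(1);

    cudaEvent_t start, end;
    cudaEventCreate(&start);   // context is now created on device 1
    cudaEventCreate(&end);

    // ... rest of the program ...
}[/codebox]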

OK, so here are my two other questions; please bear with me if they sound straightforward to you… :)

(1) Is creating cudaEvents the right way to measure the time spent on the host?

(2) Does that mean I can only select the device ONCE in the whole program? In other words, I cannot change the device once it has been set up?

Thanks a lot for your help!

As far as I know, you can’t switch away from a context that has already been created on a GPU, and therefore you can’t select another device, without destroying the context’s contents, e.g. by calling cudaThreadExit. What you can do is launch a host thread for each GPU you want to use.
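A rough sketch of that thread-per-GPU approach using pthreads (the worker function and its body are placeholders, not from this thread; each host thread gets its own CUDA context, so each can bind to its own device):

[codebox]#include <pthread.h>
#include <stdio.h>
#include <cuda_runtime.h>

// Each host thread owns a separate CUDA context, so each one
// may call cudaSetDevice() independently.
void *gpuWorker(void *arg)
{
    int device = *(int *)arg;
    cudaSetDevice(device);   // must come before any other CUDA call

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    printf("Thread bound to device %d: %s\n", device, prop.name);

    // ... allocate memory, run kernels, etc. on this device ...
    return NULL;
}

int main(void)
{
    int devices[2] = {0, 1};
    pthread_t threads[2];
    for (int i = 0; i < 2; ++i)
        pthread_create(&threads[i], NULL, gpuWorker, &devices[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(threads[i], NULL);
    return 0;
}[/codebox]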

Hi pszilard,

That sounds like a good idea! I will go look into this, thanks for the help!


cudaSetDevice happens ONCE per thread, either implicitly OR explicitly. Subsequent calls make no difference.
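If the context really must be torn down in the same thread, CUDA 2.x offers cudaThreadExit(); note that it destroys all allocations and state in the current context (a sketch under that assumption, not tested on this exact setup):

[codebox]float *d_buf;
cudaMalloc((void **)&d_buf, 1024);   // implicitly creates a context on device 0

cudaThreadExit();                    // destroys the context (and d_buf with it)

cudaSetDevice(1);                    // legal again: the next CUDA call builds
                                     // a fresh context on device 1[/codebox]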