Problem with cudaGetDevice on multiple GPUs

Hi, all

I have two GPUs and I am using OpenMP to control them. My setup launches two CPU threads, and each thread controls one GPU.

Part of my code is the same as the OpenMP project in the SDK:

#pragma omp parallel
{
    unsigned int thread_id = omp_get_thread_num();
    printf("thread id is %d \n", thread_id);
    unsigned int num_threads = omp_get_num_threads();

    int gpu_id = -1;
    cudaSetDevice(thread_id);
    cudaGetDevice(&gpu_id);
    cudaDeviceProp dprop;
    cudaGetDeviceProperties(&dprop, thread_id);
    printf("  %s\n", dprop.name);
    printf("CPU thread %d (of %d) uses CUDA device %d\n", thread_id, num_threads, gpu_id);
    // codes
}

The weird thing is that the gpu_id returned in each thread is wrong every time: it is always 1 (it should be 0 for one thread and 1 for the other), yet dprop.name is correct (Tesla for one thread, Quadro FX for the other). Why?

Also, I don't know why gpu_id needs to be initialized. If I don't initialize it, I get a segmentation fault at runtime.

thanks!

You might want to add some OpenMP variable scope directives to ensure gpu_id is thread-private.

Thanks for pointing that out. But I think defining the variables inside the parallel region already makes them thread-private. For example, thread_id has the correct value in each thread.
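
For anyone who wants to check the scoping question on its own, here is a minimal sketch without any CUDA calls. It compares a variable declared inside the parallel region with one declared outside and listed in a private clause. The function names and the seen vector are made up for illustration and are not from the SDK sample. The _OPENMP guards are only there so the file still compiles (and runs with a single thread) when OpenMP is not enabled.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns the calling thread's id, or 0 without OpenMP.
static int current_thread_id() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// Variant 1: local_id is declared inside the parallel region, so each thread
// gets its own copy without any scoping clause.
std::vector<int> ids_block_scoped(int requested_threads) {
    assert(requested_threads >= 1);
    std::vector<int> seen(requested_threads, -1);  // shared, one slot per thread
    #pragma omp parallel num_threads(requested_threads)
    {
        int local_id = current_thread_id();  // private to this thread
        seen[local_id] = local_id;           // each thread writes only its own slot
    }
    return seen;
}

// Variant 2: local_id is declared outside the region and made private with an
// explicit clause. The private copy starts uninitialized inside the region,
// so it is assigned before it is read.
std::vector<int> ids_private_clause(int requested_threads) {
    assert(requested_threads >= 1);
    std::vector<int> seen(requested_threads, -1);
    int local_id = -1;
    #pragma omp parallel num_threads(requested_threads) private(local_id) shared(seen)
    {
        local_id = current_thread_id();
        seen[local_id] = local_id;
    }
    return seen;
}
```

Calling either function with 2 should leave slot i equal to i for every thread that actually ran (the runtime may grant fewer threads than requested, in which case the unused slots stay at -1). If both variants give the same result, the block-scoped declaration is doing what the private clause would do, and the wrong gpu_id probably has a different cause.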