Detecting which GPUs are attached to a display

Hi all,

Finally got pretty hardcore into CUDA over the last couple of weeks. I've got a working kernel linked into another, larger program.

The problem I'm having is that the other program is already set up to be multi-threaded and has another Ada task (implemented as a lightweight thread) that is using OpenGL to do visualization.

Due to the way the OpenGL code is implemented, it isn't playing nice with CUDA, which is fine. I've got a 9600 GT attached to X that is running that part and a GTX 275, unattached, running CUDA. I know how to check which CUDA device number each of these corresponds to and how to set the correct device for CUDA processing.

What I don't know how to do is determine at runtime which card is or is not attached to a display, without a priori knowledge. I know the information is available somewhere; I just haven't found where after a few Google searches and a look at a couple of SDK examples. The NVIDIA X Server Settings program, for instance, sees the GTX 275 and doesn't list it as attached to anything.

While I can get around this on my development machine, I can't release code built around such assumptions. I can release code requiring two video cards, actually, as this is pretty specialized software. This seems like it might be a common problem. Any ideas would be appreciated.

Thanks,
Jon

Neither CUDA API directly exposes whether a GPU has an attached display. The driver API does let you see whether there is an active watchdog timer (CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT), which is probably a good enough proxy for what you want, since the watchdog is only armed on GPUs driving a display.
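The runtime API exposes the same flag as kernelExecTimeoutEnabled in cudaDeviceProp, which saves you from mixing driver-API calls into runtime-API code. A minimal sketch (error handling abbreviated):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        /* kernelExecTimeoutEnabled is nonzero when the watchdog timer is
           active, which in practice means the GPU is driving a display. */
        printf("Device %d (%s): run-time limit on kernels: %s\n",
               dev, prop.name,
               prop.kernelExecTimeoutEnabled ? "Yes" : "No");
    }
    return 0;
}
```

Note this is a heuristic, not a guarantee: the timeout reflects the watchdog, not the display connection itself, but on Linux/X the two go together.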

A better, simpler, and more flexible way is to use nvidia-smi to set compute-mode rules on each device, designating a GPU as compute-prohibited or compute-exclusive. On my development box I have this:

avid@cuda:~/NVIDIA_GPU_Computing_SDK/C$ nvidia-smi -g 0 -s
Compute-mode rules for GPU=0x0: 0x2
avid@cuda:~/NVIDIA_GPU_Computing_SDK/C$ nvidia-smi -g 1 -s
Compute-mode rules for GPU=0x1: 0x1

(alternatively, the same settings show up in deviceQuery):

avid@cuda:~/NVIDIA_GPU_Computing_SDK/C$ bin/linux/release/deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 2 devices supporting CUDA

Device 0: "GeForce GTX 275"
  CUDA Driver Version:                           2.30
  CUDA Runtime Version:                          2.30
  CUDA Capability Major revision number:         1
  CUDA Capability Minor revision number:         3
  Total amount of global memory:                 938803200 bytes
  Number of multiprocessors:                     30
  Number of cores:                               240
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.46 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     Yes
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Prohibited (no host thread can use this device)

Device 1: "GeForce GTX 275"
  CUDA Driver Version:                           2.30
  CUDA Runtime Version:                          2.30
  CUDA Capability Major revision number:         1
  CUDA Capability Minor revision number:         3
  Total amount of global memory:                 939261952 bytes
  Number of multiprocessors:                     30
  Number of cores:                               240
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.46 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Exclusive (only one host thread at a time can use this device)

Test PASSED

This makes the display GPU compute-prohibited, so no CUDA kernel can run on it, while marking the second GPU compute-exclusive, so only one host thread at a time can use it. If you go this route, apart from greatly simplifying the code, it gives end users a lot of flexibility over how different hardware setups can be accommodated by your app.
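Your code can then read the compute mode back at runtime through cudaDeviceProp.computeMode and simply skip any prohibited device, rather than hard-coding device numbers. A rough sketch of that selection logic (error handling abbreviated):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

/* Return the first device the admin has not marked compute-prohibited,
   or -1 if every device is off limits. */
static int pick_compute_device(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        if (prop.computeMode != cudaComputeModeProhibited)
            return dev;
    }
    return -1;
}

int main(void)
{
    int dev = pick_compute_device();
    if (dev < 0) {
        fprintf(stderr, "No CUDA device available for compute\n");
        return 1;
    }
    cudaSetDevice(dev);
    printf("Using CUDA device %d\n", dev);
    return 0;
}
```

In compute-exclusive mode the driver will also fail context creation on a busy device, so checking the error return of the first CUDA call after cudaSetDevice is still worthwhile.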

Thanks! That sounds like it will do exactly what I’m looking for.

-Jon