Setting Compute Mode in Windows

Hi there,

I’ve been perusing the forums, reading about the watchdog timer in Windows that can cause problems if a kernel takes too long to execute, and trying to find ways around it.

I’ve seen several methods suggested, such as:

  1. Timeout Detection and Recovery

  2. Disabling the Watchdog Timer While Testing Display Drivers (via the registry values sketched just below this list)

but I am not sure whether either of these is the best course of action.
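
For reference, the registry settings I’ve seen mentioned for option 2 live under the following key. I haven’t tested these myself, the value semantics are just my understanding of the documentation, and a reboot is apparently required for changes to take effect:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
    TdrLevel  (REG_DWORD)  0 = timeout detection disabled, 3 = default (recover on timeout)
    TdrDelay  (REG_DWORD)  seconds the GPU may run before the watchdog fires (default 2)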

Currently, I am using Windows Vista 64-bit, and I have three GeForce GTX 260 CUDA-enabled graphics cards to do my processing. I’ve read that you can set the cards up as dedicated CUDA processing cards by setting the computeMode to exclusive using the nvidia-smi tool (which I think is Linux-only?). Is this possible to do on Windows?
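
(The Linux invocation I’ve seen quoted looks something like nvidia-smi -g 0 -c 1, with -g selecting the GPU and -c 1 setting compute-exclusive mode, but I can’t verify the exact flags since I don’t have a Linux box to try it on.)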

I checked the properties of each card with the following code:

#include <stdio.h>
#include <cuda_runtime.h>

int main( void )
{
    int deviceCount = 0;
    cudaGetDeviceCount( &deviceCount );
    printf( "Number of CUDA devices: %d\n\n", deviceCount );

    for ( int device = 0; device < deviceCount; ++device )
    {
        cudaDeviceProp deviceProp;
        cudaGetDeviceProperties( &deviceProp, device );

        /* size_t members are cast to unsigned long long so the format
           specifiers stay correct on 64-bit Windows */
        printf( "%d - name:                     %s\n", device, deviceProp.name );
        printf( "%d - totalGlobalMem:           %llu bytes ( %.2f Gbytes)\n", device, (unsigned long long)deviceProp.totalGlobalMem, deviceProp.totalGlobalMem / (float)( 1024 * 1024 * 1024 ) );
        printf( "%d - sharedMemPerBlock:        %llu bytes ( %.2f Kbytes)\n", device, (unsigned long long)deviceProp.sharedMemPerBlock, deviceProp.sharedMemPerBlock / (float)1024 );
        printf( "%d - regsPerBlock:             %d\n", device, deviceProp.regsPerBlock );
        printf( "%d - warpSize:                 %d\n", device, deviceProp.warpSize );
        printf( "%d - memPitch:                 %llu\n", device, (unsigned long long)deviceProp.memPitch );
        printf( "%d - maxThreadsPerBlock:       %d\n", device, deviceProp.maxThreadsPerBlock );
        printf( "%d - maxThreadsDim[0]:         %d\n", device, deviceProp.maxThreadsDim[0] );
        printf( "%d - maxThreadsDim[1]:         %d\n", device, deviceProp.maxThreadsDim[1] );
        printf( "%d - maxThreadsDim[2]:         %d\n", device, deviceProp.maxThreadsDim[2] );
        printf( "%d - maxGridSize[0]:           %d\n", device, deviceProp.maxGridSize[0] );
        printf( "%d - maxGridSize[1]:           %d\n", device, deviceProp.maxGridSize[1] );
        printf( "%d - maxGridSize[2]:           %d\n", device, deviceProp.maxGridSize[2] );
        printf( "%d - totalConstMem:            %llu bytes ( %.2f Kbytes)\n", device, (unsigned long long)deviceProp.totalConstMem, deviceProp.totalConstMem / (float)1024 );
        printf( "%d - compute capability:       %d.%d\n", device, deviceProp.major, deviceProp.minor );
        printf( "%d - clockRate:                %d kilohertz\n", device, deviceProp.clockRate );
        printf( "%d - textureAlignment:         %llu\n", device, (unsigned long long)deviceProp.textureAlignment );
        printf( "%d - deviceOverlap:            %d\n", device, deviceProp.deviceOverlap );
        printf( "%d - multiProcessorCount:      %d\n", device, deviceProp.multiProcessorCount );
        printf( "%d - kernelExecTimeoutEnabled: %d\n", device, deviceProp.kernelExecTimeoutEnabled );
        printf( "%d - integrated:               %d\n", device, deviceProp.integrated );
        printf( "%d - canMapHostMemory:         %d\n", device, deviceProp.canMapHostMemory );
        printf( "%d - computeMode:              %d\n\n", device, deviceProp.computeMode );
    }

    return 0;
}

Running it gives this result:

Number of CUDA devices: 3

0 - name:                     GeForce GTX 260
0 - totalGlobalMem:           939524096 bytes ( 0.88 Gbytes)
0 - sharedMemPerBlock:        16384 bytes ( 16.00 Kbytes)
0 - regsPerBlock:             16384
0 - warpSize:                 32
0 - memPitch:                 262144
0 - maxThreadsPerBlock:       512
0 - maxThreadsDim[0]:         512
0 - maxThreadsDim[1]:         512
0 - maxThreadsDim[2]:         64
0 - maxGridSize[0]:           65535
0 - maxGridSize[1]:           65535
0 - maxGridSize[2]:           1
0 - totalConstMem:            65536 bytes ( 64.00 Kbytes)
0 - compute capability:       1.3
0 - clockRate:                1242000 kilohertz
0 - textureAlignment:         256
0 - deviceOverlap:            1
0 - multiProcessorCount:      27
0 - kernelExecTimeoutEnabled: 0
0 - integrated:               0
0 - canMapHostMemory:         1
0 - computeMode:              0

Devices 1 and 2 report exactly the same values as device 0 (including computeMode 0), so I have omitted their listings here.

You can see that the computeMode on all three of my cards is set to 0 (the normal mode described here).
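
If I’m reading the runtime headers right, those values correspond to the cudaComputeMode enum:

0 = cudaComputeModeDefault     (multiple host threads may use the device)
1 = cudaComputeModeExclusive   (only one host thread may use the device at a time)
2 = cudaComputeModeProhibited  (no host thread may use the device)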

One of my cards is currently driving two of my monitors, but the other two have nothing connected to them. I’ve tried running my kernels on only the two cards with nothing attached, but apparently I’m still hitting the watchdog: the screen goes blank and I get the familiar “Display driver stopped responding and has recovered” message.
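
In case it matters, this is essentially how I steer work onto one of the non-display cards (a stripped-down sketch; myKernel and the launch sizes are placeholders for my real code):

#include <cuda_runtime.h>

/* placeholder kernel standing in for my real processing */
__global__ void myKernel( float *data )
{
    data[ threadIdx.x ] *= 2.0f;
}

int main( void )
{
    float *d_data;

    cudaSetDevice( 1 );                 /* device 1 has no monitor attached */
    cudaMalloc( (void**)&d_data, 512 * sizeof( float ) );

    myKernel<<< 1, 512 >>>( d_data );   /* should execute entirely on device 1 */
    cudaThreadSynchronize();            /* wait for the kernel to finish */

    cudaFree( d_data );
    return 0;
}

Even with cudaSetDevice pointed at device 1 or 2, a long-running kernel still triggers the recovery.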

Thanks in advance for any advice / insight that can be provided.