MultiGPU to avoid the “launch timed out” problem

Hello,

I have a CUDA application that processes images, and it works perfectly for some images.

If the image is larger, I get the “launch timed out” error.

I have a 9800GX2 card, and the problem should go away if I use the GPU that is not driving the main display.

I’d like to know how I can tell the compiler or the application to run on the second GPU instead of the first one (the display GPU).

I ran the simpleMultiGPU sample from the SDK, and it runs without errors, but I don’t understand how it works. :wacko:

Help me please!

Thank you very much.

You can use the runtime API to select which device you want. First run deviceQuery from the SDK to check your configuration.

Your 9800GX2 card has two GPUs, labeled GPU 0 and GPU 1. Suppose you want to use GPU 1; then:

[codebox]cudaDeviceProp deviceProp;
int device = 1;   // choose device 1 (the second GPU)

cudaGetDeviceProperties(&deviceProp, device);
cudaSetDevice(device);   // call this before any kernel launch or cudaMalloc

printf("use device %d, name = %s\n", device, deviceProp.name);[/codebox]

In the simpleMultiGPU example, each host thread is assigned to a different GPU via

[codebox]for(i = 0; i < GPU_N; i++){
    plan[i].device = i;                  // the i-th host thread uses the i-th GPU
    plan[i].h_Data = h_Data + gpuBase;   // this GPU's slice of the host input
    plan[i].h_Sum  = h_SumGPU + i;       // where this GPU's partial result goes
    gpuBase += plan[i].dataN;
}[/codebox]

Each host thread is then bound to its GPU by “cudaSetDevice” in the function “solverThread”:

[codebox]static CUT_THREADPROC solverThread(TGPUplan *plan){
    const int BLOCK_N  = 32;
    const int THREAD_N = 256;
    const int ACCUM_N  = BLOCK_N * THREAD_N;

    float *d_Data, *d_Sum;
    float *h_Sum;
    float sum;
    int i;

    //Set device: bind this host thread to the GPU chosen in its plan
    cutilSafeCall( cudaSetDevice(plan->device) );

    // ... (remainder of solverThread omitted in this quote)[/codebox]
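
For the part you said you don’t understand: main() in the sample simply starts one host thread per GPU and waits for them all to finish. Roughly like this (the cutStartThread / cutWaitForThreads helpers come from the SDK’s multithreading.h, quoted here from memory, so check the sample source for the exact names and signatures):

[codebox]CUTThread threadID[MAX_GPU_COUNT];

// launch one host thread per GPU; each thread runs solverThread(plan + i)
for(i = 0; i < GPU_N; i++)
    threadID[i] = cutStartThread((CUT_THREADROUTINE)solverThread, (void *)(plan + i));

// block until every GPU's thread has finished
cutWaitForThreads(threadID, GPU_N);[/codebox]

Since each thread calls cudaSetDevice exactly once, each one gets its own context on its own GPU. In your case you only need one GPU, so a single cudaSetDevice(1) at the start of your program (before any other CUDA call) is enough.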