Sharing work between two devices in parallel

Hi everyone,

(I’m French, sorry for my English …)

I work in an MRI lab with two Tesla C2050 cards, and I’d like to do something like this:

void GPU_test(int *array, int size)
{
  int dCount, device;

  for (device=0 ; device < dCount ; device++)
  {
      int *d_array;

      cudaMalloc((void **)&d_array, sizeof(int) * size/2);

      cudaMemcpy(d_array, &array[device * size/2], sizeof(int) * size/2, cudaMemcpyHostToDevice);

      Kernel_test <<< 1, size/2 >>> (d_array, size/2);  // kernel just increments (+1) the array values

      cudaMemcpy(&array[device * size/2], d_array, sizeof(int) * size/2, cudaMemcpyDeviceToHost);
  }
}

I’d like to share the array between the 2 devices, but it doesn’t work …

How can I call the same kernel on two devices (without waiting for the first to finish before launching on the second) and then wait for both to finish?


You just have to switch to the context of the second GPU and then make the launch. Check the CUDA reference for how to do that.
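For what it's worth, here is a minimal sketch of that single-thread approach. It assumes CUDA 4.0 or later, where cudaSetDevice lets one host thread switch between devices; on older toolkits a host thread is tied to a single context, so this would not apply as written. The kernel body, the 256-thread block size, the chunking arithmetic, and the bool return value are my own assumptions, not the original poster's code. The idea is to launch on every device first (kernel launches return immediately) and only then copy the results back, so the kernels can overlap.

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

// Assumed kernel: adds 1 to every element, as described in the question.
__global__ void Kernel_test(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Splits `array` across all visible devices. Returns false on any CUDA error.
bool GPU_test(int *array, int size)
{
    int dCount = 0;
    if (cudaGetDeviceCount(&dCount) != cudaSuccess || dCount == 0) return false;

    int chunk = (size + dCount - 1) / dCount;      // elements per device, rounded up
    std::vector<int *> d_array(dCount, nullptr);
    std::vector<int> count(dCount, 0);
    bool ok = true;

    // Pass 1: upload each chunk and launch. The launch does not block the host,
    // so device 0 is already computing while device 1 is being set up.
    for (int device = 0; device < dCount; ++device) {
        int offset = device * chunk;
        int n = std::max(0, std::min(chunk, size - offset));
        if (n == 0) continue;
        size_t bytes = sizeof(int) * n;

        cudaSetDevice(device);
        if (cudaMalloc((void **)&d_array[device], bytes) != cudaSuccess) { ok = false; continue; }
        count[device] = n;
        ok &= cudaMemcpy(d_array[device], array + offset, bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;

        int threads = 256;                          // assumed block size
        int blocks = (n + threads - 1) / threads;   // several blocks, not <<<1, n>>>
        Kernel_test<<<blocks, threads>>>(d_array[device], n);
        ok &= cudaGetLastError() == cudaSuccess;
    }

    // Pass 2: a blocking cudaMemcpy waits for that device's kernel to finish,
    // so after this loop both devices are done.
    for (int device = 0; device < dCount; ++device) {
        if (count[device] == 0) continue;
        cudaSetDevice(device);
        ok &= cudaMemcpy(array + device * chunk, d_array[device],
                         sizeof(int) * count[device],
                         cudaMemcpyDeviceToHost) == cudaSuccess;
        cudaFree(d_array[device]);
    }
    return ok;
}
```

Note that the launch in the question, <<< 1, size/2 >>>, puts the whole half-array in one block, which fails once size/2 exceeds the per-block thread limit (1024 on a C2050). That may be part of why the original code did not work.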

Launch 2 threads (one per GPU) and share the CPU pointer globally so that each thread has access to it.
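A rough sketch of the one-thread-per-GPU idea could look like the following. It uses std::thread from C++11 purely for illustration; any threading library would do, and the thread posters may well have used something else. The worker function, the kernel body, the block size, and the bool return value are assumptions of mine. Instead of a global pointer, each thread is simply handed the address of its own slice of the host array, which amounts to the same sharing.

```cuda
#include <algorithm>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Assumed kernel: adds 1 to every element, as described in the question.
__global__ void Kernel_test(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical worker: runs in its own host thread and drives one device.
// `slice` points into the shared host array; `ok` receives 1 on success.
static void worker(int device, int *slice, int count, char *ok)
{
    *ok = 0;
    int *d_array = nullptr;
    size_t bytes = sizeof(int) * count;

    // Bind this host thread to its device before any other CUDA call.
    if (cudaSetDevice(device) != cudaSuccess) return;
    if (cudaMalloc((void **)&d_array, bytes) != cudaSuccess) return;

    bool good = cudaMemcpy(d_array, slice, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (good) {
        int block = 256;                           // assumed block size
        int grid = (count + block - 1) / block;
        Kernel_test<<<grid, block>>>(d_array, count);
        // The blocking copy back also waits for the kernel to finish.
        good = cudaMemcpy(slice, d_array, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_array);
    *ok = good ? 1 : 0;
}

// Splits `array` across all visible devices, one host thread per device.
bool GPU_test(int *array, int size)
{
    int dCount = 0;
    if (cudaGetDeviceCount(&dCount) != cudaSuccess || dCount == 0) return false;

    int chunk = (size + dCount - 1) / dCount;      // elements per device, rounded up
    std::vector<std::thread> threads;
    std::vector<char> ok(dCount, 1);               // one flag per thread, no sharing

    for (int device = 0; device < dCount; ++device) {
        int offset = device * chunk;
        int count = std::min(chunk, size - offset);
        if (count <= 0) continue;
        threads.emplace_back(worker, device, array + offset, count, &ok[device]);
    }

    // Joining every thread is the "wait for the end of the two" step.
    for (auto &t : threads) t.join();
    return std::all_of(ok.begin(), ok.end(), [](char c) { return c != 0; });
}
```

The slices do not overlap, so the threads need no locking on the host array. On Linux you may need to pass a pthread flag through nvcc to the host compiler when building this.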

It works! Thank you.

I used brano’s solution: 1 thread per device.