Running 2 GPUs with CUDA: Slow Memory Copy Operations

Hi Everyone,

I am trying to write an application that uses 2 GPUs. I started from the “MultiGPU” sample code and built my application on top of it.

My problem is that memory copy operations are slow. For example, when I use the “cudaMemcpyToSymbol” function, copying 7205763 bytes of data takes approximately 40 ms. The same copy takes 4 ms in our single-GPU application. Here is the GPU thread code:
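The following is a hypothetical sketch, not the GPU thread code referred to above. It shows one way to time only the copy with CUDA events, after a warm-up call, so that a one-off cost such as context creation on the first runtime call in that host thread is not counted as copy time. It assumes the destination is a `__device__` array (7205763 bytes cannot fit in `__constant__` memory, which is limited to 64 KB). The names `d_table`, `kTableBytes`, `time_copy_to_symbol_ms` and `effective_bandwidth_mb_s` are illustrative only.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical destination symbol, sized to the copy described in the post.
constexpr size_t kTableBytes = 7205763;
__device__ unsigned char d_table[kTableBytes];

// Effective bandwidth in MB/s (1 MB = 1e6 bytes) for `bytes` copied in `ms` milliseconds.
double effective_bandwidth_mb_s(size_t bytes, double ms) {
    return (static_cast<double>(bytes) / 1e6) / (ms / 1e3);
}

// Times one cudaMemcpyToSymbol on the device currently selected for this host thread.
// Returns the elapsed time in milliseconds, or -1.0f on error.
float time_copy_to_symbol_ms(const void* host_src, size_t bytes) {
    if (bytes > kTableBytes) return -1.0f;

    // Warm-up: the runtime creates the context lazily on the first call,
    // so trigger that here instead of inside the timed region.
    cudaFree(0);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaError_t err = cudaMemcpyToSymbol(d_table, host_src, bytes, 0, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until the copy and the stop event have completed

    float ms = -1.0f;
    if (err == cudaSuccess) {
        cudaEventElapsedTime(&ms, start, stop);
    } else {
        std::fprintf(stderr, "cudaMemcpyToSymbol: %s\n", cudaGetErrorString(err));
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

In a GPU thread this would be called after `cudaSetDevice`, with the host buffer and its size. For scale, `effective_bandwidth_mb_s(7205763, 40.0)` is roughly 180 MB/s, while the 4 ms single-GPU figure corresponds to roughly 1800 MB/s.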

Additionally, the application produces correct results in “EmuDebug” mode but fails in “Release” mode. Am I doing something wrong with the multithreading?
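For comparison, a common multi-GPU layout uses one host thread per device: each thread selects its GPU with `cudaSetDevice` before any other CUDA call and then touches only its own device. The sketch below is a hypothetical illustration under that assumption, using `std::thread` rather than whatever threading helper the “MultiGPU” sample provides; `d_data`, `kMaxSliceBytes`, `slice_begin`, `gpu_worker` and `run_on_all_gpus` are illustrative names. A `__device__` symbol exists separately on every GPU, so each thread copies into the symbol on its own device.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <thread>
#include <vector>

constexpr size_t kMaxSliceBytes = 1 << 20;        // hypothetical per-GPU slice limit
__device__ unsigned char d_data[kMaxSliceBytes];  // one independent copy per device

// Start offset of device `dev`'s slice when `total` bytes are split over `ndev` devices.
size_t slice_begin(size_t total, int ndev, int dev) {
    return total * static_cast<size_t>(dev) / static_cast<size_t>(ndev);
}

// Body of one host thread; it only ever works on GPU `dev`.
void gpu_worker(int dev, const unsigned char* host, size_t total, int ndev, cudaError_t* status) {
    // Bind this host thread to its GPU before any other CUDA call.
    *status = cudaSetDevice(dev);
    if (*status != cudaSuccess) return;

    size_t begin = slice_begin(total, ndev, dev);
    size_t bytes = slice_begin(total, ndev, dev + 1) - begin;
    if (bytes > kMaxSliceBytes) { *status = cudaErrorInvalidValue; return; }

    // Copy this thread's slice into the symbol on its own device.
    *status = cudaMemcpyToSymbol(d_data, host + begin, bytes);
    if (*status != cudaSuccess) return;

    // ... launch kernels on `dev` here ...
    *status = cudaDeviceSynchronize();  // surface any asynchronous errors from this device
}

// Starts one worker per device and joins them all before returning.
bool run_on_all_gpus(const unsigned char* host, size_t total) {
    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev == 0) return false;

    std::vector<cudaError_t> status(ndev, cudaSuccess);
    std::vector<std::thread> threads;
    for (int dev = 0; dev < ndev; ++dev)
        threads.emplace_back(gpu_worker, dev, host, total, ndev, &status[dev]);
    for (auto& t : threads) t.join();

    bool ok = true;
    for (int dev = 0; dev < ndev; ++dev) {
        if (status[dev] != cudaSuccess) {
            std::fprintf(stderr, "GPU %d: %s\n", dev, cudaGetErrorString(status[dev]));
            ok = false;
        }
    }
    return ok;
}
```

Joining every worker before the host buffer is freed or overwritten keeps the copies from racing with the main thread, and checking each thread's status shows which device failed.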