I tested simpleMultiGPU last week and got a puzzling result. I ran it with 8, 4, and 1 GPUs, but the execution time increased as the number of GPUs increased. These are the test results:
• 8 GPUs: Time: 15166 ms, GPU sum: 16777304
• 4 GPUs: Time: 7974 ms, GPU sum: 16777304
• 1 GPU: Time: 2236 ms, GPU sum: 26777296
I changed nothing in the code other than setting MAX_GPU_COUNT to 8, 4, and 1, respectively.
My system is a single node with two S2050 units (8 GPUs in total), using Pthreads + CUDA on Linux with GCC 4.1.3.
It looks as if the GPUs are running serially instead of in parallel. Why?
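
One way to check whether the per-GPU host threads really overlap is to timestamp each thread's start and end and compare the intervals. The following is only a minimal host-side sketch, not code from simpleMultiGPU: `worker_timing`, `run_workers`, and `count_overlapping_pairs` are hypothetical names, and a 50 ms sleep stands in for the per-GPU work (device selection, copies, kernel launch), which is assumed to happen inside each thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define MAX_WORKERS 8 /* assumption: one host thread per GPU, up to 8 */

/* Hypothetical record of when one host thread started and finished. */
typedef struct {
    int id;
    double start_ms;
    double end_ms;
} worker_timing;

/* Monotonic wall-clock time in milliseconds. */
static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Thread body: the sleep is a placeholder for the real per-GPU work. */
static void *worker(void *arg) {
    worker_timing *t = (worker_timing *)arg;
    struct timespec nap = {0, 50L * 1000L * 1000L}; /* 50 ms */
    t->start_ms = now_ms();
    nanosleep(&nap, NULL);
    t->end_ms = now_ms();
    return NULL;
}

/* Number of thread pairs whose [start, end) intervals overlap.
 * Zero overlapping pairs means the threads ran one after another. */
int count_overlapping_pairs(const worker_timing *t, int n) {
    int i, j, overlaps = 0;
    for (i = 0; i < n; ++i)
        for (j = i + 1; j < n; ++j)
            if (t[i].start_ms < t[j].end_ms && t[j].start_ms < t[i].end_ms)
                ++overlaps;
    return overlaps;
}

/* Start n threads, wait for all of them, and return the total wall time in ms. */
double run_workers(worker_timing *t, int n) {
    pthread_t threads[MAX_WORKERS];
    double begin;
    int i, rc;
    assert(n >= 1 && n <= MAX_WORKERS);
    begin = now_ms();
    for (i = 0; i < n; ++i) {
        t[i].id = i;
        rc = pthread_create(&threads[i], NULL, worker, &t[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < n; ++i)
        pthread_join(threads[i], NULL);
    return now_ms() - begin;
}

/* Print each thread's interval so serial execution is easy to spot. */
void print_timings(const worker_timing *t, int n) {
    int i;
    for (i = 0; i < n; ++i)
        printf("worker %d: start %.1f ms, end %.1f ms\n",
               t[i].id, t[i].start_ms, t[i].end_ms);
}
```

To use it, call `run_workers(t, n)` on an array of `n` `worker_timing` entries, then `print_timings(t, n)` and `count_overlapping_pairs(t, n)`; compile with `-pthread`. If the intervals do not overlap once the sleep is replaced by the real GPU work, the threads are being serialized somewhere.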