I tested simpleMultiGPU last week and got a puzzling result. I ran it with 8, 4, and 1 GPUs, but as the number of GPUs increases, the execution time also increases. Here are the test results:
• 8 GPUs Time: 15166 ms, GPU sum: 16777304
• 4 GPUs Time: 7974 ms, GPU sum: 16777304
• 1 GPU Time: 2236 ms, GPU sum: 26777296.
For the code, I changed nothing except setting MAX_GPU_COUNT to 8, 4, and 1, respectively.
My system is one node with two S2050 units (8 GPUs), using Pthreads + CUDA, running Linux with GCC 4.1.3.
It sounds as if the GPUs are running serially instead of in parallel. Why?
Hi, thanks a lot. As you told me, simpleMultiGPU is not a good program for testing multiple GPUs. Do you have a benchmark to suggest? I have followed your suggestions, and the total time is now half of what it was. However, the time still increases with the number of GPUs used :(