I am trying to write a program that splits a task among several GPUs. I started with the multiGPU SDK sample and adapted it. On Windows it seems to work fine, but I only have 2 GPUs in my Windows box, so I ported it to Linux with an S870. There it doesn’t seem to work: the CPU threads no longer run in parallel.
I am attaching sample code that runs in parallel when it makes no CUDA calls (build it with #define TEST_PARALLEL). As soon as each thread starts a CUDA context (using cudaFree(0)), the CPU threads seem to run serially.
Output from a run without CUDA. You can see that all the threads start at the same time and finish at the same time.
Output from a run with CUDA. All the threads start at the same time, but they seem to run serially. Also note that each thread does nothing except initialize the context, and that alone takes more than 180 ms.
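A minimal sketch of this kind of test, not the attached file itself. It assumes one CPU thread per GPU, uses std::thread rather than the SDK sample's own thread helpers, and records when each thread starts and finishes relative to launch. USE_CUDA, ThreadTiming, run_workers and print_timings are hypothetical names made up for this sketch. Without USE_CUDA, each thread just sleeps for 50 ms as a placeholder workload.

```cpp
// Sketch only: a stand-in for the kind of test described, not the attachment.
// USE_CUDA is a hypothetical macro for this sketch; define it when building
// with nvcc to include the context-creation step.
#include <cassert>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>
#ifdef USE_CUDA
#include <cuda_runtime.h>
#endif

using Clock = std::chrono::steady_clock;

struct ThreadTiming {
    double start_ms;  // when the worker began, relative to launch
    double end_ms;    // when the worker finished, relative to launch
};

static double ms_since(Clock::time_point t0) {
    return std::chrono::duration<double, std::milli>(Clock::now() - t0).count();
}

// One worker per GPU (assumption). With USE_CUDA it binds to a device and
// forces context creation; without it, it sleeps as a placeholder workload.
static void worker(int id, Clock::time_point t0, ThreadTiming* out) {
    out->start_ms = ms_since(t0);
#ifdef USE_CUDA
    cudaSetDevice(id);
    cudaFree(0);  // freeing a null pointer does no work but initializes the context
#else
    (void)id;
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
#endif
    out->end_ms = ms_since(t0);
}

// Launch n workers at once, wait for all of them, and return their timings.
std::vector<ThreadTiming> run_workers(int n) {
    assert(n > 0);
    std::vector<ThreadTiming> timings(n);
    std::vector<std::thread> threads;
    const Clock::time_point t0 = Clock::now();
    for (int i = 0; i < n; ++i)
        threads.emplace_back(worker, i, t0, &timings[i]);
    for (auto& t : threads) t.join();
    return timings;
}

void print_timings(const std::vector<ThreadTiming>& timings) {
    for (size_t i = 0; i < timings.size(); ++i)
        std::printf("thread %zu: start %.1f ms, end %.1f ms\n",
                    i, timings[i].start_ms, timings[i].end_ms);
}
```

Calling print_timings(run_workers(4)) from main prints one line per thread. If the threads really run in parallel, the end times should all be close together. If context creation is being serialized, the end times should step up from one thread to the next.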
The attachment is a .cu file. I had to change the extension to upload it. I compile with