Without going into too much detail about my setup, is there any obvious reason why 2 operating system (CPU) threads, each running CUDA kernels and performing memcpys etc., would be more than 4 times slower than a single thread?
Both threads are doing the same work on different data (input from 2 cameras).
I expected the processing time to increase roughly linearly with the number of threads.
A single thread completes its processing in 5ms, whereas 2 threads complete their processing in 24ms.
And yes - I’m calling cudaThreadSynchronize to ensure accurate timing.
Thanks for any light anyone may be able to bring to the problem! I’ll post more details if required.
Visual Studio 2005
GeForce 8800 GTX
Intel Core 2 CPU 6700 @ 2.66GHz