Parallel computation on a Quadro and a Tesla gives no speedup


I have a Tesla and a Quadro running in parallel, each driven from its own SDL_Thread on Linux (CUDA 4.0, Tesla C2060, Quadro FX5800).

The Quadro does nothing but generate a 480x384x384 3D texture with glTexImage3D, while the Tesla does a bunch of math.

Running them in parallel is no faster than running them one after the other; in other words, the Quadro’s texture generation slows down the work running on the Tesla.

I can’t see an obvious reason for this: the memory transfers don’t appear to be the bottleneck.

Any hints? I’ve also tried CUDA/OpenGL interop, and if anything it runs even slower.