Performance issues with CUDA 1.1 & 169.09 drivers: performance degradation on OGL interop


I changed the cudaProcess kernel code just to measure the overall fps when updating one PBO with input from another. It now looks like this:

__global__ void cudaProcess(int* g_data, int* g_odata, int imgw, int imgh, int tilew, int r, float threshold, float highlight) {
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int bw = blockDim.x;
    int bh = blockDim.y;

    // Global pixel coordinates handled by this thread
    int x = blockIdx.x*bw + tx;
    int y = blockIdx.y*bh + ty;

    // Straight copy from the input PBO to the output PBO
    g_odata[y*imgw+x] = g_data[y*imgw+x];
}


With CUDA 1.0 and 162.01 drivers, the sample runs at 400 fps on an 8800 GTX, without GPU interoperability, of course.

With CUDA 1.1 and 169.09 drivers, the sample runs at 70 fps on the same 8800 GTX, except that it is only used for CUDA computations (a 7900 GTX is used for display).

Also, even with everything (display and computation) running on the same 8800 GTX, the fps is only about 75!

Why is this PBO-to-PBO copy with the new drivers causing such performance degradation?

Read the release notes; they explain why:
“o On systems with multiple GPUs installed or systems with multiple
monitors connected to a single GPU, OpenGL interoperability
always copies shared buffers through host memory.”
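
One way to check that the slowdown sits in the interop path rather than in the kernel itself is to time the same copy on plain cudaMalloc buffers, with no OpenGL involved. The following is only a sketch under assumptions: copyKernel, divUp and timeCopyKernelMs are hypothetical names, the 16x16 block size is a guess, and the bounds check is an addition that the kernel in this thread does not have.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same per-pixel copy as cudaProcess, reduced to the arguments it uses.
// The bounds check is an assumption added for image sizes that are not
// a multiple of the block size.
__global__ void copyKernel(const int* in, int* out, int imgw, int imgh)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < imgw && y < imgh)
        out[y * imgw + x] = in[y * imgw + x];
}

// Integer division rounded up, used to size the grid.
static int divUp(int a, int b) { return (a + b - 1) / b; }

// Hypothetical helper: average milliseconds per kernel launch on plain
// device buffers. Returns -1.0f if any CUDA call reports an error.
float timeCopyKernelMs(int imgw, int imgh, int iterations)
{
    size_t bytes = (size_t)imgw * imgh * sizeof(int);
    int *d_in = 0, *d_out = 0;
    if (cudaMalloc((void**)&d_in, bytes) != cudaSuccess) return -1.0f;
    if (cudaMalloc((void**)&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return -1.0f;
    }
    cudaMemset(d_in, 0, bytes);

    dim3 block(16, 16);  // assumed block size
    dim3 grid(divUp(imgw, block.x), divUp(imgh, block.y));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    copyKernel<<<grid, block>>>(d_in, d_out, imgw, imgh);  // warm-up launch
    cudaEventRecord(start, 0);
    for (int i = 0; i < iterations; ++i)
        copyKernel<<<grid, block>>>(d_in, d_out, imgw, imgh);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until all timed launches finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    bool ok = (cudaGetLastError() == cudaSuccess);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_out);
    return ok ? ms / iterations : -1.0f;
}
```

Calling, for example, timeCopyKernelMs(512, 512, 100) (the image size here is an assumed example, not the sample's actual size) gives a per-launch time to compare against the per-frame time of the full sample. If the kernel alone takes far less than a frame, the remaining time is being spent outside it, which would be consistent with the host-memory copy described in the release notes.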

OK, two of the three points are ‘solved’, but what about this one:

With CUDA 1.0 and 162.01 drivers, I got about 400 fps on a single copy, and now only 75!

This is still a problem for me…
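
To put the two figures on the same footing, the frame rates can be converted to per-frame times. This is plain arithmetic on the numbers quoted above, and it assumes the frame time is dominated by the copy and nothing else changed between the two setups:

```latex
\begin{align*}
t_{\text{CUDA 1.0}} &= \frac{1000\ \text{ms}}{400} = 2.5\ \text{ms per frame} \\
t_{\text{CUDA 1.1}} &= \frac{1000\ \text{ms}}{75} \approx 13.3\ \text{ms per frame} \\
\Delta t &\approx 13.3 - 2.5 = 10.8\ \text{ms per frame}
\end{align*}
```

So the drop from 400 to 75 fps corresponds to roughly 10.8 ms of extra time per frame.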

By “everything (display and computation) running on the same 8800 GTX” do you mean a single machine with a single monitor and a single video card? Because if you still have 2 cards or 2 monitors attached to a single card, then the release note still applies.

OK, this completely “solved” my problem. Now I guess I’m gonna wait for a 1.2 release.