Hello,
I changed the cudaProcess kernel so that it simply copies one PBO into another, to see what overall frame rate that gives. It now looks like this:
__global__ void cudaProcess(int* g_data, int* g_odata, int imgw, int imgh,
                            int tilew, int r, float threshold, float highlight) {
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int bw = blockDim.x;
    int bh = blockDim.y;
    int x = blockIdx.x*bw + tx;
    int y = blockIdx.y*bh + ty;
    // straight per-pixel copy from the input PBO to the output PBO
    g_odata[y*imgw+x] = g_data[y*imgw+x];
}
With CUDA 1.0 and the 162.01 drivers, the sample runs at 400 fps on an 8800 GTX, without GPU interoperability, of course.
With CUDA 1.1 and the 169.09 drivers, the sample runs at 70 fps on the same 8800 GTX, which in this setup is used only for CUDA computation (a 7900 GTX drives the display).
Even with everything (display and computation) running on that same 8800 GTX, it only reaches about 75 fps!
Why does this PBO-to-PBO copy perform so much worse with the new drivers?
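To help narrow down where the time goes, here is a minimal sketch that times the same copy on plain cudaMalloc buffers, with no PBO mapping involved. If this stays fast under both driver versions, the slowdown is more likely in the buffer map/unmap path than in the kernel itself. The 512x512 image size, the 16x16 block size, the bounds guard, and the names copyKernel, runCopyCheck and CHECK are assumptions made for this illustration; they are not taken from the sample.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (hypothetical helper).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(e));                           \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Same per-pixel copy as cudaProcess above, with the unused parameters
// dropped and a bounds guard added (the guard is an assumption).
__global__ void copyKernel(const int* in, int* out, int imgw, int imgh) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < imgw && y < imgh)
        out[y * imgw + x] = in[y * imgw + x];
}

// Launches the copy `iters` times on ordinary device buffers, stores the
// average milliseconds per launch in *avgMs, and returns the number of
// elements that did not copy correctly (0 means the copy is correct).
int runCopyCheck(int imgw, int imgh, int iters, float* avgMs) {
    size_t n = (size_t)imgw * imgh;
    std::vector<int> h_in(n), h_out(n, 0);
    for (size_t i = 0; i < n; ++i) h_in[i] = (int)i;

    int *d_in = NULL, *d_out = NULL;
    CHECK(cudaMalloc((void**)&d_in, n * sizeof(int)));
    CHECK(cudaMalloc((void**)&d_out, n * sizeof(int)));
    CHECK(cudaMemcpy(d_in, h_in.data(), n * sizeof(int),
                     cudaMemcpyHostToDevice));

    dim3 block(16, 16);  // assumed block size
    dim3 grid((imgw + block.x - 1) / block.x,
              (imgh + block.y - 1) / block.y);

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Time only the kernel launches, not allocation or host transfers.
    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < iters; ++i)
        copyKernel<<<grid, block>>>(d_in, d_out, imgw, imgh);
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaGetLastError());

    float totalMs = 0.0f;
    CHECK(cudaEventElapsedTime(&totalMs, start, stop));
    *avgMs = totalMs / iters;

    CHECK(cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                     cudaMemcpyDeviceToHost));
    int mismatches = 0;
    for (size_t i = 0; i < n; ++i)
        if (h_out[i] != h_in[i]) ++mismatches;

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    return mismatches;
}

int main() {
    float avgMs = 0.0f;
    int bad = runCopyCheck(512, 512, 100, &avgMs);  // assumed image size
    printf("avg kernel time: %f ms per launch, mismatches: %d\n", avgMs, bad);
    return bad == 0 ? 0 : 1;
}
```

Comparing the per-launch time printed here with the frame time implied by the fps figures above (for example, 70 fps is roughly 14 ms per frame) should indicate whether the kernel or the surrounding interop calls account for the difference.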