I’m working on a CUDA application that compares a set of model renderings (1000+) against a video frame to find the rendering that best fits the frame. I have an array of rendered DirectX surfaces and now want to map that array to CUDA to do the comparisons. The program works, except that my framerate drops from 120 fps to about 1 fps! Since these surfaces are already on the video card, it doesn’t seem like the cudaD3D9MapResources call should have so much overhead. Does anyone know whether:
a) This will be fixed in a future release?
b) There is some way to access this array of surfaces from CUDA without calling cudaD3D9MapResources?
c) There is some other type of resource (other than a surface) that will map to CUDA faster?
d) This call would be any faster using DirectX 10?
Any advice would be greatly appreciated.