How expensive is D3D interop?

Hi everyone,

I’m using CUDA 2.0 beta on WinXP with a GeForce 8500 GT. I’m using the new cudaMapResources function to map a 640x480x4 render-target texture surface so that I can process the image with CUDA. Timing just the calls to cudaMapResources and cudaUnmapResources gives a whole millisecond, which seems like quite a lot.
I verified that the timing doesn’t include the completion of previous rendering commands, only the mapping.
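
For anyone who wants to reproduce this, here is a minimal sketch of the kind of measurement described above, not the exact code. `average_map_unmap_ms` is a hypothetical helper (not part of CUDA), and the two callables are stand-ins for the real map and unmap calls, which need a live D3D device and are not shown. It assumes all pending rendering has already been flushed before the loop starts, so that only the map/unmap pair is timed.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: average wall-clock time in milliseconds for one
// map() + unmap() pair, measured over `iterations` back-to-back pairs.
// `map` and `unmap` are placeholders for the real interop calls.
template <typename MapFn, typename UnmapFn>
double average_map_unmap_ms(MapFn map, UnmapFn unmap, int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;

    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        map();    // stand-in for mapping the D3D resource
        unmap();  // stand-in for unmapping it again
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;

    return elapsed.count() / iterations;
}
```

Averaging over many pairs smooths out timer resolution, which matters when the quantity being measured is around one millisecond.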
What exactly happens when mapping a D3D resource? Is there copying involved? Should this be faster with better hardware?
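
As a rough sanity check on the copying question, here is a back-of-the-envelope sketch. It assumes that 640x480x4 means 4 bytes per pixel and that the whole millisecond would be spent moving the surface exactly once; both are assumptions, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Size in bytes of a width x height surface with `bytes_per_pixel` bytes per texel.
constexpr std::size_t surface_bytes(std::size_t width, std::size_t height,
                                    std::size_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Effective bandwidth in GB/s (10^9 bytes per second) if `bytes` were
// transferred once in `milliseconds`.
inline double implied_bandwidth_gb_per_s(std::size_t bytes, double milliseconds) {
    assert(milliseconds > 0.0);
    const double seconds = milliseconds * 1e-3;
    return static_cast<double>(bytes) / seconds / 1e9;
}
```

With `surface_bytes(640, 480, 4)` the surface is 1,228,800 bytes, and `implied_bandwidth_gb_per_s` for 1 ms comes out at about 1.23 GB/s. That figure only says what a single copy would imply; it does not show by itself whether a copy actually takes place.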
Since D3D can’t access a mapped resource, I need to map and unmap every frame in real time, and I kind of expected this mapping to be “for free”…