I’ve written a simple 1D convolution method, with a signature like this:
bool convolve(const float* const input, float* const output, size_t n)
It takes a row of “n” pixels in host memory as “input” and writes the filtered result to “output”. Internally, it performs the following steps:
1. Copy the input from host to device memory.
2. Pad the input data in device memory to a power of two.
3. Apply a point-wise complex multiplication in the frequency domain.
4. Scale the result to account for the un-normalized FFT.
5. Copy the result back from device to host memory.
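For illustration, the padding, multiplication and scaling steps above can be sketched on the host as follows. This is a CPU-only stand-in, not the actual CUDA code: a naive un-normalized DFT takes the place of the FFT library, and the names nextPow2, dft and convolveReference, as well as the explicit “kernel” parameter, are assumptions made for the example. It also assumes the padded length is the next power of two at or above n + k - 1, so the circular convolution does not wrap around.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cpx = std::complex<float>;

// Hypothetical helper: smallest power of two that is >= n.
std::size_t nextPow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Naive O(N^2) un-normalized DFT, standing in for the FFT library.
// sign = -1 gives the forward transform, sign = +1 the inverse.
std::vector<cpx> dft(const std::vector<cpx>& x, int sign) {
    const std::size_t N = x.size();
    const double twoPi = 6.283185307179586;
    std::vector<cpx> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t j = 0; j < N; ++j) {
            const double a = sign * twoPi * static_cast<double>((k * j) % N)
                             / static_cast<double>(N);
            acc += std::complex<double>(x[j].real(), x[j].imag())
                   * std::complex<double>(std::cos(a), std::sin(a));
        }
        X[k] = cpx(static_cast<float>(acc.real()), static_cast<float>(acc.imag()));
    }
    return X;
}

// Hypothetical host-only reference: writes the first n samples of
// input * kernel (linear convolution) to output.
bool convolveReference(const float* input, const float* kernel, std::size_t k,
                       float* output, std::size_t n) {
    if (!input || !kernel || !output || n == 0 || k == 0) return false;

    // Pad both signals with zeros to a power-of-two length.
    const std::size_t N = nextPow2(n + k - 1);
    std::vector<cpx> a(N), b(N);  // value-initialized to zero
    for (std::size_t i = 0; i < n; ++i) a[i] = cpx(input[i], 0.0f);
    for (std::size_t i = 0; i < k; ++i) b[i] = cpx(kernel[i], 0.0f);

    // Forward transforms, then point-wise complex multiplication.
    const std::vector<cpx> A = dft(a, -1);
    const std::vector<cpx> B = dft(b, -1);
    std::vector<cpx> C(N);
    for (std::size_t i = 0; i < N; ++i) C[i] = A[i] * B[i];

    // Inverse transform; neither transform normalizes, so scale by 1/N.
    const std::vector<cpx> c = dft(C, +1);
    const float scale = 1.0f / static_cast<float>(N);
    for (std::size_t i = 0; i < n; ++i) output[i] = c[i].real() * scale;
    return true;
}
```

A reference like this can be handy for checking GPU output on small rows, since any mismatch then points at the device-side transform or memory layout rather than at the convolution math.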
The code works fine unless I pass pointers to a mapped D3D texture surface as input / output. For that case, I changed all the “host to device” / “device to host” cudaMemcpy operations to the “device to device” kind. I also replaced my convolve() call with a simple copy to rule out any bugs in registering / mapping the D3D resource. The problem appears to be the cufftExecR2C() call, which seems to return garbage if a D3D resource is mapped. Before I go into more detail by posting more code or writing a minimal example program, I’d like to know whether there are any known issues here, or whether I’m trying to do something really stupid.
The reasoning behind using D3D textures instead of host memory as input / output is that the data I need to convolve is already on the GPU anyway (namely, as D3D textures). I thought I could gain some speed by not having to copy the texture from the GPU to the host, convolve it, and then copy the result back to the texture.
I’ve tried both CUDA 2.0 and the 2.1 beta under 32-bit Windows XP. Any help is appreciated, thanks!