I’m using cuFFT to perform a C2C transformation of purely real numbers into the frequency domain and then back out to the spatial domain. I’m using `cufftMakePlan2d`, which has arguments for `nx` and `ny` that I am selecting to be powers of two (i.e. `2^n`), and then `cufftExecC2C()` to perform the transformation. If I scale the result of the forward transformation into the frequency domain by `sqrt( nx * ny )`, then use `cufftExecC2C()` to bring it back out (inverse) into the spatial domain, and then scale a second time by `sqrt( nx * ny )`, the code works fine as long as `nx == ny`. However, when they are not equal, the final answer I get is sometimes mis-scaled, depending upon their ratio. For example, if `nx = 4` and `ny = 2`, then my spatial answer at the end is too large by a factor of `2`. Strangely, if `nx = 8` and `ny = 2`, then my spatial answer is scaled correctly, but if `nx = 16` and `ny = 2`, then my spatial answer is scaled up by approximately `1.28`.
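One hypothesis that is consistent with these three cases is that the value of `sqrt( nx * ny )` is being truncated to an integer somewhere before it is applied (for example by storing it in an `int`). This is only an assumption about the scaling code, which is not shown here. The sketch below checks the arithmetic; `truncated_scale_error` is a hypothetical helper written for illustration and is not part of cuFFT.

```cpp
#include <cmath>

// Hypothetical helper (not a cuFFT function): factor by which the final
// result would be too large if sqrt(nx * ny) were truncated to an integer
// and that truncated value were applied twice instead of the exact root.
double truncated_scale_error(int nx, int ny) {
    const int n = nx * ny;
    // Assumed bug: the square root is stored as an int, dropping the fraction.
    const int truncated = static_cast<int>(std::sqrt(static_cast<double>(n)));
    // Correct total divisor is n; the truncated path divides by truncated^2.
    return static_cast<double>(n) / (static_cast<double>(truncated) * truncated);
}
```

Under that assumption, `truncated_scale_error(4, 2)` is `8 / (2 * 2) = 2`, `truncated_scale_error(8, 2)` is `16 / (4 * 4) = 1`, and `truncated_scale_error(16, 2)` is `32 / (5 * 5) = 1.28`, which matches the factors reported above. It would also explain why `nx == ny` works for powers of two, since `nx * ny` is then a perfect square.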

All of this strangeness goes away if, instead of scaling by `sqrt( nx * ny )` on the frequency result and then again on the final spatial result, I defer scaling until the end when I come back out into the spatial result and instead scale by `nx * ny`. So for now I appear to have a workaround, but I’d like to understand what is going on here.
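For reference, cuFFT transforms are unnormalized, so a forward transform followed by an inverse returns the input multiplied by the number of elements, `nx * ny`. The host-only sketch below illustrates the deferred scaling with a naive 2D DFT standing in for `cufftExecC2C()`. It is not cuFFT code; `dft2d` and `round_trip_error` are hypothetical names, and the row-major `nx`-by-`ny` layout is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// Naive unnormalized 2D DFT over an nx-by-ny row-major array.
// sign = -1 for forward, +1 for inverse; no scaling is applied either way.
std::vector<cplx> dft2d(const std::vector<cplx>& in, int nx, int ny, int sign) {
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cplx> out(in.size());
    for (int kx = 0; kx < nx; ++kx) {
        for (int ky = 0; ky < ny; ++ky) {
            cplx sum(0.0, 0.0);
            for (int x = 0; x < nx; ++x) {
                for (int y = 0; y < ny; ++y) {
                    const double phase = sign * two_pi *
                        (static_cast<double>(kx) * x / nx +
                         static_cast<double>(ky) * y / ny);
                    sum += in[x * ny + y] * std::polar(1.0, phase);
                }
            }
            out[kx * ny + ky] = sum;
        }
    }
    return out;
}

// Forward, inverse, then a single floating-point scale by 1 / (nx * ny).
// Returns the largest absolute difference from the original input.
double round_trip_error(int nx, int ny) {
    std::vector<cplx> data(static_cast<std::size_t>(nx) * ny);
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] = cplx(1.0 + static_cast<double>(i), 0.0);  // purely real input
    }
    const std::vector<cplx> freq = dft2d(data, nx, ny, -1);
    const std::vector<cplx> back = dft2d(freq, nx, ny, +1);
    const double scale = 1.0 / (static_cast<double>(nx) * ny);
    double err = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        err = std::max(err, std::abs(back[i] * scale - data[i]));
    }
    return err;
}
```

With the scale factor kept in floating point, the round trip recovers the input to within rounding error for non-square sizes such as `nx = 4, ny = 2` and `nx = 16, ny = 2`.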

Can anybody help explain this behavior? This is CUDA v9 compiled code running on a V100 GPU on Linux x64.