I’m running some benchmarks and I’m noticing that when I execute 100 cufftExecC2C calls with a size of 2^20, I get
around 0.02 ms per transform. As soon as I execute 1000 cufftExecC2C calls, the time per transform rises to 3 ms.
Does anyone know why?
Is the cufftExecC2C call synchronous or asynchronous? I didn’t see any mention of it in the documentation.