Since the performance of CUFFT is hampered by the data transfer to/from the host, is it possible to use mapped (zero-copy) host memory for the FFT input and output buffers?
Although pinned memory works, I get a "cufft exec failed" error when I use mapped memory allocated with
cudaHostAlloc((void**) &hostcache_sp, cacheSize, cudaHostAllocMapped)
and pass CUFFT the device-side pointer obtained via
cudaHostGetDevicePointer((void**) &devicecache_complex, hostcache_sp, 0);
even though mapped memory is enabled via
cudaSetDeviceFlags(cudaDeviceMapHost);
before the first call.
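
To make the sequence concrete, here is a minimal sketch of what I am doing. The transform length N, the in-place single-precision C2C transform, and the CHECK macro are assumptions made only for this illustration; my real code uses a different plan and buffer size. The error checks are only there to show which call reports the failure.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    const int N = 1024;                                 // assumed transform length
    const size_t cacheSize = N * sizeof(cufftComplex);  // buffer size in bytes

    // Enable mapped host memory before any call that creates the context.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    // Confirm that the device can map host memory at all.
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "device 0 cannot map host memory\n");
        return EXIT_FAILURE;
    }

    // Allocate page-locked host memory that is mapped into the device address space.
    cufftComplex *hostcache_sp = NULL;
    CHECK(cudaHostAlloc((void **)&hostcache_sp, cacheSize, cudaHostAllocMapped));
    for (int i = 0; i < N; ++i) {
        hostcache_sp[i].x = 1.0f;  // real part
        hostcache_sp[i].y = 0.0f;  // imaginary part
    }

    // Get the device-side alias of the mapped host buffer.
    cufftComplex *devicecache_complex = NULL;
    CHECK(cudaHostGetDevicePointer((void **)&devicecache_complex, hostcache_sp, 0));

    // Plan and execute an in-place forward transform on the mapped buffer.
    cufftHandle plan;
    cufftResult res = cufftPlan1d(&plan, N, CUFFT_C2C, 1);
    if (res != CUFFT_SUCCESS) {
        fprintf(stderr, "cufftPlan1d failed with code %d\n", (int)res);
        return EXIT_FAILURE;
    }
    res = cufftExecC2C(plan, devicecache_complex, devicecache_complex, CUFFT_FORWARD);
    printf("cufftExecC2C returned %d\n", (int)res);  // CUFFT_SUCCESS is 0

    // Wait for the transform to finish before touching the host pointer again.
    CHECK(cudaDeviceSynchronize());

    cufftDestroy(plan);
    CHECK(cudaFreeHost(hostcache_sp));
    return 0;
}
```

This should build with something like nvcc test.cu -lcufft. Replacing cudaHostAllocMapped with cudaHostAllocDefault and copying to a cudaMalloc'ed buffer is the pinned-memory variant that works for me.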
Is that a CUFFT issue, or not?
Many Thanks