Hi all,
So far my application, which uses CUDA 2.1, follows this path:
a) copy host->device
b) exec CUFFT
c) copy device->host
d) goto a)
As far as I understand, with CUDA 2.2 I'll be able to
allocate memory on the host, map it into the device address space, and then do:
a) exec CUFFT
b) goto a)
Is that correct? Is the overhead of the host-to-device and device-to-host copies
completely gone then?
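
To make the question concrete, here is a rough sketch of the mapped-memory path I have in mind. It rests on a few assumptions: that the device reports canMapHostMemory, that CUFFT accepts the device alias of a mapped host buffer like any other device pointer, and that an in-place C2C transform is what the application needs. The function name mapped_fft_demo and the all-ones test input are made up for illustration. cudaDeviceSynchronize() comes from later toolkits; in the 2.2 timeframe the equivalent call was cudaThreadSynchronize().

```cuda
#include <cmath>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: one in-place forward FFT on mapped (zero-copy) host memory.
// Returns 0 on success, 1 if mapped memory is not available, 2 on any other failure.
int mapped_fft_demo(int n)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) return 1;

    // Ask for mapped pinned allocations; intended to run before the
    // runtime has created its context on the device.
    if (cudaSetDeviceFlags(cudaDeviceMapHost) != cudaSuccess) return 1;

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess || !prop.canMapHostMemory)
        return 1;

    // Pinned host buffer that is also visible in the device address space.
    cufftComplex *h_data = NULL;
    cufftComplex *d_alias = NULL;
    if (cudaHostAlloc((void **)&h_data, n * sizeof(cufftComplex),
                      cudaHostAllocMapped) != cudaSuccess)
        return 2;

    int rc = 2;
    if (cudaHostGetDevicePointer((void **)&d_alias, h_data, 0) == cudaSuccess) {
        // Test input: all ones, so the DC bin of the transform should equal n.
        for (int i = 0; i < n; ++i) {
            h_data[i].x = 1.0f;
            h_data[i].y = 0.0f;
        }

        cufftHandle plan;
        if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) == CUFFT_SUCCESS) {
            // No explicit cudaMemcpy: CUFFT reads and writes through the alias.
            // Assumption: CUFFT treats the alias like an ordinary device pointer.
            if (cufftExecC2C(plan, d_alias, d_alias, CUFFT_FORWARD) == CUFFT_SUCCESS &&
                cudaDeviceSynchronize() == cudaSuccess) {
                // After the sync the result is readable directly from h_data.
                if (fabsf(h_data[0].x - (float)n) < 1e-3f * (float)n) rc = 0;
            }
            cufftDestroy(plan);
        }
    }

    cudaFreeHost(h_data);
    return rc;
}
```

Calling mapped_fft_demo(1024) from main and building with nvcc plus -lcufft should be enough to try it. As far as I can tell, the data still has to cross the PCIe bus whenever the FFT kernels touch the mapped buffer, so what I am really asking is whether this ends up cheaper than the explicit copies.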