CUDA 2.2 and CUFFT

Hi all,
so far my application, which uses CUDA 2.1, follows this path:

a) copy host->device
b) exec CUFFT
c) copy device->host
d) goto a)
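
For reference, a minimal sketch of that loop, not my actual code: the helper name `run_copy_fft`, the 1D complex-to-complex transform, the in-place execution, and the sizes are all assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: runs `iterations` rounds of copy in, FFT, copy out.
// Returns 0 on success, 1 on any allocation, CUDA, or CUFFT error.
int run_copy_fft(int n, int iterations) {
    size_t bytes = sizeof(cufftComplex) * (size_t)n;
    cufftComplex *h_data = (cufftComplex *)malloc(bytes);
    cufftComplex *d_data = NULL;
    cufftHandle plan;

    if (h_data == NULL) return 1;
    // Dummy input: a constant signal.
    for (int i = 0; i < n; ++i) { h_data[i].x = 1.0f; h_data[i].y = 0.0f; }

    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess) {
        free(h_data);
        return 1;
    }
    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS) {
        cudaFree(d_data);
        free(h_data);
        return 1;
    }

    int status = 0;
    for (int it = 0; it < iterations && status == 0; ++it) {
        // a) copy host -> device
        if (cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
            status = 1;
            break;
        }
        // b) exec CUFFT (in place, forward transform)
        if (cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD) != CUFFT_SUCCESS) {
            status = 1;
            break;
        }
        // c) copy device -> host (blocking, so the result is ready afterwards)
        if (cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) {
            status = 1;
            break;
        }
        // d) goto a)
    }

    cufftDestroy(plan);
    cudaFree(d_data);
    free(h_data);
    return status;
}

int main(void) {
    int rc = run_copy_fft(1024, 4);
    printf("copy path %s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```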

As far as I understand, with CUDA 2.2 I’ll be able to do this instead:

allocate memory on the host, map it into the device address space, and then:

a) exec CUFFT
b) goto a)

Is that correct? Is the overhead of the copies from host to device and back
completely gone then?
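
To make the question concrete, here is a rough sketch of what I have in mind for the mapped-memory path. It is only a sketch under assumptions: the helper name `run_mapped_fft`, the 1D in-place C2C transform, and the sizes are made up, and I am assuming CUFFT accepts the device alias of a mapped host buffer like any other device pointer. The explicit `cudaMemcpy` calls disappear, but on a discrete GPU I would expect the data to still cross the PCIe bus whenever the device touches it, so the transfer cost may just move into the FFT itself rather than vanish.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: runs `iterations` FFTs directly on mapped host memory.
// Returns 0 on success, 1 on any CUDA or CUFFT error.
int run_mapped_fft(int n, int iterations) {
    size_t bytes = sizeof(cufftComplex) * (size_t)n;
    cufftComplex *h_data = NULL;   // pinned host buffer
    cufftComplex *d_alias = NULL;  // device-side alias of the same memory
    cufftHandle plan;
    cudaDeviceProp prop;

    // Not every device supports mapping host memory.
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    if (!prop.canMapHostMemory) return 1;

    // Must be called before the CUDA context is created on this device.
    if (cudaSetDeviceFlags(cudaDeviceMapHost) != cudaSuccess) return 1;

    // Allocate page-locked host memory that is mapped into the device
    // address space, then ask for the matching device pointer.
    if (cudaHostAlloc((void **)&h_data, bytes, cudaHostAllocMapped) != cudaSuccess)
        return 1;
    if (cudaHostGetDevicePointer((void **)&d_alias, h_data, 0) != cudaSuccess) {
        cudaFreeHost(h_data);
        return 1;
    }
    // Dummy input: a constant signal, written by the host.
    for (int i = 0; i < n; ++i) { h_data[i].x = 1.0f; h_data[i].y = 0.0f; }

    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS) {
        cudaFreeHost(h_data);
        return 1;
    }

    int status = 0;
    for (int it = 0; it < iterations && status == 0; ++it) {
        // a) exec CUFFT on the mapped buffer, no explicit copies
        if (cufftExecC2C(plan, d_alias, d_alias, CUFFT_FORWARD) != CUFFT_SUCCESS) {
            status = 1;
            break;
        }
        // Wait for the GPU before the host reads h_data.
        // (On CUDA 2.2 the equivalent call was cudaThreadSynchronize().)
        if (cudaDeviceSynchronize() != cudaSuccess) {
            status = 1;
            break;
        }
        // b) goto a)
    }

    cufftDestroy(plan);
    cudaFreeHost(h_data);
    return status;
}

int main(void) {
    int rc = run_mapped_fft(1024, 4);
    printf("mapped path %s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

Since an FFT makes several passes over its input, I suspect timing both variants on the real data sizes is the only way to know which one wins.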