Memory usage during CUFFT & 4D FFT

Is it possible to measure the memory usage during a CUFFT call? I can of course measure the amount of free memory before and after the call, but what about during the call?

CUFFT supports both out-of-place and in-place FFTs. For an in-place FFT, does CUFFT allocate an internal temporary buffer that is as large as the array being transformed?

I currently use CUFFT for a 4D FFT, which I perform as two batches of 2D FFTs. The problem is that I need a temporary buffer to reorder the data from (x,y,z,t) to (z,t,x,y) between the two batches of 2D FFTs. With an in-place 4D FFT (if NVIDIA extends CUFFT to support 4D FFTs directly), I could process larger 4D datasets, provided CUFFT does not allocate a temporary buffer of its own internally.
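
To make the reordering step concrete, here is a minimal host-side sketch of what I mean by going from (x,y,z,t) to (z,t,x,y). It is only an illustration, not my actual GPU code: the function name `permute_xyzt_to_ztxy` is made up for this post, and I am assuming the first listed dimension is the fastest-varying one in memory (x before the reorder, z after it). The `out` buffer is the extra full-size temporary I would like to avoid.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CUFFT).
// Input layout  (assumed): index = x + nx*(y + ny*(z + nz*t)), x fastest.
// Output layout (assumed): index = z + nz*(t + nt*(x + nx*y)), z fastest.
std::vector<std::complex<float>> permute_xyzt_to_ztxy(
    const std::vector<std::complex<float>>& in,
    std::size_t nx, std::size_t ny, std::size_t nz, std::size_t nt)
{
    assert(in.size() == nx * ny * nz * nt);

    // Out-of-place copy: this is the full-size temporary buffer.
    std::vector<std::complex<float>> out(in.size());

    for (std::size_t t = 0; t < nt; ++t) {
        for (std::size_t z = 0; z < nz; ++z) {
            for (std::size_t y = 0; y < ny; ++y) {
                for (std::size_t x = 0; x < nx; ++x) {
                    const std::size_t src = x + nx * (y + ny * (z + nz * t));
                    const std::size_t dst = z + nz * (t + nt * (x + nx * y));
                    out[dst] = in[src];
                }
            }
        }
    }
    return out;
}
```

After this reorder, each (z,t) plane is contiguous, so the second batch of 2D FFTs can run over z and t just as the first batch ran over x and y. The cost is that `in` and `out` both have to fit in memory at the same time.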