inverted memcpy for FFT

I am using the CUFFT routines.
However, I need the results (for the negative frequencies) in reverse order.
So I don't want a normal memcpy, but a memcpy that reverses the order of the samples.
0 1 2 3 4 5 6 7
should come out as
7 6 5 4 3 2 1 0

I can of course write my own kernel that copies the samples one by one in reverse order, but that would probably not be optimal.

Isn’t there an inverted memcpy or some other trick I can use here?

Thanks in advance,
Martin

Interesting problem. I don't know of a built-in function, but I have a hunch that a close-to-optimal kernel for this is not that hard to write. Use lots of threads: each block does a coalesced read of one chunk into shared memory, then does a coalesced write of that chunk, reversed, into the mirrored position of the output array:

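shared[threadIdx.x] = global[blockIdx.x * stride + threadIdx.x];
__syncthreads();
reverseGlobal[(gridDim.x - 1 - blockIdx.x) * stride + threadIdx.x] = shared[stride - 1 - threadIdx.x];

(Here stride is the block size, and the array length is assumed to be a multiple of it. The block order has to be mirrored as well, otherwise only each chunk gets reversed in place.)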
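
For what it's worth, here is a fuller sketch of that idea, untested on your data. The names reverseCopy, reverseOnDevice and kBlock are made up for illustration, the block size of 256 is just a guess, and I use float elements to keep it short (for CUFFT output you would swap in cufftComplex). Unlike the three lines above, it also handles a length that is not a multiple of the block size. Error checking on the CUDA calls is left out.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlock = 256;  // threads per block; an assumed tile size

// Writes in[n - 1 - i] to out[i]. Each block loads one tile with a
// coalesced read, then stores it reversed at the mirrored tile position,
// so consecutive threads also write consecutive addresses.
__global__ void reverseCopy(const float* in, float* out, int n)
{
    __shared__ float tile[kBlock];

    int tileStart = blockIdx.x * kBlock;
    int src = tileStart + threadIdx.x;
    if (src < n) tile[threadIdx.x] = in[src];
    __syncthreads();  // the whole tile must be loaded before it is read back

    // The last tile may be partial, so count its valid elements.
    int valid = min(kBlock, n - tileStart);
    // Where this tile begins in the reversed output.
    int dstStart = n - tileStart - valid;
    if (threadIdx.x < valid)
        out[dstStart + threadIdx.x] = tile[valid - 1 - threadIdx.x];
}

// Hypothetical host helper: copies the data to the device, reverses it
// there, and copies the result back.
std::vector<float> reverseOnDevice(const std::vector<float>& host)
{
    int n = static_cast<int>(host.size());
    std::vector<float> result(n);
    if (n == 0) return result;

    size_t bytes = n * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + kBlock - 1) / kBlock;  // round up to cover the tail
    reverseCopy<<<blocks, kBlock>>>(dIn, dOut, n);

    // This blocking copy waits for the kernel to finish first.
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

In your case the data is already on the device after the FFT, so you would launch reverseCopy directly on the CUFFT output buffer and skip the host round trip. Note that the kernel needs separate input and output buffers; it does not reverse in place.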