Memory Swapping

Hello nice people,

I was wondering what would be the most efficient way to perform memory swapping for two arrays stored on the device.

For example, I have two large vectors “p” and “pz” and I wish to swap their contents, so that “p” ends up holding what was in “pz” and vice versa.

Of course, this could be accomplished with an intermediate temporary vector and cudaMemcpy. However, I am at the edge of my device’s memory capacity, so this would require copying data back to the host, which is detrimental to the performance of my algorithm.

Is there a command to explicitly perform memory swapping on the device?

Kind regards,

No direct answer here, but is there any good reason not to simply swap pointers to the memory locations, rather than the actual memory?
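
To illustrate the pointer-swap idea, here is a minimal sketch, assuming "p" and "pz" are plain float buffers allocated with cudaMalloc. The function name pointer_swap_demo, the buffer size, and the fill values are made up for the example.

```cuda
#include <cuda_runtime.h>
#include <utility>  // std::swap
#include <vector>

// Hypothetical demo: fills two device buffers, swaps only the host-side
// pointers, and returns true if each name now refers to the other's data.
bool pointer_swap_demo() {
    const int n = 4;
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_p(n, 1.0f), h_pz(n, 2.0f), out_p(n), out_pz(n);

    float *p = nullptr, *pz = nullptr;
    if (cudaMalloc(&p, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&pz, bytes) != cudaSuccess) { cudaFree(p); return false; }
    cudaMemcpy(p, h_p.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(pz, h_pz.data(), bytes, cudaMemcpyHostToDevice);

    // O(1): no device data moves and no temporary buffer is allocated.
    std::swap(p, pz);

    cudaError_t e1 = cudaMemcpy(out_p.data(), p, bytes, cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(out_pz.data(), pz, bytes, cudaMemcpyDeviceToHost);
    cudaFree(p);
    cudaFree(pz);
    if (e1 != cudaSuccess || e2 != cudaSuccess) return false;

    // "p" should now see the values originally written through "pz".
    for (int i = 0; i < n; ++i)
        if (out_p[i] != 2.0f || out_pz[i] != 1.0f) return false;
    return true;
}
```

This only helps if nothing else still holds the old pointer values (for example, a struct that was filled in before the swap); in that case the data itself has to be exchanged.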

In addition, I don’t know of any explicit swapping method on the device, but a very simple kernel would do it without any memory penalty.

Something like this would do:

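template<typename T>
__global__ void swap(T *a, T *b, int N) {
    // Global index of this thread across the whole grid.
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    // Grid-stride loop: each thread advances by the total number of threads.
    for (int i=idx; i < N; i += blockDim.x*gridDim.x) {
        T tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}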

You can of course refine this depending on the size of your types, but as an initial approach, that should do.
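
For completeness, here is a rough sketch of how a kernel like this might be launched from the host. The names swap_kernel, swap_on_device and swap_kernel_demo, the block size of 256, and the cap of 1024 blocks are all assumptions chosen for illustration, not tuned values. The kernel is included so the sketch compiles on its own.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Grid-stride swap kernel: the only extra storage is one register per thread.
template <typename T>
__global__ void swap_kernel(T *a, T *b, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = idx; i < n; i += stride) {
        T tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}

// Hypothetical host wrapper: picks a launch configuration and reports errors.
template <typename T>
cudaError_t swap_on_device(T *a, T *b, int n) {
    if (n <= 0) return cudaSuccess;
    const int threads = 256;                   // assumed block size
    int blocks = (n + threads - 1) / threads;
    if (blocks > 1024) blocks = 1024;          // the stride loop covers the rest
    swap_kernel<<<blocks, threads>>>(a, b, n);
    cudaError_t err = cudaGetLastError();      // catches launch failures
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();            // catches execution failures
}

// Hypothetical demo: returns true if the contents were exchanged in place.
bool swap_kernel_demo() {
    const int n = 1000;
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_a(n, 1.0f), h_b(n, 2.0f);
    float *a = nullptr, *b = nullptr;
    if (cudaMalloc(&a, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&b, bytes) != cudaSuccess) { cudaFree(a); return false; }
    cudaMemcpy(a, h_a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b, h_b.data(), bytes, cudaMemcpyHostToDevice);

    bool ok = (swap_on_device(a, b, n) == cudaSuccess);

    cudaMemcpy(h_a.data(), a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b.data(), b, bytes, cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(b);
    for (int i = 0; i < n; ++i)
        if (h_a[i] != 2.0f || h_b[i] != 1.0f) ok = false;
    return ok;
}
```

Because every element is read and written by exactly one thread, no synchronization between threads is needed and no temporary device buffer is allocated.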

Good ideas!
Thanks Gilles_C. Very helpful indeed :).

CUBLAS provides vector swap functions called cublas{S|D|C|Z}swap(), so if your vectors are made up of floats, doubles, or complex numbers you could simply invoke those.
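
As a minimal sketch of the single-precision case, assuming the handle-based cuBLAS API from cublas_v2.h (the legacy API takes no handle) and linking with -lcublas. The function name cublas_swap_demo and the test values are made up for the example.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical demo: swaps two float vectors in place with cublasSswap
// and returns true if the contents were exchanged.
bool cublas_swap_demo() {
    const int n = 4;
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_p(n, 1.0f), h_pz(n, 2.0f);

    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) return false;

    float *p = nullptr, *pz = nullptr;
    if (cudaMalloc(&p, bytes) != cudaSuccess) {
        cublasDestroy(handle);
        return false;
    }
    if (cudaMalloc(&pz, bytes) != cudaSuccess) {
        cudaFree(p);
        cublasDestroy(handle);
        return false;
    }
    cudaMemcpy(p, h_p.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(pz, h_pz.data(), bytes, cudaMemcpyHostToDevice);

    // Exchange n elements of p and pz; both vectors use a stride of 1.
    cublasStatus_t st = cublasSswap(handle, n, p, 1, pz, 1);

    // Blocking copies on the default stream wait for the swap to finish.
    cudaMemcpy(h_p.data(), p, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_pz.data(), pz, bytes, cudaMemcpyDeviceToHost);
    cudaFree(p);
    cudaFree(pz);
    cublasDestroy(handle);
    if (st != CUBLAS_STATUS_SUCCESS) return false;

    for (int i = 0; i < n; ++i)
        if (h_p[i] != 2.0f || h_pz[i] != 1.0f) return false;
    return true;
}
```

For doubles or complex types, the same call shape applies with cublasDswap, cublasCswap or cublasZswap and the matching element type.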