Moving memory cudaMemmove()

Hello there,

I need to move memory on the device where the source and destination ranges overlap, but there is no function like cudaMemmove() available to do so. As far as I can tell from the docs, cudaMemcpy() isn’t safe for overlapping ranges, regardless of whether I’m moving upward or downward.

How do you do something like this?



Generally: you don’t. Just copy it to a temporary buffer and back again.
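A minimal sketch of the temporary-buffer approach, assuming plain device pointers and the default stream. The helper name memmoveViaTemp is made up for illustration and is not part of the CUDA runtime:

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper (not a CUDA API): overlap-safe move of `bytes` bytes
// between two device ranges, staged through a temporary device buffer.
cudaError_t memmoveViaTemp(void *dst, const void *src, size_t bytes)
{
    if (bytes == 0 || dst == src) return cudaSuccess;

    void *tmp = nullptr;
    cudaError_t err = cudaMalloc(&tmp, bytes);
    if (err != cudaSuccess) return err;

    // src -> tmp: tmp is a fresh allocation, so the ranges cannot overlap.
    err = cudaMemcpy(tmp, src, bytes, cudaMemcpyDeviceToDevice);
    if (err == cudaSuccess) {
        // tmp -> dst: again non-overlapping, so a plain copy is fine.
        err = cudaMemcpy(dst, tmp, bytes, cudaMemcpyDeviceToDevice);
    }

    cudaFree(tmp);
    return err;
}
```

If you do this often, you would probably allocate the temporary buffer once and reuse it rather than calling cudaMalloc/cudaFree on every move.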

But if you really have to, you can do it in place by implementing a rotate. A rotate can (unless I’m missing something) be implemented in CUDA as multiple “local” rotates, each of which rotates only as many elements as fit into shared memory, so it requires multiple kernel calls.

You can of course also implement it with lots of cudaMemcpy calls, though that is almost certainly pointless: each call can copy at most the shift distance worth of data without overlapping, so you need about (vector length / shift distance) memcpy calls.
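For completeness, here is roughly what the many-cudaMemcpy variant could look like. This is only a sketch: memmoveChunked is a made-up name, and the chunk order is my assumption about how to keep every individual copy non-overlapping (ascending when moving downward, descending when moving upward):

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper (not a CUDA API): overlap-safe move built from
// non-overlapping cudaMemcpy calls of at most `shift` bytes each.
cudaError_t memmoveChunked(void *dstV, const void *srcV, size_t bytes)
{
    char *dst = static_cast<char *>(dstV);
    const char *src = static_cast<const char *>(srcV);
    if (bytes == 0 || dst == src) return cudaSuccess;

    const bool downward = dst < src;  // destination lies below the source
    const size_t shift = downward ? size_t(src - dst) : size_t(dst - src);

    // Ranges do not overlap at all: a single copy is enough.
    if (shift >= bytes)
        return cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);

    const size_t numChunks = (bytes + shift - 1) / shift;
    for (size_t c = 0; c < numChunks; ++c) {
        // Downward move: first chunk first. Upward move: last chunk first.
        const size_t k = downward ? c : numChunks - 1 - c;
        const size_t offset = k * shift;
        const size_t len = (bytes - offset < shift) ? bytes - offset : shift;
        cudaError_t err = cudaMemcpy(dst + offset, src + offset, len,
                                     cudaMemcpyDeviceToDevice);
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```

With a small shift this degenerates into a huge number of tiny copies, which is why the temporary buffer is usually the better choice.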

Okay, copying into a temporary buffer is what I do currently, thanks for your hint.

The other option is to write your own __global__ function (a kernel) that performs the overlapping move safely, which shouldn’t be too hard.
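One possible sketch of such a kernel, under some loud assumptions: it moves int elements, it must be launched with exactly one block (so __syncthreads() can separate reads from writes), and the names memmoveKernel and deviceMemmove are invented here. A single block is simple to reason about but will not be fast:

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel: overlap-safe move of n ints from src to dst.
// ASSUMPTION: launched with exactly ONE block; the block walks the array
// in chunks of blockDim.x elements.
__global__ void memmoveKernel(int *dst, const int *src, size_t n)
{
    const size_t chunk = blockDim.x;
    const size_t numChunks = (n + chunk - 1) / chunk;
    const bool downward = dst < src;  // same for every thread

    for (size_t c = 0; c < numChunks; ++c) {
        // Downward move: ascending chunks. Upward move: descending chunks.
        const size_t k = downward ? c : numChunks - 1 - c;
        const size_t i = k * chunk + threadIdx.x;

        int value = 0;
        if (i < n) value = src[i];
        __syncthreads();  // every read of this chunk is done before any write
        if (i < n) dst[i] = value;
        __syncthreads();  // every write is done before the next chunk is read
    }
}

// Hypothetical host wrapper: single block of 256 threads.
cudaError_t deviceMemmove(int *dst, const int *src, size_t n)
{
    if (n == 0 || dst == src) return cudaSuccess;
    memmoveKernel<<<1, 256>>>(dst, src, n);
    return cudaGetLastError();
}
```

Spreading this over many blocks needs some form of grid-wide ordering between the read and write phases (for example separate kernel launches per phase), because blocks run in no guaranteed order.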

Reimar is correct. If you use a temporary buffer (and can afford the space), you typically get better performance, since the reads and writes can be coalesced better.

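And if you use it like a double buffer, you can save one copy: copy the data straight to its new position in the second buffer and keep working from that buffer instead of copying it back. That may be slightly faster.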
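A rough sketch of the double-buffer idea, assuming two equally sized int buffers that were allocated up front. DoubleBuffer and shiftIntoBack are illustrative names, not anything from the CUDA runtime:

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <utility>

// Hypothetical pair of equally sized device buffers.
struct DoubleBuffer {
    int *front;  // buffer currently holding the valid data
    int *back;   // scratch buffer that becomes `front` after a move
};

// Copy `count` ints from front[srcIndex...] to back[dstIndex...], then swap
// the two pointers. The buffers are separate allocations, so a single
// cudaMemcpy is safe and no copy back is needed.
cudaError_t shiftIntoBack(DoubleBuffer &buf, size_t dstIndex,
                          size_t srcIndex, size_t count)
{
    cudaError_t err = cudaMemcpy(buf.back + dstIndex, buf.front + srcIndex,
                                 count * sizeof(int),
                                 cudaMemcpyDeviceToDevice);
    if (err != cudaSuccess) return err;
    std::swap(buf.front, buf.back);
    return cudaSuccess;
}
```

Note that only the moved range is valid in the new front buffer; anything outside it is whatever the old back buffer contained, so you have to copy or refill those elements yourself if you still need them.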

Is there any way I can memmove the entire array atomically so that other threads wait till the transfer is done?