memcpy in a kernel: is it possible?


I know I can do a device-to-device copy from the host program with cudaMemcpy.
But can I do a device-to-device bulk copy inside a kernel? Is there any supported function (like memcpy) that we can use inside a kernel, or is looping the only way to go?
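For reference, the host-side device-to-device copy mentioned above looks roughly like this. It is a minimal sketch assuming `float` data. The helper name `device_to_device_from_host` is hypothetical, and error handling is reduced to returning `false`.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: uploads `input`, copies it device-to-device with
// cudaMemcpy issued from the host, then downloads the copy into `output`.
// Returns true if every CUDA call succeeded.
bool device_to_device_from_host(const std::vector<float>& input,
                                std::vector<float>& output) {
    const size_t n = input.size();
    output.resize(n);
    if (n == 0) return true;  // nothing to copy
    const size_t bytes = n * sizeof(float);

    float* d_src = nullptr;
    float* d_dst = nullptr;
    if (cudaMalloc(&d_src, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_dst, bytes) != cudaSuccess) { cudaFree(d_src); return false; }

    bool ok = cudaMemcpy(d_src, input.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           // The device-to-device copy itself, driven by the host.
           && cudaMemcpy(d_dst, d_src, bytes, cudaMemcpyDeviceToDevice) == cudaSuccess
           && cudaMemcpy(output.data(), d_dst, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_src);
    cudaFree(d_dst);
    return ok;
}
```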

P.S. When I use memcpy inside a kernel, nvcc fails with an ACCESS VIOLATION and a “ptxas died” message.

Thank you.

Looping is the only way to do memory copies on the GPU.
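A minimal sketch of such a loop, assuming `float` buffers: instead of one thread copying everything, every thread in the grid walks the buffer with a stride equal to the total thread count (a grid-stride loop). The names `grid_copy`, `copy_step_kernel` and `run_grid_copy` are hypothetical, and the launch size of 4 blocks of 128 threads is arbitrary.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Device-side bulk copy: thread t handles elements t, t + stride, t + 2*stride, ...
// where stride is the total number of threads in the grid.
__device__ void grid_copy(float* dst, const float* src, size_t n) {
    const size_t stride = static_cast<size_t>(gridDim.x) * blockDim.x;
    for (size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
         i < n; i += stride) {
        dst[i] = src[i];
    }
}

// Hypothetical kernel that needs a bulk copy as one step of its work.
__global__ void copy_step_kernel(float* dst, const float* src, size_t n) {
    grid_copy(dst, src, n);
}

// Host helper for trying the kernel out. Returns true if all CUDA calls succeeded.
bool run_grid_copy(const std::vector<float>& input, std::vector<float>& output) {
    const size_t n = input.size();
    output.resize(n);
    if (n == 0) return true;
    const size_t bytes = n * sizeof(float);

    float* d_src = nullptr;
    float* d_dst = nullptr;
    if (cudaMalloc(&d_src, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_dst, bytes) != cudaSuccess) { cudaFree(d_src); return false; }

    bool ok = cudaMemcpy(d_src, input.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // Deliberately fewer threads (512) than elements: the loop covers the rest.
        copy_step_kernel<<<4, 128>>>(d_dst, d_src, n);
        ok = cudaGetLastError() == cudaSuccess;
    }
    ok = ok && cudaMemcpy(output.data(), d_dst, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_src);
    cudaFree(d_dst);
    return ok;
}
```

Note that an ordinary kernel launch has no barrier across blocks, so if a thread later needs an element that a thread in another block copied, reading it in the same kernel is not safe. Putting the copy in its own kernel avoids that problem.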


I don’t think there is a function that copies memory the way you want.
Why do you want a loop in your kernel?

Just configure your kernel launch so that there are as many threads as elements, and copy one element per thread.
Perhaps you can launch such a copy kernel after your other kernel has finished.
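A sketch of that approach, assuming `float` data and a hypothetical producer kernel `produce` standing in for the “other kernel”. The copy kernel `copy_one_per_thread` gives each thread exactly one element, and the launch rounds the block count up so every element gets a thread. Both kernels go to the default stream, so the copy kernel only starts after `produce` has finished. All names and the block size of 256 are illustrative choices.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical producer kernel standing in for "another kernel": data[i] = 2 * i.
__global__ void produce(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 2.0f * i;
}

// One thread per element: thread i copies element i.
__global__ void copy_one_per_thread(float* dst, const float* src, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = src[i];  // guard: the last block may be only partly used
}

// Runs produce, then the copy kernel, on the default stream and downloads the copy.
// Returns true if all CUDA calls succeeded.
bool produce_then_copy(int n, std::vector<float>& output) {
    if (n <= 0) { output.clear(); return true; }  // avoid a zero-sized launch
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* d_a = nullptr;
    float* d_b = nullptr;
    if (cudaMalloc(&d_a, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_b, bytes) != cudaSuccess) { cudaFree(d_a); return false; }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // enough threads for every element
    produce<<<blocks, threads>>>(d_a, n);
    // Same stream, so this launch waits for produce to finish.
    copy_one_per_thread<<<blocks, threads>>>(d_b, d_a, n);
    bool ok = cudaGetLastError() == cudaSuccess;

    output.resize(n);
    // This blocking copy waits for both kernels on the default stream.
    ok = ok && cudaMemcpy(output.data(), d_b, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_a);
    cudaFree(d_b);
    return ok;
}
```

Because launches are ordered within a stream, no explicit synchronization is needed between the two kernels, and you can chain as many such kernels as the algorithm needs.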

Hi QD4,

That was exactly what I did. I think if you are stuck in a position like mine, it is a sign that you are heading in the wrong direction. So I reorganized my kernel to make it “real parallel”, and, as you said, we can launch as many kernels as we like.

Thanks again.