I have allocated some memory in unified memory, i.e. with cudaMallocManaged, and now I have to copy some data into it.
A very basic question: should I use a normal C++ memcpy or cudaMemcpy? As I understand it, both should work, but what is the difference between them?
In particular, when I call cudaMemcpy and pass cudaMemcpyHostToDevice as the 4th argument, what does that actually do for a managed allocation?
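To make the question concrete, here is a minimal sketch of the two copy variants I am comparing. The helper names (copy_with_memcpy, copy_with_cuda_memcpy) and the float element type are only placeholders for illustration, not my real code; the destination is assumed to come from cudaMallocManaged.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper: the CPU writes straight through the managed pointer.
// No CUDA runtime call is involved, so nothing here waits for running kernels.
void copy_with_memcpy(float* managed_dst, const float* host_src, std::size_t n) {
    std::memcpy(managed_dst, host_src, n * sizeof(float));
}

// Hypothetical helper: the copy is issued through the CUDA runtime instead.
// cudaMemcpyHostToDevice is the 4th argument I am asking about.
bool copy_with_cuda_memcpy(float* managed_dst, const float* host_src, std::size_t n) {
    cudaError_t err = cudaMemcpy(managed_dst, host_src, n * sizeof(float),
                                 cudaMemcpyHostToDevice);
    return err == cudaSuccess;
}
```

Both helpers take the same managed pointer; the only difference is whether the copy goes through the CUDA runtime or is done directly by the CPU.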
ROOT ISSUE: both memcpy and cudaMemcpy work fine in my code when everything runs in a single pthread.
But when two different kernels are launched from different threads, a normal memcpy into the managed buffer gives a bus error,
whereas cudaMemcpy works fine under the same conditions.
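One thing that may be related, though I have not confirmed it is the cause here: as far as I understand the CUDA programming guide, on devices where the concurrentManagedAccess attribute is 0 the host must not touch managed memory while a kernel is running, unless that memory is attached to a stream that is currently idle. The sketch below shows how that could be checked and how a buffer could be attached to a per-thread stream. The helper names are hypothetical, and the idea that each thread owns its own stream is an assumption, not something my code does today.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: returns true if the host may access managed memory
// while kernels are running on the given device.
bool host_can_access_concurrently(int device) {
    int value = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &value, cudaDevAttrConcurrentManagedAccess, device);
    return err == cudaSuccess && value != 0;
}

// Hypothetical helper: tie a managed buffer to one stream (assumed to be the
// calling thread's own stream). Length 0 means the whole allocation.
cudaError_t attach_to_stream(cudaStream_t stream, void* managed_ptr) {
    cudaError_t err = cudaStreamAttachMemAsync(
        stream, managed_ptr, 0, cudaMemAttachSingle);
    if (err != cudaSuccess) {
        return err;
    }
    // Wait until the attachment has taken effect before the host uses the buffer.
    return cudaStreamSynchronize(stream);
}
```

If host_can_access_concurrently returns false, a plain memcpy from one thread while another thread's kernel is still running would be exactly the kind of access that is not allowed, which would fit the bus error I am seeing.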
I urgently need suggestions or a solution for this issue.
Any help is appreciated.
Thanks in advance.