Hi All,
I have allocated some unified memory with cudaMallocManaged, and now I need to copy some data into it.
A very basic question: should I use a normal C++ memcpy or cudaMemcpy? As I understand it both should work, but what is the difference between them?
For example, when I call cudaMemcpy and pass cudaMemcpyHostToDevice as the 4th argument, what does that actually do?
ROOT ISSUE: both kinds of copy work fine in my code with a single pthread,
but with a normal memcpy I get a bus error when two different kernels are launched from different threads,
whereas cudaMemcpy works fine under the same conditions.
I urgently need suggestions or a solution for this issue.
Any help is appreciated.
Thanks in advance.
The point of using cudaMallocManaged is that you don’t have to use cudaMemcpy.
You might want to read the relevant section of the programming guide:
Programming Guide :: CUDA Toolkit Documentation
and perhaps study some of the UM sample codes.
With respect to host operations, you can use ordinary host operations (e.g. memcpy) to populate it if you wish.
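As a rough illustration of populating a managed buffer with an ordinary host memcpy, here is a minimal sketch. The kernel `scale` and the helper `fill_and_scale` are hypothetical names made up for this example, not anything from the original code, and the sketch assumes a single host thread with no kernel in flight when the memcpy runs.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Returns 0 on success, 1 on a CUDA error or an unexpected result.
int fill_and_scale(int n) {
    const size_t bytes = n * sizeof(float);
    std::vector<float> src(n, 1.0f);

    float *managed = nullptr;
    if (cudaMallocManaged(&managed, bytes) != cudaSuccess) return 1;

    // Plain host memcpy into managed memory: fine here because
    // no kernel has been launched yet.
    std::memcpy(managed, src.data(), bytes);

    scale<<<(n + 255) / 256, 256>>>(managed, n);

    // Wait for the kernel before the host touches the buffer again.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(managed);
        return 1;
    }

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (managed[i] != 2.0f) bad = 1;
    }
    cudaFree(managed);
    return bad;
}

int main() {
    int rc = fill_and_scale(1 << 20);
    std::printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

No cudaMemcpy appears anywhere: the same pointer is written by the host, used by the kernel, and read back by the host.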
The bus error may be arising because you haven’t satisfied the UM requirement that no host thread may touch a managed data region after a kernel launch until you explicitly call cudaDeviceSynchronize(). With cudaMemcpy H->D, on the other hand, you can still write to such a region. This synchronization requirement is covered in the programming guide:
[url]http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-gpu-exclusive[/url]
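To make the two options concrete, here is a hedged sketch of updating a managed buffer after a kernel launch. The kernel `add_one` and the helper `update_after_kernel` are hypothetical names for this example. Option A follows the synchronization rule described above; Option B relies on the statement above that cudaMemcpy H->D may still write to the region, and assumes the copy and the kernels are issued from the same thread on the default stream so they run in order.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to every element.
__global__ void add_one(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Returns 0 if every element ends up as 4.0f, 1 otherwise.
int update_after_kernel(int n) {
    const size_t bytes = n * sizeof(float);
    const int blocks = (n + 255) / 256;
    std::vector<float> next(n, 3.0f);

    float *managed = nullptr;
    if (cudaMallocManaged(&managed, bytes) != cudaSuccess) return 1;
    std::memset(managed, 0, bytes);  // no kernel launched yet: host access is fine

    add_one<<<blocks, 256>>>(managed, n);

    // Option A: synchronize first, then use an ordinary host memcpy.
    cudaDeviceSynchronize();
    std::memcpy(managed, next.data(), bytes);
    add_one<<<blocks, 256>>>(managed, n);

    // Option B: no explicit synchronize; cudaMemcpy is issued on the
    // default stream, so it runs after the kernel launched just above.
    cudaMemcpy(managed, next.data(), bytes, cudaMemcpyHostToDevice);
    add_one<<<blocks, 256>>>(managed, n);

    // The host must wait again before reading the result directly.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(managed);
        return 1;
    }

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (managed[i] != 4.0f) bad = 1;
    }
    cudaFree(managed);
    return bad;
}

int main() {
    int rc = update_after_kernel(1 << 20);
    std::printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

In a multi-threaded program the same rule applies to every host thread: a plain memcpy in one thread can fault while a kernel launched from another thread is still running, which would match the bus error described in the question.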
Thanks txbob! The link was a great help, and I think I see where I was going wrong.