I'm using CUDA 10.1 with VS2017 and CMake 3.17.0.
When I call cudaMemcpyAsync inside a __device__ function, compilation always fails with this error:
error: calling a __host__ function("cudaMemcpyAsync") from a __device__ function("Init") is not allowed
I need to copy a buffer of 0 to 4K bytes inside a device function. The buffer is produced by the preceding computation, so the copy must be synchronized with it. I want the copy to be as fast as possible, but it must not introduce hazards such as:
- read after write
- write after write
Some related discussion is at https://stackoverflow.com/a/49037139
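
For reference, here is a minimal sketch of one way the copy could be done without any host API call. It assumes that all threads of a single block take part and that the buffer lives in global memory. The names block_copy, produce_then_copy and copy_roundtrip_ok are hypothetical, and the fill pattern is only a stand-in for the previous result. The __syncthreads() before the loop orders the copy after the block's earlier writes, and the one after it keeps any thread from reusing the buffers until the copy is done. This covers threads within one block only, not synchronization across blocks.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: every thread of the block must call this together.
// Copies n bytes from src to dst using a block-stride loop.
__device__ void block_copy(unsigned char* dst, const unsigned char* src, int n)
{
    // Wait until all threads in the block have finished writing src,
    // and make those writes visible to the whole block.
    __syncthreads();
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        dst[i] = src[i];
    }
    // No thread may overwrite src or read dst until the copy is complete.
    __syncthreads();
}

// Stand-in for the real kernel: produce a buffer, then copy it.
__global__ void produce_then_copy(unsigned char* scratch, unsigned char* out, int n)
{
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        scratch[i] = static_cast<unsigned char>(i % 251);  // placeholder "previous result"
    }
    block_copy(out, scratch, n);
}

// Host-side check (hypothetical): launches one block and verifies the copy.
bool copy_roundtrip_ok(int n)
{
    unsigned char* scratch = nullptr;
    unsigned char* out = nullptr;
    const size_t bytes = static_cast<size_t>(n);
    const size_t alloc = bytes > 0 ? bytes : 1;  // avoid zero-sized allocations

    if (cudaMalloc(&scratch, alloc) != cudaSuccess) return false;
    if (cudaMalloc(&out, alloc) != cudaSuccess) { cudaFree(scratch); return false; }

    produce_then_copy<<<1, 256>>>(scratch, out, n);

    std::vector<unsigned char> host(alloc);
    // cudaMemcpy on the host waits for the kernel and reports its errors.
    cudaError_t err = cudaMemcpy(host.data(), out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(scratch);
    cudaFree(out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (host[i] != static_cast<unsigned char>(i % 251)) return false;
    }
    return true;
}
```

Calling copy_roundtrip_ok(4096) from main should return true on a working device. A byte-wise loop is probably not the fastest option; wider loads (for example uint4) may help if both pointers are suitably aligned, which is not assumed here.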