Hi everyone,
I was reading about asynchronous memory copies from global memory to shared memory, and I was wondering whether there is a way to perform an asynchronous copy from distributed shared memory to local shared memory.
This may be what you’re after.
Is there any CUDA sample code that uses this?
libcu++ may be of use.
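As a starting point, here is a minimal sketch of moving data from a neighbouring block's shared memory (distributed shared memory, DSMEM) into local shared memory inside a thread block cluster. It assumes an sm_90 (Hopper) GPU, CUDA 12 or newer, and compilation with `-arch=sm_90`. Note that this copy is **synchronous**: each thread loads one element through the pointer returned by `cooperative_groups::cluster_group::map_shared_rank` and stores it locally. It is not the asynchronous path asked about. My understanding is that the PTX ISA's `cp.async.bulk.shared::cluster.shared::cta` bulk copy goes the other way (the source block pushes from its own shared memory into a peer block's shared memory and signals an mbarrier in the destination block), but please check the PTX ISA and libcu++ docs before relying on that. The kernel and function names (`cluster_copy_kernel`, `run_cluster_copy_demo`) and the cluster size of 2 are hypothetical choices for illustration.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

constexpr int kThreads = 128;  // one element per thread in each block

// Hypothetical example kernel: clusters of 2 blocks (requires sm_90).
// Each block fills its own shared buffer, then copies its peer's buffer
// out of distributed shared memory into a second, local shared buffer.
__global__ void __cluster_dims__(2, 1, 1) cluster_copy_kernel(int* out)
{
    __shared__ int local_src[kThreads];  // this block's data, readable by the cluster
    __shared__ int local_dst[kThreads];  // receives the peer block's data

    cg::cluster_group cluster = cg::this_cluster();
    unsigned int rank = cluster.block_rank();
    unsigned int peer = (rank + 1) % cluster.num_blocks();

    // Write a recognisable pattern: rank * 1000 + thread index.
    local_src[threadIdx.x] = static_cast<int>(rank * 1000 + threadIdx.x);

    // Every block must finish writing local_src before any peer reads it.
    cluster.sync();

    // Pointer into the peer block's copy of local_src (DSMEM).
    int* remote_src = cluster.map_shared_rank(local_src, peer);

    // Synchronous element-wise copy: remote shared memory -> local shared memory.
    local_dst[threadIdx.x] = remote_src[threadIdx.x];

    // Keep the peer's shared memory alive until all remote reads are done.
    cluster.sync();

    out[blockIdx.x * kThreads + threadIdx.x] = local_dst[threadIdx.x];
}

// Hypothetical host helper: launches the kernel and checks that every block
// received its peer's pattern. Returns false on any CUDA error or mismatch.
bool run_cluster_copy_demo()
{
    const int blocks = 4;  // must be a multiple of the cluster size (2)
    const int n = blocks * kThreads;

    int* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;

    cluster_copy_kernel<<<blocks, kThreads>>>(d_out);
    if (cudaGetLastError() != cudaSuccess || cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(d_out);
        return false;
    }

    int h_out[blocks * kThreads];
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    // For a 1-D grid with cluster size 2, block b has cluster rank b % 2,
    // so its peer's rank is (b % 2 + 1) % 2.
    for (int b = 0; b < blocks; ++b) {
        int peer_rank = (b % 2 + 1) % 2;
        for (int t = 0; t < kThreads; ++t) {
            if (h_out[b * kThreads + t] != peer_rank * 1000 + t) return false;
        }
    }
    return true;
}
```

The two `cluster.sync()` calls carry the design: the first ensures the peer's data exists before it is read, and the second prevents a block from exiting (and releasing its shared memory) while a peer is still reading from it. An asynchronous version would replace the load/store line with a bulk copy plus an mbarrier wait, but the same lifetime rule would still apply.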