Distributed shared memory asynchronous memory copy

Hi everyone,

I was reading about asynchronous memory copies from global memory to shared memory, and I was wondering: is there a way to do an asynchronous copy from distributed shared memory (the shared memory of another thread block in the same cluster) into local shared memory?
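As a baseline, this is the synchronous way to read from distributed shared memory. The cooperative groups cluster API maps a peer block's shared variable into the calling block's address space, and ordinary loads read through it. This is a minimal sketch, not a reference implementation. It assumes an sm_90 GPU, a compile-time cluster of two blocks, 256 threads per block, and the hypothetical names `dsmem_read_kernel`, `run_dsmem_read_demo`, `kN` and `kClusterSize`:

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

constexpr int kN = 256;          // ints per block, one per thread (assumed)
constexpr int kClusterSize = 2;  // hypothetical cluster of two blocks

// Each block fills its own shared buffer, then copies its peer's buffer into a
// second local shared buffer with ordinary (synchronous) loads through DSMEM.
__global__ void __cluster_dims__(kClusterSize, 1, 1)
dsmem_read_kernel(int* out) {
    __shared__ int local_src[kN];
    __shared__ int local_dst[kN];

    cg::cluster_group cluster = cg::this_cluster();
    const unsigned rank = cluster.block_rank();
    const unsigned peer = (rank + 1) % kClusterSize;

    local_src[threadIdx.x] = static_cast<int>(rank * 1000 + threadIdx.x);

    // Every block's writes must be visible cluster-wide before anyone reads them.
    cluster.sync();

    // Generic pointer to the peer block's local_src in distributed shared memory.
    const int* remote_src = cluster.map_shared_rank(&local_src[0], peer);
    local_dst[threadIdx.x] = remote_src[threadIdx.x];

    // A block must not exit while a peer may still read its shared memory.
    cluster.sync();

    out[blockIdx.x * kN + threadIdx.x] = local_dst[threadIdx.x];
}

// Hypothetical helper: returns the number of wrong elements (0 on success,
// -1 on a CUDA error). Needs an sm_90 GPU; build with nvcc -arch=sm_90.
int run_dsmem_read_demo() {
    constexpr int kBlocks = 4;  // must be a multiple of kClusterSize
    int h_out[kBlocks * kN];
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return -1;
    dsmem_read_kernel<<<kBlocks, kN>>>(d_out);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int errors = 0;
    for (int b = 0; b < kBlocks; ++b) {
        // Block rank r reads from rank r + 1 (mod cluster size).
        const int peer = (b % kClusterSize + 1) % kClusterSize;
        for (int t = 0; t < kN; ++t)
            if (h_out[b * kN + t] != peer * 1000 + t) ++errors;
    }
    return errors;
}
```

The loads through `remote_src` stall the issuing threads until the data arrives, so this is not asynchronous. It only shows how a block addresses another block's shared memory.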

This may be what you’re after.

Is there any CUDA code that uses this?

libcu++ may be of use.
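As far as I can tell from the PTX ISA, the bulk copy engine only provides the shared::cta to shared::cluster direction between blocks (`cp.async.bulk.shared::cluster.shared::cta.mbarrier::complete_tx::bytes`). An asynchronous DSMEM transfer is therefore a push, not a pull: the block that owns the data copies it into the peer's shared memory, and completion is signalled on an mbarrier that lives in the receiving block.

Below is a minimal, untested sketch using inline PTX. It assumes an sm_90 GPU, a cluster of two blocks, 256 ints per block, and the hypothetical names `dsmem_push_kernel`, `smem_addr` and `run_dsmem_push_demo`. Recent libcu++ releases also appear to wrap these instructions in `<cuda/ptx>` (for example `cuda::ptx::cp_async_bulk`), which may be tidier if your toolkit ships them. Raw PTX is used here to avoid depending on a particular version.

```cuda
#include <cooperative_groups.h>
#include <cstdint>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

constexpr int kN = 256;          // ints per block: 1024 bytes (bulk sizes must be multiples of 16)
constexpr int kClusterSize = 2;  // hypothetical cluster of two blocks

// 32-bit shared::cta address of a shared-memory object.
__device__ uint32_t smem_addr(const void* p) {
    return static_cast<uint32_t>(__cvta_generic_to_shared(p));
}

// Each block pushes its src buffer into the NEXT block's dst buffer with
// cp.async.bulk; the receiver waits on its own mbarrier for the bytes to land.
__global__ void __cluster_dims__(kClusterSize, 1, 1)
dsmem_push_kernel(int* out) {
    __shared__ alignas(16) int src[kN];
    __shared__ alignas(16) int dst[kN];
    __shared__ alignas(8) uint64_t bar;

    cg::cluster_group cluster = cg::this_cluster();
    const unsigned rank = cluster.block_rank();
    const unsigned peer = (rank + 1) % kClusterSize;
    const uint32_t bytes = kN * sizeof(int);

    src[threadIdx.x] = static_cast<int>(rank * 1000 + threadIdx.x);

    if (threadIdx.x == 0) {
        uint64_t state;
        // One arrival per phase; publish the init to the cluster, then arrive
        // and announce that `bytes` will be delivered asynchronously.
        asm volatile("mbarrier.init.shared::cta.b64 [%0], 1;" :: "r"(smem_addr(&bar)) : "memory");
        asm volatile("fence.mbarrier_init.release.cluster;" ::: "memory");
        asm volatile("mbarrier.arrive.expect_tx.shared::cta.b64 %0, [%1], %2;"
                     : "=l"(state) : "r"(smem_addr(&bar)), "r"(bytes) : "memory");
        (void)state;
    }
    // Order this thread's generic-proxy writes to src before async-proxy reads.
    asm volatile("fence.proxy.async.shared::cta;" ::: "memory");
    cluster.sync();  // src filled and every block's mbarrier armed

    if (threadIdx.x == 0) {
        uint32_t remote_dst, remote_bar;
        // Translate our own shared addresses into the peer's shared::cluster window.
        asm volatile("mapa.shared::cluster.u32 %0, %1, %2;"
                     : "=r"(remote_dst) : "r"(smem_addr(dst)), "r"(peer));
        asm volatile("mapa.shared::cluster.u32 %0, %1, %2;"
                     : "=r"(remote_bar) : "r"(smem_addr(&bar)), "r"(peer));
        asm volatile("cp.async.bulk.shared::cluster.shared::cta.mbarrier::complete_tx::bytes"
                     " [%0], [%1], %2, [%3];"
                     :: "r"(remote_dst), "r"(smem_addr(src)), "r"(bytes), "r"(remote_bar)
                     : "memory");
    }

    // Wait for phase 0 of our own barrier: the peer's bytes are now in dst.
    uint32_t done = 0;
    while (!done) {
        asm volatile("{\n\t.reg .pred p;\n\t"
                     "mbarrier.try_wait.parity.shared::cta.b64 p, [%1], %2;\n\t"
                     "selp.u32 %0, 1, 0, p;\n\t}"
                     : "=r"(done) : "r"(smem_addr(&bar)), "r"(0u) : "memory");
    }
    out[blockIdx.x * kN + threadIdx.x] = dst[threadIdx.x];

    // Do not exit until the peer has seen our copy complete (src must stay valid).
    cluster.sync();
}

// Hypothetical helper: returns the number of wrong elements (0 on success,
// -1 on a CUDA error). Needs an sm_90 GPU; build with nvcc -arch=sm_90.
int run_dsmem_push_demo() {
    constexpr int kBlocks = 4;  // must be a multiple of kClusterSize
    int h_out[kBlocks * kN];
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return -1;
    dsmem_push_kernel<<<kBlocks, kN>>>(d_out);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int errors = 0;
    for (int b = 0; b < kBlocks; ++b) {
        // Block rank r receives from rank r - 1 (mod cluster size).
        const int sender = (b % kClusterSize + kClusterSize - 1) % kClusterSize;
        for (int t = 0; t < kN; ++t)
            if (h_out[b * kN + t] != sender * 1000 + t) ++errors;
    }
    return errors;
}
```

Each block arms its own barrier with `expect_tx` before the cluster-wide sync. This puts the expected byte count in place before any peer issues its copy. The final `cluster.sync()` keeps every block resident until all pushes have landed.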