Global memory to shared memory without passing through registers

Hi all,

Is there a way in CUDA to load data directly from global memory into shared memory, without staging it through thread-local registers?

Thanks!
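
For what it's worth, one possible answer, assuming a GPU of compute capability 8.0 (Ampere) or newer and CUDA 11 or later: those devices have a hardware asynchronous copy (the `cp.async` PTX instruction) that moves data from global to shared memory without staging it in registers. It is exposed through `cooperative_groups::memcpy_async`, among other APIs. On older architectures the same call still compiles, but as far as I know it falls back to an ordinary copy through registers. The kernel name `scale_tile`, the tile size, and the test data below are made up for illustration; this is a minimal sketch, not a tuned implementation.

```cuda
#include <cooperative_groups.h>
#include <cooperative_groups/memcpy_async.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

namespace cg = cooperative_groups;

constexpr int TILE = 256;  // hypothetical tile size: one element per thread

// Hypothetical kernel: stage a tile of `in` in shared memory, then double it.
__global__ void scale_tile(const float* in, float* out, int n) {
    __shared__ alignas(16) float tile[TILE];
    cg::thread_block block = cg::this_thread_block();

    int base  = blockIdx.x * TILE;
    int count = min(TILE, n - base);  // the last block may be partial

    // Collective copy: every thread in the block must make this call.
    // On sm_80+ this can be lowered to cp.async (no register staging).
    cg::memcpy_async(block, tile, in + base, sizeof(float) * count);

    // Wait until the copy has landed and is visible to the whole block.
    cg::wait(block);

    int i = threadIdx.x;
    if (i < count) out[base + i] = 2.0f * tile[i];
}

int main() {
    const int n = 1000;
    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int blocks = (n + TILE - 1) / TILE;
    scale_tile<<<blocks, TILE>>>(d_in, d_out, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Verify on the host that every element was doubled.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != 2.0f * h_in[i]) { ok = false; break; }
    }
    std::printf("%s\n", ok ? "ok" : "mismatch");

    cudaFree(d_in);
    cudaFree(d_out);
    return ok ? 0 : 1;
}
```

Building with something like `nvcc -arch=sm_80` should let the compiler use the hardware path. The copy is asynchronous, so independent work can be placed between `cg::memcpy_async` and `cg::wait` to overlap it with the transfer.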

This is better addressed on our CUDA Programming and Performance forum. You can find that forum at: