In my application I allocate global memory on the host with cudaMalloc() and pass the pointer to a kernel. Since kernel arguments are passed via shared memory, all accesses to the passed references in device code should go through shared memory. Does this mean there is absolutely no reason to declare shared memory directly in device code? Are the two approaches identical?
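A minimal sketch contrasting the two approaches, assuming a CUDA toolkit with nvcc. The names `scaleGlobal`, `blockSum`, `runDemo` and `BLOCK` are hypothetical, chosen for illustration. The point it shows is that only the pointer *value* is a kernel argument: the buffer it refers to stays in global memory, so `in[i]` in `scaleGlobal` is a global-memory access no matter how the argument itself is delivered (on devices of compute capability 2.x and later, kernel parameters live in constant memory rather than shared memory). `blockSum` declares a `__shared__` array so threads in a block can exchange partial results, which a pointer argument alone does not provide.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical block size; blockSum assumes blockDim.x == BLOCK and a power of two.
constexpr int BLOCK = 256;

// Approach 1: use the cudaMalloc() buffer directly through the pointer argument.
// The pointer is the argument; every in[i] / out[i] access goes to global memory
// (possibly served by the cache).
__global__ void scaleGlobal(const float* in, float* out, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = k * in[i];
}

// Approach 2: explicitly stage data in a __shared__ array so that threads in the
// same block can read each other's values (here, a per-block tree reduction).
__global__ void blockSum(const float* in, float* blockSums, int n) {
    __shared__ float partial[BLOCK];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // one global load per thread
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];  // shared-memory traffic only
        __syncthreads();
    }
    if (threadIdx.x == 0) blockSums[blockIdx.x] = partial[0];
}

// Hypothetical host driver: runs both kernels on n elements equal to 1.0f.
// Returns false if any CUDA runtime call reported an error.
bool runDemo(int n, float* firstScaled, float* total) {
    const int blocks = (n + BLOCK - 1) / BLOCK;
    std::vector<float> hIn(n, 1.0f), hOut(n), hSums(blocks);

    float *dIn = nullptr, *dOut = nullptr, *dSums = nullptr;
    // Global-memory buffers allocated from the host, as in the question.
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMalloc(&dSums, blocks * sizeof(float));
    cudaMemcpy(dIn, hIn.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    scaleGlobal<<<blocks, BLOCK>>>(dIn, dOut, n, 2.0f);
    blockSum<<<blocks, BLOCK>>>(dIn, dSums, n);

    cudaMemcpy(hOut.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hSums.data(), dSums, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaError_t err = cudaGetLastError();  // last error from any runtime call above

    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dSums);
    if (err != cudaSuccess) return false;

    *firstScaled = hOut[0];
    float s = 0.0f;
    for (float v : hSums) s += v;  // combine per-block sums on the host
    *total = s;
    return true;
}
```

Both kernels read the same cudaMalloc() buffer; only `blockSum` gains anything from shared memory, because its threads reuse values loaded by other threads in the block. A kernel like `scaleGlobal`, where each element is touched once by one thread, has nothing to share.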