Question: Kernel Arguments


In my application I allocate global memory on the host with cudaMalloc() and pass the pointer to a kernel. Since kernel arguments are passed via shared memory, I assume all accesses through the passed pointer on the device go through shared memory. Does this mean that there's absolutely no reason to declare shared memory directly in device code? Are the two approaches identical?

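The argument that is being passed via shared memory is just the pointer, NOT the data it points to. (That is how devices of compute capability 1.x do it; newer devices pass kernel arguments through constant memory, but the point is the same.)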

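So your 100MB of input data doesn't reside in shared memory. When you access the data through that pointer (for example, by indexing it), you're accessing global memory and not shared memory. If you want the data in shared memory, you have to declare a __shared__ array in the kernel and copy into it yourself, so the two approaches are not identical.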
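To make the difference concrete, here is a minimal sketch rather than anything from the original poster's application. The kernel names scaleGlobal and scaleShared, the helper runDemo, the tile size of 256, and the "multiply by 2" operation are all made up for illustration. The first kernel reads directly through the pointer it was given, so every read is a global memory access. The second kernel explicitly stages the data in a __shared__ array first. Each element is used only once here, so the staging should not be expected to make anything faster; it only shows where the data lives in each case.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 256  // threads per block (illustrative choice)

// Only the pointer values d_in/d_out are kernel arguments.
// Every d_in[i] below is a read from global memory.
__global__ void scaleGlobal(const float *d_in, float *d_out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d_out[i] = 2.0f * d_in[i];
}

// Data ends up in shared memory only because we declare a
// __shared__ array and copy into it ourselves.
__global__ void scaleShared(const float *d_in, float *d_out, int n)
{
    __shared__ float tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tile[threadIdx.x] = d_in[i];          // global -> shared
    __syncthreads();                          // reached by all threads
    if (i < n)
        d_out[i] = 2.0f * tile[threadIdx.x];  // shared -> global
}

// Hypothetical helper: runs both kernels and compares the results.
// Returns 0 if both match the expected values, nonzero otherwise.
int runDemo()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    static float h_in[n], h_a[n], h_b[n];
    for (int i = 0; i < n; ++i)
        h_in[i] = (float)i;

    float *d_in = NULL, *d_a = NULL, *d_b = NULL;
    if (cudaMalloc((void **)&d_in, bytes) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_a, bytes) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_b, bytes) != cudaSuccess) return 1;
    if (cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice) != cudaSuccess)
        return 1;

    const int blocks = (n + TILE - 1) / TILE;
    scaleGlobal<<<blocks, TILE>>>(d_in, d_a, n);
    scaleShared<<<blocks, TILE>>>(d_in, d_b, n);
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;

    if (cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
        return 1;
    if (cudaMemcpy(h_b, d_b, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
        return 1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_a[i] != 2.0f * h_in[i] || h_b[i] != h_a[i])
            ++mismatches;

    cudaFree(d_in);
    cudaFree(d_a);
    cudaFree(d_b);
    return mismatches;
}

int main()
{
    int bad = runDemo();
    printf("mismatches: %d\n", bad);
    return bad != 0;
}
```

Explicit staging like this starts to pay off when several threads in a block need the same elements (stencils, tiled matrix multiply, reductions), because the data is fetched from global memory once and then reused from shared memory.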