I was working through “An Even Easier Introduction to CUDA” by NVIDIA. It includes sample code that adds two arrays together, which is straightforward. The two arrays are allocated in memory shared between the CPU and GPU (Unified Memory, via cudaMallocManaged), which makes sense. However, there is also an integer variable that stores the size of the arrays (used in the for loop that iterates through them within the kernel), and it is not allocated in that shared memory. All of these are passed to the kernel as parameters. How does the kernel access the integer that holds the size? Why are the arrays allocated in the shared memory but the integer is not? Am I missing something about how the memory works? After hours of searching the web and SO, I could not find an answer to this specific question, which is why I finally made an account to ask.
Thanks to everyone in advance, I’m excited to get more into CUDA and parallel computing.
The kernel gets access to the numerical value of the integer variable in exactly the same way it gets access to the numerical value of the pointer variables that are used to access the allocations. All of these items are passed by value to the kernel.
The kernel pass-by-value mechanism is mentioned in the programming guide here. You can find other references in the programming guide that indicate that kernel arguments are copied to the device as part of the launch process.
Conceptually, the numerical value of an integer argument isn’t treated any differently than the numerical value of a pointer argument.
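To make the pass-by-value point concrete, here is a minimal sketch modeled on the array-add sample from the tutorial. The helper name `run_add` and the block size of 256 are my own choices for illustration, not something taken from the original code. Note that `n`, `x`, and `y` are all copied to the device as kernel arguments at launch; only the array data that `x` and `y` point to lives in the managed allocation.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// n, x and y are all passed by value: the launch copies the integer and the
// two pointer values to the device. Only the data x and y point to needs to
// be in memory the GPU can reach.
__global__ void add(int n, const float *x, float *y)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

// Hypothetical helper (not from the tutorial): adds two n-element arrays on
// the GPU and returns the maximum absolute error, or -1.0f on a CUDA error.
float run_add(int n)
{
    float *x = nullptr, *y = nullptr;
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) return -1.0f;
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return -1.0f;
    }

    // Initialize the arrays on the host.
    for (int i = 0; i < n; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    int blockSize = 256;  // assumed block size
    int numBlocks = (n + blockSize - 1) / blockSize;
    add<<<numBlocks, blockSize>>>(n, x, y);  // n is an ordinary host int

    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(x);
        cudaFree(y);
        return -1.0f;
    }

    // Every element should now be 3.0f.
    float maxError = 0.0f;
    for (int i = 0; i < n; i++)
        maxError = fmaxf(maxError, fabsf(y[i] - 3.0f));

    cudaFree(x);
    cudaFree(y);
    return maxError;
}

int main()
{
    int N = 1 << 20;  // lives in ordinary host memory, never cudaMalloc'ed
    float err = run_add(N);
    printf("max error: %f\n", err);
    return err == 0.0f ? 0 : 1;
}
```

The pointers themselves are just numbers, like `n`. The difference is that the kernel dereferences them, so the memory they refer to has to be device-accessible, whereas `n` is only read as a value.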
So the shared memory is not the only way the GPU can access data, but it is an optimized way of doing it and a good choice for accessing and manipulating huge arrays, if I have understood this correctly. I will read more of the programming guide.
Thanks for taking the time to answer my question both here and on Stack Overflow. I really appreciate it. I hope the rest of your day goes well.