I’m starting to learn CUDA.
It seems that each thread of a kernel executes with an XYZ index.
Does that mean that if I execute myfunction<<<1,10>>>(parameters), it launches 1000 threads (10x10x10), each with its own unique XYZ index?
I guess I’m wrong, because that would be inefficient. For example, if I’m working on a two-dimensional array, I only need to refer to threadIdx.x and threadIdx.y.
For a 10x10 array I need only 100 threads, not 1000.
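One way to check this is to count the threads that actually run. As far as I understand it, the two launch arguments are the number of blocks and the number of threads per block, so `<<<1,10>>>` should start 1 block of 10 threads (with `threadIdx.y` and `threadIdx.z` both 0), and a 10x10 layout has to be requested explicitly with `dim3`. The sketch below is only an illustration: `countThreads`, `LaunchInfo` and `launchAndCount` are made-up names, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread adds 1 to a counter and records the largest threadIdx.y seen.
__global__ void countThreads(int *count, int *maxY)
{
    atomicAdd(count, 1);
    atomicMax(maxY, static_cast<int>(threadIdx.y));
}

// Hypothetical result type for this sketch.
struct LaunchInfo {
    int threads;  // total number of threads that ran
    int maxY;     // largest threadIdx.y observed
};

// Hypothetical helper: launches the kernel with the given configuration
// and reports what the threads recorded. No error checking, for brevity.
LaunchInfo launchAndCount(dim3 grid, dim3 block)
{
    int *dev = nullptr;
    cudaMalloc(&dev, 2 * sizeof(int));
    cudaMemset(dev, 0, 2 * sizeof(int));

    countThreads<<<grid, block>>>(dev, dev + 1);

    int host[2] = {0, 0};
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return {host[0], host[1]};
}

int main()
{
    // Same configuration as <<<1,10>>>: 1 block, 10 threads along x only.
    LaunchInfo oneD = launchAndCount(dim3(1), dim3(10));

    // 1 block of 10x10 threads, for indexing a 10x10 array with x and y.
    LaunchInfo twoD = launchAndCount(dim3(1), dim3(10, 10));

    printf("1D launch: %d threads, max threadIdx.y = %d\n", oneD.threads, oneD.maxY);
    printf("2D launch: %d threads, max threadIdx.y = %d\n", twoD.threads, twoD.maxY);
    return 0;
}
```

If that reading is right, the 1D launch reports 10 threads and the 2D launch reports 100, so no threads are wasted on dimensions that were not asked for.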