conceptual doubt about CUDA

I’m starting to inform myself on CUDA.

It seems that each thread of a kernel executes with an XYZ index.

Does that mean that if I execute myfunction<<<1,10>>>(parameters), it launches 1000 threads, each with its own unique XYZ index?

I guess I’m wrong, because that would be inefficient. For example, if I’m working on a two-dimensional array, I only need to refer to threadIdx.x and threadIdx.y:
for a 10x10 array I need only 100 threads, not 1000.

With CUDA, you can index things with threads in 1d, 2d, or 3d.

Calling this:

kernel<<<1, 10>>>();

launches a grid of just 1 block, with 10 threads per block, so only 10 threads will actually run.
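A minimal sketch (a hypothetical kernel, not from this thread) that makes those 10 threads visible — each one prints its own built-in indices:

```cuda
#include <cstdio>

// Every launched thread prints its block and thread indices.
__global__ void kernel() {
    printf("block (%d,%d,%d), thread (%d,%d,%d)\n",
           blockIdx.x, blockIdx.y, blockIdx.z,
           threadIdx.x, threadIdx.y, threadIdx.z);
}

int main() {
    kernel<<<1, 10>>>();       // 1 block x 10 threads per block = 10 threads total
    cudaDeviceSynchronize();   // wait for the kernel (and its printf output) to finish
    return 0;
}
```

Running it prints 10 lines, all with block (0,0,0) and with threadIdx.x running from 0 to 9 — not 1000 lines.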

Hopefully this will help : nvidia - Understanding CUDA grid dimensions, block dimensions and threads organization (simple explanation) - Stack Overflow

“It seems that each thread of a kernel executes with an XYZ index.”

More precisely: each thread of a kernel executes with its own unique XYZ index, built from two parts — a blockIdx that is common to (shared by) all threads in the same block, and a threadIdx that is unique within that block.

myfunction<<<1,10>>>(parameters), read in context, implies a grid of (1, 1, 1) and a block of (10, 1, 1).
Grid size multiplied by block size gives 10 threads, not 1000.
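For the 10x10 case from the question, a sketch of how you would ask for exactly 100 threads, each with its own (threadIdx.x, threadIdx.y) pair (a hypothetical kernel name and array layout, assumed for illustration):

```cuda
__global__ void fill(int *a) {
    int x = threadIdx.x;          // column index, 0..9
    int y = threadIdx.y;          // row index, 0..9
    a[y * 10 + x] = y * 10 + x;   // each thread writes exactly one element
}

int main() {
    int *d_a;
    cudaMalloc(&d_a, 100 * sizeof(int));

    dim3 block(10, 10);           // block of (10, 10, 1) -> 100 threads
    fill<<<1, block>>>(d_a);      // grid of (1, 1, 1)
    cudaDeviceSynchronize();

    cudaFree(d_a);
    return 0;
}
```

The dim3 type is how you express 2D (or 3D) block and grid shapes; unspecified components default to 1, which is why <<<1, 10>>> means a (10, 1, 1) block.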

I see it now, thanks