Apologies if this has been asked before, and for how basic it is.
I have started reading the CUDA Programming Guide and have some very basic questions:
Section 2.1 describes kernels, which seem to be basic C functions. When the code in section 2.1 is invoked, is it correct that instead of a loop of N iterations there is 1 (one) iteration executed by each of N threads, so the result is produced in approximately 1/N of the time?
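To make the loop-versus-kernel comparison concrete, here is a minimal sketch rather than a definitive implementation. The names `vecAddLoop`, `vecAddKernel` and `runVecAdd` are made up for illustration, and the sketch assumes the vectors fit in a single block (at most 1024 threads on current GPUs). The body of the CPU loop becomes the kernel, and each of the N threads executes it once for its own index. The 1/N figure is best read as an upper bound: the hardware decides how many threads physically run at once, and the host-to-device copies add their own cost.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// CPU version: a single thread walks all n elements in sequence.
void vecAddLoop(const float* a, const float* b, float* c, int n) {
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

// GPU version: the loop body becomes the kernel; each thread handles one i.
__global__ void vecAddKernel(const float* a, const float* b, float* c, int n) {
    int i = threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical helper: copies the inputs to the GPU, launches one block of
// n threads, and copies the result back. Assumes 1 <= n <= 1024.
std::vector<float> runVecAdd(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);
    vecAddKernel<<<1, n>>>(dA, dB, dC, n);
    std::vector<float> c(n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

Calling `runVecAdd` on two equal-length vectors should give the same values as `vecAddLoop` on the same data.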
Section 2.1 states that a calculation is divided among N threads, which I assume means, as in the first question above, that N calculations are processed at the same time. The code example uses “threadIdx”, and in section 2.2 threads are arranged into blocks. Is it correct that a block is just a collection of threads, and that you do not have to worry about how the system implements these blocks when programming?
Section 2.2 then goes on to state that 1-, 2- and 3-dimensional blocks have a thread ID calculated from the thread index using the rules given in the text. Is it important to determine the thread ID, and is it necessary when programming? Basically, is it rarely used, or something that will be used often?
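For reference, the rule in section 2.2 can be written out as a short sketch. For a block of size (Dx, Dy, Dz), the thread with index (x, y, z) has thread ID x + y*Dx + z*Dx*Dy. The kernel `recordCoords` and the helper `coordsByFlatId` below are hypothetical names, and the "x + 10*y + 100*z" encoding is only an illustrative way to see which thread wrote which slot. In practice `threadIdx` itself appears in nearly every kernel, while the flattened ID matters mostly when a multi-dimensional block has to address linear memory.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Each thread computes its flattened thread ID and stores its own
// coordinates, packed as x + 10*y + 100*z, in the slot with that ID.
// The packing is only unambiguous when every block dimension is <= 10.
__global__ void recordCoords(int* out) {
    int id = threadIdx.x
           + threadIdx.y * blockDim.x
           + threadIdx.z * blockDim.x * blockDim.y;
    out[id] = threadIdx.x + 10 * threadIdx.y + 100 * threadIdx.z;
}

// Hypothetical helper: launches a single block of the given shape and
// returns the packed coordinates ordered by flattened thread ID.
std::vector<int> coordsByFlatId(dim3 block) {
    int total = static_cast<int>(block.x * block.y * block.z);
    assert(total >= 1 && total <= 1024);
    int* dOut;
    cudaMalloc(&dOut, total * sizeof(int));
    recordCoords<<<1, block>>>(dOut);
    std::vector<int> out(total);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return out;
}
```

With a block of shape (4, 3, 2), thread (1, 2, 1) has thread ID 1 + 2*4 + 1*4*3 = 21, so slot 21 of the result holds the packed value 121.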
Apologies for the basic and perhaps unusual questions. I just want to make sure that I remember the important areas before reading the document further. Thanks.
__global__ void VecAdd(float* A, float* B, float* C, int M)
{
    // Each thread handles the element at its own index within the block.
    int i = threadIdx.x;
    if (i < M)
        C[i] = A[i] + B[i];
}

int main()
{
    // Kernel invocation with one block of N threads
    VecAdd<<<1, N>>>(A, B, C, M);
}
Thanks. I understand the syntax of what is being stated, that N is the number of threads in a block, but I just wanted to make sure that all N threads are calculated at the same time.
Further, my question is whether threadIdx is ever used to manipulate data, or whether, as in the example, it is just used to select the element the kernel's mathematical operation is applied to.
If N is a small number, say 20, I assume that there are enough cores to run all calculations in parallel, but if N = 1000 this may not be the case, depending on the graphics card.
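For the N = 1000 case, a common pattern is to split the work across several blocks and let the hardware decide how many of them run at once, so the code does not depend on the core count. The sketch below is a hedged illustration of that pattern: `vecAddGrid`, `runVecAddGrid` and the block size of 256 are assumptions chosen for the example, not values taken from the guide.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Each thread derives a global element index from its block and thread index.
__global__ void vecAddGrid(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard: the last block may be partly idle
}

// Hypothetical helper: launches enough 256-thread blocks to cover n elements.
// Assumes n >= 1.
std::vector<float> runVecAddGrid(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    const int threadsPerBlock = 256;  // assumed block size for this sketch
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);
    vecAddGrid<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);
    std::vector<float> c(n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

With n = 1000 this launches 4 blocks (1024 threads in total), and the `i < n` check keeps the 24 surplus threads from writing past the end of the arrays.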
So I can use threadIdx to target a specific calculation or element. Is this the correct way to view threadIdx, as an element or a calculation instance?
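One way to see threadIdx doing more than select "its own" element is a kernel where the index decides which data is read as well as where the result goes. The sketch below reverses a small array within one block. `reverseBlock` and `reverseOnGpu` are hypothetical names, and the sketch assumes the array fits in a single block of at most 1024 threads.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Thread i reads the mirrored element n-1-i and writes it to slot i,
// so the index drives the data movement, not just the arithmetic.
__global__ void reverseBlock(const int* in, int* out, int n) {
    int i = threadIdx.x;
    if (i < n) out[i] = in[n - 1 - i];
}

// Hypothetical helper: reverses a small array with one block of n threads.
std::vector<int> reverseOnGpu(const std::vector<int>& in) {
    int n = static_cast<int>(in.size());
    assert(n >= 1 && n <= 1024);
    size_t bytes = n * sizeof(int);
    int *dIn, *dOut;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);
    reverseBlock<<<1, n>>>(dIn, dOut, n);
    std::vector<int> out(n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

So viewing threadIdx as "which calculation instance am I" seems reasonable: each thread uses it to work out which elements it is responsible for.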