Calling a __global__ function

Hello everyone,

Today I wrote a matrix multiplication and it works fine. To define the __global__ kernel function, I have to use the following syntax:

__global__ void MatMul(…){
    // several lines of source code
}

Now I call the function above with:

MatMul<<< … , … >>>(…);

Unfortunately, I don’t know exactly what the <<< … , … >>> means. Could someone explain exactly what the NVIDIA compiler does with it? What does the BLOCK_SIZE of 16 on page 20 of the CUDA Programming Guide Version 2.3 mean? Is BLOCK_SIZE related to the operation itself, e.g. for A[2] = {1,2} and A2[2] = {3,4}, is one block A(1) + A2(3) and the second block A(2) + A2(4)? Or is it related only to the matrix dimensions? At the moment, for an Array[10] I call the function as e.g. MatMul<<< 1, 10 >>>. How does the scheduler/dispatcher work, i.e. how are the threads handled?
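
To make the Array[10] case concrete, here is a minimal sketch of such a launch. The kernel AddArrays and all variable names are hypothetical (this is not the MatMul code above), and error checking is left out for brevity. The comments reflect my current understanding of the execution configuration: the first value inside <<< >>> is the number of blocks in the grid and the second is the number of threads per block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical element-wise kernel, used only to illustrate the launch syntax.
__global__ void AddArrays(const float* a, const float* b, float* c, int n)
{
    // Global index of this thread: block offset plus position inside the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];  // each thread handles exactly one element
    }
}

int main()
{
    const int n = 10;
    float hA[n], hB[n], hC[n];
    for (int i = 0; i < n; ++i) {
        hA[i] = static_cast<float>(i);
        hB[i] = 10.0f * static_cast<float>(i);
    }

    // Allocate device buffers and copy the inputs to the GPU.
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, n * sizeof(float));
    cudaMalloc(&dB, n * sizeof(float));
    cudaMalloc(&dC, n * sizeof(float));
    cudaMemcpy(dA, hA, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, n * sizeof(float), cudaMemcpyHostToDevice);

    // <<<1, 10>>>: a grid of 1 block, with 10 threads in that block.
    AddArrays<<<1, 10>>>(dA, dB, dC, n);

    // This blocking copy waits for the kernel to finish before reading the result.
    cudaMemcpy(hC, dC, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) {
        printf("%g ", hC[i]);
    }
    printf("\n");

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return 0;
}
```

If that reading is right, launching the same kernel as AddArrays<<<2, 5>>> would also cover all 10 elements, only split across two blocks of five threads, because the index is computed from blockIdx.x, blockDim.x and threadIdx.x.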

Regards
chero