GPU Cores


Imagine I have 1024 blocks on the GPU, each block with 1024 threads, and my GPU has 256 cores. In my code:

int row = blockIdx.y * blockDim.y + threadIdx.y;
int col = blockIdx.x * blockDim.x + threadIdx.x;
float sum = 0.0f;
for (int i = -10; i <= 10; i++) {
    for (int j = -10; j <= 10; j++) {
        // Note: for a 2-D array flattened into 1-D memory this would
        // normally be (row + j) * width + (col + i); kept as written.
        int idx = (row + j) + (col + i);
        sum += inout[idx];
    }
}
int index = row + col;          // likewise normally row * width + col
output[index] = sum / 441.0f;   // 441 = 21 * 21 window

Does each core calculate one “sum” (the variable sum within the loops)? In other words, does my GPU calculate 256 “sum” values at the same time?

A CUDA core is really not like a CPU core. A CUDA core is basically a single-precision floating-point multiply-add unit. It supports essentially three machine-language instructions: FADD, FMUL, and FMA. It doesn’t do anything else.

A CUDA core is undoubtedly being used to process the FADD instruction associated with this line of code:

sum += inout[idx];

All the rest of the code you have shown is not using a CUDA core (with the exception of the last line).

It can’t be definitively stated that the GPU calculates 256 “sum” values at the same time; however, that is a reasonable statement of the peak theoretical throughput of the machine.

Thank you so much,

So how can I find out how many cores are available at my code’s run time? And on how many cores does my code run?

I don’t know why that matters. From a high-level perspective, your code is running on all the CUDA cores in your GPU.

Perhaps I don’t understand the question. Perhaps you might want to learn how to use one of the profilers. Or perhaps you might want to study the CUDA deviceQuery sample code.
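For reference, a minimal sketch of the deviceQuery approach (CUDA runtime API; requires an NVIDIA GPU and nvcc to run). Note the runtime reports the SM count directly, but not the CUDA-core count: cores per SM depend on the compute capability, so the deviceQuery sample multiplies the SM count by a per-architecture lookup table.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Query the properties of device 0, as the deviceQuery sample does.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("SMs (multiprocessors): %d\n", prop.multiProcessorCount);
    printf("compute capability:    %d.%d\n", prop.major, prop.minor);
    // Total CUDA cores = multiProcessorCount * (cores per SM for this
    // compute capability, taken from an architecture-specific table).
    return 0;
}
```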

Thank you for your response.

I am writing a paper about parallel processing, which is why I wanted to know.