Kernel launch

mszigetihu · December 13, 2010, 10:35pm

Dear Guys,

There is a little part in the “CUDA by Example” book that I don’t understand. Maybe I just can’t see how simple it is or something. It’s kind of weird because I understand the whole concept, but I just started to learn so I need a little help. If you have the book, it’s pages 63-65, under “GPU sums of a longer vector”.

So, for example: there is a kernel launch with 128 threads per block (second parameter). The book says, “we can just launch N/128 blocks to get our total of N threads running” (first parameter). But why is the / there? In the previous part, there was a simple kernel launch like <<<1,10>>>. I know, it’s just a one-dimensional array. And then, we can’t just launch N/128, because if N is less than 128 the result is 0, since it is an integer division. So we have to do it like this: <<<(N+127)/128,128>>>. But why assume that N is “less than or more than”? We put the number into the kernel launch ourselves, it’s not a random thing. We obviously won’t make our lives harder, so we just put <<<128,128>>> or something. I know that the first parameter is the number of blocks and the second is the number of threads in a block, but why is there a division in the first parameter? I don’t have much programming experience (just a little C++); truth be told, maybe I’m a little young for this thing. If you have the book, please try to explain it to me in other words.

Mod: OK, I kept reading and I understand that (N+127)/128 can’t be more than 65,535, which is the maximum number of blocks in the grid. But I still don’t understand why this division is there, and why the N and the int 127 are there. Think about it. If N=1, then (N+127)/128 is 1, and this is the first parameter, which means we have 1 block?

Thank you, and sorry for my English.

Martin

_constant · December 14, 2010, 9:38am

Hi. If you write

int numThreads = 128;

int numBlocks = (N + numThreads - 1) / numThreads;

you are guaranteed to have enough blocks to cover your entire data set, whatever N is.

Example: you are operating on a vector with N = 16390 = 128^2 + 6 elements. 128 blocks of 128 threads cover only the first 16384 elements, so you need 129 blocks: numBlocks = (16390 + 128 - 1)/128 = 16517/128 = 129; // integer division…