Simple Question about 2D Grid

Hello everyone.

This is such a simple question, but I don’t understand why it is giving me such grief. I want to launch a 2D grid that mirrors the computations on a couple of square matrices I have. When I do:

[codebox]

__global__ void kernel(params)
{
   cuPrintf("hello\n");
}

int main()
{
   int numThreadsPerBlock = 1;
   dim3 kBlocks = (10,10);
   kernel<<<kBlocks,numThreadsPerBlock>>>(params);
}

///// output /////
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello

[/codebox]

So I am only seeing 10 of these printed, when I should be seeing 100. When I print the grid dimensions, I get gridDim.x = 10 and gridDim.y = 1. Why is gridDim.y == 1 even when I hard-code 10? I did get a 2D grid to work yesterday, but only after hours of fiddling with it, and I can’t remember what I did.

I’d really appreciate any help. Thanks!

The initializer for kBlocks should be something like dim3(10,10).
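To make that concrete without needing a GPU, here is a rough host-only sketch. `Dim3` is a hypothetical stand-in I wrote to mimic CUDA's `dim3` (three unsigned fields, each defaulting to 1); `blockCount` and `checkGridShapes` are likewise made-up helper names, not part of CUDA.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3: x, y, z each default to 1.
struct Dim3 {
    unsigned int x, y, z;
    constexpr Dim3(unsigned int vx = 1, unsigned int vy = 1, unsigned int vz = 1)
        : x(vx), y(vy), z(vz) {}
};

// Number of blocks a grid of this shape would launch.
constexpr unsigned int blockCount(const Dim3& g) { return g.x * g.y * g.z; }

// Two ways to spell a real constructor call; both give a 10 x 10 grid.
inline void checkGridShapes() {
    Dim3 direct(10, 10);          // direct initialization
    Dim3 copied = Dim3(10, 10);   // copy initialization from a temporary
    assert(direct.x == 10 && direct.y == 10 && direct.z == 1);
    assert(copied.x == 10 && copied.y == 10 && copied.z == 1);
    assert(blockCount(direct) == 100);
}
```

In the original program, the equivalent change would be writing `dim3 kBlocks(10,10);` or `dim3 kBlocks = dim3(10,10);` before the launch.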

Haha, well thanks. Ugh, it was such a simple mistake. I hate how that makes me feel. Still, somehow my brain glazed over the fact that dim3 has to be initialized that way instead of like your regular static variable.

Uh, I don’t think dim3 foo = (10,10) does what you expect in any C or C++ compiler. (That is to say: this is not nvcc-specific.) Static initializers use braces, not parentheses. The parentheses cause this to be parsed as an expression, and the comma operator evaluates and discards its first operand, then evaluates and returns the second. Thus, the statement you wrote should be equivalent to dim3 foo = 10 for all compliant compilers, which leaves y at its default of 1.
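A small host-only sketch of the comma-operator behavior described above, in case it helps. `Dim3` is again a hypothetical stand-in for CUDA's `dim3` (fields defaulting to 1), and the function names are made up for illustration. Compilers will often warn that the left operand of the comma has no effect, which is a handy hint here.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3: x, y, z each default to 1.
struct Dim3 {
    unsigned int x, y, z;
    constexpr Dim3(unsigned int vx = 1, unsigned int vy = 1, unsigned int vz = 1)
        : x(vx), y(vy), z(vz) {}
};

// The parenthesized list is one expression: 3 is evaluated and discarded,
// and the value of the whole expression is 7.
inline int commaResult() {
    return (3, 7);
}

// Same shape as the initializer in the question. (10, 10) collapses to the
// single int 10, which is then converted via Dim3(10), so y and z stay 1.
inline Dim3 gridFromCommaExpression() {
    Dim3 foo = (10, 10);
    return foo;
}
```

With the stand-in, `gridFromCommaExpression()` comes back as a 10 x 1 x 1 shape, which matches the gridDim.x = 10 and gridDim.y = 1 reported in the question.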

In my head, I was treating dim3 as a tuple rather than a type with a normal constructor; since it is CUDA-specific, my brain assumed it might work that way. I also tried things like kernel<<<(10,10),1>>>(), which obviously didn’t work either. I was stuck in tuple mode for whatever reason.