A question about block dimensions in CUDA C programming

I have a question about CUDA C programming.

When we write CUDA C code, we can set the block dimension and the grid dimension.

For example, as in [code1] below:

[code1]-----------------------------------------------------------------------------
dim3 dimBlock(32, 32);
dim3 dimGrid((N + dimBlock.x - 1) / dimBlock.x, (N + dimBlock.y - 1) / dimBlock.y);
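
As a worked illustration of the grid-size arithmetic in [code1], here is a minimal sketch. The helper name ceil_div and the value N = 1000 are assumptions made for the example, not part of the original post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: smallest number of blocks of size `block`
// needed to cover `n` elements (integer ceiling division).
static unsigned int ceil_div(unsigned int n, unsigned int block) {
    return (n + block - 1) / block;
}

int main() {
    const unsigned int N = 1000;   // assumed problem size, for illustration only
    dim3 dimBlock(32, 32);         // 32 * 32 = 1024 threads per block
    dim3 dimGrid(ceil_div(N, dimBlock.x), ceil_div(N, dimBlock.y));

    // 1000 / 32 = 31.25, so 32 blocks are needed in each direction.
    printf("grid = %u x %u blocks, block = %u x %u threads\n",
           dimGrid.x, dimGrid.y, dimBlock.x, dimBlock.y);
    return 0;
}
```

The grid then contains slightly more threads than N in each direction, so a kernel launched this way would normally guard its work with a bounds check against N.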

I understand that these set the number of threads.
But when I checked the maximum number of threads per block with [code2] below, it reported 1024.

[code2]---------------------------------------------------------
printf("max threads per block : %d\n", prop.maxThreadsPerBlock);

So I think the largest block I can set has 1024 threads, such as dimBlock(32, 32).

But even when I increase the number of threads far beyond that, like dimBlock(1032500, 1032500), there is no compile error.

What am I misunderstanding?
Does the compiler perhaps clamp the dimensions to their maximum size automatically, or is there another reason?

I look forward to your reply. Thanks.

But even when I increase the number of threads far beyond that, like dimBlock(1032500, 1032500), there is no compile error.

A runtime error will occur (there is no error at compile time).
You can detect this error by calling cudaGetLastError() right after the kernel launch.

kernel_function<<<grid,TOO_BIG_BLOCK>>>(....);
cudaError_t status = cudaGetLastError();
assert( status == cudaSuccess);
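
For reference, a self-contained sketch of the same check follows. The kernel dummy_kernel and the helper launch_ok are hypothetical stand-ins for kernel_function and its arguments, which are not shown in this thread. On a device with a limit of 1024 threads per block, the second launch should be rejected, typically with an invalid-configuration error.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; it does nothing and only stands in for kernel_function.
__global__ void dummy_kernel() {}

// Hypothetical helper: launches the placeholder kernel and reports
// whether the launch and the kernel execution both succeeded.
static bool launch_ok(dim3 grid, dim3 block) {
    dummy_kernel<<<grid, block>>>();

    // Launch-configuration errors (such as an oversized block) show up here.
    cudaError_t status = cudaGetLastError();
    if (status != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(status));
        return false;
    }

    // Errors raised while the kernel runs show up after synchronization.
    status = cudaDeviceSynchronize();
    if (status != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(status));
        return false;
    }
    return true;
}

int main() {
    launch_ok(dim3(1), dim3(32, 32));            // 1024 threads per block
    launch_ok(dim3(1), dim3(1032500, 1032500));  // far beyond the limit
    return 0;
}
```

When a launch is rejected like this, the kernel does not run at all, so any output buffer keeps whatever it held before the launch.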

Thank you for your reply.

I tried running the project.
Indeed, the result memory contained wrong values, as you mentioned.

On my device, I have to keep the total number of threads per block at or below 1024.
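
A block size can also be checked against the device limits before launching. The sketch below assumes device 0 and uses a hypothetical helper, block_fits. It tests both the per-dimension limits (prop.maxThreadsDim) and the total thread count (prop.maxThreadsPerBlock), since a block must satisfy both.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if `block` respects the per-dimension limits
// and the total threads-per-block limit reported in `prop`.
static bool block_fits(const cudaDeviceProp &prop, dim3 block) {
    if (block.x == 0 || block.y == 0 || block.z == 0) return false;
    if (block.x > (unsigned int)prop.maxThreadsDim[0] ||
        block.y > (unsigned int)prop.maxThreadsDim[1] ||
        block.z > (unsigned int)prop.maxThreadsDim[2]) {
        return false;
    }
    // The product is computed in 64 bits to avoid overflow.
    unsigned long long total = (unsigned long long)block.x * block.y * block.z;
    return total <= (unsigned long long)prop.maxThreadsPerBlock;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {  // device 0 is an assumption
        fprintf(stderr, "could not query a CUDA device\n");
        return 1;
    }
    printf("max threads per block : %d\n", prop.maxThreadsPerBlock);
    printf("max block dimensions  : %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("dimBlock(32, 32) fits : %s\n",
           block_fits(prop, dim3(32, 32)) ? "yes" : "no");
    printf("dimBlock(1032500, 1032500) fits : %s\n",
           block_fits(prop, dim3(1032500, 1032500)) ? "yes" : "no");
    return 0;
}
```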

Thank you again.