Jetson TK1 max blocks in X dim

I am using a Jetson TK1, and would like to have blocks in only the X-dimension. I am under the impression that I can have up to 2^31-1 blocks as per here:

However, if I try to use more than 65535 blocks, the kernel in question (a simple printf statement at the moment) simply doesn’t execute.

Has anyone else encountered the same issue? It seems that moving to a 2-D grid is the best solution for now, but I would rather use a 1-D grid if possible.
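For reference, here is a rough sketch of what the 2-D workaround could look like. The names make_grid_2d and flat_block_kernel are hypothetical, and the 65535 cap is assumed to be the per-dimension limit being hit. The kernel rebuilds a flat block index from blockIdx.x and blockIdx.y and skips the padding blocks that appear when the block count is not a multiple of the row width.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: spread total_blocks over x and y so that the
// x-dimension never exceeds max_dim (assumed here to be 65535).
// The caller must still make sure the resulting y value is within the
// device's y limit and that total_blocks is not zero.
inline dim3 make_grid_2d(unsigned long long total_blocks,
                         unsigned int max_dim = 65535u)
{
    if (total_blocks <= max_dim) {
        return dim3((unsigned int)total_blocks, 1, 1);
    }
    // Round up so that rows * max_dim >= total_blocks.
    unsigned int rows = (unsigned int)((total_blocks + max_dim - 1) / max_dim);
    return dim3(max_dim, rows, 1);
}

// Hypothetical kernel: counts how many logical blocks actually ran.
__global__ void flat_block_kernel(unsigned long long total_blocks,
                                  unsigned int *visited)
{
    // Flatten the 2-D block coordinates back into a 1-D block index.
    unsigned long long block =
        (unsigned long long)blockIdx.y * gridDim.x + blockIdx.x;

    // Blocks past the requested count are padding; do nothing in them.
    if (block >= total_blocks) return;

    if (threadIdx.x == 0) atomicAdd(visited, 1u);
}
```

A launch would then look something like flat_block_kernel<<<make_grid_2d(n), 128>>>(n, d_visited), where d_visited is a zero-initialised device counter.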

The TK1 device properties returned by deviceQuery indicate that the max grid size is ( INT_MAX, 65535, 65535 ).
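The same limits can be read from inside your own program rather than from deviceQuery. A minimal sketch, assuming the standard runtime API call cudaGetDeviceProperties; the helper name print_grid_limits is made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the compute capability and grid limits of a
// device. Returns the CUDA status so the caller can see why a query failed.
cudaError_t print_grid_limits(int device)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n",
                cudaGetErrorString(err));
        return err;
    }

    printf("%s (compute capability %d.%d)\n",
           prop.name, prop.major, prop.minor);
    printf("max grid size (x,y,z): (%d, %d, %d)\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    return cudaSuccess;
}
```

Calling print_grid_limits(0) reports the limits for the first device. Note that these are the limits of the hardware; they say nothing about which architecture the binary was compiled for.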

If that’s failing then you should file a bug ASAP!

CUDA_samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting…

Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)

but no test case…
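In case it helps, here is a sketch of what a minimal test case might look like. It is not the original poster's code; hello_kernel and launch_1d are hypothetical names. The point is to check the error status after the launch, since a kernel that "simply doesn't execute" usually leaves an error code behind. One thing worth ruling out (an assumption, not something confirmed in this thread): the 2^31-1 limit in x applies to compute capability 3.0 and later, so a binary built for an older default target may still be held to 65535.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread in the whole grid reports the grid width.
__global__ void hello_kernel()
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        printf("kernel ran with %u blocks in x\n", gridDim.x);
    }
}

// Hypothetical helper: launch a 1-D grid of `blocks` blocks and return the
// first error seen, so a silent failure becomes a visible one.
cudaError_t launch_1d(unsigned int blocks)
{
    hello_kernel<<<blocks, 1>>>();

    // Errors in the launch configuration show up here.
    cudaError_t err = cudaGetLastError();

    // Errors during execution show up here; this also flushes device printf.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    if (err != cudaSuccess) {
        fprintf(stderr, "launch with %u blocks failed: %s\n",
                blocks, cudaGetErrorString(err));
    }
    return err;
}
```

Calling launch_1d(65535) and then launch_1d(65536) from main should show whether the second launch is rejected and with what message. If it is, it may be worth rebuilding with an explicit target such as nvcc -arch=sm_32 (the TK1's compute capability) and trying again before filing the bug.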