Thread ID does not reach 68719476703 (2147483647 blocks x 32 threads per block = 68719476704 threads)


I have the following grid and block declarations:

dim3 grid(2147483647, 1, 1);
dim3 blocks(32, 1, 1);

Then in the kernel:

unsigned long long tid = threadIdx.x + blockDim.x * blockIdx.x;

But when I check whether tid ever reaches 68719476703 (the largest index I expect, 2147483647 x 32 - 1), it never does.

What am I doing wrong?

P.S. I tried unsigned long long tid = (size_t)threadIdx.x + blockDim.x * blockIdx.x; but it made no difference.

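P.P.S. The answer is: unsigned long long tid = (size_t)blockIdx.x * blockDim.x + threadIdx.x;

blockDim.x and blockIdx.x are both 32-bit unsigned int, so blockDim.x * blockIdx.x is computed in 32 bits and wraps around before it is widened to 64 bits. Casting threadIdx.x only widens the addition, which is too late. The cast has to be applied to one operand of the multiplication so that the product itself is computed in 64 bits.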
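
For anyone who wants to see the wraparound without a GPU, here is a minimal host-side sketch in plain C++. It is not the kernel itself: the function names (tid_original, tid_cast_thread, tid_cast_block) are made up for illustration, and std::uint32_t stands in for the unsigned int fields threadIdx.x, blockDim.x and blockIdx.x, assuming a platform where unsigned int is 32 bits wide.

```cpp
#include <cstdint>

// Host-side model of the index arithmetic from the kernel.
// Assumption: std::uint32_t behaves like CUDA's unsigned int built-ins.

// Original expression: product and sum are both computed in 32 bits,
// then the already-wrapped result is widened to 64 bits.
std::uint64_t tid_original(std::uint32_t thread_idx,
                           std::uint32_t block_dim,
                           std::uint32_t block_idx) {
    return thread_idx + block_dim * block_idx;
}

// First attempt: only thread_idx is widened. The product
// block_dim * block_idx is still 32-bit and wraps before the addition.
std::uint64_t tid_cast_thread(std::uint32_t thread_idx,
                              std::uint32_t block_dim,
                              std::uint32_t block_idx) {
    return static_cast<std::uint64_t>(thread_idx) + block_dim * block_idx;
}

// Fix: widen an operand of the multiplication, so the product
// (and then the sum) is computed in 64 bits.
std::uint64_t tid_cast_block(std::uint32_t thread_idx,
                             std::uint32_t block_dim,
                             std::uint32_t block_idx) {
    return static_cast<std::uint64_t>(block_idx) * block_dim + thread_idx;
}
```

Calling these with the last thread of the last block (thread_idx = 31, block_dim = 32, block_idx = 2147483646) should give a value below 2^32 for the first two functions, while only tid_cast_block returns 68719476703.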