Technical specifications for compute capability 6.1:
Maximum number of threads per block: 1024
Maximum x-dimension of a grid of thread blocks: 2^31-1
I want to run 2^32 threads, with 32 registers per thread and 128 threads per block (< 1024).
Calculation: 2^32 / 128 = 33554432 blocks (< 2^31-1)
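To double-check the arithmetic, here is a minimal host-side sketch (plain C++, which should also compile as host code under nvcc). The helper names `blocks_needed` and `fits_limits` are made up for illustration and are not part of the CUDA API; the two limits are simply the values quoted from the table above. It only checks the launch configuration against those limits, nothing else.

```cpp
#include <cstdint>

// Limits quoted from the compute capability 6.1 table (assumed correct as cited).
constexpr std::uint64_t kMaxThreadsPerBlock = 1024;
constexpr std::uint64_t kMaxGridDimX = (1ull << 31) - 1;  // 2^31 - 1

// Hypothetical helper: blocks needed to cover total_threads (ceiling division).
constexpr std::uint64_t blocks_needed(std::uint64_t total_threads,
                                      std::uint64_t threads_per_block) {
    return (total_threads + threads_per_block - 1) / threads_per_block;
}

// Hypothetical helper: true if a 1-D launch configuration respects both limits.
constexpr bool fits_limits(std::uint64_t blocks,
                           std::uint64_t threads_per_block) {
    return threads_per_block <= kMaxThreadsPerBlock && blocks <= kMaxGridDimX;
}

// 2^32 threads at 128 threads per block gives 2^25 = 33554432 blocks.
static_assert(blocks_needed(1ull << 32, 128) == 33554432ull,
              "block count mismatch");
// That configuration is within the two documented limits.
static_assert(fits_limits(33554432ull, 128), "configuration exceeds limits");
```

Both `static_assert`s pass at compile time, so as far as these two limits go the configuration looks valid.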
I launch the kernel as kernel<<<33554432, 128>>>() (this does not violate the limits above),
but cudaDeviceSynchronize returns error code 4 after launching addKernel.
The error does not occur if I use kernel<<<2097152, 128>>>() (release build) or kernel<<<~4194304, 128>>>() (debug build).
Is there a rule that explains why I cannot use more than 2097152 blocks?