If you are requesting more threads per block than is allowed (note that on compute capability 2.x the maximum number of threads per block is 1024, even though the maximum number of resident threads per SM is 1536), the launch fails and the error code is reported by a subsequent CUDA function call rather than by the launch statement itself. How are you checking the return codes of your CUDA calls?
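A minimal sketch of the kind of checking I mean, using the standard runtime API (the `CUDA_CHECK` macro name is my own; `kernel` is a placeholder for your kernel):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every runtime call so failures are reported immediately.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err = (call);                                       \
        if (err != cudaSuccess) {                                       \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n",               \
                    cudaGetErrorString(err), __FILE__, __LINE__);       \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

__global__ void kernel(int *dArr) { /* ... */ }

int main() {
    int *dArr;
    CUDA_CHECK(cudaMalloc(&dArr, 1024 * sizeof(int)));

    // 2048 threads per block exceeds the 1024-thread limit on
    // compute capability 2.x, so this launch fails. The launch
    // itself returns nothing; the error surfaces on the next calls.
    kernel<<<1, 2048>>>(dArr);
    CUDA_CHECK(cudaGetLastError());      // catches the invalid launch configuration
    CUDA_CHECK(cudaDeviceSynchronize()); // catches errors during kernel execution

    CUDA_CHECK(cudaFree(dArr));
    return 0;
}
```

Checking both `cudaGetLastError()` (launch-time errors) and `cudaDeviceSynchronize()` (execution-time errors) after each kernel launch will tell you exactly where things go wrong.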
I’m not sure I understand the problem. Since you got a CUDA error when trying to launch your kernel, the kernel never ran, so the contents of dArr are undefined. The array may simply hold whatever values happened to be in that region of GPU memory already: CUDA does not zero out memory when you allocate it.
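If you want dArr to have well-defined contents even when the launch fails, clear it explicitly after allocating. A small fragment, assuming `n`, `blocks`, `threads`, and `kernel` are defined as in your code:

```cuda
int *dArr;
size_t bytes = n * sizeof(int);
cudaMalloc(&dArr, bytes);
cudaMemset(dArr, 0, bytes);  // dArr now holds zeros, not stale GPU memory

kernel<<<blocks, threads>>>(dArr);
if (cudaGetLastError() != cudaSuccess) {
    // Launch failed: dArr still contains the zeros written by cudaMemset,
    // so anything you copy back is at least predictable.
}
```

This way a failed launch leaves you with zeros rather than garbage, which makes the failure easier to spot when you inspect the results on the host.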