How many times does a kernel actually run?

I wrote some simple code to evaluate the layers of a neural network. Even though I launch the kernel only once (for (i = 0; i < 1; ++i) { /* launch kernel */ }), it still seems to run more than once: I get the correct answer for the final layer (one network has 5 layers, the other has 120) if I set the block dimensions to 512×1 (32×16 is what I normally use). It looks as if, in order to cover all 512 threads with warps (I use those threads to sum up the inputs), the kernel ran many times, depending on how big the problem is. If that is true, it would be a bug in how the code is interpreted. Here is my kernel code:
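One way to check how many times a kernel body really executes is to have every thread increment a global counter with atomicAdd. The sketch below is not the poster's kernel; countExecutions and runOnceAndCount are hypothetical names used only for illustration. With a single launch, the counter should end at (number of blocks) × (threads per block), whether the block is shaped 512×1 or 32×16. Warps only subdivide a block for scheduling; they do not rerun it.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Every thread of a launch increments the counter exactly once.
__global__ void countExecutions(unsigned int *counter) {
    atomicAdd(counter, 1u);
}

// Hypothetical helper: launch countExecutions once with the given grid and
// block shape, then return the total number of thread executions observed.
unsigned int runOnceAndCount(dim3 grid, dim3 block) {
    unsigned int *d_counter = nullptr;
    unsigned int h_counter = 0;
    cudaMalloc(&d_counter, sizeof(unsigned int));
    cudaMemset(d_counter, 0, sizeof(unsigned int));

    for (int i = 0; i < 1; ++i) {  // the loop from the question: one launch
        countExecutions<<<grid, block>>>(d_counter);
    }
    cudaDeviceSynchronize();  // wait for the kernel before reading the counter

    cudaMemcpy(&h_counter, d_counter, sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}
```

For example, runOnceAndCount(dim3(1), dim3(512, 1)) and runOnceAndCount(dim3(1), dim3(32, 16)) should both return 512. If a result looks like the kernel ran many times, the cause is more likely in how the data is read and written than in the launch itself.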

Solved: It was actually a write-read conflict that occurred when the kernel executed sequentially.
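A common way to avoid a write-read conflict when evaluating layers is to never read and write the same buffer in one launch. Instead, the code ping-pongs between two device buffers and launches one kernel per layer. The sketch below assumes the conflict came from updating activations in place, which the post does not confirm. denseLayer and evaluateLayers are hypothetical names, and the layer is a plain weighted sum with no activation function.

```cuda
#include <cassert>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Dense layer without activation: out[j] = sum_i w[j*n + i] * in[i].
// 'in' and 'out' must be different buffers. Writing into the buffer that
// other threads are still reading is exactly a write-read conflict.
__global__ void denseLayer(const float *w, const float *in, float *out, int n) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= n) return;
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) acc += w[j * n + i] * in[i];
    out[j] = acc;
}

// Hypothetical host helper: apply the same n x n weight matrix 'layers' times,
// swapping input and output buffers after each launch.
std::vector<float> evaluateLayers(const std::vector<float> &w,
                                  const std::vector<float> &x, int layers) {
    int n = static_cast<int>(x.size());
    float *d_w = nullptr, *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_w, w.size() * sizeof(float));
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));
    cudaMemcpy(d_w, w.data(), w.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_a, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int l = 0; l < layers; ++l) {
        // Launches on the default stream run in order, so each layer sees the
        // complete output of the previous one.
        denseLayer<<<blocks, threads>>>(d_w, d_a, d_b, n);
        std::swap(d_a, d_b);  // this layer's output is the next layer's input
    }

    std::vector<float> y(n);
    // Blocking copy on the default stream: waits for the last layer to finish.
    cudaMemcpy(y.data(), d_a, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_w);
    cudaFree(d_a);
    cudaFree(d_b);
    return y;
}
```

With n = 4, all weights equal to 1 and an all-ones input, each layer multiplies every activation by 4, so three layers give 64 in every position.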