Optimum thread count

Hello everyone

If I launch the kernel with 1 block of 256 threads, as follows, I get the correct result matrix:

example<<<1, 256>>>(d_a, d_b, d_c);

If I launch it with 10 blocks of 256 threads instead, the result matrix is wrong:

example<<<10, 256>>>(d_a, d_b, d_c);

Where could the error be?
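The kernel body is not shown, so what follows is only a hypothetical sketch. A common reason a kernel is correct with one block but wrong with several is that each thread indexes or strides using only threadIdx.x and blockDim.x. Every block then repeats the same work, and if the kernel accumulates into d_c (for example with +=), the repeated work corrupts the result. The sketch below assumes element-wise addition of N x N float matrices. N = 64, the extra size argument n, and the host helper run_example are all assumptions for illustration. It uses a grid-stride loop: each thread starts at its global index and advances by the total thread count of the grid, so every element is written exactly once for any number of blocks.

```cuda
#include <cuda_runtime.h>

// Hypothetical size: the original post does not give the matrix dimensions.
constexpr int N = 64;

// Hypothetical kernel: element-wise c = a + b over an N x N matrix stored as
// a flat array of n = N * N floats. Each thread starts at its *global* index
// and steps by the number of threads in the whole grid, so each element is
// handled by exactly one thread regardless of how many blocks are launched.
__global__ void example(const float *a, const float *b, float *c, int n)
{
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: copies the inputs to the device, launches the
// kernel with `blocks` blocks of 256 threads, and copies the result back.
// Returns cudaSuccess if the launch and the final copy succeeded.
cudaError_t run_example(const float *h_a, const float *h_b, float *h_c, int blocks)
{
    const int n = N * N;
    const size_t bytes = n * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;

    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    example<<<blocks, 256>>>(d_a, d_b, d_c, n);
    cudaError_t err = cudaGetLastError();  // reports launch-configuration errors
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying back.
        err = cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return err;
}
```

With this pattern, calling run_example(..., 1) and run_example(..., 10) should produce identical matrices. With 10 blocks, the extra threads simply find no remaining work once the loop bound is reached.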