Matrix Multiplication Error


I am facing a strange problem with the matrix multiplication code from the Programming Guide. With 256 matrices of size 256 x 256 and a block size of 16, the multiplication gives correct results. But if I decrease the number of matrices to 64 or 128, or increase it to 2048, keeping the same matrix size and block size, the results are incorrect. Can anybody explain this behaviour? I have no idea why it happens, since the matrix size and block size are unchanged.
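
To narrow down which matrices come out wrong, one option is to compare the GPU output against a plain CPU reference. The following is only a minimal sketch: the function names `batched_matmul_ref` and `first_bad_matrix`, the row-major contiguous layout, and the tolerance are assumptions for illustration, not the Programming Guide's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for a batch of square matrix products: C[m] = A[m] * B[m].
// Assumed layout (not taken from the original code): row-major floats, with
// the matrices of the batch stored back to back in one contiguous buffer.
void batched_matmul_ref(const std::vector<float>& a,
                        const std::vector<float>& b,
                        std::vector<float>& c,
                        std::size_t batch, std::size_t n) {
    assert(a.size() == batch * n * n && b.size() == batch * n * n);
    c.assign(batch * n * n, 0.0f);
    for (std::size_t m = 0; m < batch; ++m) {
        const std::size_t base = m * n * n;  // offset of matrix m in the buffer
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t k = 0; k < n; ++k) {
                const float aik = a[base + i * n + k];
                for (std::size_t j = 0; j < n; ++j) {
                    c[base + i * n + j] += aik * b[base + k * n + j];
                }
            }
        }
    }
}

// Returns the index of the first matrix whose entries differ from the
// reference by more than a relative tolerance, or -1 if every matrix matches.
long first_bad_matrix(const std::vector<float>& got,
                      const std::vector<float>& ref,
                      std::size_t batch, std::size_t n, float tol = 1e-4f) {
    assert(got.size() == batch * n * n && ref.size() == batch * n * n);
    for (std::size_t m = 0; m < batch; ++m) {
        for (std::size_t e = 0; e < n * n; ++e) {
            const float r = ref[m * n * n + e];
            const float g = got[m * n * n + e];
            if (std::fabs(g - r) > tol * (1.0f + std::fabs(r))) {
                return static_cast<long>(m);
            }
        }
    }
    return -1;
}
```

Running this for batch sizes 64, 128, 256 and 2048 against the results copied back from the device would show whether the errors start at a particular matrix index, which may help distinguish an indexing problem from a failed launch or allocation.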

Looking Forward,