I have a large matrix of shape (900000, 3). I need to transpose it and compute the dot product of the matrix with its transpose.
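For reference, here is a rough sketch of the sizes involved. The order of the product matters a lot for a matrix this shape, so it lists both possibilities. The helper name `product_shapes` and the 8-byte (float64) element size are assumptions made for illustration only.

```python
def product_shapes(n_rows, n_cols, itemsize=8):
    """Return the result shape and result size in bytes for both
    possible products of an n_rows x n_cols matrix A with its transpose.

    itemsize=8 assumes float64 elements (an assumption, not stated above).
    """
    small = (n_cols, n_cols)  # shape of A.T @ A
    large = (n_rows, n_rows)  # shape of A @ A.T
    return {
        "A.T @ A": (small, small[0] * small[1] * itemsize),
        "A @ A.T": (large, large[0] * large[1] * itemsize),
    }


sizes = product_shapes(900_000, 3)
for name, (shape, nbytes) in sizes.items():
    print(f"{name}: shape={shape}, result size={nbytes:,} bytes")
```

Under these assumptions, `A.T @ A` gives a tiny 3 x 3 result, while `A @ A.T` would give a 900000 x 900000 result that takes several terabytes in float64.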

I have tried NVBLAS with Python and it works, but I fear I am not getting appropriate coverage from the block (tile) parameter. What should NVBLAS_TILE_DIM be set to for such a large matrix?
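To make the question concrete, here is a rough way to count how many tiles a given NVBLAS_TILE_DIM would produce for this matrix. It assumes each axis is simply split into square tiles by ceiling division, which is a simplified model and not necessarily how NVBLAS actually schedules work. The helper name `tile_grid` and the candidate values are arbitrary examples.

```python
import math


def tile_grid(rows, cols, tile_dim):
    """Number of tiles along each axis, assuming plain ceiling division.

    This is a simplified model for reasoning about tile sizes; it is an
    assumption, not a description of NVBLAS internals.
    """
    return math.ceil(rows / tile_dim), math.ceil(cols / tile_dim)


# Candidate tile dimensions are arbitrary examples, not recommendations.
for tile_dim in (1024, 2048, 4096):
    print(tile_dim, tile_grid(900_000, 3, tile_dim))
```

In this model the short axis (3) always fits in a single tile, so only the long axis (900000) is affected by the tile dimension.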