Finding the reason behind CUBLAS_STATUS_INTERNAL_ERROR

I am getting CUBLAS_STATUS_INTERNAL_ERROR as the return value of cublasDgemm. This error is explained in the cuBLAS documentation as “an internal operation failed”.

Can anyone help me find out what the problem is?

Run your code with cuda-memcheck. Are any errors reported?
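If cuda-memcheck reports nothing obvious, it may also help to check every CUDA and cuBLAS call leading up to cublasDgemm, because an earlier failure (a bad allocation, a failed copy, an out-of-bounds write in a previous kernel) can surface later as an internal error. Below is a minimal sketch of that idea, not your actual code: the CHECK_CUDA and CHECK_CUBLAS macros are hypothetical helpers I made up for illustration, and the 4x4 matrices and their contents are assumptions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: abort with file/line if a CUDA runtime call fails.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical helper: abort with file/line if a cuBLAS call fails.
#define CHECK_CUBLAS(call)                                            \
    do {                                                              \
        cublasStatus_t st_ = (call);                                  \
        if (st_ != CUBLAS_STATUS_SUCCESS) {                           \
            fprintf(stderr, "cuBLAS error: status %d at %s:%d\n",     \
                    (int)st_, __FILE__, __LINE__);                    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    const int n = 4;  // assumed size, square matrices, column-major
    const size_t bytes = sizeof(double) * n * n;
    std::vector<double> hA(n * n, 1.0), hB(n * n, 2.0), hC(n * n, 0.0);

    double *dA = nullptr, *dB = nullptr, *dC = nullptr;
    CHECK_CUDA(cudaMalloc((void**)&dA, bytes));
    CHECK_CUDA(cudaMalloc((void**)&dB, bytes));
    CHECK_CUDA(cudaMalloc((void**)&dC, bytes));
    CHECK_CUDA(cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice));
    CHECK_CUDA(cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice));

    cublasHandle_t handle;
    CHECK_CUBLAS(cublasCreate(&handle));

    // Flush any pending asynchronous error from earlier work, so that a
    // failure here is not misattributed to cublasDgemm.
    CHECK_CUDA(cudaDeviceSynchronize());
    CHECK_CUDA(cudaGetLastError());

    const double alpha = 1.0, beta = 0.0;
    // C = alpha * A * B + beta * C, leading dimensions equal to n.
    CHECK_CUBLAS(cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                             n, n, n, &alpha, dA, n, dB, n, &beta, dC, n));

    // The GEMM runs asynchronously; synchronize to catch execution errors.
    CHECK_CUDA(cudaDeviceSynchronize());
    CHECK_CUDA(cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost));
    printf("C[0] = %f\n", hC[0]);  // 4 terms of 1.0 * 2.0, so 8.000000

    CHECK_CUBLAS(cublasDestroy(handle));
    CHECK_CUDA(cudaFree(dA));
    CHECK_CUDA(cudaFree(dB));
    CHECK_CUDA(cudaFree(dC));
    return 0;
}
```

Something like `nvcc check_dgemm.cu -lcublas -o check_dgemm` should build it (the file name is just an example), and you can then run the binary under `cuda-memcheck ./check_dgemm`. On newer CUDA toolkits cuda-memcheck has been replaced by compute-sanitizer, so use `compute-sanitizer ./check_dgemm` there instead. If the first failing call turns out to be something before cublasDgemm, that is most likely where the real problem is.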