I think I have found a bug in the cublasDtrsm() function. I am using cublasDtrsm() to solve AX=I for the inverse of A, and in my case A is a diagonal matrix, so this should be easy.
When I use the function in the form AX=I with A=diag([3956810 , 50666 , 770 , 21]), I should get X=diag([2.5273e-7 , 1.9737e-5 , 1.2987e-3 , 4.7619e-2]). I do get those values along the diagonal, and zeros in every off-diagonal element except X(4,1)=85.0459.
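For comparison, here is a minimal CPU sketch of the same left-side solve. It is plain forward substitution, not the cuBLAS implementation. It assumes column-major storage and a lower-triangular A, and it follows the BLAS trsm convention of overwriting B with X. The names ref_trsm_left_lower and max_offdiag are hypothetical helpers made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference (not cuBLAS): solve A*X = B by forward
   substitution. A is n x n, lower triangular, column-major with leading
   dimension lda. B is n x m with leading dimension ldb and is overwritten
   with X, mirroring the in-place convention of the BLAS trsm routines.
   Only the lower triangle of A is read. */
static void ref_trsm_left_lower(size_t n, size_t m,
                                const double *A, size_t lda,
                                double *B, size_t ldb)
{
    assert(lda >= n && ldb >= n);
    for (size_t j = 0; j < m; ++j) {              /* each right-hand-side column */
        for (size_t k = 0; k < n; ++k) {
            B[k + j * ldb] /= A[k + k * lda];     /* x_k = b_k / a_kk */
            for (size_t i = k + 1; i < n; ++i) {
                /* eliminate x_k from the rows below it */
                B[i + j * ldb] -= A[i + k * lda] * B[k + j * ldb];
            }
        }
    }
}

/* Largest absolute off-diagonal entry of an n x n column-major matrix. */
static double max_offdiag(size_t n, const double *X, size_t ldx)
{
    double worst = 0.0;
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < n; ++i) {
            double v = X[i + j * ldx];
            if (v < 0.0) v = -v;
            if (i != j && v > worst) worst = v;
        }
    }
    return worst;
}
```

Calling ref_trsm_left_lower(4, 4, A, 4, B, 4) with A set to the diagonal matrix above and B set to the 4x4 identity should leave every off-diagonal entry of B at exactly zero. Because the solve works in place, a stray value already sitting in B(4,1) before the call would show up in X(4,1), so it may be worth dumping both device buffers right before the cublasDtrsm() call.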
If I use the function in the form XA=I with the same A, I again get the right values along the diagonal and zeros in every off-diagonal element except X(4,1)=4.5111e-4. In this case the stray element is small enough that my iterative algorithm still converges, but it takes some additional iterations.
If I replace the cublasDtrsm() call with my own inverse, the number of iterations exactly matches my straight Matlab implementation.
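To make the replacement concrete, a hand-written inverse for the diagonal case can be as small as the sketch below. This is an illustration of the idea only, not the routine I actually use. It assumes column-major storage, and diag_inverse is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: invert a diagonal n x n matrix stored column-major.
   Only the diagonal of A is read. X receives the reciprocals on its
   diagonal and explicit zeros everywhere else.
   Returns 0 on success, -1 if a diagonal entry is zero (A is singular). */
static int diag_inverse(size_t n, const double *A, size_t lda,
                        double *X, size_t ldx)
{
    assert(lda >= n && ldx >= n);
    for (size_t j = 0; j < n; ++j) {
        if (A[j + j * lda] == 0.0) {
            return -1;
        }
        for (size_t i = 0; i < n; ++i) {
            /* zero the whole column first, including any stale values */
            X[i + j * ldx] = 0.0;
        }
        X[j + j * ldx] = 1.0 / A[j + j * lda];
    }
    return 0;
}
```

Since every off-diagonal entry of X is written explicitly, the result does not depend on whatever was in the output buffer beforehand.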