cublasSgemm - how to correctly use transpose?

Hello,

I am reading the docs for cublasSgemm and its options for transposing matrices before the multiplication, but I just can’t figure it out: it always returns wrong results.

I'm using it for A*B like this (in my code the second matrix, B, is called D):
cublasSgemm('n', 'n', rA, cD, cA, alpha, pAgpu, rA, pDgpu, rD, beta, gc, rA);

and I thought that all I would need for A*B^T is:
cublasSgemm('n', 't', rA, cD, cA, alpha, pAgpu, rA, pDgpu, rD, beta, gc, rA);

But I don't get correct results. I also tried different combinations of leading dimensions for the A and B matrices, but the results are still wrong.
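
For reference, here is a minimal CPU sketch of what I understand the call is supposed to compute, assuming the standard column-major BLAS convention C = alpha*op(A)*op(B) + beta*C, where m, n and k describe op(A) (m x k) and op(B) (k x n), while the leading dimensions describe the matrices as stored. The names ref_sgemm and example_a_times_dt and the small test matrices are made up for illustration; none of this is cuBLAS code.

```c
#include <assert.h>

/* Hypothetical CPU reference, not part of cuBLAS.
   Computes C = alpha * op(A) * op(B) + beta * C in column-major storage,
   where op(X) is X for 'n' and X^T for 't'.
   op(A) is m x k, op(B) is k x n, C is m x n. */
static void ref_sgemm(char ta, char tb, int m, int n, int k, float alpha,
                      const float *A, int lda, const float *B, int ldb,
                      float beta, float *C, int ldc)
{
    /* Leading dimensions refer to the stored (untransposed) matrices. */
    assert(lda >= (ta == 'n' ? m : k));
    assert(ldb >= (tb == 'n' ? k : n));
    assert(ldc >= m);

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                float a = (ta == 'n') ? A[i + p * lda] : A[p + i * lda];
                float b = (tb == 'n') ? B[p + j * ldb] : B[j + p * ldb];
                acc += a * b;
            }
            /* When beta is zero, C is not read (it may be uninitialized). */
            float old = (beta == 0.0f) ? 0.0f : beta * C[i + j * ldc];
            C[i + j * ldc] = alpha * acc + old;
        }
    }
}

/* Illustrative data: A is 2x3, D is 4x3, so A * D^T is 2x4.
   The result is written to C (8 floats, column-major). */
static void example_a_times_dt(float *C)
{
    enum { rA = 2, cA = 3, rD = 4, cD = 3 };
    /* A rows: [1 2 3], [4 5 6] */
    static const float A[rA * cA] = {1, 4, 2, 5, 3, 6};
    /* D rows: [1 0 0], [0 1 0], [0 0 1], [1 1 1] */
    static const float D[rD * cD] = {1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1};

    /* op(D) = D^T is cD x rD, so n = rD and k = cA (which must equal cD). */
    ref_sgemm('n', 't', rA, rD, cA, 1.0f, A, rA, D, rD, 0.0f, C, rA);
}
```

Calling example_a_times_dt on a buffer of 8 floats fills it with the 2x4 product in column-major order. Under this reading, with B stored as rD x cD, the n argument for A*B^T would be rD rather than cD, and k would be cA, which has to match cD. I am only assuming cublasSgemm follows the same convention.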

Can someone help me compute A*B^T using cublasSgemm?