cuBLASLt performance on a 2080 Ti under Linux is much worse than on a 1650 under Win10

I tested some code that basically just calls the cuBLASLt library, and the resulting performance confuses me.

On our server, which has an RTX 2080 Ti and runs CentOS 7.x, the same code takes almost twice as long as on my personal computer, which has a GTX 1650 and runs Win10.

Has anyone had a similar experience?
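
For context, the benchmark is roughly along the lines of the sketch below (simplified; the exact shapes, data types, and setup in my real code differ, so treat the sizes and the FP32 compute type here as placeholders). It uses the CUDA 11+ cuBLASLt API and times repeated cublasLtMatmul calls with CUDA events after a warm-up call:

```cpp
// Simplified sketch of the benchmark (CUDA 11+ cuBLASLt API).
// Build: nvcc bench.cu -lcublasLt -o bench
#include <cublasLt.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int m = 4096, n = 4096, k = 4096;  // placeholder sizes
    float *A, *B, *C;
    cudaMalloc(&A, sizeof(float) * m * k);
    cudaMalloc(&B, sizeof(float) * k * n);
    cudaMalloc(&C, sizeof(float) * m * n);

    cublasLtHandle_t lt;
    cublasLtCreate(&lt);

    // FP32 matmul descriptor and column-major matrix layouts.
    cublasLtMatmulDesc_t op;
    cublasLtMatmulDescCreate(&op, CUBLAS_COMPUTE_32F, CUDA_R_32F);
    cublasLtMatrixLayout_t la, lb, lc;
    cublasLtMatrixLayoutCreate(&la, CUDA_R_32F, m, k, m);
    cublasLtMatrixLayoutCreate(&lb, CUDA_R_32F, k, n, k);
    cublasLtMatrixLayoutCreate(&lc, CUDA_R_32F, m, n, m);

    const float alpha = 1.0f, beta = 0.0f;

    // Warm-up: the first call absorbs one-time init and heuristic cost.
    // Passing a null algo lets cuBLASLt pick one via its heuristics.
    cublasLtMatmul(lt, op, &alpha, A, la, B, lb, &beta,
                   C, lc, C, lc, nullptr, nullptr, 0, 0);
    cudaDeviceSynchronize();

    // Time an average over several iterations with CUDA events.
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    const int iters = 10;
    cudaEventRecord(t0);
    for (int i = 0; i < iters; ++i)
        cublasLtMatmul(lt, op, &alpha, A, la, B, lb, &beta,
                       C, lc, C, lc, nullptr, nullptr, 0, 0);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    printf("avg matmul: %.3f ms\n", ms / iters);
    return 0;
}
```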

This doesn’t sound too surprising.

The RTX 2080 Ti has 4352 CUDA cores, while the GTX 1650 has 896.

Also, I don’t believe the 1650 has Tensor Cores which cuBLASLt might use.

I think you misread what I meant: what I’m trying to say is that the 2080 Ti’s performance is much worse than the 1650’s.

Also, I checked the specs for both cards; they both have compute capability 7.5, so I would expect the 1650 to have Tensor Cores just as the 2080 Ti does. (The sketch below shows how the compute capability can be checked.)
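
For anyone who wants to verify on their own machine, compute capability can be queried with cudaGetDeviceProperties; a minimal sketch:

```cpp
// Query compute capability and SM count of device 0.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("%s: compute capability %d.%d, %d SMs\n",
           prop.name, prop.major, prop.minor, prop.multiProcessorCount);
    return 0;
}
```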

Ahh, my bad. I did misread your original post.
We would need the exact code you’re running to do further analysis.