I am working with a codebase that calls cublasGemmEx. It works well on a Titan GPU server, but doesn't work on a K20 one.
Both servers have CUDA 8.0 and cuDNN 5.1 installed, with GPU drivers that support CUDA 8.0.
The codebase is written in C#, and the target architecture in Visual Studio on the K20 server is set to k_20, sm_20. All other functions work well; only the functions that call cublasGemmEx fail, and they return all-zero values.
You might want to run this application under cuda-memcheck. It will catch CUDA API errors that occur internally, as well as out-of-bounds and other illegal memory accesses inside kernels.
I found in the docs that "cublasCgemmEx is only supported for GPU with architecture capabilities equal or greater than 5.0", so the returned status should be CUBLAS_STATUS_ARCH_MISMATCH.
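
For anyone hitting the same thing: all-zero outputs usually mean the status returned by the cuBLAS call was never inspected. Below is a minimal sketch of the two checks discussed here, decoding the status code and comparing the device's compute capability against the 5.0 minimum quoted above. It is plain C++ with no CUDA dependency so it can be tried anywhere. The names `CublasStatusCode`, `cublasStatusName`, `throwOnCublasError` and `meetsMinimumArch` are hypothetical helpers, not part of cuBLAS, and the numeric values are assumed to mirror `cublasStatus_t` in `cublas_api.h`, so verify them against your installed header.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical mirror of cublasStatus_t; the numeric values are assumed to
// match cublas_api.h and should be checked against the installed header.
enum CublasStatusCode : int {
    kSuccess = 0,
    kNotInitialized = 1,
    kAllocFailed = 3,
    kInvalidValue = 7,
    kArchMismatch = 8,
    kMappingError = 11,
    kExecutionFailed = 13,
    kInternalError = 14,
    kNotSupported = 15,
    kLicenseError = 16
};

// Translate a raw status integer into the symbolic cuBLAS name.
inline const char* cublasStatusName(int status) {
    switch (status) {
        case kSuccess:         return "CUBLAS_STATUS_SUCCESS";
        case kNotInitialized:  return "CUBLAS_STATUS_NOT_INITIALIZED";
        case kAllocFailed:     return "CUBLAS_STATUS_ALLOC_FAILED";
        case kInvalidValue:    return "CUBLAS_STATUS_INVALID_VALUE";
        case kArchMismatch:    return "CUBLAS_STATUS_ARCH_MISMATCH";
        case kMappingError:    return "CUBLAS_STATUS_MAPPING_ERROR";
        case kExecutionFailed: return "CUBLAS_STATUS_EXECUTION_FAILED";
        case kInternalError:   return "CUBLAS_STATUS_INTERNAL_ERROR";
        case kNotSupported:    return "CUBLAS_STATUS_NOT_SUPPORTED";
        case kLicenseError:    return "CUBLAS_STATUS_LICENSE_ERROR";
        default:               return "UNKNOWN_CUBLAS_STATUS";
    }
}

// Fail loudly instead of silently continuing with an untouched output buffer.
inline void throwOnCublasError(int status, const std::string& call) {
    if (status != kSuccess) {
        throw std::runtime_error(call + " failed: " + cublasStatusName(status));
    }
}

// True if compute capability major.minor is at least reqMajor.reqMinor.
// The 5.0 default is the minimum quoted in this thread for cublasCgemmEx.
inline bool meetsMinimumArch(int major, int minor,
                             int reqMajor = 5, int reqMinor = 0) {
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}
```

In the real application the status would be whatever integer the cublasGemmEx call returns (through the C# binding in this case), and the major/minor pair would come from the device properties. A device reporting compute capability 3.5 fails the `meetsMinimumArch` check, while 5.0 and above pass.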