cublasGemmEx does not work on certain GPUs

I am working with a codebase that calls cublasGemmEx. It works well on a Titan GPU server but not on a K20 one.

Both servers have CUDA 8.0 and cuDNN 5.1 installed, with GPU drivers that support CUDA 8.0.

The codebase is written in C#. In Visual Studio on the K20 server, the target architecture is set to k_20, sm_20. All functions other than cublasGemmEx work well, but functions that call cublasGemmEx return all zeros.

cudaDeviceSynchronize();
cudaError_t error = cudaGetLastError();
reports no error.

You might want to run this application under cuda-memcheck. This will catch CUDA API errors that occur internally, as well as illegal memory accesses inside kernels.

“k_20” is not a valid architecture, and “sm_20” is incorrect for the K20.

“compute_35” and “sm_35” are the correct values for the Tesla K20.

Thank you for all your replies.

I found in the docs that "cublasCgemmEx is only supported for GPU with architecture capabilities equal or greater than 5.0", so the returned status should be CUBLAS_STATUS_ARCH_MISMATCH.
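
Given that requirement, a program could test the device's compute capability before calling the routine. The sketch below shows only the comparison. meetsComputeCapability is a hypothetical helper, and the major and minor values are assumed to come from the major and minor fields of cudaDeviceProp, as filled in by cudaGetDeviceProperties.

```cpp
// Hypothetical helper: returns true if a device with compute capability
// major.minor meets the required minimum reqMajor.reqMinor.
bool meetsComputeCapability(int major, int minor, int reqMajor, int reqMinor) {
    // A different major version decides the comparison on its own.
    if (major != reqMajor) {
        return major > reqMajor;
    }
    // Same major version: compare the minor version.
    return minor >= reqMinor;
}
```

For the Tesla K20 (compute capability 3.5, matching the sm_35 value given earlier in the thread), meetsComputeCapability(3, 5, 5, 0) returns false, which is consistent with the failure described here.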