Continuing from my previous topic, where I managed to call CUDA and cuBLAS (through C) from Fortran: https://devtalk.nvidia.com/default/topic/995389/cuda-programming-and-performance/problem-using-cuda-as-a-static-library-with-c-and-fortran-on-vs2012/
I wanted to test the setup and see how fast cuBLAS computes on the GPU compared with Intel MKL, so I benchmarked Fortran sgemm, Fortran dgemm, cuBLAS sgemm and cuBLAS dgemm. I noticed two main problems:
- Beyond a certain matrix size, cuBLAS dgemm and cuBLAS sgemm fail: cuBLAS dgemm fails at 5000 x 5000 and cuBLAS sgemm at 9000 x 9000, while the Fortran sgemm and dgemm still work at those sizes. The error I get is CUBLAS_STATUS_MAPPING_ERROR from cublasGetVector when copying the result from device to host. That is at least where the error is reported; the actual failure could happen earlier. I suspected a stack or heap problem and tried increasing their sizes, but it didn't help.
I have an NVIDIA GT 7500M, so the problem may come from a limitation of my graphics card. I am only using it to test CUDA; the final program will run on a remote server with a Tesla card.
- cuBLAS dgemm is very slow: it is about 12 times slower than cuBLAS sgemm, and even slower than the CPU. Timings for 3000 x 3000:
sgemm CPU: 721 ms
dgemm CPU: 1419 ms
cuBLAS sgemm GPU: 153 ms
cuBLAS dgemm GPU: 1878 ms
I uploaded the Visual Studio project as a zip file: https://ufile.io/1c7b2