How does the call to cublasSgemm() fail? What status does it return? Is the status of all other API calls and kernel launches checked carefully? The failure could be due to a follow-on issue, such as cublasSgemm() being passed an invalid pointer to allocated memory, since the previous allocation failed.
In my local cublas.h, cublasSgemm() is declared like this:
void CUBLASWINAPI cublasSgemm (char transa, char transb, int m, int n, int k,
float alpha, const float *A, int lda,
const float *B, int ldb, float beta, float *C, int ldc);
The link you provided shows cublasSgemm() like this:
cublasStatus_t cublasSgemm(cublasHandle_t handle, cublasOperation_t transa, cublasOperation_t transb, int m, int n, int k, const float *alpha, const float *A, int lda, const float *B, int ldb, const float *beta, float *C, int ldc)
It has a cublasHandle_t parameter, but my version is declared void and returns nothing.
So what does cublasSgemm() return? You would want to assign its return status to a variable of type cublasStatus_t and check what that value indicates.
I do not understand what you mean by “cublas lib version 1.0 but not 2.0”. Ideally you would be using CUDA 6.5 at this time with the toolchain and libraries that come with it. Probably nobody here can reproduce issues with very old CUDA versions from several years ago.
Read the actual section I linked. It covers the cublas legacy api.
The format of the call for cublasSgemm in the legacy api does not have a cublas handle as its first parameter.
But more importantly, the section I linked discusses error-checking with the cublas legacy API.
Please study it.
“the status of core functions can be retrieved using cublasGetError().”
Seriously, read the whole section on the cublas legacy api. It’s not that long. It is clearly delineated from the cublas_v2 api, and has its own appendix (Appendix A). That appendix clearly states that the function prototypes are not contained in the doc, but instead must be found in cublas.h
Sorry, I did not realize that by “cublas lib version 1.0” you were referring to the legacy interface of CUBLAS, rather than the version of the library. It seems txbob managed to catch on to what you meant and already provided some good advice on how to proceed.
The starting point for finding out what went wrong with the SGEMM call is definitely to inspect the status return. With the modern CUBLAS interface (which you showed in your post) this is accomplished by looking at the return value of the function, which is why I inquired about it. With the CUBLAS legacy interface the status can be retrieved by calling cublasGetError().
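To make the legacy-interface check concrete, here is a minimal sketch of a helper that turns the value returned by cublasGetError() into a readable name. The function names legacy_cublas_status_name() and legacy_cublas_ok() are hypothetical, and the numeric codes are written out by hand on the assumption that they match the CUBLAS_STATUS_* constants in your cublas.h; please verify them against your local header. The block deliberately avoids including cublas.h so that it compiles without a CUDA toolkit.

```c
#include <stdio.h>

/* Hypothetical helper: map a legacy CUBLAS status code to its name.
 * The numeric values are assumed to match the CUBLAS_STATUS_* constants
 * in cublas.h; check your local header before relying on them. */
const char *legacy_cublas_status_name(unsigned int status)
{
    switch (status) {
    case 0:  return "CUBLAS_STATUS_SUCCESS";
    case 1:  return "CUBLAS_STATUS_NOT_INITIALIZED";
    case 3:  return "CUBLAS_STATUS_ALLOC_FAILED";
    case 7:  return "CUBLAS_STATUS_INVALID_VALUE";
    case 8:  return "CUBLAS_STATUS_ARCH_MISMATCH";
    case 11: return "CUBLAS_STATUS_MAPPING_ERROR";
    case 13: return "CUBLAS_STATUS_EXECUTION_FAILED";
    case 14: return "CUBLAS_STATUS_INTERNAL_ERROR";
    default: return "unknown CUBLAS status";
    }
}

/* Hypothetical helper: returns 1 if the status means success, otherwise
 * prints a diagnostic to stderr and returns 0. */
int legacy_cublas_ok(unsigned int status, const char *what)
{
    if (status == 0) {
        return 1;
    }
    fprintf(stderr, "%s failed: %s (%u)\n",
            what, legacy_cublas_status_name(status), status);
    return 0;
}

/* Intended use with the legacy API (needs cublas.h and the CUDA toolkit):
 *
 *     cublasSgemm(transa, transb, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
 *     cublasStatus st = cublasGetError();
 *     if (!legacy_cublas_ok(st, "cublasSgemm")) { ... handle the failure ... }
 */
```

The same check belongs after every earlier legacy call as well (cublasInit(), cublasAlloc(), cublasSetMatrix() and so on), so that a failed allocation is caught before its pointer ever reaches cublasSgemm().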
I am not aware of widespread reports of issues with cublasSgemm(), and I follow this forum as well as the [cuda] tag at StackOverflow fairly closely. A Google search of issues reported with cublasSgemm() over the last year likewise returned a relatively short list. Checking the first twenty or so, they all seem to deal with usage problems on the part of programmers, rather than defects in cublasSgemm().
The one problem report I found specifically concerning cuda-convnet and cublasSgemm() seems to suggest that the issue in that case was a kernel timeout. I think status CUBLAS_STATUS_INTERNAL_ERROR (14) is returned in such a case, but am not completely sure. SGEMM execution time is pretty much proportional to the product of the matrix dimensions m*n*k, so you would want to try passing smaller matrices to cublasSgemm() if kernel timeout seems to be the problem. You can use the profiler to look at the kernel execution times.
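As a rough illustration of that scaling, the sketch below counts the floating-point operations of an m x n x k SGEMM (about 2*m*n*k, one multiply and one add per inner-product term) and converts them to a time estimate. The function names and the assumed_gflops parameter are hypothetical; the sustained throughput of your GPU is something you would have to measure with the profiler, and the estimate ignores memory traffic and launch overhead.

```c
/* Approximate floating-point operation count of C = alpha*A*B + beta*C
 * for an m x k matrix A and a k x n matrix B: one multiply and one add
 * per inner-product term, i.e. about 2*m*n*k operations. */
double sgemm_flops(int m, int n, int k)
{
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Hypothetical estimate of the kernel run time in seconds, given an
 * assumed sustained throughput in GFLOP/s (an assumption the caller
 * must supply, ideally from a profiler measurement). */
double sgemm_seconds_estimate(int m, int n, int k, double assumed_gflops)
{
    return sgemm_flops(m, n, k) / (assumed_gflops * 1.0e9);
}
```

Halving all three dimensions cuts the work by a factor of eight, which is why shrinking the matrices is a quick way to test whether a watchdog timeout is involved.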
The link I provided opens to a specific section of the document (it should, anyway) which covers error checking of the cublas legacy API, which is the specific question of yours that I had excerpted and responded to. If you simply open the link, it should open to the exact page in your browser window that mentions the error checking function and describes the possible error codes. You shouldn’t have to search on anything or scroll your browser window, at all. Just read what shows up there.
The whole section I am suggesting you read is Appendix A, the part that covers the cublas legacy API. If you’re going to be modifying code that uses the cublas legacy API, there is good information in that entire appendix (and I think it is only a few pages), for example a mention that the legacy API is not thread safe (might be important, depending on how you are using it, I don’t know). You will also begin to grasp that the legacy API is really separate (almost completely) from the cublas_v2 api, and you will understand, for example, that the function prototypes specifically documented in the manual do not apply to their legacy API usage. That’s why I’m suggesting you read Appendix A. I think it will help you.