cublasSgemm() always fails during compute-intensive task

I am using a GTX 760 with 4 GB of GPU memory to train a deep learning model under Windows 7 64-bit.

cublasSgemm() always fails at some point during training. I am not sure exactly when it happens, but it always does.

This function performs matrix multiplication. Has anyone met the same issue and found out how to fix it?

Is this a hardware or driver issue? My driver is the latest version.

How does the call to cublasSgemm() fail? What status does it return? Is the status of all other API calls and kernel launches checked carefully? The failure could be due to a follow-on issue, such as cublasSgemm() being passed an invalid pointer because a previous allocation failed.
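For example, a minimal sketch of the kind of checking I mean (CHECK_CUDA is an illustrative helper macro, not part of the CUDA API):

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper: abort with a message on any CUDA runtime error.
#define CHECK_CUDA(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",                \
                    cudaGetErrorString(err_), __FILE__, __LINE__);         \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

int main()
{
    float *devA = 0;
    // If this allocation fails and nobody checks the status, a later
    // cublasSgemm() call receives an invalid device pointer and fails
    // in a seemingly mysterious way.
    CHECK_CUDA(cudaMalloc((void **)&devA, 1024 * 1024 * sizeof(float)));
    CHECK_CUDA(cudaFree(devA));
    return 0;
}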

void NVMatrix::rightMult(const NVMatrix &b, float scaleAB, NVMatrix &target) const
{
    assert(isContiguous() && b.isContiguous() && target.isContiguous());
    // assert(&target != &b);
    assert(_numCols == b.getNumRows());
    if (&target != this) {
        target.resize(_numRows, b.getNumCols());
        target.setTrans(true);
    }
    assert(target.getNumRows() == _numRows);
    assert(target.getNumCols() == b.getNumCols());
    if (_numRows % 64 != 0 || _numCols % 64 != 0 || b.getNumCols() % 64 != 0) {
        WARN("Matrix dimensions not divisible by 64 -- cublasSgemm performance may suffer.");
    }

    // Legacy cuBLAS API: cublasSgemm() returns void, so the status has to
    // be queried separately after the call.
    cublasSgemm(getTransChar(), b.getTransChar(), _numRows, b.getNumCols(), _numCols,
                scaleAB, _devData, getLeadingDim(), b.getDevData(), b.getLeadingDim(),
                0, target.getDevData(), getNumRows());

    checkCublasError("cublasSgemm failed right mult");
    // cudaThreadSynchronize();
}

The code is from https://code.google.com/p/cuda-convnet/

This code uses the cuBLAS library version 1.0 API, not 2.0. How can I get a detailed error status to find the root cause? Currently, it only prints out “cublasSgemm failed right mult”.

cublas documentation:

http://docs.nvidia.com/cuda/cublas/index.html#appendix-a-using-the-cublas-legacy-api

My local definition of cublasSgemm() in cublas.h looks like this:

void CUBLASWINAPI cublasSgemm (char transa, char transb, int m, int n, int k,
                               float alpha, const float *A, int lda,
                               const float *B, int ldb, float beta, float *C,
                               int ldc);

The cublasSgemm() in the link you provided looks like this:

cublasStatus_t cublasSgemm(cublasHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const float *alpha, const float *A, int lda,
                           const float *B, int ldb,
                           const float *beta, float *C, int ldc)

It has a cublasHandle_t, but my version returns void, with no status to check.

So what does cublasSgemm() return? You would want to assign its return status to a variable of type cublasStatus_t and check what that value indicates.
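For illustration, a minimal sketch with the cublas_v2 interface (the matrix contents are left uninitialized here; the point is only where the status comes from):

#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main()
{
    cublasHandle_t handle;
    cublasStatus_t status = cublasCreate(&handle);
    if (status != CUBLAS_STATUS_SUCCESS) {
        printf("cublasCreate failed with status %d\n", (int)status);
        return 1;
    }

    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, 4 * sizeof(float));
    cudaMalloc((void **)&dB, 4 * sizeof(float));
    cudaMalloc((void **)&dC, 4 * sizeof(float));

    const float alpha = 1.0f, beta = 0.0f;
    // In the v2 API, the status is the function's return value.
    status = cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, 2, 2, 2,
                         &alpha, dA, 2, dB, 2, &beta, dC, 2);
    if (status != CUBLAS_STATUS_SUCCESS) {
        printf("cublasSgemm failed with status %d\n", (int)status);
    }

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cublasDestroy(handle);
    return 0;
}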

I do not understand what you mean by “cublas lib version 1.0 but not 2.0”. Ideally you would be using CUDA 6.5 at this time with the toolchain and libraries that come with it. Probably nobody here can reproduce issues with very old CUDA versions from several years ago.

Read the actual section I linked. It covers the cublas legacy api.

The format of the call for cublasSgemm in the legacy api does not have a cublas handle as its first parameter.

But more importantly, the section I linked discusses error-checking with the cublas legacy API.

Please study it.

“the status of core functions can be retrieved using cublasGetError().”
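In code, that might look like this; a minimal sketch against the legacy cublas.h interface, not code from cuda-convnet:

#include <cstdio>
#include <cublas.h>   // legacy API

int main()
{
    cublasInit();

    float *dA, *dB, *dC;
    cublasAlloc(4, sizeof(float), (void **)&dA);
    cublasAlloc(4, sizeof(float), (void **)&dB);
    cublasAlloc(4, sizeof(float), (void **)&dC);

    // Legacy cublasSgemm() returns void ...
    cublasSgemm('n', 'n', 2, 2, 2, 1.0f, dA, 2, dB, 2, 0.0f, dC, 2);

    // ... so retrieve the status of the last core function afterwards.
    cublasStatus err = cublasGetError();
    if (err != CUBLAS_STATUS_SUCCESS) {
        printf("cublasSgemm failed with status %d\n", (int)err);
    }

    cublasFree(dA); cublasFree(dB); cublasFree(dC);
    cublasShutdown();
    return 0;
}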

Seriously, read the whole section on the cublas legacy api. It’s not that long. It is clearly delineated from the cublas_v2 api, and has its own appendix (A). It is clearly stated in that appendix that the function prototypes are not contained in the doc, but instead must be found in cublas.h.

If you do not know much about this, you can learn! If you have some basic knowledge of deep models and run some of the libraries, you will definitely run into cublasSgemm() failures.

Just search on Google and see how many people run into cublasSgemm failures, OK?

The latest cuBLAS version’s headers use “_v2”, which means version 2, if you really know about it.

Who told you that once CUDA 6.5 is released, all users must migrate to the latest version?

If you find it boring to answer questions here, you can leave!!!

Sorry, I did not realize that by “cublas lib version 1.0” you were referring to the legacy interface of CUBLAS, rather than the version of the library. It seems txbob managed to catch on to what you meant and already provided some good advice on how to proceed.

The starting point for finding out what went wrong with the SGEMM call is definitely to inspect the status return. With the modern CUBLAS interface (which you showed in your post) this is accomplished by looking at the return value of the function, which is why I inquired about it. With the CUBLAS legacy interface, that status can be retrieved by using cublasGetError().

I am not aware of widespread reports of issues with cublasSgemm(), and I follow this forum as well as the [cuda] tag at StackOverflow fairly closely. A Google search for issues reported with cublasSgemm() over the last year likewise returned a relatively short list. Checking the first twenty or so, they all seem to deal with usage problems on the part of programmers, rather than defects in cublasSgemm().

The one problem report I found specifically concerning cuda-convnet and cublasSgemm() seems to suggest that the issue in that case was a kernel timeout. I think status CUBLAS_STATUS_INTERNAL_ERROR (14) is returned in such a case, but am not completely sure. SGEMM execution time is pretty much proportional to the product of the matrix dimensions m × n × k, so you would want to try passing smaller matrices to cublasSgemm() if kernel timeout seems to be the problem. You can use the profiler to look at the kernel execution times.
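If the timeout does turn out to be the problem, one possible workaround is to split the multiplication into column blocks, so each individual kernel finishes well under the watchdog limit. A sketch against the legacy API (the function name and block width nb are just for illustration):

#include <cublas.h>

// Sketch: compute C = alpha * A * B + beta * C (all column-major, no
// transposes) in column blocks of width nb, so each cublasSgemm() call
// stays well under the WDDM watchdog limit.
static void sgemmBlocked(int m, int n, int k, float alpha,
                         const float *A, int lda,
                         const float *B, int ldb,
                         float beta, float *C, int ldc, int nb)
{
    for (int j = 0; j < n; j += nb) {
        int nCols = (n - j < nb) ? (n - j) : nb;
        // Column j of a column-major matrix starts at base + j * leadingDim.
        cublasSgemm('n', 'n', m, nCols, k, alpha, A, lda,
                    B + (size_t)j * ldb, ldb, beta,
                    C + (size_t)j * ldc, ldc);
        if (cublasGetError() != CUBLAS_STATUS_SUCCESS)
            break;  // stop on the first failing block
    }
}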

You mean it’s my fault? I opened your link and then searched for “cublasSgemm” as a keyword. I only got one whole-word match, which has cublasStatus_t as the return type, but mine is void. SO WHY WOULD I CONTINUE READING YOUR LINK?

This web page is not long? Just download the PDF version of this web page; it is 143 PAGES in total!!! From the Chapter 1 introduction to Appendix C!!!

If you did not provide a clear answer, DO NOT BLAME OTHERS, OK?

Sorry to have upset you.

The link I provided opens to a specific section of the document (it should, anyway) which covers error checking of the cublas legacy API, which is the specific question of yours that I had excerpted and responded to. If you simply open the link, it should open to the exact page in your browser window that mentions the error checking function and describes the possible error codes. You shouldn’t have to search on anything or scroll your browser window, at all. Just read what shows up there.

The whole section I am referring to (and that I suggest you read) is Appendix A, the part that covers the cublas legacy API. If you’re going to be modifying code that uses the cublas legacy API, there is good information in that entire appendix (and I think it is only a few pages), for example a mention that the legacy API is not thread safe (which might be important, depending on how you are using it, I don’t know). You will also begin to grasp that the legacy API is really separate (almost completely) from the cublas_v2 API, and you will understand, for example, that the function prototypes specifically documented in the manual do not apply to legacy API usage. That’s why I’m suggesting you read Appendix A. I think it will help you.

Most likely there is an issue with the parameters you entered into the SGEMM() call.

Seriously, cuBLAS has been used successfully by thousands of people, SGEMM() in particular, so an error in the implementation is very unlikely.

Did you disable the WDDM driver timeout in Windows 7?
Are you aware that cuBLAS uses column-major format?
If your input is in row-major order, did you make the adjustments to column-major correctly? (See the sketch below.)
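For reference, here is the standard adjustment as a sketch (the helper name is just for illustration). A row-major matrix read as column-major is its transpose, and C^T = B^T * A^T, so you can swap the operands and the m/n dimensions:

#include <cublas.h>

// Sketch: C = A * B where A (m x k), B (k x n), C (m x n) are all stored
// row-major on the device. cuBLAS sees each row-major matrix as its
// transpose, so computing C^T = B^T * A^T in cuBLAS's column-major view
// leaves C in row-major order.
static void sgemmRowMajor(int m, int n, int k, float alpha,
                          const float *A,        // row-major, ld = k
                          const float *B,        // row-major, ld = n
                          float beta, float *C)  // row-major, ld = n
{
    cublasSgemm('n', 'n', n, m, k, alpha, B, n, A, k, beta, C, n);
}

This works because cuBLAS never needs to know the storage order; only the interpretation of the dimensions and leading dimensions changes.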

Most of the time when people complain about cuBLAS errors, the cause is incorrect input parameters, and that is what you are seeing in your Google search.

You did not provide a clear code sample using code blocks, and you did not give enough information to really troubleshoot the problem.

It will not help to berate those on this forum who are nice enough to try to help you out.

It turns out to be a TDR issue with the Windows 7 WDDM driver. I have now increased the TDR delay to 10 seconds and am retraining.

My code runs very large matrix multiplications; the default 2-second limit is far from enough.

This issue has bothered me for several months!!!

The problem is that once this issue occurs, the balloon notification in the Windows system tray disappears very quickly. If I do not monitor the program during training, I do not even know that a GPU reset has already happened.