Thrust error "Unspecified launch failure" after using CUSPARSE level3 function

I’m trying to figure out why I receive this runtime error:

terminate called after throwing an instance of ‘thrust::system::system_error’
what(): unspecified launch failure

after executing cusparseScsrmm() from the CUSPARSE library. The matrix and vector data passed to cusparseScsrmm() are stored in thrust::device_vector containers; I obtain their raw pointers with thrust::raw_pointer_cast() and pass those to cusparseScsrmm().

The code runs fine using these dimensions for the matrix and vectors input to cusparseScsrmm():

Input vector dimensions: 1500625 x 1
Matrix dimensions: 42875 x 1500625
Output vector dimensions: 42875 x 1

If I scale up my dimensions a little bit higher, for example, to the following:

Input vector dimensions: 1679616 x 1
Matrix dimensions: 46656 x 1679616
Output vector dimensions: 46656 x 1

Then I receive the error and the program crashes.

I’m using a compute capability 1.1 card (a single GPU of a GeForce 9800 GX2), CUDA Toolkit 3.2 RC, Thrust v1.3, and Ubuntu 10.04 LTS on a dual AMD Opteron 248 2.2 GHz system with 2 GB RAM.

Any tips on where to begin? I should emphasize that the cusparseScsrmm() call does go through, as several C++ lines of code can execute after the call (such as std::cout commands) before I receive the error. If I try using cudaThreadSynchronize() before and after the cusparseScsrmm() calls I receive a CUDA “unspecified launch failure” error.

I think I figured out what was causing this error: I was passing incorrect dimensions to the cusparseScsrmm() function. Long story short, I was able to get away with it for some problem sizes, but not always. Bottom line: triple-check the dimensions you pass to the function.
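For anyone who hits the same thing, here is a rough sketch of the kind of host-side sanity check that would have caught my mistake before the call. It is not part of CUSPARSE: check_csrmm_dims() is a hypothetical helper of my own. It assumes the non-transposed case, where (as I understand the CUSPARSE documentation) m is the number of rows of A, k is the number of columns of A, n is the number of columns of B and C, B and C are column-major with leading dimensions ldb >= k and ldc >= m, and the CSR arrays use zero-based indexing. It works on host copies of the index arrays, so you would copy them back from the device_vectors first.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a CUSPARSE function). Returns an empty string when
// the sizes are consistent with C = A * B, where A is m x k in zero-based CSR,
// B is k x n and C is m x n (both column-major). Otherwise returns a message
// describing the first inconsistency found.
std::string check_csrmm_dims(int m, int n, int k,
                             const std::vector<int>& rowPtr,  // host copy of csrRowPtrA
                             const std::vector<int>& colInd,  // host copy of csrColIndA
                             std::size_t valCount,            // number of entries in csrValA
                             std::size_t bCount, int ldb,     // allocated size of B, leading dim
                             std::size_t cCount, int ldc)     // allocated size of C, leading dim
{
    if (m <= 0 || n <= 0 || k <= 0)
        return "m, n and k must be positive";
    if (rowPtr.size() != static_cast<std::size_t>(m) + 1)
        return "rowPtr must have m + 1 entries";
    if (rowPtr[0] != 0)
        return "rowPtr[0] must be 0 for zero-based indexing";
    for (std::size_t i = 1; i < rowPtr.size(); ++i)
        if (rowPtr[i] < rowPtr[i - 1])
            return "rowPtr must be non-decreasing";

    // The last row pointer is the number of stored nonzeros.
    const std::size_t nnz = static_cast<std::size_t>(rowPtr[rowPtr.size() - 1]);
    if (colInd.size() != nnz || valCount != nnz)
        return "colInd and values must each have rowPtr[m] entries";

    // A column index >= k makes the kernel read past the end of B.
    for (std::size_t i = 0; i < colInd.size(); ++i)
        if (colInd[i] < 0 || colInd[i] >= k)
            return "column index outside [0, k)";

    if (ldb < k || bCount < static_cast<std::size_t>(ldb) * n)
        return "B is smaller than ldb * n, or ldb < k";
    if (ldc < m || cCount < static_cast<std::size_t>(ldc) * n)
        return "C is smaller than ldc * n, or ldc < m";
    return "";
}
```

For the failing case in my question this would be called with m = 46656, n = 1, k = 1679616. Swapping m and k, or passing a k smaller than the largest column index, should come back with a message instead of an unspecified launch failure later on.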