cublasSgemm returns CUBLAS_STATUS_EXECUTION_FAILED but the result is correct


I’m trying to use cuBLAS in a sparse linear solver built on StarPU.

My program executes several GEMM / AXPY operations on the GPU using cuBLAS.

My problem is that I sometimes get a CUBLAS_STATUS_EXECUTION_FAILED status after running cublasSgemm.

I looked at my parameters and they look OK:

transa = 'n', transb = 't',
M = 12, N = 4, K = 4,
alpha = 1.0, A = 2f109e34, lda = 25,
B = 2f10a034, ldb = 25,
beta = 0.0, C = 20fc0000, ldc = 12
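As a sanity check on the values above, the dimension rules of a column-major GEMM can be sketched in plain C. This is only a minimal sketch, assuming the standard BLAS conventions (the leading dimension must be at least the number of stored rows); gemm_args_check is a hypothetical helper name, not a cuBLAS function, and the return codes simply mirror the argument positions in the BLAS sgemm signature.

```c
/* Hypothetical helper (not part of cuBLAS): checks GEMM dimensions
 * using the usual column-major BLAS rules.
 * Returns 0 if the arguments are consistent, otherwise the 1-based
 * position of the first bad argument in
 * sgemm(transa, transb, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc). */
int gemm_args_check(char transa, char transb,
                    int m, int n, int k,
                    int lda, int ldb, int ldc)
{
    int nota = (transa == 'n' || transa == 'N');
    int notb = (transb == 'n' || transb == 'N');
    int tra  = (transa == 't' || transa == 'T' ||
                transa == 'c' || transa == 'C');
    int trb  = (transb == 't' || transb == 'T' ||
                transb == 'c' || transb == 'C');

    /* Number of rows of A and B as stored in memory:
     * op(A) is m x k and op(B) is k x n. */
    int rows_a = nota ? m : k;
    int rows_b = notb ? k : n;

    if (!nota && !tra) return 1;
    if (!notb && !trb) return 2;
    if (m < 0) return 3;
    if (n < 0) return 4;
    if (k < 0) return 5;
    if (lda < (rows_a > 1 ? rows_a : 1)) return 8;
    if (ldb < (rows_b > 1 ? rows_b : 1)) return 10;
    if (ldc < (m > 1 ? m : 1)) return 13;
    return 0;
}
```

With the values listed above, gemm_args_check('n', 't', 12, 4, 4, 25, 25, 12) returns 0, so the dimensions themselves are consistent. This check says nothing about whether the device pointers are valid or large enough.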

What is even stranger is that if, after getting this status, I copy the data back to the host and write everything to files, my product is correct.
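One way to confirm that a dumped product is correct is to compare it with a CPU reference. The following is a minimal sketch, assuming column-major storage and the usual GEMM definition C = alpha * op(A) * op(B) + beta * C; sgemm_ref and max_abs_diff are hypothetical names used for illustration and are not the code used in this application.

```c
#include <stddef.h>

/* Hypothetical CPU reference for a column-major single-precision GEMM.
 * 'n'/'N' means no transpose; any other value is treated as transpose. */
void sgemm_ref(char transa, char transb, int m, int n, int k,
               float alpha, const float *a, int lda,
               const float *b, int ldb,
               float beta, float *c, int ldc)
{
    int nota = (transa == 'n' || transa == 'N');
    int notb = (transb == 'n' || transb == 'N');

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int l = 0; l < k; ++l) {
                /* Element (i,l) of op(A) and element (l,j) of op(B). */
                float av = nota ? a[i + (size_t)l * lda]
                                : a[l + (size_t)i * lda];
                float bv = notb ? b[l + (size_t)j * ldb]
                                : b[j + (size_t)l * ldb];
                acc += av * bv;
            }
            float *cij = &c[i + (size_t)j * ldc];
            /* With beta == 0 the previous content of C is ignored. */
            *cij = (beta == 0.0f) ? alpha * acc
                                  : alpha * acc + beta * (*cij);
        }
    }
}

/* Largest absolute difference between two m x n column-major matrices
 * that share the same leading dimension ld. */
float max_abs_diff(const float *x, const float *y, int m, int n, int ld)
{
    float worst = 0.0f;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float d = x[i + (size_t)j * ld] - y[i + (size_t)j * ld];
            if (d < 0.0f) d = -d;
            if (d > worst) worst = d;
        }
    }
    return worst;
}
```

Calling sgemm_ref with the same transa, transb, sizes and leading dimensions on the host copies of A and B, then max_abs_diff against the C copied back from the GPU, gives a single number to compare against a tolerance chosen for single precision.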

I ran my whole application while ignoring CUBLAS_STATUS_EXECUTION_FAILED, and my system is solved correctly…

So everything seems correct except for this CUBLAS_STATUS_EXECUTION_FAILED status. I would like to know why I get this error and how I can fix it.

I call cublasSgemm through this code:

#define CUBLAS(func) cublasS ## func

#define CUBLAS_GEMM(i,j,m,n,k,x,a,u,b,v,y,c,w)                          \
  {                                                                     \
    BLAS_INT varim = (BLAS_INT)(m);                                     \
    BLAS_INT varin = (BLAS_INT)(n);                                     \
    BLAS_INT varik = (BLAS_INT)(k);                                     \
    BLAS_INT variu = (BLAS_INT)(u);                                     \
    BLAS_INT variv = (BLAS_INT)(v);                                     \
    BLAS_INT variw = (BLAS_INT)(w);                                     \
    FLOAT    varix = (FLOAT)(x);                                        \
    FLOAT    variy = (FLOAT)(y);                                        \
    CUBLAS(gemm)(*(i), *(j), varim, varin, varik, varix, (a),           \
                 variu, (b), variv, variy, (c), variw);                 \
    CUBLAS_CHECK_GEMM(*(i),*(j),a,b,c);                                 \
    cudaStreamSynchronize(starpu_cuda_get_local_stream());              \
  }


The check only reads the status; if there is an error, it prints the status and the parameters, copies the matrices back from the GPU, and writes them to a file.



Edit: my configuration:

DELL Precision T7400

Linux Debian 6.0.3 - 2.6.32-5-amd64

icc (ICC) 12.0.3 20110309

CUDA 4.0.17

GeForce GTX 295

2 quad-core Intel® Xeon® CPU E5410 @ 2.33GHz

32 GB RAM