GEMM returns CUBLAS_STATUS_EXECUTION_FAILED but the data is correct

Hello,

I’m trying to use cuBLAS in a sparse linear solver built on StarPU.

My program executes several GEMM / AXPY operations on the GPU using cuBLAS.

My problem is that I sometimes get a CUBLAS_STATUS_EXECUTION_FAILED status after running cublasSgemm.

I looked at my parameters and they look OK:

transa = ‘n’, transb = ‘t’,

M = 12, N = 4, K = 4,

alpha = 1.0, A = 2f109e34, lda = 25,

B = 2f10a034, ldb = 25,

beta = 0.0, C = 20fc0000, ldc = 12
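
For reference, here is a minimal sketch of how the size and leading-dimension arguments listed above could be checked on the host before the call. It assumes the standard column-major BLAS SGEMM constraints (lda >= number of rows of A as stored, and likewise for ldb and ldc). check_gemm_args is a hypothetical helper written for this post, not a cuBLAS function, and it says nothing about whether the device pointers themselves are valid.

```c
#include <stdio.h>

/* Hypothetical helper (not part of cuBLAS): returns 0 when the arguments
 * satisfy the standard column-major BLAS SGEMM constraints, otherwise the
 * 1-based position of the first offending argument in the SGEMM call. */
static int is_trans_flag(char t)
{
    return t == 'n' || t == 'N' || t == 't' || t == 'T' ||
           t == 'c' || t == 'C';
}

static int max_int(int a, int b)
{
    return a > b ? a : b;
}

int check_gemm_args(char transa, char transb,
                    int m, int n, int k,
                    int lda, int ldb, int ldc)
{
    int a_notrans = (transa == 'n' || transa == 'N');
    int b_notrans = (transb == 'n' || transb == 'N');
    /* Rows of A and B as they are stored in memory. */
    int rows_a = a_notrans ? m : k;
    int rows_b = b_notrans ? k : n;

    if (!is_trans_flag(transa)) return 1;
    if (!is_trans_flag(transb)) return 2;
    if (m < 0) return 3;
    if (n < 0) return 4;
    if (k < 0) return 5;
    if (lda < max_int(1, rows_a)) return 8;
    if (ldb < max_int(1, rows_b)) return 10;
    if (ldc < max_int(1, m)) return 13;
    return 0;
}
```

Calling check_gemm_args('n', 't', 12, 4, 4, 25, 25, 12) returns 0, so the values above satisfy these constraints. With ldc = 11 instead, it would return 13 (the position of ldc).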

What is even stranger is that if, after getting this status, I copy the data back to the host and print everything to files, the product is correct.

I ran my whole application while ignoring CUBLAS_STATUS_EXECUTION_FAILED, and my system was solved correctly…

So everything seems correct except for this CUBLAS_STATUS_EXECUTION_FAILED status. I would like to know why I get this error and how I can fix it.

I call cublasSgemm through this code:

#define CUBLAS(func) cublasS ## func

#define CUBLAS_GEMM(i,j,m,n,k,x,a,u,b,v,y,c,w)                          \
  {                                                                     \
    BLAS_INT varim = (BLAS_INT)(m);                                     \
    BLAS_INT varin = (BLAS_INT)(n);                                     \
    BLAS_INT varik = (BLAS_INT)(k);                                     \
    BLAS_INT variu = (BLAS_INT)(u);                                     \
    BLAS_INT variv = (BLAS_INT)(v);                                     \
    BLAS_INT variw = (BLAS_INT)(w);                                     \
    FLOAT    varix = (FLOAT)(x);                                        \
    FLOAT    variy = (FLOAT)(y);                                        \
    CUBLAS(gemm)(*(i), *(j), varim, varin, varik, varix, (a),           \
                 variu, (b), variv, variy, (c), variw);                 \
    CUBLAS_CHECK_GEMM(*(i),*(j),a,b,c);                                 \
    cudaStreamSynchronize(starpu_cuda_get_local_stream());              \
  }

The check only reads the status; if there is an error, it prints the status and the parameters, copies the matrices back from the GPU, and writes them to a file.

Thanks,

XL

Edit: my configuration:

DELL Precision T7400

Linux Debian 6.0.3 - 2.6.32-5-amd64

icc (ICC) 12.0.3 20110309

CUDA 4.0.17

GeForce GTX 295

2 quad-core Intel® Xeon® CPU E5410 @ 2.33GHz

32 GB RAM