I seem to get this error when I call cublasDscal on a small vector (fewer than 200 elements).

I work around it by allocating more memory than necessary, but I would like to avoid this if possible.

Has anyone else noticed this issue and is there a solution?

And yes, my drivers are all up to date.


Actually, it seems to happen with a lot of the scalar multiply and copy functions. cutilCheckMsg() just gives an “unknown error.”
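
A sketch of a more descriptive check than cutilCheckMsg(), assuming a toolkit recent enough to provide cudaGetErrorName() (CUDA 7.5 or later). The helper name checkCuda is hypothetical. It separates launch errors, which cudaGetLastError() reports immediately, from errors raised while a kernel runs, which only surface at the next synchronizing call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA toolkit): report the most recent
// runtime error with its name and description. Returns true if none is pending.
static bool checkCuda(const char* where)
{
    // Errors from earlier runtime calls or from the launch itself
    // (bad configuration, etc.); this call also resets non-sticky errors.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // Errors raised while a kernel executes only appear after a sync.
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s (%s)\n", where,
                cudaGetErrorName(err), cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

Calling it right after each kernel launch and each CUBLAS call narrows down which call actually fails, instead of attributing the error to whichever call happens to check next.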


Is the GPU you are using double-precision capable?
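
One way to answer that question programmatically: a sketch that queries the device with cudaGetDeviceProperties() and tests for compute capability 1.3, the first generation with native double-precision support. The helper name supportsDouble is hypothetical. On toolkits of that era, code compiled for an earlier target (for example without -arch=sm_13) also had doubles demoted to float, so the build flags matter as much as the hardware.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if device `dev` reports compute capability
// 1.3 or higher, the first generation with native double precision.
static bool supportsDouble(int dev)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
        return false;  // no such device, or the runtime could not query it
    }
    printf("Device %d: %s, compute capability %d.%d\n",
           dev, prop.name, prop.major, prop.minor);
    return prop.major > 1 || (prop.major == 1 && prop.minor >= 3);
}
```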

Your report is too generic; unless you post a small repro case, you will not get a meaningful response.
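
A small repro of the kind requested might look like the following sketch. It assumes the cuBLAS v2 API (cublas_v2.h, cublasCreate, handle-based cublasDscal) rather than the legacy cublasInit-style interface the original code probably used, checks every status code, and allocates exactly as many elements as the vector holds, with no padding. The function name scaleOnDevice is hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical repro helper: scale `host` by alpha on the device with
// cublasDscal, checking each status. Returns true on success and writes
// the scaled values back into `host`.
static bool scaleOnDevice(std::vector<double>& host, double alpha)
{
    const int n = static_cast<int>(host.size());
    double* d_x = nullptr;
    cublasHandle_t handle = nullptr;
    bool ok = false;

    // Exactly n elements: no over-allocation.
    if (cudaMalloc(&d_x, n * sizeof(double)) != cudaSuccess) return false;
    if (cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS &&
        cudaMemcpy(d_x, host.data(), n * sizeof(double),
                   cudaMemcpyHostToDevice) == cudaSuccess) {
        // Default pointer mode is host, so alpha may live on the host.
        cublasStatus_t st = cublasDscal(handle, n, &alpha, d_x, 1);
        if (st != CUBLAS_STATUS_SUCCESS) {
            fprintf(stderr, "cublasDscal failed with status %d\n", (int)st);
        } else if (cudaMemcpy(host.data(), d_x, n * sizeof(double),
                              cudaMemcpyDeviceToHost) == cudaSuccess) {
            ok = true;
        }
    }
    if (handle) cublasDestroy(handle);
    cudaFree(d_x);
    return ok;
}
```

Calling it with, say, a 100-element vector exercises the size range reported above; link with -lcublas.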


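[codebox]
// Kernel defined below; declared here so the wrapper compiles.
__global__ void actFuncDouble(double* d_data, unsigned int len);

extern "C" void
actDouble(double* f, unsigned int len)
{
    unsigned int num_threads = 256;
    unsigned int blocks = (len / num_threads) + 1;
    dim3 grid(blocks, 1);
    dim3 threads(num_threads, 1);
    actFuncDouble<<< grid, threads >>>(f, len);
}
[/codebox]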
[codebox]
__global__ void
actFuncDouble(double* d_data, unsigned int len)
{
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < len) {
        double d = d_data[tid];
        // note: expf() is the single-precision exponential
        d_data[tid] = 1 / (1 + expf(-d));
    }
}
[/codebox]

I have this simple function mixed in with numerous CUBLAS calls. Is there a “warm-up” that needs to happen the first time I call a runtime API function (assuming CUBLAS initialization was called and succeeded)?
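
On the warm-up question: the CUDA runtime creates its context lazily, on the first runtime call, so that call absorbs the initialization cost and is also where initialization failures first appear. A common idiom to trigger this explicitly is cudaFree(0). The sketch below (the helper name warmUpCuda is hypothetical) does that up front so an initialization failure is reported at a known point rather than by a later kernel or CUBLAS call; it is a diagnostic aid, not a fix for an incorrect call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: select the device and force context creation now,
// so any initialization error is reported here.
static bool warmUpCuda(int dev)
{
    cudaError_t err = cudaSetDevice(dev);
    if (err == cudaSuccess) {
        // Freeing a null pointer does nothing, but it creates the context
        // if that has not happened yet.
        err = cudaFree(0);
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA init failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
```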

I thought you were complaining that CUBLAS calls were failing, but I don’t see any CUBLAS calls in your sample code at all.

As an aside, I presume you are aware that your kernel as written, despite taking double precision arguments, is doing the computations using a mixture of integers cast to single precision and a single precision transcendental function, and simply casting the result back to double afterwards.
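
For comparison, a sketch of the same activation computed entirely in double precision: the literals are written as 1.0 and exp() resolves to the double overload in device code. The names actFuncDoubleDP and actDoubleDP are hypothetical, and the wrapper keeps the original launch shape (256 threads per block, len/256 + 1 blocks) but returns the launch status so the caller can check it.

```cuda
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical double-precision variant of actFuncDouble: no intermediate
// is rounded to float.
__global__ void actFuncDoubleDP(double* d_data, unsigned int len)
{
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < len) {
        const double d = d_data[tid];
        d_data[tid] = 1.0 / (1.0 + exp(-d));
    }
}

// Hypothetical host wrapper with the same launch shape as actDouble.
extern "C" cudaError_t actDoubleDP(double* f, unsigned int len)
{
    const unsigned int num_threads = 256;
    const unsigned int blocks = (len / num_threads) + 1;
    actFuncDoubleDP<<<blocks, num_threads>>>(f, len);
    return cudaGetLastError();  // reports launch-configuration errors
}
```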

Yes, I am aware of the expf call. I was just trying everything I could think of to isolate the errors.

I presume you are aware your response was worthless.

Whatever those might be…

You’re welcome. Best of luck.