cublasGetVector() returns code 11


I have a problem with a cublasGetVector() call and I was wondering if anyone knows why this may be happening.
I have a small kernel doing linear interpolation. (I can't use textures because the data points are not evenly spaced, and binning them would introduce too much error.)

// Call Kernel
dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
interp2_bruteForce_ker<<<grid,threads>>>(d_w_buff,d_image_buff,d_new_w_buff,d_result_buffer, d_location_buff,COLUMNS);

status = cublasGetVector(lSize, sizeof(h_result_buffer[0]), d_result_buffer, 1, h_result_buffer, 1);

__global__ void interp2_bruteForce_ker(float *x, float *y, float *xi, float *yi, int *position, int width) {

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int idy = blockIdx.y * blockDim.y + threadIdx.y;
    int current_pos = idy * width + idx;
    int interp_row = position[idy];
    int interpo_pos = interp_row * width + idx;
    yi[current_pos] = ( (y[interpo_pos + width] - y[interpo_pos]) / (x[interp_row + 1] - x[interp_row]) * (xi[idy] - x[interp_row]) ) + y[interpo_pos];
}


For some reason I do not understand I am getting error 11 (CUBLAS_STATUS_MAPPING_ERROR) after the cublasGetVector() call.
The thread “” has some information, but I do not know whether it is relevant: when I put a timer around the kernel, it takes 6.5 ms. (I have other calls that take more than 20 ms to execute, and they do not cause this error.) I am using a 1024x1024 array for the data.

I am using Windows Vista 64 and running two video cards (a GTX275 for CUDA and a Quadro 570 for the display) with the same driver installed. (Vista would not let me install more than one.)
Any help/comment/idea is greatly appreciated!
Thanks in advance!

There was an internal error in my kernel caused by bad addressing (my position array had values beyond the limits of the other arrays). When I put cutilCheckMsg() after the kernel call, it caught the error. For some reason, when the kernel failed, it also caused the getvector() call to fail.
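
A minimal host-side sketch of a check that could flag this kind of problem before the launch. It assumes (the post does not say so) that x holds src_rows samples and y holds src_rows * width values. The kernel reads x[interp_row + 1] and y[interpo_pos + width], so every entry of the position array needs both row and row + 1 to be valid source rows. The names first_bad_position and src_rows are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the first entry of `position` that would make the
// kernel read outside x or y, or -1 if every entry is safe.
// Assumption (not from the original post): x has src_rows samples and
// y has src_rows * width values.
int first_bad_position(const std::vector<int>& position, int src_rows) {
    for (std::size_t i = 0; i < position.size(); ++i) {
        const int row = position[i];
        // The kernel touches x[row + 1] and y[(row + 1) * width + idx],
        // so row must be >= 0 and row + 1 must still be a valid row.
        if (row < 0 || row + 1 >= src_rows) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Running this on the host copy of the position array before it is uploaded would point at the offending entry whenever the result is not -1. It complements, rather than replaces, checking for errors right after the kernel call.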