cudaError_enum error with Quadro NVS 140M

Hi all,

  I am new to CUDA and have been experimenting with the matrix addition program. I have a ThinkPad T61, which comes with the Quadro NVS 140M. I only allocate a small amount of memory, which the NVS should handle easily since it has 128 MB. However, I still get the cudaError_enum error no matter what size I make the matrix.

    const int N = 8;
    const int blocksize = 2;
    const int size = N*N*sizeof(float);

    CUDA_SAFE_CALL ( cudaMalloc( (void**)&ad, size));
    CUDA_SAFE_CALL ( cudaMalloc( (void**)&bd, size));
    CUDA_SAFE_CALL ( cudaMalloc( (void**)&cd, size));

    CUDA_SAFE_CALL ( cudaMemcpy(ad, a, size, cudaMemcpyHostToDevice));
    CUDA_SAFE_CALL ( cudaMemcpy(bd, b, size, cudaMemcpyHostToDevice));

    dim3 dimBlock( blocksize, blocksize);
    dim3 dimGrid (N/dimBlock.x, N/dimBlock.y);

    add_matrix<<<dimGrid, dimBlock>>>(ad, bd, cd, N);

    CUDA_SAFE_CALL ( cudaMemcpy(c, cd, size, cudaMemcpyDeviceToHost));

    CUDA_SAFE_CALL ( cudaFree(ad));
    CUDA_SAFE_CALL ( cudaFree(bd));
    CUDA_SAFE_CALL ( cudaFree(cd));

I get the error on every line above where memory is allocated. I am also pasting my deviceQuery output below, just for reference.

There is 1 device supporting CUDA

Device 0: "Quadro NVS 140M"
  Major revision number: 1
  Minor revision number: 1
  Total amount of global memory: 133890048 bytes
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 8192
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 800000 kilohertz

Press ENTER to exit...

I would really appreciate it if somebody could tell me where I am going wrong. I am using the latest version of the graphics driver, the one that comes with CUDA 2.0, but I am using CUDA 1.1 for this.


I just noticed that even the example programs throw the same cudaError_enum when I run them.
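
To take cutil and my own kernel out of the picture, a standalone check along the following lines could be used to see what the runtime actually reports. This is only a sketch: report is a made-up helper name, and the allocation size simply mirrors the 8x8 float matrix from the snippet above. It relies only on cudaGetDeviceCount, cudaMalloc, cudaFree and cudaGetErrorString from the runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the SDK): print the runtime's own
// description of a status code and return true if the call succeeded.
static bool report(const char* what, cudaError_t err)
{
    std::printf("%s: %s\n", what, cudaGetErrorString(err));
    return err == cudaSuccess;
}

int main()
{
    // First runtime call; a failure here is reported before any allocation.
    int count = 0;
    if (!report("cudaGetDeviceCount", cudaGetDeviceCount(&count)))
        return 1;
    std::printf("devices found: %d\n", count);

    // Same allocation size as in the snippet above: 8 * 8 floats.
    const int N = 8;
    const size_t size = N * N * sizeof(float);
    float* ad = 0;
    if (!report("cudaMalloc", cudaMalloc((void**)&ad, size)))
        return 1;

    return report("cudaFree", cudaFree(ad)) ? 0 : 1;
}
```

Built with nvcc, it prints one line per call with the runtime's description of the status, which should say more than the bare cudaError_enum.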