Problem with cudaMemcpy in CUDA 1.1

Hi all,

I have a DELL Latitude D630 with a Quadro NVS 135M. I waited a long time for DELL to post “DELL laptop compliant” NVIDIA drivers that support CUDA 1.1 (version >169…).

They are finally available, as version 174.31! I installed them, along with the new CUDA toolkit and SDK.

Unfortunately, the CUDA samples no longer work with that version. They freeze for 2-3 seconds on CUDA_DEVICE_INIT… and all the tests in those samples fail. The 3D samples don’t display anything.

I wrote a simple test that compares array values before and after a copy from host to device and back from device to host. This test fails every time.

float* h_idata1 = (float*) malloc(mem_size);
float* h_idata2 = (float*) malloc(mem_size);

float* d_idata;
CUDA_SAFE_CALL( cudaMalloc( (void**) &d_idata, mem_size));

CUDA_SAFE_CALL( cudaMemcpy( d_idata, h_idata1, mem_size,
                            cudaMemcpyHostToDevice) );
CUDA_SAFE_CALL( cudaMemcpy( h_idata2, d_idata, mem_size,
                            cudaMemcpyDeviceToHost) );

// Report any element that changed by more than 0.1 during the round trip.
for( unsigned int i = 0; i < (size_x * size_y); ++i) {
    if ((h_idata2[i] > h_idata1[i] + 0.1f) || (h_idata2[i] < h_idata1[i] - 0.1f)) {
        printf("memcpy test : FAILED\n");
    }
}
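
A self-contained version of this round-trip test could look like the sketch below. It is an illustration under stated assumptions, not SDK code: the 32x128 array size is taken from the figures given further down in this thread, the input is filled with known values before the copy, and `check` and `count_mismatches` are hypothetical helpers introduced here. It checks the return value of each runtime call directly, since the SDK's CUDA_SAFE_CALL macro may only check errors in debug builds, depending on the SDK version.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>

// Hypothetical helper: count elements that differ by more than tol.
static unsigned int count_mismatches(const float* a, const float* b,
                                     unsigned int n, float tol) {
    unsigned int bad = 0;
    for (unsigned int i = 0; i < n; ++i) {
        if (fabsf(a[i] - b[i]) > tol) ++bad;
    }
    return bad;
}

// Hypothetical helper: print the runtime's error string and exit on failure.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main(void) {
    const unsigned int size_x = 32, size_y = 128;  // assumed from the thread
    const unsigned int n = size_x * size_y;
    const size_t mem_size = n * sizeof(float);

    float* h_idata1 = (float*) malloc(mem_size);
    float* h_idata2 = (float*) malloc(mem_size);
    if (h_idata1 == NULL || h_idata2 == NULL) {
        fprintf(stderr, "host allocation failed\n");
        return EXIT_FAILURE;
    }

    // Known input values, and a sentinel in the output buffer so that a copy
    // that silently did nothing is detected as a mismatch.
    for (unsigned int i = 0; i < n; ++i) {
        h_idata1[i] = (float) i;
        h_idata2[i] = -1.0f;
    }

    float* d_idata = NULL;
    check(cudaMalloc((void**) &d_idata, mem_size), "cudaMalloc");
    check(cudaMemcpy(d_idata, h_idata1, mem_size, cudaMemcpyHostToDevice),
          "cudaMemcpy (host to device)");
    check(cudaMemcpy(h_idata2, d_idata, mem_size, cudaMemcpyDeviceToHost),
          "cudaMemcpy (device to host)");

    const unsigned int bad = count_mismatches(h_idata1, h_idata2, n, 0.1f);
    printf("memcpy test : %s (%u mismatches)\n",
           bad == 0 ? "PASSED" : "FAILED", bad);

    cudaFree(d_idata);
    free(h_idata1);
    free(h_idata2);
    return bad == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

If one of the runtime calls fails, the printed error string is usually more informative than the mismatch count alone, because it shows whether the allocation or one of the copies was rejected.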

Do you have any idea what causes this? Is something wrong with CUDA support in the Quadro NVS drivers?
And how can I check my CUDA installation (which DLLs are used…)?

You likely only have 128MB of video memory. That isn’t really sufficient to use CUDA. What is reported when you run the deviceQuery SDK sample?

This is the output of deviceQuery:

Major revision number: 1
Minor revision number: 1
Total amount of global memory: 133890048 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 800000 kilohertz

In my sample code, I copy a 32x128 float array. That’s only 32x128x4 = 16384 bytes.
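
To put that size in context, the sketch below computes the buffer size and compares it with the total global memory the runtime reports for device 0. `float_array_bytes` is a hypothetical helper introduced here, and using device 0 is an assumption. Note that `totalGlobalMem` is the total amount, not the amount still free once the display driver has taken its share.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: bytes needed for a size_x by size_y array of floats.
static size_t float_array_bytes(size_t size_x, size_t size_y) {
    return size_x * size_y * sizeof(float);
}

int main(void) {
    // 32 * 128 * 4 = 16384 bytes for the array described in the post.
    const size_t bytes = float_array_bytes(32, 128);
    printf("buffer size: %lu bytes\n", (unsigned long) bytes);

    // Assumption: the GPU of interest is device 0.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("total global memory: %lu bytes\n",
           (unsigned long) prop.totalGlobalMem);
    printf("buffer uses %.4f%% of total global memory\n",
           100.0 * (double) bytes / (double) prop.totalGlobalMem);
    return 0;
}
```

With the 133890048 bytes reported by deviceQuery above, the buffer is a tiny fraction of the total, so the size of the copy itself is unlikely to be the limiting factor.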

CUDA 1.0 works fine on my machine. Did memory consumption change in CUDA 1.1?

And one more question: how can I check that CUDA is correctly installed on my machine?
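
One possible sanity check, beyond running deviceQuery, is a tiny program that asks the runtime for the device list and then attempts a small allocation. The sketch below is only an assumption about what a reasonable check looks like: `report` is a hypothetical helper, and the 256-byte allocation size is arbitrary. It does not say which DLLs are loaded, only whether the runtime and driver can work together.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 on success, otherwise prints the runtime's
// error string and returns 0.
static int report(cudaError_t err, const char* what) {
    if (err == cudaSuccess) return 1;
    fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return 0;
}

int main(void) {
    int count = 0;
    if (!report(cudaGetDeviceCount(&count), "cudaGetDeviceCount")) return 1;
    printf("CUDA devices found: %d\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (!report(cudaGetDeviceProperties(&prop, dev),
                    "cudaGetDeviceProperties")) return 1;
        printf("  device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }

    // A small allocation forces the runtime to initialize the device, which
    // is where a driver/toolkit mismatch would typically show up.
    void* p = NULL;
    if (!report(cudaMalloc(&p, 256), "cudaMalloc")) return 1;
    if (!report(cudaFree(p), "cudaFree")) return 1;

    printf("runtime allocation: OK\n");
    return 0;
}
```

If this fails at the allocation step while deviceQuery still lists the GPU, that points toward the driver or runtime setup rather than the test code.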

But the Quadro NVS 135M is in the list of GPUs that support CUDA…

Where can I find information about CUDA memory limitations?