Quadro NVS 140M supported?

When I tried to run a few of the examples from the SDK on a Quadro NVS 140M (which, according to Wikipedia, is based on the 8400), some of them succeeded and some failed. Adding a few lines to check the results shows, e.g. for convolutionSeparable, that the GPU result is 0 for the entire array.

So:

  • Is Quadro NVS 140M supported?
  • Why do some of the programs succeed (convolutionFFT2D, dwtHaar1D, fluidsGL, imageDenoising, matrixMul, MersenneTwister, …), but others fail (convolutionSeparable, convolutionTexture, histogram64, …)?

The setup is a Lenovo ThinkPad T61 running 64-bit Red Hat Enterprise Linux 5 Desktop; the driver version string is “NVIDIA GLX Module 100.14.11 Wed Jun 13 17:16:40 PDT 2007”.

How are the examples failing?
What is the full output?
Do they fail for both gpu & emu builds?

It looks like this card has 128MB of video memory, so some examples are probably too big for it. Can you please try the deviceQuery sample? It will display the GPU details (speed & memory).

The emu build runs fine. The failure looks like this:

$ ./release/alignedTypes 

Allocating memory...

Generating host input data array...

Uploading input data to GPU memory...

Testing misaligned types...

RGBA8_misaligned...

Time: 3.095000 ms / Copy throughput: 60.182395 GB/s.

TEST FAILED

LA32_misaligned...

Time: 0.024000 ms / Copy throughput: 7761.021388 GB/s.

TEST FAILED

RGB32_misaligned...

Time: 0.015000 ms / Copy throughput: 12417.634109 GB/s.

TEST FAILED

RGBA32_misaligned...

Time: 0.015000 ms / Copy throughput: 12417.634606 GB/s.

TEST FAILED

Testing aligned types...

RGBA8...

Time: 0.014000 ms / Copy throughput: 13304.607798 GB/s.

TEST FAILED

I32...

Time: 0.014000 ms / Copy throughput: 13304.607798 GB/s.

TEST FAILED

LA32...

Time: 0.016000 ms / Copy throughput: 11641.531630 GB/s.

TEST FAILED

RGB32...

Time: 0.014000 ms / Copy throughput: 13304.607798 GB/s.

TEST FAILED

RGBA32...

Time: 0.021000 ms / Copy throughput: 8869.738925 GB/s.

TEST FAILED

RGBA32_2...

Time: 0.014000 ms / Copy throughput: 13304.607798 GB/s.

TEST FAILED

Shutting down...

Press ENTER to exit...

$ ./release/convolutionSeparable 

4096 x 4096

Initializing data...

Warm up...

GPU convolution...

GPU convolution time : 0.033000 msec //508400.487603 Mpixels/sec

Reading back GPU results...

Checking the results...

...running convolutionRowCPU()

...running convolutionColumnCPU()

...comparing the results

L1 norm: 1.000000E+00

TEST FAILED

Shutting down...

Press ENTER to exit...

Segmentation fault

$ ./release/convolutionTexture   

Initializing data...

convolutionRowGPU()

...convolutionRowGPU() time: 10.230000 msecs; //1640.001637 Mpix/s

Copying convolutionRowGPU() output back to a_Data...

...cudaMemcpyToArray() time: 0.019000 msecs; //883011.396814 Mpix/s

convolutionColumnGPU()...

...convolutionColumnGPU() time: 0.028000 msecs; //599186.267219 Mpix/s

Reading back GPU results...

Checking GPU results...

...convolutionRowCPU()

...convolutionColumnCPU()

...comparing the results

L1 norm: 1.000000E+00

TEST FAILED

Shutting down...

Press ENTER to exit...

Segmentation fault

I see, this is probably the cause. I will decrease the problem size and try again.

$ ./release/deviceQuery 

There is 1 device supporting CUDA

Device 0: "Quadro NVS 140M"

  Major revision number:                         1

  Minor revision number:                         1

  Total amount of global memory:                 133496832 bytes

  Total amount of constant memory:               65536 bytes

  Total amount of shared memory per block:       16384 bytes

  Total number of registers available per block: 8192

  Warp size:                                     32

  Maximum number of threads per block:           512

  Maximum sizes of each dimension of a block:    512 x 512 x 64

  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1

  Maximum memory pitch:                          262144 bytes

  Texture alignment:                             256 bytes

  Clock rate:                                    337500 kilohertz

Test PASSED

Press ENTER to exit...

Thanks. Yep, you’ve got 128MB.

Try reducing the size of DATA_W and DATA_H for the gpu from 4096 to perhaps 1024 or smaller in convolutionSeparable_kernel.cu

Sorry, I meant convolutionSeparable.cu (not _kernel).

Yes, it works! Thanks a lot!