Adding float arrays gives wrong results depending on the GPU used


I have just started to deal with CUDA and did my first tests with the
vectorAdd example from SDK 4.0.

When changing the size of the arrays (originally int N = 50000) on cheap
graphics cards such as the GeForce 9500 or GeForce GT 430 and 440, the
verification of the results (against CPU calculations) is O.K. even for
N = 5000000.

It is important, however, to use the most current NVIDIA graphics driver
(280.26); otherwise e.g. the GT 440 calculated weird results.

The results were O.K. using Windows XP/SP3 and Windows 7/64.

When using a GeForce GTX 560 Ti on a computer with Windows 7/64, I did not
manage to get correct results. The original vectorAdd.exe (64-bit) seems to
work (N = 50000), but all the versions compiled by me using Visual C++ 2008
and CUDA Toolkit 4.0 gave errors. A certain percentage of the calculations
seems to fail, and the accuracy in the failing summations is strange
(sometimes only the first 1-2 digits are correct).

Now I am confused: I thought that adding floating-point numbers on the GPU
(float, not double) is the simplest application and should work on all
CUDA-enabled graphics cards.

The results seem to depend on

  • the driver version
  • the graphics processor used

And in my case the “best” (most powerful) graphics card is the most
problematic one.
What can go wrong in compiling/executing vectorAdd from the NVIDIA GPU
Computing SDK 4.0 if I just change N from 50000 to e.g. 500000?


The C compiler in Visual C++ 2008 Express is 32-bit. The reason for the
problems seems to be using this 32-bit compiler with the 64-bit CUDA
Toolkit 4.0.

There are no errors or warning messages when compiling, and some programs
even run without a problem. On the other hand, this combination seems to
result in executables that can give unpredictable results or crashes.