Unspecified driver error in reduction sample: reduction.cpp(476) : cudaSafeCallNoSync() Runtime API error

Hi all,

I am trying to understand the reduction sample program, but when I run it I get an "unspecified driver error". I am on a 64-bit machine running 32-bit Fedora 10, with the devdriver_3.0_linux_32_195.36.15 driver installed for my NVIDIA GeForce 8600 GT.

What is the problem here?

[student@localhost ~]$ cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/

[student@localhost release]$ ./reduction 

./reduction Starting...

Using Device 0: GeForce 8600 GT

Reducing array of type int

16777216 elements

256 threads (max)

64 blocks

reduction.cpp(476) : cudaSafeCallNoSync() Runtime API error : unspecified driver error.

[student@localhost release]$ uname -a

Linux localhost.localdomain 2.6.27.5-117.fc10.i686.PAE #1 SMP Tue Nov 18 12:08:10 EST 2008 i686 i686 i386 GNU/Linux

[student@localhost release]$ lspci -nn | grep 'VGA\|NV'

01:00.0 VGA compatible controller [0300]: nVidia Corporation GeForce 8600 GT [10de:0402] (rev a1)

Later I also tried vectorAdd, and then built and ran deviceQuery:

[student@localhost release]$ ./vectorAdd 

Vector addition

vectorAdd.cu(71) : cudaSafeCall() Runtime API error : unspecified driver error.

[student@localhost release]$ cd ..

[student@localhost linux]$ cd ..

[student@localhost bin]$ cd ..

[student@localhost C]$ cd src/

[student@localhost src]$ cd deviceQuery

[student@localhost deviceQuery]$ ls

deviceQuery.cpp  Makefile

[student@localhost deviceQuery]$ make

deviceQuery.cpp:120:11: warning: extra tokens at end of #else directive

deviceQuery.cpp:129:11: warning: extra tokens at end of #else directive

deviceQuery.cpp: In function ‘int main(int, const char**)’:

deviceQuery.cpp:121: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘const char*’

deviceQuery.cpp:121: warning: too many arguments for format

[student@localhost deviceQuery]$ cd ..

[student@localhost src]$ cd ..

[student@localhost C]$ cd bin/linux/release/

[student@localhost release]$ ls

deviceQuery  reduction  reduction.txt  vectorAdd

[student@localhost release]$ ./deviceQuery 

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: "GeForce 8600 GT"

  CUDA Driver Version:						   3.0

  CUDA Runtime Version:						  3.0

  CUDA Capability Major revision number:		 1

  CUDA Capability Minor revision number:		 1

  Total amount of global memory:				 536150016 bytes

  Number of multiprocessors:					 4

  Number of cores:							   32

  Total amount of constant memory:			   65536 bytes

  Total amount of shared memory per block:	   16384 bytes

  Total number of registers available per block: 8192

  Warp size:									 32

  Maximum number of threads per block:		   512

  Maximum sizes of each dimension of a block:	512 x 512 x 64

  Maximum sizes of each dimension of a grid:	 65535 x 65535 x 1

  Maximum memory pitch:						  262144 bytes

  Texture alignment:							 256 bytes

  Clock rate:									1.19 GHz

  Concurrent copy and execution:				 Yes

  Run time limit on kernels:					 Yes

  Integrated:									No

  Support host page-locked memory mapping:	   No

  Compute mode:								  Default (multiple host threads can use this device simultaneously)

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 134564155, CUDA Runtime Version = 3.0, NumDevs = 1, Device = GeForce 8600 GT

PASSED

Press <Enter> to Quit...

-----------------------------------------------------------

[student@localhost release]$ cd ..

[student@localhost linux]$ cd ..

[student@localhost bin]$ cd ..

[student@localhost C]$ cd src/

alignedTypes/			 convolutionSeparable/	 FDTD3d/				   MersenneTwister/		  recursiveGaussian/		simpleTemplates/		  template/

asyncAPI/				 convolutionTexture/	   fluidsGL/				 MonteCarlo/			   reduction/				simpleTexture/			threadFenceReduction/

bandwidthTest/			cppIntegration/		   histogram/				MonteCarloMultiGPU/	   scalarProd/			   simpleTexture3D/		  threadMigration/

bicubicTexture/		   dct8x8/				   imageDenoising/		   nbody/					simpleAtomicIntrinsics/   simpleTextureDrv/		 transpose/

binomialOptions/		  deviceQuery/			  lineOfSight/			  oceanFFT/				 simpleCUBLAS/			 simpleVoteIntrinsics/	 transposeNew/

BlackScholes/			 deviceQueryDrv/		   Mandelbrot/			   particles/				simpleCUFFT/			  simpleZeroCopy/		   vectorAdd/

boxFilter/				dwtHaar1D/				marchingCubes/			postProcessGL/			simpleGL/				 smokeParticles/		   vectorAddDrv/

clock/					dxtc/					 matrixMul/				ptxjit/				   simpleMultiGPU/		   SobelFilter/			  volumeRender/

concurrentKernels/		eigenvalues/			  matrixMulDrv/			 quasirandomGenerator/	 simplePitchLinearTexture/ SobolQRNG/				

convolutionFFT2D/		 fastWalshTransform/	   matrixMulDynlinkJIT/	  radixSort/				simpleStreams/			sortingNetworks/		  

[student@localhost C]$ cd src/d

dct8x8/		 deviceQuery/	deviceQueryDrv/ dwtHaar1D/	  dxtc/		   

[student@localhost C]$ cd src/deviceQueryDrv/

[student@localhost deviceQueryDrv]$ ls

deviceQueryDrv.cpp  Makefile

[student@localhost deviceQueryDrv]$ make

deviceQueryDrv.cpp: In function ‘int main(int, char**)’:

deviceQueryDrv.cpp:44: warning: unused variable ‘err’

[student@localhost deviceQueryDrv]$ cd ..

[student@localhost src]$ cd ..

[student@localhost C]$ cd bin/linux/release/

[student@localhost release]$ ls

deviceQuery  deviceQueryDrv  deviceQuery.txt  reduction  reduction.txt  SdkMasterLog.csv  vectorAdd

[student@localhost release]$ ./deviceQueryDrv 

CUDA Device Query (Driver API) statically linked version 

There is 1 device supporting CUDA

Device 0: "GeForce 8600 GT"

  CUDA Driver Version:						   3.0

  CUDA Capability Major revision number:		 1

  CUDA Capability Minor revision number:		 1

  Total amount of global memory:				 536150016 bytes

  Number of multiprocessors:					 4

  Number of cores:							   32

  Total amount of constant memory:			   65536 bytes

  Total amount of shared memory per block:	   16384 bytes

  Total number of registers available per block: 8192

  Warp size:									 32

  Maximum number of threads per block:		   512

  Maximum sizes of each dimension of a block:	512 x 512 x 64

  Maximum sizes of each dimension of a grid:	 65535 x 65535 x 1

  Maximum memory pitch:						  262144 bytes

  Texture alignment:							 256 bytes

  Clock rate:									1.19 GHz

  Concurrent copy and execution:				 Yes

  Run time limit on kernels:					 Yes

  Integrated:									No

  Support host page-locked memory mapping:	   No

  Compute mode:								  Default (multiple host threads can use this device simultaneously)

PASSED

Press ENTER to exit...

The deviceQuery and deviceQueryDrv programs both run fine.

If I understand the output correctly, both the runtime API and the driver API seem to be working. So what's the problem?

Thanks and regards,

kg