My GPU Became Slower... after 1 month of not testing CUDA

This last month I have been doing other things, not CUDA programming.

In all this time there may have been a kernel update on Ubuntu,
I played StarCraft 2,
and I worked on MPI programming.

But today, when I went to test my CUDA program, I got the surprise that the computation takes double the time compared to before!!
In case it was the driver, I downloaded 256.40, but the performance was the same.
By the way, my GPU is an EVGA 9800 GTX+ 512MB.
How can this happen? It got slower and I don't know why…

EDIT: I checked CPU performance with some test routines and it is the same as before, so it is definitely the GPU itself on CUDA.

Are your CUDA kernels very short? It is possible that some 3D app is running now (fancy desktop stuff?) in the background and competing with CUDA for GPU time. The overhead of context switching between two GPU users can drop the performance of CUDA dramatically.

The kernels are not too short.

I also disabled the fancy desktop effects on Ubuntu.

My CUDA program interoperates with OpenGL, modifying vertex buffer objects and deforming 3D meshes.
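
For context, here is a minimal sketch of the usual CUDA/OpenGL VBO interop pattern being described; the kernel and the deformation are placeholders, not the actual project code, and it assumes a GL context is current and that vbo already holds float4 vertices:

#include <GL/gl.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Placeholder deformation kernel, not the real one.
__global__ void deform(float4* verts, unsigned int n, float t)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        verts[i].y += 0.01f * sinf(t + verts[i].x);
}

void deformStep(GLuint vbo, unsigned int numVerts, float t)
{
    // Normally the buffer is registered once at startup, not every frame.
    cudaGraphicsResource* res = 0;
    cudaGraphicsGLRegisterBuffer(&res, vbo, cudaGraphicsMapFlagsNone);

    // Map the VBO so CUDA can write to it, run the kernel, then unmap so
    // OpenGL can render from the modified buffer.
    cudaGraphicsMapResources(1, &res, 0);
    float4* dptr = 0;
    size_t numBytes = 0;
    cudaGraphicsResourceGetMappedPointer((void**)&dptr, &numBytes, res);

    deform<<<(numVerts + 255) / 256, 256>>>(dptr, numVerts, t);

    cudaGraphicsUnmapResources(1, &res, 0);
    cudaGraphicsUnregisterResource(res);
}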

I will keep checking; any discovery will be reported.

Thanks, and any help or previous experience is useful to me.

Could it be possible that a kernel update on Ubuntu 10.04 made the NVIDIA drivers work slower?

Did you update from CUDA 3.0 to 3.1? The register allocator in 3.1 goes overboard with extra registers in some kernels.

Do you mean to say that even if the app was not re-compiled, it would still consume more registers than expected?

That has not been my experience - the additional registers come from the use of a function call ABI in 3.1 that doesn’t exist in 3.0.
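
If it helps, a simple way to check whether the register allocator changed between toolkits is to compile the same kernel with both 3.0 and 3.1 and compare the register counts that ptxas reports in verbose mode; the kernel below is only a placeholder:

// kernel.cu - any kernel works; this one is just a placeholder.
__global__ void axpy(float a, const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Build with each toolkit and compare the ptxas report, e.g.:
//   nvcc -arch=sm_11 -Xptxas -v -c kernel.cu
// ptxas then prints the per-kernel resource usage, including
// "Used N registers", for every __global__ function.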

Would you be able to share any (preferably simple :-) kernels that demonstrate a significant increase in register pressure when moving from CUDA 3.0 to CUDA 3.1? I’d be interested in taking a closer look at such code. Thanks.

I was checking everything I have; when I opened this topic I had CUDA 3.0 and the drivers for 3.0 too.

I had some GPU algorithms that took 56 ms to process a 3D mesh (mainly processing the element array buffer and reading the VBO).

My problem is that it is now taking longer: the same mesh now takes 103 ms to do the same thing.
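
For reference, a typical way to time GPU work like this is with CUDA events; this is only a sketch with a placeholder kernel, not necessarily how the 56 ms / 103 ms figures were measured:

#include <cstdio>
#include <cuda_runtime.h>

// Placeholder for the real mesh-processing kernel.
__global__ void processMesh(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20;
    float* d = 0;
    cudaMalloc(&d, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    processMesh<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed time in milliseconds
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}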

Instead of waiting, I decided to install 3.1 and the latest CUDA Linux drivers; the problem is still here.

neoideo@neoideo:~$ NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery

NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: "GeForce 9800 GTX+"

  CUDA Driver Version:						   3.10

  CUDA Runtime Version:						  3.10

  CUDA Capability Major revision number:		 1

  CUDA Capability Minor revision number:		 1

  Total amount of global memory:				 536543232 bytes

  Number of multiprocessors:					 16

  Number of cores:							   128

  Total amount of constant memory:			   65536 bytes

  Total amount of shared memory per block:	   16384 bytes

  Total number of registers available per block: 8192

  Warp size:									 32

  Maximum number of threads per block:		   512

  Maximum sizes of each dimension of a block:	512 x 512 x 64

  Maximum sizes of each dimension of a grid:	 65535 x 65535 x 1

  Maximum memory pitch:						  2147483647 bytes

  Texture alignment:							 256 bytes

  Clock rate:									1.84 GHz

  Concurrent copy and execution:				 Yes

  Run time limit on kernels:					 Yes

  Integrated:									No

  Support host page-locked memory mapping:	   No

  Compute mode:								  Default (multiple host threads can use this device simultaneously)

  Concurrent kernel execution:				   No

  Device has ECC support enabled:				No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.10, CUDA Runtime Version = 3.10, NumDevs = 1, Device = GeForce 9800 GTX+

PASSED

Press <Enter> to Quit...

-----------------------------------------------------------

I will now check the code of the kernel to see if there is something weird or not.

Best regards, and thanks for all the suggestions.

(Switching from 3.0 to 3.1 didn't make it any worse, so at least the register issue is not having a big impact on my kernels.)

Cristobal.

solved!

The problem was → my GPU clock was being locked at 300 MHz while sitting at the desktop.

So I downloaded nvclock for Ubuntu and enabled GPU overclocking.

Now the GPU automatically clocks up when it needs to.

best regards

Cristobal

nvclock… hmmm… interesting… Is it only for Linux? Does Windows have something like that too? Would appreciate any info on nvclock… Thanks!

I know Windows has other alternatives that have been around longer, like RivaTuner and manufacturers' overclocking tools such as the EVGA Precision tool. Zotac also released one recently.

I think the new drivers have a power-save technology that lowers the GPU clock significantly when the card is just driving a 2D desktop, and it should raise the frequency to its maximum when the GPU is under heavy load. However, this was not happening on my Ubuntu Linux with my 257 drivers.
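
If the clocks only ramp up once the GPU is under load, it can also help to do a few untimed warm-up launches before taking measurements, roughly like this (placeholder kernel, just a sketch):

#include <cuda_runtime.h>

// Placeholder workload, not the real kernel.
__global__ void busyWork(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 1.0001f + 0.5f;
}

void warmUp(float* d, int n)
{
    // A few untimed launches give a power-managed card a chance to
    // raise its clocks before the timed runs start.
    for (int i = 0; i < 10; ++i)
        busyWork<<<(n + 255) / 256, 256>>>(d, n);
    cudaThreadSynchronize();   // cudaDeviceSynchronize() on newer toolkits
}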

You could try the tools I mentioned.

best regards,

Thanks!
