This last month I have been doing other things, not CUDA programming.
In all this time I may have picked up a kernel update on Ubuntu,
I played StarCraft 2,
and worked on MPI programming.
But today, when I wanted to test my CUDA program, I got the surprise that the computation takes double the time it did before!!
In case it was the driver, I downloaded 256.40, but the performance was the same.
BTW, my GPU is an EVGA 9800GTX+ 512MB.
How can this happen? It got slower and I don't know why…
EDIT: I checked CPU performance and it is the same as before (I have some test routines). So it is definitely the GPU side of CUDA.
Are your CUDA kernels very short? It is possible that some 3D app is running now (fancy desktop stuff?) in the background and competing with CUDA for GPU time. The overhead of context switching between two GPU users can drop the performance of CUDA dramatically.
Would you be able to share any (preferably simple :-) kernels that demonstrate a significant increase in register pressure when moving from CUDA 3.0 to CUDA 3.1? I’d be interested in taking a closer look at such code. Thanks.
I have been checking everything I have. When I opened this topic I had CUDA 3.0 and the 3.0 drivers too.
I had some GPU algorithms that took 56 ms to process a 3D mesh (mainly processing the element buffer array and reading a VBO).
My problem is that it is now taking longer: that same mesh now takes 103 ms to do the same thing.
Instead of waiting, I decided to install 3.1 and the latest CUDA Linux drivers, but the problem is still here.
neoideo@neoideo:~$ NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery
NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "GeForce 9800 GTX+"
CUDA Driver Version: 3.10
CUDA Runtime Version: 3.10
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 536543232 bytes
Number of multiprocessors: 16
Number of cores: 128
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.84 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.10, CUDA Runtime Version = 3.10, NumDevs = 1, Device = GeForce 9800 GTX+
PASSED
Press <Enter> to Quit...
-----------------------------------------------------------
I will now check the code of the kernel to see whether there is something weird or not.
Best regards, and thanks for all the suggestions.
(Switching from 3.0 to 3.1 didn't make it any worse, so at least the register-pressure issue is not having a big impact on my kernels.)
I know Windows has other alternatives that have been around longer, like RivaTuner and manufacturers' overclocking tools such as the EVGA Precision tool. Zotac also released one recently.
I think the new drivers have a power-saving technology that lowers the GPU clock significantly when the card is only driving a 2D desktop, and it should raise the frequency to its maximum when the GPU is under heavy load. However, this was not happening on my Ubuntu Linux box with my 257 drivers.