This last month I have been doing other things, not CUDA programming.
In all this time I may have picked up a kernel update on Ubuntu,
I played StarCraft 2,
and worked on MPI programming.
But today, when I wanted to test my CUDA program, I got the surprise that the computation takes double the time it did before!!
In case it was the driver, I downloaded 256.40, but the performance was the same.
BTW, my GPU is an EVGA 9800GTX+ 512MB.
How can this happen? It got slower and I don't know why…
EDIT: I checked CPU performance and it is the same as before (I have some test routines). So it is definitely the GPU side of CUDA.
Are your CUDA kernels very short? It is possible that some 3D app is running now (fancy desktop stuff?) in the background and competing with CUDA for GPU time. The overhead of context switching between two GPU users can drop the performance of CUDA dramatically.
Would you be able to share any (preferably simple :-) kernels that demonstrate a significant increase in register pressure when moving from CUDA 3.0 to CUDA 3.1? I’d be interested in taking a closer look at such code. Thanks.
I have been checking everything I have. When I opened this topic I had CUDA 3.0 and the 3.0 drivers too.
I had some GPU algorithms that took 56 ms to process a 3D mesh (mainly processing the element buffer array and reading a VBO).
My problem is that it is now taking longer: that same mesh now takes 103 ms to do the same thing.
Instead of waiting, I decided to install 3.1 and the latest CUDA Linux drivers, but the problem is still here.
neoideo@neoideo:~$ NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery
NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "GeForce 9800 GTX+"
CUDA Driver Version: 3.10
CUDA Runtime Version: 3.10
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 536543232 bytes
Number of multiprocessors: 16
Number of cores: 128
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.84 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.10, CUDA Runtime Version = 3.10, NumDevs = 1, Device = GeForce 9800 GTX+
PASSED
Press <Enter> to Quit...
-----------------------------------------------------------
I will now check the code of the kernel to see whether there is something weird or not.
Best regards, and thanks for all the suggestions.
(Switching from 3.0 to 3.1 didn't make it any worse, so at least the register-pressure issue is not having a big impact on my kernels.)
I know Windows has other alternatives that have been around longer, like RivaTuner and manufacturers' overclocking tools such as the EVGA Precision tool. Zotac also released one recently.
I think the new drivers have a power-saving technology that lowers the GPU clock significantly when the card is only driving a 2D desktop, and it should raise the frequency to its maximum when the GPU is under heavy load. However, this was not happening on my Ubuntu Linux box with my 257 drivers.