My machine has a Tesla and an 8600GT in the two PCIe x16 slots. When I run CUDA programs, which card is being used, and how can I figure this out? Also, I'm new to CUDA; is there a recommended beginner's tutorial for getting started with CUDA programming?
Thanks.
Maybe the following information will help in guessing which card is being used.
Bandwidth test gives the following:

Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes)    Bandwidth (MB/s)
33554432                 1617.9

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes)    Bandwidth (MB/s)
33554432                 1487.2

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes)    Bandwidth (MB/s)
33554432                 14210.5

&&&& Test PASSED
Eigenvalues gives:
Matrix size: 2048 x 2048
Precision: 0.000010
Iterations to be timed: 100
Result filename: ‘eigenvalues.dat’
Gerschgorin interval: -2.894310 / 2.923303
Average time step 1: 37.887676 ms
Average time step 2, one intervals: 5.630833 ms
Average time step 2, mult intervals: 0.011050 ms
Average time TOTAL: 43.570698 ms
PASSED.
How does this information compare to other NVIDIA cards available?
Most of the SDK programs use CUT_DEVICE_INIT, which basically grabs the first device, i.e. the one listed first when you run deviceQuery. So whatever deviceQuery lists first is generally what you're running on.
I'd like the next rev of the SDK to give easy options to specify exactly which device to run on, but for now, a simple thing you can do is replace the call to CUT_DEVICE_INIT() with cudaSetDevice(n), where n is the device number reported by deviceQuery for the device you want to run on.
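For reference, here is a minimal sketch of what that looks like with the plain CUDA runtime API (my own sketch, not taken from the SDK; the index 1 is just an example, use whatever number deviceQuery reports for the card you want):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    // Print the devices in the same order deviceQuery lists them.
    for (int i = 0; i < deviceCount; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s\n", i, prop.name);
    }

    // Instead of letting CUT_DEVICE_INIT() grab device 0, select one explicitly.
    cudaSetDevice(1);   // example index only

    // ... the rest of the sample's host code now runs on the selected device ...
    return 0;
}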
Still a little slow. An ordinary 8800GT is faster, and it shouldn't be:
Matrix size: 2048 x 2048
Precision: 0.000010
Iterations to be timed: 100
Result filename: ‘eigenvalues.dat’
Gerschgorin interval: -2.894310 / 2.923303
Average time step 1: 14.484527 ms
Average time step 2, one intervals: 4.433394 ms
Average time step 2, mult intervals: 0.018080 ms
Average time TOTAL: 18.975201 ms
I can only speculate. I'm experiencing problems with the new driver myself. If I upgrade from 169.07 to 169.09, the total time goes from 18.975 ms to 35.313 ms. Have you tried different driver versions?
I get about the same difference: 21.3 on C870 and 19.1 on 8800GT running eigenvalues.
That seems right to me considering the 8800GT is clocked faster (1.5 GHz vs. 1.35 GHz).
I'm also getting these same results (21.3 and 19.1) when I switch between the 169.07 and 169.09 drivers, so it's strange that some of you are seeing a big difference here. I'm running RHEL 4.5 64-bit. Kuisma, which distro are you using?
My configuration is a D870 deskside that contains two C870s, but they appear as regular C870s. I've got the 8800GT as device 0 and the C870s as devices 1 and 2, and I just change the value passed to cudaSetDevice() to switch between them.
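If you don't want to hard-code the index (the ordering can differ between machines), a small helper can pick a device by name instead. This is just my own sketch, not part of the SDK, and selectDeviceByName is a made-up name:

#include <cstring>
#include <cuda_runtime.h>

// Select the first device whose name contains the given substring,
// e.g. "Tesla" to land on a C870 rather than the 8800GT at device 0.
int selectDeviceByName(const char *substr)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        if (strstr(prop.name, substr) != NULL) {
            cudaSetDevice(i);
            return i;   // return the index we selected
        }
    }
    return -1;  // no matching device found
}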
I have the same problem, but with 169.04. In other words, it works fine until I reboot, and then I have to reinstall the display drivers before starting X.
Any nice solutions out there? This is one of the things that drives me nuts about Ubuntu… I just don't know what's going on under the hood. Next time it's back to Gentoo for me.
mfatica - OK, I've now tested with CentOS 5.1 64-bit. Driver 169.09 still runs at only half the speed of 169.07, so I guess we can rule out the distribution.
What do you need from me…?
Kuisma
Edit: Can’t attach the bug report log :( Sent it by mail instead.
I downloaded the program from the NVIDIA site that computes the eigenvalues of a tridiagonal matrix, and I wanted to rewrite it to run in double precision. Up to a matrix size of 512x512 everything works smoothly, but with a larger matrix I get the error bisect_large.cu (240): cutilCheckMsg cudaThreadSynchronize error: bisectKernelLarge_MultIntervals() FAILED. : Unknown error. Can anyone give me a hint about what the problem is?
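Not a full answer, but two things I would check (a sketch using the standard runtime API, with the failing kernel launch itself omitted): whether the card actually reports double-precision support (compute capability 1.3 or higher), and how much shared memory per block the larger-matrix kernel now needs, since doubles take twice the space of floats:

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative check to run right after the failing kernel launch.
void checkAfterLaunch(void)
{
    // Wait for the kernel and pick up any launch/execution error.
    cudaError_t err = cudaThreadSynchronize();
    if (err != cudaSuccess)
        printf("Kernel failed: %s\n", cudaGetErrorString(err));

    int dev = 0;
    cudaGetDevice(&dev);
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);

    // Double precision needs compute capability >= 1.3; switching float -> double
    // also doubles the shared-memory footprint of each block.
    printf("Compute capability %d.%d, shared mem per block: %lu bytes\n",
           prop.major, prop.minor, (unsigned long)prop.sharedMemPerBlock);
}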