CUDA 1.1: Card selection, Tesla vs. 8600

My machine has a Tesla and an 8600GT in the two PCIe x16 slots. When I run CUDA programs, which card is being used, and how can I figure this out? Also, I’m new to CUDA. Is there a recommended beginner’s tutorial for getting started with CUDA programming?

Thanks.

Maybe the following information will help in guessing which card is being used.

Bandwidth test gives the following

Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1617.9

Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1487.2

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 14210.5

&&&& Test PASSED

Press ENTER to exit…

Eigenvalues gives:

Matrix size: 2048 x 2048
Precision: 0.000010
Iterations to be timed: 100
Result filename: ‘eigenvalues.dat’
Gerschgorin interval: -2.894310 / 2.923303
Average time step 1: 37.887676 ms
Average time step 2, one intervals: 5.630833 ms
Average time step 2, mult intervals: 0.011050 ms
Average time TOTAL: 43.570698 ms

PASSED.

How does this information compare to other NVIDIA cards?

Section 4.5.2.2 of the CUDA Programming Guide explains how to list the available devices in a program, and select one to use.
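For reference, a minimal enumeration-and-selection sketch along those lines (the index passed to cudaSetDevice() is just an example; deviceQuery will tell you which index is the Tesla on your machine):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    // List the CUDA-capable devices in the same order deviceQuery reports them.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s (compute %d.%d, %.2f GHz, %lu MB)\n",
               dev, prop.name, prop.major, prop.minor,
               prop.clockRate / 1.0e6f,
               (unsigned long)(prop.totalGlobalMem >> 20));
    }

    // Select the device that all subsequent CUDA calls in this process will use.
    cudaSetDevice(1);   // e.g. device 1, if that is the Tesla

    return 0;
}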

You are using the 8600 (judging from the device-to-device bandwidth figure: roughly 14 GB/s is far below what the Tesla’s much wider memory bus would deliver).

Most of the SDK programs use CUT_DEVICE_INIT, which basically grabs the first device you’ll also see when you run deviceQuery. So whatever is listed first in deviceQuery tends to be what you’re running on.

I’d like the next rev of the SDK to give easy options to specify exactly which device to run on, but for now, a simple thing you can do is replace the call to CUT_DEVICE_INIT() with cudaSetDevice(n), where n is the device number reported by deviceQuery for the device you want to run on.
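A rough sketch of what that swap could look like in a sample’s main() (the index 1 here is only an example; note that cudaSetDevice() needs to come before any other CUDA runtime call, otherwise the runtime will already have created a context on the default device):

#include <cuda_runtime.h>

int main(int argc, char** argv)
{
    // Was: CUT_DEVICE_INIT();   // grabs device 0 by default
    // Pick the device explicitly instead, before any other CUDA call
    // has a chance to create a context on device 0.
    cudaSetDevice(1);            // e.g. the Tesla, if deviceQuery lists it as device 1

    // ... the rest of the sample: allocations, kernel launches, timing ...

    return 0;
}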

I tried adding the line “cudaSetDevice(1);” to the main() function in the “eigenvalues” project of the SDK. I recompiled, but I still see exactly the same timing…

How can I modify it to use device 1 instead of device 0? deviceQuery tells me that device 0 is indeed my 8600GT card.

Thanks.

My bad… got it, I think. Here are the new results, using the Tesla.

(Only slightly faster than the 8600GT; why is that?)

Matrix size: 2048 x 2048
Precision: 0.000010
Iterations to be timed: 100
Result filename: ‘eigenvalues.dat’
Gerschgorin interval: -2.894310 / 2.923303
Average time step 1: 16.248138 ms
Average time step 2, one intervals: 4.961108 ms
Average time step 2, mult intervals: 0.010880 ms
Average time TOTAL: 21.249641 ms

Two times faster is not slightly faster, I would say?

Still a little slow. An ordinary 8800GT is faster, and it should not be:

Matrix size: 2048 x 2048
Precision: 0.000010
Iterations to be timed: 100
Result filename: ‘eigenvalues.dat’
Gerschgorin interval: -2.894310 / 2.923303
Average time step 1: 14.484527 ms
Average time step 2, one intervals: 4.433394 ms
Average time step 2, mult intervals: 0.018080 ms
Average time TOTAL: 18.975201 ms

  • Kuisma

Thanks Kuisma, why could this be? An 8800GT beating a Tesla on eigenvalue computation.

Is there anything else one needs to do besides cudaSetDevice(1) to use the Tesla? Is there some kind of bottleneck that the Tesla is facing on my machine? Or is it just that the Tesla is better at different kinds of applications?

Any comments?

I can only speculate. I’m experiencing problems with the new driver myself. If I upgrade from 169.07 to 169.09, the total time goes from 18.975 ms to 35.313 ms. Have you tried different driver versions?

  • Kuisma

No, I have not, but it’s probably worth trying, as you say.

Thanks.

I get about the same difference: 21.3 ms on the C870 and 19.1 ms on the 8800GT running eigenvalues.
That seems right to me considering the 8800GT is clocked faster (1.5 GHz vs 1.35 GHz).
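(A back-of-the-envelope check, not from the posts above: 1.5 GHz / 1.35 GHz ≈ 1.11, while 21.3 ms / 19.1 ms ≈ 1.12, so the timing gap tracks the shader-clock gap almost exactly.)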

I’m also getting these same results (21.3 and 19.1) when I switch between the 169.07 and 169.09 drivers, so it’s strange that some are seeing a big difference here. I’m running RHEL 4.5 64-bit. Kuisma, which distro are you using?

My configuration is a D870 deskside that contains two C870s, but they appear like regular C870s. I’ve got the 8800GT as device 0 and the C870s as devices 1 and 2, and I just change the value in cudaSetDevice() to switch between them.

I’m using Ubuntu 7.10 64-bit server edition, running on a 780i chipset (Asus P5N-T Deluxe, BIOS 0703) and a Q6600 CPU.

The funny part is that if 169.07 has been loaded first (then unloaded and replaced), 169.09 works fine until after the next reboot.

I have the same problem, but with 169.04. In other words, it works fine until I reboot, and then I have to reinstall the display drivers before starting X.

Any nice solutions out there? This is one of the things that drives me nuts about Ubuntu… I just don’t know what’s going on under the hood. Next time it’s back to Gentoo for me.

dhoff - If you think it’s related to the distribution, suggest a distro I can download (for free), and I’ll try it out as a test.

Try CentOS, it is free and 100% compatible with RHEL.

mfatica - Ok, now I’ve tested with CentOS 5.1 64-bit. Driver 169.09 still performs at only half the speed of 169.07. I guess we can rule out the distribution.

What do you need from me…?

  • Kuisma

Edit: Can’t attach the bug report log :( Sent it by mail instead.

Hello,

I downloaded the program from the NVIDIA site that computes the eigenvalues of a tridiagonal matrix, and I wanted to rewrite it so that it runs in double precision. Up to a matrix dimension of 512x512 everything works smoothly, but with a larger matrix I get an error: bisect_large.cu (240): cutilCheckMsg cudaThreadSynchronize error: bisectKernelLarge_MultIntervals() FAILED. : Unknown error. Can anyone give me a hint as to what the problem is?

Thanks
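For what it’s worth, here is a minimal error-checking sketch (not the actual SDK code; the kernel and launch configuration below are placeholders) that can help narrow down where an “unknown error” like this comes from: cudaGetLastError() right after the launch reports launch-configuration problems, while the error returned by cudaThreadSynchronize() points at a crash inside the kernel itself.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Simple error-check helper (hypothetical, not part of the SDK sample).
static void check(cudaError_t err, const char* where)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", where, cudaGetErrorString(err));
        exit(1);
    }
}

// Placeholder kernel standing in for the bisection kernels.
__global__ void myKernel(double* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0;
}

int main()
{
    const int n = 2048;
    double* d_data = 0;
    check(cudaMalloc((void**)&d_data, n * sizeof(double)), "cudaMalloc");

    myKernel<<<(n + 255) / 256, 256>>>(d_data, n);

    // Launch-configuration problems (too many threads per block, too much
    // shared memory requested, etc.) are reported here, before the kernel runs.
    check(cudaGetLastError(), "kernel launch");

    // A crash inside the kernel (out-of-bounds access, etc.) shows up here.
    check(cudaThreadSynchronize(), "cudaThreadSynchronize");

    check(cudaFree(d_data), "cudaFree");
    return 0;
}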