CUDA 1.1: Card selection, Tesla vs. 8600

My machine has a Tesla and an 8600GT in the two PCIe x16 slots. When I run CUDA programs, which card is being used, and how can I figure this out? Also, I’m new to CUDA. Is there a recommended beginner’s tutorial for getting started with CUDA programming?

Thanks.

Maybe the following information will help in guessing which card is being used.

Bandwidth test gives the following

Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1617.9

Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1487.2

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 14210.5

&&&& Test PASSED

Press ENTER to exit…

Eigenvalues gives:

Matrix size: 2048 x 2048
Precision: 0.000010
Iterations to be timed: 100
Result filename: ‘eigenvalues.dat’
Gerschgorin interval: -2.894310 / 2.923303
Average time step 1: 37.887676 ms
Average time step 2, one intervals: 5.630833 ms
Average time step 2, mult intervals: 0.011050 ms
Average time TOTAL: 43.570698 ms

PASSED.

How does this information compare to other NVIDIA cards?

Section 4.5.2.2 of the CUDA Programming Guide explains how to list the available devices in a program, and select one to use.
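For reference, a minimal enumeration-and-selection sketch along those lines (the index passed to cudaSetDevice() is just an example; deviceQuery will tell you which index is the Tesla on your machine):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    // List the CUDA-capable devices in the same order deviceQuery reports them.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s (compute %d.%d, %.2f GHz, %lu MB)\n",
               dev, prop.name, prop.major, prop.minor,
               prop.clockRate / 1.0e6f,
               (unsigned long)(prop.totalGlobalMem >> 20));
    }

    // Select the device that all subsequent CUDA calls in this process will use.
    cudaSetDevice(1);   // e.g. device 1, if that is the Tesla

    return 0;
}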

You are using the 8600 (judging from the device-to-device bandwidth figure: roughly 14 GB/s is far below what the Tesla’s much wider memory bus would deliver).

Most of the SDK programs use CUT_DEVICE_INIT, which basically grabs the first device you’ll also see when you run deviceQuery. So whatever is listed first in deviceQuery tends to be what you’re running on.

I’d like the next rev of the SDK to give easy options to specify exactly which device to run on, but for now, a simple thing you can do is replace the call to CUT_DEVICE_INIT() with cudaSetDevice(n), where n is the device number reported by deviceQuery for the device you want to run on.
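A rough sketch of what that swap could look like in a sample’s main() (the index 1 here is only an example; note that cudaSetDevice() needs to come before any other CUDA runtime call, otherwise the runtime will already have created a context on the default device):

#include <cuda_runtime.h>

int main(int argc, char** argv)
{
    // Was: CUT_DEVICE_INIT();   // grabs device 0 by default
    // Pick the device explicitly instead, before any other CUDA call
    // has a chance to create a context on device 0.
    cudaSetDevice(1);            // e.g. the Tesla, if deviceQuery lists it as device 1

    // ... the rest of the sample: allocations, kernel launches, timing ...

    return 0;
}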

I tried adding the line “cudaSetDevice(1);” to the main() function in the “eigenvalues” project of the SDK. I recompiled, but I still see exactly the same timing…

How can I modify it to use device 1 instead of device 0? deviceQuery tells me that device 0 is indeed my 8600GT card.

Thanks.

My bad… got it, I think. Here are the new results, using the Tesla.

(Only slightly faster than the 8600GT; why is that?)

Matrix size: 2048 x 2048
Precision: 0.000010
Iterations to be timed: 100
Result filename: ‘eigenvalues.dat’
Gerschgorin interval: -2.894310 / 2.923303
Average time step 1: 16.248138 ms
Average time step 2, one intervals: 4.961108 ms
Average time step 2, mult intervals: 0.010880 ms
Average time TOTAL: 21.249641 ms

Two times faster is not slightly faster, I would say?

Still a little slow. An ordinary 8800GT is faster, and it should not be:

Matrix size: 2048 x 2048
Precision: 0.000010
Iterations to be timed: 100
Result filename: ‘eigenvalues.dat’
Gerschgorin interval: -2.894310 / 2.923303
Average time step 1: 14.484527 ms
Average time step 2, one intervals: 4.433394 ms
Average time step 2, mult intervals: 0.018080 ms
Average time TOTAL: 18.975201 ms

  • Kuisma

Thanks Kuisma, why could this be? An 8800GT beating a Tesla on eigenvalue computation.

Is there anything else one needs to do besides cudaSetDevice(1) to use the Tesla? Is there some kind of bottleneck that the Tesla is facing on my machine? Or is it just that the Tesla is better at different kinds of applications?

Any comments?

I can only speculate. I’m experiencing problems with the new driver myself. If I upgrade from 169.07 to 169.09, the total time goes from 18.975 ms to 35.313 ms. Have you tried different driver versions?

  • Kuisma

No, I have not, but it’s probably worth trying, as you say.

Thanks.

I get about the same difference: 21.3 ms on the C870 and 19.1 ms on the 8800GT running eigenvalues.
That seems right to me considering the 8800GT is clocked faster (1.5 GHz vs 1.35 GHz).
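(A back-of-the-envelope check, not from the posts above: 1.5 GHz / 1.35 GHz ≈ 1.11, while 21.3 ms / 19.1 ms ≈ 1.12, so the timing gap tracks the shader-clock gap almost exactly.)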

I’m also getting these same results (21.3 and 19.1) when I switch between the 169.07 and 169.09 drivers, so it’s strange that some are seeing a big difference here. I’m running RHEL 4.5 64-bit. Kuisma, which distro are you using?

My configuration is a D870 deskside that contains two C870s, but they appear like regular C870s. I’ve got the 8800GT as device 0 and the C870s as devices 1 and 2, and I just change the value in cudaSetDevice() to switch between them.

I’m using Ubuntu 7.10 64-bit server edition, running on a 780i chipset (Asus P5N-T Deluxe, BIOS 0703) and a Q6600 CPU.

The funny part is that if 169.07 has been loaded first (then unloaded and replaced), 169.09 works fine until after the next reboot.

I have the same problem, but with 169.04. In other words, it works fine until I reboot, and then I have to reinstall the display drivers before starting X.

Any nice solutions out there? This is one of the things that drives me nuts about Ubuntu… I just don’t know what’s going on under the hood. Next time it’s back to Gentoo for me.

dhoff - If you think it’s related to the distribution, suggest a distro I can download (for free), and I’ll try it out as a test.

Try CentOS, it is free and 100% compatible with RHEL.

mfatica - Ok, now I’ve tested with CentOS 5.1 64-bit. Driver 169.09 still performs at only half the speed of 169.07. I guess we can rule out the distribution.

What do you need from me…?

  • Kuisma

Edit: Can’t attach the bug report log :( Sent it by mail instead.

Hello,

I downloaded the program from the NVIDIA site that computes the eigenvalues of a tridiagonal matrix, and I wanted to rewrite it so that it runs in double precision. Up to a matrix dimension of 512x512 everything works smoothly, but with a larger matrix I get an error: bisect_large.cu (240): cutilCheckMsg cudaThreadSynchronize error: bisectKernelLarge_MultIntervals() FAILED. : Unknown error. Can anyone give me a hint as to what the problem is?

Thanks
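For what it’s worth, here is a minimal error-checking sketch (not the actual SDK code; the kernel and launch configuration below are placeholders) that can help narrow down where an “unknown error” like this comes from: cudaGetLastError() right after the launch reports launch-configuration problems, while the error returned by cudaThreadSynchronize() points at a crash inside the kernel itself.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Simple error-check helper (hypothetical, not part of the SDK sample).
static void check(cudaError_t err, const char* where)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", where, cudaGetErrorString(err));
        exit(1);
    }
}

// Placeholder kernel standing in for the bisection kernels.
__global__ void myKernel(double* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0;
}

int main()
{
    const int n = 2048;
    double* d_data = 0;
    check(cudaMalloc((void**)&d_data, n * sizeof(double)), "cudaMalloc");

    myKernel<<<(n + 255) / 256, 256>>>(d_data, n);

    // Launch-configuration problems (too many threads per block, too much
    // shared memory requested, etc.) are reported here, before the kernel runs.
    check(cudaGetLastError(), "kernel launch");

    // A crash inside the kernel (out-of-bounds access, etc.) shows up here.
    check(cudaThreadSynchronize(), "cudaThreadSynchronize");

    check(cudaFree(d_data), "cudaFree");
    return 0;
}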