Tesla device problem: is it broken or is it just the driver?

pasoleatis · March 16, 2012, 1:52pm

Hello,

I hope this is the right forum. We have a computer with 2 Tesla cards running Ubuntu 10.04, and we have a problem with one of them. The device query reports some strange numbers and then crashes:

[deviceQuery] starting...

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 2 CUDA Capable device(s)

Device 0: "Tesla C2070"

  CUDA Driver Version / Runtime Version          4.1 / 4.0

  CUDA Capability Major/Minor version number:    2.0

  ( 0) Multiprocessors x (32) CUDA Cores/MP:     0 CUDA Cores

  Max Texture Dimension Size (x,y,z)             1D=(65535), 2D=(2048,2048), 3D=(0,512,0)

  Max Layered Texture Size (dim) x layers        1D=(1) x 6, 2D=(0,0) x 0

  Concurrent copy and execution:                 No with -725995520 copy engine(s)

...

  Device PCI Bus ID / PCI location ID:           -722875584 / 32669

  Compute Mode:

Segmentation fault

As you can see, it detects 2 cards, but for device 0 it reports 0 multiprocessors, texture sizes of 0, a negative number of copy engines, and a garbled PCI bus ID. We have the CUDA toolkit 4.1 and the NVIDIA driver x86_64-285.05.33.

It is also weird, because sometimes my programs run on device 0 and sometimes they do not run.

We did not seem to have this problem a few days ago, before upgrading from 4.0 to 4.1. Is it just the driver, or is the card burned?

mfatica · March 16, 2012, 1:56pm

Try reinstalling 4.1; you are still using the 4.0 runtime:

CUDA Driver Version / Runtime Version 4.1 / 4.0

Also, be sure to recompile the examples.
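
To make the check above repeatable, here is a minimal Python sketch that reads the version line from saved deviceQuery output and flags a driver/runtime difference. The function names `parse_versions` and `versions_match` are made up for illustration, and the regular expression assumes the line looks exactly like the one quoted in this thread. A difference is only a hint that an old toolkit is still being picked up, not proof of a fault.

```python
import re

# Matches e.g. "CUDA Driver Version / Runtime Version   4.1 / 4.0"
# (format assumed from the deviceQuery output quoted in this thread).
_VERSION_RE = re.compile(
    r"Driver Version / Runtime Version\s+(\d+)\.(\d+)\s*/\s*(\d+)\.(\d+)"
)


def parse_versions(line):
    """Return ((driver_major, driver_minor), (runtime_major, runtime_minor))."""
    m = _VERSION_RE.search(line)
    if m is None:
        raise ValueError("not a deviceQuery version line")
    d_major, d_minor, r_major, r_minor = (int(g) for g in m.groups())
    return (d_major, d_minor), (r_major, r_minor)


def versions_match(line):
    """True when the reported driver and runtime versions are identical."""
    driver, runtime = parse_versions(line)
    return driver == runtime


line = "  CUDA Driver Version / Runtime Version          4.1 / 4.0"
print(versions_match(line))  # False
```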

pasoleatis · March 16, 2012, 3:04pm

Thanks. Does anything special need to be done to remove the previous runtime? Will the CUDA toolkit installation script uninstall the previous version?

mfatica · March 16, 2012, 3:15pm

I usually move the old version manually (mv /usr/local/cuda /usr/local/cuda_4.0) and then install the new version.
This way, I can easily use older toolkits by changing PATH and LD_LIBRARY_PATH or by using modules.
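
As a rough illustration of the PATH / LD_LIBRARY_PATH switch, here is a small Python sketch that builds an environment pointing at one toolkit directory. The helper name `env_for_toolkit` is hypothetical, and it assumes the usual 64-bit Linux layout with `bin` and `lib64` under the toolkit root; adjust if your install differs.

```python
import os


def env_for_toolkit(root, base_env=None):
    """Return a copy of the environment with `root` first on the search paths.

    Assumes binaries live in <root>/bin and 64-bit libraries in <root>/lib64.
    """
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = os.path.join(root, "bin")
    lib_dir = os.path.join(root, "lib64")
    # Prepend so this toolkit wins over any other copy already on the path;
    # empty existing values are dropped to avoid a trailing separator.
    env["PATH"] = os.pathsep.join(
        p for p in (bin_dir, env.get("PATH", "")) if p
    )
    env["LD_LIBRARY_PATH"] = os.pathsep.join(
        p for p in (lib_dir, env.get("LD_LIBRARY_PATH", "")) if p
    )
    return env


# Example: select the toolkit that was moved aside to /usr/local/cuda_4.0.
old = env_for_toolkit("/usr/local/cuda_4.0", {"PATH": "/usr/bin"})
print(old["PATH"])
print(old["LD_LIBRARY_PATH"])
```

The returned dict can be passed as `env=` to `subprocess.run` to launch a build or a test program against that toolkit without touching the shell's own settings.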
