[Solved] deviceQuery fails on Ubuntu 10.04.4 LTS (64-bit) NVIDIA video driver 260.19.26 with Tesla m

I have an Ubuntu 10.04.4 LTS (64-bit) machine with four Tesla M2090s (compute capability 2.0). I have had a lot of stability issues with newer NVIDIA drivers (304.x and later), so I installed the CUDA 3.2 toolkit, the CUDA 3.2 SDK, and the recommended video driver (260.19.26). The installation succeeded and the CUDA 3.2 SDK examples compiled, but deviceQuery fails with the message:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount FAILED CUDA Driver and Runtime version may be mismatched.

FAILED

Press <Enter> to Quit...
-----------------------------------------------------------
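
Since the message points at a driver/runtime mismatch, one thing worth checking is which kernel module version is actually loaded. Below is a minimal sketch, assuming /proc/driver/nvidia/version contains the usual "NVRM version: ... Kernel Module <version> ..." line. The function names are made up for illustration.

```shell
#!/bin/bash
# Extract the driver version from text in the format of
# /proc/driver/nvidia/version (read from stdin).
# Assumed line format:
#   NVRM version: NVIDIA UNIX x86_64 Kernel Module  260.19.26  <build date>
nvidia_module_version() {
    sed -n 's/.*Kernel Module *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Compare the version that was installed with the version that is loaded.
# Prints a one-line verdict and returns non-zero on a mismatch.
check_driver_version() {
    local expected="$1" loaded="$2"
    if [ "$loaded" = "$expected" ]; then
        echo "match: $loaded"
    else
        echo "mismatch: loaded=${loaded:-none} expected=$expected"
        return 1
    fi
}

# Usage on the affected machine (not run here):
#   loaded=$(nvidia_module_version < /proc/driver/nvidia/version)
#   check_driver_version 260.19.26 "$loaded"
```

If the loaded version differs from the one the installer reported, some other copy of the nvidia module is probably being picked up by modprobe.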

The nvidia-bug-report.log.gz and nvidia-installer.log files are attached. Note that I am not starting X at boot (although I have in the past), so I use the following script to load the kernel module and create the NVIDIA /dev files:

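#!/bin/bash
# Load the nvidia kernel module and create (or remove) the /dev/nvidia* nodes
# when X is not started at boot.

COMMAND="$1"
case $COMMAND in
start|stop|restart)

    # stop/restart: remove the existing device nodes
    if [ "$COMMAND" = "restart" ] || [ "$COMMAND" = "stop" ]; then
        # Count the /dev/nvidia* entries other than /dev/nvidiactl
        NVIDIADEV=`ls -l /dev/nvidia* | awk '{if ($9 != "/dev/nvidiactl") a+=1}END{print a}'`
        NDEV=`expr $NVIDIADEV - 1`
        for i in `seq 0 $NDEV`; do
            unlink /dev/nvidia$i
        done
        unlink /dev/nvidiactl
    fi

    # start/restart: load the module, then create one node per device
    if [ "$COMMAND" = "restart" ] || [ "$COMMAND" = "start" ]; then

        modprobe nvidia

        if [ "$?" -eq 0 ]; then

            # Number of devices that lspci lists as VGA
            NVGA=`/usr/bin/lspci | grep VGA | wc -l`

            N=`expr $NVGA - 1`
            for i in `seq 0 $N`; do
                mknod -m 666 /dev/nvidia$i c 195 $i
            done
            mknod -m 666 /dev/nvidiactl c 195 255

        fi
    fi
    ;;

*)
    echo "$COMMAND is not supported on this job."
    ;;
esac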
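
As a side note, the script counts every device that lspci labels VGA, whatever the vendor. The sample startup script in NVIDIA's CUDA installation guide instead counts NVIDIA devices listed as either "3D controller" or "VGA compatible controller". Here is a rough sketch of that counting, with hypothetical function names. print_mknod_commands only prints the commands, so it can be run as a dry run without root.

```shell
#!/bin/bash
# Count NVIDIA GPUs in `lspci` output read from stdin.
# Matches NVIDIA devices reported as "3D controller" or
# "VGA compatible controller" (case-insensitive vendor match).
count_nvidia_gpus() {
    grep -i 'nvidia' | grep -c -E '3D controller|VGA compatible controller'
}

# Print (do not execute) the mknod commands for the given number of GPUs.
print_mknod_commands() {
    local count="$1"
    local i=0
    while [ "$i" -lt "$count" ]; do
        echo "mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    echo "mknod -m 666 /dev/nvidiactl c 195 255"
}

# Usage on the affected machine (not run here):
#   n=$(/usr/bin/lspci | count_nvidia_gpus)
#   print_mknod_commands "$n"
```

Filtering on the vendor string keeps an onboard non-NVIDIA VGA adapter from being counted as a GPU.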

Script execution successfully creates the following files:

crw-rw-rw- 1 root root 195,   0 2014-06-19 16:26 /dev/nvidia0
crw-rw-rw- 1 root root 195,   1 2014-06-19 16:26 /dev/nvidia1
crw-rw-rw- 1 root root 195,   2 2014-06-19 16:26 /dev/nvidia2
crw-rw-rw- 1 root root 195,   3 2014-06-19 16:26 /dev/nvidia3
crw-rw-rw- 1 root root 195, 255 2014-06-19 16:26 /dev/nvidiactl

The users running ./deviceQuery are members of group video.

I originally upgraded this machine from 10.04 to 12.04 and could not get CUDA working at all (the drivers seemed unstable), so I did a fresh install of Ubuntu 10.04.4 LTS (64-bit). I have been working on this for a week and I’m not sure what to do next.

I have also attached the output of:

nvidia-smi -a

in nvidia-smi.txt.

nvidia-installer.log (23.6 KB)
nvidia-bug-report.log.gz (67 KB)
nvida-smi.txt (2.44 KB)

I finally got everything working by installing a very specific driver version (285.05.33) and removing an older DKMS nvidia kernel module. I think removing the older DKMS module is probably what did the trick.
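
For anyone checking for the same problem, here is a rough sketch of how leftover DKMS builds of the nvidia module might be spotted. It assumes `dkms status` prints one line per module that begins with the module name. The function name is hypothetical, and the version in the commented remove command is a placeholder, not a value from this machine.

```shell
#!/bin/bash
# Filter `dkms status` output (read from stdin) down to nvidia modules.
# Prints nothing if no nvidia module is registered with DKMS.
nvidia_dkms_entries() {
    grep -i '^nvidia' || true
}

# Usage on the affected machine (not run here):
#   dkms status | nvidia_dkms_entries
#   modinfo -F version nvidia     # version of the module modprobe would load
#
# If an old build shows up, it could be removed with something like:
#   sudo dkms remove -m nvidia -v OLD_VERSION --all
# where OLD_VERSION is whatever version the status listing reported.
```

Comparing the `modinfo` version with the version the installer reported should show whether a stale module is shadowing the freshly installed one.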