SUSE Enterprise 10.3

Hi,

We finally got some Fermi cards added to our cluster, but because the cluster is constantly busy, making time to upgrade the OS has been extremely difficult. The test Tesla node has been brought online, but it is still imaged with the old cluster toolkit (the new one is not yet available; we have been told we may have to wait until November).

The drivers are installed.

Snippet of /var/log/nvidia-installer.log:

[codebox]nvidia: module license 'NVIDIA' taints kernel.
GSI 24 sharing vector 0xE2 and IRQ 24
ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 41 (level, low) -> IRQ 226
PCI: Setting latency timer of device 0000:07:00.0 to 64
GSI 25 sharing vector 0xEA and IRQ 25
ACPI: PCI Interrupt 0000:08:00.0[A] -> GSI 28 (level, low) -> IRQ 234
PCI: Setting latency timer of device 0000:08:00.0 to 64
GSI 26 sharing vector 0x33 and IRQ 26
ACPI: PCI Interrupt 0000:11:00.0[A] -> GSI 39 (level, low) -> IRQ 51
PCI: Setting latency timer of device 0000:11:00.0 to 64
GSI 27 sharing vector 0x3B and IRQ 27
ACPI: PCI Interrupt 0000:12:00.0[A] -> GSI 30 (level, low) -> IRQ 59
PCI: Setting latency timer of device 0000:12:00.0 to 64
NVRM: loading NVIDIA UNIX x86_64 Kernel Module 256.44 Thu Jul 29 01:22:44 PDT 2010
-> Installing both new and classic TLS OpenGL libraries.
-> Installing both new and classic TLS 32bit OpenGL libraries.
-> Install NVIDIA's 32-bit compatibility OpenGL libraries? (Answer: Yes)
-> Searching for conflicting X files:
-> done.
-> Searching for conflicting OpenGL files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (256.44):
executing: '/sbin/ldconfig'...
executing: '/sbin/depmod -aq'...
-> done.
-> Driver file installation is complete.
-> Running post-install sanity check:
-> done.
-> Post-install sanity check passed.
-> Shared memory test passed.
-> Running runtime sanity check:
-> done.
-> Runtime sanity check passed.
-> Installation of the kernel module for the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version 256.44) is now complete.[/codebox]

My problem is that I cannot find a 3.1 toolkit that works. For example, with the “CUDA Toolkit for SUSE Linux Enterprise Desktop 11” I keep getting:

/pkg/nvidia/3.1/cuda/lib64/libcudart.so: file not recognized: File format not recognized

when I try to compile the SDK.
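An error like this from the linker may mean the file is in a format the installed binutils does not understand, or that the file is damaged. One quick way to see what the file claims to be is to read its ELF identification bytes. The Python sketch below does only that; the function names are made up for illustration, and it decodes just a few well-known values (class, byte order, OS/ABI). It cannot say on its own why the linker rejects the file.

```python
import sys

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}          # e_ident[4], EI_CLASS
ELF_DATA = {1: "little-endian", 2: "big-endian"}  # e_ident[5], EI_DATA
ELF_OSABI = {0: "System V", 3: "GNU/Linux"}       # e_ident[7], EI_OSABI (partial table)


def describe_elf_ident(ident):
    """Summarise the first 16 bytes (e_ident) of a file as a dict."""
    if len(ident) < 16 or ident[:4] != ELF_MAGIC:
        return {"is_elf": False}
    return {
        "is_elf": True,
        "class": ELF_CLASSES.get(ident[4], "unknown"),
        "data": ELF_DATA.get(ident[5], "unknown"),
        "osabi": ELF_OSABI.get(ident[7], "other (%d)" % ident[7]),
    }


def describe_file(path):
    """Read the identification bytes of the file at 'path' and describe them."""
    with open(path, "rb") as handle:
        return describe_elf_ident(handle.read(16))


if __name__ == "__main__":
    # Hypothetical usage: python elf_ident.py /pkg/nvidia/3.1/cuda/lib64/libcudart.so
    for path in sys.argv[1:]:
        print(path, describe_file(path))
```

Comparing the result for libcudart.so with a system library that does link on the same node may show whether the two differ in class or OS/ABI.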

Is there any way I can get an appropriate version without upgrading?

Markus

Got it to work.

  1. Fell back to the 256.40 driver (for some reason we had 256.44)
  2. Installed the CUDA Toolkit for RedHat Enterprise Linux 4.8
  3. Compiled the SDK
  4. Ran deviceQuery, but it still failed (I added shrLog("MarkEdit: %d", cudaGetDeviceCount(&deviceCount)); to print the return code):

./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount FAILED CUDA Driver and Runtime version may be mismatched.

MarkEdit: 10100
FAILED
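When a driver/runtime mismatch is suspected, it can help to confirm which driver version is actually loaded. The following Python sketch is one way to do that. It assumes the driver exposes its banner in /proc/driver/nvidia/version in the same "Kernel Module <version>" form as the NVRM line in the installer log above; the path constant and function name are illustrative, not taken from the original post.

```python
import re
import sys

# Assumed location of the driver's version banner; adjust if your system differs.
VERSION_FILE = "/proc/driver/nvidia/version"


def parse_driver_version(text):
    """Return the version string from an NVRM 'Kernel Module' line, or None."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # An optional argument lets you point the script at a saved log instead.
    path = sys.argv[1] if len(sys.argv) > 1 else VERSION_FILE
    with open(path) as handle:
        print(parse_driver_version(handle.read()))
```

The printed version can then be compared with the driver version the installed toolkit expects.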

  5. Ran deviceQuery as root, which initialised the /dev/nv* device nodes with the correct permissions (worked!). A reboot might also have fixed this one.
  6. Ran deviceQuery as a normal user and got:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.10, CUDA Runtime Version = 3.10, NumDevs = 4, Device = Tesla S2050, Device = Tesla S2050
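Since the missing piece turned out to be the device nodes, a small check like the one below could show whether they exist and whether the current user can open them. It is a sketch under assumptions: the default pattern /dev/nvidia* narrows the /dev/nv* mentioned above to the nodes the NVIDIA driver normally creates, and the function name is hypothetical.

```python
import glob
import os


def check_device_nodes(pattern="/dev/nvidia*"):
    """Return (path, usable) pairs; usable means the current user can read and write."""
    results = []
    for path in sorted(glob.glob(pattern)):
        usable = os.access(path, os.R_OK | os.W_OK)
        results.append((path, usable))
    return results


if __name__ == "__main__":
    nodes = check_device_nodes()
    if not nodes:
        print("no device nodes found; the driver may not have created them yet")
    for path, usable in nodes:
        print("%s: %s" % (path, "ok" if usable else "no read/write access"))
```

Run as a normal user before and after the root run of deviceQuery, this would show the nodes appearing and becoming usable.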

Nice. Putting this here in case others have a similar issue.
hooroo
