verify2install4CUDAtoolkit

Question: how do I verify a CUDA toolkit installation after it has completed?

cuda_6.0.26_rc_linux64.run ← downloaded

$ lspci | grep -i nvidia ← NVIDIA Device 1980 (rev a2) ← why not GeForce GTX 750 Ti?

NVIDIA X Server Settings
GPU 0 - (Quadro FX 1800) ← OK
GPU 1 - (GeForce GTX 750 Ti) ← OK

$ uname -m && cat /etc/*release ← OK

x86_64 ← OK
Scientific Linux release 6.5 (Carbon) ← OK? (RHEL 6 compatible)

$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4) ← OK

$ md5sum cuda_6.0.26_rc_linux64.run ← OK

$ echo $LD_LIBRARY_PATH ← empty, why?

$ ls /dev/nvidia* ← /dev/nvidia0 /dev/nvidia1 /dev/nvidiactl
These files are used by the CUDA Driver to communicate with the kernel-mode portion of the NVIDIA Driver.

$ cat /proc/driver/nvidia/version ← 334.21

[ACK] checked /usr/lib64 ← CUDA files are there, such as libcuda.so and libcuda.so.334.21

[ACK] unpacked each demo having its own directory structure
Note: MYDISK is not the same as the Linux system disk ← is this too difficult for the demos?

[NACK] Test1 in /media/MYDISK/sw/nvidia/cuda/toolkit/demo:
$ cd simplemultigpu
$ cd "NVIDIA GPU Computing SDK"
$ cd OpenCL
$ make

make[1]: Entering directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/OpenCL/common'
ar: creating ../..//OpenCL//common//lib/liboclUtil_x86_64.a
a - obj/release/oclUtils.cpp.o
make[1]: Leaving directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/OpenCL/common'
make[1]: Entering directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/shared'
make[1]: *** No targets specified and no makefile found. Stop.
make[1]: Leaving directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/shared'
make: *** [shared/libshrutil.so] Error 2

[NACK] Test 2 in /media/MYDISK/sw/nvidia/cuda/toolkit/demo:
$ cd bandwidth
$ cd "NVIDIA GPU Computing SDK"
$ cd OpenCL
$ make

make[1]: Entering directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/bandwidth/NVIDIA GPU Computing SDK/OpenCL/common'
ar: creating ../..//OpenCL//common//lib/liboclUtil_x86_64.a
a - obj/release/oclUtils.cpp.o
make[1]: Leaving directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/bandwidth/NVIDIA GPU Computing SDK/OpenCL/common'
make[1]: Entering directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/bandwidth/NVIDIA GPU Computing SDK/shared'
make[1]: *** No targets specified and no makefile found. Stop.
make[1]: Leaving directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/bandwidth/NVIDIA GPU Computing SDK/shared'
make: *** [shared/libshrutil.so] Error 2

[NACK] make fails in all other demos too

[NACK] unresolved:
1) targets: what kind of target ? to which program ? how to specify ?
2) makefile: where did it try to search ? where to create ?

It is normal for lspci to show only a device ID instead of the card name. nvidia-smi shows the full name of the card(s) if the driver is loaded.
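A minimal sketch of that check. lspci prints only a numeric ID when the system's PCI ID database is older than the card, whereas `nvidia-smi -L` asks the loaded driver for the GPU names. The helper name `list_gpus` and the fallback message are made up for illustration.

```shell
#!/bin/sh
# list_gpus: hypothetical helper, not part of any NVIDIA tool.
# Prints one line per GPU as reported by the driver, or a hint when
# nvidia-smi cannot be found (driver not installed or not on PATH).
list_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L
    else
        echo "nvidia-smi not found: driver not installed or not on PATH"
        return 1
    fi
}

list_gpus || true
```

If the driver is loaded, both the Quadro FX 1800 and the GeForce GTX 750 Ti should appear by name in the output.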

Set LD_LIBRARY_PATH in your ~/.bashrc file according to the instructions of the CUDA installer, assuming you use the bash shell. Open a new shell before testing again.
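For reference, a sketch of what such ~/.bashrc lines can look like. It assumes the toolkit went to the default prefix /usr/local/cuda-6.0 (adjust CUDA_HOME if you chose another location). The `prepend_path` helper and the CUDA_HOME variable name are illustrative, not something the installer creates.

```shell
# Assumed install prefix; change it if the installer was pointed elsewhere.
CUDA_HOME=/usr/local/cuda-6.0

# prepend_path DIR LIST: print LIST with DIR in front, avoiding a
# trailing ':' when LIST is empty (an empty entry means "current dir").
prepend_path() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Only touch the environment when the toolkit directory really exists.
if [ -d "$CUDA_HOME" ]; then
    PATH=$(prepend_path "$CUDA_HOME/bin" "$PATH")
    LD_LIBRARY_PATH=$(prepend_path "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
    export PATH LD_LIBRARY_PATH
fi
```

In a new shell, `echo $LD_LIBRARY_PATH` (note the `$`) should then print the lib64 directory rather than an empty line.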

Run make in the root folder of the SDK, not in its subfolders. That should fix the problem.

Thanks !
(1/33) looks like I don’t have a Makefile in the root of the SDK?
trial in /media/MySecretDisk/sw/nvidia/cuda/toolkit/demo with Scientific Linux 6.5:
$ make
make: *** No targets specified and no makefile found. Stop.

(2/33) all demos were unpacked to a different device - should they be on the system HD, such as under /usr?
/media/MySecretDisk/sw/nvidia/cuda/toolkit/demo

Example:
/media/MySecretDisk/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/OpenCL/src
simpleMultiGPU.cl ← __kernel void reduce
oclSimpleMultiGPU.cpp ← main program
#include <oclUtils.h> ← a. include needed OK
#include <shrQATest.h> ← b. include needed OK
a) /media/MySecretDisk/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/OpenCL/common/inc/oclUtils.h
b) /media/MySecretDisk/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/shared/inc/shrQATest.h

Makefile:
/media/MySecretDisk/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/OpenCL/src/oclSimpleMultiGPU/Makefile
PROJECTS := $(shell find src -name Makefile)

ifeq ($(dbg),1)
BINDIR := bin/linux/debug/ ← (3/33) where is bin/linux/ - found neither in root nor in /usr?
else
BINDIR := bin/linux/release/
endif

runall:
$(SHELL) common/runall.sh $(BINDIR) ← (4/33) where is runall.sh ?

download check (5)
ea1e1235ebc17dc8e2ca2bec3febde52 cuda_6.0.26_rc_linux64.run ← checksum ok, note version is for cuda_6.0!
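The comparison can also be left to md5sum itself. A sketch, using the checksum and file name quoted above; `check_md5` is just an illustrative wrapper name.

```shell
# check_md5 SUM FILE: hypothetical wrapper around `md5sum -c`.
# md5sum expects "SUM", two spaces, then the file name on each input line.
check_md5() {
    printf '%s  %s\n' "$1" "$2" | md5sum -c - >/dev/null 2>&1
}

if check_md5 ea1e1235ebc17dc8e2ca2bec3febde52 cuda_6.0.26_rc_linux64.run; then
    echo "checksum ok"
else
    echo "checksum mismatch or file missing"
fi
```

Run it from the directory that holds the downloaded .run file.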

environment setting command for installation
/media/MySecretDisk/DOWNLOAD/NVidia/set2env4cuda.sh ← came with download
export PATH=/usr/local/cuda-5.5/bin:$PATH ← (5/33) *** cuda-5.5 does not exist, why?
export LD_LIBRARY_PATH=/usr/local/cuda-5.5/lib64:$LD_LIBRARY_PATH ← (6/33) *** this will also fail, why cuda-5.5 for cuda_6.0 ?
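One way to see which toolkit directories actually exist before trusting the paths in such a script. This is a sketch only: it assumes the default /usr/local/cuda-<version> layout, and `find_cuda_dirs` is a made-up helper name.

```shell
# find_cuda_dirs [BASE]: print every BASE/cuda-* directory that
# contains an executable bin/nvcc. BASE defaults to /usr/local
# (an assumption; use the prefix you gave the installer).
find_cuda_dirs() {
    base=${1:-/usr/local}
    for d in "$base"/cuda-*; do
        if [ -x "$d/bin/nvcc" ]; then
            echo "$d"
        fi
    done
}

find_cuda_dirs
```

If this prints a cuda-6.0 directory while the script exports cuda-5.5 paths, the script was presumably written for the older release and its two paths need editing.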

(8/33) OS is Scientific Linux 6.5 (RHEL 6 compatible)

It looks like you did not extract the separate OpenCL samples into the same folder where the rest of the CUDA samples were installed; that’s why some of the required files cannot be found. It also seems that NVIDIA did not update the OpenCL sample makefiles to work correctly with CUDA 5.5 and later, and at least one Makefile that belongs in the shared directory is missing. It would be similar to this:

http://projets2a.iutlan.etu.univ-rennes1.fr/Projet041213/browser/tools/nvidia-gpu-computing-sdk-4.1/shared/Makefile

Once you resolve that, you might still have to point the makefiles to the include path of the CUDA 5.5 or 6.0 directories as well. For example, after I added the missing makefile, the build looked for exception.h, which is part of the CUDA 5.5/6.0 toolkit. Even after I got past that, it still expected another file that it couldn’t locate…

If you really want the OpenCL samples to work on the first try, download the last SDK that contains them (I believe that’s 4.2). Once you’re familiar with Linux and its compilation tools, you can build your own against CUDA 5.5 or 6.0 if need be. Regardless, NVIDIA has not updated its OpenCL support past version 1.1 and does not seem very interested in supporting OpenCL, so unless you specifically need OpenCL, work with CUDA instead.

Thanks !

The main goal was to test the NVidia 750 Ti card by installing that new toolkit and demos.

At the moment the workstation boots but fails to reach the login stage!

Unfortunately I did the following (trace manually written down here):

  1. changed environment settings according to this new 6.0 version

  2. created folders ready for the new 6.0 version installation

  3. changed installation parameters to use /media/MYDRIVE/ disk

    • installation started to install files into that drive
    • I forgot that this drive is NOT automatically mounted at boot!
      ~ the drive has NTFS; it was used by Win7 before that was replaced with SL 6.5,
      but it has worked fine with Linux after I installed software to handle NTFS in Linux
      ~ I created all folders for installation using Linux
      and also checked that my script refers to right folders
  4. continued installation although it gave the following warning (I recommend stopping here)!
    ** could not find a working chcon argument - Defaulting to shlib_t ** ← assumed this will be ok!

  5. answered yes to "add a file in the modprobe configuration":
    /etc/modprobe.d/nvidia-installer-disable-nouveau.conf

  6. got warning
    ** unable to set security context on file
    /media/MYDRIVE/tmp/install/nvidia/cuda/toolkit/nv …
    assuming new tls ***

  7. got error
    *** failed to execute '/usr/bin/chcon -t shlib_t
    //media/MYDRIVE/sw/nvidia/opengl/lib64/ …
    *** failed to change context of above

  8. continued, assuming OpenGL might be the only failing part

  9. repeatedly hit problems with '/usr/bin/chcon -t shlib_t'

  10. interrupted installation

  11. rebooted - SL 6.5 no longer reached the login screen!

  12. booted from DVD and selected Rescue!

  13. booted from HD - got old days
    *** kernel in panic ***
    ** interruption problems with nouveau drivers
    hit reset

  14. booted from HD with SHIFT, edited <root rhgb quiet nomodeset ← this worked before this new installation
    i.e. system booted ok,
    NVIDIA FX 1800 served two monitors OK and GTX 750 Ti was recognized by NVIDIA

  15. status: stuck at the grub prompt; unresolved whether, and how, it would help to
    a) mount the device where the new NVIDIA software is located (assuming that is the problem)?
    b) use VESA drivers instead of the NVIDIA ones (if that is the solution)?
    c) reinstall SL 6.5 (ultimate nightmare!)
    d) something else, what?

Browsing from a Windows 7 workstation while the Linux workstation sits at the GRUB prompt

ANY tips more than welcome !

Thanks in advance!

A few days ago I glanced through this and couldn’t make any sense of it… now I finally understand the craziness. While you’re able to read and write NTFS volumes on Linux with the right drivers, it’s definitely not a good idea to use NTFS volumes for anything that depends on ownership, file permissions or security contexts (installers and such), as the errors you hit with chcon (change security context) show.
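A quick pre-flight check along those lines is to ask which filesystem backs the install target before running an installer. This is a sketch: `fs_type` and `is_linux_native_fs` are illustrative names, the list of accepted types is my own short list, and `df -PT` is the GNU coreutils form. An NTFS volume mounted through ntfs-3g typically shows up as fuseblk.

```shell
# fs_type PATH: filesystem type of the volume holding PATH (GNU df).
fs_type() {
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# is_linux_native_fs TYPE: succeed for a few filesystems that store
# Unix ownership, permissions and SELinux labels. The list is an
# illustrative assumption, not exhaustive.
is_linux_native_fs() {
    case "$1" in
        ext2|ext3|ext4|xfs|btrfs) return 0 ;;
        *) return 1 ;;
    esac
}

target=${1:-/}   # directory you plan to install into
fstype=$(fs_type "$target")
if is_linux_native_fs "$fstype"; then
    echo "$target is on $fstype: fine for installers"
else
    echo "$target is on $fstype: check before installing there"
fi
```

Pointing it at a directory on the NTFS drive would be expected to take the second branch.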

It seems like you’ve managed to pretty much make the system unbootable, so you might as well start from scratch unless you’re familiar with how to recover.

Thanks!

Yes, you are right, i.e. the system became unbootable, with:

  • a “reset hell” loop (which is not good for any hardware)!
  • a “kernel in panic” loop!
  • a massive interrupt loop with the nouveau driver, and thereby the kernel, unable to handle interrupts!
  • problems having NVidia GTX 750 Ti card as graphics card whatever the PCI slot!
  • GPU computing problems with NVidia 750 Ti as GPU card (no displays connected) whatever the PCI slot!

Yes, NTFS volumes can be a challenge even with Linux drivers!
~ however, they were the only ones available at the moment

Recovery

  • did not succeed in any way in this case!
  • Scientific Linux 6.5 recovery DVDs did not work in this case!

Hard disk actions:

  • a new 4TB hard disk partitioned with the SL 6.5 parted program
  • partitions formatted to EXT4 with the SL 6.5 mkfs.ext4 program
    ~ dedicated for Linux only

NVidia 750 Ti card actions:

  • proved to work with the latest working NVidia drivers, tools and applications if the OS kernel is not in panic
  • proved to work both as a GPU-computing card and as a graphics card!

Installed:
~ a new OS

Status:
~ everything works!

Exploring:
~ pure GPU computing cards without anything extra
~ compatibility with existing hardware - new driver bundles tend to drop support for well-working hardware that still serves a lot of apps!
~ a rugged OS capable of avoiding panic
~ tracing tools for the whole life-cycle - boot and installation phases included!

MANY THANKS FOR HELP!