verify2install4CUDAtoolkit

dragonxi4nvidia · March 26, 2014, 11:07am

Question: how to verify CUDA toolkit installation after it has completed?

cuda_6.0.26_rc_linux64.run ← downloaded

$lspci | grep -i nvidia ← NVIDIA Device 1980(rev a2) ← why not GeForce GTX 750 Ti ?

NVIDIA X Server Settings
GPU 0 - (Quadro FX 1800) ← OK
GPU 1 - (GeForce GTX 750 Ti) ← OK

$uname -m && cat /etc/*release ← OK

x86_64 ← OK
Scientific Linux release 6.5 (Carbon) ← OK ? (RHEL 6 compatible)

$gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4) ← OK

$md5sum cuda_6.0.26_rc_linux64.run ← OK

$echo $LD_LIBRARY_PATH ← empty, why ?

$ ls /dev/nvidia* ← /dev/nvidia0 /dev/nvidia1 /dev/nvidiactl
These files are used by the CUDA Driver to communicate with the kernel-mode portion of the NVIDIA Driver.

$ cat /proc/driver/nvidia/version ← 334.21
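The manual checks above can be collected into one script. This is only a minimal sketch: `report_check` and `verify_cuda_install` are hypothetical helper names, and it assumes that `nvidia-smi` and `nvcc` are expected in the PATH once the driver and toolkit are installed.

```shell
#!/bin/bash
# report_check LABEL COMMAND...: run COMMAND quietly and print OK or MISSING.
# (hypothetical helper, not part of any NVIDIA tool)
report_check() {
    local label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OK: $label"
    else
        echo "MISSING: $label"
    fi
}

# Check the artifacts mentioned in this post, plus the toolkit compiler.
verify_cuda_install() {
    report_check "device node /dev/nvidiactl" test -e /dev/nvidiactl
    report_check "device node /dev/nvidia0" test -e /dev/nvidia0
    report_check "driver version file" test -r /proc/driver/nvidia/version
    report_check "nvidia-smi in PATH" command -v nvidia-smi
    report_check "nvcc in PATH" command -v nvcc
}

verify_cuda_install
```

A MISSING line for nvcc usually just means the toolkit's bin directory is not in the PATH yet; it does not prove that the installation failed.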

dragonxi4nvidia · March 28, 2014, 7:00am

[ACK] checked /usr/lib64 ← have cuda files there such as libcuda.so libcuda.so.334.21

[ACK] unpacked each demo having its own directory structure
Note: MYDISK is not the same as Linux system disk ← is this too difficult for demos ?

[NACK] Test1 in /media/MYDISK/sw/nvidia/cuda/toolkit/demo:
$ cd simplemultigpu
$ cd "NVIDIA GPU Computing SDK"
$ cd OpenCL
$ make

make[1]: Entering directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/OpenCL/common'
ar: creating ../..//OpenCL//common//lib/liboclUtil_x86_64.a
a - obj/release/oclUtils.cpp.o
make[1]: Leaving directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/OpenCL/common'
make[1]: Entering directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/shared'
make[1]: *** No targets specified and no makefile found. Stop.
make[1]: Leaving directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/shared'
make: *** [shared/libshrutil.so] Error 2

[NACK] Test 2 in /media/MYDISK/sw/nvidia/cuda/toolkit/demo:
cd bandwidth
cd "NVIDIA GPU Computing SDK"
cd OpenCL
make
make[1]: Entering directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/bandwidth/NVIDIA GPU Computing SDK/OpenCL/common'
ar: creating ../..//OpenCL//common//lib/liboclUtil_x86_64.a
a - obj/release/oclUtils.cpp.o
make[1]: Leaving directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/bandwidth/NVIDIA GPU Computing SDK/OpenCL/common'
make[1]: Entering directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/bandwidth/NVIDIA GPU Computing SDK/shared'
make[1]: *** No targets specified and no makefile found. Stop.
make[1]: Leaving directory '/media/MYDISK/sw/nvidia/cuda/toolkit/demo/bandwidth/NVIDIA GPU Computing SDK/shared'
make: *** [shared/libshrutil.so] Error 2

[NACK] make fails in all other demos too

[NACK] unresolved:
1) targets: what kind of target ? to which program ? how to specify ?
2) makefile: where did it try to search ? where to create ?

vacaloca · March 29, 2014, 4:56pm

That lspci does not show the card name is normal. nvidia-smi shows the full name of the card(s) if the driver is loaded.

Set LD_LIBRARY_PATH in your ~/.bashrc file according to the instructions of the CUDA installer, assuming you use the bash shell. Open a new shell before testing again.

Run make in the root folder of the SDK, not in subfolders. That will fix the problem.
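As a rough illustration of the ~/.bashrc settings, here is a minimal sketch. It assumes the toolkit went into /usr/local/cuda-6.0 (adjust this if you picked another prefix), and `prepend_once` is a hypothetical helper that keeps a directory from being added twice when the file is sourced again.

```shell
# prepend_once DIR LIST: print LIST with DIR in front, unless DIR is already in it.
# (hypothetical helper; LIST is a colon-separated search path)
prepend_once() {
    local dir=$1 list=$2
    case ":$list:" in
        *":$dir:"*) echo "$list" ;;
        *) echo "${dir}${list:+:$list}" ;;
    esac
}

# Assumed install prefix of the CUDA 6.0 toolkit.
CUDA_HOME=/usr/local/cuda-6.0

PATH=$(prepend_once "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

The helper also avoids leaving a trailing colon when LD_LIBRARY_PATH starts out empty, since an empty entry in that list is treated as the current directory.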

dragonxi4nvidia · April 1, 2014, 10:10am

Thanks !
(1/33) looks like I don’t have Makefile on the root of the SDK ?
trial in /media/MySecretDisk/sw/nvidia/cuda/toolkit/demo with scientific linux 6.5:
$ make
make: *** No targets specified and no makefile found. Stop.

(2/33) all demos were unpacked to different device - should they be in system HD such as under /usr ?
/media/MySecretDisk/sw/nvidia/cuda/toolkit/demo

Example:
/media/MySecretDisk/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/OpenCL/src
simpleMultiGPU.cl ← __kernel void reduce
oclSimpleMultiGPU.cpp ← main program
#include <oclUtils.h> ← a. include needed OK
#include <shrQATest.h> ← b. include needed OK
a) /media/MySecretDisk/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/OpenCL/common/inc/oclUtils.h
b) /media/MySecretDisk/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/shared/inc/shrQATest.h

Makefile:
/media/MySecretDisk/sw/nvidia/cuda/toolkit/demo/simplemultigpu/NVIDIA GPU Computing SDK/OpenCL/src/oclSimpleMultiGPU/Makefile
PROJECTS := $(shell find src -name Makefile)

ifeq ($(dbg),1)
BINDIR := bin/linux/debug/ ← (3/33) where is bin/linux/ - found neither in root nor in /usr ?
else
BINDIR := bin/linux/release/
endif
…

runall:
$(SHELL) common/runall.sh $(BINDIR) ← (4/33) where is runall.sh ?

download check (5)
ea1e1235ebc17dc8e2ca2bec3febde52 cuda_6.0.26_rc_linux64.run ← checksum ok, note version is for cuda_6.0!
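The checksum comparison can also be scripted instead of compared by eye. A minimal sketch, where `verify_md5` is a hypothetical helper name and the expected digest is the one quoted above:

```shell
# verify_md5 FILE EXPECTED: succeed only if the MD5 digest of FILE equals EXPECTED.
# (hypothetical helper built on the standard md5sum tool)
verify_md5() {
    local file=$1 expected=$2 actual
    actual=$(md5sum "$file" 2>/dev/null | awk '{ print $1 }')
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage, run in the download directory:
# verify_md5 cuda_6.0.26_rc_linux64.run ea1e1235ebc17dc8e2ca2bec3febde52 && echo "checksum ok"
```

A missing or unreadable file counts as a failure here, so a typo in the file name cannot pass as a match.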

environment setting command for installation
/media/MySecretDisk/DOWNLOAD/NVidia/set2env4cuda.sh ← came with download
export PATH=/usr/local/cuda-5.5/bin:$PATH ← (5/33) *** cuda-5.5. does not exist, why ?
export LD_LIBRARY_PATH=/usr/local/cuda-5.5/lib64:$LD_LIBRARY_PATH ← (6/33) *** this will also fail, why cuda-5.5 for cuda_6.0 ?
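One way to avoid a hard-coded version such as cuda-5.5 in the script is to look up which toolkit directory really exists. A minimal sketch, assuming the toolkit was installed under /usr/local with the usual cuda-X.Y directory naming and that GNU `sort -V` is available; `newest_cuda_dir` is a hypothetical helper name.

```shell
# newest_cuda_dir PREFIX: print the highest-versioned PREFIX/cuda-X.Y directory,
# or nothing if there is none. (hypothetical helper)
newest_cuda_dir() {
    local prefix=${1:-/usr/local}
    ls -d "$prefix"/cuda-[0-9]* 2>/dev/null | sort -V | tail -n 1
}

CUDA_HOME=$(newest_cuda_dir /usr/local)
if [ -n "$CUDA_HOME" ]; then
    PATH=$CUDA_HOME/bin:$PATH
    LD_LIBRARY_PATH=$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    export PATH LD_LIBRARY_PATH
else
    echo "no CUDA toolkit directory found under /usr/local" >&2
fi
```

If the toolkit was installed to a different prefix, pass that prefix to the helper instead of /usr/local.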

(8/33) OS is scientific linux 6.5 (RHEL 6 compatible)

vacaloca · April 1, 2014, 2:09pm

It looks like you did not extract the separate OpenCL samples in the same folder where the rest of the CUDA samples were installed… that’s why you do not find some of the required files. It also seems that NVIDIA did not update the OpenCL sample makefiles to work correctly under >= CUDA 5.5 release, and there seems to be at least 1 Makefile missing that would go in the shared directory. It would be similar to this:

http://projets2a.iutlan.etu.univ-rennes1.fr/Projet041213/browser/tools/nvidia-gpu-computing-sdk-4.1/shared/Makefile

Once you resolve that, you might still have to point the makefiles to the include path of the CUDA 5.5 or 6.0 directories as well… for example, it was trying to look for exception.h, after I added the missing makefile which is part of CUDA 5.5/6.0 toolkit. Even after I got through that it was still expecting another file that it couldn’t locate…

If you really want to have the OpenCL samples work on the first try, download the last SDK that contains them… I believe that’s 4.2… and then once you’re familiar with Linux and compilation tools you can create your own with CUDA 5.5 or 6.0 if need be. Regardless, NVIDIA has not updated OpenCL past 1.0 version… they are not very interested in supporting OpenCL, so unless you specifically need OpenCL, work with CUDA instead.
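To see up front which directories would trigger the "no makefile found" error, a check like the following may help. It is a minimal sketch: `dirs_without_makefile` is a hypothetical helper, and it relies only on the fact that GNU make looks for GNUmakefile, makefile and Makefile in the directory it is run in.

```shell
# dirs_without_makefile ROOT SUBDIR...: print each ROOT/SUBDIR in which GNU make
# would find no makefile. (hypothetical helper)
dirs_without_makefile() {
    local root=$1 sub name found
    shift
    for sub in "$@"; do
        found=no
        for name in GNUmakefile makefile Makefile; do
            if [ -f "$root/$sub/$name" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            echo "$root/$sub"
        fi
    done
}

# Usage, with the two sub-directories that make entered in the log above:
# dirs_without_makefile "NVIDIA GPU Computing SDK" OpenCL/common shared
```

Every path printed is a directory that still needs a makefile (or the full sample tree extracted over it) before the top-level build can pass.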

dragonxi4nvidia · April 1, 2014, 3:38pm

Thanks !

The main goal was to test the NVidia 750 Ti card by installing that new toolkit and demos.

At the moment the workstation boots but fails to reach the login stage!

Unfortunately I did the following (trace manually written down here):

changed environment settings according to this new 6.0 version
created folders ready for the new 6.0 version installation
changed installation parameters to use /media/MYDRIVE/ disk
- installation started to install files into that drive
- I forgot that this drive is NOT automatically mounted in boot !
  ~ the drive has NTFS, it has been used by Win7 before it was replaced with SL 6.5
  but it has worked fine with Linux after I installed software to handle NTFS in Linux
  ~ I created all folders for installation using Linux
  and also checked that my script refers to right folders
continued installation although it gave the following warning (I recommend stopping here)!
** could not find a working chcon argument - Defaulting to shlib_t ** ← assumed this will be ok!
answered yes to "add a file in the modprobe configuration"
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf
got warning
** unable to set security context on file
/media/MYDRIVE/tmp/install/nvidia/cuda/toolkit/nv …
assuming new tls ***
got error
*** failed to execute '/usr/bin/chcon -t shlib_t
//media/MYDRIVE/sw/nvidia/opengl/lib64/ …
*** failed to change context of above
continued assuming maybe OpenGL will be the only failing part
hit several times problems with '/usr/bin/chcon -t shlib_t'
interrupted installation
interrupted installation
booted - sl 6.5 did not come to login anymore!
booted from DVD and selected Rescue!
booted from HD - got old days
*** kernel in panic ***
** interruption problems with nouveau drivers
hit reset
booted from HD with SHIFT, edited <root rhgb quiet nomodeset ← this worked before this new installation
i.e. system booted ok,
NVIDIA FX 1800 served two monitors OK and GTX 750 Ti was recognized by NVIDIA
status: stuck at the grub> prompt; unresolved is how to proceed, and whether it would help to
a) mount the device, where the new NVIDIA software is located (assuming that is the problem) ?
b) use VESA drivers instead of NVidia ones (if that is the solution) ?
d) reinstall SL 6.5 (ultimate nightmare!)
e) something else, what ?

Surfing using Windows 7 workstation while Linux workstation is in GRUB state

ANY tips more than welcome !

Thanks in advance!

vacaloca · April 3, 2014, 5:44am

A few days ago I glanced through this and couldn’t make any sense of it… now I finally understand the craziness. While you’re able to read/write NTFS volumes on Linux with the right drivers, for anything related to ownership, file permissions and security contexts (installers and such) it’s definitely not a good idea to use NTFS volumes, which is the reason for the errors you found with chcon (change SELinux security context).

It seems like you’ve managed to pretty much make the system unbootable, so you might as well start from scratch unless you’re familiar with how to recover.
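Before running an installer against a target directory, its filesystem type can be checked first. This is a minimal sketch under stated assumptions: `fs_type` and `supports_unix_metadata` are hypothetical helper names, `df -PT` is the GNU df form, and the list of accepted filesystems is an assumption, not an exhaustive one. An NTFS volume mounted through ntfs-3g typically shows up as type fuseblk.

```shell
# fs_type PATH: print the type of the filesystem that holds PATH (GNU df).
fs_type() {
    df -PT "$1" 2>/dev/null | awk 'NR == 2 { print $2 }'
}

# supports_unix_metadata TYPE: succeed for filesystems assumed to store Unix
# ownership, permissions and SELinux labels natively (assumed, incomplete list).
supports_unix_metadata() {
    case $1 in
        ext2|ext3|ext4|xfs|btrfs) return 0 ;;
        *) return 1 ;;
    esac
}

target=/usr/local   # assumed install target; change to the directory you plan to use
fstype=$(fs_type "$target")
if supports_unix_metadata "$fstype"; then
    echo "$target is on $fstype: fine for an installer"
else
    echo "$target is on '${fstype:-unknown}': do not install drivers or toolkits there" >&2
fi
```

The same check is worth running on any temporary directory handed to the installer, since the trace above shows files being unpacked under /media/MYDRIVE/tmp as well.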

dragonxi4nvidia · April 4, 2014, 6:56am

Thanks!

Yes, you are right, i.e. the system became unbootable, with

“reset hell” loop (which is not good for any hardware)!
“kernel in panic” loop!
massive interrupt loop with the nouveau driver, leaving it, and thereby the kernel, unable to handle interrupts!
problems having NVidia GTX 750 Ti card as graphics card whatever the PCI slot!
GPU computing problems with NVidia 750 Ti as GPU card (no displays connected) whatever the PCI slot!

Yes, NTFS volumes can be a challenge even with Linux drivers!
~ however, they were the only ones available at the moment

Recovery

did not succeed in any way in this case!
Scientific Linux 6.5 recovery DVDs did not work in this case!

Hard disk actions:

a new 4TB hard disk partitioned with SL 6.5 parted-program
partitions formatted to EXT4 with Linux SL 6.5 and mkfs.ext4-program
~ dedicated for Linux only

NVidia 750 Ti card actions:

proved to work with the latest working NVidia drivers, tools and applications if the OS kernel is not in panic
proved to work both as a GPU-computing card and as a graphics card!

Installed:
~ a new OS

Status:
~ everything works!

Exploring:
~ pure GPU computing cards without anything extra
~ compatibility with existing hardware - new driver bundles tend to stop supporting good working hardware still serving a lot of apps!
~ rugged OS capable of avoiding panic
~ tracing tools for the whole life-cycle - boot and installation phases included!

MANY THANKS FOR HELP!
