nvidia-smi on the host EC2 instance:

nvidia-smi
Fri May 18 18:28:44 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   81C    P0   136W / 149W |   9730MiB / 11441MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3533      C   python                                        64MiB |
|    0      3721      C   python2                                      112MiB |
|    0      3869      C   python2                                      115MiB |
|    0     12002      C   python2                                      112MiB |
|    0     15372      C   python2                                      115MiB |
|    0     31973      C   /usr/local/bin/caffe                        9146MiB |
+-----------------------------------------------------------------------------+
nvidia-smi via nvidia-docker2:

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
Fri May 18 18:30:37 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   82C    P0   150W / 149W |   9730MiB / 11441MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
All of which seems fine... yet when I try to run DeepStream, I get CUDA error 100 from within the container.
I also tried installing the 384 driver, but got an error on reboot, so I reinstalled the 390 driver. However, nvidia-smi run from my own container reports a driver/library version mismatch, so perhaps there is a compatibility issue with 390?
nvidia-docker run --rm -it --name test test nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
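For reference, this is the kind of check I'd use on the host to see whether the loaded kernel module and the user-space libraries actually agree after the 384 -> 390 reinstall (just a sketch; the package name assumes the standard Ubuntu nvidia-390 packages I installed):

# Version of the NVIDIA kernel module currently loaded
# (a stale module from before the reinstall persists until reboot)
cat /proc/driver/nvidia/version

# Version of the user-space driver packages installed on the host
dpkg -l | grep -i nvidia-390

# Version reported through NVML itself
nvidia-smi --query-gpu=driver_version --format=csv,noheader

If the module and library versions differ, a reboot (or unloading and reloading the nvidia modules) usually clears the NVML mismatch.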
That mismatch appears when I run nvidia-smi directly as the container command, but if I use bash as the entrypoint and then run nvidia-smi inside, it "seems" to work:
nvidia-docker run --rm -it --name test --entrypoint=/bin/bash -v /usr/lib/nvidia-390:/usr/lib/nvidia-390 test
root@4bc325d55d3e:/# nvidia-smi
Fri May 18 18:39:32 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   81C    P0   138W / 149W |   9730MiB / 11441MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
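As an aside, my understanding is that nvidia-docker2 is meant to inject the driver libraries itself rather than needing the manual -v /usr/lib/nvidia-390 bind mount, provided the container asks for the right capabilities. Something like the following is what I believe the intended invocation looks like (a sketch; whether deepstream:dev already sets these NVIDIA_* variables in its Dockerfile is an assumption on my part):

# Let the nvidia runtime mount the driver libraries itself; the "video"
# capability should also bring in the Video Codec libraries (libnvcuvid etc.)
docker run --runtime=nvidia --rm -it \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video \
  --entrypoint=/bin/bash deepstream:dev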
So then I go into my container and make the app, and I'm back to the missing nvcuvid, even though it compiled fine above, so something is still missing as usual. It either builds successfully and then gives CUDA error 100 when running the sh file, or it apparently can't find nvcuvid.
nvidia-docker run --rm -it --name test --entrypoint=/bin/bash deepstream:dev
root@6135622cda12:/# cd /opt/deepstream/samples/
root@6135622cda12:/opt/deepstream/samples# make
make[1]: Entering directory '/opt/deepstream/samples/decPerf'
Compiling: nvCuvidPerf.cpp
Linking: ../bin/sample_decPerf
/usr/bin/ld: cannot find -lnvcuvid
collect2: error: ld returned 1 exit status
Makefile.sample_decPerf:58: recipe for target '../bin/sample_decPerf' failed
make[1]: *** [../bin/sample_decPerf] Error 1
make[1]: Leaving directory '/opt/deepstream/samples/decPerf'
make[1]: Entering directory '/opt/deepstream/samples/nvDecInfer_classification'
Compiling: nvDecInfer.cpp
Linking: ../bin/sample_classification
/usr/bin/ld: cannot find -lnvcuvid
collect2: error: ld returned 1 exit status
Makefile.sample_classification:58: recipe for target '../bin/sample_classification' failed
make[1]: *** [../bin/sample_classification] Error 1
make[1]: Leaving directory '/opt/deepstream/samples/nvDecInfer_classification'
make[1]: Entering directory '/opt/deepstream/samples/nvDecInfer_detection'
Compiling: presenterGL.cpp
Compiling: main.cpp
Compiling: drawBbox.cu
Linking: ../bin/sample_detection
/usr/bin/ld: cannot find -lnvcuvid
collect2: error: ld returned 1 exit status
Makefile.sample_detection:87: recipe for target '../bin/sample_detection' failed
make[1]: *** [../bin/sample_detection] Error 1
make[1]: Leaving directory '/opt/deepstream/samples/nvDecInfer_detection'
Makefile:14: recipe for target 'all' failed
make: *** [all] Error 2
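The next thing I plan to check is whether any copy of libnvcuvid is actually present inside the container, and whether the linker is just missing the unversioned .so symlink. Roughly (a sketch run inside the container; the path in the ln command is only an example, since the real location depends on where the mount puts the libraries):

# Look for any libnvcuvid the runtime or the bind mount made available
find / -name 'libnvcuvid*' 2>/dev/null

# The linker wants an unversioned libnvcuvid.so; if only libnvcuvid.so.1 is
# present, adding a symlink next to it is a workaround I've seen suggested
ln -s /usr/lib/nvidia-390/libnvcuvid.so.1 /usr/lib/nvidia-390/libnvcuvid.so
ldconfig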