Hi,
I am trying to run the Chroma container to test our new C4140, which comes with 4 x V100 GPUs. I have installed CUDA 10.0 on the host machine in the usual way, but the run fails:
$ mpirun -n 4 chroma -i ./test.ini.xml -geom 1 1 1 4 -ptxdb ./qdpdb
There is no device supporting CUDA.
There is no device supporting CUDA.
There is no device supporting CUDA.
There is no device supporting CUDA.
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[58320,1],0]
Exit code: 1
--------------------------------------------------------------------------
$ nvidia-smi
Thu Feb 28 14:38:45 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:1A:00.0 Off |                    0 |
| N/A   37C    P0    55W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:1C:00.0 Off |                    0 |
| N/A   34C    P0    53W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:1D:00.0 Off |                    0 |
| N/A   33C    P0    54W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:1E:00.0 Off |                    0 |
| N/A   35C    P0    55W / 300W |      0MiB / 16130MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
As you can see, the container seems to be finding the devices OK. Any comments are welcome.
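One way to narrow this down might be to ask the CUDA driver API directly, from inside the container, how many devices it reports. The following is only a minimal sketch and not anything taken from Chroma: the function name `cuda_device_count` is hypothetical, and it assumes Python 3 is available in the container and that the driver library is exposed under the name `libcuda.so.1`.

```python
import ctypes


def cuda_device_count(lib_name="libcuda.so.1"):
    """Return the device count reported by the CUDA driver API.

    Returns None if the driver library cannot be loaded or initialised.
    The default library name is an assumption about the container setup.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        # The driver library is not visible, e.g. not mounted into the container.
        return None

    # cuInit(0) has to succeed before any other driver call; 0 means CUDA_SUCCESS.
    if libcuda.cuInit(0) != 0:
        return None

    count = ctypes.c_int(0)
    if libcuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return None
    return count.value


if __name__ == "__main__":
    n = cuda_device_count()
    if n is None:
        print("CUDA driver library could not be loaded or initialised")
    else:
        print(f"{n} CUDA device(s) visible to the driver API")
```

If this reports no usable driver or zero devices while nvidia-smi lists all four GPUs, that would suggest the problem lies in how the driver is exposed to the container rather than in Chroma itself.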
Cheers,
Derrick