Hello, I have two GPUs, both Tesla V100, running on Ubuntu 18.04.2 LTS with the following versions: NVIDIA-SMI 410.104, Driver Version 410.104, CUDA Version 10.0.
When running nvidia-smi I can no longer see both GPUs; instead I can only see GPU:0.
I noticed this error when running: $ dmesg | grep NVRM
[2956988.964627] NVRM: rm_init_adapter failed for device bearing minor number 0
[2956993.146123] NVRM: RmInitAdapter failed! (0x24:0x65:1090)
[2956993.146166] NVRM: rm_init_adapter failed for device bearing minor number 0
[2956997.148541] NVRM: RmInitAdapter failed! (0x24:0x65:1090)
[2956997.148579] NVRM: rm_init_adapter failed for device bearing minor number 0
[2957001.332258] NVRM: RmInitAdapter failed! (0x24:0x65:1090)
[2957001.332295] NVRM: rm_init_adapter failed for device bearing minor number 0
[2957005.545416] NVRM: RmInitAdapter failed! (0x24:0x65:1090)
I don’t know what happened. I was running Docker on that GPU using the following command: sudo docker run -p 9999:8080 --runtime nvidia --env NVIDIA_VISIBLE_DEVICES="1" mltooling/ml-workspace-gpu:latest
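
In case it helps anyone reproduce this, here is a rough sketch for condensing the repeated NVRM messages into a count per device minor number. The function name summarize_nvrm is my own and is not part of any NVIDIA tool; it assumes the log lines have the same wording as the dmesg output above.

```shell
# Count "rm_init_adapter failed" messages per device minor number.
# Reads kernel log text on stdin; summarize_nvrm is a hypothetical helper name.
summarize_nvrm() {
  grep 'NVRM: rm_init_adapter failed' \
    | sed -E 's/.*minor number ([0-9]+).*/\1/' \
    | sort -n \
    | uniq -c \
    | awk '{ printf "minor %s: %s failures\n", $2, $1 }'
}

# Usage (may need sudo depending on kernel log permissions):
#   dmesg | summarize_nvrm
```

This only groups the messages; it does not say anything about why the adapter failed to initialize.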