You can copy the log from the terminal and then upload it as a txt file.
Please uninstall the driver.
$ sudo apt purge nvidia-driver-525
$ sudo apt autoremove
$ sudo apt autoclean
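Before re-running the installer, it may help to confirm that the purge really removed the driver packages. The snippet below is only a sketch: installed_nvidia_packages is a hypothetical helper name, not part of setup.sh, and it assumes the usual `dpkg -l` column layout (status in the first column, package name in the second).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of setup.sh): given `dpkg -l` output on stdin,
# print every package that is still fully installed (status "ii") and has
# "nvidia" in its name. No output means no such package is left.
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /nvidia/ { print $2 }'
}

# Usage on the affected machine:
#   dpkg -l | installed_nvidia_packages
```

If this still prints package names after the purge, those packages can be purged in the same way before moving on.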
Then run the commands below:
$ bash setup.sh check-inventory.yml
$ bash setup.sh install
And share the logs.
During the TASK [Waiting for the Cluster to become available] stage, I found that I could not remove the nvidia kernel module with rmmod.
When I ran the command sudo rmmod nvidia, I got this error message:
rmmod: ERROR: Module nvidia is in use
How can I deal with this problem?
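For anyone hitting the same "Module nvidia is in use" message: rmmod refuses to unload a module while other kernel modules depend on it or while processes keep its device files open. The sketch below reads lsmod output and lists the modules named in the "Used by" column for nvidia. module_holders is a hypothetical helper name, and the sketch assumes the standard three-column lsmod layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given lsmod-style output on stdin, print the kernel
# modules listed in the "Used by" column for the module named in $1.
module_holders() {
    local target="$1"
    awk -v target="$target" '
        $1 == target {
            # The fourth field is a comma-separated list of dependent modules.
            n = split($4, holders, ",")
            for (i = 1; i <= n; i++) if (holders[i] != "") print holders[i]
        }
    '
}

# Usage on the affected machine:
#   lsmod | module_holders nvidia
#   sudo lsof /dev/nvidia*    # user-space processes holding the GPU devices open
```

Any modules it prints (typically names such as nvidia_uvm or nvidia_modeset) would have to be unloaded before nvidia itself, and any process shown by lsof would have to be stopped first. In a cluster setup, those processes may belong to GPU-related pods.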
The picture below shows the error log from the TASK [Waiting for the Cluster to become available] stage.
The picture below shows the last components related to the driver.
It gets stuck here, right?
Exactly, I still get stuck here.
Please open a new terminal and run the command below.
$ kubectl delete crd clusterpolicies.nvidia.com
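Since deleting a CRD also deletes every custom resource of that type, it can be worth checking what exists before running the delete. The following is a minimal sketch. has_crd is a hypothetical helper name, and it relies only on the `kind/name` lines printed by `kubectl get crd -o name`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given `kubectl get crd -o name` output on stdin,
# succeed only if the CRD named in $1 is present (exact, literal match).
has_crd() {
    grep -qxF "customresourcedefinition.apiextensions.k8s.io/$1"
}

# Usage on the cluster:
#   if kubectl get crd -o name | has_crd clusterpolicies.nvidia.com; then
#       kubectl get clusterpolicies.nvidia.com   # list instances before deleting
#   fi
```

Listing the instances first makes it easier to see afterwards which cluster components were tied to the deleted resource.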
If I do that, it terminates the GPU driver and the GPU-related network plugin, and they disappear.
Is there any way to deal with this?
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks
This command does not affect the driver.