Hope all is well.
I have been unable (for a few days now) to successfully configure my Lenovo P53 with Ubuntu 20.04 and my AORUS RTX 2080 Ti Gaming Box.
My Lenovo P53 has two internal GPUs, a Quadro T1000 and Intel UHD 6XX. I am trying to use the RTX 2080 Ti for machine learning compute only. I don’t have a preference which card drives my display, as long as I have a display. I have searched for and tried many suggestions, such as egpu-switcher, gswitch, and even manually editing xorg.conf files. However, nothing I’ve tried has worked. The usual result is a login loop, a black screen, or a complete freeze about 30 seconds after logging in.
Really appreciate the help. I attached the output from nvidia-bug-report.
I did as you suggested. I completed a fresh install, used Software & Updates to install the latest NVIDIA drivers, went into Settings and changed the PRIME profile to On-Demand mode, shut down, connected the eGPU, and booted. It got stuck before the login screen with: “nvidia XXXX:XX:XX.X: can’t change power state from D3cold to D0 (config space inaccessible)”.
I powered down, went into the BIOS, and changed Display from Hybrid to Discrete. With the eGPU still connected, it booted fine. However, nvidia-smi sometimes shows the eGPU and sometimes doesn’t, and I’m not sure what it means when it does show it. Thunderbolt also sometimes shows it as authorized and sometimes as disconnected. Unplugging and replugging the power and the cable occasionally gets it connected, but not always.
Oh actually – I’m not even sure that it doesn’t work. What would be a good test to try, to see if I can use the eGPU for compute purposes?
Any suggestions greatly appreciated.
nvidia-bug-report.log.gz (285.4 KB)
I installed RAPIDS cuDF to test my GPU functionality, but it uses my Quadro T1000 even though I’m targeting GPU 1.
Latest nvidia bug report attached.
nvidia-bug-report.log.gz (1.2 MB)
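One thing that may explain the "targeting GPU 1" confusion: nvidia-smi lists GPUs in PCI bus order, while CUDA by default uses its own "fastest first" ordering, so the two indices do not have to match. Below is a minimal sketch, not something posted in this thread, that looks up the eGPU's nvidia-smi index by name and pins CUDA to it. The helper names (parse_gpu_list, find_gpu, select_gpu) and the "2080" name fragment are assumptions for illustration; the nvidia-smi query fields and the CUDA_DEVICE_ORDER / CUDA_VISIBLE_DEVICES environment variables are the standard ones.

```python
import os
import subprocess


def parse_gpu_list(csv_text):
    """Parse 'index, name, pci.bus_id' CSV rows (no header) into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # skip anything that is not a three-column row
        index, name, bus_id = parts
        gpus.append({"index": int(index), "name": name, "bus_id": bus_id})
    return gpus


def find_gpu(gpus, name_fragment):
    """Return the first GPU whose name contains name_fragment, else None."""
    for gpu in gpus:
        if name_fragment.lower() in gpu["name"].lower():
            return gpu
    return None


def query_gpus():
    """Ask nvidia-smi for the GPUs the driver currently sees."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,pci.bus_id",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_list(result.stdout)


def select_gpu(name_fragment="2080"):
    """Pin CUDA to the matching GPU; returns it, or None if not visible."""
    gpu = find_gpu(query_gpus(), name_fragment)
    if gpu is not None:
        # Make CUDA indices follow PCI bus order, as nvidia-smi does,
        # then expose only the chosen GPU to this process.
        os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu["index"])
    return gpu
```

select_gpu() would have to run before the first import of any CUDA library (cuDF included), since the environment variables are only read when CUDA initializes. If it returns None, the driver does not see the eGPU at that moment, which is a connection problem rather than a device-selection one.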
You’re getting this:
NVRM: The NVIDIA GPU 0000:2f:00.0
NVRM: (PCI ID: 10de:1e04) installed in this system has
NVRM: fallen off the bus and is not responding to commands.
Afterwards, the eGPU is detected again and falls off again some 30 seconds later. That matches your observations: sometimes it’s there, sometimes not.
Looks like a flaky cable connection.
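To see how often this happens after swapping a cable or port, the kernel log can be scanned for that NVRM message. A rough sketch, assuming the message keeps the three-line layout quoted above; find_bus_drops is a made-up helper name, and the log text would come from something like the output of dmesg or journalctl -k saved to a file.

```python
import re

# Matches the first line of the message and captures the PCI address,
# e.g. "NVRM: The NVIDIA GPU 0000:2f:00.0".
GPU_LINE = re.compile(
    r"NVRM: The NVIDIA GPU "
    r"([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F])"
)


def find_bus_drops(log_text):
    """Return the PCI address of each 'fallen off the bus' report, in order."""
    lines = log_text.splitlines()
    dropped = []
    for i, line in enumerate(lines):
        match = GPU_LINE.search(line)
        if not match:
            continue
        # The message spans up to three lines, so check a small window.
        window = " ".join(lines[i:i + 3])
        if "fallen off the bus" in window:
            dropped.append(match.group(1))
    return dropped
```

Counting the entries returned for one boot gives a rough measure of how unstable the link is, so a new cable, another port, or a BIOS update can be compared against the current state.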
@generix, makes sense, although I just bought a brand new cable. I had thought the flakiness was due to a bad cable, but I guess it could be a bad port… such a shame, my laptop is just out of warranty.
Google told me that model had a general problem with its USB-C ports, so maybe there’s some extended warranty because of that. Maybe try a BIOS update first.