Driver broken on Ubuntu Linux

Hello, I am running Ubuntu for mining and I have some problems. All 13 GPUs (RTX 2070) are detected and all are working (hashing). As you know, each GPU has 2 fans, which makes 26 fans in total, but the fans are working only on the first 6 GPUs, and on one further GPU only one fan is working. I then removed the GPUs whose fans were not working, along with the one that had a single fan working. When I powered on the rig with only the 6 GPUs that had worked, the fans ran only on the first 3 GPUs (6 fans) and not on the last 3. I think it is a driver problem: however many GPUs I install, only half of the fans work, and not one fan per GPU but both fans on the first half of the GPUs. I need a solution; I have tried everything.
P.S. I am using the latest driver that NVIDIA provides, installed using sudo install-nv-beta.
nvidia-bug-report.log.gz (1.11 MB)

Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.

Thanks for understanding. I have attached the .gz file for you. I think there are some driver problems.

I don’t know why, but the fans are not working because those GPUs simply have nothing to do; they’re idle. ethminer is only running on the GPUs where the fans are running. So there seems to be some problem distributing the application to the GPUs. Maybe check with the deviceQuery sample from the CUDA toolkit whether all the devices are visible.
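As a complement to deviceQuery, one quick way to see which GPUs are actually idle is to ask nvidia-smi for per-GPU utilization, temperature and fan speed. The following is only a minimal sketch: the helper names and the 5% idle threshold are assumptions made for illustration, and nothing here is taken from the logs attached to this thread.

```python
import csv
import io
import subprocess

# Properties requested from nvidia-smi via --query-gpu.
FIELDS = ["index", "name", "utilization.gpu", "temperature.gpu", "fan.speed"]


def query_gpus():
    """Run nvidia-smi and return its CSV output as text (needs the NVIDIA driver)."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_gpus(text):
    """Turn the CSV text into a list of dicts keyed by field name."""
    rows = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) == len(FIELDS):
            rows.append(dict(zip(FIELDS, (cell.strip() for cell in row))))
    return rows


def idle_gpus(rows, threshold=5):
    """Return indices of GPUs whose utilization is below `threshold` percent.

    The threshold is an arbitrary, hypothetical cut-off for "idle".
    """
    idle = []
    for gpu in rows:
        try:
            if int(gpu["utilization.gpu"]) < threshold:
                idle.append(int(gpu["index"]))
        except ValueError:
            # Skip rows where nvidia-smi reported a non-numeric placeholder.
            continue
    return idle
```

On the rig, `idle_gpus(parse_gpus(query_gpus()))` would list the GPUs that are not being given any work, which can then be compared against the cards whose fans stand still.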

The GPUs were working, but they overheated and stopped. If I restart the rig they start hashing again, but the fans on the ones you see as not working still do not spin. The hottest one, at 57C, is the one that has a single fan working.

The driver detects the 13 GPUs, but the problem is that it is controlling only 13 fans. Each GPU has 2 fans and these fans work independently, so half of the fans do not work. All 26 fans have to be controlled for the rig to work properly. I wrote yesterday that I installed only 6 GPUs and the fans worked only on the first 3; half of them were not working because the driver controlled 6 fans instead of 12. The GPUs behaved the same way as before: at first all 6 were hashing, but only 3 of them had working fans. Two minutes later the last 3 GPUs overheated and stopped, and after that the rig ran on the 3 GPUs with fully working fans.
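The arithmetic behind this observation can be written down as a small sketch. It assumes that fans are numbered consecutively, two per card, and that only as many fan indices as there are GPUs are being driven; both are assumptions used to model what was seen on the rig, not confirmed driver behaviour, and the function name is hypothetical.

```python
FANS_PER_GPU = 2  # each RTX 2070 card in this rig has two fans


def fan_coverage(num_gpus, fans_driven, fans_per_gpu=FANS_PER_GPU):
    """Return, for each GPU, how many of its fans fall inside the driven range.

    Assumes fan indices run consecutively (fans_per_gpu per card) and that
    only fan indices 0 .. fans_driven - 1 are being controlled.
    """
    coverage = []
    for gpu in range(num_gpus):
        first_fan = gpu * fans_per_gpu
        driven = max(0, min(fans_per_gpu, fans_driven - first_fan))
        coverage.append(driven)
    return coverage


# 13 GPUs, but only 13 of the 26 fans driven
print(fan_coverage(13, 13))  # [2, 2, 2, 2, 2, 2, 1, 0, 0, 0, 0, 0, 0]

# 6 GPUs, but only 6 of the 12 fans driven
print(fan_coverage(6, 6))    # [2, 2, 2, 0, 0, 0]
```

Under those assumptions the model reproduces both cases: six cards with both fans plus one card with a single fan on the 13-GPU rig, and three fully cooled cards on the 6-GPU rig.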

Ok, that explains it a bit more. Maybe create a new nvidia-bug-report.log when all gpus are hashing and the fans stand still after reboot and mail it to linux-bugs[at] for in-depth support.

I sent an email to that address a few days ago but I have had no reply. In that mail I described the problems and attached the new log as you suggested.

nvidia-bug-report (1).log.gz (1.17 MB)

It’s just Monday.
I noticed in the logs that the minimal OS you’re using ships a rather old Xserver 1.17, which seems to be hanging on start. This may be affecting fan control. Can you disable the Xserver start and run nvidia-persistenced instead to keep the GPUs initialized correctly?
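A rough sketch of that change is given here. It assumes a systemd-based Ubuntu install where X is started through a display-manager service, which may not hold for the minimal mining OS in use, so the command list is an assumption to adapt rather than a recipe. By default the helper (a hypothetical name) only prints what it would run.

```python
import subprocess

# Assumed commands for a systemd-based install; adjust them for the actual OS.
STEPS = (
    ("systemctl", "set-default", "multi-user.target"),  # boot to text mode, no X
    ("systemctl", "stop", "display-manager"),           # stop X now, if it is running
    ("nvidia-persistenced",),                           # keep the GPUs initialized
)


def apply_steps(steps=STEPS, dry_run=True):
    """Print each command; also execute it when dry_run is False (needs root)."""
    shown = []
    for cmd in steps:
        line = " ".join(cmd)
        print(line)
        if not dry_run:
            # check=False: keep going even if a step fails, e.g. no display manager
            subprocess.run(cmd, check=False)
        shown.append(line)
    return shown
```

Calling `apply_steps()` lists the commands without touching the system; `apply_steps(dry_run=False)` as root would actually run them. If the distribution already starts nvidia-persistenced through its own service, the last step is unnecessary.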