No objection to experimenting, but as a point of information, downgrading to a 525.xx driver will also require downgrading from CUDA 12.1 to CUDA 12.0.
I’d also like to point out the possible support avenues here:
- Contact the system vendor for support. It’s possible this is a hardware issue, and furthermore they have their own support path to NVIDIA.
- Purchase a support license that entitles you to NVIDIA enterprise support, such as a license to NVIDIA AI Enterprise. You should be able to purchase this from your system vendor.
- File a bug. If you file a bug, you will likely be asked for a set of steps that allow us to reproduce the observation. I don’t know how feasible that is; it may or may not impede progress.
- Use forums/community-based support, as you are doing here.
In addition to the two requests I have outstanding, I’d like to try to confirm that the nouveau driver is not somehow involved. Could you run the following command:
lsmod | grep nouveau
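
If that command prints nothing, the nouveau module is not loaded. For an explicit yes/no answer that is easier to paste back here, something like the following sketch may help. Note that `check_nouveau` is a hypothetical helper name, not an existing tool; it only assumes the usual `lsmod` layout with the module name in the first column.

```shell
#!/bin/sh
# Hypothetical helper (not a standard tool): reads lsmod-style output on
# stdin and reports whether a module named exactly "nouveau" is listed
# in the first column.
check_nouveau() {
    if awk '$1 == "nouveau" { found = 1 } END { exit !found }'; then
        echo "nouveau is loaded"
    else
        echo "nouveau is not loaded"
    fi
}

# On a live system (assumes lsmod is available):
#   lsmod | check_nouveau
```

Run `lsmod | check_nouveau` on the affected machine and post the single line it prints.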