Over several months of program development, we have hit a PCIe issue 7 times in total, on 7 different consoles. Six occurrences were on the slot for the RTX 5000 GPU, and only one was on the slot for the Dip card (a GE device used to acquire data from the X-ray detector), after a software crash/shutdown during scanning (while the GPU was working on recon). So far we have not found a way to reproduce this error.
For the last 3 occurrences we managed to collect logs on the systems where the error happened (Slot 4: RTX 5000), and found the following error messages in Xorg.0.log.old (not Xorg.0.log):
(EE) NVIDIA(GPU-1): Failed to initialize DMA.
(EE) NVIDIA(1): Failed to allocate push buffer
During startup, before the error message was shown, an abnormal reboot happened, and the following message did not appear in /var/log/messages before the reboot:
2019-08-27T07:51:22.243428+08:00 ct25 gdm-autologin]: pam_unix(gdm-autologin:session): session opened for user ctuser by (uid=0)
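For anyone who wants to check their own consoles for the same signature, here is a minimal sketch that scans Xorg log text for NVIDIA `(EE)` lines like the two quoted above. The function names (`find_nvidia_errors`, `scan_log_files`) are hypothetical, and the `/var/log/Xorg.0.log*` paths are an assumption about where the logs live on your system; adjust as needed.

```python
import re
from pathlib import Path

# Matches Xorg error lines emitted by the NVIDIA driver, e.g.
# "(EE) NVIDIA(GPU-1): Failed to initialize DMA."
NVIDIA_ERROR = re.compile(r"\(EE\)\s+NVIDIA\(([^)]*)\):?\s*(.*)")


def find_nvidia_errors(log_text):
    """Return (device, message) pairs for every NVIDIA (EE) line in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = NVIDIA_ERROR.search(line)
        if match:
            hits.append((match.group(1), match.group(2).strip()))
    return hits


def scan_log_files(paths):
    """Scan each existing file in paths; map file name to its list of hits."""
    results = {}
    for name in paths:
        path = Path(name)
        if path.is_file():
            text = path.read_text(errors="replace")
            results[str(path)] = find_nvidia_errors(text)
    return results


if __name__ == "__main__":
    # Assumed log locations; the errors in this report were in the .old file.
    candidates = ["/var/log/Xorg.0.log", "/var/log/Xorg.0.log.old"]
    for name, hits in scan_log_files(candidates).items():
        for device, message in hits:
            print(f"{name}: NVIDIA({device}): {message}")
```

Checking both files matters here because the errors only showed up in Xorg.0.log.old, which holds the log of the previous X server session.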
The following attachments are uploaded:
- A screenshot taken when the error was shown
- ct99-issue-… NVIDIA bug report from the console where the issue happened
- ct25-normal-… NVIDIA bug report from a console where the issue did not happen
The circumstances of the 7 occurrences were:
- The system crashed after an axial scan, then this issue appeared (error on the slot for the Dip card)
- During app installation, the error happened after OS installation (app not installed yet)
- During app installation, the error happened after app installation (app installed, but system not yet rebooted)
- (Details to be updated after checking with a colleague)
- After configuring the CT software, the system was rebooted as it required. During OS shutdown, the screen went black (nothing shown) and froze until the tester forcibly powered it off.
- The system was forcibly powered off twice
- The issue happened while executing the reboot HAST, which was being run to reproduce the issue