GT 1030 random freezes, Xid errors

I have a new GT 1030.

System: Ryzen 5 1400, 8 GB RAM, Dell 27-inch 4K monitor attached via DisplayPort
Linux: Arch Linux, linux-zen kernel (latest 4.13 series) and the latest NVIDIA driver from the Arch repository, running the latest GNOME desktop from the Arch repository.

Symptoms: X freezes for periods of time during which I cannot move the cursor. With Chromium, startup often gives an empty window and no site will render until it is closed and restarted, sometimes several times. dmesg shows many NVRM errors. I have also tried the stock kernel and the linux-ck kernel, and all give pretty much the same result.

Starting the GNOME session under Wayland gives considerably fewer errors, but I would like to run X for video rendering.

I see no way to attach my debug report, so I uploaded it to the cloud; here is a link to a recent report.

https://1drv.ms/u/s!AtFMs8CDoSxvmA03BReuKnfc7F-K

Thank you for any ideas on how to fix this.

glen.

I had to double check which forum this was posted in, microsoft onedrive, really? ^^

Anyways, do you still get the error with modesetting disabled?
That would be my first guess.
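
For anyone following along, here is a minimal Python sketch for checking whether modesetting is currently on. It assumes the driver exposes the setting at /sys/module/nvidia_drm/parameters/modeset (only present while the nvidia_drm module is loaded) and that the file holds Y or N; the function name is made up for illustration.

```python
from pathlib import Path

# Assumed sysfs location of the nvidia-drm "modeset" parameter; it only
# exists while the nvidia_drm kernel module is loaded.
MODESET_PARAM = Path("/sys/module/nvidia_drm/parameters/modeset")


def modeset_enabled(param_file=MODESET_PARAM):
    """Return True/False for the modeset flag, or None if it cannot be read."""
    try:
        value = Path(param_file).read_text().strip()
    except OSError:
        return None  # module not loaded, or file not readable
    return value in ("Y", "1")


if __name__ == "__main__":
    labels = {
        True: "modesetting enabled",
        False: "modesetting disabled",
        None: "nvidia_drm modeset parameter not found",
    }
    print(labels[modeset_enabled()])
```

If this reports "enabled", the setting usually comes from a kernel command line option or a modprobe configuration file, so those are the places to look when turning it off.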

edit: http://docs.nvidia.com/deploy/xid-errors/index.html#topic_5_3

XID 32: PBDMA Error
This event is logged when a fault is reported by the DMA controller which manages the communication stream between the NVIDIA driver and the GPU over the PCI-E bus. These failures primarily involve quality issues on PCI, and are generally not caused by user application actions.

Check if a new BIOS is available, then check if you’re affected by the Ryzen bug:
https://github.com/suaefar/ryzen-test

Thanks for the replies; sorry about OneDrive… Modesetting was only enabled after so much trouble in X, so that I could use Wayland.

The BIOS is the latest; it is an ASRock 350M Pro board.

I will check for the Ryzen bug now and reply back. Thanks for the help.

I disabled modesetting and got a big improvement. I also moved the card to the other slot, and things are a lot better, thanks. I made a small change to the kill-ryzen program so it would run on Arch, and that is not the problem: it ran for a long time at 100 percent on every core with no segfaults until I killed it. So for now this system is greatly improved. Thanks so much for the help.

Spoke too soon. After using it for a while I am seeing the same errors; in dmesg it is mostly errors 12, 32 and 69. If anyone has any other suggestions, I’d appreciate it. I don’t seem to have any issues on Windows 10. I did get one bluescreen when I installed the latest NVIDIA driver in place of the one Microsoft used at install, but since then it has worked fine.
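
To see which Xid codes dominate, the dmesg output can be tallied. This is a rough Python sketch, not anything from the driver itself: it assumes the kernel log lines look like "NVRM: Xid (PCI:0000:07:00): 32, ..." and the function name count_xids is made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "NVRM: Xid (PCI:0000:07:00): 32, Channel ID ..."
# The number right after the closing parenthesis and colon is the Xid code.
XID_RE = re.compile(r"NVRM: Xid \([^)]*\): (\d+)")


def count_xids(log_text):
    """Return a Counter mapping each Xid code to how often it appears."""
    return Counter(int(m.group(1)) for m in XID_RE.finditer(log_text))
```

One way to use it is to pass in the captured log text, for example count_xids(subprocess.check_output(["dmesg"], text=True)), and compare the counts before and after a change such as moving the card or swapping RAM.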

I may swap out the system RAM, as I have read that RAM can cause unusual errors on the Ryzen platform, though I did run memtest for 4 or 5 hours without error. I’ll run another NVIDIA bug report and post that.

Hm, could be a problem with power delivery, since the 1030 afaik pulls all of its power from the PCIe slot.
Does your bios allow for tampering with this? Alternatively you could try to make the GPU use less power and see if that brings you stability with "nvidia-smi -pl ".

To find out what the current power limit is, try running the nvidia-smi command in a console; it will output something like this:


| 2% 49C P0 40W / 216W | 197MiB / 8113MiB | 0% Default |

The PCIe slot should be able to provide 75W by specification and the 1030 should pull nowhere near that, but I guess it’s worth a try, especially if you have a shady PSU. Not sure what the alternative is; it sounds to me like a possible HW problem that you might have to RMA.
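
If you want to log the power numbers over time rather than eyeball them, the Pwr:Usage/Cap field can be pulled out of a summary row like the one above. This is a small Python sketch under the assumption that the field always looks like "40W / 216W", with N/A allowed on either side for cards that do not report a value; parse_power is a made-up helper name.

```python
import re

# Matches the "Pwr:Usage/Cap" field, e.g. "40W / 216W" or "N/A / 30W".
POWER_RE = re.compile(r"(\d+(?:\.\d+)?W|N/A)\s*/\s*(\d+(?:\.\d+)?W|N/A)")


def _watts(field):
    """Convert '40W' to 40.0; 'N/A' becomes None."""
    return None if field == "N/A" else float(field[:-1])


def parse_power(row):
    """Return (usage_watts, cap_watts) from one nvidia-smi summary row."""
    match = POWER_RE.search(row)
    if match is None:
        raise ValueError("no Pwr:Usage/Cap field found in row")
    return _watts(match.group(1)), _watts(match.group(2))


print(parse_power("| 2% 49C P0 40W / 216W | 197MiB / 8113MiB | 0% Default |"))
# (40.0, 216.0)
```

A usage of None means the card does not report its draw, in which case only the cap is informative.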

Alternatively, try downgrading the BIOS to see if that helps.

I have a brand new EVGA 600 watt PSU. I thought of power too, but the fact that I don’t see issues in Windows led me away from that. On the other hand, I am not using Windows that much.

nvidia-smi gives me:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.22                 Driver Version: 387.22                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 1030     Off  | 00000000:07:00.0  On |                  N/A |
| 35%   36C    P0    N/A /  30W |    388MiB /  1992MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       522      G   /usr/lib/xorg-server/Xorg                    215MiB |
|    0       589      G   /usr/bin/gnome-shell                         127MiB |
|    0      1571      G   /proc/self/exe                                43MiB |
+-----------------------------------------------------------------------------+

I will try the BIOS downgrade; I upgraded it before I even did either the Windows or Linux install. I’ll also borrow a different 8 GB RAM module known to be stable with Ryzen and see if that changes anything. Thanks for the tips.

I believe I have found the problem, though I’m not 100 percent sure. I had two 4 GB modules; I pulled one RAM stick out, and now I see no errors and the machine feels normal. Thanks for all the suggestions. This is definitely a hardware issue somewhere. Hopefully a different set of RAM will solve it.