Drawbacks of the pci=nocrs kernel parameter

bj-dude · May 14, 2021, 3:03pm

I am experimenting with a couple of recently obtained A100s on an Asus B250 motherboard running CentOS 8. After extensive tinkering (disabling CSM in UEFI; above 4G decoding was already enabled, etc.), I still got BAR errors in dmesg, such as:
BAR 1: no space for [mem size 0x1000000000 64bit pref]
BAR 1: failed to assign [mem size 0x1000000000 64bit pref]
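
For scale, the size in those messages is a hex byte count, and 0x1000000000 works out to 64 GiB for that one BAR. A minimal Python sketch for pulling the failed BAR sizes out of dmesg text follows. The function name and regex are illustrative only and assume the message format quoted above; other kernel versions may word these lines differently.

```python
import re

# Matches lines like: "BAR 1: failed to assign [mem size 0x1000000000 64bit pref]"
# Assumption: the message format quoted in this post.
BAR_FAIL = re.compile(
    r"BAR (\d+): (?:no space for|failed to assign) \[mem size (0x[0-9a-fA-F]+)"
)

def failed_bar_sizes(dmesg_text):
    """Return a list of (bar_index, size_in_GiB) for each failed BAR line."""
    results = []
    for match in BAR_FAIL.finditer(dmesg_text):
        bar = int(match.group(1))
        size_bytes = int(match.group(2), 16)  # size is printed in hex
        results.append((bar, size_bytes / 2**30))
    return results

sample = """BAR 1: no space for [mem size 0x1000000000 64bit pref]
BAR 1: failed to assign [mem size 0x1000000000 64bit pref]"""
print(failed_bar_sizes(sample))  # [(1, 64.0), (1, 64.0)]
```

A 64 GiB prefetchable window cannot fit below 4 GiB, which is why above 4G decoding matters for these cards.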

I eventually got it working with the kernel parameter pci=nocrs. nvidia-smi and our CUDA test program now recognize the cards. I read online that pci=nocrs was the default behavior in pre-3.x Linux kernels. Since I haven’t loaded our real training tasks yet (still many days away), are there any known drawbacks to using the pci=nocrs option?

Thanks in advance.
Ben

robbie12 · January 9, 2024, 10:19pm

Apologies for necro-bumping. I also don’t really have an answer to @bj-dude’s original question…

Just came here to say that I was finally able to get my GTX 970 and Tesla M40-24GB recognized by the driver on my desktop, also by using the pci=nocrs parameter.
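
To double-check that the option is actually present on the running kernel after a reboot, the command line can be read from /proc/cmdline. A small Python sketch, where the helper name is hypothetical and the check only looks at the command-line text, not at what the PCI core did with it:

```python
def has_pci_option(cmdline, option="nocrs"):
    """True if any pci=... argument on the kernel command line lists the option."""
    for arg in cmdline.split():
        if arg.startswith("pci="):
            # pci= takes a comma-separated list, e.g. pci=nocrs,realloc
            if option in arg[len("pci="):].split(","):
                return True
    return False

if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print(has_pci_option(f.read()))
    except OSError:
        print("could not read /proc/cmdline (not a Linux system?)")
```

Splitting on commas matters because pci= accepts several options at once, so a plain substring search for "pci=nocrs" would miss something like pci=realloc,nocrs.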

According to the kernel parameter docs,

            nocrs           [X86] Ignore PCI host bridge windows from ACPI.
                            If you need to use this, please report a bug.

…not sure where exactly I’d report the bug, though, because I’m pretty sure this is mostly related to my (outdated, consumer-grade) 2015 Asus Z97A motherboard.

That said, hopefully someone else will find this useful!