Latest aarch64 drivers fail to initialize card on HoneyComb board

I am attempting to use nvidia-450.51 beta drivers on my HoneyComb Aarch64 board with an MSI GT 1030 graphics card. The board functions fine with Radeon GPUs as well as the GT 1030 with the nouveau kernel driver. I am testing against a Fedora 32 distribution with a 5.7.7 kernel.

I am using the X86EmulatorDxe package in edk2, and the graphics card displays via HDMI in UEFI without issues. In Linux the card is detected, PCIe sets up the BARs correctly, and the registers are mapped. All the drivers initialize without errors; however, when I attempt to access the card, even with something as simple as nvidia-smi, I receive these errors.

[ 8064.746949] NVRM: GPU 0004:01:00.0: RmInitAdapter failed! (0x25:0x54:1238)
[ 8064.747021] NVRM: GPU 0004:01:00.0: rm_init_adapter failed, device minor number 0
[ 8064.848294] NVRM: PBI is not supported for GPU 0004:01:00.0

Here is the full debug output on the failure.

[ 8062.079901] nvidia-nvlink: Nvlink Core is being initialized, major device number 511
[ 8062.080647] NVRM: probing 0x10de 0x1d01, class 0x30000
[ 8062.080705] nvidia 0004:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=io+mem
[ 8062.080804] NVRM: PCI:0004:01:00.0 (10de:1d01): BAR0 @ 0xa040000000 (16MB)
[ 8062.080807] NVRM: PCI:0004:01:00.0 (10de:1d01): BAR1 @ 0xa400000000 (256MB)
[ 8062.181842] NVRM: PBI is not supported for GPU 0004:01:00.0
[ 8062.181971] NVRM: loading NVIDIA UNIX aarch64 Kernel Module  450.51  Tue Jun 16 03:55:59 UTC 2020
[ 8064.433533] NVRM: nvidia_open...
[ 8064.433538] NVRM: nvidia_ctl_open
[ 8064.433675] NVRM: ioctl(0xd2, 0xe327f468, 0x48)
[ 8064.433711] NVRM: ioctl(0xd6, 0xe327f500, 0x8)
[ 8064.433715] NVRM: ioctl(0xca, 0x91f2b0e0, 0x4)
[ 8064.433720] NVRM: ioctl(0xc8, 0x91f2b120, 0xa00)
[ 8064.433726] NVRM: ioctl(0x2b, 0xe327f5b0, 0x20)
[ 8064.433830] NVRM: ioctl(0x2b, 0xe327f558, 0x28)
[ 8064.433882] NVRM: ioctl(0x2b, 0xe327f558, 0x28)
[ 8064.433904] NVRM: ioctl(0x2a, 0xe327f550, 0x20)
[ 8064.433925] NVRM: ioctl(0x2a, 0xe327f550, 0x20)
[ 8064.433939] NVRM: ioctl(0x2a, 0xe327f550, 0x20)
[ 8064.433952] NVRM: ioctl(0x2a, 0xe327f550, 0x20)
[ 8064.434009] NVRM: nvidia_open...
[ 8064.434013] NVRM: GPU 0004:01:00.0: Opening device bearing minor number 0
[ 8064.434414] NVRM: GPU 0004:01:00.0: RmInitAdapter
[ 8064.434417] NVRM: GPU 0004:01:00.0: RmSetupRegisters for 0x10de:0x1d01
[ 8064.434419] NVRM: GPU 0004:01:00.0: pci config info:
[ 8064.434422] NVRM: GPU 0004:01:00.0:    registers look  like: 0000000039a5edd7 00000000b08a9234
NVRM: GPU 0004:01:00.0:    fb        looks like: 00000000e43ec3b1 00000000c733afc8
NVRM: GPU 0004:01:00.0: Successfully mapped framebuffer and registers
[ 8064.434446] NVRM: GPU 0004:01:00.0: final mappings:
[ 8064.434448] NVRM: GPU 0004:01:00.0:     regs: 0000000039a5edd7 00000000b08a9234 0x000000003bbd7207
[ 8064.731803] NVRM: VM: nv_alloc_pages: 4 pages
[ 8064.731806] NVRM: VM:    contig 1  cache_type 1
[ 8064.731811] NVRM: VM: nv_alloc_contig_pages: 4 pages
[ 8064.731821] NVRM: VM: nv_alloc_pages:3368: 0x00000000355805c2, 4 page(s), count = 1, flags = 0x00010035, page_table = 0x00000000ec46e0d7
[ 8064.731836] NVRM: VM: nv_alloc_pages: 6 pages
[ 8064.731838] NVRM: VM:    contig 1  cache_type 1
[ 8064.731841] NVRM: VM: nv_alloc_contig_pages: 6 pages
[ 8064.731850] NVRM: VM: nv_alloc_pages:3368: 0x00000000da551326, 6 page(s), count = 1, flags = 0x00010035, page_table = 0x00000000bc3d73b6
[ 8064.731896] NVRM: VM: nv_alloc_pages: 4 pages
[ 8064.731897] NVRM: VM:    contig 1  cache_type 1
[ 8064.731899] NVRM: VM: nv_alloc_contig_pages: 4 pages
[ 8064.731905] NVRM: VM: nv_alloc_pages:3368: 0x0000000085f2d828, 4 page(s), count = 1, flags = 0x00010035, page_table = 0x0000000063ae006e
[ 8064.731910] NVRM: VM: nv_alloc_pages: 1 pages
[ 8064.731911] NVRM: VM:    contig 1  cache_type 1
[ 8064.731913] NVRM: VM: nv_alloc_contig_pages: 1 pages
[ 8064.731916] NVRM: VM: nv_alloc_pages:3368: 0x00000000fa41fb2b, 1 page(s), count = 1, flags = 0x00010035, page_table = 0x000000001e1e825e
[ 8064.732114] NVRM: GPU 0004:01:00.0:  is not primary VGA
[ 8064.732117] NVRM: GPU 0004:01:00.0:  is not primary UEFI console device
[ 8064.746648] NVRM: VM: nv_free_pages: 0x4
[ 8064.746653] NVRM: VM: nv_free_pages:3391: 0x00000000355805c2, 4 page(s), count = 1, flags = 0x00010035, page_table = 0x00000000ec46e0d7
[ 8064.746655] NVRM: VM: nv_free_contig_pages: 4 pages
[ 8064.746663] NVRM: VM: nv_free_pages: 0x6
[ 8064.746665] NVRM: VM: nv_free_pages:3391: 0x00000000da551326, 6 page(s), count = 1, flags = 0x00010035, page_table = 0x00000000bc3d73b6
[ 8064.746667] NVRM: VM: nv_free_contig_pages: 6 pages
[ 8064.746673] NVRM: VM: nv_free_pages: 0x4
[ 8064.746675] NVRM: VM: nv_free_pages:3391: 0x0000000085f2d828, 4 page(s), count = 1, flags = 0x00010035, page_table = 0x0000000063ae006e
[ 8064.746677] NVRM: VM: nv_free_contig_pages: 4 pages
[ 8064.746681] NVRM: VM: nv_free_pages: 0x1
[ 8064.746683] NVRM: VM: nv_free_pages:3391: 0x00000000fa41fb2b, 1 page(s), count = 1, flags = 0x00010035, page_table = 0x000000001e1e825e
[ 8064.746685] NVRM: VM: nv_free_contig_pages: 1 pages
[ 8064.746944] NVRM: GPU 0004:01:00.0: Tearing down registers
[ 8064.746949] NVRM: GPU 0004:01:00.0: RmInitAdapter failed! (0x25:0x54:1238)
[ 8064.747021] NVRM: GPU 0004:01:00.0: rm_init_adapter failed, device minor number 0
[ 8064.848294] NVRM: PBI is not supported for GPU 0004:01:00.0
[ 8064.848338] NVRM: ioctl(0xd1, 0xe327f380, 0xc)
[ 8064.848362] NVRM: ioctl(0x2a, 0xe327d0a0, 0x20)
[ 8064.848495] NVRM: ioctl(0x2a, 0xe327f730, 0x20)
[ 8064.848513] NVRM: ioctl(0x29, 0xe327f7a0, 0x10)
[ 8064.848562] NVRM: GPU 0000:00:00.0: nvidia_close on device bearing minor number 255
[ 8064.848563] NVRM: nvidia_ctl_close
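For anyone comparing notes, the relevant failure lines can be pulled out of the kernel ring buffer with a grep along these lines (just a sketch; the pattern matches the NVRM messages quoted above):

```shell
# Filter the NVRM initialization-failure messages out of the kernel log.
# The (0x25:0x54:1238) triple in 'RmInitAdapter failed!' identifies the
# specific failure path inside the binary driver.
sudo dmesg | grep -E 'NVRM: .*(RmInitAdapter|rm_init_adapter|PBI)'
```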

Any pointers that you can provide would be much appreciated. Thanks.

Hello. I’d be curious to know whether this issue has been addressed. I am interested in the HoneyComb system as well, but would need to use a GT 1030 (or similar card) with CUDA. Thanks!

Coincidentally I’m seeing something similar on the Solid Run macchiatoBIN with a GT 1030 card (F32 aarch64):

$ grep NVRM minicom_lspci.cap
[ 7.176715] NVRM: loading NVIDIA UNIX aarch64 Kernel Module 450.57 Sun Jul 5 15:01:14 UTC 2020
[ 56.476896] NVRM: GPU 0000:00:00.0: RmInitAdapter failed! (0x25:0x54:1238)
[ 56.483859] NVRM: GPU 0000:00:00.0: rm_init_adapter failed, device minor number 0

I have now tested with the latest nvidia-460.39 drivers; initialization still fails, but with a different RmInitAdapter message.

NVRM: GPU 0004:01:00.0: RmInitAdapter failed! (0x25:0x54:1262)

Is this something that Nvidia plans on addressing, or should I just stop testing?

Was this issue addressed? I tested 460.67 and I get:

NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x23:0x40:624)
NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 0

cheers
ben

I too am experiencing these issues on a board I’m doing early bring-up on, except both the Nouveau and Nvidia drivers time out.
I’ve tried both a GTX 645 and a GTX 960.
Running the emulator package I manage to get a simple framebuffer, but still neither driver works correctly.
I’ve tried both with the emulator and booting straight through U-Boot, without success.

Has anyone actually gotten Nvidia cards working on arm64?

The issue is still there in 470.57.02.

After some exhausting troubleshooting we have isolated the issue with the mainline drivers, and it is likely the issue here as well.

It seems that cache snooping is broken on our device (Rockchip RK3566).
The upstream drivers depend on this working per the PCIe spec, and I imagine the Nvidia driver does as well.
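On device-tree platforms this is easy to check without vendor tooling: whether the kernel treats a PCIe root complex as cache-coherent is driven by the standard `dma-coherent` property on the controller node (a sketch; node paths and naming vary per board):

```shell
# List PCIe controller nodes in the live device tree that claim
# coherent (snooping) DMA. An empty result means the kernel treats
# DMA from devices behind that root complex as non-coherent.
find /proc/device-tree -path '*pcie*' -name dma-coherent
```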

HoneyComb has no issues using the Nouveau driver in mainline. It is only the Nvidia binary driver that is not initializing properly.

Interesting. I wonder whether the issues the RPi folk are having have anything to do with either of our issues, or whether they have a third issue.

I would really like to use an Nvidia GPU for CUDA on my HoneyComb LX2K. Is there any progress on this from Nvidia’s side? Has anyone tested 470.63.01 on this platform?