Hi, I had a question about the NVIDIA driver / Aerial SDK. I am trying to use the Aerial SDK in a smaller-footprint embedded system. I have done some testing with the cuphycontroller-to-RU-sim test on a 2U server, and am now looking to enable it on this embedded system.
Specs below:
2U server: Ice Lake dual-socket with A100 GPU and CX7 NIC
Embedded system: Ice Lake-D HCC 32-core system with A4500 (closest to the PC Partner RTX A4500 Embedded in the TechPowerUp GPU Database) and CX7 NIC
I have the latest version of DPDK / Aerial SDK 24-1, which is able to bind to the NIC/GPU, but I am seeing issues where the GPU memory mapping appears to be exhausting the available VRAM (16 GB) with its allocations.
I see this by adding prints in the open-source version of the NVIDIA driver. Every allocation (e.g. a 10000-byte allocation that is failing) is first padded up to 64K by the application/gdrcopy, but is then padded/aligned to 2MB, I assume by the kernel driver or the Linux kernel DMA subsystem. Do the mapped pages need to be 2MB, or is there some way to make them smaller so the allocations have a better chance of fitting in 16 GB?
I have tried with the IOMMU off, in passthrough, and on/translated (VA), hoping DPDK / the DMA mappings could use a minimum closer to the 64K requested size. I don't see any obvious request from the application, gdrcopy, or the NVIDIA kernel driver that would explain why this minimum page size is used.
That is the problem viewed from underneath; looking at it from the application above, if we only ever want to run cuBB / the tests with 1 cell, could any of the parameters be scaled to allocate less? I saw this OrderEntity object being created repeatedly, causing a lot of allocations; the original value was 40 (16UL + 16DL?). I tried changing it to 16 and there were far fewer allocations, and the application got to waiting for L2 to connect, so I can try connecting to see whether the test gets further.
// LINE 2461 cuPHY-CP/cuphydriver/src/common/context.cpp
// Create order kernel entity
for (int i = 0; i < ORDER_ENTITY_NUM; i++)
    order_entity_list[i] = std::move(new OrderEntity(static_cast<phydriver_handle>(this), gpu_device));
    // order_entity_list.push_back(std::unique_ptr<OrderEntity>(new OrderEntity(static_cast<phydriver_handle>(this), gpu_device)));
gpu_device->setDevice();
Below is a lot of your built-in debug prints, plus some more that I added in gdrcopy and the kernel driver, in case the errors help explain things. The main thing I was tracing is the various allocation paths, which all seem to have the minimum 2MB page size, alignment, and size (0x200000) when coming from the gdr pin/map path in the kernel driver.
Successful Allocation:
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_mmap:mmap filp=0xff4cbdb64fa47f00 vma=0xff4cbdb77dec6c30 vm_file=0xff4cbdb64fa47f00 start=0x7f8c0fac0000 size=65536 off=0x1bb10
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdr_mr_from_handle_unlocked:mr->handle=0x1bb10 handle=0x1bb10
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_mmap:overwriting vma->vm_private_data=0000000000000000 with mr=ff4cbdb77dec8240
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_mmap:range start with p=0 vaddr=7f8c0fac0000 page_paddr=21bff9a0000
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_mmap:mapping p=1 entries=1 offset=0 len=65536 vaddr=7f8c0fac0000 paddr=21bff9a0000 ps 1 pe 1
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_remap_gpu_mem:mmaping phys mem addr=0x21bff9a0000 size=65536 at user virt addr=0x7f8c0fac0000
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_mmap:mr vma=0xff4cbdb77dec6c30 mapping=0xff4cbdb6dc233038
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_ioctl:ioctl called (cmd 0xc008da05)
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdr_mr_from_handle_unlocked:mr->handle=0x1bb10 handle=0x1bb10
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_ioctl:ioctl called (cmd 0xc008da05)
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdr_mr_from_handle_unlocked:mr->handle=0x1bb10 handle=0x1bb10
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM: ioctl(0x2a, 0x995d5be0, 0x20)
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_ioctl:ioctl called (cmd 0xc028da01)
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM RmP2PValidateAddressRangeOrGetPages: p2p get page ps 200000 addr 7f8c119c0000 len 10000 offset 1c0000
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM RmP2PValidateAddressRangeOrGetPages: p2p get page ps 200000 addr 7f8c119c0000 len 10000 offset 1c0000
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM _createThirdPartyP2PMappingExtent: mapping ps 200000
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM dmaAllocMapping_GM107: else init ctx size 200000
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM dmaAllocMapping_GM107: Picked Page size based on flags: 0x200000 flagVal: 0x0 phys 200000 size 200000 tmp size 200000
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM dmaAllocMapping_GM107: VA FAIL vas 47139028 size 200000 req_size 200000 desc size 10000 desc ps 200000 desc np 200 align 200000 lo 0 hi 3ffffffff, mask 200000 vlo 3ffa00000 internal 0
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM RmP2PGetPagesUsingVidmemInfo: bar status 0 failed map Bar1 addr 7f8c119c0000 len 10000 offset 1c0000 ps 200000 vm ps 200000
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM RmP2PRegisterCallback: p2preg ps 200000
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:__gdrdrv_pin_buffer:invoking nvidia_p2p_get_pages(va=0x7f8c119c0000 len=65536 p2p_tok=0 va_tok=0 callback=ffffffffc0fbb190) ps 1
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:__gdrdrv_pin_buffer:page table entries: 1
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:__gdrdrv_pin_buffer:page[0]=0x0000021bffbc0000
Last Failed (exhausted allocation):
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_mmap:mmap filp=0xff4cbdb64fa47f00 vma=0xff4cbdb77dec6f70 vm_file=0xff4cbdb64fa47f00 start=0x7f8c0faa0000 size=65536 off=0x1bb30
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdr_mr_from_handle_unlocked:mr->handle=0x1bb30 handle=0x1bb30
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_mmap:overwriting vma->vm_private_data=0000000000000000 with mr=ff4cbdb77decd800
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_mmap:range start with p=0 vaddr=7f8c0faa0000 page_paddr=21bffde0000
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_mmap:mapping p=1 entries=1 offset=0 len=65536 vaddr=7f8c0faa0000 paddr=21bffde0000 ps 1 pe 1
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_remap_gpu_mem:mmaping phys mem addr=0x21bffde0000 size=65536 at user virt addr=0x7f8c0faa0000
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_mmap:mr vma=0xff4cbdb77dec6f70 mapping=0xff4cbdb6dc233038
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_ioctl:ioctl called (cmd 0xc008da05)
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdr_mr_from_handle_unlocked:mr->handle=0x1bb30 handle=0x1bb30
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_ioctl:ioctl called (cmd 0xc008da05)
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdr_mr_from_handle_unlocked:mr->handle=0x1bb30 handle=0x1bb30
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM: ioctl(0x2b, 0x995d5da0, 0x30)
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM: ioctl(0x2a, 0x995d5430, 0x20)
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM thirdpartyp2pCtrlCmdRegisterVidmem_IMPL: p2p createmem ps 200000 addr 7f8c11a00000
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM: ioctl(0x2a, 0x995d5be0, 0x20)
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdrdrv_ioctl:ioctl called (cmd 0xc028da01)
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM RmP2PValidateAddressRangeOrGetPages: p2p get page ps 200000 addr 7f8c11a00000 len 10000 offset 0
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM RmP2PValidateAddressRangeOrGetPages: p2p get page ps 200000 addr 7f8c11a00000 len 10000 offset 0
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM _createThirdPartyP2PMappingExtent: mapping ps 200000
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM dmaAllocMapping_GM107: else init ctx size 200000
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM dmaAllocMapping_GM107: Picked Page size based on flags: 0x200000 flagVal: 0x0 phys 200000 size 200000 tmp size 200000
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM dmaAllocMapping_GM107: VA FAIL vas 47139028 size 200000 req_size 200000 desc size 10000 desc ps 200000 desc np 200 align 200000 lo 0 hi 3ffffffff, mask 200000 vlo 0 internal 0
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM dmaAllocMapping_GM107: can't alloc VA space for mapping status 81.NVRM kbusMapFbAperture_GM107: Failed: [GPU0] Could not map pAperOffset: 0x0
Aug 13 16:46:15 5g-embedded-nv kernel: NVRM RmP2PGetPagesUsingVidmemInfo: bar status 81 failed map Bar1 addr 7f8c11a00000 len 10000 offset 0 ps 200000 vm ps 200000
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:__gdrdrv_pin_buffer:invoking nvidia_p2p_get_pages(va=0x7f8c11a00000 len=65536 p2p_tok=0 va_tok=0 callback=ffffffffc0fbb190) ps 0
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:__gdrdrv_pin_buffer:nvidia_p2p_get_pages(va=7f8c11a00000 len=65536 p2p_token=0 va_space=0 callback=ffffffffc0fbb190) failed [ret = -12]
Aug 13 16:46:15 5g-embedded-nv kernel: gdrdrv:gdr_free_mr_unlocked:invoking unpin_buffer while callback has already been fired
Aug 13 16:46:15 5g-embedded-nv kernel: phy_drv_init[1949]: segfault at 0 ip 00007f95f5459ea7 sp 00007ffc995d6b10 error 4 in libcuphydriver.so[7f95f53ca000+2b4000]
Aug 13 16:46:15 5g-embedded-nv kernel: Code: 89 43 08 48 b8 49 5f 45 56 45 4e 54 00 48 89 43 26 48 b8 44 52 56 2e 47 50 55 44 48 89 43 2e 4c 89 63 10 c7 43 35 44 45 56 00 <8b> 04 25 00 00 00 00 0f 0b 48 8b 7c 24 30 e8 e6 01 f7 ff 89 c3 85
Hi @eric.a.momper,
This HW configuration is not supported. For this reason, we do not know what changes would be required to match your HW.
Please let us know if there is anything else we can help with.
Thank you,
Balkan