Nvidia 0000:1a:00.0: swiotlb buffer is full // Failed to create a DMA mapping

dieter.kasper · December 5, 2020, 8:26pm

Hi, this is my first post. I hope I selected the right categories …
On kernel 4.18.0-193.28.1.el8_2.x86_64 with the nvidia-455.38 modules, a Xeon 8270 CPU, 384 GB RAM and a Tesla M60, hundreds of these kernel messages show up in my kernel log when I create heavy load with darknet-yolov4:
(…)
[1687122.424850] nvidia 0000:1a:00.0: swiotlb buffer is full (sz: 516096 bytes), total 32768 (slots), used 2048 (slots)
[1687122.448080] NVRM: 0000:1a:00.0: Failed to create a DMA mapping!
[1687126.630986] nvidia 0000:1a:00.0: swiotlb buffer is full (sz: 933888 bytes), total 32768 (slots), used 2048 (slots)
[1687126.654215] NVRM: 0000:1a:00.0: Failed to create a DMA mapping!
(…)
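To make the numbers in these messages easier to read, here is a minimal Python sketch that parses a "swiotlb buffer is full" line and converts slots to bytes. It assumes the mainline kernel constants IO_TLB_SHIFT = 11 (2 KiB per slot) and IO_TLB_SEGSIZE = 128 (largest contiguous mapping, in slots). The RHEL 8 kernel may use different values, so treat the output as an estimate. The function name analyze is made up for this example.

```python
import re

# Assumptions taken from mainline include/linux/swiotlb.h; a vendor kernel
# (such as the RHEL 8 one in this thread) may define them differently.
IO_TLB_SHIFT = 11       # one slot = 2**11 = 2048 bytes
IO_TLB_SEGSIZE = 128    # maximum number of contiguous slots per mapping

SLOT_BYTES = 1 << IO_TLB_SHIFT
MAX_MAPPING_BYTES = IO_TLB_SEGSIZE * SLOT_BYTES

PATTERN = re.compile(
    r"swiotlb buffer is full \(sz: (\d+) bytes\), "
    r"total (\d+) \(slots\), used (\d+) \(slots\)"
)


def analyze(line):
    """Return size figures for one kernel log line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    size, total, used = (int(g) for g in m.groups())
    return {
        "request_bytes": size,
        "request_slots": -(-size // SLOT_BYTES),   # ceiling division
        "pool_bytes": total * SLOT_BYTES,
        "used_bytes": used * SLOT_BYTES,
        "exceeds_segment": size > MAX_MAPPING_BYTES,
    }


if __name__ == "__main__":
    sample = (
        "[1687122.424850] nvidia 0000:1a:00.0: swiotlb buffer is full "
        "(sz: 516096 bytes), total 32768 (slots), used 2048 (slots)"
    )
    print(analyze(sample))
```

If those constants apply here, the pool is 64 MiB with 4 MiB in use, and each failing request is larger than the 256 KiB that a single contiguous mapping could cover. That would point to the size of the individual requests, not to an exhausted pool, but this has not been checked against the kernel that is actually running.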
Under “Appendix AA. Allocating DMA Buffers on 64-bit Platforms”
I found the hint to use the ‘NVreg_RemapLimit=0x7c00000’ parameter for the nvidia kernel module.
But the result was "nvidia: unknown parameter ‘NVreg_RemapLimit’ ignored".
“384.69 driver - NVreg_RemapLimit”
says “The parameter was indeed removed because it’s not supposed to be needed anymore”,
but I still have the swiotlb issue.
Any recommendations?

AakankshaS · December 7, 2020, 1:28pm

Hi @dieter.kasper,
This looks more like a hardware-related issue, so I would ask you to post it in the respective forum.
Thanks!

dieter.kasper · December 10, 2020, 8:30pm

Hi AakankshaS,
Meanwhile I found a second server with the same motherboard, the same type of Tesla M60 card, a Xeon Gold 6254 and the same OS
=> same error picture.
Therefore I think it is unlikely that the same hardware problem exists on both machines.
I updated to the latest software versions: Driver Version: 455.45.01, CUDA Version: 11.1.
I also set ‘swiotlb=65536’ in the Linux kernel boot parameters. Still the same error picture.
How can we track down this (software?) problem further?
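
One way to check that the ‘swiotlb=’ boot parameter was actually picked up is to look at the kernel command line. Here is a minimal Python sketch that extracts the slot count from a command-line string and estimates the resulting pool size. It assumes 2 KiB per slot (IO_TLB_SHIFT = 11 in mainline), which may not hold for every kernel. The helper names swiotlb_slots and pool_mib are made up for this example.

```python
import re

SLOT_BYTES = 2048  # assumption: 2 KiB per swiotlb slot, as in mainline kernels


def swiotlb_slots(cmdline):
    """Return the slot count from a 'swiotlb=<int>' token, or None if absent."""
    m = re.search(r"(?:^|\s)swiotlb=(\d+)", cmdline)
    return int(m.group(1)) if m else None


def pool_mib(slots):
    """Estimated bounce-buffer pool size in MiB for a given slot count."""
    return slots * SLOT_BYTES / (1024 * 1024)


if __name__ == "__main__":
    # Sample string; on a live system the contents of /proc/cmdline
    # could be passed in instead.
    sample = "BOOT_IMAGE=/vmlinuz ro swiotlb=65536 quiet"
    slots = swiotlb_slots(sample)
    if slots is None:
        print("no numeric swiotlb= parameter found")
    else:
        print(f"{slots} slots, about {pool_mib(slots):.0f} MiB")
```

If the setting took effect, the "total … (slots)" figure in new log messages should show the new slot count.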