We don’t have the same server model, but from the kernel’s perspective the difference is generally only the number of CPU cores and how core numbers are assigned to each CPU socket. On the R750, even-numbered cores are on socket 0 and odd-numbered cores are on socket 1. So you may need to update irqaffinity, isolcpus, nohz_full, and rcu_nocbs based on the core assignment on the DL380.
You should keep at least 2-4 cores for the kernel and OS (i.e., you may need to update irqaffinity) and can use the rest for Aerial workloads with isolation (i.e., you may need to update isolcpus, nohz_full, and rcu_nocbs).
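As a rough illustration of the split described above, the following Python sketch builds the four parameter values from a total core count and a chosen set of housekeeping cores. The function names, the 48-core example, and the choice of cores 0-3 for housekeeping are assumptions made for illustration only; they are not taken from the Aerial installation guide, and the right cores depend on how the DL380 maps core numbers to sockets.

```python
def format_cpu_list(cpus):
    """Collapse core IDs into the kernel's CPU list syntax, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    cpus = sorted(set(cpus))
    if not cpus:
        return ""
    ranges = []
    start = prev = cpus[0]
    for cpu in cpus[1:]:
        if cpu == prev + 1:
            prev = cpu
            continue
        ranges.append((start, prev))
        start = prev = cpu
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def build_isolation_params(total_cores, housekeeping):
    """Return irqaffinity/isolcpus/nohz_full/rcu_nocbs values for a given core split.

    housekeeping: core IDs left to the kernel and OS (hypothetical choice).
    All remaining cores are treated as isolated cores for the workload.
    """
    housekeeping = sorted(set(housekeeping))
    if any(c < 0 or c >= total_cores for c in housekeeping):
        raise ValueError("housekeeping core outside 0..total_cores-1")
    isolated = [c for c in range(total_cores) if c not in housekeeping]
    if not housekeeping or not isolated:
        raise ValueError("need at least one housekeeping core and one isolated core")
    hk = format_cpu_list(housekeeping)
    iso = format_cpu_list(isolated)
    return {
        "irqaffinity": hk,   # IRQs stay on the housekeeping cores
        "isolcpus": iso,     # cores removed from the general scheduler
        "nohz_full": iso,    # cores that run without the periodic tick
        "rcu_nocbs": iso,    # cores that offload RCU callbacks
    }
```

For example, `build_isolation_params(48, [0, 1, 2, 3])` yields `irqaffinity=0-3` and `4-47` for the other three parameters. Any extra flags used in the documented kernel command line are not modeled here.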
If I change only the CPU core numbers as described above, the following error occurs at boot.
“end Kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can’t now provide you with the DMA bounce buffer”
If I remove “iommu=off” from the kernel parameters, the boot problem is resolved.
I don’t understand this problem, but I’m sharing it for your information.
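When comparing boots with and without “iommu=off”, it can help to confirm which parameters the running kernel actually received. The sketch below parses a kernel command line into a dictionary. The function names are hypothetical, and the parser is deliberately simple: it splits on whitespace, so quoted values containing spaces are not handled, and a repeated key keeps its last value.

```python
def parse_kernel_cmdline(cmdline):
    """Parse a kernel command line string into {name: value or None}."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Flags without '=' (such as 'ro') are stored with value None.
        params[key] = value if sep else None
    return params


def read_kernel_cmdline(path="/proc/cmdline"):
    """Read and parse the command line of the currently running Linux kernel."""
    with open(path) as f:
        return parse_kernel_cmdline(f.read())
```

On the server, `read_kernel_cmdline().get("iommu")` returns `"off"` if the parameter was applied and `None` if it was not, and the same lookup works for irqaffinity, isolcpus, nohz_full, and rcu_nocbs.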
You may need to check some BIOS settings related to MMIO. The error is obviously associated with MMIO regions for SWIOTLB and DMA. We use large-memory GPUs and high-speed NICs, and they require significant MMIO space. For the Dell R750, we recommend the following settings to enable this hardware. Please check for equivalent BIOS settings on your server.
I can’t find a Memory Mapped I/O BIOS setting on the HPE server.
Will removing iommu=off and leaving the MMIO-related BIOS settings unset cause problems later on?
@twoheons
As @nhashimoto confirmed, we don’t have this server model, so we cannot check the BIOS settings for it or confirm how to fix the issue you are facing. You might need to ask HPE how to enable large-memory GPUs without IOMMU.
Thanks for your reply.
If I do not set “iommu=off”, will Aerial fail to work, or will there be a performance degradation that delays Aerial operation?