Hi NVIDIA Aerial team,
We are setting up NVIDIA Aerial CUDA-Accelerated RAN on DGX Spark.
During driver installation, a DKMS build fails for one of the installed kernels. The system has both of the following NVIDIA kernels installed:
- 6.17.0-1014-nvidia
- 6.17.0-1018-nvidia
When running:
./install_drivers.sh
the installer detects broken DOCA/MLNX OFED packages and runs dpkg/apt recovery. The mlnx-ofed-kernel-dkms package then attempts to build for both installed kernels:
Building for 6.17.0-1014-nvidia and 6.17.0-1018-nvidia
The build for 6.17.0-1014-nvidia succeeds. However, the build for 6.17.0-1018-nvidia fails:
Building initial module mlnx-ofed-kernel/25.10.OFED.25.10.1.7.1.1 for 6.17.0-1018-nvidia
…
Building module(s) … (bad exit status: 2)
Failed command:
make -j4 KERNELRELEASE=6.17.0-1018-nvidia
Error! Bad return status for module build on kernel: 6.17.0-1018-nvidia (aarch64)
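In case it helps with triage, the full compiler output for the failed build should be in the DKMS make.log. Below is a minimal sketch for locating it, assuming the default DKMS tree under /var/lib/dkms. The helper name dkms_make_log is ours and is not part of DKMS.

```shell
# Hypothetical helper: print the path where DKMS normally keeps the build log
# for a given module and version. Assumes the default tree /var/lib/dkms.
dkms_make_log() {
    module="$1"
    version="$2"
    printf '/var/lib/dkms/%s/%s/build/make.log\n' "$module" "$version"
}

log="$(dkms_make_log mlnx-ofed-kernel 25.10.OFED.25.10.1.7.1.1)"

# Show the tail of the log only if it exists on this machine.
if [ -r "$log" ]; then
    tail -n 50 "$log"
else
    echo "make.log not found at $log" >&2
fi
```

We can attach the complete make.log if that is useful.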
After that, the following packages remain in a broken or unconfigured state:
- mlnx-ofed-kernel-dkms
- iser-dkms
- isert-dkms
- srp-dkms
As a result, dpkg --configure -a and apt --fix-broken install -y retry the same DKMS build sequence on every run and never complete. In practice, it becomes a loop:
- DKMS builds successfully for 6.17.0-1014-nvidia
- DKMS then tries to build for 6.17.0-1018-nvidia
- The 6.17.0-1018-nvidia build fails
- dpkg remains in a broken state
- apt --fix-broken install retries the same process
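To confirm which packages are stuck after each retry, a small filter over dpkg-query output can be used. This is a sketch, not the installer's own check: not_fully_installed is a hypothetical helper, and the sample input at the end is illustrative rather than captured from our system.

```shell
# Hypothetical helper: read "package status" lines on stdin and print the
# packages whose dpkg status is anything other than "install ok installed".
not_fully_installed() {
    awk '{
        pkg = $1
        status = $2 " " $3 " " $4
        if (status != "install ok installed") print pkg
    }'
}

# Intended usage on the affected machine (dpkg-query ships with dpkg):
#   dpkg-query -W -f='${Package} ${Status}\n' \
#       mlnx-ofed-kernel-dkms iser-dkms isert-dkms srp-dkms | not_fully_installed
#   dkms status    # per-kernel build/install state of each DKMS module

# Illustrative input only, not taken from our system:
printf '%s\n' \
    'example-a install ok installed' \
    'example-b install ok half-configured' \
    'example-c install ok unpacked' | not_fully_installed
```

With the illustrative input, the filter prints example-b and example-c, one per line.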
The Aerial environment appears to expect 6.17.0-1014-nvidia, and the installer dependency check also reports linux-headers-6.17.0-1014-nvidia as fulfilled.
Could you please clarify the recommended kernel baseline for DGX Spark with the current Aerial CUDA-Accelerated RAN release?
Specifically:
- Should DGX Spark continue to use 6.17.0-1014-nvidia for Aerial at this stage?
- Is 6.17.0-1018-nvidia currently supported by the DOCA/MLNX OFED package used by Aerial?
- If 6.17.0-1018-nvidia is not supported yet, should we remove/purge the 1018 kernel and keep only 6.17.0-1014-nvidia?
- Do you recommend pinning or holding the 6.17.0-1014-nvidia kernel to prevent automatic upgrade to 6.17.0-1018-nvidia?
- Will a future Aerial/DOCA release officially support 6.17.0-1018-nvidia on DGX Spark?
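For context on the pinning question, the hold we have in mind would look roughly like the sketch below. The linux-image-<release> package name follows the usual Ubuntu naming and is an assumption on our side; only linux-headers-6.17.0-1014-nvidia appears in the installer output. Any kernel meta-package that pulls in newer kernels would presumably need a hold as well, and we have not identified which one that is on DGX Spark. kernel_hold_command is a hypothetical helper that only prints the command and changes nothing.

```shell
# Hypothetical helper: print (do not run) an apt-mark hold command for the
# image and headers packages of one kernel release.
# Assumption: packages are named linux-image-<release> and
# linux-headers-<release>, as is usual on Ubuntu-based systems.
kernel_hold_command() {
    release="$1"
    printf 'sudo apt-mark hold linux-image-%s linux-headers-%s\n' \
        "$release" "$release"
}

kernel_hold_command 6.17.0-1014-nvidia

# After running the printed command, "apt-mark showhold" lists held packages.
```

Please let us know if a different mechanism is preferred for DGX Spark.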
For reference, the failing package version is:
mlnx-ofed-kernel-dkms 25.10.OFED.25.10.1.7.1.1-1
This failure also affects nvidia.service, because nvidia-peermem cannot be loaded while the DOCA/OFED stack remains unconfigured.
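A quick way to check the module state is sketched below. It assumes the kernel lists the module as nvidia_peermem (underscores) in /proc/modules; module_loaded is a hypothetical helper, and the optional second argument exists only so the check can be pointed at a different listing file.

```shell
# Hypothetical helper: succeed if the named module appears in a
# /proc/modules style listing. Hyphens are mapped to underscores because
# that is how the kernel reports module names.
module_loaded() {
    name="$(printf '%s' "$1" | sed 's/-/_/g')"
    listing="${2:-/proc/modules}"
    grep -q "^${name} " "$listing" 2>/dev/null
}

if module_loaded nvidia-peermem; then
    echo "nvidia_peermem is loaded"
else
    echo "nvidia_peermem is not loaded"
fi

# Related checks on the affected machine:
#   systemctl status nvidia.service
#   journalctl -u nvidia.service
```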
Any guidance on the supported kernel version and recommended recovery procedure would be appreciated.
Thanks.