Triton Runtime Error in NGC PyTorch 25.04 Apptainer Container: libcuda.so Not Found

Environment:

  • Cluster: NEC SX‑Aurora or similar HPC
  • Host OS: Ubuntu 22.04
  • Apptainer version: 1.x
  • NGC Container: nvcr.io/nvidia/pytorch:25.04-py3 (pulled as pytorch_25.04.sif)
  • CUDA toolkit on host: 12.8.1 (modules: cuda/12.8.1, ucx/1.18.0, openmpi/5.0.7/gcc11.4.0-cuda12.8.1)
  • PyTorch inside container: 2.5.0a0 (nightly), Python 3.12
  • Other deps: Triton, Transformer‑Engine, Megatron‑LM v2.x
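For reference, the SIF was produced by pulling the NGC image directly; the tag matches the environment list above (a standard `apptainer pull` invocation, shown here for completeness rather than as the exact command used):

```shell
# Pull the NGC PyTorch 25.04 image and convert it to a SIF in one step
apptainer pull pytorch_25.04.sif docker://nvcr.io/nvidia/pytorch:25.04-py3
```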

What I’m trying to do:
Run the Megatron‑LM minimal training example (https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start) under an Apptainer container on 2 nodes, 1 GPU per node, via OpenMPI + NCCL, with Triton‑backed kernels for tensor parallelism. On the HPC system, however, Triton fails with a driver lookup error.

Problem:
Inside the container, Triton’s NVIDIA driver backend fails with:

AssertionError: libcuda.so cannot found!
Possible files are located at ['/usr/local/cuda/compat/lib/libcuda.so.1'].
Please create a symlink of libcuda.so to any of the files.

I found that the actual libcuda.so.1 is located under /usr/local/cuda/compat/lib.real/ instead.
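For anyone tracing the lookup: per the error message above, Triton wants to open the bare name libcuda.so, so a directory on its search path needs an unversioned symlink next to libcuda.so.1. A minimal local sketch of the layout the sandbox patch (below) ends up creating, using a hypothetical temp directory and stand-in files rather than the real container paths:

```shell
# Hypothetical demo directory mirroring /usr/local/cuda/compat in the image
demo=$(mktemp -d)
mkdir -p "$demo/lib.real"
touch "$demo/lib.real/libcuda.so.1"             # stand-in for the compat driver lib
ln -s libcuda.so.1 "$demo/lib.real/libcuda.so"  # unversioned name Triton looks up
ln -s lib.real "$demo/lib"                      # what the sandbox workaround creates
readlink -f "$demo/lib/libcuda.so"              # prints a path ending in lib.real/libcuda.so.1
```

With the `lib -> lib.real` symlink in place, any lookup under `compat/lib/` transparently resolves into `lib.real/`, which is why the sandbox patch satisfies Triton.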

The only workaround that succeeded was converting the SIF into a writable sandbox and creating the symlink:

# 1) Build writable sandbox
apptainer build --sandbox pytorch_sandbox_2504 pytorch_25.04.sif

# 2) Enter and patch
apptainer exec --nv --writable pytorch_sandbox_2504 bash -lc '
  cd /usr/local/cuda/compat
  ln -s lib.real lib
'

# 3) Run training
mpirun -np 2 --map-by ppr:1:node \
  apptainer exec --nv \
    pytorch_sandbox_2504 \
    python run_simple_mcore_train_loop.py --timestamp $timestamp

After this, /usr/local/cuda/compat/lib/libcuda.so is present and Triton loads the driver successfully.
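A possibly cleaner variant I have not yet tested: some Triton releases consult a TRITON_LIBCUDA_PATH environment variable before searching for libcuda.so, and Apptainer can inject environment variables with --env. Assuming this container's Triton build honors that variable (an assumption, not something I have verified), the launch would look like:

```shell
# Untested sketch: TRITON_LIBCUDA_PATH support in this Triton build is an
# assumption; the SIF stays read-only and no sandbox conversion is needed.
mpirun -np 2 --map-by ppr:1:node \
  apptainer exec --nv \
    --env TRITON_LIBCUDA_PATH=/usr/local/cuda/compat/lib.real \
    pytorch_25.04.sif \
    python run_simple_mcore_train_loop.py --timestamp "$timestamp"
```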

Questions:

  • Is there a cleaner way to satisfy Triton’s driver lookup within an Apptainer container without a sandbox conversion?
  • Why does the NGC image place driver libraries under compat/lib.real instead of compat/lib?
  • Any best practices for running Triton (and Transformer‑Engine) inside NGC containers on HPC systems?