Hello,
I was trying to set up a repo (PAPO), but I kept running into
RuntimeError: CUDA error: device kernel image is invalid
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
I learned that this was because “the GPU-side code you are running was compiled for a different cuda than what you are actually running”. So I ran nvidia-smi, which gives me
NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.2
and nvcc --version, which gives me
Cuda compilation tools, release 12.9, V12.9.41
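To make the comparison concrete, here is a minimal sketch that pulls the two version numbers out of the outputs quoted above and compares them. The function names are hypothetical, and the regular expressions assume the exact output formats shown in this post. As far as I understand, the "CUDA Version" printed by nvidia-smi is the newest CUDA version the installed driver supports, not the version of any installed toolkit, so the two numbers can legitimately differ.

```python
import re
from typing import Tuple


def driver_cuda_version(smi_header: str) -> Tuple[int, int]:
    """Extract the 'CUDA Version: X.Y' field from an nvidia-smi header line."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_header)
    if m is None:
        raise ValueError("no 'CUDA Version' field found")
    return int(m.group(1)), int(m.group(2))


def nvcc_release(nvcc_output: str) -> Tuple[int, int]:
    """Extract the 'release X.Y' field from `nvcc --version` output."""
    m = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if m is None:
        raise ValueError("no 'release' field found")
    return int(m.group(1)), int(m.group(2))


# Strings copied from the outputs in this post.
smi = "NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.2"
nvcc = "Cuda compilation tools, release 12.9, V12.9.41"

driver = driver_cuda_version(smi)
toolkit = nvcc_release(nvcc)

# Tuples compare element by element, so (12, 9) > (12, 2).
toolkit_newer_than_driver = toolkit > driver
print("driver supports up to:", driver)
print("nvcc release:", toolkit)
print("nvcc is newer than the driver-reported version:", toolkit_newer_than_driver)
```

On a real machine the two strings would come from running the commands themselves; they are hard-coded here so the sketch runs anywhere.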
When I was looking into removing the CUDA toolkit to try to fix this mismatch, I ran
conda list | grep -E "cudatoolkit|cuda-toolkit", which showed me
cuda-toolkit 12.4.1 0 nvidia
I am very confused how this could happen. I have never had this type of issue before.
- How is this possible?
- What is the best way to fix this issue?
The repo requires cuda-toolkit 12.4.
When I run: conda list | grep -E "cuda|nvcc"
I get:
cuda-cccl 12.9.27 0 nvidia
cuda-cccl_linux-64 12.9.27 0 nvidia
cuda-command-line-tools 12.4.1 0 nvidia
cuda-compiler 12.9.0 0 nvidia
cuda-crt-dev_linux-64 12.9.41 0 nvidia
cuda-crt-tools 12.9.41 0 nvidia
cuda-cudart 12.4.127 0 nvidia
cuda-cudart-dev 12.4.127 0 nvidia
cuda-cudart-dev_linux-64 12.9.79 0 nvidia
cuda-cudart-static 12.9.79 0 nvidia
cuda-cudart-static_linux-64 12.9.79 0 nvidia
cuda-cudart_linux-64 12.9.79 0 nvidia
cuda-cuobjdump 12.9.26 1 nvidia
cuda-cupti 12.4.127 0 nvidia
cuda-cuxxfilt 12.9.19 1 nvidia
cuda-documentation 12.4.127 0 nvidia
cuda-driver-dev 12.9.79 0 nvidia
cuda-driver-dev_linux-64 12.9.79 0 nvidia
cuda-gdb 12.9.79 1 nvidia
cuda-libraries 12.4.1 0 nvidia
cuda-libraries-dev 12.6.2 0 nvidia
cuda-libraries-static 12.9.1 0 nvidia
cuda-nsight 12.9.79 0 nvidia
cuda-nvcc 12.9.41 0 nvidia
cuda-nvcc-dev_linux-64 12.9.41 0 nvidia
cuda-nvcc-impl 12.9.41 0 nvidia
cuda-nvcc-tools 12.9.41 0 nvidia
cuda-nvcc_linux-64 12.9.41 0 nvidia
cuda-nvdisasm 12.9.88 1 nvidia
cuda-nvml-dev 12.9.79 1 nvidia
cuda-nvprof 12.9.79 0 nvidia
cuda-nvprune 12.9.19 1 nvidia
cuda-nvrtc 12.4.127 0 nvidia
cuda-nvrtc-dev 12.4.127 0 nvidia
cuda-nvrtc-static 12.9.86 0 nvidia
cuda-nvtx 12.4.127 0 nvidia
cuda-nvvm-dev_linux-64 12.9.41 0 nvidia
cuda-nvvm-impl 12.9.41 0 nvidia
cuda-nvvm-tools 12.9.41 0 nvidia
cuda-nvvp 12.9.79 1 nvidia
cuda-opencl 12.9.19 0 nvidia
cuda-opencl-dev 12.9.19 0 nvidia
cuda-profiler-api 12.9.79 0 nvidia
cuda-runtime 12.4.1 0 nvidia
cuda-sanitizer-api 12.9.79 1 nvidia
cuda-toolkit 12.4.1 0 nvidia
cuda-tools 12.4.1 0 nvidia
cuda-version 12.9 3 nvidia
cuda-visual-tools 12.6.2 0 nvidia
cupy-cuda12x 13.5.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi
pytorch-cuda 12.4 hc786d27_7 pytorch
pytorch-mutex 1.0 cuda pytorch
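To see the mix of versions in this listing at a glance, here is a minimal sketch that groups packages by their major.minor version. The helper name `group_by_minor` is hypothetical, the sample is only a handful of lines copied from the listing above, and the parsing assumes the whitespace-separated "name version build channel" layout that `conda list` prints.

```python
from collections import defaultdict


def group_by_minor(conda_lines):
    """Map 'major.minor' -> list of package names from `conda list` lines."""
    groups = defaultdict(list)
    for line in conda_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, version = parts[0], parts[1]
        # Keep only the first two version components, e.g. 12.4.127 -> 12.4
        minor = ".".join(version.split(".")[:2])
        groups[minor].append(name)
    return dict(groups)


# A few lines copied from the listing in this post.
sample = """\
cuda-toolkit 12.4.1 0 nvidia
cuda-nvcc 12.9.41 0 nvidia
cuda-cudart 12.4.127 0 nvidia
cuda-libraries-dev 12.6.2 0 nvidia
cuda-version 12.9 3 nvidia
pytorch-cuda 12.4 hc786d27_7 pytorch
""".splitlines()

groups = group_by_minor(sample)
for minor in sorted(groups):
    print(minor, groups[minor])
```

Feeding it the full output of the conda command above instead of the sample would show every package that is not on 12.4.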
I am working on an HPC cluster, so I cannot use sudo.