Running Llama-3.1-8B-FP4 gets Triton error: Value 'sm_121a' is not defined for option 'gpu-name'

I tried to run the FP4 Llama-3.1-8B model that NVIDIA provides on Hugging Face on a DGX Spark. I got the following error when I ran the script given on the Llama-3.1-8B-FP4 Hugging Face page:

triton.runtime.errors.PTXASError: PTXAS error: Internal Triton PTX codegen error
ptxas stderr:

ptxas-blackwell fatal : Value 'sm_121a' is not defined for option 'gpu-name'

Could you please help me figure this out? Thank you.

export TORCH_CUDA_ARCH_LIST=12.1a  # DGX Spark; valid values: 12.0, 12.1f, 12.1a
export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
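For context on why this works, as far as I understand it: Triton bundles its own ptxas, and on current wheels that bundled binary predates the DGX Spark's sm_121a target, so TRITON_PTXAS_PATH redirects Triton to the system CUDA toolkit's ptxas instead. A quick sanity check after setting the variables above (this assumes the CUDA 13.x toolkit that ships with DGX Spark is installed at /usr/local/cuda):

which ptxas        # should resolve to /usr/local/cuda/bin/ptxas
ptxas --version    # a CUDA 13.x ptxas recognizes sm_121a; older releases do not
python -c "import triton; print(triton.__version__)"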

I’m trying to run it in a vLLM Docker container, and setting the TORCH_CUDA_ARCH_LIST=12.1a env var does not help.

Is there anything else needed to fix this error? I’m using the vllm/vllm-openai:nightly image.
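In case the missing piece is the ptxas override rather than the arch list: in a container, both variables from the earlier post have to be set inside the container via -e flags. A minimal sketch of how that could look (the model repo ID nvidia/Llama-3.1-8B-Instruct-FP4 and the ptxas path inside the nightly image are assumptions on my part; verify both before relying on this):

# Sketch: pass both overrides into the vLLM container
docker run --gpus all --ipc=host -p 8000:8000 \
  -e TORCH_CUDA_ARCH_LIST=12.1a \
  -e TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas \
  vllm/vllm-openai:nightly \
  --model nvidia/Llama-3.1-8B-Instruct-FP4

If it still fails, docker exec into the running container and run ptxas --version to confirm that the binary at that path exists and is new enough to know sm_121a.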
