I tried to run the FP4 Llama-3.1-8B model that NVIDIA provides on Hugging Face on a DGX Spark. I got the following error when I ran the script given on the Llama-3.1-8B-FP4 Hugging Face page:
triton.runtime.errors.PTXASError: PTXAS error: Internal Triton PTX codegen error
ptxas stderr:
ptxas-blackwell fatal : Value 'sm_121a' is not defined for option 'gpu-name'
Could you please help me figure this out? Thank you.
Try setting the arch list and pointing Triton at the system CUDA toolkit’s ptxas:
export TORCH_CUDA_ARCH_LIST=12.1a # DGX Spark: 12.0, 12.1f, 12.1a
export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
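If the error persists, it can help to confirm which ptxas Triton is actually invoking: the "Value 'sm_121a' is not defined" message usually means an older ptxas (e.g. one bundled with the Triton wheel) was picked up instead of a CUDA 13 one that knows the DGX Spark target. A minimal check, assuming the toolkit lives under /usr/local/cuda (adjust the path for your install):

```shell
# Sketch: verify the ptxas that TRITON_PTXAS_PATH points at exists and is
# new enough to know sm_121a. The /usr/local/cuda path is an assumption.
PTXAS="${TRITON_PTXAS_PATH:-/usr/local/cuda/bin/ptxas}"
if [ -x "$PTXAS" ]; then
  "$PTXAS" --version   # should report release 13.0 or newer
else
  echo "ptxas not found at $PTXAS"
fi
```

If it prints an older release (or "not found"), fix TRITON_PTXAS_PATH before re-running the script.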
I’m trying to run it in a vLLM Docker container, and setting the TORCH_CUDA_ARCH_LIST=12.1a env var does not help.
Is there anything else needed to fix this error? I’m using the vllm/vllm-openai:nightly image.
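For reference, exporting the variables on the host shell has no effect inside the container; they have to be passed into the container environment. A sketch of a launch command (untested config fragment; the model name and --gpus flag are assumptions, and TRITON_PTXAS_PATH only helps if the image actually ships a CUDA 13 ptxas at that path):

```shell
# Sketch: pass the env vars into the container with -e instead of exporting
# them on the host. Model repo name is an assumption based on this thread.
docker run --gpus all \
  -e TORCH_CUDA_ARCH_LIST=12.1a \
  -e TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas \
  vllm/vllm-openai:nightly \
  --model nvidia/Llama-3.1-8B-FP4
```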