Docker command:

docker run -it --rm --gpus all --shm-size=16GB -e NGC_API_KEY=$NGC_API_KEY -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" -u $(id -u) -p 8000:8000 nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
Error:

ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
I also tried this:

docker run -it --rm --gpus all --shm-size=16GB -e NGC_API_KEY=$NGC_API_KEY -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" -u $(id -u) -p 8000:8000 nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
python3 -m vllm_nvext.entrypoints.openai.api_server --dtype half --max-model-len 26000
Error:

ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
/usr/bin/python3.10: Error while finding module specification for 'vllm_nvext.entrypoints.openai.api_server' (ModuleNotFoundError: No module named 'vllm_nvext')
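Note that the ModuleNotFoundError above comes from the host's Python (/usr/bin/python3.10), which suggests the `python3 -m ...` line was executed as a separate shell command after the container exited, rather than being passed to `docker run` as the container command. A sketch of the combined invocation, assuming the `vllm_nvext.entrypoints.openai.api_server` module path from the attempt above is valid inside the NIM image (untested here):

```shell
# Sketch: append the server command to the SAME docker run invocation,
# using line continuations, so --dtype half runs inside the container.
docker run -it --rm --gpus all --shm-size=16GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:1.0.0 \
  python3 -m vllm_nvext.entrypoints.openai.api_server --dtype half --max-model-len 26000
```

Whether a T4 (compute capability 7.5) is a supported configuration for this NIM image at all is a separate question; the sketch only addresses getting the `--dtype half` override to run in the container instead of on the host.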