Can we use NIM TensorRT-LLM on H100 NVL?
I checked the model profiles of nvcr.io/nim/meta/llama-3.1-8b-instruct:1.2.2 on an H100 NVL system, and it seems none of the TensorRT-LLM profiles support the H100 NVL.
SYSTEM INFO
- Free GPUs:
  - [2321:10de] (0) NVIDIA H100 NVL [current utilization: 0%]
  - [2321:10de] (1) NVIDIA H100 NVL [current utilization: 0%]
MODEL PROFILES
- Compatible with system and runnable:
  - 6a3ba475d3215ca28f1a8c8886ab4a56b5626d1c98adbfe751025e8ff3d9886d (vllm-bf16-tp2)
  - 3bb4e8fe78e5037b05dd618cebb1053347325ad6a1e709e0eb18bb8558362ac5 (vllm-bf16-tp1)
  - With LoRA support:
    - a95e5c7221dae587b4fc32448df265320ce79064a970297649d97a84eb9dc3ba (vllm-bf16-tp2-lora)
    - dfd9bee71abb7582246f7fb8c2aedd9119909b9639e1b4b0260ef6865545ede7 (vllm-bf16-tp1-lora)
- Incompatible with system:
  - 0bc4cc784e55d0a88277f5d1aeab9f6ecb756b9049dd07c1835035211fcfe77e (tensorrt_llm-h100-fp8-tp2-latency)
  - 2959f7f0dfeb14631352967402c282e904ff33e1d1fa015f603d9890cf92ca0f (tensorrt_llm-h100-fp8-tp1-throughput)
  - e45b4b991bbc51d0df3ce53e87060fc3a7f76555406ed534a8479c6faa706987 (tensorrt_llm-a10g-bf16-tp4-latency)
  - 7781e7219b2f41c3e560bb90d6f357ce64bbed9203f087d2cfe0ea3a523a04b7 (tensorrt_llm-a100-bf16-tp2-latency)
  - 7f98797c334a8b7205d4cbf986558a2b8a181570b46abed9401f7da6d236955e (tensorrt_llm-h100-bf16-tp2-latency)
  - 0494aafce0df9eeaea49bbca6b25fc3013d0e8a752ebcf191a2ddeaab19481ee (tensorrt_llm-l40s-bf16-tp2-latency)
  - ba515cc44a34ae4db8fe375bd7e5ad30e9a760bd032230827d8a54835a69c409 (tensorrt_llm-a10g-bf16-tp2-throughput)
  - a534b0f5e885d747e819fa8b1ad7dc1396f935425a6e0539cb29b0e0ecf1e669 (tensorrt_llm-l40s-bf16-tp2-throughput)
  - 7ea3369b85d7aee24e0739df829da8832b6873803d5f5aca490edad7360830c8 (tensorrt_llm-a100-bf16-tp1-throughput)
  - 9cff0915527166b2e93c08907afd4f74e168562992034a51db00df802e86518c (tensorrt_llm-h100-bf16-tp1-throughput)
  - 3807be802a8ab1d999bf280c96dcd8cf77ac44c0a4d72edb9083f0abb89b6a19 (tensorrt_llm-l40s-bf16-tp1-throughput)
  - 407c6c5d1e29be9929f41b9a2e3193359b8ebfa512353de88cefbf1e0f0b194e (vllm-bf16-tp4)
  - 6b89dc22ba60a07df3051451b7dc4ef418d205e52e19cb0845366dc18dd61bd6 (tensorrt_llm-l40s-bf16-tp2-throughput-lora)
  - a506c5bed39ba002797d472eb619ef79b1ffdf8fb96bb54e2ff24d5fc421e196 (tensorrt_llm-a100-bf16-tp1-throughput-lora)
  - 40543df47628989c7ef5b16b33bd1f55165dddeb608bf3ccb56cdbb496ba31b0 (tensorrt_llm-h100-bf16-tp1-throughput-lora)
  - 678e6dbe53dd6fe7dc508d22eb0672743ba0b7e735007cfd0b0d2a9e05911fb9 (vllm-bf16-tp4-lora)
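If I understand the profile selection correctly, NIM compares the detected GPU's PCI device ID (here 2321:10de for the H100 NVL) against the gpu_device tag baked into each prebuilt TensorRT-LLM profile, while the vLLM profiles carry no such pin. A minimal sketch of that matching behavior, as I understand it (the function and dicts below are my own illustration, not NIM's actual code):

```python
# Sketch of how NIM's profile compatibility check appears to behave.
# The data is taken from the log output above; the matching function
# itself is an assumption, not NIM's real implementation.

HOST_GPU_DEVICE = "2321:10de"  # NVIDIA H100 NVL, from SYSTEM INFO

# Two representative profiles, abbreviated from the manifest.
PROFILES = {
    "tensorrt_llm-h100-fp8-tp1-throughput": {
        "llm_engine": "tensorrt_llm",
        "gpu_device": "2330:10de",  # pinned to H100 80GB HBM3
    },
    "vllm-bf16-tp1": {
        "llm_engine": "vllm",  # no gpu_device pin
    },
}

def is_compatible(profile: dict, host_device: str) -> bool:
    """A profile pinned to a gpu_device only runs on that exact PCI ID;
    an unpinned (vLLM) profile runs on any supported GPU."""
    pinned = profile.get("gpu_device")
    return pinned is None or pinned == host_device

for name, tags in PROFILES.items():
    verdict = "compatible" if is_compatible(tags, HOST_GPU_DEVICE) else "incompatible"
    print(f"{name}: {verdict}")
```

This would explain the split above: every tensorrt_llm profile is pinned to a specific device ID, and none of them is pinned to 2321:10de.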
I also checked /etc/nim/config/model_manifest.yaml. It seems the TensorRT-LLM profiles only support gpu_device 2330:10de, which is the NVIDIA H100 80GB HBM3. For example, the incompatible fp8 throughput profile is tagged as follows:
2959f7f0dfeb14631352967402c282e904ff33e1d1fa015f603d9890cf92ca0f:
  container_url: nvcr.io/nim/meta/llama-3.1-8b-instruct:1.2.0
  model: meta/llama-3.1-8b-instruct
  release: 1.2.2
  tags:
    feat_lora: false
    gpu: H100
    gpu_device: 2330:10de
    llm_engine: tensorrt_llm
    pp: '1'
    precision: fp8
    profile: throughput
    tp: '1'
  workspace: !workspace
    components:
    - dst: ''
      src:
        files:
        - !name 'LICENSE.txt'
        - !name 'NOTICE.txt'
        - !name 'checksums.blake3'
        - !name 'config.json'
        - !name 'generation_config.json'
        - !name 'model.safetensors.index.json'
        - !name 'special_tokens_map.json'
        - !name 'tokenizer.json'
        - !name 'tokenizer_config.json'
        - !name 'tool_use_config.json'
        repo_id: ngc://nim/meta/llama-3_1-8b-instruct:hf-8c22764-nim1.2
    - dst: trtllm_engine
      src:
        files:
        - !name 'LICENSE.txt'
        - !name 'NOTICE.txt'
        - !name 'checksums.blake3'
        - !name 'config.json'
        - !name 'metadata.json'
        - !name 'rank0.engine'
        repo_id: ngc://nim/meta/llama-3_1-8b-instruct:0.11.1+14957bf8-h100x1-fp8-throughput.1.2.18099815
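To double-check which device IDs the prebuilt engines are pinned to, the whole manifest can be scanned for gpu_device tags. A rough sketch (a regex scan rather than yaml.safe_load, since the custom !workspace/!name tags are not plain YAML; on a live system the text would come from the /etc/nim/config/model_manifest.yaml path above):

```python
import re
from collections import Counter

# Pull out every "gpu_device: <vendor:device>" pin in the manifest text.
GPU_DEVICE_RE = re.compile(r"gpu_device:\s*(\S+)")

def gpu_device_pins(manifest_text: str) -> Counter:
    """Count how often each PCI device ID appears as a gpu_device tag."""
    return Counter(GPU_DEVICE_RE.findall(manifest_text))

# Demo against a fragment of the profile quoted above:
sample = """
tags:
  feat_lora: false
  gpu: H100
  gpu_device: 2330:10de
  llm_engine: tensorrt_llm
"""
print(gpu_device_pins(sample))
```

If 2321:10de never shows up in the output, then no prebuilt TensorRT-LLM engine in this release targets the H100 NVL, which would match the "Incompatible with system" list above.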
Is there any way to use the NIM TensorRT-LLM profiles on H100 NVL, or do we have to fall back to the vLLM profiles?