NIM TensorRT-LLM on H100 NVL

Can we use NIM TensorRT-LLM on H100 NVL?

I checked the profiles of nvcr.io/nim/meta/llama-3.1-8b-instruct:1.2.2 on an H100 NVL system (list-model-profiles output below).
It looks like none of the TensorRT-LLM profiles support the H100 NVL.

SYSTEM INFO
- Free GPUs:
  -  [2321:10de] (0) NVIDIA H100 NVL [current utilization: 0%]
  -  [2321:10de] (1) NVIDIA H100 NVL [current utilization: 0%]
MODEL PROFILES
- Compatible with system and runnable:
  - 6a3ba475d3215ca28f1a8c8886ab4a56b5626d1c98adbfe751025e8ff3d9886d (vllm-bf16-tp2)
  - 3bb4e8fe78e5037b05dd618cebb1053347325ad6a1e709e0eb18bb8558362ac5 (vllm-bf16-tp1)
  - With LoRA support:
    - a95e5c7221dae587b4fc32448df265320ce79064a970297649d97a84eb9dc3ba (vllm-bf16-tp2-lora)
    - dfd9bee71abb7582246f7fb8c2aedd9119909b9639e1b4b0260ef6865545ede7 (vllm-bf16-tp1-lora)
- Incompatible with system:
  - 0bc4cc784e55d0a88277f5d1aeab9f6ecb756b9049dd07c1835035211fcfe77e (tensorrt_llm-h100-fp8-tp2-latency)
  - 2959f7f0dfeb14631352967402c282e904ff33e1d1fa015f603d9890cf92ca0f (tensorrt_llm-h100-fp8-tp1-throughput)
  - e45b4b991bbc51d0df3ce53e87060fc3a7f76555406ed534a8479c6faa706987 (tensorrt_llm-a10g-bf16-tp4-latency)
  - 7781e7219b2f41c3e560bb90d6f357ce64bbed9203f087d2cfe0ea3a523a04b7 (tensorrt_llm-a100-bf16-tp2-latency)
  - 7f98797c334a8b7205d4cbf986558a2b8a181570b46abed9401f7da6d236955e (tensorrt_llm-h100-bf16-tp2-latency)
  - 0494aafce0df9eeaea49bbca6b25fc3013d0e8a752ebcf191a2ddeaab19481ee (tensorrt_llm-l40s-bf16-tp2-latency)
  - ba515cc44a34ae4db8fe375bd7e5ad30e9a760bd032230827d8a54835a69c409 (tensorrt_llm-a10g-bf16-tp2-throughput)
  - a534b0f5e885d747e819fa8b1ad7dc1396f935425a6e0539cb29b0e0ecf1e669 (tensorrt_llm-l40s-bf16-tp2-throughput)
  - 7ea3369b85d7aee24e0739df829da8832b6873803d5f5aca490edad7360830c8 (tensorrt_llm-a100-bf16-tp1-throughput)
  - 9cff0915527166b2e93c08907afd4f74e168562992034a51db00df802e86518c (tensorrt_llm-h100-bf16-tp1-throughput)
  - 3807be802a8ab1d999bf280c96dcd8cf77ac44c0a4d72edb9083f0abb89b6a19 (tensorrt_llm-l40s-bf16-tp1-throughput)
  - 407c6c5d1e29be9929f41b9a2e3193359b8ebfa512353de88cefbf1e0f0b194e (vllm-bf16-tp4)
  - 6b89dc22ba60a07df3051451b7dc4ef418d205e52e19cb0845366dc18dd61bd6 (tensorrt_llm-l40s-bf16-tp2-throughput-lora)
  - a506c5bed39ba002797d472eb619ef79b1ffdf8fb96bb54e2ff24d5fc421e196 (tensorrt_llm-a100-bf16-tp1-throughput-lora)
  - 40543df47628989c7ef5b16b33bd1f55165dddeb608bf3ccb56cdbb496ba31b0 (tensorrt_llm-h100-bf16-tp1-throughput-lora)
  - 678e6dbe53dd6fe7dc508d22eb0672743ba0b7e735007cfd0b0d2a9e05911fb9 (vllm-bf16-tp4-lora)
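
As a side note, here is a quick way to confirm which PCI device ID your GPUs report, which is the value the profile matcher compares against the manifest's gpu_device tag. A minimal Python sketch, assuming nvidia-smi is on the PATH:

```python
import subprocess

# Ask nvidia-smi for each GPU's name and PCI device ID. nvidia-smi prints the
# ID as 0xDDDDVVVV (device + vendor), e.g. 0x232110DE for the H100 NVL.
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=name,pci.device_id", "--format=csv,noheader"],
    text=True,
)
for line in out.strip().splitlines():
    name, device_id = (field.strip() for field in line.split(","))
    # Rewrite 0x232110DE into the manifest's "2321:10de" notation.
    normalized = f"{device_id[2:6]}:{device_id[6:]}".lower()
    print(f"{name}: {normalized}")
# On our system this prints "NVIDIA H100 NVL: 2321:10de" for both GPUs,
# matching the "Free GPUs" lines above.
```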

I also checked /etc/nim/config/model_manifest.yaml.
It seems the TensorRT-LLM profiles only support gpu_device 2330:10de, which is the NVIDIA H100 80GB HBM3, not 2321:10de (the H100 NVL). For example:

2959f7f0dfeb14631352967402c282e904ff33e1d1fa015f603d9890cf92ca0f:
  container_url: nvcr.io/nim/meta/llama-3.1-8b-instruct:1.2.0
  model: meta/llama-3.1-8b-instruct
  release: 1.2.2
  tags:
    feat_lora: false
    gpu: H100
    gpu_device: 2330:10de
    llm_engine: tensorrt_llm
    pp: '1'
    precision: fp8
    profile: throughput
    tp: '1'
  workspace: !workspace
    components:
      - dst: ''
        src:
          files:
            - !name 'LICENSE.txt'
            - !name 'NOTICE.txt'
            - !name 'checksums.blake3'
            - !name 'config.json'
            - !name 'generation_config.json'
            - !name 'model.safetensors.index.json'
            - !name 'special_tokens_map.json'
            - !name 'tokenizer.json'
            - !name 'tokenizer_config.json'
            - !name 'tool_use_config.json'
          repo_id: ngc://nim/meta/llama-3_1-8b-instruct:hf-8c22764-nim1.2
      - dst: trtllm_engine
        src:
          files:
            - !name 'LICENSE.txt'
            - !name 'NOTICE.txt'
            - !name 'checksums.blake3'
            - !name 'config.json'
            - !name 'metadata.json'
            - !name 'rank0.engine'
          repo_id: ngc://nim/meta/llama-3_1-8b-instruct:0.11.1+14957bf8-h100x1-fp8-throughput.1.2.18099815
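
To check this across the whole manifest rather than one entry, here is a rough Python sketch that collects the gpu_device tag of every tensorrt_llm profile. It assumes the top level of model_manifest.yaml is a mapping from profile ID to profile, as in the excerpt above, and it deliberately tolerates NIM's custom YAML tags (!workspace, !name) since we only need the tags section:

```python
import yaml

class LenientLoader(yaml.SafeLoader):
    """SafeLoader that tolerates custom tags like !workspace and !name."""

def _construct_any(loader, tag_suffix, node):
    # Build plain Python values for any unknown tag instead of raising.
    if isinstance(node, yaml.ScalarNode):
        return loader.construct_scalar(node)
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node)
    return loader.construct_mapping(node)

# Route every tag without a registered constructor through the fallback above.
LenientLoader.add_multi_constructor("", _construct_any)

with open("/etc/nim/config/model_manifest.yaml") as f:
    manifest = yaml.load(f, Loader=LenientLoader)

# Gather the PCI device IDs declared by the TensorRT-LLM profiles.
devices = {
    profile["tags"].get("gpu_device")
    for profile in manifest.values()
    if isinstance(profile, dict)
    and profile.get("tags", {}).get("llm_engine") == "tensorrt_llm"
}
print(devices)  # In our manifest: only {'2330:10de'} -- no 2321:10de (H100 NVL).
```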

So, is there any way to use the NIM TensorRT-LLM engines on an H100 NVL?

Currently, the only optimized engines available are the ones listed above, so this particular hardware does not have a supported TensorRT-LLM engine.
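
In the meantime, the vLLM profiles that your listing reports as "Compatible with system and runnable" should work on the H100 NVL. Below is a sketch of pinning one of them explicitly via the documented NIM_MODEL_PROFILE environment variable; the profile hash is taken from your listing, and the port mapping is just an example:

```python
import os
import subprocess

# Pin the vllm-bf16-tp2 profile that list-model-profiles reported as
# runnable on this system, so NIM skips TensorRT-LLM engine selection.
PROFILE = "6a3ba475d3215ca28f1a8c8886ab4a56b5626d1c98adbfe751025e8ff3d9886d"

subprocess.run(
    [
        "docker", "run", "--rm", "--gpus", "all",
        "-e", f"NGC_API_KEY={os.environ['NGC_API_KEY']}",
        "-e", f"NIM_MODEL_PROFILE={PROFILE}",
        "-p", "8000:8000",
        "nvcr.io/nim/meta/llama-3.1-8b-instruct:1.2.2",
    ],
    check=True,
)
```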

Let me check back in with you in a few days to confirm if there are any plans for this to change.


It would be good to know, as we are facing the same problem. If possible, please provide an estimated timeline for when this support will be added.