# NVIDIA NIM Bug Report: TensorRT-LLM Profile Incompatible with NGC Model Format
## Summary
NIM container fails to recognize NGC-downloaded TensorRT-LLM engines when using explicit TensorRT-LLM profile, despite successful model download and profile validation.
## Environment
- **NIM Version**: 1.12.0
- **Container**: `nvcr.io/nim/meta/llama-3.3-70b-instruct:latest`
- **Hardware**: AWS EC2 g6e.12xlarge (4x NVIDIA L40S GPUs)
- **Model**: Llama 3.3 70B Instruct
- **Profile ID**: `668b575f1701fa70a97cfeeae998b5d70b048a9b917682291bb82b67f308f80c` (tensorrt_llm)
## Bug Description
When using a valid NGC model URL with an explicit TensorRT-LLM profile, NIM successfully downloads the model but fails during format validation, claiming the downloaded format doesn’t match expected TensorRT-LLM structure.
## Reproduction Steps
1. **Working NGC Download** (validates model name format):
```bash
docker run -d --name nemo \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -e NIM_MODEL_NAME='ngc://nim/meta/llama-3.3-70b-instruct' \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u 1000 \
  -p 80:8000 \
  nvcr.io/nim/meta/llama-3.3-70b-instruct:latest
```
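Once this container reports ready, a quick probe of NIM's readiness endpoint confirms the baseline setup works (port 80 on the host is mapped to the container's 8000 above):
```bash
# Returns HTTP 200 once the model is loaded and ready to serve
curl -s http://localhost:80/v1/health/ready
```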
2. **Failing with TensorRT-LLM Profile**:
```bash
docker run -d --name nemo \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE='668b575f1701fa70a97cfeeae998b5d70b048a9b917682291bb82b67f308f80c' \
  -e NIM_MODEL_NAME='ngc://nim/meta/llama-3.3-70b-instruct' \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u 1000 \
  -p 80:8000 \
  nvcr.io/nim/meta/llama-3.3-70b-instruct:latest
```
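The failure described below surfaces in the container logs shortly after startup:
```bash
# Follow the container logs to capture the traceback
docker logs -f nemo
```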
## Expected Behavior
The TensorRT-LLM profile should be compatible with NGC-downloaded TensorRT engines for the same model, since both are NVIDIA-provided components.
## Actual Behavior
### 1. Successful Profile Validation
```
INFO 2025-09-17 19:02:49.996 ngc_injector.py:149] Valid profile: 668b575f1701fa70a97cfeeae998b5d70b048a9b917682291bb82b67f308f80c (tensorrt_llm) on GPUs [0, 1, 2, 3]
INFO 2025-09-17 19:02:49.996 ngc_injector.py:302] Selected profile: 668b575f1701fa70a97cfeeae998b5d70b048a9b917682291bb82b67f308f80c (tensorrt_llm)
INFO 2025-09-17 19:02:49.996 ngc_injector.py:321] Profile metadata: llm_engine: tensorrt_llm
```
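For reference, the set of profiles the container considers compatible with the hardware can be listed with its `list-model-profiles` utility, which is where the profile ID above comes from:
```bash
# Lists compatible/incompatible profiles for the detected GPUs
docker run --rm --gpus all \
  -e NGC_API_KEY \
  nvcr.io/nim/meta/llama-3.3-70b-instruct:latest \
  list-model-profiles
```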
### 2. Successful Model Download
```
INFO 2025-09-17 19:02:49.369 ngc_injector.py:196] Model workspace is now ready. It took 0.466 seconds
INFO 2025-09-17 19:02:49.573 utils.py:125] Found following files in /tmp/nim--meta--llama-3_3-70b-instruct-cftcu7d8
INFO 2025-09-17 19:02:49.573 utils.py:129] ├── checksums.blake3
INFO 2025-09-17 19:02:49.573 utils.py:129] ├── config.json
INFO 2025-09-17 19:02:49.573 utils.py:129] ├── metadata.json
INFO 2025-09-17 19:02:49.573 utils.py:129] ├── rank0.engine
INFO 2025-09-17 19:02:49.573 utils.py:129] ├── rank1.engine
INFO 2025-09-17 19:02:49.573 utils.py:129] ├── rank2.engine
INFO 2025-09-17 19:02:49.573 utils.py:129] └── rank3.engine
```
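The same flat layout can be confirmed from the host through the mounted cache, assuming the NGC download landed under `$LOCAL_NIM_CACHE`:
```bash
# Locate the downloaded engine files and metadata in the mounted cache
find "$LOCAL_NIM_CACHE" -name 'rank*.engine' -o -name 'metadata.json'
```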
### 3. Format Validation Failure
```
ValueError: Invalid repository ID or local directory specified: /tmp/nim--meta--llama-3_3-70b-instruct-cftcu7d8. Expected model format to be one of ['hf-safetensor', 'trtllm-engine', 'trtllm-ckpt', 'gguf']. Please check NIM documentation for supported model formats and folder structures.
```
## Root Cause Analysis
The NGC download delivers the TensorRT-LLM engines (`rank0.engine`, `rank1.engine`, etc.) flat at the top level, but NIM's format detection logic in `profile_utils.py` expects a specific nested directory structure:
**Expected for 'trtllm-engine':**
```
├── config.json
├── tokenizer files…
└── trtllm_engine/
├── config.json
├── rank0.engine
└── ...
```
**Actual NGC download:**
```
├── checksums.blake3
├── config.json
├── metadata.json
├── rank0.engine
├── rank1.engine
├── rank2.engine
└── rank3.engine
```
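A quick shell check, run inside the container (e.g. via `docker exec`), makes the mismatch concrete. This is illustrative only, not NIM's actual detection code:
```bash
# Path taken from the log output above; the check is illustrative,
# not the real detection logic in profile_utils.py.
WORKSPACE=/tmp/nim--meta--llama-3_3-70b-instruct-cftcu7d8
if [ -d "$WORKSPACE/trtllm_engine" ]; then
  echo "nested layout: matches the expected 'trtllm-engine' structure"
elif ls "$WORKSPACE"/rank*.engine >/dev/null 2>&1; then
  echo "flat layout: engines at the top level (what NGC delivers)"
fi
```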
## Impact
- TensorRT-LLM profile cannot be used with NGC models
- Forces users to use vLLM (which may have memory constraints on smaller GPUs)
- Inconsistency between NVIDIA components (NGC registry vs NIM container expectations)
## Error Location
- **File**: `/opt/nim/llm/nim_llm_sdk/hub/profile_utils.py` (lines 781, 791, 521)
- **Function**: `ProfileFilter.__init__() → _update_allowed_backends() → evaluate_backend()`
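To inspect the detection logic at those locations, the file can be dumped straight from the image (line range approximate):
```bash
# Print the relevant section of profile_utils.py from the container image
docker run --rm --entrypoint bash \
  nvcr.io/nim/meta/llama-3.3-70b-instruct:latest \
  -c "sed -n '770,800p' /opt/nim/llm/nim_llm_sdk/hub/profile_utils.py"
```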
## Suggested Fix
1. **Update format detection** to recognize the NGC TensorRT engine format as valid 'trtllm-engine'
2. **Add format converter** to restructure NGC downloads into the expected directory layout (a sketch follows this list)
3. **Document compatibility** between NGC model formats and NIM profiles
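A minimal sketch of fix idea #2, assuming the expected layout inferred above is accurate. This is untested; in particular, placing a copy of `config.json` beside the engines is a guess:
```bash
# Restructure the flat NGC download into the nested layout NIM expects.
# SRC is the workspace path from the logs; DST placement is an assumption.
SRC=/tmp/nim--meta--llama-3_3-70b-instruct-cftcu7d8
DST="$SRC/trtllm_engine"
mkdir -p "$DST"
mv "$SRC"/rank*.engine "$DST"/
cp "$SRC/config.json" "$DST/config.json"  # engine config beside the engines
```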
## Attempted Workarounds (All Failed)
Using the vLLM backend instead of TensorRT-LLM:
```bash
# Remove NIM_MODEL_PROFILE to auto-select vLLM
docker run -d --name nemo \
  --gpus all \
  -e NGC_API_KEY \
  -e NIM_MODEL_NAME='ngc://nim/meta/llama-3.3-70b-instruct' \
  -e NIM_VLLM_EXTRA_ARGS="--gpu-memory-utilization 0.98 --max-model-len 16384" \
  nvcr.io/nim/meta/llama-3.3-70b-instruct:latest
```
This run encounters a similar format-validation error.
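One further untested avenue: restructure a copy of the download per the sketch above and point NIM at it as a local model path (`NIM_MODEL_NAME` also accepts a local directory). `$LOCAL_MODEL_DIR` here is a hypothetical host directory holding the restructured files:
```bash
# Assumes the engines were restructured per the converter sketch above
docker run -d --name nemo \
  --gpus all \
  -e NIM_MODEL_NAME=/opt/nim/local-model \
  -v "$LOCAL_MODEL_DIR:/opt/nim/local-model" \
  -p 80:8000 \
  nvcr.io/nim/meta/llama-3.3-70b-instruct:latest
```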
## Additional Context
- Model download and profile validation both succeed independently
- The issue appears to be purely in the format detection/validation logic
- This affects any NGC model used with explicit TensorRT-LLM profiles
- Auto-selection works but may choose suboptimal backend for hardware
**Bug Report Venue:**
- **NGC Support**: NVIDIA NGC

**Severity**: High (no working workaround found; blocks TensorRT-LLM usage with NGC models)