NVIDIA Bug Report - NeMo Retriever OCR NIM v1.1.0
Bug Report Summary
Product: NVIDIA NIM for Image OCR (NeMo Retriever OCR v1)
Version: 1.1.0
Severity: High (Production Blocker)
Status: 100% reproducible
Title: Stub process 'scene_text_pre_0_0' not healthy - OCR inference fails with 500 error despite health check passing
Problem Description
The NeMo Retriever OCR NIM v1.1.0 container starts successfully, all models load and show READY status, and the health endpoint reports {"ready": true}. However, all OCR inference requests fail with a 500 error:
{"error":"[500] in ensemble 'scene_text_ensemble', Stub process 'scene_text_pre_0_0' is not healthy."}
This appears to be a Triton Inference Server Python backend issue specific to the preprocessing stub in the ensemble pipeline.
Environment Details
System Configuration
Host OS: Linux (WSL2)
Kernel: 6.6.87.2-microsoft-standard-WSL2
Docker: 27.4.0
NVIDIA Driver: 581.29
CUDA: 13.0
GPU: NVIDIA GeForce RTX 5090
GPU Memory: 32GB VRAM
GPU Compute Capability: 12.0 (Blackwell)
Container Configuration
Container: nvcr.io/nvidia/nemo-microservices/nemoretriever-ocr-v1:1.1.0
Triton Server Version: 2.60.0
Shared Memory: 16GB
GPU Memory Fraction: 0.5 (50%)
Port Mapping: 8103 (host) → 8000 (container)
Runtime: nvidia
Docker Run Command
docker run -d \
  --name nemo-ocr \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -p 8103:8000 \
  -e NGC_API_KEY=<redacted> \
  -e NIM_GPU_MEMORY_FRACTION=0.5 \
  -e CUDA_VISIBLE_DEVICES=0 \
  --restart unless-stopped \
  nvcr.io/nvidia/nemo-microservices/nemoretriever-ocr-v1:1.1.0
Steps to Reproduce
1. Start Container
docker pull nvcr.io/nvidia/nemo-microservices/nemoretriever-ocr-v1:1.1.0
docker run -d --name nemo-ocr --runtime=nvidia --gpus all \
  --shm-size=16GB -p 8103:8000 \
  -e NGC_API_KEY=<your_key> \
  nvcr.io/nvidia/nemo-microservices/nemoretriever-ocr-v1:1.1.0
2. Wait for Container Ready
# Wait ~30 seconds for models to load
sleep 30
# Verify health check (PASSES)
curl http://localhost:8103/v1/health/ready
# Returns: {"ready":true}
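A fixed sleep can race model loading on slower systems. The wait can be made explicit with a small polling helper (a sketch; `poll_until` is our own helper name, not a NIM utility):

```shell
# Poll a command until it succeeds or a timeout (in seconds) elapses.
# Usage against this deployment's port mapping:
#   poll_until 120 2 curl -sf http://localhost:8103/v1/health/ready
poll_until() {
  local timeout=$1 interval=$2
  shift 2
  local elapsed=0
  until "$@" > /dev/null 2>&1; do
    elapsed=$((elapsed + interval))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}
```

Note that even after waiting 2+ minutes for initialization, the stub failure below still occurs, so this is not a startup race.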
3. Attempt OCR Inference
# Encode test image to base64 (-w 0 disables line wrapping, which would corrupt the data URL)
base64 -w 0 test_image.jpg > image_b64.txt
# Create request JSON
cat > request.json << 'EOF'
{
  "input": [
    {
      "type": "image_url",
      "url": "data:image/jpeg;base64,<BASE64_STRING_HERE>"
    }
  ]
}
EOF
# Send inference request (FAILS)
curl -X POST http://localhost:8103/v1/infer \
-H "Content-Type: application/json" \
-d @request.json
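Hand-pasting the base64 string into request.json is error-prone. A sketch of building the body programmatically, assuming jq is available (test_image.jpg as above; the fallback line only creates a placeholder so the sketch is self-contained):

```shell
# Build the inference request body without manual pasting.
[ -f test_image.jpg ] || printf 'placeholder' > test_image.jpg  # placeholder if no sample image present
B64=$(base64 -w 0 test_image.jpg)   # -w 0 disables line wrapping
jq -n --arg url "data:image/jpeg;base64,$B64" \
  '{input: [{type: "image_url", url: $url}]}' > request.json
```

The request can then be sent exactly as above with `curl -d @request.json`.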
4. Observe Error
{
  "error": "[500] in ensemble 'scene_text_ensemble', Stub process 'scene_text_pre_0_0' is not healthy."
}
Expected Behavior
OCR inference should succeed and return extracted text from the image, as documented in the API reference:
{
  "data": [
    {
      "index": 0,
      "text_detections": [
        {
          "text": "...",
          "confidence": 0.95,
          "bbox": [...]
        }
      ]
    }
  ],
  "usage": {...}
}
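Once inference works, the detected strings can be pulled out of this response shape with jq (a sketch; the sample values are illustrative only, and jq is assumed to be installed):

```shell
# Sample response in the documented shape (values are illustrative only).
cat > response.json << 'EOF'
{"data":[{"index":0,"text_detections":[{"text":"INVOICE","confidence":0.95,"bbox":[]}]}],"usage":{}}
EOF
# Extract just the detected text strings.
jq -r '.data[].text_detections[].text' response.json
```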
Actual Behavior
- ✅ Container starts successfully
- ✅ All models load and show READY status
- ✅ Health endpoint returns {"ready": true}
- ✅ GPU detected correctly
- ❌ OCR inference fails with 500 error
- ❌ Stub process scene_text_pre_0_0 is not healthy
Container Logs Analysis
Model Status (from container logs)
+---------------------+---------+--------+
| Model | Version | Status |
+---------------------+---------+--------+
| scene_text_det | 1 | READY |
| scene_text_det_post | 1 | READY |
| scene_text_ensemble | 1 | READY |
| scene_text_post | 1 | READY |
| scene_text_pre | 1 | READY | ← Shows READY but stub fails
| scene_text_rec | 1 | READY |
+---------------------+---------+--------+
Initialization Logs
I1118 11:43:28.517 model_lifecycle.cc:473] "loading: scene_text_pre:1"
I1118 11:43:39.901 python_be.cc:2289] "TRITONBACKEND_ModelInstanceInitialize: scene_text_pre_0_0 (GPU device 0)"
I1118 11:43:40.020 model.py:44] "Initialization complete."
I1118 11:43:42.721 model_lifecycle.cc:849] "successfully loaded 'scene_text_pre'"
No errors in logs - all models initialize successfully.
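Because the stub error surfaces only at inference time, the logs are worth re-scanning after a failed request. A small filter helper (our own sketch, not a NIM tool) makes that quick:

```shell
# Filter a Triton/NIM log stream for stub- and backend-related lines.
# Usage (container name from this report):
#   docker logs nemo-ocr 2>&1 | scan_stub_errors
scan_stub_errors() {
  grep -iE "stub|python_be|unhealthy|error" | tail -n 50
}
```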
GPU Metrics
I1118 11:43:57.629 metrics.cc:889] "Collecting metrics for GPU 0: NVIDIA GeForce RTX 5090"
GPU Memory Used: 5,904 MiB (18% of 32,607 MiB)
GPU Memory Free: 26,703 MiB (82%)
GPU has plenty of resources - not a memory issue.
Investigation Performed
1. Verified API Format
Request format matches OpenAPI specification exactly:
- Content-Type: application/json
- Input type: "image_url"
- URL format: "data:image/jpeg;base64,..."
2. Inspected Model Code
Python backend model at /opt/nim/workspace/scene_text_pre/1/model.py:
- Code appears correct (100 lines)
- Handles base64 decoding, image padding, resizing
- No obvious bugs in implementation
3. Checked Configuration
Model config at /opt/nim/workspace/scene_text_pre/config.pbtxt:
backend: "python"
max_batch_size: 32
instance_group {
  count: 1
  kind: KIND_GPU
}
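One cheap diagnostic (not a fix, and editing files inside a NIM container may not be supported): switch the preprocessing instance to CPU to see whether the stub failure is tied to the GPU instance. A hypothetical config variant:

```
backend: "python"
max_batch_size: 32
instance_group {
  count: 1
  kind: KIND_CPU   # diagnostic only; the shipped config uses KIND_GPU
}
```

If the CPU instance serves requests, the issue is specific to GPU stub initialization; if it fails identically, the backend itself is at fault.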
4. Tested with Different Approaches
- ❌ Python requests library
- ❌ Direct curl command
- ❌ Container restart
- ❌ Wait 2+ minutes for initialization
All approaches fail with same error.
Related Issues
This error is similar to known Triton Inference Server Python backend issues:
- triton-inference-server/server#3678 (Dec 2021): "Stub process is unhealthy and it will be restarted"
  - Cause: insufficient shared memory
  - Our case: 16GB shared memory (more than sufficient)
- triton-inference-server/server#7186 (May 2024): "Stub process is not healthy"
  - Cause: version mismatch between the Python backend and the Triton server
  - Resolved by rebuilding the backend from the matching branch
- triton-inference-server/server#8102 (Mar 2025): memory leak in the triton_python_backend_stub process
  - Our case: only 18% GPU memory used
Root Cause Hypothesis
Based on investigation, this appears to be a Triton Server 2.60.0 Python backend issue specific to:
- Ensemble models with Python preprocessing
- First inference initialization of stub process
- Potentially related to version compatibility
Key Evidence:
- Health check only verifies models are loaded (shallow check)
- Stub process fails on actual inference execution
- No errors during model initialization
- All similar issues relate to Triton Python backend, not model code
Attempted Workarounds
1. Container Restart
docker restart nemo-ocr
sleep 30
Result: Same error persists
2. Increased Shared Memory
Already using 16GB (more than sufficient for 6 models)
Result: Not a resource issue
3. Verified GPU Access
docker exec nemo-ocr nvidia-smi
Result: GPU detected correctly, 82% memory free
4. Different Image Formats
Tested with:
- Small images (100KB)
- Large images (5MB)
- Different encodings (JPEG, PNG)
Result: All fail with same error
Impact Assessment
Severity: HIGH - Complete production blocker
Impact:
- ✅ Container deploys successfully (misleading)
- ✅ Health checks pass (misleading)
- ❌ Cannot perform any OCR inference
- ❌ 100% failure rate on all requests
- ❌ Makes NIM completely unusable
Use Case Blocked:
- DeepSeek context compression (text → image conversion for 10x token reduction)
- Document understanding workflows
- Production OCR deployments
Requested Action
- Immediate: Acknowledge if this is a known issue
- Short-term: Provide workaround or configuration change
- Medium-term: Release patched container (v1.1.1 or v1.2.0)
- Long-term: Fix Triton Python backend stub initialization
Additional Information
Why This Matters
NVIDIA NIM is marketed as production-ready ("one Docker command deployment"), but this critical bug prevents any use of the OCR NIM.
Diagnostic Artifacts Available
We can provide:
- Complete container logs
- Model configuration files
- Python backend source code
- Test images and requests
- Detailed reproduction steps
Alternative Solutions Considered
While we can deploy alternative OCR solutions (PaddleOCR, Tesseract), we specifically chose NeMo OCR NIM for:
- TensorRT optimization for RTX 5090
- Production-grade NVIDIA support
- Enterprise-ready deployment
This bug blocks adoption of the NIM product line.
Contact Information
Reported By: [Your Name/Company]
Date: November 18, 2025
Environment: Development (reproducible on RTX 5090)
Priority: High (production blocker)
Attachments
Available upon request:
- Full container logs (500+ lines)
- Model configuration files
- Test images (base64 encoded)
- API request/response samples
- GPU diagnostics output
Expected Response
- Confirmation: Is this a known issue?
- Timeline: When can we expect a fix?
- Workaround: Is there an alternative configuration?
- Communication: Updates on resolution progress
Thank you for your attention to this critical issue.
Submission Checklist
Before submitting:
- Verified issue is reproducible
- Checked existing documentation (no troubleshooting items)
- Reviewed similar Triton issues
- Tested multiple workarounds
- Prepared diagnostic information
- Documented complete reproduction steps
- Assessed business impact