
NVIDIA Bug Report - NeMo Retriever OCR NIM v1.1.0


Bug Report Summary

Product: NVIDIA NIM for Image OCR (NeMo Retriever OCR v1)
Version: 1.1.0
Severity: High (Production Blocker)
Status: Reproducible 100%

Title: Stub process 'scene_text_pre_0_0' not healthy - OCR inference fails with 500 error despite health check passing


Problem Description

The NeMo Retriever OCR NIM v1.1.0 container starts successfully, all models load and show READY status, and the health endpoint reports {"ready": true}. However, all OCR inference requests fail with a 500 error:

{"error":"[500] in ensemble 'scene_text_ensemble', Stub process 'scene_text_pre_0_0' is not healthy."}

This appears to be a Triton Inference Server Python backend issue specific to the preprocessing stub in the ensemble pipeline.


Environment Details

System Configuration

Host OS: Linux (WSL2)
Kernel: 6.6.87.2-microsoft-standard-WSL2
Docker: 27.4.0
NVIDIA Driver: 581.29
CUDA: 13.0
GPU: NVIDIA GeForce RTX 5090
GPU Memory: 32GB VRAM
GPU Compute Capability: 12.0

Container Configuration

Container: nvcr.io/nvidia/nemo-microservices/nemoretriever-ocr-v1:1.1.0
Triton Server Version: 2.60.0
Shared Memory: 16GB
GPU Memory Fraction: 0.5 (50%)
Port Mapping: 8103 (host) → 8000 (container)
Runtime: nvidia

Docker Run Command

docker run -d \
    --name nemo-ocr \
    --runtime=nvidia \
    --gpus all \
    --shm-size=16GB \
    -p 8103:8000 \
    -e NGC_API_KEY=<redacted> \
    -e NIM_GPU_MEMORY_FRACTION=0.5 \
    -e CUDA_VISIBLE_DEVICES=0 \
    --restart unless-stopped \
    nvcr.io/nvidia/nemo-microservices/nemoretriever-ocr-v1:1.1.0

Steps to Reproduce

1. Start Container

docker pull nvcr.io/nvidia/nemo-microservices/nemoretriever-ocr-v1:1.1.0
docker run -d --name nemo-ocr --runtime=nvidia --gpus all \
    --shm-size=16GB -p 8103:8000 \
    -e NGC_API_KEY=<your_key> \
    nvcr.io/nvidia/nemo-microservices/nemoretriever-ocr-v1:1.1.0

2. Wait for Container Ready

# Wait ~30 seconds for models to load
sleep 30

# Verify health check (PASSES)
curl http://localhost:8103/v1/health/ready
# Returns: {"ready":true}
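A fixed sleep can mask slow model loads. A small polling helper makes step 2 deterministic; this is a sketch using only the Python standard library (wait_until and nim_ready are our own names; the only endpoint assumed is the /v1/health/ready route shown above):

```python
import json
import time
import urllib.request


def wait_until(probe, timeout_s=120.0, interval_s=2.0,
               now=time.monotonic, sleep=time.sleep):
    """Poll `probe` until it returns True or `timeout_s` elapses.

    `now` and `sleep` are injectable so the loop can be unit-tested.
    """
    deadline = now() + timeout_s
    while now() < deadline:
        if probe():
            return True
        sleep(interval_s)
    return False


def nim_ready(url="http://localhost:8103/v1/health/ready"):
    """Return True once the health endpoint reports {"ready": true}."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp).get("ready") is True
    except OSError:
        return False  # container not listening yet


# Usage (instead of `sleep 30`):
#   if not wait_until(nim_ready):
#       raise SystemExit("NIM did not become ready in time")
```

Note that even with this stricter wait, the inference failure below still occurs, which is consistent with the health check being shallow.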

3. Attempt OCR Inference

# Encode test image to base64
base64 test_image.jpg > image_b64.txt

# Create request JSON
cat > request.json << 'EOF'
{
  "input": [
    {
      "type": "image_url",
      "url": "data:image/jpeg;base64,<BASE64_STRING_HERE>"
    }
  ]
}
EOF

# Send inference request (FAILS)
curl -X POST http://localhost:8103/v1/infer \
  -H "Content-Type: application/json" \
  -d @request.json
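The failing request can also be reproduced from Python with only the standard library. This is a sketch: build_payload and infer are our own helper names, and the endpoint path and payload shape follow the curl example above.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes, mime="image/jpeg"):
    """Wrap raw image bytes in the data-URL request body shown above."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"input": [{"type": "image_url",
                       "url": f"data:{mime};base64,{b64}"}]}


def infer(image_path, url="http://localhost:8103/v1/infer"):
    """POST one image to the OCR NIM and return the decoded JSON reply."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_payload(f.read())).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against this container, `infer("test_image.jpg")` raises urllib.error.HTTPError carrying the same [500] ensemble error reported below.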

4. Observe Error

{
  "error": "[500] in ensemble 'scene_text_ensemble', Stub process 'scene_text_pre_0_0' is not healthy."
}

Expected Behavior

OCR inference should succeed and return extracted text from the image, as documented in the API reference:

{
  "data": [
    {
      "index": 0,
      "text_detections": [
        {
          "text": "...",
          "confidence": 0.95,
          "bbox": [...]
        }
      ]
    }
  ],
  "usage": {...}
}

Actual Behavior

  • ✅ Container starts successfully
  • ✅ All models load and show READY status
  • ✅ Health endpoint returns {"ready": true}
  • ✅ GPU detected correctly
  • ❌ OCR inference fails with 500 error
  • ❌ Stub process scene_text_pre_0_0 not healthy

Container Logs Analysis

Model Status (from container logs)

+---------------------+---------+--------+
| Model               | Version | Status |
+---------------------+---------+--------+
| scene_text_det      | 1       | READY  |
| scene_text_det_post | 1       | READY  |
| scene_text_ensemble | 1       | READY  |
| scene_text_post     | 1       | READY  |
| scene_text_pre      | 1       | READY  | ← Shows READY but stub fails
| scene_text_rec      | 1       | READY  |
+---------------------+---------+--------+

Initialization Logs

I1118 11:43:28.517 model_lifecycle.cc:473] "loading: scene_text_pre:1"
I1118 11:43:39.901 python_be.cc:2289] "TRITONBACKEND_ModelInstanceInitialize: scene_text_pre_0_0 (GPU device 0)"
I1118 11:43:40.020 model.py:44] "Initialization complete."
I1118 11:43:42.721 model_lifecycle.cc:849] "successfully loaded 'scene_text_pre'"

No errors appear in the logs; all models initialize successfully.

GPU Metrics

I1118 11:43:57.629 metrics.cc:889] "Collecting metrics for GPU 0: NVIDIA GeForce RTX 5090"
GPU Memory Used: 5,904 MiB (18% of 32,607 MiB)
GPU Memory Free: 26,703 MiB (82%)

The GPU has ample free memory, so this is not a memory-exhaustion issue.


Investigation Performed

1. Verified API Format

Request format matches OpenAPI specification exactly:

  • Content-Type: application/json
  • Input type: "image_url"
  • URL format: "data:image/jpeg;base64,…"

2. Inspected Model Code

Python backend model at /opt/nim/workspace/scene_text_pre/1/model.py:

  • Code appears correct (100 lines)
  • Handles base64 decoding, image padding, resizing
  • No obvious bugs in implementation
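For context, a preprocessing step of this kind typically parses the data URL, decodes the base64 payload, and pads image dimensions up to a stride the detection backbone expects. The sketch below is illustrative only, not the actual model.py; the helper names and the stride of 32 are our own assumptions:

```python
import base64


def parse_data_url(url):
    """Split 'data:<mime>;base64,<payload>' into (mime, raw bytes)."""
    header, _, payload = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("unsupported data URL")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


def pad_to_multiple(size, stride=32):
    """Round `size` up to the next multiple of `stride`."""
    return ((size + stride - 1) // stride) * stride
```

Nothing in this logic is GPU-dependent, which is one reason we suspect the stub process rather than the preprocessing code itself.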

3. Checked Configuration

Model config at /opt/nim/workspace/scene_text_pre/config.pbtxt:

backend: "python"
max_batch_size: 32
instance_group {
  count: 1
  kind: KIND_GPU
}
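One diagnostic we have not yet run: pinning the preprocessing instance to CPU to rule out a GPU-side stub failure. This is an untested sketch of the config change only; the model may require KIND_GPU, and the NIM may regenerate a mounted config on startup.

```
backend: "python"
max_batch_size: 32
instance_group {
  count: 1
  kind: KIND_CPU   # diagnostic only: isolate GPU-specific stub failures
}
```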

4. Tested with Different Approaches

  • ❌ Python requests library
  • ❌ Direct curl command
  • ❌ Container restart
  • ❌ Wait 2+ minutes for initialization

All approaches fail with same error.


Related Issues

This error is similar to known Triton Inference Server Python backend issues:

  1. triton-inference-server/server#3678 (Dec 2021)

    • “Stub process is unhealthy and it will be restarted”
    • Cause: Insufficient shared memory
    • Our case: 16GB of shared memory is allocated (more than sufficient)
  2. triton-inference-server/server#7186 (May 2024)

    • “Stub process is not healthy”
    • Cause: Version mismatch between Python backend and Triton server
    • Resolved by rebuilding backend from matching branch
  3. triton-inference-server/server#8102 (Mar 2025)

    • Memory leak in triton_python_backend_stub process
    • Our case: only 18% of GPU memory is in use

Root Cause Hypothesis

Based on investigation, this appears to be a Triton Server 2.60.0 Python backend issue specific to:

  • Ensemble models with Python preprocessing
  • First inference initialization of stub process
  • Potentially related to version compatibility

Key Evidence:

  1. Health check only verifies models are loaded (shallow check)
  2. Stub process fails on actual inference execution
  3. No errors during model initialization
  4. All similar issues relate to Triton Python backend, not model code
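Evidence point 1 suggests a deeper readiness probe: exercise the full ensemble with a minimal real inference rather than trusting /v1/health/ready. The sketch below builds a valid 1x1 PNG from the standard library and posts it; deep_ready and make_tiny_png are our own names, and the endpoint and payload shape follow the examples above:

```python
import base64
import json
import struct
import urllib.request
import zlib


def _chunk(tag, data):
    """PNG chunk: 4-byte length, tag, data, CRC32 over tag+data."""
    return (struct.pack(">I", len(data)) + tag + data
            + struct.pack(">I", zlib.crc32(tag + data)))


def make_tiny_png():
    """Build a valid 1x1 8-bit grayscale PNG using only the stdlib."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # 1x1, grayscale
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))


def deep_ready(url="http://localhost:8103/v1/infer"):
    """True only if a real (tiny) inference succeeds end to end."""
    png_b64 = base64.b64encode(make_tiny_png()).decode("ascii")
    body = json.dumps({"input": [{"type": "image_url",
                                  "url": "data:image/png;base64," + png_b64}]}
                      ).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status == 200
    except OSError:
        return False  # covers the 500 from the unhealthy stub
```

On this container, deep_ready() returns False while /v1/health/ready reports ready, which captures the shallow-check problem concisely.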

Attempted Workarounds

1. Container Restart

docker restart nemo-ocr
sleep 30

Result: Same error persists

2. Shared Memory Sizing

Already allocating 16GB via --shm-size (more than sufficient for six models)
Result: Not a shared-memory issue

3. Verified GPU Access

docker exec nemo-ocr nvidia-smi

Result: GPU detected correctly, 82% memory free

4. Different Image Formats

Tested with:

  • Small images (100KB)
  • Large images (5MB)
  • Different encodings (JPEG, PNG)

Result: All fail with the same error

Impact Assessment

Severity: HIGH - Complete production blocker

Impact:

  • ✅ Container deploys successfully (misleading)
  • ✅ Health checks pass (misleading)
  • ❌ Cannot perform any OCR inference
  • ❌ 100% failure rate on all requests
  • ❌ Makes the NIM completely unusable

Use Case Blocked:

  • DeepSeek context compression (text → image conversion for 10x token reduction)
  • Document understanding workflows
  • Production OCR deployments

Requested Action

  1. Immediate: Acknowledge if this is a known issue
  2. Short-term: Provide workaround or configuration change
  3. Medium-term: Release patched container (v1.1.1 or v1.2.0)
  4. Long-term: Fix Triton Python backend stub initialization

Additional Information

Why This Matters

NVIDIA NIM is marketed as production-ready ("One Docker command deployment"), but this critical bug prevents any use of the OCR NIM.

Diagnostic Artifacts Available

We can provide:

  • Complete container logs
  • Model configuration files
  • Python backend source code
  • Test images and requests
  • Detailed reproduction steps

Alternative Solutions Considered

While we can deploy alternative OCR solutions (PaddleOCR, Tesseract), we specifically chose NeMo OCR NIM for:

  • TensorRT optimization for RTX 5090
  • Production-grade NVIDIA support
  • Enterprise-ready deployment

This bug blocks adoption of the NIM product line.


Contact Information

Reported By: [Your Name/Company]
Date: November 18, 2025
Environment: Development (reproducible on RTX 5090)
Priority: High (production blocker)


Attachments

Available upon request:

  1. Full container logs (500+ lines)
  2. Model configuration files
  3. Test images (base64 encoded)
  4. API request/response samples
  5. GPU diagnostics output

Expected Response

  1. Confirmation: Is this a known issue?
  2. Timeline: When can we expect a fix?
  3. Workaround: Is there an alternative configuration?
  4. Communication: Updates on resolution progress

Thank you for your attention to this critical issue.


Submission Checklist

Before submitting:

  • Verified issue is reproducible
  • Checked existing documentation (no troubleshooting items)
  • Reviewed similar Triton issues
  • Tested multiple workarounds
  • Prepared diagnostic information
  • Documented complete reproduction steps
  • Assessed business impact