
NVIDIA Bug Report - NeMo Retriever OCR NIM v1.1.0


Bug Report Summary

Product: NVIDIA NIM for Image OCR (NeMo Retriever OCR v1)
Version: 1.1.0
Severity: High (Production Blocker)
Status: Reproducible 100%

Title: Stub process 'scene_text_pre_0_0' not healthy - OCR inference fails with 500 error despite health check passing


Problem Description

The NeMo Retriever OCR NIM v1.1.0 container starts successfully, all models load and show READY status, and the health endpoint reports {"ready": true}. However, all OCR inference requests fail with a 500 error:

{"error":"[500] in ensemble 'scene_text_ensemble', Stub process 'scene_text_pre_0_0' is not healthy."}

This appears to be a Triton Inference Server Python backend issue specific to the preprocessing stub in the ensemble pipeline.


Environment Details

System Configuration

Host OS: Linux (WSL2)
Kernel: 6.6.87.2-microsoft-standard-WSL2
Docker: 27.4.0
NVIDIA Driver: 581.29
CUDA: 13.0
GPU: NVIDIA GeForce RTX 5090
GPU Memory: 32GB VRAM
GPU Compute Capability: 12.0

Container Configuration

Container: nvcr.io/nvidia/nemo-microservices/nemoretriever-ocr-v1:1.1.0
Triton Server Version: 2.60.0
Shared Memory: 16GB
GPU Memory Fraction: 0.5 (50%)
Port Mapping: 8103 (host) → 8000 (container)
Runtime: nvidia

Docker Run Command

docker run -d \
    --name nemo-ocr \
    --runtime=nvidia \
    --gpus all \
    --shm-size=16GB \
    -p 8103:8000 \
    -e NGC_API_KEY=<redacted> \
    -e NIM_GPU_MEMORY_FRACTION=0.5 \
    -e CUDA_VISIBLE_DEVICES=0 \
    --restart unless-stopped \
    nvcr.io/nvidia/nemo-microservices/nemoretriever-ocr-v1:1.1.0

Steps to Reproduce

1. Start Container

docker pull nvcr.io/nvidia/nemo-microservices/nemoretriever-ocr-v1:1.1.0
docker run -d --name nemo-ocr --runtime=nvidia --gpus all \
    --shm-size=16GB -p 8103:8000 \
    -e NGC_API_KEY=<your_key> \
    nvcr.io/nvidia/nemo-microservices/nemoretriever-ocr-v1:1.1.0

2. Wait for Container Ready

# Wait ~30 seconds for models to load
sleep 30

# Verify health check (PASSES)
curl http://localhost:8103/v1/health/ready
# Returns: {"ready":true}
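A fixed sleep can mask slow model loads. A small polling helper makes step 2 deterministic; this is a sketch using only the Python standard library (wait_until and nim_ready are our own names; the only endpoint assumed is the /v1/health/ready route shown above):

```python
import json
import time
import urllib.request


def wait_until(probe, timeout_s=120.0, interval_s=2.0,
               now=time.monotonic, sleep=time.sleep):
    """Poll `probe` until it returns True or `timeout_s` elapses.

    `now` and `sleep` are injectable so the loop can be unit-tested.
    """
    deadline = now() + timeout_s
    while now() < deadline:
        if probe():
            return True
        sleep(interval_s)
    return False


def nim_ready(url="http://localhost:8103/v1/health/ready"):
    """Return True once the health endpoint reports {"ready": true}."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp).get("ready") is True
    except OSError:
        return False  # container not listening yet


# Usage (instead of `sleep 30`):
#   if not wait_until(nim_ready):
#       raise SystemExit("NIM did not become ready in time")
```

Note that even with this stricter wait, the inference failure below still occurs, which is consistent with the health check being shallow.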

3. Attempt OCR Inference

# Encode test image to base64
base64 test_image.jpg > image_b64.txt

# Create request JSON
cat > request.json << 'EOF'
{
  "input": [
    {
      "type": "image_url",
      "url": "data:image/jpeg;base64,<BASE64_STRING_HERE>"
    }
  ]
}
EOF

# Send inference request (FAILS)
curl -X POST http://localhost:8103/v1/infer \
  -H "Content-Type: application/json" \
  -d @request.json
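The failing request can also be reproduced from Python with only the standard library. This is a sketch: build_payload and infer are our own helper names, and the endpoint path and payload shape follow the curl example above.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes, mime="image/jpeg"):
    """Wrap raw image bytes in the data-URL request body shown above."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"input": [{"type": "image_url",
                       "url": f"data:{mime};base64,{b64}"}]}


def infer(image_path, url="http://localhost:8103/v1/infer"):
    """POST one image to the OCR NIM and return the decoded JSON reply."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_payload(f.read())).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against this container, `infer("test_image.jpg")` raises urllib.error.HTTPError carrying the same [500] ensemble error reported below.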

4. Observe Error

{
  "error": "[500] in ensemble 'scene_text_ensemble', Stub process 'scene_text_pre_0_0' is not healthy."
}

Expected Behavior

OCR inference should succeed and return extracted text from the image, as documented in the API reference:

{
  "data": [
    {
      "index": 0,
      "text_detections": [
        {
          "text": "...",
          "confidence": 0.95,
          "bbox": [...]
        }
      ]
    }
  ],
  "usage": {...}
}

Actual Behavior

  • ✅ Container starts successfully
  • ✅ All models load and show READY status
  • ✅ Health endpoint returns {"ready": true}
  • ✅ GPU detected correctly
  • ❌ OCR inference fails with 500 error
  • ❌ Stub process scene_text_pre_0_0 not healthy

Container Logs Analysis

Model Status (from container logs)

+---------------------+---------+--------+
| Model               | Version | Status |
+---------------------+---------+--------+
| scene_text_det      | 1       | READY  |
| scene_text_det_post | 1       | READY  |
| scene_text_ensemble | 1       | READY  |
| scene_text_post     | 1       | READY  |
| scene_text_pre      | 1       | READY  | ← Shows READY but stub fails
| scene_text_rec      | 1       | READY  |
+---------------------+---------+--------+

Initialization Logs

I1118 11:43:28.517 model_lifecycle.cc:473] "loading: scene_text_pre:1"
I1118 11:43:39.901 python_be.cc:2289] "TRITONBACKEND_ModelInstanceInitialize: scene_text_pre_0_0 (GPU device 0)"
I1118 11:43:40.020 model.py:44] "Initialization complete."
I1118 11:43:42.721 model_lifecycle.cc:849] "successfully loaded 'scene_text_pre'"

No errors appear in the logs; all models initialize successfully.

GPU Metrics

I1118 11:43:57.629 metrics.cc:889] "Collecting metrics for GPU 0: NVIDIA GeForce RTX 5090"
GPU Memory Used: 5,904 MiB (18% of 32,607 MiB)
GPU Memory Free: 26,703 MiB (82%)

The GPU has ample free memory, so this is not a memory-exhaustion issue.


Investigation Performed

1. Verified API Format

Request format matches OpenAPI specification exactly:

  • Content-Type: application/json
  • Input type: "image_url"
  • URL format: "data:image/jpeg;base64,…"

2. Inspected Model Code

Python backend model at /opt/nim/workspace/scene_text_pre/1/model.py:

  • Code appears correct (100 lines)
  • Handles base64 decoding, image padding, resizing
  • No obvious bugs in implementation
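For context, a preprocessing step of this kind typically parses the data URL, decodes the base64 payload, and pads image dimensions up to a stride the detection backbone expects. The sketch below is illustrative only, not the actual model.py; the helper names and the stride of 32 are our own assumptions:

```python
import base64


def parse_data_url(url):
    """Split 'data:<mime>;base64,<payload>' into (mime, raw bytes)."""
    header, _, payload = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("unsupported data URL")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


def pad_to_multiple(size, stride=32):
    """Round `size` up to the next multiple of `stride`."""
    return ((size + stride - 1) // stride) * stride
```

Nothing in this logic is GPU-dependent, which is one reason we suspect the stub process rather than the preprocessing code itself.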

3. Checked Configuration

Model config at /opt/nim/workspace/scene_text_pre/config.pbtxt:

backend: "python"
max_batch_size: 32
instance_group {
  count: 1
  kind: KIND_GPU
}
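One diagnostic we have not yet run: pinning the preprocessing instance to CPU to rule out a GPU-side stub failure. This is an untested sketch of the config change only; the model may require KIND_GPU, and the NIM may regenerate a mounted config on startup.

```
backend: "python"
max_batch_size: 32
instance_group {
  count: 1
  kind: KIND_CPU   # diagnostic only: isolate GPU-specific stub failures
}
```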

4. Tested with Different Approaches

  • ❌ Python requests library
  • ❌ Direct curl command
  • ❌ Container restart
  • ❌ Wait 2+ minutes for initialization

All approaches fail with same error.


Related Issues

This error is similar to known Triton Inference Server Python backend issues:

  1. triton-inference-server/server#3678 (Dec 2021)

    • “Stub process is unhealthy and it will be restarted”
    • Cause: Insufficient shared memory
    • Our case: 16GB of shared memory is allocated (more than sufficient)
  2. triton-inference-server/server#7186 (May 2024)

    • “Stub process is not healthy”
    • Cause: Version mismatch between Python backend and Triton server
    • Resolved by rebuilding backend from matching branch
  3. triton-inference-server/server#8102 (Mar 2025)

    • Memory leak in triton_python_backend_stub process
    • Our case: only 18% of GPU memory is in use

Root Cause Hypothesis

Based on investigation, this appears to be a Triton Server 2.60.0 Python backend issue specific to:

  • Ensemble models with Python preprocessing
  • First inference initialization of stub process
  • Potentially related to version compatibility

Key Evidence:

  1. Health check only verifies models are loaded (shallow check)
  2. Stub process fails on actual inference execution
  3. No errors during model initialization
  4. All similar issues relate to Triton Python backend, not model code
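Evidence point 1 suggests a deeper readiness probe: exercise the full ensemble with a minimal real inference rather than trusting /v1/health/ready. The sketch below builds a valid 1x1 PNG from the standard library and posts it; deep_ready and make_tiny_png are our own names, and the endpoint and payload shape follow the examples above:

```python
import base64
import json
import struct
import urllib.request
import zlib


def _chunk(tag, data):
    """PNG chunk: 4-byte length, tag, data, CRC32 over tag+data."""
    return (struct.pack(">I", len(data)) + tag + data
            + struct.pack(">I", zlib.crc32(tag + data)))


def make_tiny_png():
    """Build a valid 1x1 8-bit grayscale PNG using only the stdlib."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # 1x1, grayscale
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))


def deep_ready(url="http://localhost:8103/v1/infer"):
    """True only if a real (tiny) inference succeeds end to end."""
    png_b64 = base64.b64encode(make_tiny_png()).decode("ascii")
    body = json.dumps({"input": [{"type": "image_url",
                                  "url": "data:image/png;base64," + png_b64}]}
                      ).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status == 200
    except OSError:
        return False  # covers the 500 from the unhealthy stub
```

On this container, deep_ready() returns False while /v1/health/ready reports ready, which captures the shallow-check problem concisely.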

Attempted Workarounds

1. Container Restart

docker restart nemo-ocr
sleep 30

Result: Same error persists

2. Shared Memory Sizing

Already allocating 16GB via --shm-size (more than sufficient for six models)
Result: Not a shared-memory issue

3. Verified GPU Access

docker exec nemo-ocr nvidia-smi

Result: GPU detected correctly, 82% memory free

4. Different Image Formats

Tested with:

  • Small images (100KB)
  • Large images (5MB)
  • Different encodings (JPEG, PNG)

Result: All fail with the same error

Impact Assessment

Severity: HIGH - Complete production blocker

Impact:

  • ✅ Container deploys successfully (misleading)
  • ✅ Health checks pass (misleading)
  • ❌ Cannot perform any OCR inference
  • ❌ 100% failure rate on all requests
  • ❌ Makes the NIM completely unusable

Use Case Blocked:

  • DeepSeek context compression (text → image conversion for 10x token reduction)
  • Document understanding workflows
  • Production OCR deployments

Requested Action

  1. Immediate: Acknowledge if this is a known issue
  2. Short-term: Provide workaround or configuration change
  3. Medium-term: Release patched container (v1.1.1 or v1.2.0)
  4. Long-term: Fix Triton Python backend stub initialization

Additional Information

Why This Matters

NVIDIA NIM is marketed as production-ready ("One Docker command deployment"), but this critical bug prevents any use of the OCR NIM.

Diagnostic Artifacts Available

We can provide:

  • Complete container logs
  • Model configuration files
  • Python backend source code
  • Test images and requests
  • Detailed reproduction steps

Alternative Solutions Considered

While we can deploy alternative OCR solutions (PaddleOCR, Tesseract), we specifically chose NeMo OCR NIM for:

  • TensorRT optimization for RTX 5090
  • Production-grade NVIDIA support
  • Enterprise-ready deployment

This bug blocks adoption of the NIM product line.


Contact Information

Reported By: [Your Name/Company]
Date: November 18, 2025
Environment: Development (reproducible on RTX 5090)
Priority: High (production blocker)


Attachments

Available upon request:

  1. Full container logs (500+ lines)
  2. Model configuration files
  3. Test images (base64 encoded)
  4. API request/response samples
  5. GPU diagnostics output

Expected Response

  1. Confirmation: Is this a known issue?
  2. Timeline: When can we expect a fix?
  3. Workaround: Is there an alternative configuration?
  4. Communication: Updates on resolution progress

Thank you for your attention to this critical issue.


Submission Checklist

Before submitting:

  • Verified issue is reproducible
  • Checked existing documentation (no troubleshooting items)
  • Reviewed similar Triton issues
  • Tested multiple workarounds
  • Prepared diagnostic information
  • Documented complete reproduction steps
  • Assessed business impact