Parakeet RNNT 1.1B Multilingual NIM: Confidence scores always 0.0 in streaming mode

I’m using the Parakeet RNNT 1.1B Multilingual NIM for real-time ASR via gRPC streaming, but confidence scores are always `0.0` (sentinel value), making it impossible to filter low-quality transcriptions.

Environment

  • **NIM Container:** `nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual:latest`
  • **NIM Version:** [Check with `docker inspect parakeet-asr | grep Image`]
  • **Profile:** `rmir-bs1-25.12.6` (streaming, batch size 1)
  • **Mode:** Streaming (`mode=str`)
  • **GPU:** NVIDIA A10G
  • **Client:** Python `riva.client` (gRPC streaming)
  • **API Version:** Riva ASR gRPC 1.8.0

Deployment Configuration

```bash
docker run -d --name parakeet-asr \
  --runtime nvidia \
  --gpus all \
  -e NGC_API_KEY=${NGC_API_KEY} \
  -e NIM_HTTP_API_PORT=9001 \
  -e NIM_GRPC_API_PORT=50053 \
  -e NIM_TRITON_PROFILE_ID="rmir-bs1-25.12.6" \
  -e NIM_TAGS_SELECTOR="name=parakeet-1-1b-rnnt-multilingual,mode=str" \
  -p 9001:9001 -p 50053:50053 \
  -v ~/.cache/nim:/opt/nim/.cache \
  --shm-size=16gb \
  --memory=24g \
  nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual:latest
```

Client config:

```python
import riva.client

# Streaming recognition config sent with the first gRPC request.
asr_config = riva.client.StreamingRecognitionConfig(
    config=riva.client.RecognitionConfig(
        encoding=riva.client.AudioEncoding.LINEAR_PCM,
        language_code='en-US',
        enable_word_time_offsets=True,  # ✅ Enabled
        sample_rate_hertz=16000,
    ),
    interim_results=False,  # only final results are returned
)
```

Problem

```text
⭐ ASR Final: 'testing the implementation' (confidence: 0.00)
⭐ ASR Final: 'hello world' (confidence: 0.00)
```

Both `alternative.confidence` and `word.confidence` are always `0.0`.
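
To make the symptom easy to reproduce, here is a minimal sketch of how the two scores can be pulled out of the streaming responses and checked for the all-zero pattern. It assumes the usual Riva response shape (`response.results[*].alternatives[0]` with `transcript`, `confidence`, and `words[*].confidence`). The `_fake_response` helper and its data are hypothetical stand-ins so the snippet runs without a server; with a live NIM, the generator returned by the streaming call would be passed in instead.

```python
from types import SimpleNamespace


def collect_final_confidences(responses):
    """Return (transcript, utterance_confidence, word_confidences) per final result."""
    rows = []
    for response in responses:
        for result in response.results:
            # Skip interim hypotheses and results without alternatives.
            if not result.is_final or not result.alternatives:
                continue
            best = result.alternatives[0]
            word_scores = [w.confidence for w in best.words]
            rows.append((best.transcript, best.confidence, word_scores))
    return rows


def all_scores_zero(rows):
    """True when every utterance-level and word-level score is exactly 0.0."""
    return bool(rows) and all(
        conf == 0.0 and all(score == 0.0 for score in words)
        for _, conf, words in rows
    )


def _fake_response(transcript):
    """Hypothetical stand-in mirroring the output shown above (all scores 0.0)."""
    words = [SimpleNamespace(word=w, confidence=0.0) for w in transcript.split()]
    alt = SimpleNamespace(transcript=transcript, confidence=0.0, words=words)
    result = SimpleNamespace(is_final=True, alternatives=[alt])
    return SimpleNamespace(results=[result])


responses = [
    _fake_response("testing the implementation"),
    _fake_response("hello world"),
]
rows = collect_final_confidences(responses)
for transcript, conf, _ in rows:
    print(f"ASR Final: {transcript!r} (confidence: {conf:.2f})")
print("all scores zero:", all_scores_zero(rows))
```

If `all_scores_zero` stays true across many utterances of varying quality, the value is most likely a placeholder rather than a genuinely low score.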

Questions

  1. Does Parakeet RNNT 1.1B support confidence scores in streaming mode?
  2. Is confidence disabled in the `rmir-bs1-25.12.6` profile for performance reasons?
  3. Do I need to use offline mode (`mode=ofl`) instead?
  4. Is there an environment variable that enables confidence?

Related

Use Case

I'm building a real-time translation pipeline and need confidence scores to filter out poor ASR results before sending them to NMT.
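
The kind of gate intended here can be sketched as below. Everything in it is an assumption for illustration: the `should_forward` name, the `0.5` threshold, and the policy of letting transcripts through when the score equals the `0.0` sentinel (since a placeholder score says nothing about quality). It is not part of the Riva client.

```python
def should_forward(transcript, confidence, threshold=0.5, sentinel=0.0):
    """Decide whether an ASR transcript should be passed on to translation.

    Returns a (forward, reason) tuple. `threshold` and `sentinel` are
    hypothetical defaults, not values taken from the NIM.
    """
    if not transcript.strip():
        return False, "empty"
    if confidence == sentinel:
        # Score looks like a placeholder: it cannot separate good from bad,
        # so pass the text through and record why.
        return True, "confidence-unavailable"
    if confidence < threshold:
        return False, "low-confidence"
    return True, "ok"


print(should_forward("hello world", 0.0))
print(should_forward("hello world", 0.31))
print(should_forward("hello world", 0.92))
```

With the scores currently returned, every utterance falls into the `confidence-unavailable` branch, which is why the filter cannot do anything useful yet.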

Is confidence supported for this model/mode, or should I use a different approach?

Thanks!