[BUG] Parakeet 1.1b RNNT Multilingual: diarizer=disabled profiles missing in 1.5.0/latest, sortformer fails on Blackwell GPUs (1.8.0 not available)

Summary

The ASR NIM container nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual:latest (version 1.5.0, Riva Speech 26.02) has no profiles with diarizer=disabled despite the official documentation listing them. All 60 profiles in the manifest force diarizer=sortformer. The sortformer TensorRT engine fails to deserialize on Blackwell GPUs (RTX 5080, RTX 5090, compute capability 12.0), making the container unusable on Blackwell hardware.

Environment

  • Container: nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual:latest (NIM 1.5.0, Riva Speech 2.25.0, container version 26.02)
  • GPUs tested: RTX 5090 (sm_120), RTX 5080 (sm_120), RTX 5060 Ti (sm_120)
  • Driver: 570.x
  • OS: Ubuntu 24.04
  • Docker: with nvidia-container-toolkit

Bug 1: Missing diarizer=disabled profiles

The official support matrix documents these profiles for Parakeet 1.1b RNNT Multilingual:

NIM_TAGS_SELECTOR                 GPU memory (GB)
mode=str,diarizer=disabled        9.77
mode=ofl,diarizer=disabled        11.41
mode=str-thr,diarizer=disabled    10.69
mode=all,diarizer=disabled        28.64

However, when inspecting the actual container manifest (/opt/nim/etc/default/model_manifest.yaml), zero profiles have diarizer=disabled. Every single profile (60 total) has diarizer=sortformer. This includes both prebuilt profiles (for A100, H100, L40S, DGX Spark) and RMIR profiles.
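For anyone who wants to repeat the count, here is a minimal Python sketch. It assumes the manifest has already been copied out of the container to a local file, and that each profile's tag appears in the text either as `diarizer: <value>` (YAML) or as `'diarizer': '<value>'` (the form printed in the NIM logs). The regex and the function name `count_diarizer_values` are my own illustration, not part of NIM, and the pattern may need adjusting to the real manifest layout.

```python
import re
from collections import Counter

# Assumption: tags look like "diarizer: sortformer" or "'diarizer': 'sortformer'".
# This is a plain text scan, not a YAML parse, so it does not depend on the schema.
DIARIZER_RE = re.compile(r"""['"]?diarizer['"]?\s*[:=]\s*['"]?([A-Za-z0-9_-]+)""")


def count_diarizer_values(manifest_text: str) -> Counter:
    """Count how often each diarizer tag value occurs in the manifest text."""
    return Counter(DIARIZER_RE.findall(manifest_text))


# Illustrative input only; the real file is /opt/nim/etc/default/model_manifest.yaml.
sample = "tags:\n  diarizer: sortformer\n  mode: str\ntags:\n  diarizer: sortformer\n  mode: ofl\n"
print(count_diarizer_values(sample))
```

Running it on the real manifest text should show whether any `disabled` entries exist at all.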

Using NIM_TAGS_SELECTOR="mode=str,diarizer=disabled" results in:

NIMProfileIDNotFound: Could not match a profile in manifest

Comparison with version 1.4.0

Version 1.4.0 does include diarizer=disabled RMIR profiles and works correctly on Blackwell GPUs:

Matched profile: tags: {'diarizer': 'disabled', 'mode': 'str', 'model_type': 'rmir', 'name': 'parakeet-1-1b-rnnt-multilingual', 'vad': 'default'}
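To make the difference between the two versions concrete, here is a rough Python sketch of how I understand the selector to behave. The assumption is that a profile matches only if every key=value pair in NIM_TAGS_SELECTOR equals the corresponding profile tag; I have not seen the actual NIM matching code, so treat `parse_selector` and `matching_profiles` as hypothetical helpers. The 1.5.0 tags below are an illustrative subset based on what the container matched for mode=str.

```python
def parse_selector(selector: str) -> dict:
    """Turn 'mode=str,diarizer=disabled' into {'mode': 'str', 'diarizer': 'disabled'}."""
    pairs = (item.split("=", 1) for item in selector.split(",") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def matching_profiles(selector: str, profiles: list) -> list:
    """Return the profiles whose tags contain every key=value pair of the selector."""
    wanted = parse_selector(selector)
    return [p for p in profiles if all(p.get(k) == v for k, v in wanted.items())]


# Tags copied from the 1.4.0 "Matched profile" log line.
PROFILE_1_4_0 = {"diarizer": "disabled", "mode": "str", "model_type": "rmir",
                 "name": "parakeet-1-1b-rnnt-multilingual", "vad": "default"}
# Illustrative subset of a 1.5.0 RMIR profile (every profile there has diarizer=sortformer).
PROFILE_1_5_0 = {"diarizer": "sortformer", "mode": "str", "model_type": "rmir"}

print(matching_profiles("mode=str,diarizer=disabled", [PROFILE_1_5_0]))  # prints []
print(len(matching_profiles("mode=str,diarizer=disabled", [PROFILE_1_4_0])))  # prints 1
```

Under that assumption, the selector can never match in 1.5.0 because no profile carries diarizer=disabled, which is consistent with the NIMProfileIDNotFound error.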

Bug 2: Sortformer engine fails to deserialize on Blackwell

When using NIM_TAGS_SELECTOR="mode=str" (without diarizer filter), the container matches an RMIR profile with diarizer=sortformer and proceeds to build TensorRT engines. The main ASR encoder builds successfully, but the sortformer diarizer engine fails to load:

[!] Could not deserialize engine.

During the sortformer engine build, there are also OOM warnings:

[TRT] [E] [virtualMemoryBuffer.cpp::resizePhysical::154] Error Code 2: OutOfMemory (Requested size was 5851054080 bytes.)
[TRT] [W] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory

This happens on both an RTX 5080 (16GB) and an RTX 5090 (32GB). The OOM on the 5090 is suspicious, since a single 5.85GB request should fit in 32GB.
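For reference, here is the unit conversion for the requested size in the OutOfMemory message above, as a small Python sketch. The card capacities are treated as GiB, which is an assumption on my part. The calculation only looks at this one request and says nothing about what else was allocated on the GPU at that moment.

```python
REQUESTED_BYTES = 5_851_054_080  # from the TensorRT OutOfMemory message


def bytes_to_units(n: int) -> dict:
    """Express a byte count in decimal GB and binary MiB/GiB."""
    return {"GB": n / 10**9, "MiB": n / 2**20, "GiB": n / 2**30}


def fraction_of_vram(requested_bytes: int, vram_gib: int) -> float:
    """Fraction of a card's nominal capacity (assumed to be in GiB) that the request represents."""
    return requested_bytes / (vram_gib * 2**30)


sizes = bytes_to_units(REQUESTED_BYTES)
print(f"{sizes['GB']:.2f} GB = {sizes['MiB']:.0f} MiB = {sizes['GiB']:.2f} GiB")
# prints: 5.85 GB = 5580 MiB = 5.45 GiB

for name, vram_gib in {"RTX 5080": 16, "RTX 5090": 32}.items():
    print(f"{name}: {fraction_of_vram(REQUESTED_BYTES, vram_gib):.1%} of {vram_gib} GiB")
```

So the request is exactly 5580 MiB, well under a fifth of the 5090's nominal capacity.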

Bug 3: Versions 1.6.0, 1.7.0, and 1.8.0 documented but do not exist

The release notes document three versions that do not exist on NGC:

  • 1.8.0: “Blackwell GPU support added for Parakeet 1.1b RNNT Multilingual”
  • 1.7.0: “Sortformer diarizer support for Parakeet 1.1b RNNT Multilingual”
  • 1.6.0: “Silero VAD and Sortformer diarizer support across multiple models”

We queried the NGC container registry API directly to obtain the complete list of published tags:

GET https://nvcr.io/v2/nim/nvidia/parakeet-1-1b-rnnt-multilingual/tags/list

Response:
{"name":"nim/nvidia/parakeet-1-1b-rnnt-multilingual","tags":["1.0.0","1.0","1.1.0","1.1","1.2.0","1.2","1.3.0","1.3","1.4.0","1.4","1.5.0","1.5","1","latest",...]}

Only versions 1.0 through 1.5 exist. There is no 1.6.0, 1.7.0, or 1.8.0 — not even for Enterprise accounts. The latest tag points to 1.5.0 (created March 2, 2026).
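The check against the tag list can be reproduced with a few lines of Python. This sketch only parses a response body; fetching it from nvcr.io requires registry authentication, which is not shown here. The `RESPONSE` string contains just the tags quoted above (the original response was truncated with "..."), and `missing_tags` is a hypothetical helper name.

```python
import json

# Only the tags that were quoted in this post; the real response contained more entries.
RESPONSE = (
    '{"name":"nim/nvidia/parakeet-1-1b-rnnt-multilingual",'
    '"tags":["1.0.0","1.0","1.1.0","1.1","1.2.0","1.2","1.3.0","1.3",'
    '"1.4.0","1.4","1.5.0","1.5","1","latest"]}'
)


def missing_tags(response_body: str, expected: list) -> list:
    """Return the expected tags that are absent from a /v2/<name>/tags/list response."""
    published = set(json.loads(response_body)["tags"])
    return [tag for tag in expected if tag not in published]


print(missing_tags(RESPONSE, ["1.5.0", "1.6.0", "1.7.0", "1.8.0"]))
# prints ['1.6.0', '1.7.0', '1.8.0']
```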

The documentation describes features for three unreleased versions, including the Blackwell support that users are waiting for. This creates significant confusion, as users read the release notes, see “Blackwell GPU support added in 1.8.0”, and expect to find a working container.

Additionally, the Speech NIM 26.02.0 support matrix states that Parakeet RNNT Multilingual “is supported on Blackwell and DGX Spark platform” — but the only published container (1.5.0) fails on Blackwell as described above.

Expected Behavior

  1. Container latest should include diarizer=disabled profiles as documented
  2. Sortformer TensorRT engines should build and load correctly on Blackwell (sm_120) GPUs
  3. If 1.8.0 is released, the container tag should be available on NGC

Workaround

Use version 1.4.0, which has diarizer=disabled RMIR profiles that compile and run correctly on Blackwell GPUs:

docker run -d --name parakeet-asr \
  --runtime=nvidia --gpus '"device=0"' --shm-size=8GB \
  -e NGC_API_KEY -e NIM_HTTP_API_PORT=9000 -e NIM_GRPC_API_PORT=50051 \
  -e NIM_TAGS_SELECTOR="mode=str" \
  -v ~/.cache/nim:/opt/nim/.cache \
  -p 9000:9000 -p 50051:50051 \
  nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual:1.4.0

Steps to Reproduce

  1. Pull the latest container: docker pull nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual:latest
  2. Run on any Blackwell GPU with NIM_TAGS_SELECTOR="mode=str,diarizer=disabled" → profile not found
  3. Run with NIM_TAGS_SELECTOR="mode=str" → matches sortformer RMIR, builds engines, fails with “Could not deserialize engine”

+1 from RTX 5090

Any feedback yet?

Bug 1: diarizer=disabled profiles

In the latest RNNT release (1.5.0), the listed profiles use diarizer=sortformer; they don’t include separate diarizer=disabled entries. Diarization is part of all the current models, and the published profiles don’t offer an enabled/disabled choice for the diarizer the way older versions did. Please refer to the latest Speech NIM documentation: NVIDIA ASR NIM Support Matrix — NVIDIA Speech NIM Microservices.

For Bug 2, would it be possible to share the full logs of the failure so we can dig deeper?

Bug 3: Documented versions 1.6.0 / 1.7.0 / 1.8.0 vs. images

Those numbers refer to documentation releases, not container image tags or NIM version numbers. Docs and containers are released on different schedules, so a doc “1.6 / 1.7 / 1.8” style version won’t line up one-to-one with a specific image tag. Also, we have now migrated to a YY.MM versioning scheme for the documentation. Please refer to the latest documentation here: NVIDIA Speech NIM Microservices Overview — NVIDIA Speech NIM Microservices

Thanks @atomer for the clarifications on Bug 1 and Bug 3.

For Bug 2 — here are the full logs from running parakeet-1-1b-rnnt-multilingual:latest (NIM 1.5.0) on an RTX 5090 (Blackwell, compute 12.0).

Setup:

Image: nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual:latest
NIM: 1.5.0, nimlib 0.13.10, nim_sdk 0.11.3
GPU: 2b85:10de, compute capability (12, 0) — RTX 5090, 32GB VRAM
OS: Windows 11, Docker Desktop

Profile selected (offline, only option for compute 12.0):

profile_id: 6065e8498ddd0771af2884c0df7bda9daef2ec804a0625835ae6a28fef77d083
tags: diarizer=sortformer, mode=ofl, model_type=rmir, vad=silero

RMIR download and workspace materialization succeeded. The failure happens during TensorRT engine build for the RNNT encoder.

Key errors:

  1. ONNX export succeeds, then TRT engine build fails:
Building TensorRT engine for /tmp/tmpazq_vlyr/model.onnx
[!] Invalid Engine. Please ensure the engine was built correctly

  2. No tactic implementation for attention layers on compute 12.0:
Could not find any implementation for node
{ForeignNode[(Unnamed Layer* 878) [Constant] +
ONNXTRT_ShapeShuffle_459.../layers.0/self_attn/Where]}.

  3. CUDA driver memory allocation failures (even 1KB):
[virtualMemoryBuffer.cpp::resizePhysical::141] Error Code 1: Cuda Driver
Requested amount of GPU memory (1024 bytes) could not be allocated.

  4. All backend strategies fail:
Engine generation failed because all backend strategies failed.
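If others want to check whether their own container log shows the same failures, here is a small Python sketch that scans log lines for the message fragments reported in this thread. The signature names and the `scan_log` helper are hypothetical, and the substrings are simply copied from the messages quoted in this thread, so a different TensorRT version may word them differently.

```python
# Substrings taken from the error messages quoted in this thread.
ERROR_SIGNATURES = {
    "invalid_engine": "Invalid Engine. Please ensure the engine was built correctly",
    "no_implementation": "Could not find any implementation for node",
    "cuda_alloc_failure": "could not be allocated",
    "all_strategies_failed": "Engine generation failed because all backend strategies failed",
    "deserialize_failure": "Could not deserialize engine",
    "trt_out_of_memory": "OutOfMemory",
}


def scan_log(lines) -> dict:
    """Map each signature name to the 1-based line numbers where it occurs."""
    hits = {name: [] for name in ERROR_SIGNATURES}
    for lineno, line in enumerate(lines, start=1):
        for name, needle in ERROR_SIGNATURES.items():
            if needle in line:
                hits[name].append(lineno)
    # Drop signatures that never appeared so the result is easy to read.
    return {name: nums for name, nums in hits.items() if nums}


example = [
    "Building TensorRT engine for /tmp/tmpazq_vlyr/model.onnx",
    "[!] Invalid Engine. Please ensure the engine was built correctly",
    "Requested amount of GPU memory (1024 bytes) could not be allocated.",
    "Engine generation failed because all backend strategies failed.",
]
print(scan_log(example))
```

`scan_log` accepts any iterable of lines, so an open file handle for a saved container log works as well as a list.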

TRT build config at time of failure:

Flags: [FP16, TF32, OBEY_PRECISION_CONSTRAINTS]
Memory Pools: [WORKSPACE: 32606.56 MiB, TACTIC_DRAM: 32606.56 MiB]
Tactic Sources: [EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]

Note: The CTC English model (parakeet-1-1b-ctc-en-us:1.5.0) works fine on the same GPU — this is specific to the RNNT multilingual encoder + sortformer on Blackwell.

No streaming/realtime profile is available for compute 12.0 at all — the profile selector only matches the offline (ofl) profile.

Full container log is below if needed.