Speech-to-text (STT) API Docker image with arm64 + GPU support

odn · November 13, 2025, 2:49pm

Hi, is anyone using a working speech-to-text (STT) server Docker image for linux/arm64 with GPU support? I'm having trouble finding an appropriate image: most images with GPU support are linux/amd64, and the linux/arm64 ones seem to come without GPU support.
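
For anyone screening candidate images, here is a minimal sketch of how the platform check could be scripted. It assumes the Docker CLI is installed and that `docker manifest inspect` returns a manifest list for the image; the helper names and the sample data are hypothetical, for illustration only. It only reports which platforms an image publishes. It says nothing about GPU support, which still depends on what is built into the arm64 variant.

```python
import json
import subprocess


def platforms_from_manifest(manifest: dict) -> set[str]:
    """Return the "os/arch" strings listed in a manifest list (image index)."""
    found = set()
    for entry in manifest.get("manifests", []):
        platform = entry.get("platform", {})
        os_name = platform.get("os")
        arch = platform.get("architecture")
        if os_name and arch:
            found.add(f"{os_name}/{arch}")
    return found


def image_platforms(image: str) -> set[str]:
    """Run `docker manifest inspect` and collect the platforms it reports."""
    result = subprocess.run(
        ["docker", "manifest", "inspect", image],
        capture_output=True,
        text=True,
        check=True,
    )
    return platforms_from_manifest(json.loads(result.stdout))


# Hypothetical sample shaped like a manifest list, not output from a real image.
sample = {
    "manifests": [
        {"platform": {"os": "linux", "architecture": "amd64"}},
        {"platform": {"os": "linux", "architecture": "arm64"}},
    ]
}
print("linux/arm64" in platforms_from_manifest(sample))
```

Single-platform images return a plain manifest with no `manifests` list, so the helper returns an empty set for them; in that case the architecture has to be read from the image config instead.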

I need a transcription API endpoint on DGX Spark for a non-English language (German) with VAD, with or without speaker diarization, for testing near-real-time or real-time transcription.

Any suggestions? Thanks.

aniculescu · December 29, 2025, 9:27pm

We have a few resources on NGC that show how to deploy a speech-to-text model.
