Speech-to-text (STT) API Docker image with arm64 + GPU support

Hi, is anyone using a working speech-to-text (STT) server Docker image for linux/arm64 with GPU support? I'm having trouble finding an appropriate image: most images with GPU support are linux/amd64 only, and the linux/arm64 ones seem to lack GPU support.

I need a transcription API endpoint on DGX Spark for a non-English language (German) with VAD, with or without speaker diarization, for testing near-realtime or realtime transcription.

Any suggestions? Thanks.

We have a few resources on NGC on how to deploy a speech-to-text model.