Architecture and library compatibility on aarch64

The core issue appears to be architecture and library compatibility on aarch64 / Grace-Blackwell, particularly when working with audio ML stacks that rely on:

  • Torch audio CUDA kernels

  • Conformer-based TTS models

  • s3tokenizer / unit vocoders

  • Riva ASR deployment containers

  • CUDA/cuDNN-accelerated audio feature extraction

  • Stable ARM64 wheels for PyTorch, Torchaudio, and related ecosystem packages
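
On the last point, one quick check is to read the platform tag straight out of a wheel's filename before installing it. Below is a minimal sketch, assuming the standard wheel naming scheme (name-version[-build]-python-abi-platform.whl). The helper names and the `example_pkg` filenames are hypothetical and only there for illustration, not taken from any real package index.

```python
def wheel_platform_tags(filename: str) -> list[str]:
    """Return the platform tags encoded in a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    # Layout: name-version[-build]-python-abi-platform
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    # A wheel may carry several platform tags joined by dots.
    return parts[-1].split(".")


def runs_on_linux_aarch64(filename: str) -> bool:
    """True if the wheel is pure Python ('any') or built for Linux aarch64."""
    return any(
        tag == "any" or tag.endswith("_aarch64")
        for tag in wheel_platform_tags(filename)
    )


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    for name in (
        "example_pkg-1.0.0-cp312-cp312-manylinux_2_28_aarch64.whl",
        "example_pkg-1.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "example_pkg-1.0.0-py3-none-any.whl",
    ):
        print(name, "->", runs_on_linux_aarch64(name))
```

This only looks at the filename. A wheel tagged for aarch64 can still ship without GPU kernels for a given CUDA architecture, so a matching tag is necessary but not sufficient.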

In practice, most of these packages attempt to pull x86_64-only wheels or try to JIT CUDA kernels that are not yet optimized for GB200 / ARM64 targets, resulting in one or more of the following:

  • Missing or incompatible wheels during installation

  • CUDA kernel fallback to CPU during runtime

  • Inability to launch certain Riva services

  • torchaudio codec failures due to missing torchcodec GPU bindings

  • Huge performance degradation (10–50× slower) compared to x86 systems
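
To narrow down which of the symptoms above applies on a given machine, a small report of what actually got installed can help. This is a rough sketch, not an official diagnostic: the function name `describe_environment` is made up, and it assumes that a missing package should simply be reported as `None` rather than treated as an error.

```python
import importlib
import platform


def describe_environment() -> dict:
    """Collect basic facts about the CPU architecture and the torch install."""
    report = {
        "machine": platform.machine(),  # e.g. 'aarch64' or 'x86_64'
        "python": platform.python_version(),
    }
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # torch is missing or its native libraries failed to load.
        report["torch"] = None
        return report

    report["torch"] = torch.__version__
    # torch.version.cuda is None when the installed build is CPU-only.
    report["torch_cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)

    try:
        report["torchaudio"] = importlib.import_module("torchaudio").__version__
    except (ImportError, OSError):
        report["torchaudio"] = None
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `torch_cuda_build` comes back as `None` on an aarch64 machine, the installed wheel is a CPU-only build, which would be consistent with the silent CPU fallback and the slowdown described above.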

Thank you for the feedback. We understand that software being incompatible with new hardware is frustrating, and we are doing our best to work on it. You can check out our growing number of playbooks to see optimized workloads, and please let us know if there is anything specific you want to see.

Got bit by this today. What is the status?

Hmm, this is not good for the DGX Spark as a development environment.

I ran into similar issues with different TTS solutions (xtts, alltalk, F5-tts).
Will this be solved soon?
Otherwise I think I will have to sell the Spark and get a Mac to fulfill my dream of building a voice-controlled, local, smart personal digital assistant.