OOM Error Running canary-1b-v2 on Jetson AGX Orin (Memory Not Actually Full)

Hi,

I’m experiencing OOM (Out of Memory) errors when trying to run nvidia/canary-1b-v2 on a Jetson AGX Orin 64GB.
The strange part is that, according to jtop monitoring, GPU memory isn’t actually being fully utilized when these errors occur.

Here’s the minimal example from the model card along with the provided short audio sample:

# Download the short sample audio referenced on the model card
wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav

from nemo.collections.asr.models import ASRModel

# Load Canary 1B v2 and transcribe the downloaded English sample (en -> en)
asr_ast_model = ASRModel.from_pretrained(model_name="nvidia/canary-1b-v2")
output = asr_ast_model.transcribe(['2086-149220-0033.wav'], source_lang='en', target_lang='en')
print(output[0].text)

Tested on Jetson AGX Orin using two different base images:

  • nvcr.io/nvidia/pytorch:25.10-py3-igpu

  • dustynv/pytorch:2.7-r36.4.0-cu128-24.04

Error messages:

return inputs.to(device, non_blocking=non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA driver error: out of memory

or

x = torch.cat((x[:, 0].unsqueeze(1), x[:, 1:] - self.preemph * x[:, :-1]), dim=1)
                        ~~~^~~~~~~~~~
RuntimeError: CUDA driver error: out of memory

What’s puzzling is that jtop clearly shows GPU memory is not fully consumed at the time of failure, which suggests this may not be a genuine out-of-memory condition.
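
For reference, memory can also be inspected from inside the process itself, right before the failing call. Below is a minimal sketch using standard PyTorch CUDA APIs (an illustration, not part of the original repro):

import torch

# Sketch: compare the CUDA driver's view of device memory with what the
# PyTorch caching allocator is actually holding. On Jetson the GPU shares
# physical RAM with the CPU, so "free" reflects the unified memory pool.
free_bytes, total_bytes = torch.cuda.mem_get_info()   # driver-level free/total
allocated = torch.cuda.memory_allocated()             # bytes in live tensors
reserved = torch.cuda.memory_reserved()               # bytes cached by the allocator

print(f"driver free/total: {free_bytes / 1e9:.1f} / {total_bytes / 1e9:.1f} GB")
print(f"torch allocated:   {allocated / 1e9:.1f} GB")
print(f"torch reserved:    {reserved / 1e9:.1f} GB")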

For comparison:

  • nvidia/canary-1b-flash works perfectly fine on the same setup.

  • canary-1b-v2 runs on an A100 GPU with a 30-minute audio file, using around 20 GB of GPU memory without issues.

This seems similar to the issue reported here: Nemo > Canary 1B > RuntimeError: CUDA driver error: out of memory.

I also tried the approach mentioned there, limiting container RAM and extending swap (rough command sketch below). Specifically:

  • Docker memory limit: 500 MB

  • Swap size: 8 GB

However, the process was killed by the kernel's OOM killer before any GPU inference could even start.
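
For completeness, this is roughly how such limits can be applied with standard Docker flags; the sketch below uses the NGC image from above and is not the exact command I ran:

# Sketch: cap container RAM at 500 MB; --memory-swap is RAM + swap,
# so 8500m allows up to 8 GB of swap on top of the 500 MB limit.
# Swap must also be enabled on the host (e.g. a swapfile or zram).
docker run --rm -it \
    --runtime nvidia \
    --memory=500m \
    --memory-swap=8500m \
    -v "$PWD":/workspace -w /workspace \
    nvcr.io/nvidia/pytorch:25.10-py3-igpu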

Any insights into what might be causing this or how to correctly run canary-1b-v2 on Jetson AGX Orin would be greatly appreciated.

Hi,

Please note that swap is not GPU-allocatable memory.
Could you check the memory status with free and share the output with us?

$ free -h

Thanks.