Hi,
I’m experiencing OOM (Out of Memory) errors when trying to run nvidia/canary-1b-v2 on a Jetson AGX Orin 64GB.
The strange part is that, according to jtop monitoring, GPU memory isn’t actually being fully utilized when these errors occur.
Here’s the minimal example from the model card along with the provided short audio sample:
wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav

from nemo.collections.asr.models import ASRModel

# Load the pretrained model and transcribe the downloaded sample
asr_ast_model = ASRModel.from_pretrained(model_name="nvidia/canary-1b-v2")
output = asr_ast_model.transcribe(['2086-149220-0033.wav'], source_lang='en', target_lang='en')
print(output[0].text)
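In case it helps with reproduction, the same call can also be made with an explicitly capped batch size. This is only a sketch; I'm assuming transcribe() accepts batch_size for this model the way other NeMo ASR models do:

from nemo.collections.asr.models import ASRModel

asr_ast_model = ASRModel.from_pretrained(model_name="nvidia/canary-1b-v2")

# Sketch: cap the batch size to the minimum. batch_size is assumed to be
# accepted here as it is by other NeMo ASR transcribe() implementations.
output = asr_ast_model.transcribe(
    ['2086-149220-0033.wav'],
    source_lang='en',
    target_lang='en',
    batch_size=1,
)
print(output[0].text)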
Tested on Jetson AGX Orin using two different base images:
- nvcr.io/nvidia/pytorch:25.10-py3-igpu
- dustynv/pytorch:2.7-r36.4.0-cu128-24.04
Error messages:
return inputs.to(device, non_blocking=non_blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA driver error: out of memory
or
x = torch.cat((x[:, 0].unsqueeze(1), x[:, 1:] - self.preemph * x[:, :-1]), dim=1)
~~~^~~~~~~~~~
RuntimeError: CUDA driver error: out of memory
What’s puzzling is that jtop clearly shows GPU memory not being fully consumed at the time of failure — suggesting it might not be a real out-of-memory condition.
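To cross-check jtop, the CUDA runtime's own view can be printed right before the transcribe() call. A minimal sketch, assuming the current device is the Orin iGPU:

import torch

# What the CUDA runtime (not jtop) reports as free/total on the current device
free_b, total_b = torch.cuda.mem_get_info()
print(f"runtime free/total: {free_b / 1e9:.2f} / {total_b / 1e9:.2f} GB")

# What PyTorch's caching allocator has actually claimed
print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1e9:.2f} GB")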
For comparison:
- nvidia/canary-1b-flash works perfectly fine on the same setup.
- canary-1b-v2 runs on an A100 GPU with a 30-minute audio file, using around 20 GB of GPU memory, without issues.
This seems similar to the issue reported here: Nemo > Canary 1B > RuntimeError: CUDA driver error: out of memory.
I also tried the approach mentioned there — limiting container RAM and extending swap. Specifically:
- Docker memory limit: 500 MB
- Swap size: 8 GB
However, with that limit the Python kernel was killed by the OOM killer before any GPU inference could even start.
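The only other allocator-side knob I'm aware of is PYTORCH_CUDA_ALLOC_CONF; I haven't verified whether expandable_segments is honoured by the iGPU/unified-memory builds, so please treat this purely as an untested sketch:

import os

# Must be set before CUDA is initialised; untested on the Jetson iGPU builds.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

from nemo.collections.asr.models import ASRModel

asr_ast_model = ASRModel.from_pretrained(model_name="nvidia/canary-1b-v2")
output = asr_ast_model.transcribe(['2086-149220-0033.wav'], source_lang='en', target_lang='en')
print(output[0].text)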
Any insights into what might be causing this or how to correctly run canary-1b-v2 on Jetson AGX Orin would be greatly appreciated.