DRIVE OS Version: 7.0.3
Issue Description:
I am using DriveOS LLM SDK 0.0.3.4 to convert Llama-3_3-Nemotron-Super-49B-v1_5 from .safetensors to .onnx. This is an intermediate step before converting to the .engine format for use with TensorRT. I followed the instructions in export/README.md to set up the environment. When running
python3 llm_export.py …
I get this error:
File "/home/ilya_druker/.cache/huggingface/modules/transformers_modules/origin/modeling_decilm.py", line 956, in forward
    raise NotImplementedError("DeciLMModel does not support legacy cache format, please use a newer "
NotImplementedError: DeciLMModel does not support legacy cache format, please use a newer transformers version or use VariableCache explicitly (see import in this file).
Support for the Nemotron model is not fully enabled in standard repositories such as Transformers. The custom modeling scripts are therefore shipped together with the Nemotron data files and are loaded by the standard repositories as "remote code". That is the reason for the above error: the VariableCache class is defined in these custom scripts and is NOT known to the DriveOS LLM SDK scripts!
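For context, "remote code" modules are cached under ~/.cache/huggingface/modules/transformers_modules/ and can be imported in place with importlib instead of being copied into the SDK tree. The following is a self-contained sketch of that mechanism; it uses a stand-in module file rather than the real modeling_decilm.py, and the class body is a placeholder:

```python
import importlib.util
import pathlib
import tempfile

# Stand-in for the cached remote-code file, e.g.
# ~/.cache/huggingface/modules/transformers_modules/origin/modeling_decilm.py
with tempfile.TemporaryDirectory() as d:
    mod_path = pathlib.Path(d) / "modeling_decilm.py"
    mod_path.write_text("class VariableCache:\n    pass\n")  # placeholder body

    # Load the module directly from its file path, no copying required.
    spec = importlib.util.spec_from_file_location("modeling_decilm", mod_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    VariableCache = module.VariableCache  # class is now usable in this script
    print(VariableCache.__name__)
```

With the real cache path substituted, this gives access to VariableCache without duplicating the Nemotron scripts, though it does not by itself solve the tracing problem described below.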
I tried to solve the issue by copying over all the custom scripts from Nemotron to
driveos_llm_sdk-0.0.3.4/export/utils/
Then I replaced DynamicCache with VariableCache in export_utils:

from .variable_cache import VariableCache
...
cache = VariableCache(config=config)
torch_to_onnx(
    model,
    (dummy_input_ids, {
        "past_key_values": cache,
        **extra_inputs
    }),
    output_dir,
    "model.onnx",
    input_names=input_names + list(extra_inputs.keys()),
    output_names=output_names,
    dynamic_axes=dynamic_axes | extra_dyn_axes,
)
After running python3 llm_export.py again, I get the following error:
outs = ONNXTracedModule(
  File "/mnt/homedirs/ilya_druker/Workspaces/driveos_llm_sdk-0.0.3.4/export/Condaspaces/drive_llm_sdk/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/homedirs/ilya_druker/Workspaces/driveos_llm_sdk-0.0.3.4/export/Condaspaces/drive_llm_sdk/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/homedirs/ilya_druker/Workspaces/driveos_llm_sdk-0.0.3.4/export/Condaspaces/drive_llm_sdk/lib/python3.10/site-packages/torch/jit/_trace.py", line 108, in forward
    in_vars, in_desc = _flatten(args)
RuntimeError: Only tuples, lists and Variables are supported as JIT inputs/outputs. Dictionaries and strings are also accepted, but their usage is not recommended. Here, received an input of unsupported type: VariableCache
As you can see, VariableCache is not recognized by Torch JIT. It seems Nemotron cannot be converted to ONNX using the DriveOS LLM SDK. Is that correct? Is there an alternative solution?
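The JIT tracer only flattens standard containers (tensors, tuples, lists, dicts), so a custom cache object cannot cross the tracing boundary as-is. One possible workaround (a sketch, not an SDK-confirmed API) is to unpack the cache into plain per-layer tensors and pass those as named inputs, using the "past_key_values.{layer}.key" / ".value" naming convention common in LLM ONNX exporters. A minimal, framework-free illustration of the flattening step:

```python
def flatten_kv_cache(layers):
    """Turn a per-layer [(key, value), ...] list into a flat dict of named
    inputs that the tracer can handle (plain values in standard containers)."""
    flat = {}
    for i, (key, value) in enumerate(layers):
        flat[f"past_key_values.{i}.key"] = key
        flat[f"past_key_values.{i}.value"] = value
    return flat

# Stand-in objects; in the real export these would be torch tensors taken
# from the cache object (which attributes hold them depends on VariableCache).
dummy = [("k0", "v0"), ("k1", "v1")]
print(sorted(flatten_kv_cache(dummy)))
# -> ['past_key_values.0.key', 'past_key_values.0.value',
#     'past_key_values.1.key', 'past_key_values.1.value']
```

Whether the model's forward() will accept the cache in this flattened form depends on the remote-code implementation, so this is only a direction to try, not a verified fix.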