Can Llama-3_3-Nemotron-Super-49B-v1_5 be converted to ONNX?

DRIVE OS Version: 7.0.3

Issue Description:
I am using DriveOS LLM SDK 0.0.3.4 to convert Llama-3_3-Nemotron-Super-49B-v1_5 from .safetensors to .onnx. This is an intermediate step before converting to the .engine format for use with TensorRT. I followed the instructions in export/README.md to set up the environment. When running

python3 llm_export.py …

I get this error:

File "/home/ilya_druker/.cache/huggingface/modules/transformers_modules/origin/modeling_decilm.py", line 956, in forward
    raise NotImplementedError("DeciLMModel does not support legacy cache format, please use a newer "
NotImplementedError: DeciLMModel does not support legacy cache format, please use a newer transformers version or use VariableCache explicitly (see import in this file).

Support for the Nemotron model is not fully enabled in standard repositories like Transformers. Therefore, custom scripts are shipped together with the Nemotron data files, and they are invoked from the standard repositories as "remote code". That is the reason for the above error: the VariableCache structure is defined in these custom scripts and is NOT known to the DriveOS LLM SDK scripts!
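For context, the "remote code" mechanism amounts to executing the model-specific module file from the Hugging Face cache at runtime, so classes like VariableCache live only in that dynamically loaded module's namespace and are invisible to anything the SDK imports statically. A minimal stdlib-only sketch of that loading pattern (the file contents and path here are hypothetical stand-ins, not the real modeling_decilm.py):

```python
# Sketch: classes defined in dynamically loaded "remote code" exist only in
# the loaded module's namespace, not in any package imported at install time.
import importlib.util
import pathlib
import tempfile

# Stand-in for a cached custom-code file; the real file and contents differ.
src = "class VariableCache:\n    pass\n"
path = pathlib.Path(tempfile.mkdtemp()) / "modeling_decilm.py"
path.write_text(src)

# This mirrors what trust_remote_code-style loading does under the hood.
spec = importlib.util.spec_from_file_location("modeling_decilm", path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

print(mod.VariableCache)  # the class is reachable only via this module object
```

This is why simply importing DriveOS LLM SDK scripts cannot see VariableCache: it is never installed as a regular package.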

I tried to solve the issue by copying all the custom scripts from Nemotron to
driveos_llm_sdk-0.0.3.4/export/utils/

Then I replaced DynamicCache with VariableCache in export_utils:

from .variable_cache import VariableCache

...

cache = VariableCache(config=config)

torch_to_onnx(
    model,
    (dummy_input_ids, {
        "past_key_values": cache,
        **extra_inputs
    }),
    output_dir,
    "model.onnx",
    input_names=input_names + list(extra_inputs.keys()),
    output_names=output_names,
    dynamic_axes=dynamic_axes | extra_dyn_axes,
)

After running python3 llm_export.py again, I get the following error:

    outs = ONNXTracedModule(
  File "/mnt/homedirs/ilya_druker/Workspaces/driveos_llm_sdk-0.0.3.4/export/Condaspaces/drive_llm_sdk/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/homedirs/ilya_druker/Workspaces/driveos_llm_sdk-0.0.3.4/export/Condaspaces/drive_llm_sdk/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/homedirs/ilya_druker/Workspaces/driveos_llm_sdk-0.0.3.4/export/Condaspaces/drive_llm_sdk/lib/python3.10/site-packages/torch/jit/_trace.py", line 108, in forward
    in_vars, in_desc = _flatten(args)
RuntimeError: Only tuples, lists and Variables are supported as JIT inputs/outputs. Dictionaries and strings are also accepted, but their usage is not recommended. Here, received an input of unsupported type: VariableCache

As you can see, VariableCache is not recognized by Torch JIT. It seems Nemotron cannot be converted to ONNX using the DriveOS LLM SDK. Is that right? Is there an alternative solution?
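One pattern that generally sidesteps this class of tracer error is passing the cache in the legacy tuple-of-tuples layout ((k_0, v_0), (k_1, v_1), ...), which JIT tracing can flatten, rather than a custom cache object. Whether VariableCache exposes per-layer stores this way is an assumption; VariableCacheLike and its key_cache/value_cache attributes below are hypothetical stand-ins, and the "tensors" are plain strings to keep the sketch self-contained:

```python
# Sketch: flatten a cache-like object into the legacy ((k, v), ...) layout
# that JIT tracing accepts. Attribute names are assumed, not from VariableCache.

class VariableCacheLike:
    def __init__(self, key_cache, value_cache):
        self.key_cache = key_cache      # list of per-layer key tensors
        self.value_cache = value_cache  # list of per-layer value tensors

def to_legacy_tuples(cache):
    """Pair each layer's key and value store into a flat tuple of tuples."""
    return tuple(zip(cache.key_cache, cache.value_cache))

cache = VariableCacheLike(key_cache=["k0", "k1"], value_cache=["v0", "v1"])
print(to_legacy_tuples(cache))  # (('k0', 'v0'), ('k1', 'v1'))
```

The catch, per the first error above, is that DeciLMModel explicitly rejects this legacy format, so both representations hit a wall from opposite sides.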

Dear @ilya.druker1 ,
Nemotron is not a supported model in the DriveOS LLM SDK. The supported models are listed in the table at DriveOS LLM SDK: TensorRT’s Large Language Model Inference Framework for Auto Platforms — NVIDIA DriveOS 7.0.3 Linux SDK Developer Guide

Your plan is to test Llama-3_3-Nemotron-Super-49B-v1_5 on DRIVE AGX Thor, right?

Hello

I want to run the Nemotron family on Thor with the TensorRT framework. For that, I need to convert the models to ONNX format and then to .engine. Is there a way to do this?

It looks too big to run on the Thor devkit.

We are working on supporting Nemotron models in future DriveOS LLM SDK releases.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.