DRIVE OS Version: 7.0.3
Issue Description:
I am using DriveOS LLM SDK 0.0.3.4 to convert Llama-3_3-Nemotron-Super-49B-v1_5 from .safetensors to .onnx. This is an intermediate step before converting to the .engine format for use with TensorRT. I followed the instructions in export/README.md to set up the environment. When running
python3 llm_export.py …
I get this error:
File "/home/ilya_druker/.cache/huggingface/modules/transformers_modules/origin/modeling_decilm.py", line 956, in forward
    raise NotImplementedError("DeciLMModel does not support legacy cache format, please use a newer "
NotImplementedError: DeciLMModel does not support legacy cache format, please use a newer transformers version or use VariableCache explicitly (see import in this file).
Support for the Nemotron model is not fully enabled in standard repositories such as Transformers. The custom modeling scripts are therefore shipped together with the Nemotron data files and are loaded by the standard repositories as "remote code". That is the reason for the above error: the VariableCache class is defined in these custom scripts and is NOT known to the DriveOS LLM SDK scripts!
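For context, "remote code" modules are cached under ~/.cache/huggingface/modules/transformers_modules/ and can be imported in place with importlib instead of being copied into the SDK tree. The following is a self-contained sketch of that mechanism; it uses a stand-in module file rather than the real modeling_decilm.py, and the class body is a placeholder:

```python
import importlib.util
import pathlib
import tempfile

# Stand-in for the cached remote-code file, e.g.
# ~/.cache/huggingface/modules/transformers_modules/origin/modeling_decilm.py
with tempfile.TemporaryDirectory() as d:
    mod_path = pathlib.Path(d) / "modeling_decilm.py"
    mod_path.write_text("class VariableCache:\n    pass\n")  # placeholder body

    # Load the module directly from its file path, no copying required.
    spec = importlib.util.spec_from_file_location("modeling_decilm", mod_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    VariableCache = module.VariableCache  # class is now usable in this script
    print(VariableCache.__name__)
```

With the real cache path substituted, this gives access to VariableCache without duplicating the Nemotron scripts, though it does not by itself solve the tracing problem described below.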
I tried to solve the issue by copying over all the custom scripts from Nemotron to
driveos_llm_sdk-0.0.3.4/export/utils/
Then I replaced DynamicCache with VariableCache in export_utils:

from .variable_cache import VariableCache
...
cache = VariableCache(config=config)
torch_to_onnx(
    model,
    (dummy_input_ids, {
        "past_key_values": cache,
        **extra_inputs
    }),
    output_dir,
    "model.onnx",
    input_names=input_names + list(extra_inputs.keys()),
    output_names=output_names,
    dynamic_axes=dynamic_axes | extra_dyn_axes,
)
After running python3 llm_export.py again, I get the following error:
outs = ONNXTracedModule(
  File "/mnt/homedirs/ilya_druker/Workspaces/driveos_llm_sdk-0.0.3.4/export/Condaspaces/drive_llm_sdk/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/homedirs/ilya_druker/Workspaces/driveos_llm_sdk-0.0.3.4/export/Condaspaces/drive_llm_sdk/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/homedirs/ilya_druker/Workspaces/driveos_llm_sdk-0.0.3.4/export/Condaspaces/drive_llm_sdk/lib/python3.10/site-packages/torch/jit/_trace.py", line 108, in forward
    in_vars, in_desc = _flatten(args)
RuntimeError: Only tuples, lists and Variables are supported as JIT inputs/outputs. Dictionaries and strings are also accepted, but their usage is not recommended. Here, received an input of unsupported type: VariableCache
As you can see, VariableCache is not recognized by Torch JIT. It seems Nemotron cannot be converted to ONNX using the DriveOS LLM SDK. Is that correct? Is there an alternative solution?
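The JIT tracer only flattens standard containers (tensors, tuples, lists, dicts), so a custom cache object cannot cross the tracing boundary as-is. One possible workaround (a sketch, not an SDK-confirmed API) is to unpack the cache into plain per-layer tensors and pass those as named inputs, using the "past_key_values.{layer}.key" / ".value" naming convention common in LLM ONNX exporters. A minimal, framework-free illustration of the flattening step:

```python
def flatten_kv_cache(layers):
    """Turn a per-layer [(key, value), ...] list into a flat dict of named
    inputs that the tracer can handle (plain values in standard containers)."""
    flat = {}
    for i, (key, value) in enumerate(layers):
        flat[f"past_key_values.{i}.key"] = key
        flat[f"past_key_values.{i}.value"] = value
    return flat

# Stand-in objects; in the real export these would be torch tensors taken
# from the cache object (which attributes hold them depends on VariableCache).
dummy = [("k0", "v0"), ("k1", "v1")]
print(sorted(flatten_kv_cache(dummy)))
# -> ['past_key_values.0.key', 'past_key_values.0.value',
#     'past_key_values.1.key', 'past_key_values.1.value']
```

Whether the model's forward() will accept the cache in this flattened form depends on the remote-code implementation, so this is only a direction to try, not a verified fix.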