[HELP] Can I use local model to load LLM and start the Agent studio?

Hello NVIDIA,

Here's the thing: we have recently been working on Jetson Orin projects, and when I decided to copy my environment to another Orin, I wondered whether we can load a local model such as VILA or Llama 3. It would be much faster to store these huge models on my USB flash drive and copy them to the other Orin devices, to cut down on download time.

For example:

Agent Studio

```bash
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \
  $(autotag nano_llm) \
    python3 -m nano_llm.studio
```

Llamaspeak

```bash
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \
  $(autotag nano_llm) \
    python3 -m nano_llm.agents.web_chat --api=mlc \
      --model meta-llama/Meta-Llama-3-8B-Instruct \
      --asr=riva --tts=piper
```

In the instructions for Agent Studio or llamaspeak above, I want to load a local VILA or Llama 3 model directly, without having to authenticate with Hugging Face.

How can I modify the code or the commands to achieve that?

Thanks !

Best regards,
Leonard

Hi @leonard.zhang, you can pass a local path to --model (or the corresponding field in Agent Studio) instead of a HuggingFace repo name, and if the directory already exists, it will simply load your specified folder instead of trying to download it from HuggingFace Hub.
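For example, something along these lines should work for llamaspeak (a sketch, not the exact command from the docs; the `/data/models/...` path is a placeholder, and it assumes the model directory sits somewhere the container can see, such as the `jetson-containers/data` directory that jetson-containers mounts at `/data` by default):

```bash
# Point --model at a local directory instead of a HuggingFace repo name.
# No HUGGINGFACE_TOKEN is needed, since nothing gets downloaded.
# /data/models/Meta-Llama-3-8B-Instruct is a placeholder path; put the
# model under jetson-containers/data so it is visible inside the container.
jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.web_chat --api=mlc \
    --model /data/models/Meta-Llama-3-8B-Instruct \
    --asr=riva --tts=piper
```

In Agent Studio, the same local path can be entered in the model field in place of a repo name.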

Hi dusty

Thanks for your answer. I tried it out by adding the local path to the properties, and it works perfectly.

An interesting thing I noticed when I started trying to use local ASR/TTS model paths in an offline situation: according to the Riva official documentation, the model file extension for PiperTTS is `.onnx`, while for RivaTTS it is `.riva`.
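One detail worth noting on the Piper side: a Piper voice is distributed as a pair of files, the `.onnx` model plus a matching `.onnx.json` config, and both must be present for the voice to load offline. A quick sanity check might look like this (the directory and file names are only examples, not necessarily the cache layout nano_llm uses):

```bash
# Sanity-check that a Piper voice is complete before going offline.
# /data/models/piper is an example location; adjust to wherever your
# voice files are cached or mounted.
ls -l /data/models/piper/
# expected, with illustrative file names:
#   en_US-libritts-high.onnx        <- the voice model itself
#   en_US-libritts-high.onnx.json   <- its config; Piper needs both files
```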

My question is: are there any limitations on using a local pretrained text-to-speech model, and which files need to be modified? Since the Orin is an edge computing device, it will often be working in fully offline environments.

Thanks !

Best regards,
Leonard

Hi Leonard, you should get your agent running first while connected to the internet, so that all the models get downloaded and cached on disk (or you can do this manually and specify their local paths). Then when you disconnect from the internet, it will already have the models onboard. There may be some minor things here and there for which you can disable the network requests.
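To carry that cache over to a second Orin with a USB drive, as in Leonard's original question, something like the following should work (a sketch assuming the default jetson-containers layout, where the repo's `data/` directory is mounted into the container at `/data`; the exact subdirectories depend on which models you run):

```bash
# On the first (online) Orin: copy the cached models to the USB drive.
rsync -a ~/jetson-containers/data/models/ /media/usb/models/

# On the second (offline) Orin: restore the cache, then launch the
# agent with the same commands as before.
rsync -a /media/usb/models/ ~/jetson-containers/data/models/
```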
