Jetson AI Lab - Agent Controller LLM

JETSON AI LAB RESEARCH GROUP

  • Project - Agent Controller LLM
  • Team Leads - @dusty_nv, Akash James, REBOTNIX

This project integrates a higher-level conversational LLM for interfacing with the user (either via text input or ASR from a microphone) and for dynamically tasking/reconfiguring the agent pipeline based on user commands and queries.

For example, the user should be able to say something like “if you see the door open, send me an alert”, and the LLM will output code to prompt a multimodal vision model, followed by the hooks for event detection and actions/alerts. Or they can say “hey robot, follow me”, and the robot’s perception & navigation system will begin tracking the person.
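As a rough illustration of the kind of code the controller LLM might emit for the first case, here is a minimal hypothetical sketch - the vlm.query() and send_alert() helpers are placeholders, not an existing API:

```python
import time

def monitor_door(vlm, send_alert, interval=2.0):
    """Hypothetical code the controller could generate for
    'if you see the door open, send me an alert'."""
    alerted = False
    while True:
        reply = vlm.query("Is the door open? Answer yes or no.")  # prompt the vision model
        if "yes" in reply.lower() and not alerted:
            send_alert("The door is open.")   # fire the user's alert hook once
            alerted = True
        elif "no" in reply.lower():
            alerted = False                   # re-arm after the door closes again
        time.sleep(interval)
```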

Current vision/language models (VLMs) like LLaVA are not as conversational in nature as text-based LLMs like Llama, and may represent just one possible domain-expert ‘worker model’ that the controller agent can invoke (alongside ViTs like OWL-ViT for open-vocabulary object detection, etc.).

Further, having such a higher-level controller agent in place can lead to more adaptive, intelligent system behaviors. Akash James and Gary Hilgemann (REBOTNIX) have independently had encouraging experiences with these multi-model dynamic agent architectures that merit further investigation.

There are lower-level features needed in the LLM generation API to accomplish this, including function-calling (or ‘tools’, as they are referred to in the OpenAI ecosystem). Descriptions of the functions available for the bot to invoke are embedded in the system prompt (normally in JSON format for parameter consistency) - for example, IMAGE_QUERY(), DETECT_OBJECT(), SEARCH_VECTORDB(), GET_TIME(), PERFORM_ACTION(), etc. Then, when the LLM determines it necessary, it outputs JSON or Python code invoking one or more of these functions/plugins.
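A sketch of how those function descriptions could be embedded in the system prompt and dispatched - the function names follow the examples above, but the JSON schema and dispatch format shown here are illustrative assumptions, not a fixed spec:

```python
import json
import datetime

# Illustrative tool registry; the "func" entries are what actually runs locally.
TOOLS = {
    "GET_TIME": {
        "description": "Return the current local time.",
        "parameters": {},
        "func": lambda: datetime.datetime.now().strftime("%H:%M:%S"),
    },
    "IMAGE_QUERY": {
        "description": "Ask the vision model a question about the latest camera frame.",
        "parameters": {"prompt": "string"},
        "func": lambda prompt: "(VLM reply would go here)",  # placeholder for the vision model call
    },
}

def build_system_prompt():
    """Embed the tool descriptions (as JSON) into the system prompt."""
    specs = {name: {k: v for k, v in tool.items() if k != "func"}
             for name, tool in TOOLS.items()}
    return ("You are an agent controller. Call a function by outputting JSON "
            'like {"function": "NAME", "args": {...}}. Available functions:\n'
            + json.dumps(specs, indent=2))

def dispatch(call_text):
    """Run a call the LLM emitted, e.g. {"function": "GET_TIME", "args": {}}."""
    call = json.loads(call_text)
    return TOOLS[call["function"]]["func"](**call.get("args", {}))
```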

Initial experiments assessing Llama’s ability to situationally call these are also encouraging - the part that remains is integration at the generation level, so that when the LLM actually outputs the code snippets, they are detected mid-generation, run, and the results injected into the bot output (for example, the current time or the result of a search query). At that point, LLM output generation continues, with the bot now having knowledge of the result since it is included in the prior context.
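A simplified sketch of that generation-level loop, assuming a streaming interface - model.stream() here is a hypothetical stand-in, not NanoLLM’s actual API, and the dispatch() function is the one from the sketch above:

```python
import re

CALL_PATTERN = re.compile(r'\{"function":.*?"args":.*?\}\}')  # crude detector for emitted JSON calls

def generate_with_tools(model, prompt, dispatch):
    """Stream tokens; when a function call appears mid-generation, run it,
    inject the result into the bot output, and resume generation with the
    result now part of the prior context."""
    output = ""
    scanned = 0   # offset past which we look for new calls (avoid re-running old ones)
    while True:
        interrupted = False
        for token in model.stream(prompt + output):
            output += token
            match = CALL_PATTERN.search(output, scanned)
            if match:
                result = dispatch(match.group(0))      # run the tool call
                output += f"\nRESULT: {result}\n"      # inject the result into the output
                scanned = len(output)                  # don't match this call again
                interrupted = True
                break                                  # restart generation with updated context
        if not interrupted:
            return output
```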

There are other prompt engineering techniques to experiment with in this realm as well for building more complex agents, such as auto-prompting, chain-of-thought (CoT), and guidance/grammars for constrained output. Many projects have explored these - LangChain, Microsoft JARVIS, BabyAGI, etc. - that we can borrow techniques from.
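As one lightweight stand-in for grammar-constrained output (real grammar support would hook into the decoder itself), here is a sketch that validates the model’s reply against an expected JSON shape and re-prompts on failure - the generate() callable and the required keys are assumptions for illustration:

```python
import json

def constrained_json(generate, prompt, required_keys=("function", "args"), retries=3):
    """Keep asking until the model returns JSON with the expected keys.
    `generate(prompt) -> str` is a hypothetical text-completion callable."""
    for _ in range(retries):
        text = generate(prompt)
        try:
            obj = json.loads(text)
            if isinstance(obj, dict) and all(k in obj for k in required_keys):
                return obj
        except json.JSONDecodeError:
            pass
        prompt += ("\nYour last reply was not valid JSON with keys "
                   f"{list(required_keys)} - please answer again with only that JSON.")
    raise ValueError("model failed to produce constrained JSON output")
```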

Our remit is the optimized integration of such techniques for building an adaptive assistant that provides low-latency, responsive user experiences through vision and verbal conversation, and that can intuitively learn and be customized for each user.


Self-Learning Llama-3 Voice Agent with Function Calling and Automatic RAG

Enable the LLM (Meta-Llama-3-8B) to invoke Python functions you give it access to, including the ability to save/retrieve info that it learns about you over time. Run locally on Jetson Orin, using Llama-3-8B-Instruct, Riva ASR, and Piper TTS through NanoLLM.

See the docs for function calling here: Chat — NanoLLM 24.4.2 documentation
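For the save/retrieve part specifically, here is a minimal sketch of the idea in plain Python - it does not use NanoLLM’s actual function-calling API (see the linked docs for that); it just shows two tools the LLM could be given to persist and recall facts it learns about the user:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("user_memory.json")   # simple on-disk store; a vector DB would also work

def _load():
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}

def save_info(key: str, value: str) -> str:
    """Tool the LLM can call to remember something about the user."""
    memory = _load()
    memory[key] = value
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))
    return f"Saved '{key}'."

def retrieve_info(key: str) -> str:
    """Tool the LLM can call to recall a previously saved fact."""
    return _load().get(key, f"No info saved under '{key}'.")
```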
