Jetbot Voice-Activated Copilot Tools: Empowering Your ROS2 Robot with Voice-Activated Copilot Functionality
Experience the power of voice control for your ROS2 robot with the Jetbot Voice-Activated Copilot Tools. This project leverages the capabilities of the Nvidia RIVA ASR-TTS service, enabling your robot to understand and respond to spoken commands.
Version 2 keeps the features of the V1 Jetbot Voice To Action Tools (natural chat greetings, Lidar-assisted self-driving for object avoidance, and real-time person following) and extends your robot’s interactions. It introduces support for multiple AI models, including LLM and VLM chat, and hosts ROS2 inside the NanoLLM Docker container to simplify setup.
Key Features:
- Jetbot ASR Processor: Enables your robot to decode human voice messages using the Nvidia RIVA ASR service client ROS2 node.
- Jetbot TTS Processor: Converts LLM chat and VLM vision response text into speech using Nvidia RIVA TTS services and plays it through the robot’s speaker, making interaction between the robot and humans more engaging and user-friendly.
- Jetbot ASR Agent: Lets you build a simple 1D convolutional neural network (CNN) text classifier that predicts the intent behind a spoken command and routes it to the corresponding LLM chat, VLM vision, or robot action.
- Jetbot Voice Tools Copilot: Executes the actions corresponding to the voice commands posted via ROS2 topic from the Jetbot ASR Agent. Supported actions include:
  - Large Language Model (LLM) Chat: Empower your Jetbot to respond using LLM chat. By default, it utilizes the meta-llama/Llama-2-7b-chat-hf model hosted in a ROS2 node.
  - Vision-Language Model (VLM) Robot Camera Image Description: Enable your Jetbot to describe images captured by its camera. By default, it employs the Efficient-Large-Model/VILA1.5-3b model hosted in a ROS2 node.
  - Lidar-assisted self-driving for safe navigation and object avoidance.
  - Real-time object detection for seamless person following interactions.
  - Basic robot navigation commands such as moving forward/backward and turning left/right.
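To make the Jetbot ASR Agent idea more concrete, here is a minimal sketch of the forward pass of a 1D CNN text classifier: embedding lookup, a 1D convolution over the word sequence, ReLU with global max pooling, and a softmax over intents. It is not the project's implementation. The intent labels, vocabulary, layer sizes, and the names `tokenize`, `make_params`, and `predict` are all assumptions made for illustration, and the weights are random rather than trained, so the predicted intent carries no meaning until a real model is fitted.

```python
import math
import random

# Hypothetical intent labels; the project's real label set is not given here.
INTENTS = ["chat", "vision", "follow", "self_driving", "forward", "stop"]


def tokenize(text, vocab):
    """Map words to integer ids; unknown words share id 0."""
    return [vocab.get(word, 0) for word in text.lower().split()]


def make_params(vocab_size, embed_dim=8, num_filters=4, kernel=3, seed=0):
    """Create random (untrained) weights so the forward pass can run."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.5, 0.5)

    return {
        "kernel": kernel,
        # One embedding vector per vocabulary entry.
        "embed": [[rand() for _ in range(embed_dim)] for _ in range(vocab_size)],
        # Each filter spans `kernel` consecutive word vectors.
        "conv": [[[rand() for _ in range(embed_dim)] for _ in range(kernel)]
                 for _ in range(num_filters)],
        # One row of output weights per intent.
        "dense": [[rand() for _ in range(num_filters)] for _ in INTENTS],
    }


def predict(token_ids, params):
    """Return one probability per intent for a tokenized command."""
    kernel = params["kernel"]
    # Pad short inputs with the unknown id so at least one window exists.
    ids = token_ids + [0] * max(0, kernel - len(token_ids))
    vectors = [params["embed"][i] for i in ids]

    pooled = []
    for filt in params["conv"]:
        best = 0.0  # starting at zero applies ReLU before the max pooling
        for start in range(len(vectors) - kernel + 1):
            window = vectors[start:start + kernel]
            activation = sum(w * x
                             for row, vec in zip(filt, window)
                             for w, x in zip(row, vec))
            best = max(best, activation)
        pooled.append(best)

    logits = [sum(w * p for w, p in zip(row, pooled)) for row in params["dense"]]
    peak = max(logits)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Example with a tiny hypothetical vocabulary.
vocab = {"<unk>": 0, "follow": 1, "me": 2, "what": 3, "do": 4, "you": 5, "see": 6}
params = make_params(len(vocab))
probs = predict(tokenize("what do you see", vocab), params)
intent = INTENTS[probs.index(max(probs))]
```

In a real setup, the predicted intent would be published on a ROS2 topic for the Jetbot Voice Tools Copilot to act on, and the weights would come from training on labeled example phrases, typically with a deep learning framework rather than plain Python lists.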
Code:
Demos:
Jetbot Voice Activated Copilot Vision, Chat, and Robot Actions Demo - YouTube