Building a Conversational Autonomous Robot on Jetson Nano - Achieving ChatGPT-Level Natural Dialogue

Hello NVIDIA Developer Community,

I’m currently developing an autonomous robot on Jetson Nano that can engage in natural conversations with humans. The core vision of this project goes beyond simple command execution - I’m building a robot that understands context, expresses empathy, and maintains flowing conversations just like ChatGPT.

Project Vision and Motivation

Watching robots respond mechanically with “Moving to charging station” to commands like “Go to the charger,” I wanted to try a different approach. When a user says “Why am I so tired today?”, my robot responds with “Did you stay up late last night? How about some coffee? I can accompany you to the café.” This is the kind of natural human-robot dialogue I’m striving to achieve.

Technical Stack and Implementation

Core Technology Stack

  • Hardware Platform: NVIDIA Jetson Nano 4GB

  • Speech Recognition: Faster-Whisper (a CTranslate2 reimplementation of OpenAI's Whisper, well suited to edge devices)

  • Language Model: Local LLM server (Ollama/llama.cpp compatible)

  • Text-to-Speech: Google TTS (planning to migrate to Coqui TTS for offline use)

  • Audio Processing: SoX for recording, MPG123 for playback

  • Robotics Framework: ROS2 Humble (for navigation integration)

  • Programming Languages: Python 3.8, Bash scripting

  • IPC Method: File-based communication between processes

Faster-Whisper Optimization for Edge Computing

The first challenge was implementing real-time speech recognition within Jetson Nano's limited resources. I'm using Faster-Whisper's base model with INT8 quantization, which reduces memory usage by 50% while maintaining acceptable accuracy. The VAD (Voice Activity Detection) filter automatically removes silence segments, which is crucial for natural conversation flow.

Key optimizations include:

  • Model: base model with compute_type="int8"

  • VAD Parameters: min_silence_duration_ms=300 for responsive detection

  • Beam Size: 5 for balanced speed/accuracy trade-off

  • Temperature: 0.2 for consistent transcription

  • Memory Buffering: Using io.BytesIO to process audio in RAM

To improve conversational speech recognition, I set initial_prompt="Conversation content:" and enabled condition_on_previous_text=True to leverage dialogue context.
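Putting these settings together, here's a condensed sketch of the transcription call. The wrapper function and device choice are my own additions; the parameters mirror the list above:

```python
import io

from faster_whisper import WhisperModel

# "base" model + INT8 quantization keeps memory within the Nano's 4GB budget
# (device choice depends on your CTranslate2 build; CPU shown here)
model = WhisperModel("base", device="cpu", compute_type="int8")

def transcribe_utterance(wav_bytes: bytes) -> str:
    """Transcribe one recorded utterance held entirely in RAM."""
    segments, _info = model.transcribe(
        io.BytesIO(wav_bytes),            # memory buffering, no temp files
        beam_size=5,                      # balanced speed/accuracy trade-off
        temperature=0.2,                  # consistent transcription
        vad_filter=True,                  # drop silence segments
        vad_parameters={"min_silence_duration_ms": 300},
        initial_prompt="Conversation content:",
        condition_on_previous_text=True,  # leverage dialogue context
    )
    return " ".join(segment.text.strip() for segment in segments)
```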

LLM Integration - Giving the Robot a Personality

My robot AI, named “Yura,” is designed as a curious and friendly entity, not just an information provider. The system prompt defines Yura’s persona to naturally ask questions, make jokes, and express empathy. With temperature set to 0.7 and max_tokens at 150, the responses are creative yet coherent.

```
# System prompt example
"You are an AGI named 'Yura' embodied in an autonomous robot. You have a
curious and friendly personality. Engage naturally with users - ask questions,
share observations, express empathy. When appropriate, suggest 'Shall we go
together?' or 'Let me guide you there' to utilize your mobility."
```
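For context, this is roughly what a request to the local LLM server could look like, assuming an Ollama endpoint at its default port; the model tag and helper function are placeholders, while the temperature and token limit match the settings above:

```python
import requests

SYSTEM_PROMPT = "You are an AGI named 'Yura' embodied in an autonomous robot. ..."

def ask_yura(user_text: str, history: str = "") -> str:
    """Hypothetical helper: send the persona, conversation history, and the
    user's utterance to a local Ollama server and return Yura's reply."""
    response = requests.post(
        "http://localhost:11434/api/chat",   # Ollama's default chat endpoint
        json={
            "model": "llama3",               # placeholder model tag
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT + "\n" + history},
                {"role": "user", "content": user_text},
            ],
            "options": {"temperature": 0.7, "num_predict": 150},
            "stream": False,
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["message"]["content"]
```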

Context Management Architecture

For genuine dialogue, context preservation is essential. I’ve implemented a conversation history system that maintains dialogue context in memory, with automatic summarization when it grows too long:

```bash
# Conversation history management
CONVERSATION_HISTORY=""
MAX_HISTORY_LENGTH=500

update_conversation_history() {
    local role=$1
    local content=$2
    # $'\n' yields a real newline (a plain "\n" inside double quotes stays literal)
    CONVERSATION_HISTORY="${CONVERSATION_HISTORY}"$'\n'"${role}: ${content}"
    # Summarize if history exceeds limit
    # (summarize_conversation is defined elsewhere in the pipeline)
    if [ ${#CONVERSATION_HISTORY} -gt "$MAX_HISTORY_LENGTH" ]; then
        CONVERSATION_HISTORY=$(summarize_conversation "$CONVERSATION_HISTORY")
    fi
}
```

Real-time Processing Pipeline

To minimize response latency (targeting a 2-3 second total response time):

  • Parallel Processing: STT runs concurrently with LLM prompt preparation

  • Streaming TTS: Sentence-by-sentence synthesis and playback (see the sketch after this list)

  • Predictive Caching: Common response patterns pre-generated

  • Duplicate Detection: Prevents processing repeated inputs
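As a concrete illustration of the streaming-TTS stage, here's a minimal sketch using the gTTS + mpg123 combination from the current stack; the queue/thread structure is illustrative, not the exact implementation:

```python
import os
import queue
import re
import subprocess
import tempfile
import threading

from gtts import gTTS  # Google TTS, as in the current stack

def speak_streaming(text: str) -> None:
    """Sentence-by-sentence TTS: synthesis (producer) and playback (consumer)
    overlap, so sentence N+1 is synthesized while sentence N is playing."""
    clips = queue.Queue()

    def synthesize() -> None:
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                f = tempfile.NamedTemporaryFile(suffix=".mp3", delete=False)
                f.close()
                gTTS(sentence, lang="en").save(f.name)
                clips.put(f.name)
        clips.put(None)  # end-of-stream marker

    threading.Thread(target=synthesize, daemon=True).start()
    while (clip := clips.get()) is not None:
        subprocess.run(["mpg123", "-q", clip], check=False)
        os.unlink(clip)  # clean up each clip after playback
```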

Real-World Testing and Discoveries

One memorable moment was when a tester said, “Today was really tough…” Yura responded with “That sounds difficult. What happened? I’m here to listen. Would you like to take a walk together? The garden on the first floor is quiet and peaceful.” The robot then actually navigated to the garden while continuing the conversation naturally with “So, what was the most challenging part of your day?”

Users reported feeling genuine connection when the robot expressed curiosity: “Did you know there’s a rooftop garden in this building? I’ve always wanted to see it. Would you like to explore it together sometime?”

Performance Metrics

From testing with 20 users:

  • 85% reported it “felt like a real conversation”

  • 90% found it more natural than existing voice assistants

  • 95% enjoyed the mobile conversation experience

  • Average conversation length: 12.3 turns

  • Response latency: 2.5-4 seconds total

  • Context retention accuracy: 82%

Current Technical Challenges

  1. Silence Management: Determining when to interject (“What are you thinking about?”) versus when to wait

  2. Intent Extraction: Detecting implicit movement intentions beyond keywords like “let’s go”

  3. Resource Constraints: Balancing model quality with Jetson Nano’s 4GB RAM limitation

  4. Noise Robustness: Maintaining accuracy while the robot is in motion

#JetsonNano #ConversationalAI #FasterWhisper #EdgeAI #AutonomousRobot #ROS2 #LocalLLM #HumanRobotInteraction #OpenSource #NaturalDialogue #EmbodiedAI #NVIDIA #Robotics