Quick Start
Get EdgeVox running in 5 minutes.
Prerequisites
- Python 3.10+ (up to 3.13)
- CUDA GPU (recommended, 6GB+ VRAM) or Apple Silicon
- ~4GB disk for models (auto-downloaded on first run)
Install
bash
# Create conda environment
conda create -n voicebot python=3.12 -y
conda activate voicebot
# Install with CUDA support
pip install -e .Why conda?
On some systems, uv/pip CUDA builds can break. Conda with prebuilt wheels is more reliable for PyTorch + CUDA.
Download Models
bash
# Auto-download all required models
python -m edgevox.setup_modelsThis downloads:
- Whisper STT model (auto-sized by your VRAM)
- Sherpa-ONNX Zipformer Vietnamese STT
- Gemma 4 E2B LLM (Q4_K_M quantization)
- Kokoro-82M TTS
- Silero VAD
Run
TUI Mode (default)
bash
edgevoxWeb UI
bash
edgevox --web-ui
# Opens at http://127.0.0.1:8765Simple CLI
bash
edgevox --simple-ui
edgevox --simple-ui --text-mode # text-only, no micWith specific options
bash
# Vietnamese with wake word
edgevox --language vi --wakeword "hey jarvis"
# Specify GPU device and Whisper model
edgevox --stt large-v3-turbo --stt-device cuda
# Korean with Supertonic TTS
edgevox --language ko
# Web UI on custom host/port
edgevox --web-ui --host 0.0.0.0 --port 9000
# With ROS2 bridge
edgevox --ros2First Run
On first launch, EdgeVox will:
- Load Whisper STT (~2s on GPU)
- Load Gemma 4 LLM (~1s)
- Load Kokoro TTS + warmup (~1s)
- Start listening
You'll see the TUI with a status bar showing "Listening... speak now". Just talk!
Quick Test
Once running, try these slash commands:
/say Hello, this is a TTS test # Preview TTS output
/mictest # Record 3s + playback
/langs # List all languages
/lang fr # Switch to French
/voice de-thorsten # Switch to German Piper voice
/model small # Switch to smaller Whisper model