Voice AI, MCP, IoT, ESP8266
## Overview
An AI voice agent that controls physical smart lights through natural language: speak to the agent, it sends commands to an ESP8266, and the light responds in real time.
## The Challenge
Most voice-controlled lights require exact phrases like "Set brightness to 50%." I wanted something that understands intent: say "I'm about to sleep" and it dims to warm red.
## How It Works
- Speak naturally — "Make it warm and cozy" or "Pulse blue slowly"
- AI interprets intent — Voice agent understands context
- MCP Tool execution — Commands flow through Model Context Protocol
- ESP8266 responds — Light changes color, brightness, or effects
- Real-time feedback — Sub-second response
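The tool-execution step above can be sketched as a small command builder. The JSON schema and field names (`r`, `g`, `b`, `brightness`, `effect`) are assumptions for illustration; the actual firmware protocol in the repo may differ.

```python
import json

# Hypothetical command schema for the ESP8266 firmware; the real
# firmware in arduino_code/ may expect different field names.
def build_light_command(r: int, g: int, b: int,
                        brightness: int = 255,
                        effect: str = "solid") -> bytes:
    """Serialize a light command as the JSON payload an MCP tool
    handler would send to the ESP8266 over the local network."""
    for v in (r, g, b, brightness):
        if not 0 <= v <= 255:
            raise ValueError("channel values must be 0-255")
    return json.dumps({
        "r": r, "g": g, "b": b,
        "brightness": brightness,
        "effect": effect,
    }).encode()

# "I'm about to sleep" -> dim warm red
cmd = build_light_command(255, 60, 0, brightness=40)
```

An MCP tool handler would construct this payload from the interpreted intent and POST it to the board's local IP.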
## Example Commands
| What you say | What happens |
|---|---|
| "I'm about to sleep" | Dims to warm red |
| "Party mode" | Starts color cycling |
| "I'm reading" | Bright, cool white |
| "Sunrise in 10 minutes" | Gradual warm fade-in |
## Technical Stack
| Layer | Technologies |
|---|---|
| Framework | Next.js 15 · React 19 · TypeScript 5.8 |
| Styling | TailwindCSS 4 · Dark mode (class strategy) |
| State | Zustand 5 · TanStack React Query |
| Agora SDKs | agora-rtc-sdk-ng · agora-rtm-sdk v2 |
| AI | Conversational AI · LLM (OpenAI/Anthropic/Gemini) · TTS · ASR — multimodal LLM required for camera verification |
| IoT | ESP8266 + Arduino firmware in `arduino_code/` · Python MCP server (run separately) |
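Since the MCP server runs separately from the firmware, it has to relay tool calls to the board over the LAN. A stdlib-only sketch of that relay; the board address, path, and payload shape are placeholders:

```python
import json
import urllib.request

ESP8266_URL = "http://192.168.1.50/light"  # placeholder board address

def make_light_request(state: dict) -> urllib.request.Request:
    """Build (but don't send) the HTTP request the MCP server
    would use to push a light state to the ESP8266."""
    return urllib.request.Request(
        ESP8266_URL,
        data=json.dumps(state).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = make_light_request({"r": 0, "g": 0, "b": 255, "effect": "pulse"})
# urllib.request.urlopen(req, timeout=1)  # actual send, on the real network
```

Keeping the request construction separate from the send makes the relay easy to unit-test without hardware on the network.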
## Repo
## What I Learned
The magic is in the latency. Voice control at 2 seconds feels like giving orders. At 200ms it feels like thinking out loud.