CRM Digital FTE — Omnichannel AI Customer Success Agent
Production-grade AI employee handling 24/7 customer support across Gmail, WhatsApp, and Web Form — built in 48-72 hours following the Agent Maturity Model. OpenAI Agents SDK on Groq LLaMA 3.3 70B, FastAPI, Neon PostgreSQL + pgvector RAG, AIOKafka event streaming, and Kubernetes with dual HPA auto-scaling.

// Problem
A growing SaaS company drowning in customer inquiries needs 24/7 support across Email, WhatsApp, and a Web Form — but a human FTE costs $75,000/year plus benefits, training, and management overhead. Every unanswered message after hours is a lost customer.
// Solution
Built in two phases following the Agent Maturity Model. Phase 1 (Incubation) used Claude Code to prototype the system, crystallize requirements into a 5-tool MCP server, and define agent skills. Phase 2 (Specialization) turned the prototype into a production Custom Agent: OpenAI Agents SDK with pre-LLM guardrails, Kafka for decoupled event streaming (webhooks return in <200ms while the AI processes messages asynchronously), Neon PostgreSQL + pgvector for semantic RAG, and Kubernetes with separate HPA policies for API and worker pods.
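The decoupling described above can be sketched in a few lines. This is a minimal illustration, not the project's code: it uses `asyncio.Queue` as a stand-in for the Kafka topic, and the names `handle_webhook`, `worker`, and `demo`, the payload shape, and the 50 ms sleep standing in for the LLM call are all assumptions made for the example.

```python
import asyncio
import contextlib


async def handle_webhook(queue: asyncio.Queue, payload: dict) -> dict:
    """Hypothetical webhook handler: enqueue and acknowledge, no LLM call here."""
    await queue.put(payload)
    return {"status": "accepted"}


async def worker(queue: asyncio.Queue, replies: list) -> None:
    """Hypothetical worker: consumes messages and does the slow work off the request path."""
    while True:
        message = await queue.get()
        try:
            await asyncio.sleep(0.05)  # stand-in for the slow agent / LLM call
            replies.append(f"reply to {message['channel']}: {message['text']}")
        finally:
            queue.task_done()


async def demo():
    queue: asyncio.Queue = asyncio.Queue()  # stand-in for a Kafka topic
    replies: list = []
    task = asyncio.create_task(worker(queue, replies))

    ack = await handle_webhook(
        queue, {"channel": "whatsapp", "text": "Where is my invoice?"}
    )
    replies_at_ack = len(replies)  # the acknowledgement precedes any reply

    await queue.join()  # wait until the worker has processed everything
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return ack, replies_at_ack, replies


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the design is that the handler's latency depends only on the enqueue step, so the caller (Twilio, Gmail push) gets its acknowledgement regardless of how long the model takes.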
// Metrics
- ✓$75,000/year human FTE → <$1,000/year AI FTE: over 98% cost reduction, with 24/7 uptime and zero sick days
- ✓3 channels live end-to-end: Gmail (Google Pub/Sub push), WhatsApp (Twilio Sandbox), Web Form (Next.js + Vercel)
- ✓45-test suite — 18 guardrail tests, 21 channel formatter tests, 6 full E2E pipeline tests (no live DB or LLM required)
- ✓Kafka decoupling: webhooks return in <200ms; AI processing happens asynchronously, so Twilio and Gmail never time out waiting for the LLM
- ✓Dual HPA: API pods 3→20, Worker pods 3→30 (70% CPU trigger) + idempotent Kafka producer (acks=all, no duplicate messages from producer retries)
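To make the HPA ranges above concrete, the sketch below applies the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), clamped to the configured bounds, with no change when the ratio is within the default 10% tolerance. The function name and the sample CPU readings are illustrative assumptions; the 70% target and the 3→20 / 3→30 bounds come from the metrics listed above.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_pct: float,
    target_cpu_pct: float = 70.0,
    min_replicas: int = 3,
    max_replicas: int = 20,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA replica calculation for a CPU utilization target."""
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the autoscaler leaves the count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below the minimum or above the maximum replica count.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # API pods (3 to 20): 3 pods at 140% of requested CPU.
    print(desired_replicas(3, 140.0))
    # Worker pods (3 to 30): 10 pods at 210% of requested CPU.
    print(desired_replicas(10, 210.0, max_replicas=30))
```

This ignores stabilization windows and scaling policies, which also shape real HPA behavior.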
// Highlights
- →Agent Maturity Model: Phase 1 (Incubation) built an MCP server with 5 tools + agent skills manifest using Claude Code as the Agent Factory; Phase 2 (Specialization) promoted the same tools to production @function_tool with Pydantic validation and error handling
- →Groq tool_use_failed fix — LLaMA 3.3 70B generated malformed XML tool calls; fixed with Literal types on all enum parameters and a 3-attempt retry loop with exponential backoff
- →Groq 429 rate-limit handling — instead of fixed backoff, the worker parses 'try again in Xm Ys' from the error message and sleeps the exact reset duration
- →Duplicate email prevention via contextvars — response_sent flag stored in ProcessingContext using Python contextvars; even if the agent calls send_response twice, only one email is delivered
- →asyncio.Queue fallback for Hugging Face Spaces — Kafka replaced by asyncio.Queue at runtime (USE_LOCAL_QUEUE=true), same worker code, no Java dependency in cloud
- →Cross-channel identity resolution — same customer on WhatsApp and Email recognized as one record via customer_identifiers table; full conversation history loaded across channels
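Two of the highlights above, the rate-limit sleep and the duplicate-send guard, can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not the project's implementation: the regular expression assumes the hint looks like "try again in 7m12.5s" (minutes optional), and `parse_retry_after`, `process_message`, the `outbox` field, and the return strings are hypothetical. Only `ProcessingContext`, `response_sent`, and `send_response` are names taken from the text.

```python
import re
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed hint format: "try again in 7m12.5s", "try again in 2m 30s", "try again in 45s".
_RETRY_RE = re.compile(
    r"try again in\s+(?:(?P<minutes>\d+)m(?!s)\s*)?(?:(?P<seconds>\d+(?:\.\d+)?)s)?",
    re.IGNORECASE,
)


def parse_retry_after(error_message: str) -> Optional[float]:
    """Return the suggested wait in seconds, or None if no duration is found."""
    match = _RETRY_RE.search(error_message)
    if match is None:
        return None
    minutes, seconds = match.group("minutes"), match.group("seconds")
    if minutes is None and seconds is None:
        return None
    return int(minutes or 0) * 60 + float(seconds or 0)


@dataclass
class ProcessingContext:
    response_sent: bool = False
    outbox: List[str] = field(default_factory=list)  # hypothetical stand-in for delivery


_current_ctx: ContextVar = ContextVar("processing_context")


def send_response(body: str) -> str:
    """Deliver at most one response per message being processed."""
    ctx = _current_ctx.get()
    if ctx.response_sent:
        return "skipped: response already sent"
    ctx.outbox.append(body)
    ctx.response_sent = True
    return "sent"


def process_message(tool_calls: List[str]):
    """Simulate an agent run that may call send_response more than once."""
    ctx = ProcessingContext()
    token = _current_ctx.set(ctx)  # fresh flag for each inbound message
    try:
        statuses = [send_response(body) for body in tool_calls]
        return statuses, ctx.outbox
    finally:
        _current_ctx.reset(token)
```

A worker would sleep for `parse_retry_after(str(error))` seconds when the value is not None and fall back to ordinary backoff otherwise. Because a context variable is scoped to the current task, concurrent messages each get their own `response_sent` flag.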