AI Agents

CRM Digital FTE — Omnichannel AI Customer Success Agent

Production-grade AI employee handling 24/7 customer support across Gmail, WhatsApp, and Web Form — built in 48-72 hours following the Agent Maturity Model. OpenAI Agents SDK on Groq LLaMA 3.3 70B, FastAPI, Neon PostgreSQL + pgvector RAG, AIOKafka event streaming, and Kubernetes with dual HPA auto-scaling.

// Problem

A growing SaaS company drowning in customer inquiries needs 24/7 support across Email, WhatsApp, and a Web Form, but a human FTE costs $75,000/year plus benefits, training, and management overhead. Every message left unanswered after hours risks a lost customer.

// Solution

Built across two phases following the Agent Maturity Model: Phase 1 (Incubation) used Claude Code to prototype the system, crystallize requirements into a 5-tool MCP server, and define agent skills. Phase 2 (Specialization) transformed it into a production Custom Agent — OpenAI Agents SDK with pre-LLM guardrails, Kafka for decoupled event streaming (webhooks return in <200ms while AI processes async), Neon PostgreSQL + pgvector for semantic RAG, and Kubernetes with separate HPA policies for API and worker pods.
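
The decoupling idea can be sketched in a few lines. This is a minimal illustration, not the project's code: an in-process `asyncio.Queue` stands in for the Kafka topic, `asyncio.sleep` stands in for the agent call, and `handle_webhook`, `worker`, and `demo` are hypothetical names.

```python
import asyncio
import contextlib
import uuid


async def handle_webhook(queue: asyncio.Queue, payload: dict) -> dict:
    """Enqueue the event and acknowledge at once; no LLM work happens here."""
    ticket_id = str(uuid.uuid4())
    await queue.put({"ticket_id": ticket_id, **payload})
    return {"status": "accepted", "ticket_id": ticket_id}


async def worker(queue: asyncio.Queue, processed: list) -> None:
    """Consume events independently of the webhook request/response cycle."""
    while True:
        event = await queue.get()
        try:
            await asyncio.sleep(0.05)  # assumption: placeholder for the slow agent call
            processed.append(event["ticket_id"])
        finally:
            queue.task_done()


async def demo():
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []
    task = asyncio.create_task(worker(queue, processed))

    ack = await handle_webhook(queue, {"channel": "web_form", "body": "API returns 500"})
    pending_at_ack = len(processed)  # nothing has been processed when the ack is returned

    await queue.join()  # wait until the worker has drained the queue
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return ack, pending_at_ack, processed
```

Running `asyncio.run(demo())` shows the acknowledgement being produced before any processing has completed, which is the property that keeps webhook callers from waiting on the model.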

// Screenshots

Web Form channel — customer submits a High-Urgent API issue; AI agent picks it up from Kafka within seconds
Ticket confirmed — UUID tracking ID generated instantly, AI response guaranteed within 5 minutes
Gmail channel — AI agent searches knowledge base, composes a formal reply, and sends it via Gmail API
End-to-end proof — customer receives the AI-generated reply with ticket reference in their inbox
WhatsApp channel — Twilio Sandbox integration; AI responds conversationally in under 300 characters
Neon PostgreSQL schema — 8 tables: customers, customer_identifiers, conversations, messages, tickets, knowledge_base, channel_configs, agent_metrics
Production stack — FastAPI, Kafka, PostgreSQL, worker, and metrics containers all running simultaneously

// Tech Stack

[OpenAI Agents SDK][Groq LLaMA 3.3 70B][FastAPI][Neon PostgreSQL + pgvector][AIOKafka][Next.js 15][Docker][Kubernetes][Twilio WhatsApp][Gmail API + Google Pub/Sub][MCP Server][sentence-transformers]

// Metrics

  • $75,000/year human FTE → <$1,000/year AI FTE: a cost reduction of more than 98%, with 24/7 availability and zero sick days
  • 3 channels live end-to-end: Gmail (Google Pub/Sub push), WhatsApp (Twilio Sandbox), Web Form (Next.js + Vercel)
  • 45-test suite — 18 guardrail tests, 21 channel formatter tests, 6 full E2E pipeline tests (no live DB or LLM required)
  • Kafka decoupling: webhooks return in <200ms; AI processing happens asynchronously, so Twilio and Gmail never time out waiting for the LLM
  • Dual HPA: API pods scale 3→20 and worker pods 3→30 (70% CPU trigger), plus an idempotent Kafka producer (acks=all) so producer retries do not create duplicate messages
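
To make the dual HPA numbers concrete, here is a small sketch of the replica calculation described in the Kubernetes Horizontal Pod Autoscaler documentation: `ceil(currentReplicas * currentMetric / targetMetric)`, skipped when the ratio is within the default 10% tolerance, then clamped to the configured bounds. The function name and the sample CPU readings are illustrative assumptions; only the 70% target and the 3→20 / 3→30 ranges come from the list above.

```python
import math


def desired_replicas(
    current: int,
    current_cpu_pct: float,
    target_cpu_pct: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,  # Kubernetes default tolerance
) -> int:
    """Approximate the HPA scaling decision for a single CPU utilization metric."""
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        proposed = current  # close enough to target: no scaling
    else:
        proposed = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, proposed))


# API pods: 3 replicas at 140% of requested CPU against a 70% target.
api = desired_replicas(3, 140, 70, min_replicas=3, max_replicas=20)

# Worker pods: 20 replicas at 140% would ask for 40, capped at 30.
workers = desired_replicas(20, 140, 70, min_replicas=3, max_replicas=30)
```

Keeping separate policies means a burst of slow LLM jobs can scale the workers without also scaling the API tier, and the reverse.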

// Highlights

  • Agent Maturity Model: Phase 1 (Incubation) built an MCP server with 5 tools + agent skills manifest using Claude Code as the Agent Factory; Phase 2 (Specialization) promoted the same tools to production @function_tool with Pydantic validation and error handling
  • Groq tool_use_failed fix — LLaMA 3.3 70B generated malformed XML tool calls; fixed with Literal types on all enum parameters and a 3-attempt retry loop with exponential backoff
  • Groq 429 rate-limit handling — instead of fixed backoff, the worker parses 'try again in Xm Ys' from the error message and sleeps the exact reset duration
  • Duplicate email prevention via contextvars — response_sent flag stored in ProcessingContext using Python contextvars; even if the agent calls send_response twice, only one email is delivered
  • asyncio.Queue fallback for Hugging Face Spaces: with USE_LOCAL_QUEUE=true, Kafka is replaced by an in-process asyncio.Queue at runtime; the worker code is unchanged and the cloud deployment needs no Java dependency
  • Cross-channel identity resolution — same customer on WhatsApp and Email recognized as one record via customer_identifiers table; full conversation history loaded across channels
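
The rate-limit handling described in the Groq 429 bullet can be sketched as follows. This is an assumption-laden illustration: the exact wording of the error message may differ, `parse_retry_after` and `backoff_delay` are hypothetical helpers, and unrecognized formats (including millisecond hints) fall back to exponential backoff.

```python
import re
from typing import Optional

# Matches hints such as "try again in 2m 30s", "try again in 7m12.5s", "try again in 45s".
_RETRY_RE = re.compile(
    r"try again in\s+(?:(?P<m>\d+)m(?!s))?\s*(?:(?P<s>\d+(?:\.\d+)?)s)?",
    re.IGNORECASE,
)


def parse_retry_after(message: str) -> Optional[float]:
    """Return the advertised wait in seconds, or None if no hint is found."""
    match = _RETRY_RE.search(message)
    if match is None or (match["m"] is None and match["s"] is None):
        return None
    minutes = int(match["m"] or 0)
    seconds = float(match["s"] or 0.0)
    return minutes * 60 + seconds


def backoff_delay(message: str, attempt: int, base: float = 1.0) -> float:
    """Prefer the server's reset hint; otherwise use exponential backoff."""
    hinted = parse_retry_after(message)
    return hinted if hinted is not None else base * (2 ** attempt)
```

A worker could then call `await asyncio.sleep(backoff_delay(str(error), attempt))` before retrying, so it waits for the advertised reset instead of guessing.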
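
The duplicate-send guard from the contextvars bullet might look roughly like this. `ProcessingContext`, `response_sent`, and `send_response` are named in the list above; everything else (the `outbox` list standing in for the Gmail API, `process_message`, `demo`, and the ticket IDs) is hypothetical.

```python
import asyncio
import contextvars
from dataclasses import dataclass


@dataclass
class ProcessingContext:
    ticket_id: str
    response_sent: bool = False


# One context per in-flight message; asyncio tasks each get their own copy.
_current = contextvars.ContextVar("processing_context")

outbox = []  # assumption: stand-in for the real delivery call


async def send_response(body: str) -> str:
    """Deliver at most one response per processing context."""
    ctx = _current.get()
    if ctx.response_sent:
        return "skipped: already sent"
    ctx.response_sent = True
    outbox.append((ctx.ticket_id, body))
    return "sent"


async def process_message(ticket_id: str, drafts: list) -> list:
    _current.set(ProcessingContext(ticket_id))
    # Simulate an agent that calls the tool once per draft.
    return [await send_response(draft) for draft in drafts]


async def demo():
    # gather() wraps each coroutine in a Task, so the two contexts stay isolated.
    return await asyncio.gather(
        process_message("T-1", ["first reply", "repeated reply"]),
        process_message("T-2", ["only reply"]),
    )
```

Because the flag lives in the per-task context and not in a global, concurrent messages cannot suppress each other's replies, while a repeated tool call within one message is dropped.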