Voice AI has matured significantly. VAPI makes it straightforward to build voice assistants that can actually do things—not just chat, but call APIs, look up data, and take actions.

Why VAPI?

VAPI handles the hard parts of voice:

  • Speech-to-text transcription
  • LLM integration (OpenAI, Anthropic, custom)
  • Text-to-speech with natural voices (ElevenLabs, etc.)
  • Real-time streaming for low latency
  • Tool/function calling during conversations

You focus on what your assistant does. VAPI handles how it speaks and listens.

Basic Architecture

User speaks → VAPI (STT) → LLM → Tool calls → Your webhook → Results → LLM → VAPI (TTS) → User hears

Your webhook receives tool calls, executes them, and returns results. The LLM incorporates results into its response.
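For orientation, a tool-call webhook payload looks roughly like this (a sketch based on VAPI's server-message shape; exact fields may vary by API version, so check the current reference before relying on them):

```json
{
  "message": {
    "type": "tool-calls",
    "toolCalls": [
      {
        "id": "call_abc123",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"Boston\"}"
        }
      }
    ]
  }
}
```

Note that `arguments` arrives as a JSON-encoded string, which is why the handler below parses it with `json.loads` before dispatching.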

Setting Up an Assistant

# Create assistant via API
curl -X POST "https://api.vapi.ai/assistant" \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Atlas",
    "model": {
      "provider": "openai",
      "model": "gpt-4o",
      "messages": [{
        "role": "system",
        "content": "You are Atlas, a helpful voice assistant. Keep responses concise for voice."
      }],
      "temperature": 0.7
    },
    "voice": {
      "provider": "11labs",
      "voiceId": "your-voice-id"
    },
    "serverUrl": "https://your-domain.com/vapi/webhook"
  }'

Adding Tools

Define tools the assistant can call:

{
  "model": {
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"}
            },
            "required": ["location"]
          }
        }
      },
      {
        "type": "function", 
        "function": {
          "name": "check_email",
          "description": "Check for unread emails",
          "parameters": {"type": "object", "properties": {}}
        }
      }
    ]
  }
}

Webhook Handler (FastAPI)

import json

from fastapi import FastAPI
import httpx

app = FastAPI()

@app.post("/vapi/webhook")
async def vapi_webhook(request: dict):
    message = request.get("message", {})
    msg_type = message.get("type")
    
    if msg_type == "tool-calls":
        tool_calls = message.get("toolCalls", [])
        results = []
        
        for tc in tool_calls:
            func = tc.get("function", {})
            name = func.get("name")
            args = json.loads(func.get("arguments", "{}"))
            
            result = await handle_tool(name, args)
            results.append({
                "toolCallId": tc.get("id"),
                "result": json.dumps(result)
            })
        
        return {"results": results}
    
    return {"status": "ok"}


async def handle_tool(name: str, args: dict) -> dict:
    if name == "get_weather":
        location = args.get("location", "New York")
        # Call weather API
        async with httpx.AsyncClient() as client:
            resp = await client.get(f"https://api.weather.com/...")
            data = resp.json()
            return {"response": f"Currently {data['temp']}°F in {location}"}
    
    elif name == "check_email":
        # Check email via IMAP or API
        count = await get_unread_count()
        return {"response": f"You have {count} unread emails"}
    
    return {"response": "Unknown tool"}

Webhook Security

VAPI sends requests to your webhook—make sure it’s secure:

import hmac
import hashlib
import os

from fastapi import HTTPException, Request

WEBHOOK_SECRET = os.environ["VAPI_WEBHOOK_SECRET"]

def verify_vapi_signature(payload: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

@app.post("/vapi/webhook")
async def webhook(request: Request):
    body = await request.body()
    signature = request.headers.get("x-vapi-signature", "")
    
    if not verify_vapi_signature(body, signature, WEBHOOK_SECRET):
        raise HTTPException(status_code=401)
    
    # Process request...

Exposing Your Webhook

For development, use a tunnel:

# cloudflared (Cloudflare Tunnel)
cloudflared tunnel --url http://localhost:8095

# Or ngrok for quick testing
ngrok http 8095

For production, deploy behind a proper domain with SSL.

Choosing an LLM Provider

VAPI supports multiple providers:

// OpenAI
{"provider": "openai", "model": "gpt-4o"}

// Anthropic (requires API key in VAPI credentials)
{"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"}

// Custom (route to your own endpoint)
{"provider": "custom-llm", "url": "https://your-llm.com/v1/chat"}

Note: Using Anthropic requires adding your API key to VAPI’s credential settings—it’s not automatic.

Voice Selection

Natural-sounding voices matter for UX:

{
  "voice": {
    "provider": "11labs",
    "voiceId": "onwK4e9ZLuTAKqWW03F9",
    "stability": 0.5,
    "similarityBoost": 0.75
  }
}

ElevenLabs is a popular choice for natural-sounding voices; VAPI also supports Azure, PlayHT, and others.

Handling Conversation Context

VAPI maintains conversation state, but you can inject context:

@app.post("/vapi/webhook")
async def webhook(request: dict):
    msg_type = request.get("message", {}).get("type")
    
    # Inject customer data at call start
    if msg_type == "assistant-request":
        caller = request.get("call", {}).get("customer", {})
        phone = caller.get("number", "")
        
        # Look up customer
        customer = await get_customer_by_phone(phone)
        
        return {
            "assistant": {
                "model": {
                    "messages": [{
                        "role": "system",
                        "content": f"Customer: {customer['name']}. Account status: {customer['status']}."
                    }]
                }
            }
        }

Error Handling

Voice UX requires graceful degradation:

import logging

logger = logging.getLogger(__name__)

async def handle_tool(name: str, args: dict) -> dict:
    try:
        result = await execute_tool(name, args)
        return {"response": result}
    except TimeoutError:
        return {"response": "That's taking longer than expected. Let me try again."}
    except Exception as e:
        logger.error(f"Tool {name} failed: {e}")
        return {"response": "I wasn't able to complete that action."}

Testing Your Assistant

VAPI provides a web-based testing interface, but you can also test programmatically:

import os
import requests

VAPI_API_KEY = os.environ["VAPI_API_KEY"]

# Start a test call
response = requests.post(
    "https://api.vapi.ai/call/web",
    headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
    json={"assistantId": "your-assistant-id"}
)

call_url = response.json()["webCallUrl"]
print(f"Test at: {call_url}")
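You can also poll a call afterward to see how it went. A minimal sketch using only the standard library, assuming VAPI exposes a `GET /call/{id}` endpoint whose response includes a `status` field (verify both against the current API reference):

```python
import json
import urllib.request

VAPI_BASE = "https://api.vapi.ai"

def call_status_url(call_id: str) -> str:
    """Build the URL for fetching a single call's metadata."""
    return f"{VAPI_BASE}/call/{call_id}"

def get_call_status(api_key: str, call_id: str) -> str:
    """Fetch a call and return its status (e.g. queued, in-progress, ended)."""
    req = urllib.request.Request(
        call_status_url(call_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("status", "unknown")
```

Polling status after a test call is a cheap way to confirm the call actually connected and ended cleanly, not just that the web URL was issued.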

Production Checklist

  1. Webhook reliability — Use a queue for async tool execution
  2. Latency — Keep tool responses under 3 seconds
  3. Error messages — Make them conversational, not technical
  4. Logging — Record all interactions for debugging
  5. Rate limiting — Protect against abuse
  6. Fallbacks — Have defaults when tools fail
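Item 2 is the one that bites first: if a tool blocks, the caller sits in silence. One way to enforce the budget is to wrap each tool coroutine in a timeout that degrades to a conversational fallback (a sketch; `slow_tool` is a stand-in for a real tool, and the 3-second budget is an assumption you should tune):

```python
import asyncio

TOOL_TIMEOUT_S = 3.0  # voice latency budget per tool call

async def run_tool_with_timeout(coro, fallback: str) -> dict:
    """Await a tool coroutine; return a conversational fallback on timeout."""
    try:
        result = await asyncio.wait_for(coro, timeout=TOOL_TIMEOUT_S)
        return {"response": result}
    except asyncio.TimeoutError:
        return {"response": fallback}

async def slow_tool() -> str:
    await asyncio.sleep(10)  # stand-in for a slow upstream API
    return "done"
```

In the webhook handler, wrapping each dispatch as `await run_tool_with_timeout(handle_tool(name, args), "That's taking longer than expected.")` keeps the conversation moving even when a backend stalls.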

Voice AI is no longer science fiction. With VAPI handling the voice pipeline and your webhook handling the logic, you can build surprisingly capable voice assistants in a weekend.

The key insight: treat tools like API endpoints. If you can build a REST API, you can build a voice assistant.