Large language models aren’t just chatbots. They’re APIs you can integrate into your applications for text generation, analysis, classification, and more. Here’s how to work with them effectively.

The Basics: Making API Calls

Most LLM providers follow similar patterns. Here’s OpenAI:

import openai

client = openai.OpenAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain DNS in one sentence."}
    ]
)

print(response.choices[0].message.content)

And Anthropic’s Claude:

import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain DNS in one sentence."}
    ]
)

print(response.content[0].text)

Same concept, slightly different syntax.

Structuring Prompts

The quality of your output depends heavily on your prompt structure.

System Prompts

Set the context and behavior:

system_prompt = """You are a code reviewer. Your job is to:
1. Identify bugs and security issues
2. Suggest improvements
3. Rate code quality from 1-10

Be concise but thorough. Format your response as JSON."""

Few-Shot Examples

Show the model what you want:

messages = [
    {"role": "system", "content": "Extract structured data from text."},
    {"role": "user", "content": "John Smith, 35, lives in NYC"},
    {"role": "assistant", "content": '{"name": "John Smith", "age": 35, "city": "NYC"}'},
    {"role": "user", "content": "Sarah Jones is 28 and works in London"},
    {"role": "assistant", "content": '{"name": "Sarah Jones", "age": 28, "city": "London"}'},
    {"role": "user", "content": "Mike Chen, age 42, based in Tokyo"}
]

The model learns the pattern and applies it to the final input.
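
Even with good few-shot examples, the model can drift from the format, so validate before trusting the output. A minimal parser for the schema in the examples above (the helper itself is my own sketch):

```python
import json

def parse_person(raw: str) -> dict:
    """Parse and validate the model's JSON extraction output."""
    data = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    missing = {"name", "age", "city"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(data["age"], int):
        raise ValueError("age must be an integer")
    return data
```

If parsing fails, you can retry the call or fall back to a stricter prompt, rather than letting a malformed record flow downstream.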

Handling Responses

Streaming

For better UX, stream responses instead of waiting for completion:

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a haiku about APIs"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

JSON Mode

Force structured output:

response = client.chat.completions.create(
    model="gpt-4-turbo",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Output valid JSON only."},
        {"role": "user", "content": "List 3 programming languages with their use cases"}
    ]
)

import json
data = json.loads(response.choices[0].message.content)

Production Patterns

Retry Logic

APIs fail. Handle it gracefully:

import time
from openai import RateLimitError, APIError

def call_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4",
                messages=messages
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
            print(f"Rate limited, waiting {wait}s...")
            time.sleep(wait)
        except APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(1)

Caching

Don’t call the API twice for the same input:

import hashlib
import json

def get_cache_key(messages):
    # sort_keys keeps the hash stable regardless of dict key order
    return hashlib.md5(json.dumps(messages, sort_keys=True).encode()).hexdigest()

# Simple in-memory cache
cache = {}

def cached_completion(messages):
    key = get_cache_key(messages)
    if key in cache:
        return cache[key]
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages
    )
    cache[key] = response.choices[0].message.content
    return cache[key]

For production, use Redis or a database.
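
A sketch of the same idea against a key-value store. The interface below assumes redis-py's `get`/`set` methods (with `ex` for a TTL); any object with that shape works, so the in-memory dict version above is just the degenerate case:

```python
import hashlib
import json

class CompletionCache:
    """Cache completions in any store exposing get(key) and set(key, value, ex=ttl)."""

    def __init__(self, store, ttl_seconds=3600):
        self.store = store
        self.ttl = ttl_seconds

    def key(self, messages):
        # Stable hash of the message list, independent of dict key order
        return hashlib.md5(json.dumps(messages, sort_keys=True).encode()).hexdigest()

    def get_or_call(self, messages, call):
        k = self.key(messages)
        cached = self.store.get(k)
        if cached is not None:
            return cached
        result = call(messages)
        self.store.set(k, result, ex=self.ttl)
        return result
```

With Redis this would look like `CompletionCache(redis.Redis(), ttl_seconds=86400)` — just note that redis-py returns bytes from `get`, so decode before use.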

Cost Control

Track and limit spending:

def estimate_cost(messages, model="gpt-4"):
    # Rough estimate: ~4 characters per token (actual varies by tokenizer)
    text = " ".join(m["content"] for m in messages)
    estimated_tokens = len(text) / 4
    
    # Prices per 1K tokens (check current pricing)
    prices = {
        "gpt-4": {"input": 0.03, "output": 0.06},
        "gpt-3.5-turbo": {"input": 0.001, "output": 0.002}
    }
    
    # Input-side estimate only; output tokens cost extra
    return (estimated_tokens / 1000) * prices[model]["input"]

# Set a budget
DAILY_BUDGET = 10.0
daily_spend = 0.0

def budget_check(estimated_cost):
    global daily_spend
    if daily_spend + estimated_cost > DAILY_BUDGET:
        raise Exception("Daily budget exceeded")
    daily_spend += estimated_cost
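
The two helpers above compose into a single pre-flight check. A class-based version (my own packaging of the same logic) avoids the global and keeps the pricing table in one place:

```python
class BudgetGuard:
    """Track estimated spend and refuse calls past a daily limit."""

    PRICES = {  # per 1K input tokens; check current pricing
        "gpt-4": 0.03,
        "gpt-3.5-turbo": 0.001,
    }

    def __init__(self, daily_budget: float):
        self.daily_budget = daily_budget
        self.spend = 0.0

    def estimate(self, messages, model="gpt-4") -> float:
        # Rough heuristic: ~4 characters per token, input side only
        text = " ".join(m["content"] for m in messages)
        return (len(text) / 4 / 1000) * self.PRICES[model]

    def check(self, messages, model="gpt-4"):
        cost = self.estimate(messages, model)
        if self.spend + cost > self.daily_budget:
            raise RuntimeError("Daily budget exceeded")
        self.spend += cost
```

Call `guard.check(messages)` before each API call, and reset `spend` on a timer or cron at midnight.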

Practical Use Cases

Text Classification

def classify_support_ticket(ticket_text):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": """Classify support tickets into categories:
            - billing
            - technical
            - account
            - feature_request
            - other
            
            Respond with only the category name."""},
            {"role": "user", "content": ticket_text}
        ]
    )
    return response.choices[0].message.content.strip().lower()
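
Models occasionally answer with extra words ("Category: billing") or something outside the list, so it's worth normalizing the reply before routing on it. A guard like this (my own addition, not from the API) falls back to `other`:

```python
CATEGORIES = {"billing", "technical", "account", "feature_request", "other"}

def normalize_category(raw: str) -> str:
    """Map a model reply onto a known category, defaulting to 'other'."""
    cleaned = raw.strip().lower().rstrip(".")
    if cleaned in CATEGORIES:
        return cleaned
    # Tolerate replies like "Category: billing"
    for cat in CATEGORIES:
        if cat in cleaned:
            return cat
    return "other"
```

Routing on a closed set of values like this is what makes the classifier safe to wire into downstream logic.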

Content Generation

def generate_product_description(product_name, features):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Write compelling product descriptions. Be concise, highlight benefits, use active voice."},
            {"role": "user", "content": f"Product: {product_name}\nFeatures: {', '.join(features)}"}
        ],
        max_tokens=200
    )
    return response.choices[0].message.content

Code Analysis

def review_code(code, language):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": """You are a senior code reviewer. Analyze code for:
            1. Bugs and potential issues
            2. Security vulnerabilities  
            3. Performance concerns
            4. Style and readability
            
            Format as markdown with sections."""},
            {"role": "user", "content": f"```{language}\n{code}\n```"}
        ]
    )
    return response.choices[0].message.content

Testing LLM Integrations

LLMs are non-deterministic. Test patterns, not exact outputs:

import pytest

def test_classification_returns_valid_category():
    categories = ["billing", "technical", "account", "feature_request", "other"]
    
    result = classify_support_ticket("I can't log into my account")
    assert result in categories

def test_classification_technical_ticket():
    result = classify_support_ticket("The API returns 500 errors")
    assert result == "technical"

def test_product_description_contains_product_name():
    result = generate_product_description("SuperWidget", ["fast", "reliable"])
    assert "SuperWidget" in result or "widget" in result.lower()

The Bottom Line

LLM APIs are powerful but require careful handling:

  1. Structure your prompts — system messages, few-shot examples, clear instructions
  2. Handle failures — retry logic, timeouts, fallbacks
  3. Control costs — caching, model selection, budget limits
  4. Test thoughtfully — patterns over exact matches

Start simple, iterate based on real outputs, and always have a human review critical use cases.


Building with LLM APIs? Questions? Find me on Twitter.