Large language models aren’t just chatbots. They’re APIs you can integrate into your applications for text generation, analysis, classification, and more. Here’s how to work with them effectively.

The Basics: Making API Calls

Most LLM providers follow similar patterns. Here’s OpenAI:

import openai

client = openai.OpenAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain DNS in one sentence."}
    ]
)

print(response.choices[0].message.content)

And Anthropic’s Claude:

import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain DNS in one sentence."}
    ]
)

print(response.content[0].text)

Same concept, slightly different syntax.

Structuring Prompts

The quality of your output depends heavily on your prompt structure.

System Prompts

Set the context and behavior:

system_prompt = """You are a code reviewer. Your job is to:
1. Identify bugs and security issues
2. Suggest improvements
3. Rate code quality from 1-10

Be concise but thorough. Format your response as JSON."""

Few-Shot Examples

Show the model what you want:

messages = [
    {"role": "system", "content": "Extract structured data from text."},
    {"role": "user", "content": "John Smith, 35, lives in NYC"},
    {"role": "assistant", "content": '{"name": "John Smith", "age": 35, "city": "NYC"}'},
    {"role": "user", "content": "Sarah Jones is 28 and works in London"},
    {"role": "assistant", "content": '{"name": "Sarah Jones", "age": 28, "city": "London"}'},
    {"role": "user", "content": "Mike Chen, age 42, based in Tokyo"}
]

The model learns the pattern and applies it to the final input.
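
Even with good few-shot examples, the model can drift from the format, so validate before trusting the output. A minimal parser for the schema in the examples above (the helper itself is my own sketch):

```python
import json

def parse_person(raw: str) -> dict:
    """Parse and validate the model's JSON extraction output."""
    data = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    missing = {"name", "age", "city"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(data["age"], int):
        raise ValueError("age must be an integer")
    return data
```

If parsing fails, you can retry the call or fall back to a stricter prompt, rather than letting a malformed record flow downstream.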

Handling Responses

Streaming

For better UX, stream responses instead of waiting for completion:

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a haiku about APIs"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

JSON Mode

Force structured output:

response = client.chat.completions.create(
    model="gpt-4-turbo",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Output valid JSON only."},
        {"role": "user", "content": "List 3 programming languages with their use cases"}
    ]
)

import json
data = json.loads(response.choices[0].message.content)

Production Patterns

Retry Logic

APIs fail. Handle it gracefully:

import time
from openai import RateLimitError, APIError

def call_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4",
                messages=messages
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
            print(f"Rate limited, waiting {wait}s...")
            time.sleep(wait)
        except APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(1)

Caching

Don’t call the API twice for the same input:

import hashlib
import json

def get_cache_key(messages):
    # sort_keys keeps the hash stable regardless of dict key order
    return hashlib.md5(json.dumps(messages, sort_keys=True).encode()).hexdigest()

# Simple in-memory cache
cache = {}

def cached_completion(messages):
    key = get_cache_key(messages)
    if key in cache:
        return cache[key]
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages
    )
    cache[key] = response.choices[0].message.content
    return cache[key]

For production, use Redis or a database.
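
A sketch of the same idea against a key-value store. The interface below assumes redis-py's `get`/`set` methods (with `ex` for a TTL); any object with that shape works, so the in-memory dict version above is just the degenerate case:

```python
import hashlib
import json

class CompletionCache:
    """Cache completions in any store exposing get(key) and set(key, value, ex=ttl)."""

    def __init__(self, store, ttl_seconds=3600):
        self.store = store
        self.ttl = ttl_seconds

    def key(self, messages):
        # Stable hash of the message list, independent of dict key order
        return hashlib.md5(json.dumps(messages, sort_keys=True).encode()).hexdigest()

    def get_or_call(self, messages, call):
        k = self.key(messages)
        cached = self.store.get(k)
        if cached is not None:
            return cached
        result = call(messages)
        self.store.set(k, result, ex=self.ttl)
        return result
```

With Redis this would look like `CompletionCache(redis.Redis(), ttl_seconds=86400)` — just note that redis-py returns bytes from `get`, so decode before use.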

Cost Control

Track and limit spending:

def estimate_cost(messages, model="gpt-4"):
    # Rough estimate: ~4 characters per token (actual varies by tokenizer)
    text = " ".join(m["content"] for m in messages)
    estimated_tokens = len(text) / 4
    
    # Prices per 1K tokens (check current pricing)
    prices = {
        "gpt-4": {"input": 0.03, "output": 0.06},
        "gpt-3.5-turbo": {"input": 0.001, "output": 0.002}
    }
    
    # Input-side estimate only; output tokens cost extra
    return (estimated_tokens / 1000) * prices[model]["input"]

# Set a budget
DAILY_BUDGET = 10.0
daily_spend = 0.0

def budget_check(estimated_cost):
    global daily_spend
    if daily_spend + estimated_cost > DAILY_BUDGET:
        raise Exception("Daily budget exceeded")
    daily_spend += estimated_cost
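
The two helpers above compose into a single pre-flight check. A class-based version (my own packaging of the same logic) avoids the global and keeps the pricing table in one place:

```python
class BudgetGuard:
    """Track estimated spend and refuse calls past a daily limit."""

    PRICES = {  # per 1K input tokens; check current pricing
        "gpt-4": 0.03,
        "gpt-3.5-turbo": 0.001,
    }

    def __init__(self, daily_budget: float):
        self.daily_budget = daily_budget
        self.spend = 0.0

    def estimate(self, messages, model="gpt-4") -> float:
        # Rough heuristic: ~4 characters per token, input side only
        text = " ".join(m["content"] for m in messages)
        return (len(text) / 4 / 1000) * self.PRICES[model]

    def check(self, messages, model="gpt-4"):
        cost = self.estimate(messages, model)
        if self.spend + cost > self.daily_budget:
            raise RuntimeError("Daily budget exceeded")
        self.spend += cost
```

Call `guard.check(messages)` before each API call, and reset `spend` on a timer or cron at midnight.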

Practical Use Cases

Text Classification

def classify_support_ticket(ticket_text):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": """Classify support tickets into categories:
            - billing
            - technical
            - account
            - feature_request
            - other
            
            Respond with only the category name."""},
            {"role": "user", "content": ticket_text}
        ]
    )
    return response.choices[0].message.content.strip().lower()
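
Models occasionally answer with extra words ("Category: billing") or something outside the list, so it's worth normalizing the reply before routing on it. A guard like this (my own addition, not from the API) falls back to `other`:

```python
CATEGORIES = {"billing", "technical", "account", "feature_request", "other"}

def normalize_category(raw: str) -> str:
    """Map a model reply onto a known category, defaulting to 'other'."""
    cleaned = raw.strip().lower().rstrip(".")
    if cleaned in CATEGORIES:
        return cleaned
    # Tolerate replies like "Category: billing"
    for cat in CATEGORIES:
        if cat in cleaned:
            return cat
    return "other"
```

Routing on a closed set of values like this is what makes the classifier safe to wire into downstream logic.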

Content Generation

def generate_product_description(product_name, features):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Write compelling product descriptions. Be concise, highlight benefits, use active voice."},
            {"role": "user", "content": f"Product: {product_name}\nFeatures: {', '.join(features)}"}
        ],
        max_tokens=200
    )
    return response.choices[0].message.content

Code Analysis

def review_code(code, language):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": """You are a senior code reviewer. Analyze code for:
            1. Bugs and potential issues
            2. Security vulnerabilities  
            3. Performance concerns
            4. Style and readability
            
            Format as markdown with sections."""},
            {"role": "user", "content": f"```{language}\n{code}\n```"}
        ]
    )
    return response.choices[0].message.content

Testing LLM Integrations

LLMs are non-deterministic. Test patterns, not exact outputs:

import pytest

def test_classification_returns_valid_category():
    categories = ["billing", "technical", "account", "feature_request", "other"]
    
    result = classify_support_ticket("I can't log into my account")
    assert result in categories

def test_classification_technical_ticket():
    result = classify_support_ticket("The API returns 500 errors")
    assert result == "technical"

def test_product_description_contains_product_name():
    result = generate_product_description("SuperWidget", ["fast", "reliable"])
    assert "SuperWidget" in result or "widget" in result.lower()

The Bottom Line

LLM APIs are powerful but require careful handling:

  1. Structure your prompts — system messages, few-shot examples, clear instructions
  2. Handle failures — retry logic, timeouts, fallbacks
  3. Control costs — caching, model selection, budget limits
  4. Test thoughtfully — patterns over exact matches

Start simple, iterate based on real outputs, and always have a human review critical use cases.


Building with LLM APIs? Questions? Find me on Twitter.