Large language models aren’t just chatbots. They’re APIs you can integrate into your applications for text generation, analysis, classification, and more. Here’s how to work with them effectively.
## The Basics: Making API Calls
Most LLM providers follow similar patterns. Here’s OpenAI:
```python
import openai

client = openai.OpenAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain DNS in one sentence."}
    ]
)

print(response.choices[0].message.content)
```
And Anthropic’s Claude:
```python
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain DNS in one sentence."}
    ]
)

print(response.content[0].text)
```
Same concept, slightly different syntax.
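One concrete difference to watch: OpenAI accepts the system prompt as a message with `role: "system"`, while Anthropic's Messages API takes it as a separate `system=` parameter. If you target both providers, a small helper (the name `split_system` is mine, a sketch) can normalize one format into the other:

```python
def split_system(messages):
    """Separate OpenAI-style system messages from the conversation.

    Returns (system_text, remaining_messages), which maps onto
    Anthropic's `system=` parameter plus its `messages` list.
    """
    system_parts = []
    rest = []
    for m in messages:
        if m["role"] == "system":
            system_parts.append(m["content"])
        else:
            rest.append(m)
    return "\n".join(system_parts), rest
```

You'd then call `client.messages.create(system=system_text, messages=rest, ...)` on the Anthropic side.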
## Structuring Prompts
The quality of your output depends heavily on your prompt structure.
### System Prompts
Set the context and behavior:
```python
system_prompt = """You are a code reviewer. Your job is to:
1. Identify bugs and security issues
2. Suggest improvements
3. Rate code quality from 1-10

Be concise but thorough. Format your response as JSON."""
```
### Few-Shot Examples
Show the model what you want:
```python
messages = [
    {"role": "system", "content": "Extract structured data from text."},
    {"role": "user", "content": "John Smith, 35, lives in NYC"},
    {"role": "assistant", "content": '{"name": "John Smith", "age": 35, "city": "NYC"}'},
    {"role": "user", "content": "Sarah Jones is 28 and works in London"},
    {"role": "assistant", "content": '{"name": "Sarah Jones", "age": 28, "city": "London"}'},
    {"role": "user", "content": "Mike Chen, age 42, based in Tokyo"}
]
```
The model learns the pattern and applies it to the final input.
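If you reuse this pattern across tasks, it's cleaner to assemble the message list from example pairs than to hand-write it each time. A minimal sketch (the helper name is mine):

```python
def build_few_shot(system, examples, query):
    """Assemble a chat message list from (input, output) example pairs.

    `examples` is a list of (user_text, assistant_text) tuples; `query`
    is the final user input the model should respond to.
    """
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages
```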
## Handling Responses

### Streaming
For better UX, stream responses instead of waiting for completion:
```python
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a haiku about APIs"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
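If you also need the full text afterwards (for logging or caching), accumulate the deltas as they arrive rather than printing and discarding them. A small sketch:

```python
def collect_stream(deltas):
    """Print streamed text deltas as they arrive and return the joined text.

    Skips empty keep-alive chunks, which arrive as None or "".
    """
    parts = []
    for delta in deltas:
        if delta:
            parts.append(delta)
            print(delta, end="", flush=True)
    return "".join(parts)
```

In the loop above, you'd feed it `(chunk.choices[0].delta.content for chunk in stream)`.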
### JSON Mode
Force structured output:
```python
import json

response = client.chat.completions.create(
    model="gpt-4-turbo",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Output valid JSON only."},
        {"role": "user", "content": "List 3 programming languages with their use cases"}
    ]
)

data = json.loads(response.choices[0].message.content)
```
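Even with JSON mode, defensive parsing is cheap insurance, and models *without* JSON mode often wrap their output in markdown fences. A hedged sketch of a tolerant parser:

```python
import json

def parse_json_reply(text):
    """Parse a model reply as JSON, tolerating ```json code fences."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence (with optional language tag) and the closing fence
        cleaned = cleaned.split("\n", 1)[1]
        cleaned = cleaned.rsplit("```", 1)[0]
    return json.loads(cleaned)
```

`json.loads` still raises `json.JSONDecodeError` on genuinely malformed output, which is the right place to trigger a retry.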
## Production Patterns

### Retry Logic
APIs fail. Handle it gracefully:
```python
import time

from openai import APIError, RateLimitError

def call_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4",
                messages=messages
            )
        except RateLimitError:
            wait = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s...
            print(f"Rate limited, waiting {wait}s...")
            time.sleep(wait)
        except APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(1)
    raise Exception("Max retries exceeded")
```
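One refinement worth knowing: when many workers hit a rate limit at once, plain exponential backoff makes them all retry in lockstep. Adding random jitter spreads the retries out. A sketch of the delay schedule (this is the "full jitter" variant):

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with full jitter, capped at `cap` seconds.

    attempt 0 -> up to 1s, attempt 1 -> up to 2s, attempt 2 -> up to 4s...
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Swap `time.sleep(wait)` above for `time.sleep(backoff_delay(attempt))` to use it.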
### Caching
Don’t call the API twice for the same input:
```python
import hashlib
import json

def get_cache_key(messages):
    # sort_keys makes the key stable regardless of dict key ordering
    return hashlib.md5(json.dumps(messages, sort_keys=True).encode()).hexdigest()

# Simple in-memory cache
cache = {}

def cached_completion(messages):
    key = get_cache_key(messages)
    if key in cache:
        return cache[key]
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages
    )
    cache[key] = response.choices[0].message.content
    return cache[key]
```
For production, use Redis or a database.
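Whatever the backing store, you'll want entries to expire rather than serve stale completions forever. A minimal TTL wrapper over the dict approach above (Redis gives you the same behavior for free with `SET key value EX seconds`):

```python
import time

class TTLCache:
    """In-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl=3600):
        self.ttl = ttl
        self.store = {}  # key -> (expiry_timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        expires, value = entry
        if time.time() > expires:
            del self.store[key]  # evict lazily on read
            return None
        return value

    def set(self, key, value):
        self.store[key] = (time.time() + self.ttl, value)
```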
### Cost Control
Track and limit spending:
```python
def estimate_cost(messages, model="gpt-4"):
    # Rough token estimation (~4 chars per token; actual count varies)
    text = " ".join(m["content"] for m in messages)
    estimated_tokens = len(text) / 4

    # Prices per 1K tokens (check current pricing)
    prices = {
        "gpt-4": {"input": 0.03, "output": 0.06},
        "gpt-3.5-turbo": {"input": 0.001, "output": 0.002}
    }

    # Input-side cost only; output cost depends on the response length
    return (estimated_tokens / 1000) * prices[model]["input"]

# Set a budget
DAILY_BUDGET = 10.0
daily_spend = 0.0

def budget_check(estimated_cost):
    global daily_spend
    if daily_spend + estimated_cost > DAILY_BUDGET:
        raise Exception("Daily budget exceeded")
    daily_spend += estimated_cost
```
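The estimate above only covers input tokens. If you cap responses with `max_tokens`, you can also bound the worst-case output cost. A rough sketch extending the same price table (the 4-characters-per-token heuristic is still crude; real counts need a tokenizer like tiktoken):

```python
def estimate_total_cost(messages, max_tokens, model="gpt-4"):
    """Upper-bound cost estimate: input tokens plus worst-case output."""
    prices = {
        "gpt-4": {"input": 0.03, "output": 0.06},
        "gpt-3.5-turbo": {"input": 0.001, "output": 0.002},
    }
    text = " ".join(m["content"] for m in messages)
    input_tokens = len(text) / 4  # ~4 chars per token heuristic
    input_cost = (input_tokens / 1000) * prices[model]["input"]
    # If the model stops early, actual output cost is lower than this bound
    output_cost = (max_tokens / 1000) * prices[model]["output"]
    return input_cost + output_cost
```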
## Practical Use Cases

### Text Classification
```python
def classify_support_ticket(ticket_text):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": """Classify support tickets into categories:
- billing
- technical
- account
- feature_request
- other

Respond with only the category name."""},
            {"role": "user", "content": ticket_text}
        ]
    )
    return response.choices[0].message.content.strip().lower()
```
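Models occasionally answer outside the list anyway ("Category: billing", or a full sentence). Validating the raw reply against the allowed set, with a fallback, keeps downstream code safe. A sketch (the helper name is mine):

```python
VALID_CATEGORIES = {"billing", "technical", "account", "feature_request", "other"}

def normalize_category(raw, valid=VALID_CATEGORIES, fallback="other"):
    """Map a raw model reply onto a known category, defaulting to `fallback`."""
    cleaned = raw.strip().lower()
    if cleaned in valid:
        return cleaned
    # Tolerate replies like "Category: billing" via substring matching
    for category in valid:
        if category in cleaned:
            return category
    return fallback
```

Wrapping the classifier's return value in `normalize_category(...)` means callers always receive a valid category.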
### Content Generation
```python
def generate_product_description(product_name, features):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Write compelling product descriptions. Be concise, highlight benefits, use active voice."},
            {"role": "user", "content": f"Product: {product_name}\nFeatures: {', '.join(features)}"}
        ],
        max_tokens=200
    )
    return response.choices[0].message.content
```
### Code Analysis
```python
def review_code(code, language):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": """You are a senior code reviewer. Analyze code for:
1. Bugs and potential issues
2. Security vulnerabilities
3. Performance concerns
4. Style and readability

Format as markdown with sections."""},
            {"role": "user", "content": f"```{language}\n{code}\n```"}
        ]
    )
    return response.choices[0].message.content
```
## Testing LLM Integrations
LLMs are non-deterministic. Test patterns, not exact outputs:
```python
def test_classification_returns_valid_category():
    categories = ["billing", "technical", "account", "feature_request", "other"]
    result = classify_support_ticket("I can't log into my account")
    assert result in categories

def test_classification_technical_ticket():
    result = classify_support_ticket("The API returns 500 errors")
    assert result == "technical"

def test_product_description_contains_product_name():
    result = generate_product_description("SuperWidget", ["fast", "reliable"])
    assert "SuperWidget" in result or "widget" in result.lower()
```
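The tests above hit the live API, which is slow, costs money, and can flake. For unit tests, stub the client with `unittest.mock` and assert on how your code handles its output. A sketch, using a client-injected variant of the classifier (the article's version uses a module-level `client`, which you'd swap out with `mock.patch` instead):

```python
from unittest.mock import MagicMock

def make_fake_client(reply_text):
    """Build a stub with the same response shape as the OpenAI client."""
    client = MagicMock()
    client.chat.completions.create.return_value = MagicMock(
        choices=[MagicMock(message=MagicMock(content=reply_text))]
    )
    return client

def classify(client, ticket_text):
    # Client-injected variant of classify_support_ticket, for testability
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": ticket_text}],
    )
    return response.choices[0].message.content.strip().lower()
```

This lets you test the parsing and normalization logic deterministically, and keep the handful of live-API tests in a separate, rarely-run suite.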
## The Bottom Line
LLM APIs are powerful but require careful handling:
- Structure your prompts — system messages, few-shot examples, clear instructions
- Handle failures — retry logic, timeouts, fallbacks
- Control costs — caching, model selection, budget limits
- Test thoughtfully — patterns over exact matches
Start simple, iterate based on real outputs, and always have a human review critical use cases.
Building with LLM APIs? Questions? Find me on Twitter.