tl;dr: GPT-4o-mini and DeepSeek-V3 are the price-performance winners. Anthropic charges safety premiums. Together/Fireworks beat proprietary APIs at scale. NanoGPT undercuts everyone for budget builds.
Quick Price Reference#
| Model | Input | Output | Context |
|---|---|---|---|
| GPT-5 mini | $0.25 | $2.00 | 128K |
| GPT-5.2 | $1.75 | $14.00 | 128K |
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4o-mini | $0.15 | $0.60 | 128K |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
| Claude 3.5 Haiku | $0.25 | $1.25 | 200K |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
| Gemini 2.0 Pro | $3.50 | $10.50 | 2M |
| DeepSeek-V3 | $0.14 | $0.28 | 64K |
| DeepSeek-R1 | $0.55 | $2.19 | 64K |
| Llama 3.1 70B | $0.20 | $0.20 | 128K |
| OpenRouter | +15% | +15% | Varies |
Prices in USD per 1M tokens, as of February 2025.
Hidden Costs#
1. Input vs Output Asymmetry#
Most providers charge 3-10x more for output tokens than input. That asymmetry changes your cost structure completely depending on workload.
| Use Case | Input:Output | Cost Pattern |
|---|---|---|
| Classification | 100:1 | Cheap |
| Chat | 1:0.3 | Moderate |
| Code gen | 1:3 | Expensive |
| Long-form writing | 1:10 | Very expensive |
Example with Claude 3.5 Sonnet ($3/$15):
- Classification (1M in, 10K out): $3.15
- Code gen (100K in, 300K out): $4.80
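The two examples above are easy to reproduce. A minimal cost model (prices hardcoded from the table in this post; a real version would pull current rates):

```python
# Rough per-request cost model for the input/output asymmetry above.
# Prices are USD per 1M tokens, taken from the table in this post.
PRICES = {
    "claude-3.5-sonnet": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "deepseek-v3": (0.14, 0.28),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Classification-style workload: 1M in, 10K out
print(request_cost("claude-3.5-sonnet", 1_000_000, 10_000))  # 3.15
# Code-gen-style workload: 100K in, 300K out
print(request_cost("claude-3.5-sonnet", 100_000, 300_000))   # 4.8
```

Swap in the cheap rows from the table to see why model choice dominates: the same code-gen workload on DeepSeek-V3 costs about $0.10.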
2. Google’s Context Trap#
Gemini Flash advertises $0.10/$0.40. Cheapest on paper. But read the fine print.
- 0-128K context: $0.10/$0.40 (as advertised)
- 128K-1M context: Non-linear scaling
- Long-context requests: 2-4x the advertised rate in practice
For big RAG with many chunks, Gemini’s advantage disappears.
3. Batch Discounts#
OpenAI offers 50% off via Batch API. But:
- 24-hour turnaround
- No streaming
- No immediate response
Good for: Nightly reports, bulk processing
Bad for: Real-time features, interactive chat
4. Cached Input#
| Model | Standard | Cached | Savings |
|---|---|---|---|
| GPT-5 mini | $0.25 | $0.025 | 90% |
| GPT-5.2 | $1.75 | $0.175 | 90% |
Catch: Must reuse same context quickly. Good for multi-turn chats, document Q&A.
5. Minimum Top-Ups#
| Provider | Minimum |
|---|---|
| OpenAI | $5 |
| Anthropic | $5 |
| $0 | |
| DeepSeek | $5 |
| Together | $5 |
NanoGPT advantage: No minimums. Pay-as-you-go. Unified balance. Spend $10 testing five models without $25 tied up in minimums.
Provider Breakdown#
OpenAI: Safe Default#
Aggressive discounts on mini models. Premium on frontier.
Best value: GPT-4o-mini at $0.15/$0.60. Hard to beat for most tasks.
Gotchas:
- o1/o4 models charge for “thinking tokens”
- Realtime API is $4/$16 per 1M (voice apps get expensive)
- Priority processing doubles cost
Use when: You need the default that just works.
Anthropic: Premium Option#
2-3x more expensive than OpenAI equivalents.
What you pay for:
- Safety tuning (industry-leading)
- Long context that actually works (200K)
- Honest “I don’t know” responses
Best value: Claude 3.5 Haiku at $0.25/$1.25. Competitive with GPT-4o-mini. Better instruction following.
Skip: Claude 3 Opus at $15/$75. Only for safety-critical use cases.
Use when: Reliability and safety matter more than cost.
Google Gemini: Long Context Play#
Loss-leader pricing. Gemini Flash at $0.10/$0.40 is cheapest quality model.
Catch:
- 1M context sounds great. Quality degrades at length.
- API reliability historically spotty
- Docs and errors are frustrating
Best value: Flash for RAG under 100K context.
Use when: You genuinely need 500K+ context (rare) or cost is paramount.
DeepSeek: Open Source King#
Undercuts everyone dramatically. DeepSeek-V3 at $0.14/$0.28 delivers GPT-4o quality at 1/10th cost.
Trade-offs:
- Chinese company (compliance concerns)
- API availability varies by region
- Docs behind OpenAI
Best value: DeepSeek-V3 for high-volume production.
Use when: Cost matters more than geopolitical risk.
Together/Fireworks: Open Source Gateway#
Competitive with direct open-source pricing plus convenience.
Best value: Llama 3.1 70B at ~$0.20/$0.20. GPT-4o quality at 1/5th cost.
Catch:
- You pick models (no smart defaults)
- Performance varies by model/load
- No unified billing
Use when: You want open-source without managing infrastructure.
Recommendations By Use Case#
Production Apps#
Winner: GPT-4o-mini (OpenAI)
$0.15/$0.60. 99.9% uptime. Mature tooling. Pragmatic default.
Runner-up: DeepSeek-V3 if you can handle compliance ambiguity.
Prototyping#
Winner: NanoGPT or OpenRouter
Access 20+ models without 20 accounts. Convenience markup worth avoiding setup friction.
Alternative: Gemini 2.0 Flash at $0.10/$0.40.
Code Generation#
Winner: Claude 3.5 Sonnet (Anthropic)
Best-in-class for code. $3/$15 hurts but quality justifies it for dev tools.
Budget: DeepSeek-V3 or self-hosted CodeLlama.
Batch Processing#
Winner: OpenAI Batch API with GPT-4o-mini
50% discount. $0.075/$0.30 per 1M. Unbeatable for overnight jobs.
Alternative: Self-hosted open-source if you have GPUs.
Vision#
Winner: GPT-4o (OpenAI)
Still gold standard. $2.50/$10.00 hurts but alternatives lag significantly.
Budget: Gemini 2.0 Flash. Good enough for many tasks at 1/10th cost.
NanoGPT#
NanoGPT sits between direct access and marked-up gateways:
| Model | Direct | NanoGPT | OpenRouter |
|---|---|---|---|
| GPT-4o-mini | $0.15/$0.60 | ~$0.17/$0.65 | ~$0.17/$0.69 |
| Claude Haiku | $0.25/$1.25 | ~$0.28/$1.40 | ~$0.29/$1.44 |
| DeepSeek-V3 | $0.14/$0.28 | ~$0.15/$0.31 | ~$0.16/$0.32 |
Real value: No minimums. Unified billing. Spending $50 across four models? Avoid 4x $5 top-ups. 40% better cash flow.
Optimization Checklist#
- Use the cheapest model that works. GPT-4o-mini handles 80% of tasks.
- Cache prompts. 90% savings on repeated context.
- Batch non-interactive work. 50% discount.
- Monitor output ratios. Code generation burns credits fast.
- Self-host above $500/month. Break-even at scale.
- Negotiate enterprise rates. Everyone does discounts.
2025 Trends#
- Prices falling 20-30% year-over-year
- Mini models good enough for most uses
- Open-source closing the quality gap
- Context windows expanding faster than pricing adapts
Predictions:
- $0.05/$0.20 standard for quality small models by year-end
- DeepSeek forces Western price cuts
- Self-hosting becomes default for high-volume
February 2025. Pricing changes fast. Verify before committing.
Dive deeper:
- DeepSeek-V3 review — my top budget pick for frontier models
- NanoGPT review — simplest way to access multiple cheap models
- Together.ai review — best for production open-source workloads
- My $5 chatbot build — real-world cost optimization in action
