tl;dr: GPT-4o-mini and DeepSeek-V3 are the price-performance winners. Anthropic charges safety premiums. Together/Fireworks beat proprietary APIs at scale. NanoGPT undercuts everyone for budget builds.
Quick Price Reference#
| Model | Input | Output | Context |
|---|---|---|---|
| GPT-5 mini | $0.25 | $2.00 | 128K |
| GPT-5.2 | $1.75 | $14.00 | 128K |
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4o-mini | $0.15 | $0.60 | 128K |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
| Claude 3.5 Haiku | $0.25 | $1.25 | 200K |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
| Gemini 2.0 Pro | $3.50 | $10.50 | 2M |
| DeepSeek-V3 | $0.14 | $0.28 | 64K |
| DeepSeek-R1 | $0.55 | $2.19 | 64K |
| Llama 3.1 70B | $0.20 | $0.20 | 128K |
| OpenRouter | +15% | +15% | Varies |
Prices in USD per 1M tokens, as of February 2025.
Hidden Costs#
1. Input vs Output Asymmetry#
Most providers charge 3-10x more for output tokens than input. That asymmetry changes your cost structure completely depending on workload.
| Use Case | Input:Output | Cost Pattern |
|---|---|---|
| Classification | 100:1 | Cheap |
| Chat | 1:0.3 | Moderate |
| Code gen | 1:3 | Expensive |
| Long-form writing | 1:10 | Very expensive |
Example with Claude 3.5 Sonnet ($3/$15):
- Classification (1M in, 10K out): $3.15
- Code gen (100K in, 300K out): $4.80
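The two examples above are easy to reproduce. A minimal cost model (prices hardcoded from the table in this post; a real version would pull current rates):

```python
# Rough per-request cost model for the input/output asymmetry above.
# Prices are USD per 1M tokens, taken from the table in this post.
PRICES = {
    "claude-3.5-sonnet": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "deepseek-v3": (0.14, 0.28),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Classification-style workload: 1M in, 10K out
print(request_cost("claude-3.5-sonnet", 1_000_000, 10_000))  # 3.15
# Code-gen-style workload: 100K in, 300K out
print(request_cost("claude-3.5-sonnet", 100_000, 300_000))   # 4.8
```

Swap in the cheap rows from the table to see why model choice dominates: the same code-gen workload on DeepSeek-V3 costs about $0.10.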
2. Google’s Context Trap#
Gemini Flash advertises $0.10/$0.40. Cheapest on paper. But read the fine print.
- 0-128K context: $0.10/$0.40 (as advertised)
- 128K-1M context: Non-linear scaling
- Long-context requests: 2-4x the advertised rate in practice
For big RAG with many chunks, Gemini’s advantage disappears.
3. Batch Discounts#
OpenAI offers 50% off via Batch API. But:
- 24-hour turnaround
- No streaming
- No immediate response
Good for: Nightly reports, bulk processing
Bad for: Real-time features, interactive chat
4. Cached Input#
| Model | Standard | Cached | Savings |
|---|---|---|---|
| GPT-5 mini | $0.25 | $0.025 | 90% |
| GPT-5.2 | $1.75 | $0.175 | 90% |
Catch: Must reuse same context quickly. Good for multi-turn chats, document Q&A.
5. Minimum Top-Ups#
| Provider | Minimum |
|---|---|
| OpenAI | $5 |
| Anthropic | $5 |
| $0 | |
| DeepSeek | $5 |
| Together | $5 |
NanoGPT advantage: No minimums. Pay-as-you-go. Unified balance. Spend $10 testing five models without $25 tied up in minimums.
Provider Breakdown#
OpenAI: Safe Default#
Aggressive discounts on mini models. Premium on frontier.
Best value: GPT-4o-mini at $0.15/$0.60. Hard to beat for most tasks.
Gotchas:
- o1/o4 models charge for “thinking tokens”
- Realtime API is $4/$16 per 1M (voice apps get expensive)
- Priority processing doubles cost
Use when: You need the default that just works.
Anthropic: Premium Option#
2-3x more expensive than OpenAI equivalents.
What you pay for:
- Safety tuning (industry-leading)
- Long context that actually works (200K)
- Honest “I don’t know” responses
Best value: Claude 3.5 Haiku at $0.25/$1.25. Competitive with GPT-4o-mini. Better instruction following.
Skip: Claude 3 Opus at $15/$75. Only for safety-critical use cases.
Use when: Reliability and safety matter more than cost.
Google Gemini: Long Context Play#
Loss-leader pricing. Gemini Flash at $0.10/$0.40 is cheapest quality model.
Catch:
- 1M context sounds great. Quality degrades at length.
- API reliability historically spotty
- Docs and errors are frustrating
Best value: Flash for RAG under 100K context.
Use when: You genuinely need 500K+ context (rare) or cost is paramount.
DeepSeek: Open Source King#
Undercuts everyone dramatically. DeepSeek-V3 at $0.14/$0.28 delivers GPT-4o quality at 1/10th cost.
Trade-offs:
- Chinese company (compliance concerns)
- API availability varies by region
- Docs behind OpenAI
Best value: DeepSeek-V3 for high-volume production.
Use when: Cost matters more than geopolitical risk.
Together/Fireworks: Open Source Gateway#
Competitive with direct open-source pricing plus convenience.
Best value: Llama 3.1 70B at ~$0.20/$0.20. GPT-4o quality at 1/5th cost.
Catch:
- You pick models (no smart defaults)
- Performance varies by model/load
- No unified billing
Use when: You want open-source without managing infrastructure.
Recommendations By Use Case#
Production Apps#
Winner: GPT-4o-mini (OpenAI)
$0.15/$0.60. 99.9% uptime. Mature tooling. Pragmatic default.
Runner-up: DeepSeek-V3 if you can handle compliance ambiguity.
Prototyping#
Winner: NanoGPT or OpenRouter
Access 20+ models without 20 accounts. Convenience markup worth avoiding setup friction.
Alternative: Gemini 2.0 Flash at $0.10/$0.40.
Code Generation#
Winner: Claude 3.5 Sonnet (Anthropic)
Best-in-class for code. $3/$15 hurts but quality justifies it for dev tools.
Budget: DeepSeek-V3 or self-hosted CodeLlama.
Batch Processing#
Winner: OpenAI Batch API with GPT-4o-mini
50% discount. $0.075/$0.30 per 1M. Unbeatable for overnight jobs.
Alternative: Self-hosted open-source if you have GPUs.
Vision#
Winner: GPT-4o (OpenAI)
Still gold standard. $2.50/$10.00 hurts but alternatives lag significantly.
Budget: Gemini 2.0 Flash. Good enough for many tasks at 1/10th cost.
NanoGPT#
NanoGPT sits between direct access and marked-up gateways:
| Model | Direct | NanoGPT | OpenRouter |
|---|---|---|---|
| GPT-4o-mini | $0.15/$0.60 | ~$0.17/$0.65 | ~$0.17/$0.69 |
| Claude Haiku | $0.25/$1.25 | ~$0.28/$1.40 | ~$0.29/$1.44 |
| DeepSeek-V3 | $0.14/$0.28 | ~$0.15/$0.31 | ~$0.16/$0.32 |
Real value: No minimums. Unified billing. Spending $50 across four models? Avoid 4x $5 top-ups. 40% better cash flow.
Optimization Checklist#
- Use the cheapest model that works. GPT-4o-mini handles 80% of tasks.
- Cache prompts. 90% savings on repeated context.
- Batch non-interactive work. 50% discount.
- Monitor output ratios. Code generation burns credits fast.
- Self-host above $500/month. Break-even at scale.
- Negotiate enterprise rates. Everyone does discounts.
2025 Trends#
- Prices falling 20-30% year-over-year
- Mini models good enough for most uses
- Open-source closing the quality gap
- Context windows expanding faster than pricing adapts
Predictions:
- $0.05/$0.20 standard for quality small models by year-end
- DeepSeek forces Western price cuts
- Self-hosting becomes default for high-volume
February 2025. Pricing changes fast. Verify before committing.
Dive deeper:
- DeepSeek-V3 review — my top budget pick for frontier models
- NanoGPT review — simplest way to access multiple cheap models
- Together.ai review — best for production open-source workloads
- My $5 chatbot build — real-world cost optimization in action
