LLM API Pricing 2026: 20+ Models, Cost Per Token

Full Model Pricing Comparison

For the complete multi-provider pricing table with GPT-4.1, Claude Opus, Gemini Pro, Llama 4, and 15+ models compared, see LLM Pricing 2026: Every Model from $0.01 to $75/1M. This page covers token cost fundamentals and batch/caching discount calculations.

LLM API pricing changes constantly. New models launch, old ones get cheaper, and providers quietly adjust rates between announcements. This page tracks every major model's cost per million tokens, updated for April 2026.

Whether you're picking a model for a new project, budgeting compute costs, or comparing providers for a procurement decision, the tables below give you the numbers you need without digging through five different pricing pages.

Master Pricing Table: All Major LLM APIs (April 2026)

Prices are per 1 million tokens. Input is what you send to the model; output is what the model generates back. Providers charge a separate rate for each direction, and output costs more because generating tokens takes more compute than processing the prompt.

Provider	Model	Input / 1M Tokens	Output / 1M Tokens	Context Window
OpenAI	GPT-5	$1.25	$10.00	128K
OpenAI	GPT-4.1	$2.00	$8.00	1M
OpenAI	GPT-4.1 Mini	$0.40	$1.60	1M
OpenAI	GPT-4.1 Nano	$0.10	$0.40	1M
OpenAI	GPT-4o	$2.50	$10.00	128K
OpenAI	GPT-4o Mini	$0.15	$0.60	128K
OpenAI	o3	$10.00	$40.00	200K
OpenAI	o4-mini	$1.10	$4.40	200K
Anthropic	Claude Opus 4.6	$15.00	$75.00	1M
Anthropic	Claude Sonnet 4.6	$3.00	$15.00	1M
Anthropic	Claude Haiku 4.5	$0.80	$4.00	200K
Google	Gemini 2.5 Pro	$1.25	$10.00	1M
Google	Gemini 2.0 Flash	$0.10	$0.40	1M
Mistral	Mistral Large	$2.00	$6.00	128K
Mistral	Mistral Small	$0.10	$0.30	128K
Cohere	Command R+	$2.50	$10.00	128K
Cohere	Command R	$0.15	$0.60	128K

Note on Gemini 2.5 Pro: the table shows Google's standard rate, which applies to prompts up to 200K tokens. Prompts longer than 200K tokens are billed at a higher tier ($2.50 input / $15.00 output per 1M), so long-context workloads should budget for the higher rate.

Models by Budget Tier

Not every project needs a frontier model. Here is how models break down by cost, so you can match your budget to the right capability level.

Under $1 per 1M Input Tokens (Budget Tier)

These models handle classification, extraction, summarization, and simple chat at rock-bottom prices.

Model	Input / 1M	Output / 1M	Best For
GPT-4.1 Nano	$0.10	$0.40	High-volume classification, simple extraction
Gemini 2.0 Flash	$0.10	$0.40	Fast inference, multimodal on a budget
Mistral Small	$0.10	$0.30	Lightweight European-hosted tasks
GPT-4o Mini	$0.15	$0.60	General-purpose cheap model
Command R	$0.15	$0.60	RAG-optimized retrieval tasks
GPT-4.1 Mini	$0.40	$1.60	Coding and instruction following on a budget
Claude Haiku 4.5	$0.80	$4.00	Fast responses, customer-facing chat

$1 to $5 per 1M Input Tokens (Mid-Range)

The sweet spot for most production applications. These models handle complex reasoning, coding, and multi-step tasks reliably.

Model	Input / 1M	Output / 1M	Best For
o4-mini	$1.10	$4.40	Reasoning tasks at mid-range cost
GPT-5	$1.25	$10.00	Frontier general intelligence
Gemini 2.5 Pro	$1.25	$10.00	Long-context analysis, multimodal
GPT-4.1	$2.00	$8.00	Coding, long-context, instruction following
Mistral Large	$2.00	$6.00	European data residency, multilingual
GPT-4o	$2.50	$10.00	Multimodal (vision + text)
Command R+	$2.50	$10.00	Enterprise RAG, grounded generation
Claude Sonnet 4.6	$3.00	$15.00	Coding, analysis, agentic workflows

$5+ per 1M Input Tokens (Premium)

Model	Input / 1M	Output / 1M	Best For
o3	$10.00	$40.00	Hard reasoning, math, science problems
Claude Opus 4.6	$15.00	$75.00	Complex agentic tasks, deep analysis

Premium models are rarely needed for production workloads. Use them for difficult reasoning tasks, complex code generation, or cases where accuracy on edge cases justifies paying roughly 3-17x more per token than mid-range options.

Batch API Discounts

If your workload can tolerate latency (minutes to hours instead of seconds), batch APIs cut costs significantly.

Provider	Batch Discount	Typical Latency	How It Works
OpenAI	50% off all models	Up to 24 hours	Submit JSONL file, results returned asynchronously. Available for all GPT and o-series models.
Anthropic	50% off all models	Up to 24 hours	Message Batches API. Submit up to 100,000 requests per batch. Results within 24 hours.
Google	50% off Gemini models	Up to 24 hours	BatchGenerateContent API. Vertex AI batch prediction offers the same 50% discount on Gemini models.
Mistral	Variable	Varies	Batch inference available through La Plateforme. Discount varies by volume commitment.

With batch pricing, GPT-4.1 drops to $1.00 input / $4.00 output per million tokens. Claude Sonnet 4.6 drops to $1.50 / $7.50. These are significant savings for data processing pipelines, evaluation runs, and content generation at scale.
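A minimal sketch of the batch arithmetic, assuming a flat 50% discount on both input and output (the OpenAI, Anthropic, and Google rows above). The `batch_price` helper and the price dictionary are illustrative, not part of any provider's SDK:

```python
# Standard (real-time) prices in USD per 1M tokens, copied from the tables above.
STANDARD_PRICES = {
    "GPT-4.1": (2.00, 8.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4.1 Nano": (0.10, 0.40),
}

# Assumption: a flat 50% batch discount applied to both input and output.
BATCH_DISCOUNT = 0.50


def batch_price(input_per_1m: float, output_per_1m: float,
                discount: float = BATCH_DISCOUNT) -> tuple[float, float]:
    """Return (input, output) USD per 1M tokens after a batch discount."""
    factor = 1.0 - discount
    return input_per_1m * factor, output_per_1m * factor


for model, (inp, out) in STANDARD_PRICES.items():
    b_in, b_out = batch_price(inp, out)
    print(f"{model}: ${b_in:.2f} / ${b_out:.2f} per 1M tokens (batch)")
```

Mistral is left out because its batch discount varies by volume commitment.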

Prompt Caching

Prompt caching reduces costs when you send the same system prompt or context prefix repeatedly. Instead of reprocessing identical tokens every call, the provider caches them and charges a reduced rate.

Provider	Cache Write Cost	Cache Read Discount	TTL	Min Tokens
OpenAI	Free (automatic)	50% off input	5-10 min	1,024
Anthropic	25% surcharge on first write	90% off input	5 min (refreshes on hit)	1,024 (Haiku), 2,048 (Sonnet/Opus)
Google	Same as input	75% off input	Configurable	32,768

Anthropic's caching is the most aggressive: 90% off cached input tokens means a long system prompt that costs $3.00/1M on Sonnet 4.6 drops to $0.30/1M on cache hits. The 25% write surcharge pays for itself on the first cache hit. OpenAI's caching is automatic (no code changes needed) but gives a smaller discount. Google requires the most tokens before caching kicks in but offers a configurable TTL.
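The break-even point is easy to check with a small model of Anthropic-style pricing. This sketch assumes a 1.25x write multiplier and a 0.10x read multiplier on the base input price (the 25% surcharge and 90% discount in the table), a cache that stays warm between calls, and it only prices the repeated prefix, not the uncached part of each prompt. The 10,000-token system prompt is a hypothetical example:

```python
def prefix_cost(requests: int, prefix_tokens: int, input_per_1m: float,
                cached: bool, write_mult: float = 1.25,
                read_mult: float = 0.10) -> float:
    """USD cost of sending the same prompt prefix on `requests` calls.

    Assumes the first call writes the cache and every later call hits it.
    """
    base = prefix_tokens / 1_000_000 * input_per_1m  # one uncached pass
    if not cached or requests == 0:
        return requests * base
    return base * write_mult + (requests - 1) * base * read_mult


# Hypothetical 10,000-token system prompt on Sonnet 4.6 ($3.00 / 1M input).
for n in (1, 2, 10, 100):
    plain = prefix_cost(n, 10_000, 3.00, cached=False)
    with_cache = prefix_cost(n, 10_000, 3.00, cached=True)
    print(f"{n:>3} requests: ${plain:.4f} uncached vs ${with_cache:.4f} cached")
```

Under these assumptions, caching costs more on a single call (1.25x the base prefix price) but is already cheaper on the second call (1.35x vs 2x).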

Cost Per 1K Tokens (Conversion Table)

Some documentation and older pricing pages still reference cost per 1,000 tokens. To convert: divide the per-1M price by 1,000.

Model	Input / 1K Tokens	Output / 1K Tokens
GPT-4.1 Nano	$0.0001	$0.0004
Gemini 2.0 Flash	$0.0001	$0.0004
GPT-4o Mini	$0.00015	$0.0006
GPT-4.1 Mini	$0.0004	$0.0016
Claude Haiku 4.5	$0.0008	$0.004
GPT-5	$0.00125	$0.01
GPT-4.1	$0.002	$0.008
Claude Sonnet 4.6	$0.003	$0.015
GPT-4o	$0.0025	$0.01
o3	$0.01	$0.04
Claude Opus 4.6	$0.015	$0.075

Per-1K pricing looks deceptively cheap. Always multiply by 1,000 to understand real costs at scale. A chatbot handling 1 million tokens per day at $0.002/1K input costs $2/day or $60/month just for input tokens.
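A small sketch of the conversion and the chatbot arithmetic above; `per_1k` and `daily_cost` are illustrative helper names, and the 30-day month is an assumption:

```python
def per_1k(price_per_1m: float) -> float:
    """Convert a USD price per 1M tokens to USD per 1K tokens."""
    return price_per_1m / 1_000


def daily_cost(tokens_per_day: int, price_per_1k: float) -> float:
    """USD per day for a token volume billed at a per-1K rate."""
    return tokens_per_day / 1_000 * price_per_1k


rate_1k = per_1k(2.00)                 # GPT-4.1 input: $2.00 per 1M tokens
day = daily_cost(1_000_000, rate_1k)   # 1M input tokens per day
print(f"${rate_1k} per 1K, ${day:.2f}/day, ${day * 30:.2f}/month")
```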

How to Estimate Your Monthly API Costs

Use this formula to budget your LLM spend before committing to a provider.

Monthly Cost = [(Daily Requests x Avg Input Tokens / 1,000,000 x Input Price per 1M) + (Daily Requests x Avg Output Tokens / 1,000,000 x Output Price per 1M)] x 30

Example 1: Customer Support Chatbot

500 conversations/day, 800 input tokens avg (system prompt + user message), 400 output tokens avg
Using Claude Sonnet 4.6 ($3/$15 per 1M)
Input: 500 x 800 = 400,000 tokens/day = $1.20/day
Output: 500 x 400 = 200,000 tokens/day = $3.00/day
Monthly: ($1.20 + $3.00) x 30 = $126/month

Example 2: Document Processing Pipeline

200 documents/day, 5,000 input tokens avg (document + extraction prompt), 500 output tokens avg
Using GPT-4.1 Mini ($0.40/$1.60 per 1M)
Input: 200 x 5,000 = 1,000,000 tokens/day = $0.40/day
Output: 200 x 500 = 100,000 tokens/day = $0.16/day
Monthly: ($0.40 + $0.16) x 30 = $16.80/month

Example 3: High-Volume Classification

50,000 items/day, 200 input tokens avg, 50 output tokens avg
Using GPT-4.1 Nano ($0.10/$0.40 per 1M)
Input: 50,000 x 200 = 10,000,000 tokens/day = $1.00/day
Output: 50,000 x 50 = 2,500,000 tokens/day = $1.00/day
Monthly: ($1.00 + $1.00) x 30 = $60/month

These estimates assume no caching or batching. With prompt caching on a chatbot (where the system prompt repeats), expect 30-60% lower input costs. With batch API, cut both input and output costs in half.
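The monthly formula and the three examples can be reproduced with a short script. This is a sketch: `Workload` and `monthly_cost` are hypothetical helpers, caching is not modeled, and the optional batch discount assumes a flat 50% off both input and output:

```python
from dataclasses import dataclass


@dataclass
class Workload:
    daily_requests: int
    avg_input_tokens: int
    avg_output_tokens: int


def monthly_cost(w: Workload, input_per_1m: float, output_per_1m: float,
                 days: int = 30, batch_discount: float = 0.0) -> float:
    """Estimate monthly USD spend: (daily input cost + daily output cost) x days.

    `batch_discount` is an optional flat discount (0.5 for a 50% batch API).
    """
    daily_input = w.daily_requests * w.avg_input_tokens / 1_000_000 * input_per_1m
    daily_output = w.daily_requests * w.avg_output_tokens / 1_000_000 * output_per_1m
    return (daily_input + daily_output) * days * (1.0 - batch_discount)


examples = {
    "Support chatbot (Sonnet 4.6)": (Workload(500, 800, 400), 3.00, 15.00),
    "Doc pipeline (GPT-4.1 Mini)": (Workload(200, 5_000, 500), 0.40, 1.60),
    "Classification (GPT-4.1 Nano)": (Workload(50_000, 200, 50), 0.10, 0.40),
}

for name, (workload, p_in, p_out) in examples.items():
    standard = monthly_cost(workload, p_in, p_out)
    batched = monthly_cost(workload, p_in, p_out, batch_discount=0.5)
    print(f"{name}: ${standard:,.2f}/month, ${batched:,.2f}/month with batch")
```

The batch figures only apply to workloads that can wait for asynchronous results, which rules out the live chatbot.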

Provider Comparison by Use Case

Cheapest for High-Volume Chatbots

Winner: GPT-4.1 Nano ($0.10/$0.40) or Gemini 2.0 Flash ($0.10/$0.40). Both cost the same and handle conversational tasks well. Gemini Flash has the edge for multimodal inputs (images in chat). GPT-4.1 Nano has stronger instruction following for structured system prompts. Mistral Small ($0.10/$0.30) is cheapest on output if you need European data residency.

Best for Coding Assistants

Winner: Claude Sonnet 4.6 ($3/$15). Consistently top-ranked on coding benchmarks. GPT-4.1 ($2/$8) is a strong, cheaper alternative, and its 1M context window can hold large codebases. For budget coding, GPT-4.1 Mini ($0.40/$1.60) punches well above its price.

Best for Complex Reasoning

Winner: o3 ($10/$40) for math-heavy and scientific reasoning. Claude Opus 4.6 ($15/$75) for detail-sensitive analysis and agentic multi-step tasks. These are premium models for premium problems. For most reasoning tasks, Claude Sonnet 4.6 or GPT-5 at a fraction of the cost will be sufficient.

Best for RAG and Retrieval

Winner: Command R+ ($2.50/$10). Cohere built Command R+ specifically for retrieval-augmented generation with built-in citation support. Google Gemini 2.5 Pro is the alternative when you need a massive context window (1M tokens) to stuff retrieved documents into a single prompt.

Best for Enterprise with Data Residency Requirements

Winner: Mistral Large ($2/$6). Hosted in Europe, strong multilingual performance, and competitive pricing. Mistral is the default choice when GDPR compliance and data residency are non-negotiable.

Open Model APIs vs Frontier Model APIs: Where Open Wins on Price in 2026

The short answer: yes, open-weight model APIs are dramatically cheaper than frontier model APIs in 2026, sometimes by 10-100x. The catch is they trail frontier models on hard reasoning, agentic tool use, and frontier-grade coding by a measurable margin. For workloads where the trade-off makes sense (high-volume chat, classification, RAG, summarization, content generation), open-weight APIs through Together AI, Fireworks AI, Groq, and DeepInfra are now the price floor that frontier providers like OpenAI, Anthropic, and Google compete against.

Open-weight pricing across the major hosted-inference providers (verified April 2026):

Open Model	Host	Input / 1M Tokens	Output / 1M Tokens	Context Window
Llama 4 Maverick (400B)	Together AI	$0.27	$0.85	1M
Llama 4 Maverick (400B)	Fireworks AI	$0.22	$0.88	1M
Llama 4 Scout (109B)	Together AI	$0.18	$0.59	10M
Llama 3.3 70B	Together AI	$0.88	$0.88	128K
Llama 3.3 70B	Groq	$0.59	$0.79	128K
DeepSeek-V3	DeepInfra	$0.27	$1.10	128K
DeepSeek-R1	Together AI	$3.00	$7.00	128K
Qwen 2.5 72B	Together AI	$1.20	$1.20	128K
Mistral Nemo (12B)	Mistral La Plateforme	$0.15	$0.15	128K
Mixtral 8x22B	Together AI	$1.20	$1.20	64K

The headline matchups against frontier models:

Llama 4 Scout ($0.18/$0.59) vs Claude Sonnet 4.6 ($3/$15): Scout is 16x cheaper on input and 25x cheaper on output. Quality gap on general chat is small; gap on hard coding and multi-step agents is real. For RAG, summarization, classification, and most production chatbots, Scout will save 90%+ of the bill at acceptable quality.
DeepSeek-V3 ($0.27/$1.10) vs GPT-4.1 ($2/$8): DeepSeek-V3 is 7x cheaper on input and 7x cheaper on output. V3 matches GPT-4.1 on many coding benchmarks (it sits near the top of LiveCodeBench). The main trade-off is hosting: US users typically route through DeepInfra, Fireworks, or Together rather than calling DeepSeek directly.
DeepSeek-R1 ($3/$7) vs o3 ($10/$40): R1 is about 3x cheaper on input and nearly 6x cheaper on output. R1 is a reasoning model in the same class as o3 and scores within a few points on AIME, GPQA, and MATH. For reasoning workloads where you can tolerate a small quality gap, R1 is the cheapest credible alternative to o3 in 2026.
Llama 4 Maverick ($0.22/$0.88 on Fireworks) vs Claude Opus 4.6 ($15/$75): Maverick is 68x cheaper on input and 85x cheaper on output. Quality gap is meaningful on agents and tool use; on document analysis and long-context summarization, the gap narrows.
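The "Nx cheaper" multiples in these matchups are simple price ratios. A minimal sketch, using only the prices listed in this article (the `price_ratio` helper is illustrative):

```python
# Prices in USD per 1M tokens (input, output), copied from the tables above.
PAIRS = [
    ("Llama 4 Scout (Together)", (0.18, 0.59), "Claude Sonnet 4.6", (3.00, 15.00)),
    ("DeepSeek-V3 (DeepInfra)", (0.27, 1.10), "GPT-4.1", (2.00, 8.00)),
    ("DeepSeek-R1 (Together)", (3.00, 7.00), "o3", (10.00, 40.00)),
    ("Llama 4 Maverick (Fireworks)", (0.22, 0.88), "Claude Opus 4.6", (15.00, 75.00)),
]


def price_ratio(cheap: tuple[float, float],
                pricey: tuple[float, float]) -> tuple[float, float]:
    """How many times cheaper `cheap` is than `pricey`, per direction."""
    return pricey[0] / cheap[0], pricey[1] / cheap[1]


for open_name, open_price, frontier_name, frontier_price in PAIRS:
    r_in, r_out = price_ratio(open_price, frontier_price)
    print(f"{open_name} vs {frontier_name}: {r_in:.1f}x input, {r_out:.1f}x output")
```

Compare the printed ratios with the rounded multiples in the list above.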

The economic case for open-model APIs strengthens in three situations: when the workload is high volume (token cost compounds fast), when you want to fine-tune (open-weight models support LoRA and full-parameter fine-tuning on the host), and when you need data residency or BYOC deployment (Fireworks, Together, and DeepInfra all support private deployments at higher rates but still below frontier prices).

When Open Loses to Frontier on Price

The naive "open = cheaper" framing breaks in a few specific cases. Three to flag:

Frontier mini-tier vs open. GPT-4.1 Nano at $0.10/$0.40 and Gemini 2.0 Flash at $0.10/$0.40 are price-competitive with most open-weight 7B-13B models on hosted inference, while delivering quality closer to mid-range frontier models. For high-volume classification and routing, GPT-4.1 Nano often beats Llama 3.1 8B and Mistral 7B on both price and quality.
Reasoning workloads. For dedicated reasoning models, the gap narrows. DeepSeek-R1 at $3/$7 is cheaper than o3 at $10/$40, but it is not cheaper than o4-mini at $1.10/$4.40, which performs close enough on most reasoning tasks that o4-mini is often the better dollar-per-quality choice.
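Caching-heavy workloads. Anthropic's 90% prompt caching discount drops cached Sonnet 4.6 input reads to $0.30/MTok, close to Llama 4 Scout's $0.18 and below several open models in the table above. The discount applies only to cached tokens, though: at a 60% cache hit rate, blended input is about $1.38/MTok (0.6 x $0.30 + 0.4 x $3.00, before write surcharges), and output stays at $15/MTok. Combined with Sonnet 4.6's higher quality on hard tasks, a high hit rate can make Anthropic the better value for cache-heavy workloads, but it rarely wins on raw per-token cost.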
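The $0.30/MTok cached rate only applies to the cached share of input tokens, so the number that matters is the blended rate at your hit rate. A minimal sketch, assuming a 0.10x read multiplier and ignoring cache-write surcharges and TTL expiry (the `blended_input_price` helper is illustrative):

```python
def blended_input_price(base_per_1m: float, hit_rate: float,
                        read_mult: float = 0.10) -> float:
    """Average USD per 1M input tokens when `hit_rate` of tokens are cache reads.

    Simplification: ignores cache-write surcharges and TTL expiry.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * base_per_1m * read_mult + (1.0 - hit_rate) * base_per_1m


SONNET_INPUT = 3.00  # Claude Sonnet 4.6, USD per 1M input tokens
SCOUT_INPUT = 0.18   # Llama 4 Scout on Together AI, for comparison

for rate in (0.0, 0.6, 0.9, 1.0):
    sonnet = blended_input_price(SONNET_INPUT, rate)
    print(f"hit rate {rate:.0%}: Sonnet input ~${sonnet:.2f}/1M "
          f"(Scout: ${SCOUT_INPUT:.2f}/1M)")
```

Under these assumptions, $0.30/MTok is the floor at a 100% hit rate, and output pricing ($15 vs $0.59 per 1M) is unaffected by caching.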

The pattern in 2026: open-weight APIs dominate the price floor at mid quality, frontier mini-tiers dominate the price floor at low-medium quality with stronger instruction following, and frontier flagships dominate the quality ceiling at a 5-100x price premium. Pick the tier that maps to your accuracy bar, not just the cheapest line.

How to Pick the Right Tier: A Decision Framework

A practical hierarchy for 2026 deployments, in order of decreasing volume tolerance and increasing quality demand:

10M+ tokens/day, classification/extraction: GPT-4.1 Nano, Gemini 2.0 Flash, or Llama 4 Scout on Together. $0.10-$0.18 input. Expect roughly $30-60/month at the bottom of this range (about 10M tokens/day).
1-10M tokens/day, customer-facing chat: Llama 4 Scout, GPT-4.1 Mini, or Claude Haiku 4.5 with caching. $0.18-$0.80 input. Expect $50-300/month.
500K-5M tokens/day, coding assistant or analysis: DeepSeek-V3 (open) or Claude Sonnet 4.6 with caching (frontier). $0.27-$3.00 input. Expect $100-700/month.
100K-1M tokens/day, agentic workflows: Claude Sonnet 4.6, GPT-4.1, or Llama 4 Maverick for cost-sensitive agents. Caching becomes critical here.
50K-500K tokens/day, hard reasoning: DeepSeek-R1 (open) or o3 / Opus 4.6 (frontier). Reserve these for the 10-20% of queries that truly need them.

For internal links to deeper coverage, see Anthropic Claude API Pricing for the full Anthropic-specific breakdown, Best Open-Source LLMs for model-quality comparisons across Llama, Mistral, Qwen, and DeepSeek, and AI Free Tiers Compared for which providers give you the most before billing kicks in.

Frequently Asked Questions

What is the cheapest LLM API?

As of April 2026, the cheapest LLM APIs are GPT-4.1 Nano, Gemini 2.0 Flash, and Mistral Small, all at $0.10 per million input tokens. Mistral Small edges ahead on output cost at $0.30/1M vs $0.40/1M for the other two. For batch workloads, the 50% batch discount drops GPT-4.1 Nano and Gemini 2.0 Flash to $0.05/$0.20 per million tokens, making them the cheapest options for asynchronous processing.

How much does GPT-4.1 cost per token?

GPT-4.1 costs $2.00 per million input tokens and $8.00 per million output tokens. That works out to $0.000002 per input token and $0.000008 per output token. With the OpenAI Batch API (50% discount), those drop to $1.00/$4.00 per million. With prompt caching (automatic, 50% off cached tokens), repeated system prompts cost $1.00 per million cached input tokens.

How do LLM API prices compare to self-hosting?

Self-hosting open-source models (Llama 3, Mistral, etc.) on your own GPUs costs roughly $1-3 per GPU-hour on cloud providers. At high volume (millions of tokens per day), self-hosting can be 50-80% cheaper than API pricing. At low to moderate volume, APIs are almost always cheaper because you avoid idle GPU costs, infrastructure management, and the engineering overhead of running inference servers. The break-even point is typically around 10-50 million tokens per day, depending on the model size and hardware choice.
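One rough way to reason about the break-even point is to divide the daily cost of always-on GPUs by the API's blended price per token. This is a sketch under loud assumptions: GPUs rented 24 hours a day at a flat hourly rate, a single blended API price for input and output, enough GPU throughput to actually serve that volume, and no engineering or idle-capacity overhead (which, as noted above, often dominates at lower volume). The inputs below are hypothetical:

```python
def breakeven_tokens_per_day(gpu_hourly_usd: float, gpus: int,
                             api_blended_per_1m: float) -> float:
    """Tokens/day at which API spend equals always-on GPU rental cost."""
    gpu_daily_usd = gpu_hourly_usd * 24 * gpus
    return gpu_daily_usd / api_blended_per_1m * 1_000_000


# Hypothetical inputs: $1-3 per GPU-hour (the range quoted above) compared
# against Llama 3.3 70B on Together AI at $0.88 per 1M tokens.
for hourly in (1.0, 2.0, 3.0):
    tokens = breakeven_tokens_per_day(hourly, gpus=1, api_blended_per_1m=0.88)
    print(f"${hourly:.2f}/GPU-hour: break-even ~{tokens / 1e6:.0f}M tokens/day")
```

The result scales linearly with GPU count and inversely with the API price you compare against, which is why the break-even point moves so much with model size and hardware.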

What's the difference between input and output token pricing?

Input tokens are what you send to the model: your system prompt, user message, uploaded documents, and any context. Output tokens are what the model generates in response. Across the master pricing table above, output tokens cost 3-8x more than input tokens, because generating each output token requires its own forward pass through the model, while input tokens can be processed in parallel. (Some open-model hosts, such as Together AI for Llama 3.3 70B, charge the same rate for both.) This is why long system prompts with short responses are relatively cheap, while asking a model to write a 5,000-word essay gets expensive fast.

Which LLM API has the best free tier?

Google offers the most generous free tier through Google AI Studio: Gemini 2.0 Flash is free up to 15 requests per minute with generous daily limits. OpenAI offers limited free credits for new accounts. Anthropic provides free access through claude.ai but no free API tier. Mistral offers a free tier on La Plateforme with rate limits. For serious development and testing, Google's free Gemini access is the clear winner.

How often do LLM API prices change?

Prices have been trending down 30-50% per year since 2023. Major price drops usually happen when providers release new model generations (the old model gets cheaper or the new model matches performance at lower cost). OpenAI and Google have been the most aggressive on price cuts. Anthropic tends to hold pricing longer but offers batch and caching discounts. Expect at least 2-3 significant pricing changes per provider per year.

Is an open model API really cheaper than a frontier model API in 2026?

Yes, in most volume tiers. Llama 4 Scout on Together AI costs $0.18/$0.59 per million tokens against Claude Sonnet 4.6 at $3/$15. That is 16x cheaper on input, 25x on output. DeepSeek-V3 at $0.27/$1.10 is 7x cheaper than GPT-4.1 at $2/$8. The savings are real for high-volume chat, RAG, classification, and summarization. The catch is that frontier mini-tiers like GPT-4.1 Nano at $0.10/$0.40 are price-competitive with the smaller open models, and Anthropic's 90% caching discount can flip the math on cache-heavy workloads.

Which open model API has the cheapest input tokens in 2026?

Mistral Nemo (12B) on Mistral La Plateforme at $0.15/$0.15 per million tokens, followed by Llama 4 Scout (109B) on Together AI at $0.18 input and $0.59 output. DeepSeek-V3 on DeepInfra is the cheapest for frontier-quality general intelligence at $0.27/$1.10. For reasoning specifically, DeepSeek-R1 on Together AI at $3/$7 is the cheapest credible alternative to o3.

How does DeepSeek-V3 pricing compare to GPT-4.1 and Claude Sonnet 4.6?

DeepSeek-V3 (hosted on DeepInfra) costs $0.27 per million input tokens and $1.10 per million output tokens. GPT-4.1 costs $2/$8. Claude Sonnet 4.6 costs $3/$15. DeepSeek-V3 is approximately 7x cheaper than GPT-4.1 and 11-14x cheaper than Sonnet 4.6 per token. Quality-wise, V3 sits near the top of LiveCodeBench and matches GPT-4.1 on many coding benchmarks. The gap widens on multi-step agents and frontier-grade reasoning, where Claude Sonnet 4.6 and GPT-4.1 still lead.

When does it make sense to use a frontier API instead of an open model API?

Three cases. First, frontier mini-tiers (GPT-4.1 Nano, Gemini 2.0 Flash) at $0.10/$0.40 are price-competitive with small open models and often have stronger instruction following. Second, on caching-heavy workloads Anthropic bills cached Sonnet 4.6 reads at $0.30/MTok, which undercuts several open models on input price; the discount applies only to cached tokens, though, and output stays at $15/MTok, so the case rests on quality as much as cost. Third, hard reasoning, agentic tool use, and frontier coding still favor Claude Sonnet 4.6, Opus 4.6, GPT-4.1, and o3 by a meaningful quality margin. The decision rule: open wins on volume at mid quality, frontier wins on the hardest tasks regardless of cost.

Chart: visual summary of LLM API pricing in 2026, 20+ models compared per token. Data verified by PE Collective.

About the Author

Rome Thorndike is the founder of the Prompt Engineer Collective, a community of over 1,300 prompt engineering professionals, and author of The AI News Digest, a weekly newsletter with 2,700+ subscribers. Rome brings hands-on AI/ML experience from Microsoft, where he worked with Dynamics and Azure AI/ML solutions, and later led sales at Datajoy (acquired by Databricks).