The Lowest Cost
AI Inference Anywhere
Use the latest models, scale effortlessly, and pay only for the tokens you use, with no rate limits, on Lambda's serverless inference API endpoints.
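As a quick orientation, here is a minimal sketch of calling an OpenAI-compatible chat completions endpoint from Python. The base URL, model identifier, and API key placeholder are assumptions for illustration; confirm the exact values in Lambda's Inference API documentation.

```python
# Minimal sketch of a chat completion request against an OpenAI-compatible endpoint.
# The base_url and model name below are assumptions -- check Lambda's docs for the
# exact endpoint and model identifiers before using this in production.
from openai import OpenAI

client = OpenAI(
    api_key="<YOUR_LAMBDA_API_KEY>",           # assumed: key issued from your Lambda account
    base_url="https://api.lambdalabs.com/v1",  # assumed endpoint URL
)

response = client.chat.completions.create(
    model="llama3.1-8b-instruct",  # assumed identifier for Llama-3.1-8B-Instruct
    messages=[
        {"role": "user", "content": "Summarize serverless inference in one sentence."}
    ],
)

print(response.choices[0].message.content)
# The token counts you are billed for come back with the response:
print(response.usage.prompt_tokens, response.usage.completion_tokens)
```

Because the endpoint follows the OpenAI chat completions format, existing client code typically only needs a new base URL, API key, and model name.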
Built for Builders
Low cost
Cut costs, not corners. Get fast, cost-effective AI inference without the middleman.
Scalable
Easily scale from prototype to production without ever managing infrastructure, and with no rate limits.
Engineered for AI
An AI-first stack by Lambda, optimized to power AI workloads of any size, from hardware to API.
Lambda Inference API Pricing*
131K context windows, state-of-the-art models, and automatic scaling, all in one serverless API.
| Model | Quantization | Context | Price per 1M input tokens | Price per 1M output tokens |
|---|---|---|---|---|
| Core | | | | |
| Llama-3.1-8B-Instruct | BF16 | 131K | $0.025 | $0.04 |
| Llama-3.1-70B-Instruct | FP8 | 131K | $0.12 | $0.30 |
| Llama-3.1-405B-Instruct | FP8 | 131K | $0.80 | $0.80 |
| Sandbox | | | | |
| Llama-3.3-70B-Instruct | FP8 | 131K | $0.12 | $0.30 |
| Llama-3.2-3B-Instruct | FP8 | 131K | $0.015 | $0.025 |
| Hermes-3-Llama-3.1-8B | BF16 | 131K | $0.025 | $0.04 |
| Hermes-3-Llama-3.1-70B | FP8 | 131K | $0.12 | $0.30 |
| Hermes-3-Llama-3.1-405B | FP8 | 131K | $0.80 | $0.80 |
| LFM-40b | BF16 | 66K | $0.15 | $0.15 |
| Llama3.1-nemotron-70b-instruct | FP8 | 131K | $0.12 | $0.30 |
| Qwen2.5-Coder-32B | BF16 | 33K | $0.07 | $0.16 |
* plus applicable sales tax
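To translate these per-1M-token rates into a per-request figure, here is a rough cost estimator sketch. The prices are the listed rates above (USD per 1M tokens, before sales tax); the dictionary keys are informal labels for illustration, not guaranteed API model identifiers.

```python
# Rough per-request cost estimator using the listed per-1M-token rates (USD).
# Keys are informal labels taken from the pricing table, not API model IDs.
PRICES = {
    "Llama-3.1-8B-Instruct":   (0.025, 0.04),
    "Llama-3.1-70B-Instruct":  (0.12,  0.30),
    "Llama-3.1-405B-Instruct": (0.80,  0.80),
    "Qwen2.5-Coder-32B":       (0.07,  0.16),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates (before tax)."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply on the 70B model.
print(f"${request_cost('Llama-3.1-70B-Instruct', 2_000, 500):.6f}")  # $0.000390
```

Even a fairly long prompt and reply on the 70B model comes out to a small fraction of a cent per request at these rates.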
Bring In Those Long Prompts!
| Example Inference API transaction | |
|---|---|
| Model | Llama-3.1-405B-Instruct-FP8 |
| Pricing | $0.80 per 1M tokens of input or output (1 token = $0.0000008) |
| Input | "Convert 30,000 tokens to single space pages of text assuming there are an average of 4 characters per token including spaces and punctuation." Tokens: 63. Cost: 0.005¢ |
| Output | "To estimate the conversion of 30,000 tokens to single-spaced pages of text, we'll break down the process step by step …" Tokens: 234. Cost: 0.02¢ |
| Total Tokens Used | 297 |
| Grand Total | 0.025¢ |
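The same arithmetic in a few lines of Python, using the example's token counts and the $0.80-per-1M rate:

```python
# Reproducing the example transaction: 63 input + 234 output tokens on
# Llama-3.1-405B-Instruct-FP8 at $0.80 per 1M tokens (input or output).
RATE = 0.80 / 1_000_000                 # dollars per token = $0.0000008

input_cost = 63 * RATE                  # $0.0000504  -> ~0.005 cents
output_cost = 234 * RATE                # $0.0001872  -> ~0.02  cents
total_cost = input_cost + output_cost   # $0.0002376  -> ~0.024 cents
# The 0.025-cent grand total above is the sum of the two rounded figures
# (0.005 + 0.02); the unrounded total is about 0.024 cents.

for label, dollars in [("input", input_cost), ("output", output_cost), ("total", total_cost)]:
    print(f"{label}: {dollars * 100:.4f} cents")
```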
Ready to get started?
Unlock fast, scalable inference at the lowest market cost per token with no rate limits, or contact us to learn how our Private Cloud can accelerate your AI workloads.