The Lowest Cost
AI Inference Anywhere
Use the latest models, scale effortlessly, and pay only for the tokens you use, with no rate limits, on Lambda's serverless inference API endpoints.
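As a quick orientation, here is a minimal sketch of calling an OpenAI-compatible chat completions endpoint from Python. The base URL, model identifier, and API key placeholder are assumptions for illustration; confirm the exact values in Lambda's Inference API documentation.

```python
# Minimal sketch of a chat completion request against an OpenAI-compatible endpoint.
# The base_url and model name below are assumptions -- check Lambda's docs for the
# exact endpoint and model identifiers before using this in production.
from openai import OpenAI

client = OpenAI(
    api_key="<YOUR_LAMBDA_API_KEY>",           # assumed: key issued from your Lambda account
    base_url="https://api.lambdalabs.com/v1",  # assumed endpoint URL
)

response = client.chat.completions.create(
    model="llama3.1-8b-instruct",  # assumed identifier for Llama-3.1-8B-Instruct
    messages=[
        {"role": "user", "content": "Summarize serverless inference in one sentence."}
    ],
)

print(response.choices[0].message.content)
# The token counts you are billed for come back with the response:
print(response.usage.prompt_tokens, response.usage.completion_tokens)
```

Because the endpoint follows the OpenAI chat completions format, existing client code typically only needs a new base URL, API key, and model name.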
Built for Builders
Low cost
Cut costs, not corners. Get fast, cost-effective AI inference without the middleman.
Scalable
Easily scale from prototype to production without ever managing infrastructure, and with no rate limits.
Engineered for AI
An AI-first stack by Lambda, optimized to power AI workloads of any size, from hardware to API.
Lambda Inference API Pricing*
131K context windows, state-of-the-art models, and automatic scaling, all in one serverless API.
| Model | Quantization | Context | Price per 1M input tokens | Price per 1M output tokens |
|---|---|---|---|---|
| Core | | | | |
| Llama-3.1-8B-Instruct | BF16 | 131K | $0.025 | $0.04 |
| Llama-3.1-70B-Instruct | FP8 | 131K | $0.12 | $0.30 |
| Llama-3.1-405B-Instruct | FP8 | 131K | $0.80 | $0.80 |
| Sandbox | | | | |
| Llama-3.3-70B-Instruct | FP8 | 131K | $0.12 | $0.30 |
| Llama-3.2-3B-Instruct | FP8 | 131K | $0.015 | $0.025 |
| Hermes-3-Llama-3.1-8B | BF16 | 131K | $0.025 | $0.04 |
| Hermes-3-Llama-3.1-70B | FP8 | 131K | $0.12 | $0.30 |
| Hermes-3-Llama-3.1-405B | FP8 | 131K | $0.80 | $0.80 |
| LFM-40b | BF16 | 66K | $0.15 | $0.15 |
| Llama3.1-nemotron-70b-instruct | FP8 | 131K | $0.12 | $0.30 |
| Qwen2.5-Coder-32B | BF16 | 33K | $0.07 | $0.16 |
* plus applicable sales tax
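To translate these per-1M-token rates into a per-request figure, here is a rough cost estimator sketch. The prices are the listed rates above (USD per 1M tokens, before sales tax); the dictionary keys are informal labels for illustration, not guaranteed API model identifiers.

```python
# Rough per-request cost estimator using the listed per-1M-token rates (USD).
# Keys are informal labels taken from the pricing table, not API model IDs.
PRICES = {
    "Llama-3.1-8B-Instruct":   (0.025, 0.04),
    "Llama-3.1-70B-Instruct":  (0.12,  0.30),
    "Llama-3.1-405B-Instruct": (0.80,  0.80),
    "Qwen2.5-Coder-32B":       (0.07,  0.16),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates (before tax)."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply on the 70B model.
print(f"${request_cost('Llama-3.1-70B-Instruct', 2_000, 500):.6f}")  # $0.000390
```

Even a fairly long prompt and reply on the 70B model comes out to a small fraction of a cent per request at these rates.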
Bring In Those Long Prompts!
| Example Inference API transaction | |
|---|---|
| Model | Llama-3.1-405B-Instruct-FP8 |
| Pricing | $0.80 per 1M tokens of input or output (1 token = $0.0000008) |
| Input | "Convert 30,000 tokens to single space pages of text assuming there are an average of 4 characters per token including spaces and punctuation." Tokens: 63. Cost: 0.005¢ |
| Output | "To estimate the conversion of 30,000 tokens to single-spaced pages of text, we'll break down the process step by step …" Tokens: 234. Cost: 0.02¢ |
| Total Tokens Used | 297 |
| Grand Total | 0.025¢ |
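The same arithmetic in a few lines of Python, using the example's token counts and the $0.80-per-1M rate:

```python
# Reproducing the example transaction: 63 input + 234 output tokens on
# Llama-3.1-405B-Instruct-FP8 at $0.80 per 1M tokens (input or output).
RATE = 0.80 / 1_000_000                 # dollars per token = $0.0000008

input_cost = 63 * RATE                  # $0.0000504  -> ~0.005 cents
output_cost = 234 * RATE                # $0.0001872  -> ~0.02  cents
total_cost = input_cost + output_cost   # $0.0002376  -> ~0.024 cents
# The 0.025-cent grand total above is the sum of the two rounded figures
# (0.005 + 0.02); the unrounded total is about 0.024 cents.

for label, dollars in [("input", input_cost), ("output", output_cost), ("total", total_cost)]:
    print(f"{label}: {dollars * 100:.4f} cents")
```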
Ready to get started?
Unlock fast, scalable inference at the lowest market cost per token with no rate limits, or contact us to learn how our Private Cloud can accelerate your AI workloads.