Documentation
Everything you need to integrate SpendexAI into your application.
Quickstart
SpendexAI is an OpenAI-compatible proxy. Change two lines and start saving immediately.
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
    api_key="spx_sk_live_...",
    base_url="https://api.spendexai.com/v1"
)
response = client.chat.completions.create(
    model="auto",  # Smart routing picks the optimal model
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
Node.js (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "spx_sk_live_...",
  baseURL: "https://api.spendexai.com/v1"
});
const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Hello!" }]
});
console.log(response.choices[0].message.content);
Swift
let config = OpenAI.Configuration(
    apiKey: "spx_sk_live_...",
    baseURL: "https://api.spendexai.com/v1"
)
let client = OpenAI(configuration: config)
let response = try await client.chat.completions.create(
    model: "auto",
    messages: [.user("Hello!")]
)
print(response.choices[0].message.content)
cURL
curl https://api.spendexai.com/v1/chat/completions \
-H "Authorization: Bearer spx_sk_live_..." \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Authentication
All API requests require an API key prefixed with spx_. Generate keys from your dashboard.
Pass your key via the Authorization header:
Authorization: Bearer spx_sk_live_...
You can create multiple API keys for different environments and teams.
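For example, you might keep one key per environment and load it at startup rather than hard-coding it. A minimal sketch (the SPENDEX_API_KEY variable name is our own convention, not something the API requires):
import os

from openai import OpenAI

# Read the key for the current environment (e.g. staging vs. production).
client = OpenAI(
    api_key=os.environ["SPENDEX_API_KEY"],
    base_url="https://api.spendexai.com/v1"
)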
Chat Completions
POST /v1/chat/completions
Fully compatible with the OpenAI Chat Completions API. Accepts the same request body and returns the same response format.
Base URL: https://api.spendexai.com
Request body
| Parameter | Type | Description |
|---|---|---|
| model | string | "auto" for smart routing, or a specific model name |
| messages | array | Array of message objects with role and content |
| temperature | number | Sampling temperature (0–2). Default: 1 |
| max_tokens | integer | Maximum tokens to generate |
| stream | boolean | Enable streaming responses. Default: false |
| top_p | number | Nucleus sampling parameter (0–1) |
| stop | string or array | Stop sequences |
| tools | array | Tool/function definitions for function calling |
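Putting several of these parameters together, a request might look like the sketch below (Python SDK; the parameter values are illustrative, not recommendations):
from openai import OpenAI

client = OpenAI(
    api_key="spx_sk_live_...",
    base_url="https://api.spendexai.com/v1"
)

# Combine documented parameters: sampling controls, a token cap,
# and a stop sequence.
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "List three uses for a paperclip."}],
    temperature=0.7,
    top_p=0.9,
    max_tokens=200,
    stop=["\n\n"]
)
print(response.choices[0].message.content)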
Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-5-nano",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 12,
    "total_tokens": 22
  },
  "spendex": {
    "routed_model": "gpt-5-nano",
    "tier": "simple",
    "cost_usd": 0.00003,
    "saved_usd": 0.00042
  }
}
Responses may include Spendex routing metadata such as the selected model and estimated savings.
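To inspect that metadata, one option is to call the endpoint directly and read the raw JSON, since typed SDK response objects may not expose fields they do not model. A minimal sketch using the requests library:
import requests

# Call the endpoint directly so the non-standard "spendex" block is
# preserved; typed SDK models may drop fields they do not know about.
resp = requests.post(
    "https://api.spendexai.com/v1/chat/completions",
    headers={"Authorization": "Bearer spx_sk_live_..."},
    json={"model": "auto", "messages": [{"role": "user", "content": "Hello!"}]},
)
meta = resp.json().get("spendex", {})
print(meta.get("routed_model"), meta.get("tier"), meta.get("saved_usd"))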
Model Routing
Smart routing (model: "auto")
When you set model: "auto", our AI classifier analyzes each request and routes it to the lowest-cost model that still fits the task’s quality, capability, and reliability requirements. Signals include:
- Token count — short prompts go to nano models
- Code detection — code-heavy prompts go to specialized models
- Domain classification — higher-risk domains such as legal or financial tasks are routed more conservatively
- Complexity scoring — multi-step reasoning triggers premium routing
- Criticality assessment — production-flagged requests get reliable models
- Message depth — long conversations maintain model consistency
Specific model
Pass any supported model name to bypass smart routing:
model: "gpt-5" // Routes directly to OpenAI GPT-5
model: "claude-sonnet-4-6" // Routes directly to Anthropic Claude Sonnet 4.6
model: "gemini-2.5-pro" // Routes directly to Google Gemini 2.5 Pro
model: "auto" // Smart routing (recommended)
Cost tiers
| Tier | When it's used | Example models |
|---|---|---|
| Simple | Short, factual, single-turn queries | GPT-5-nano, Haiku 4.5, Flash Lite |
| Medium | Multi-turn, moderate reasoning, code | GPT-5-mini, Sonnet 4.6, Flash |
| Complex | Long context, deep reasoning, critical | GPT-5, Opus 4.6, Gemini 2.5 Pro |
Streaming
Set stream: true to receive Server-Sent Events (SSE) as the response is generated.
Python
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True
)
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
Node.js
const stream = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Explain quantum computing" }],
  stream: true
});
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
cURL
curl https://api.spendexai.com/v1/chat/completions \
-H "Authorization: Bearer spx_sk_live_..." \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "Explain quantum computing"}],
"stream": true
}'
Supported Models
We currently optimize across the providers connected in your workspace. Core routing today is centered on OpenAI, Anthropic, Google, and Mistral, with broader provider support expanding across the platform.
- OpenAI — GPT-5, GPT-5-mini, GPT-5-nano, plus other supported models such as GPT-5.4, GPT-5.3, o3, and o4-mini depending on task complexity and availability.
- Anthropic — Claude Opus 4.6, Sonnet 4.6, Haiku 4.5
- Google — Gemini 2.5 Pro, Flash, Flash Lite
- Mistral — Mistral Large, Mistral Small
| Model | Provider | Price (input / 1M) |
|---|---|---|
| GPT-5-nano | OpenAI | $0.05 |
| Claude Haiku 4.5 | Anthropic | $1.00 |
| Gemini 2.5 Flash Lite | Google | $0.10 |
| Mistral Small | Mistral | $0.10 |
| GPT-5-mini | OpenAI | $0.25 |
| Claude Sonnet 4.6 | Anthropic | $3.00 |
| Gemini 2.5 Flash | Google | $0.30 |
| Mistral Large | Mistral | $0.50 |
| GPT-5 | OpenAI | $1.25 |
| Claude Opus 4.6 | Anthropic | $5.00 |
| Gemini 2.5 Pro | Google | $1.25 |
BYOK live now. Credits coming soon.
Available Now / Coming Next
Available now
- OpenAI-compatible routing endpoint
model: "auto"smart routing- Streaming
- BYOK provider setup
- Request classification
- Retries and failover
Coming next
- Spend controls
- Alerts
- Hosted provider vault
- Agent budgets
- Orchestration layer
Budget controls, alerts, and agent-level spend policies are part of the next layer on top of the BYOK router. These features are being rolled out progressively.
Error Handling
SpendexAI returns standard HTTP status codes and OpenAI-compatible error objects.
| Code | Meaning |
|---|---|
| 400 | Bad request — invalid parameters |
| 401 | Unauthorized — invalid or missing API key |
| 402 | Payment required — managed credits unavailable or billing limit reached |
| 429 | Rate limit exceeded — too many requests |
| 500 | Internal error — retry with exponential backoff |
| 503 | Provider unavailable — automatic failover in progress |
{
  "error": {
    "message": "Payment required for this request.",
    "type": "billing_error",
    "code": "payment_required"
  }
}
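For 429 and 500 responses, a simple client-side backoff loop is usually enough. A sketch (the OpenAI SDK also retries some failures itself, so we disable its retries here to avoid doubling up):
import time

from openai import APIStatusError, OpenAI

client = OpenAI(
    api_key="spx_sk_live_...",
    base_url="https://api.spendexai.com/v1",
    max_retries=0,  # this loop is the only retry layer
)

def create_with_backoff(max_attempts=5, **kwargs):
    # Retry rate limits and transient server errors with exponential backoff.
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except APIStatusError as err:
            if err.status_code not in (429, 500, 503) or attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...

response = create_with_backoff(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}],
)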
Automatic failover
If a provider returns a 5xx error, SpendexAI automatically retries with an equivalent model from another provider. This happens transparently — you get a successful response with the spendex.routed_model field showing which model was used.
Migration Guide
Switching to SpendexAI takes under a minute. You only need to change two things:
- API key: Replace your provider key with your SpendexAI key (spx_sk_live_...)
- Base URL: Point to https://api.spendexai.com/v1
In most OpenAI-compatible integrations, your existing code, prompts, streaming, and tool definitions continue to work with minimal changes. To revert, remove the two lines — back to normal in 10 seconds.
Before
client = OpenAI(api_key="sk-...")
After
client = OpenAI(
    api_key="spx_sk_live_...",
    base_url="https://api.spendexai.com/v1"
)
What stays the same
- All OpenAI SDK methods (chat.completions, streaming, function calling)
- Request and response formats
- Error codes and retry logic
- All your existing prompts and tools
FAQ
How is SpendexAI different from OpenRouter?
OpenRouter is a strong option if you want a hosted model access layer across many providers. SpendexAI is different: it is built BYOK-first. You connect your own OpenAI, Anthropic, Google, Mistral, and other provider accounts, keep direct billing with those providers, and use SpendexAI as the routing, retry, and failover layer in front.
How is SpendexAI different from open-source routers like LiteLLM or Portkey Gateway?
Tools like LiteLLM and Portkey can be a good fit if you want to assemble, host, and operate your own routing stack. SpendexAI is for teams that want the outcome without the operational overhead: one endpoint, automatic routing, fallback handling, and a simpler path to production.
Why use SpendexAI instead of building routing in-house?
Because most in-house routers start simple, then grow into provider-specific logic, retries, failover rules, and model mapping spread across the codebase. SpendexAI keeps that logic in one place behind one OpenAI-compatible endpoint.
Do I lose control over model choice?
No. If you need a specific provider or model, you can pin it directly. Smart routing only applies when you choose automatic routing.
What changes in my code?
Usually just two things: the base URL and the API key. Your existing OpenAI-compatible SDK flow stays nearly the same.
Why BYOK first?
Because many teams want better routing without giving up direct provider relationships, billing visibility, and account-level control. BYOK lets you keep ownership of the underlying provider accounts while SpendexAI handles the routing layer.
Who pays the providers in BYOK mode?
You do. In BYOK mode, usage is billed directly to your OpenAI, Anthropic, Google, Mistral, and other connected provider accounts. SpendexAI sits in front as the intelligence and reliability layer.
Is SpendexAI only about cost savings?
No. Cost savings matter, but the bigger value is operational: better model selection, cleaner multi-provider architecture, retries, failover, and one consistent interface for your app.
When should I use SpendexAI instead of OpenRouter?
Use SpendexAI when you want BYOK, direct provider billing, and routing on top of your own accounts. If you want a hosted aggregation layer where the platform sits between you and the providers commercially, OpenRouter may be a better fit.
When should I use an open-source router instead of SpendexAI?
Use an open-source router if your team wants full infrastructure ownership and is comfortable maintaining the routing stack itself. Use SpendexAI if you want the same class of routing outcome with less engineering overhead.
Can I still force one provider only?
Yes. If you only want OpenAI, or only Anthropic, you can keep routing constrained to the providers you connect and choose.
What happens if a provider goes down?
SpendexAI can retry or reroute to another suitable connected provider or model. Your application still talks to one endpoint.
Can I switch back instantly?
Yes. Remove the SpendexAI base URL and point your SDK back to your original provider. There is no lock-in.
Do you store prompts?
Prompt content is not used for product analytics. Routing decisions rely on request signals and the metadata needed to operate the router.
Need Help?
Email us at contact@spendexai.com or book a call.