Welcome to the Keyplex API documentation. Keyplex provides a unified API that gives you access to all major AI models through a single integration point. Instead of managing multiple API keys and different SDK implementations for each provider, you can use one Keyplex API key to access GPT, Claude, Gemini, Grok, Mistral, Llama, and more.
OpenAI Compatible: The Keyplex API is fully compatible with OpenAI's API specification. If you're already using OpenAI, you can switch to Keyplex by simply changing your base URL and API key.
All API requests should be made to:
```
https://keyplex.ai/api/v1
```
Get started with Keyplex in under 5 minutes. Follow these steps to make your first API call.
Sign up at app.keyplex.ai to create your account. No credit card is required to get started.
Navigate to the API Keys section in your dashboard and click Create New Key. Give your key a descriptive name (e.g., "Development", "Production").
Security Notice: Your API key will only be shown once. Store it securely and never expose it in client-side code or public repositories.
```bash
# Make a chat completion request
curl https://keyplex.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer kpx_your_api_key_here" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
```
```python
import requests

# API configuration
API_KEY = "kpx_your_api_key_here"
BASE_URL = "https://keyplex.ai/api/v1"

# Create a chat completion
response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello, world!"}],
        "temperature": 0.7
    }
)

data = response.json()
print(data["choices"][0]["message"]["content"])
```
```javascript
// API configuration
const API_KEY = 'kpx_your_api_key_here';
const BASE_URL = 'https://keyplex.ai/api/v1';

// Create a chat completion
const response = await fetch(`${BASE_URL}/chat/completions`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o-mini',
    messages: [{ role: 'user', content: 'Hello, world!' }],
    temperature: 0.7
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);
```
All API requests require authentication using an API key. Your API key should be included in the Authorization header of every request.
API keys are used to authenticate requests to the Keyplex API. You can create multiple API keys to separate concerns (development, staging, production) or to track usage across different applications.
```
Authorization: Bearer kpx_your_api_key_here
```
API keys follow the format `kpx_` followed by a 48-character alphanumeric string. Example:

```
kpx_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0u1v2w3x4
```
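If you want to sanity-check a key's shape before sending requests, a simple pattern match works. This is an illustrative snippet, not an official SDK helper:

```python
import re

# "kpx_" followed by exactly 48 alphanumeric characters
KEY_PATTERN = re.compile(r"^kpx_[A-Za-z0-9]{48}$")

def looks_like_keyplex_key(key: str) -> bool:
    """Cheap local sanity check before making a request."""
    return bool(KEY_PATTERN.match(key))
```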
You can manage your API keys from the dashboard at app.keyplex.ai/keys.
Each API key can be configured with specific permissions:
| Permission | Description |
|---|---|
| `chat:write` | Create chat completions |
| `completions:write` | Create text completions |
| `embeddings:write` | Generate embeddings |
| `models:read` | List available models |
| `usage:read` | Read usage statistics |
We recommend rotating your API keys periodically for security. You can create a new key before deactivating the old one to ensure zero downtime.
Never expose your API key: Do not include API keys in client-side code, public repositories, or share them publicly. Use environment variables or secret management systems.
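For example, reading the key from an environment variable in Python (a minimal sketch; the `KEYPLEX_API_KEY` variable name is just a convention, not something the API requires):

```python
import os
import requests

# Read the key from the environment instead of hard-coding it
API_KEY = os.environ["KEYPLEX_API_KEY"]

response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
```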
Keyplex provides access to a wide range of AI models from different providers. All models are accessed through the same unified API, making it easy to switch between them.
| Model ID | Context Window | Best For |
|---|---|---|
| `gpt-5.2` | 256K tokens | Most capable, complex reasoning, coding |
| `gpt-5.2-mini` | 128K tokens | Balanced performance and speed |
| `gpt-5.2-nano` | 32K tokens | Fast responses, simple tasks |
| `o3` | 200K tokens | Advanced reasoning, math, science |
| `o3-mini` | 128K tokens | Efficient reasoning tasks |
| Model ID | Context Window | Best For |
|---|---|---|
| `claude-opus-4.5` | 500K tokens | Most capable Claude, complex analysis |
| `claude-sonnet-4.5` | 200K tokens | Balanced performance, general use |
| `claude-haiku-4.5` | 200K tokens | Fast responses, cost-effective |
| Model ID | Context Window | Best For |
|---|---|---|
| `gemini-3-pro` | 2M tokens | Multimodal, long context, research |
| `gemini-3-flash` | 1M tokens | Fast multimodal processing |
| Model ID | Provider | Best For |
|---|---|---|
| `grok-4` | xAI | Real-time knowledge, direct responses |
| `mistral-3-large` | Mistral AI | European AI, enterprise deployments |
| `mistral-3-medium` | Mistral AI | Balanced, cost-effective |
| `llama-4-405b` | Meta | Open-weight, research, customization |
| `llama-4-70b` | Meta | Efficient open-source alternative |
| `deepseek-v3` | DeepSeek | Coding, technical tasks |
Model Availability: All models are available on all subscription plans. Use the /v1/models endpoint to get the current list of available models.
The Keyplex API provides several endpoints for different AI capabilities. All endpoints follow REST conventions and return JSON responses.
POST /v1/chat/completions
Creates a model response for the given chat conversation. This is the most commonly used endpoint for building conversational AI applications.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The model ID to use (e.g., `"gpt-5.2"`, `"claude-opus-4.5"`) |
| `messages` | array | Yes | Array of message objects with `role` and `content` |
| `temperature` | number | No | Sampling temperature (0-2). Default: 1 |
| `max_tokens` | integer | No | Maximum tokens to generate |
| `stream` | boolean | No | Enable streaming responses. Default: false |
| `top_p` | number | No | Nucleus sampling parameter (0-1). Default: 1 |
| `stop` | string/array | No | Stop sequences to end generation |
| `presence_penalty` | number | No | Penalize new tokens (-2 to 2). Default: 0 |
| `frequency_penalty` | number | No | Penalize frequent tokens (-2 to 2). Default: 0 |
| `user` | string | No | Unique identifier for the end user (for analytics) |
| Role | Description |
|---|---|
| `system` | Sets the behavior and context for the assistant |
| `user` | Messages from the user |
| `assistant` | Previous responses from the assistant |
```json
{
  "model": "claude-opus-4.5",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful coding assistant."
    },
    {
      "role": "user",
      "content": "Write a Python function to calculate fibonacci numbers."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000
}
```
```json
{
  "id": "chatcmpl-abc123xyz789",
  "object": "chat.completion",
  "created": 1733745600,
  "model": "claude-opus-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a Python function to calculate Fibonacci numbers:\n\n```python\ndef fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)\n```"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 156,
    "total_tokens": 184
  }
}
```
POST /v1/completions
Creates a completion for the provided prompt. This is a legacy endpoint; we recommend using Chat Completions for most use cases.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The model ID to use |
| `prompt` | string/array | Yes | The prompt(s) to generate completions for |
| `max_tokens` | integer | No | Maximum tokens to generate. Default: 16 |
| `temperature` | number | No | Sampling temperature (0-2). Default: 1 |
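A minimal request to this endpoint might look like the following sketch, assuming the OpenAI-style `choices[0].text` response shape implied by the compatibility note above:

```python
import requests

response = requests.post(
    "https://keyplex.ai/api/v1/completions",
    headers={"Authorization": "Bearer kpx_your_api_key_here"},
    json={
        "model": "openai/gpt-4o-mini",
        "prompt": "Once upon a time",
        "max_tokens": 50,
    },
)

print(response.json()["choices"][0]["text"])
```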
POST /v1/embeddings
Creates an embedding vector representing the input text. Useful for semantic search, clustering, and similarity comparisons.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The embedding model (e.g., `"text-embedding-3-large"`) |
| `input` | string/array | Yes | Input text(s) to embed |
| `encoding_format` | string | No | `"float"` or `"base64"`. Default: `"float"` |
| `dimensions` | integer | No | Number of dimensions for the output |
| Model ID | Dimensions | Max Input |
|---|---|---|
| `text-embedding-3-large` | 3072 | 8191 tokens |
| `text-embedding-3-small` | 1536 | 8191 tokens |
| `voyage-3` | 1024 | 32000 tokens |
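To illustrate the similarity use case, here is a sketch that embeds two strings and compares them with cosine similarity, assuming the OpenAI-style `data[i].embedding` response shape:

```python
import math
import requests

def embed(texts):
    """Return one embedding vector per input string."""
    response = requests.post(
        "https://keyplex.ai/api/v1/embeddings",
        headers={"Authorization": "Bearer kpx_your_api_key_here"},
        json={"model": "text-embedding-3-small", "input": texts},
    )
    return [item["embedding"] for item in response.json()["data"]]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v1, v2 = embed(["I love programming", "Coding is my passion"])
print(f"similarity: {cosine(v1, v2):.3f}")
```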
GET /v1/models
Returns a list of all available models.
```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-5.2",
      "object": "model",
      "created": 1733745600,
      "owned_by": "openai",
      "context_window": 256000
    },
    {
      "id": "claude-opus-4.5",
      "object": "model",
      "created": 1733745600,
      "owned_by": "anthropic",
      "context_window": 500000
    }
    // ... more models
  ]
}
```
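For example, fetching and printing this list with `requests` (a minimal sketch):

```python
import requests

response = requests.get(
    "https://keyplex.ai/api/v1/models",
    headers={"Authorization": "Bearer kpx_your_api_key_here"},
)

# Each entry carries the model ID and its context window, as shown above
for model in response.json()["data"]:
    print(model["id"], model["context_window"])
```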
Keyplex supports streaming responses using Server-Sent Events (SSE). This allows you to receive partial responses as they're generated, providing a better user experience for real-time applications.
Set `stream: true` in your request to enable streaming:
```json
{
  "model": "gpt-5.2",
  "messages": [...],
  "stream": true
}
```
```python
import requests

# Streaming request with the requests library
response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer kpx_your_api_key_here",
        "Content-Type": "application/json"
    },
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
```
```javascript
const response = await fetch('https://keyplex.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer kpx_your_api_key_here',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o-mini',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true
  })
});

const reader = response.body.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(new TextDecoder().decode(value));
}
```
Each chunk in the stream follows this format:
```
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" world"}}]}

data: [DONE]
```
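To reassemble the full message, strip the `data: ` prefix from each line, stop at `[DONE]`, and concatenate the `delta.content` fields. A minimal Python sketch building on the streaming example above:

```python
import json
import requests

response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer kpx_your_api_key_here"},
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True,
    },
    stream=True,
)

full_text = ""
for line in response.iter_lines():
    if not line:
        continue
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":
        break
    delta = json.loads(payload)["choices"][0]["delta"]
    full_text += delta.get("content", "")  # some chunks may omit content

print(full_text)
```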
The Keyplex dashboard at app.keyplex.ai provides comprehensive analytics and management tools for your API usage.
Monitor your API usage in real time with detailed metrics.
GET /v1/usage
Retrieve your usage statistics programmatically:
```json
{
  "object": "usage",
  "period": {
    "start": "2025-12-01T00:00:00Z",
    "end": "2025-12-09T23:59:59Z"
  },
  "data": {
    "total_requests": 15420,
    "total_tokens": 2845000,
    "prompt_tokens": 1250000,
    "completion_tokens": 1595000,
    "models": {
      "gpt-5.2": { "requests": 8500, "tokens": 1500000 },
      "claude-opus-4.5": { "requests": 4200, "tokens": 980000 },
      "gemini-3-pro": { "requests": 2720, "tokens": 365000 }
    }
  }
}
```
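Requests to this endpoint are plain authenticated GETs; a minimal sketch:

```python
import requests

response = requests.get(
    "https://keyplex.ai/api/v1/usage",
    headers={"Authorization": "Bearer kpx_your_api_key_here"},
)

usage = response.json()["data"]
print(f"{usage['total_requests']} requests, {usage['total_tokens']} tokens this period")
```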
With Keyplex's fixed pricing model, you always know your monthly cost upfront; the dashboard shows how your usage tracks against your plan's limits for the current period.

The dashboard also provides access to detailed request logs for debugging and auditing.
Log Retention: Request logs are retained for 30 days. Enterprise plans include extended retention options.
To ensure fair usage and service stability, Keyplex enforces rate limits on API requests. These limits are designed to accommodate typical development and production workloads.
| Plan | Requests/min | Tokens/min | Tokens/day |
|---|---|---|---|
| Free Trial | 20 | 40,000 | 100,000 |
| Monthly | 60 | 150,000 | 1,000,000 |
| 3 Months | 100 | 250,000 | 2,000,000 |
| 12 Months | 200 | 500,000 | 5,000,000 |
| Enterprise | Custom | Custom | Custom |
Every API response includes headers indicating your current rate limit status:
| Header | Description |
|---|---|
| `X-RateLimit-Limit-Requests` | Maximum requests per minute |
| `X-RateLimit-Remaining-Requests` | Remaining requests in current window |
| `X-RateLimit-Limit-Tokens` | Maximum tokens per minute |
| `X-RateLimit-Remaining-Tokens` | Remaining tokens in current window |
| `X-RateLimit-Reset-Requests` | Timestamp when the request limit resets |
| `X-RateLimit-Reset-Tokens` | Timestamp when the token limit resets |
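You can read these headers from any response to throttle proactively before hitting a 429; a minimal sketch:

```python
import requests

response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer kpx_your_api_key_here"},
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)

# Inspect the remaining budget before sending the next request
remaining = int(response.headers["X-RateLimit-Remaining-Requests"])
if remaining < 5:
    print("Close to the per-minute request limit; consider pausing")
```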
When you exceed rate limits, you'll receive a 429 Too Many Requests response. Implement exponential backoff:
```python
import random
import time

import keyplex

def make_request_with_retry(client, messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.2",
                messages=messages
            )
        except keyplex.RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: 1s, 2s, 4s, ... plus random noise
            wait_time = (2 ** attempt) + random.random()
            time.sleep(wait_time)
```
The Keyplex API uses standard HTTP response codes and returns detailed error messages in JSON format.
| Code | Meaning | Description |
|---|---|---|
| `200` | OK | Request succeeded |
| `400` | Bad Request | Invalid request parameters |
| `401` | Unauthorized | Invalid or missing API key |
| `403` | Forbidden | API key lacks required permissions |
| `404` | Not Found | Resource not found |
| `429` | Too Many Requests | Rate limit exceeded |
| `500` | Internal Server Error | Server error (retry with backoff) |
| `503` | Service Unavailable | Temporary overload (retry later) |
```json
{
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_model",
    "message": "The model 'gpt-999' does not exist.",
    "param": "model",
    "request_id": "req_abc123xyz789"
  }
}
```
| Type | Description |
|---|---|
| `invalid_request_error` | Request parameters are invalid |
| `authentication_error` | API key is invalid or missing |
| `permission_error` | API key lacks required permissions |
| `rate_limit_error` | Rate limit has been exceeded |
| `model_error` | Model is unavailable or invalid |
| `context_length_error` | Input exceeds model's context window |
| `server_error` | Internal server error |
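A sketch of branching on the `type` field from the error body shown above, using plain `requests`:

```python
import requests

response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer kpx_your_api_key_here"},
    json={
        "model": "gpt-999",  # intentionally invalid to trigger an error
        "messages": [{"role": "user", "content": "Hi"}],
    },
)

if response.status_code != 200:
    error = response.json()["error"]
    if error["type"] == "rate_limit_error":
        print("Rate limited; back off and retry")
    elif error["type"] == "context_length_error":
        print("Trim the conversation history and retry")
    else:
        print(f"{error['type']}: {error['message']} (request {error['request_id']})")
```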
Keyplex provides official SDKs for popular programming languages. All SDKs are open-source and available on GitHub.
| Language | Package | Installation |
|---|---|---|
| Python | `keyplex` | `pip install keyplex` |
| JavaScript/Node.js | `keyplex` | `npm install keyplex` |
| Go | `keyplex-go` | `go get github.com/keyplex/keyplex-go` |
| Ruby | `keyplex` | `gem install keyplex` |
| PHP | `keyplex/keyplex-php` | `composer require keyplex/keyplex-php` |
| Rust | `keyplex` | `cargo add keyplex` |
You can use the official OpenAI SDK with Keyplex by changing the base URL:
```python
from openai import OpenAI

client = OpenAI(
    api_key="kpx_your_api_key_here",
    base_url="https://keyplex.ai/api/v1"
)

# Use exactly as you would with OpenAI
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'kpx_your_api_key_here',
  baseURL: 'https://keyplex.ai/api/v1'
});

const response = await client.chat.completions.create({
  model: 'openai/gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello!' }]
});
```
The community has built additional SDKs for other languages:
- `KeyplexSwift` - iOS/macOS applications
- `keyplex-kotlin` - Android applications
- `Keyplex.NET` - .NET applications
- `keyplex-java` - Java applications

Keyplex can send webhook notifications for important events related to your account and API usage.
| Event | Description |
|---|---|
| `usage.threshold_reached` | When usage reaches 80% or 90% of limits |
| `usage.limit_exceeded` | When a usage limit is exceeded |
| `api_key.created` | When a new API key is created |
| `api_key.revoked` | When an API key is revoked |
| `subscription.renewed` | When a subscription is renewed |
| `subscription.expired` | When a subscription expires |
```json
{
  "id": "evt_abc123xyz789",
  "type": "usage.threshold_reached",
  "created": 1733745600,
  "data": {
    "threshold": 80,
    "current_usage": 820000,
    "limit": 1000000,
    "period_end": "2025-12-31T23:59:59Z"
  }
}
```
All webhook requests include a signature header for verification:
```
X-Keyplex-Signature: sha256=abc123...
```
Verify the signature in your webhook handler:
```python
import hmac
import hashlib

def verify_webhook(payload, signature, secret):
    expected = hmac.new(
        secret.encode(),
        payload.encode(),
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)
```
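To wire this into an HTTP endpoint, a handler might look like the following sketch. Flask and the `KEYPLEX_WEBHOOK_SECRET` environment variable are illustrative assumptions, and `verify_webhook` is the function defined above; any framework that exposes the raw request body works the same way:

```python
import os

from flask import Flask, abort, request

app = Flask(__name__)

# Assumed: the signing secret from your dashboard, stored in the environment
WEBHOOK_SECRET = os.environ["KEYPLEX_WEBHOOK_SECRET"]

@app.route("/webhooks/keyplex", methods=["POST"])
def keyplex_webhook():
    signature = request.headers.get("X-Keyplex-Signature", "")
    payload = request.get_data(as_text=True)
    if not verify_webhook(payload, signature, WEBHOOK_SECRET):
        abort(401)  # reject events whose signature does not match
    event = request.get_json()
    print("Received event:", event["type"])
    return "", 200
```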
Follow these recommendations to get the most out of the Keyplex API.
Migrating to Keyplex from other providers is straightforward thanks to our OpenAI-compatible API.
If you're using the OpenAI SDK, migration takes less than 5 minutes:
Before (OpenAI):

```python
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # OpenAI key

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...]
)
```
After (Keyplex):

```python
from openai import OpenAI

client = OpenAI(
    api_key="kpx_...",  # Keyplex key
    base_url="https://keyplex.ai/api/v1"  # Add this line
)

response = client.chat.completions.create(
    model="openai/gpt-4o",  # Use OpenRouter model IDs
    messages=[...]
)
```
Migrating from Anthropic's Claude API requires changing the API format:
Before (Anthropic SDK):

```python
from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
```
After (Keyplex):

```python
import requests

response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer kpx_...",
        "Content-Type": "application/json"
    },
    json={
        "model": "anthropic/claude-3.5-sonnet",  # OpenRouter model ID
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello"}]
    }
)
```
Keyplex uses OpenRouter model IDs. Here are some popular models:
| Provider | Model Name | Keyplex Model ID (OpenRouter) |
|---|---|---|
| OpenAI | GPT-4o | openai/gpt-4o |
| OpenAI | GPT-4o Mini | openai/gpt-4o-mini |
| OpenAI | GPT-4 Turbo | openai/gpt-4-turbo |
| OpenAI | o1 | openai/o1 |
| Anthropic | Claude 3.5 Sonnet | anthropic/claude-3.5-sonnet |
| Anthropic | Claude 3.5 Haiku | anthropic/claude-3.5-haiku |
| Anthropic | Claude 3 Opus | anthropic/claude-3-opus |
| Google | Gemini 2.0 Flash | google/gemini-2.0-flash-exp:free |
| Google | Gemini Pro 1.5 | google/gemini-pro-1.5 |
| Mistral | Mistral Large | mistralai/mistral-large |
| Meta | Llama 3.1 405B | meta-llama/llama-3.1-405b-instruct |
| DeepSeek | DeepSeek Chat | deepseek/deepseek-chat |
Full Model List: Use the GET /v1/models endpoint to retrieve the complete list of available models with their IDs.
We're here to help you succeed with Keyplex.
Check the current API status and subscribe to updates at status.keyplex.ai.
Enterprise customers receive additional benefits such as custom rate limits and extended log retention. Contact sales@keyplex.ai for enterprise plans.
Ready to get started? Create your free account and get your API key in seconds.