Welcome to the Keyplex API documentation. Keyplex provides a unified API that gives you access to all major AI models through a single integration point. Instead of managing multiple API keys and different SDK implementations for each provider, you can use one Keyplex API key to access GPT, Claude, Gemini, Grok, Mistral, Llama, and more.
OpenAI Compatible: The Keyplex API is fully compatible with OpenAI's API specification. If you're already using OpenAI, you can switch to Keyplex by simply changing your base URL and API key.
All API requests should be made to:
```
https://keyplex.ai/api/v1
```
Get started with Keyplex in under 5 minutes. Follow these steps to make your first API call.
Sign up at app.keyplex.ai to create your account. No credit card is required to get started.
Navigate to the API Keys section in your dashboard and click Create New Key. Give your key a descriptive name (e.g., "Development", "Production").
Security Notice: Your API key will only be shown once. Store it securely and never expose it in client-side code or public repositories.
```bash
# Make a chat completion request
curl https://keyplex.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer kpx_your_api_key_here" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
```
```python
import requests

# API configuration
API_KEY = "kpx_your_api_key_here"
BASE_URL = "https://keyplex.ai/api/v1"

# Create a chat completion
response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello, world!"}],
        "temperature": 0.7
    }
)

data = response.json()
print(data["choices"][0]["message"]["content"])
```
```javascript
// API configuration
const API_KEY = 'kpx_your_api_key_here';
const BASE_URL = 'https://keyplex.ai/api/v1';

// Create a chat completion
const response = await fetch(`${BASE_URL}/chat/completions`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o-mini',
    messages: [{ role: 'user', content: 'Hello, world!' }],
    temperature: 0.7
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);
```
All API requests require authentication using an API key. Your API key should be included in the Authorization header of every request.
API keys are used to authenticate requests to the Keyplex API. You can create multiple API keys to separate concerns (development, staging, production) or to track usage across different applications.
```
Authorization: Bearer kpx_your_api_key_here
```
API keys follow the format `kpx_` followed by a 48-character alphanumeric string. Example:

```
kpx_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0u1v2w3x4
```
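If you want to sanity-check a key's shape before sending requests, a simple pattern match works. This is an illustrative snippet, not an official SDK helper:

```python
import re

# "kpx_" followed by exactly 48 alphanumeric characters
KEY_PATTERN = re.compile(r"^kpx_[A-Za-z0-9]{48}$")

def looks_like_keyplex_key(key: str) -> bool:
    """Cheap local sanity check before making a request."""
    return bool(KEY_PATTERN.match(key))
```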
You can manage your API keys from the dashboard at app.keyplex.ai/keys.
Each API key can be configured with specific permissions:
| Permission | Description |
|---|---|
| `chat:write` | Create chat completions |
| `completions:write` | Create text completions |
| `embeddings:write` | Generate embeddings |
| `models:read` | List available models |
| `usage:read` | Read usage statistics |
We recommend rotating your API keys periodically for security. You can create a new key before deactivating the old one to ensure zero downtime.
Never expose your API key: Do not include API keys in client-side code, public repositories, or share them publicly. Use environment variables or secret management systems.
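For example, reading the key from an environment variable in Python (a minimal sketch; the `KEYPLEX_API_KEY` variable name is just a convention, not something the API requires):

```python
import os
import requests

# Read the key from the environment instead of hard-coding it
API_KEY = os.environ["KEYPLEX_API_KEY"]

response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
```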
Keyplex provides access to a wide range of AI models from different providers. All models are accessed through the same unified API, making it easy to switch between them.
| Model ID | Context Window | Best For |
|---|---|---|
| `gpt-5.2` | 256K tokens | Most capable, complex reasoning, coding |
| `gpt-5.2-mini` | 128K tokens | Balanced performance and speed |
| `gpt-5.2-nano` | 32K tokens | Fast responses, simple tasks |
| `o3` | 200K tokens | Advanced reasoning, math, science |
| `o3-mini` | 128K tokens | Efficient reasoning tasks |
| Model ID | Context Window | Best For |
|---|---|---|
| `claude-opus-4.5` | 500K tokens | Most capable Claude, complex analysis |
| `claude-sonnet-4.5` | 200K tokens | Balanced performance, general use |
| `claude-haiku-4.5` | 200K tokens | Fast responses, cost-effective |
| Model ID | Context Window | Best For |
|---|---|---|
| `gemini-3-pro` | 2M tokens | Multimodal, long context, research |
| `gemini-3-flash` | 1M tokens | Fast multimodal processing |
| Model ID | Provider | Best For |
|---|---|---|
| `grok-4` | xAI | Real-time knowledge, direct responses |
| `mistral-3-large` | Mistral AI | European AI, enterprise deployments |
| `mistral-3-medium` | Mistral AI | Balanced, cost-effective |
| `llama-4-405b` | Meta | Open-weight, research, customization |
| `llama-4-70b` | Meta | Efficient open-source alternative |
| `deepseek-v3` | DeepSeek | Coding, technical tasks |
Model Availability: All models are available on all subscription plans. Use the /v1/models endpoint to get the current list of available models.
The Keyplex API provides several endpoints for different AI capabilities. All endpoints follow REST conventions and return JSON responses.
POST /v1/chat/completions
Creates a model response for the given chat conversation. This is the most commonly used endpoint for building conversational AI applications.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The model ID to use (e.g., `"gpt-5.2"`, `"claude-opus-4.5"`) |
| `messages` | array | Yes | Array of message objects with `role` and `content` |
| `temperature` | number | No | Sampling temperature (0-2). Default: 1 |
| `max_tokens` | integer | No | Maximum tokens to generate |
| `stream` | boolean | No | Enable streaming responses. Default: false |
| `top_p` | number | No | Nucleus sampling parameter (0-1). Default: 1 |
| `stop` | string/array | No | Stop sequences to end generation |
| `presence_penalty` | number | No | Penalize new tokens (-2 to 2). Default: 0 |
| `frequency_penalty` | number | No | Penalize frequent tokens (-2 to 2). Default: 0 |
| `user` | string | No | Unique identifier for the end user (for analytics) |
| Role | Description |
|---|---|
| `system` | Sets the behavior and context for the assistant |
| `user` | Messages from the user |
| `assistant` | Previous responses from the assistant |
```json
{
  "model": "claude-opus-4.5",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful coding assistant."
    },
    {
      "role": "user",
      "content": "Write a Python function to calculate fibonacci numbers."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000
}
```
```json
{
  "id": "chatcmpl-abc123xyz789",
  "object": "chat.completion",
  "created": 1733745600,
  "model": "claude-opus-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a Python function to calculate Fibonacci numbers:\n\n```python\ndef fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)\n```"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 156,
    "total_tokens": 184
  }
}
```
POST /v1/completions
Creates a completion for the provided prompt. This is a legacy endpoint; we recommend using Chat Completions for most use cases.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The model ID to use |
| `prompt` | string/array | Yes | The prompt(s) to generate completions for |
| `max_tokens` | integer | No | Maximum tokens to generate. Default: 16 |
| `temperature` | number | No | Sampling temperature (0-2). Default: 1 |
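A minimal request to this endpoint might look like the following sketch, assuming the OpenAI-style `choices[0].text` response shape implied by the compatibility note above:

```python
import requests

response = requests.post(
    "https://keyplex.ai/api/v1/completions",
    headers={"Authorization": "Bearer kpx_your_api_key_here"},
    json={
        "model": "openai/gpt-4o-mini",
        "prompt": "Once upon a time",
        "max_tokens": 50,
    },
)

print(response.json()["choices"][0]["text"])
```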
POST /v1/embeddings
Creates an embedding vector representing the input text. Useful for semantic search, clustering, and similarity comparisons.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The embedding model (e.g., `"text-embedding-3-large"`) |
| `input` | string/array | Yes | Input text(s) to embed |
| `encoding_format` | string | No | `"float"` or `"base64"`. Default: `"float"` |
| `dimensions` | integer | No | Number of dimensions for the output |
| Model ID | Dimensions | Max Input |
|---|---|---|
| `text-embedding-3-large` | 3072 | 8191 tokens |
| `text-embedding-3-small` | 1536 | 8191 tokens |
| `voyage-3` | 1024 | 32000 tokens |
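To illustrate the similarity use case, here is a sketch that embeds two strings and compares them with cosine similarity, assuming the OpenAI-style `data[i].embedding` response shape:

```python
import math
import requests

def embed(texts):
    """Return one embedding vector per input string."""
    response = requests.post(
        "https://keyplex.ai/api/v1/embeddings",
        headers={"Authorization": "Bearer kpx_your_api_key_here"},
        json={"model": "text-embedding-3-small", "input": texts},
    )
    return [item["embedding"] for item in response.json()["data"]]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v1, v2 = embed(["I love programming", "Coding is my passion"])
print(f"similarity: {cosine(v1, v2):.3f}")
```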
GET /v1/models
Returns a list of all available models.
```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-5.2",
      "object": "model",
      "created": 1733745600,
      "owned_by": "openai",
      "context_window": 256000
    },
    {
      "id": "claude-opus-4.5",
      "object": "model",
      "created": 1733745600,
      "owned_by": "anthropic",
      "context_window": 500000
    }
    // ... more models
  ]
}
```
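For example, fetching and printing this list with `requests` (a minimal sketch):

```python
import requests

response = requests.get(
    "https://keyplex.ai/api/v1/models",
    headers={"Authorization": "Bearer kpx_your_api_key_here"},
)

# Each entry carries the model ID and its context window, as shown above
for model in response.json()["data"]:
    print(model["id"], model["context_window"])
```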
Keyplex supports streaming responses using Server-Sent Events (SSE). This allows you to receive partial responses as they're generated, providing a better user experience for real-time applications.
Set `stream: true` in your request to enable streaming:
```json
{
  "model": "gpt-5.2",
  "messages": [...],
  "stream": true
}
```
```python
import requests

# Streaming request with the requests library
response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer kpx_your_api_key_here",
        "Content-Type": "application/json"
    },
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
```
```javascript
const response = await fetch('https://keyplex.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer kpx_your_api_key_here',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o-mini',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true
  })
});

const reader = response.body.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(new TextDecoder().decode(value));
}
```
Each chunk in the stream follows this format:
```
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" world"}}]}

data: [DONE]
```
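To reassemble the full message, strip the `data: ` prefix from each line, stop at `[DONE]`, and concatenate the `delta.content` fields. A minimal Python sketch building on the streaming example above:

```python
import json
import requests

response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer kpx_your_api_key_here"},
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True,
    },
    stream=True,
)

full_text = ""
for line in response.iter_lines():
    if not line:
        continue
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":
        break
    delta = json.loads(payload)["choices"][0]["delta"]
    full_text += delta.get("content", "")  # some chunks may omit content

print(full_text)
```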
The Keyplex dashboard at app.keyplex.ai provides comprehensive analytics and management tools for your API usage.
Monitor your API usage in real time with detailed metrics.
GET /v1/usage
Retrieve your usage statistics programmatically:
```json
{
  "object": "usage",
  "period": {
    "start": "2025-12-01T00:00:00Z",
    "end": "2025-12-09T23:59:59Z"
  },
  "data": {
    "total_requests": 15420,
    "total_tokens": 2845000,
    "prompt_tokens": 1250000,
    "completion_tokens": 1595000,
    "models": {
      "gpt-5.2": { "requests": 8500, "tokens": 1500000 },
      "claude-opus-4.5": { "requests": 4200, "tokens": 980000 },
      "gemini-3-pro": { "requests": 2720, "tokens": 365000 }
    }
  }
}
```
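Requests to this endpoint are plain authenticated GETs; a minimal sketch:

```python
import requests

response = requests.get(
    "https://keyplex.ai/api/v1/usage",
    headers={"Authorization": "Bearer kpx_your_api_key_here"},
)

usage = response.json()["data"]
print(f"{usage['total_requests']} requests, {usage['total_tokens']} tokens this period")
```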
With Keyplex's fixed pricing model, you always know your monthly cost upfront; the dashboard shows how your usage tracks against your plan's limits for the current period.

The dashboard also provides access to detailed request logs for debugging and auditing.
Log Retention: Request logs are retained for 30 days. Enterprise plans include extended retention options.
To ensure fair usage and service stability, Keyplex enforces rate limits on API requests. These limits are designed to accommodate typical development and production workloads.
| Plan | Requests/min | Tokens/min | Tokens/day |
|---|---|---|---|
| Free Trial | 20 | 40,000 | 100,000 |
| Monthly | 60 | 150,000 | 1,000,000 |
| 3 Months | 100 | 250,000 | 2,000,000 |
| 12 Months | 200 | 500,000 | 5,000,000 |
| Enterprise | Custom | Custom | Custom |
Every API response includes headers indicating your current rate limit status:
| Header | Description |
|---|---|
| `X-RateLimit-Limit-Requests` | Maximum requests per minute |
| `X-RateLimit-Remaining-Requests` | Remaining requests in current window |
| `X-RateLimit-Limit-Tokens` | Maximum tokens per minute |
| `X-RateLimit-Remaining-Tokens` | Remaining tokens in current window |
| `X-RateLimit-Reset-Requests` | Timestamp when the request limit resets |
| `X-RateLimit-Reset-Tokens` | Timestamp when the token limit resets |
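You can read these headers from any response to throttle proactively before hitting a 429; a minimal sketch:

```python
import requests

response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer kpx_your_api_key_here"},
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)

# Inspect the remaining budget before sending the next request
remaining = int(response.headers["X-RateLimit-Remaining-Requests"])
if remaining < 5:
    print("Close to the per-minute request limit; consider pausing")
```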
When you exceed rate limits, you'll receive a 429 Too Many Requests response. Implement exponential backoff:
```python
import random
import time

import keyplex

def make_request_with_retry(client, messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.2",
                messages=messages
            )
        except keyplex.RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: 1s, 2s, 4s, ... plus random noise
            wait_time = (2 ** attempt) + random.random()
            time.sleep(wait_time)
```
The Keyplex API uses standard HTTP response codes and returns detailed error messages in JSON format.
| Code | Meaning | Description |
|---|---|---|
| `200` | OK | Request succeeded |
| `400` | Bad Request | Invalid request parameters |
| `401` | Unauthorized | Invalid or missing API key |
| `403` | Forbidden | API key lacks required permissions |
| `404` | Not Found | Resource not found |
| `429` | Too Many Requests | Rate limit exceeded |
| `500` | Internal Server Error | Server error (retry with backoff) |
| `503` | Service Unavailable | Temporary overload (retry later) |
```json
{
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_model",
    "message": "The model 'gpt-999' does not exist.",
    "param": "model",
    "request_id": "req_abc123xyz789"
  }
}
```
| Type | Description |
|---|---|
| `invalid_request_error` | Request parameters are invalid |
| `authentication_error` | API key is invalid or missing |
| `permission_error` | API key lacks required permissions |
| `rate_limit_error` | Rate limit has been exceeded |
| `model_error` | Model is unavailable or invalid |
| `context_length_error` | Input exceeds model's context window |
| `server_error` | Internal server error |
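A sketch of branching on the `type` field from the error body shown above, using plain `requests`:

```python
import requests

response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer kpx_your_api_key_here"},
    json={
        "model": "gpt-999",  # intentionally invalid to trigger an error
        "messages": [{"role": "user", "content": "Hi"}],
    },
)

if response.status_code != 200:
    error = response.json()["error"]
    if error["type"] == "rate_limit_error":
        print("Rate limited; back off and retry")
    elif error["type"] == "context_length_error":
        print("Trim the conversation history and retry")
    else:
        print(f"{error['type']}: {error['message']} (request {error['request_id']})")
```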
Keyplex provides official SDKs for popular programming languages. All SDKs are open-source and available on GitHub.
| Language | Package | Installation |
|---|---|---|
| Python | `keyplex` | `pip install keyplex` |
| JavaScript/Node.js | `keyplex` | `npm install keyplex` |
| Go | `keyplex-go` | `go get github.com/keyplex/keyplex-go` |
| Ruby | `keyplex` | `gem install keyplex` |
| PHP | `keyplex/keyplex-php` | `composer require keyplex/keyplex-php` |
| Rust | `keyplex` | `cargo add keyplex` |
You can use the official OpenAI SDK with Keyplex by changing the base URL:
```python
from openai import OpenAI

client = OpenAI(
    api_key="kpx_your_api_key_here",
    base_url="https://keyplex.ai/api/v1"
)

# Use exactly as you would with OpenAI
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'kpx_your_api_key_here',
  baseURL: 'https://keyplex.ai/api/v1'
});

const response = await client.chat.completions.create({
  model: 'openai/gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello!' }]
});
```
The community has built additional SDKs for other languages:
- `KeyplexSwift` - iOS/macOS applications
- `keyplex-kotlin` - Android applications
- `Keyplex.NET` - .NET applications
- `keyplex-java` - Java applications

Keyplex can send webhook notifications for important events related to your account and API usage.
| Event | Description |
|---|---|
| `usage.threshold_reached` | When usage reaches 80% or 90% of limits |
| `usage.limit_exceeded` | When a usage limit is exceeded |
| `api_key.created` | When a new API key is created |
| `api_key.revoked` | When an API key is revoked |
| `subscription.renewed` | When a subscription is renewed |
| `subscription.expired` | When a subscription expires |
```json
{
  "id": "evt_abc123xyz789",
  "type": "usage.threshold_reached",
  "created": 1733745600,
  "data": {
    "threshold": 80,
    "current_usage": 820000,
    "limit": 1000000,
    "period_end": "2025-12-31T23:59:59Z"
  }
}
```
All webhook requests include a signature header for verification:
```
X-Keyplex-Signature: sha256=abc123...
```
Verify the signature in your webhook handler:
```python
import hmac
import hashlib

def verify_webhook(payload, signature, secret):
    expected = hmac.new(
        secret.encode(),
        payload.encode(),
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)
```
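To wire this into an HTTP endpoint, a handler might look like the following sketch. Flask and the `KEYPLEX_WEBHOOK_SECRET` environment variable are illustrative assumptions, and `verify_webhook` is the function defined above; any framework that exposes the raw request body works the same way:

```python
import os

from flask import Flask, abort, request

app = Flask(__name__)

# Assumed: the signing secret from your dashboard, stored in the environment
WEBHOOK_SECRET = os.environ["KEYPLEX_WEBHOOK_SECRET"]

@app.route("/webhooks/keyplex", methods=["POST"])
def keyplex_webhook():
    signature = request.headers.get("X-Keyplex-Signature", "")
    payload = request.get_data(as_text=True)
    if not verify_webhook(payload, signature, WEBHOOK_SECRET):
        abort(401)  # reject events whose signature does not match
    event = request.get_json()
    print("Received event:", event["type"])
    return "", 200
```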
Follow these recommendations to get the most out of the Keyplex API.
Migrating to Keyplex from other providers is straightforward thanks to our OpenAI-compatible API.
If you're using the OpenAI SDK, migration takes less than 5 minutes:
Before (OpenAI):

```python
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # OpenAI key

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...]
)
```
After (Keyplex):

```python
from openai import OpenAI

client = OpenAI(
    api_key="kpx_...",  # Keyplex key
    base_url="https://keyplex.ai/api/v1"  # Add this line
)

response = client.chat.completions.create(
    model="openai/gpt-4o",  # Use OpenRouter model IDs
    messages=[...]
)
```
Migrating from Anthropic's Claude API requires changing the API format:
Before (Anthropic SDK):

```python
from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
```
After (Keyplex):

```python
import requests

response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer kpx_...",
        "Content-Type": "application/json"
    },
    json={
        "model": "anthropic/claude-3.5-sonnet",  # OpenRouter model ID
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello"}]
    }
)
```
Keyplex uses OpenRouter model IDs. Here are some popular models:
| Provider | Model Name | Keyplex Model ID (OpenRouter) |
|---|---|---|
| OpenAI | GPT-4o | openai/gpt-4o |
| OpenAI | GPT-4o Mini | openai/gpt-4o-mini |
| OpenAI | GPT-4 Turbo | openai/gpt-4-turbo |
| OpenAI | o1 | openai/o1 |
| Anthropic | Claude 3.5 Sonnet | anthropic/claude-3.5-sonnet |
| Anthropic | Claude 3.5 Haiku | anthropic/claude-3.5-haiku |
| Anthropic | Claude 3 Opus | anthropic/claude-3-opus |
| Google | Gemini 2.0 Flash | google/gemini-2.0-flash-exp:free |
| Google | Gemini Pro 1.5 | google/gemini-pro-1.5 |
| Mistral | Mistral Large | mistralai/mistral-large |
| Meta | Llama 3.1 405B | meta-llama/llama-3.1-405b-instruct |
| DeepSeek | DeepSeek Chat | deepseek/deepseek-chat |
Full Model List: Use the GET /v1/models endpoint to retrieve the complete list of available models with their IDs.
We're here to help you succeed with Keyplex.
Check the current API status and subscribe to updates at status.keyplex.ai.
Enterprise customers receive additional benefits such as custom rate limits and extended log retention. Contact sales@keyplex.ai for enterprise plans.
Ready to get started? Create your free account and get your API key in seconds.