Introduction

Welcome to the Keyplex API documentation. Keyplex provides a unified API that gives you access to all major AI models through a single integration point. Instead of managing multiple API keys and different SDK implementations for each provider, you can use one Keyplex API key to access GPT, Claude, Gemini, Grok, Mistral, Llama, and more.

OpenAI Compatible: The Keyplex API is fully compatible with OpenAI's API specification. If you're already using OpenAI, you can switch to Keyplex by simply changing your base URL and API key.

Key Features

  • Unified Access: One API key for all major AI models
  • OpenAI Compatible: Drop-in replacement for OpenAI SDK
  • Model Switching: Change models with a single parameter
  • Fixed Pricing: Predictable monthly costs, no per-token billing
  • Real-time Analytics: Monitor usage and performance in your dashboard
  • 99.9% Uptime SLA: Enterprise-grade reliability

Base URL

All API requests should be made to:

https://keyplex.ai/api/v1

Quick Start

Get started with Keyplex in under 5 minutes. Follow these steps to make your first API call.

Step 1: Create an Account

Sign up at app.keyplex.ai to create your account. No credit card is required to get started.

Step 2: Generate an API Key

Navigate to the API Keys section in your dashboard and click Create New Key. Give your key a descriptive name (e.g., "Development", "Production").

Security Notice: Your API key will only be shown once. Store it securely and never expose it in client-side code or public repositories.

Step 3: Make Your First Request

Using cURL

# Make a chat completion request
curl https://keyplex.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer kpx_your_api_key_here" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'

Using Python

import requests

# API Configuration
API_KEY = "kpx_your_api_key_here"
BASE_URL = "https://keyplex.ai/api/v1"

# Create a chat completion
response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello, world!"}],
        "temperature": 0.7
    }
)

data = response.json()
print(data["choices"][0]["message"]["content"])

Using JavaScript/Node.js

// API Configuration
const API_KEY = 'kpx_your_api_key_here';
const BASE_URL = 'https://keyplex.ai/api/v1';

// Create a chat completion
const response = await fetch(`${BASE_URL}/chat/completions`, {
    method: 'POST',
    headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        model: 'openai/gpt-4o-mini',
        messages: [{ role: 'user', content: 'Hello, world!' }],
        temperature: 0.7
    })
});

const data = await response.json();
console.log(data.choices[0].message.content);

Authentication

All API requests require authentication using an API key. Your API key should be included in the Authorization header of every request.

API Keys

API keys are used to authenticate requests to the Keyplex API. You can create multiple API keys to separate concerns (development, staging, production) or to track usage across different applications.

Authentication Header

Authorization: Bearer kpx_your_api_key_here

API keys follow the format kpx_ followed by a 48-character alphanumeric string. Example:

kpx_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0u1v2w3x4
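A quick format check can catch a malformed or truncated key before a request ever leaves your app. This is an illustrative sketch, not part of any SDK; it only encodes the "kpx_ plus 48 alphanumeric characters" format described above.

```python
import re

# Illustrative check only: "kpx_" followed by 48 alphanumeric characters,
# matching the key format described above.
KEY_PATTERN = re.compile(r"^kpx_[A-Za-z0-9]{48}$")

def looks_like_keyplex_key(key: str) -> bool:
    """Cheap sanity check before sending a request with a malformed key."""
    return KEY_PATTERN.fullmatch(key) is not None
```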

Key Management

You can manage your API keys from the dashboard at app.keyplex.ai/keys.

Key Permissions

Each API key can be configured with specific permissions:

Permission          Description
chat:write          Create chat completions
completions:write   Create text completions
embeddings:write    Generate embeddings
models:read         List available models
usage:read          Read usage statistics

Key Rotation

We recommend rotating your API keys periodically for security. You can create a new key before deactivating the old one to ensure zero downtime.

Never expose your API key: Do not include API keys in client-side code, public repositories, or share them publicly. Use environment variables or secret management systems.

Available Models

Keyplex provides access to a wide range of AI models from different providers. All models are accessed through the same unified API, making it easy to switch between them.

OpenAI Models

Model ID       Context Window   Best For
gpt-5.2        256K tokens      Most capable, complex reasoning, coding
gpt-5.2-mini   128K tokens      Balanced performance and speed
gpt-5.2-nano   32K tokens       Fast responses, simple tasks
o3             200K tokens      Advanced reasoning, math, science
o3-mini        128K tokens      Efficient reasoning tasks

Anthropic Models

Model ID            Context Window   Best For
claude-opus-4.5     500K tokens      Most capable Claude, complex analysis
claude-sonnet-4.5   200K tokens      Balanced performance, general use
claude-haiku-4.5    200K tokens      Fast responses, cost-effective

Google Models

Model ID         Context Window   Best For
gemini-3-pro     2M tokens        Multimodal, long context, research
gemini-3-flash   1M tokens        Fast multimodal processing

Other Models

Model ID           Provider     Best For
grok-4             xAI          Real-time knowledge, direct responses
mistral-3-large    Mistral AI   European AI, enterprise deployments
mistral-3-medium   Mistral AI   Balanced, cost-effective
llama-4-405b       Meta         Open-weight, research, customization
llama-4-70b        Meta         Efficient open-source alternative
deepseek-v3        DeepSeek     Coding, technical tasks

Model Availability: All models are available on all subscription plans. Use the /v1/models endpoint to get the current list of available models.

API Endpoints

The Keyplex API provides several endpoints for different AI capabilities. All endpoints follow REST conventions and return JSON responses.

Chat Completions

POST /v1/chat/completions

Creates a model response for the given chat conversation. This is the most commonly used endpoint for building conversational AI applications.

Request Body

Parameter           Type           Required   Description
model               string         Yes        The model ID to use (e.g., "gpt-5.2", "claude-opus-4.5")
messages            array          Yes        Array of message objects with role and content
temperature         number         No         Sampling temperature (0-2). Default: 1
max_tokens          integer        No         Maximum tokens to generate
stream              boolean        No         Enable streaming responses. Default: false
top_p               number         No         Nucleus sampling parameter (0-1). Default: 1
stop                string/array   No         Stop sequences to end generation
presence_penalty    number         No         Penalize new tokens (-2 to 2). Default: 0
frequency_penalty   number         No         Penalize frequent tokens (-2 to 2). Default: 0
user                string         No         Unique identifier for end-user (for analytics)

Message Roles

Role        Description
system      Sets the behavior and context for the assistant
user        Messages from the user
assistant   Previous responses from the assistant

Example Request

{
    "model": "claude-opus-4.5",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful coding assistant."
        },
        {
            "role": "user",
            "content": "Write a Python function to calculate fibonacci numbers."
        }
    ],
    "temperature": 0.7,
    "max_tokens": 1000
}

Example Response

{
    "id": "chatcmpl-abc123xyz789",
    "object": "chat.completion",
    "created": 1733745600,
    "model": "claude-opus-4.5",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Here's a Python function to calculate Fibonacci numbers:\n\n```python\ndef fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)\n```"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 28,
        "completion_tokens": 156,
        "total_tokens": 184
    }
}

Completions (Legacy)

POST /v1/completions

Creates a completion for the provided prompt. This is a legacy endpoint; we recommend using Chat Completions for most use cases.

Request Body

Parameter     Type           Required   Description
model         string         Yes        The model ID to use
prompt        string/array   Yes        The prompt(s) to generate completions for
max_tokens    integer        No         Maximum tokens to generate. Default: 16
temperature   number         No         Sampling temperature (0-2). Default: 1
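As a sketch, a minimal legacy completion call might look like this; the model ID is illustrative, and the response is assumed to follow the OpenAI-style choices[0].text shape:

```python
def build_completion_request(prompt, model="openai/gpt-4o-mini", max_tokens=16):
    """JSON body for POST /v1/completions, using the parameters above."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

def create_completion(api_key, prompt):
    """Send the request; assumes an OpenAI-style response with choices[0].text."""
    import requests  # imported here so build_completion_request stays stdlib-only
    resp = requests.post(
        "https://keyplex.ai/api/v1/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_completion_request(prompt),
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]
```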

Embeddings

POST /v1/embeddings

Creates an embedding vector representing the input text. Useful for semantic search, clustering, and similarity comparisons.

Request Body

Parameter         Type           Required   Description
model             string         Yes        The embedding model (e.g., "text-embedding-3-large")
input             string/array   Yes        Input text(s) to embed
encoding_format   string         No         "float" or "base64". Default: "float"
dimensions        integer        No         Number of dimensions for the output

Available Embedding Models

Model ID                 Dimensions   Max Input
text-embedding-3-large   3072         8191 tokens
text-embedding-3-small   1536         8191 tokens
voyage-3                 1024         32000 tokens
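A typical workflow is to embed two texts and compare them with cosine similarity. The request builder below just mirrors the parameter table above; the similarity helper is plain Python and has no Keyplex dependency:

```python
def build_embeddings_request(texts, model="text-embedding-3-small"):
    """JSON body for POST /v1/embeddings (parameters from the table above)."""
    return {"model": model, "input": texts}

def cosine_similarity(a, b):
    """Compare two embedding vectors -- the typical use of this endpoint."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)
```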

List Models

GET /v1/models

Returns a list of all available models.

Example Response

{
    "object": "list",
    "data": [
        {
            "id": "gpt-5.2",
            "object": "model",
            "created": 1733745600,
            "owned_by": "openai",
            "context_window": 256000
        },
        {
            "id": "claude-opus-4.5",
            "object": "model",
            "created": 1733745600,
            "owned_by": "anthropic",
            "context_window": 500000
        }
        // ... more models
    ]
}
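For example, you might group that response by provider when building a model picker. `models_by_provider` is a hypothetical helper, not part of any SDK; it only relies on the owned_by field shown above:

```python
def models_by_provider(models_response):
    """Group the /v1/models response (as shown above) by its owned_by field."""
    grouped = {}
    for model in models_response["data"]:
        grouped.setdefault(model["owned_by"], []).append(model["id"])
    return grouped
```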

Streaming

Keyplex supports streaming responses using Server-Sent Events (SSE). This allows you to receive partial responses as they're generated, providing a better user experience for real-time applications.

Enable Streaming

Set stream: true in your request to enable streaming:

{
    "model": "gpt-5.2",
    "messages": [...],
    "stream": true
}

Python Example

import requests

# Streaming request with requests library
response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer kpx_your_api_key_here",
        "Content-Type": "application/json"
    },
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))

JavaScript Example

const response = await fetch('https://keyplex.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
        'Authorization': 'Bearer kpx_your_api_key_here',
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        model: 'openai/gpt-4o-mini',
        messages: [{ role: 'user', content: 'Tell me a story' }],
        stream: true
    })
});

const reader = response.body.getReader();
while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    console.log(new TextDecoder().decode(value));
}

Stream Response Format

Each chunk in the stream follows this format:

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" world"}}]}

data: [DONE]
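Putting that format to work, a small parser can accumulate the delta content from each chunk until the [DONE] sentinel. This sketch assumes the chunk shape shown above (choices[0].delta.content) and takes an iterable of decoded lines, such as the output of the Python streaming example:

```python
import json

def extract_stream_text(sse_lines):
    """Accumulate delta content from 'data:' lines until the [DONE] sentinel."""
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```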

Dashboard & Analytics

The Keyplex dashboard at app.keyplex.ai provides comprehensive analytics and management tools for your API usage.

Usage Statistics

Monitor your API usage in real-time with detailed metrics:

  • Total Requests: Number of API calls made
  • Tokens Used: Input and output tokens consumed
  • Model Distribution: Usage breakdown by model
  • Response Times: Average latency per model
  • Success Rate: Percentage of successful requests
  • Geographic Distribution: Request origins by region

Usage API Endpoint

GET /v1/usage

Retrieve your usage statistics programmatically:

{
    "object": "usage",
    "period": {
        "start": "2025-12-01T00:00:00Z",
        "end": "2025-12-09T23:59:59Z"
    },
    "data": {
        "total_requests": 15420,
        "total_tokens": 2845000,
        "prompt_tokens": 1250000,
        "completion_tokens": 1595000,
        "models": {
            "gpt-5.2": { "requests": 8500, "tokens": 1500000 },
            "claude-opus-4.5": { "requests": 4200, "tokens": 980000 },
            "gemini-3-pro": { "requests": 2720, "tokens": 365000 }
        }
    }
}
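Once fetched, the response is easy to summarize. As an illustration (the helper below is hypothetical, operating only on the per-model breakdown shown above):

```python
def busiest_model(usage_response):
    """Return the (model_id, request_count) pair with the most requests."""
    models = usage_response["data"]["models"]
    model_id, stats = max(models.items(), key=lambda kv: kv[1]["requests"])
    return model_id, stats["requests"]
```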

Cost Tracking

With Keyplex's fixed pricing model, you always know your monthly cost upfront. The dashboard shows:

  • Current Plan: Your active subscription tier
  • Billing Cycle: Start and end dates
  • Usage vs Allowance: Current consumption relative to fair use limits
  • Historical Costs: Previous billing periods

Request Logs

Access detailed logs for debugging and auditing:

  • Request ID: Unique identifier for each request
  • Timestamp: When the request was made
  • Model: Which model was used
  • Status: Success or error code
  • Latency: Response time in milliseconds
  • Tokens: Input/output token counts

Log Retention: Request logs are retained for 30 days. Enterprise plans include extended retention options.

Rate Limits

To ensure fair usage and service stability, Keyplex enforces rate limits on API requests. These limits are designed to accommodate typical development and production workloads.

Rate Limit Tiers

Plan Requests/min Tokens/min Tokens/day
Free Trial 20 40,000 100,000
Monthly 60 150,000 1,000,000
3 Months 100 250,000 2,000,000
12 Months 200 500,000 5,000,000
Enterprise Custom Custom Custom

Rate Limit Headers

Every API response includes headers indicating your current rate limit status:

Header                           Description
X-RateLimit-Limit-Requests       Maximum requests per minute
X-RateLimit-Remaining-Requests   Remaining requests in current window
X-RateLimit-Limit-Tokens         Maximum tokens per minute
X-RateLimit-Remaining-Tokens     Remaining tokens in current window
X-RateLimit-Reset-Requests       Timestamp when request limit resets
X-RateLimit-Reset-Tokens         Timestamp when token limit resets
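You can use these headers to throttle proactively instead of waiting for a 429. A minimal sketch (the thresholds are arbitrary examples, and missing headers are treated as exhausted):

```python
def should_pause(headers, min_requests=1, min_tokens=500):
    """Decide whether to back off based on the X-RateLimit-* headers above."""
    requests_left = int(headers.get("X-RateLimit-Remaining-Requests", "0"))
    tokens_left = int(headers.get("X-RateLimit-Remaining-Tokens", "0"))
    return requests_left < min_requests or tokens_left < min_tokens
```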

Handling Rate Limits

When you exceed rate limits, you'll receive a 429 Too Many Requests response. Implement exponential backoff:

import random
import time

import keyplex

def make_request_with_retry(client, messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.2",
                messages=messages
            )
        except keyplex.RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: 1s, 2s, 4s, ... plus random noise
            wait_time = (2 ** attempt) + random.random()
            time.sleep(wait_time)

Error Handling

The Keyplex API uses standard HTTP response codes and returns detailed error messages in JSON format.

HTTP Status Codes

Code   Meaning                 Description
200    OK                      Request succeeded
400    Bad Request             Invalid request parameters
401    Unauthorized            Invalid or missing API key
403    Forbidden               API key lacks required permissions
404    Not Found               Resource not found
429    Too Many Requests       Rate limit exceeded
500    Internal Server Error   Server error (retry with backoff)
503    Service Unavailable     Temporary overload (retry later)

Error Response Format

{
    "error": {
        "type": "invalid_request_error",
        "code": "invalid_model",
        "message": "The model 'gpt-999' does not exist.",
        "param": "model",
        "request_id": "req_abc123xyz789"
    }
}

Error Types

Type                    Description
invalid_request_error   Request parameters are invalid
authentication_error    API key is invalid or missing
permission_error        API key lacks required permissions
rate_limit_error        Rate limit has been exceeded
model_error             Model is unavailable or invalid
context_length_error    Input exceeds model's context window
server_error            Internal server error
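Tying the two tables together, a handler can extract the error type from the JSON body and decide whether the failure is worth retrying. This is a sketch of one reasonable policy (retrying 429/500/503), not a prescribed one:

```python
# Status codes from the table above that are safe to retry with backoff.
RETRYABLE_STATUS = {429, 500, 503}

def classify_error(status_code, body):
    """Map a failed response to (error_type, retryable) using the tables above."""
    err = body.get("error", {})
    return err.get("type", "server_error"), status_code in RETRYABLE_STATUS
```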

SDKs & Libraries

Keyplex provides official SDKs for popular programming languages. All SDKs are open-source and available on GitHub.

Official SDKs

Language             Package               Installation
Python               keyplex               pip install keyplex
JavaScript/Node.js   keyplex               npm install keyplex
Go                   keyplex-go            go get github.com/keyplex/keyplex-go
Ruby                 keyplex               gem install keyplex
PHP                  keyplex/keyplex-php   composer require keyplex/keyplex-php
Rust                 keyplex               cargo add keyplex

OpenAI SDK Compatibility

You can use the official OpenAI SDK with Keyplex by changing the base URL:

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="kpx_your_api_key_here",
    base_url="https://keyplex.ai/api/v1"
)

# Use exactly as you would with OpenAI
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

JavaScript (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
    apiKey: 'kpx_your_api_key_here',
    baseURL: 'https://keyplex.ai/api/v1'
});

const response = await client.chat.completions.create({
    model: 'openai/gpt-4o-mini',
    messages: [{ role: 'user', content: 'Hello!' }]
});

Community SDKs

The community has built additional SDKs for other languages:

  • Swift: KeyplexSwift - iOS/macOS applications
  • Kotlin: keyplex-kotlin - Android applications
  • C#/.NET: Keyplex.NET - .NET applications
  • Java: keyplex-java - Java applications

Webhooks

Keyplex can send webhook notifications for important events related to your account and API usage.

Available Events

Event                     Description
usage.threshold_reached   When usage reaches 80% or 90% of limits
usage.limit_exceeded      When usage limit is exceeded
api_key.created           When a new API key is created
api_key.revoked           When an API key is revoked
subscription.renewed      When subscription is renewed
subscription.expired      When subscription expires

Webhook Payload

{
    "id": "evt_abc123xyz789",
    "type": "usage.threshold_reached",
    "created": 1733745600,
    "data": {
        "threshold": 80,
        "current_usage": 820000,
        "limit": 1000000,
        "period_end": "2025-12-31T23:59:59Z"
    }
}

Webhook Security

All webhook requests include a signature header for verification:

X-Keyplex-Signature: sha256=abc123...

Verify the signature in your webhook handler:

import hmac
import hashlib

def verify_webhook(payload, signature, secret):
    expected = hmac.new(
        secret.encode(),
        payload.encode(),
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)
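To sanity-check the verifier end to end, you can sign a payload yourself (standing in for Keyplex) and confirm that verification passes. The secret below is a placeholder, and the helper is repeated so the snippet is self-contained:

```python
import hashlib
import hmac

def verify_webhook(payload, signature, secret):
    """Same verification logic as the handler above."""
    expected = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)

secret = "whsec_example_secret"  # placeholder, not a real signing secret
payload = '{"id":"evt_abc123xyz789","type":"usage.threshold_reached"}'

# Simulate the signature Keyplex would send in X-Keyplex-Signature
signature = "sha256=" + hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()

assert verify_webhook(payload, signature, secret)
assert not verify_webhook(payload, "sha256=deadbeef", secret)
```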

Best Practices

Follow these recommendations to get the most out of the Keyplex API.

Security

  • Never expose API keys in client-side code - Always make API calls from your backend
  • Use environment variables - Store API keys in environment variables, not in code
  • Rotate keys regularly - Create new keys periodically and revoke old ones
  • Use key permissions - Only grant the permissions each key needs
  • Monitor for unusual activity - Set up webhooks for usage alerts

Performance

  • Use streaming for long responses - Improves perceived latency
  • Choose the right model - Use smaller models for simple tasks
  • Implement caching - Cache responses for repeated queries when appropriate
  • Use connection pooling - Reuse HTTP connections for multiple requests
  • Set appropriate timeouts - Handle slow responses gracefully

Reliability

  • Implement retry logic - Use exponential backoff for transient errors
  • Handle rate limits gracefully - Queue requests when approaching limits
  • Use request IDs - Include for easier debugging and support
  • Monitor your usage - Use the dashboard and webhooks
  • Have fallback models - Switch models if one is unavailable
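The fallback recommendation above can be sketched as a small wrapper: try each model in a preferred order and move on when one fails. The model list is illustrative, and `call` stands in for whatever request function your app uses:

```python
# Illustrative preference order; any valid model IDs work here.
FALLBACK_MODELS = ["openai/gpt-4o", "anthropic/claude-3.5-sonnet", "google/gemini-pro-1.5"]

def complete_with_fallback(call, messages, models=FALLBACK_MODELS):
    """Try each model in order; 'call' should raise (e.g. on a model_error)
    when a model is unavailable, so the next one is attempted."""
    last_error = None
    for model in models:
        try:
            return call(model, messages)
        except Exception as exc:
            last_error = exc
    raise last_error
```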

Cost Optimization

  • Choose the right plan - Longer commitments offer better rates
  • Use efficient prompts - Shorter prompts = fewer tokens
  • Set max_tokens appropriately - Don't generate more than needed
  • Use the user parameter - Track usage by end-user for analytics

Migration Guide

Migrating to Keyplex from other providers is straightforward thanks to our OpenAI-compatible API.

From OpenAI

If you're using the OpenAI SDK, migration takes less than 5 minutes:

Before (OpenAI)

from openai import OpenAI

client = OpenAI(api_key="sk-...")  # OpenAI key

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...]
)

After (Keyplex)

from openai import OpenAI

client = OpenAI(
    api_key="kpx_...",  # Keyplex key
    base_url="https://keyplex.ai/api/v1"  # Add this line
)

response = client.chat.completions.create(
    model="openai/gpt-4o",  # Use OpenRouter model IDs
    messages=[...]
)

From Anthropic

Migrating from Anthropic's Claude API requires changing the API format:

Before (Anthropic)

from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

After (Keyplex)

import requests

response = requests.post(
    "https://keyplex.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer kpx_...",
        "Content-Type": "application/json"
    },
    json={
        "model": "anthropic/claude-3.5-sonnet",  # OpenRouter model ID
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello"}]
    }
)

Model ID Mapping

Keyplex uses OpenRouter model IDs. Here are some popular models:

Provider    Model Name          Keyplex Model ID (OpenRouter)
OpenAI      GPT-4o              openai/gpt-4o
OpenAI      GPT-4o Mini         openai/gpt-4o-mini
OpenAI      GPT-4 Turbo         openai/gpt-4-turbo
OpenAI      o1                  openai/o1
Anthropic   Claude 3.5 Sonnet   anthropic/claude-3.5-sonnet
Anthropic   Claude 3.5 Haiku    anthropic/claude-3.5-haiku
Anthropic   Claude 3 Opus       anthropic/claude-3-opus
Google      Gemini 2.0 Flash    google/gemini-2.0-flash-exp:free
Google      Gemini Pro 1.5      google/gemini-pro-1.5
Mistral     Mistral Large       mistralai/mistral-large
Meta        Llama 3.1 405B      meta-llama/llama-3.1-405b-instruct
DeepSeek    DeepSeek Chat       deepseek/deepseek-chat
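During migration it can help to keep a small translation table in code. The subset below covers only IDs that appear in this guide; extend it from GET /v1/models for your own models:

```python
# Illustrative subset of the mapping above (provider-native name -> Keyplex ID).
MODEL_ID_MAP = {
    "gpt-4o": "openai/gpt-4o",
    "gpt-4o-mini": "openai/gpt-4o-mini",
    "claude-3-5-sonnet-20241022": "anthropic/claude-3.5-sonnet",
}

def to_keyplex_id(native_id):
    """Translate a provider-native model name to its Keyplex (OpenRouter) ID."""
    if native_id not in MODEL_ID_MAP:
        raise ValueError(f"No mapping for {native_id!r}; check GET /v1/models")
    return MODEL_ID_MAP[native_id]
```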

Full Model List: Use the GET /v1/models endpoint to retrieve the complete list of available models with their IDs.

Support

We're here to help you succeed with Keyplex.

Getting Help

  • Documentation: You're reading it! Check the sidebar for specific topics.
  • Email Support: support@keyplex.ai - Response within 24 hours
  • Discord Community: Join our developer community for real-time help
  • GitHub Issues: Report SDK bugs on our GitHub repositories

Status Page

Check the current API status and subscribe to updates at status.keyplex.ai.

Enterprise Support

Enterprise customers receive:

  • Priority email support with 4-hour SLA
  • Dedicated account manager
  • Custom onboarding and integration assistance
  • Direct Slack channel access

Contact sales@keyplex.ai for enterprise plans.

Ready to get started? Create your free account and get your API key in seconds.