SpiderGate V2: LLM Gateway
Production LLM gateway powered by LiteLLM. OpenAI-compatible API for 100+ providers with task-based routing, free-tier stacking, per-request cost tracking, and a secure multi-tenant key vault.
The Problem
You need access to multiple LLM providers, but each has a different API, different pricing, and different rate limits. Managing keys is a nightmare. Tracking costs? Manual. When one provider goes down? Your app crashes.
How SpiderGate Solves It
SpiderGate V2 is a full production LLM gateway built on top of LiteLLM's routing engine with SpiderIQ's multi-tenant architecture. Point any OpenAI-compatible SDK at SpiderGate and start making requests immediately.
- 100+ providers: OpenAI, Anthropic, Google, Groq, Mistral, Cerebras, Cohere, Together AI, Fireworks, and more via a single endpoint
- Task-based aliases: Request spideriq/coding or spideriq/fast and SpiderGate picks the best model
- Free tier stacking: Stack Groq + Cerebras + Google AI + Mistral free tiers for thousands of free requests/day
- Per-request cost tracking: Know exactly what every request costs
- Multi-tenant key vault: Brands, clients, keys, each with isolated rate limits, budgets, and credentials
- Automatic fallback: Configure fallback chains. If a provider fails, SpiderGate tries the next one
- Observability: Full Langfuse integration for end-to-end request tracing
Supported Providers
| Provider | Models | Free Tier |
|---|---|---|
| OpenAI | GPT-4o, GPT-4, GPT-3.5 Turbo | No |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus, Haiku | No |
| Google AI | Gemini 2.0 Flash, Gemini 1.5 Pro | Yes (60 req/min) |
| Groq | Llama 3.1 70B/8B, Mixtral | Yes |
| Mistral | Large, Small, Codestral, Nemo | Yes |
| Cerebras | Llama 3.1 70B/8B | Yes |
| Cohere | Command R+, Command R | No |
| Cloudflare AI | Workers AI models | Yes |
| + 90 more | Via LiteLLM | Varies |
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/gate/v1/chat/completions | POST | Chat completion (streaming supported) |
| /api/gate/v1/models | GET | List available models |
| /api/gate/v1/models/{id} | GET | Get model details |
Quickstart
Get up and running with SpiderGate in under 2 minutes.
1. Get Your API Key
Navigate to your SpiderIQ dashboard → SpiderGate → API Keys. Create a new key:
sg_key_prod_a1b2c3d4e5f6...
2. Point Your SDK
Replace your provider's base URL. The API is fully OpenAI-compatible.
cURL
curl -X POST "https://spideriq.ai/api/gate/v1/chat/completions" \
-H "Authorization: Bearer sg_key_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Python (OpenAI SDK)
from openai import OpenAI

# Point the standard OpenAI client at SpiderGate instead of api.openai.com.
client = OpenAI(
    base_url="https://spideriq.ai/api/gate/v1",
    api_key="sg_key_your_key_here",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
Task-Based Routing
# Best model for coding
curl -X POST "https://spideriq.ai/api/gate/v1/chat/completions" \
  -H "Authorization: Bearer sg_key_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"model": "spideriq/coding", "messages": [...]}'
# Fastest response (Groq, Cerebras): same request, different alias
  -d '{"model": "spideriq/fast", "messages": [...]}'
# Free tier only (zero cost): same request, different alias
  -d '{"model": "spideriq/free", "messages": [...]}'
Concrete model names are routed by prefix: claude-* → Anthropic, gpt-* → OpenAI, gemini-* → Google.
Task-Based Routing
Stop hardcoding model names. Use task aliases and let SpiderGate resolve them dynamically.
Available Task Aliases
| Alias | Routes To | Best For |
|---|---|---|
| spideriq/coding | Claude Sonnet, Codestral, Llama 70B | Code generation, debugging |
| spideriq/chat | Llama 70B, Mistral Small, Gemini Flash | General conversation |
| spideriq/fast | Llama 8B (Groq/Cerebras), Gemini Flash | Real-time, autocomplete |
| spideriq/extraction | Gemini Flash, Claude Sonnet | Structured data, JSON |
| spideriq/creative | Claude Sonnet, Mistral Small | Creative writing |
| spideriq/research | Gemini 1.5 Pro, Claude Sonnet | Long-context analysis |
| spideriq/planning | Claude Sonnet, Gemini Flash | Multi-step planning |
| spideriq/tool-use | Claude Sonnet, Llama 70B | Function/tool calling |
| spideriq/classification | Llama 8B, Llama 70B | Classification, sentiment |
| spideriq/summarization | Gemini Flash, Mistral Small | Summarization |
| spideriq/translation | Mistral Small, Gemini Flash | Translation |
| spideriq/vision | Gemini Flash, GPT-4o | Image understanding |
| spideriq/free | Llama 70B (Groq), Gemini Flash | Free-tier only |
Usage
// Instead of hardcoding:
model: "gpt-4o"
// Use a task alias:
model: "spideriq/coding"
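The idea can be sketched in a few lines of Python. This is only an illustration: `ALIAS_CANDIDATES`, `resolve_alias`, and `provider_for` are hypothetical names, the candidate lists are copied from the alias table above (a subset), and the assumption that candidates are tried in table order is ours. How SpiderGate actually ranks candidates is not documented here. The prefix rules come from the Quickstart (claude-* → Anthropic, gpt-* → OpenAI, gemini-* → Google).

```python
from fnmatch import fnmatchcase

# Subset of the alias table; list order is an ASSUMED preference order.
ALIAS_CANDIDATES = {
    "spideriq/coding": ["Claude Sonnet", "Codestral", "Llama 70B"],
    "spideriq/fast": ["Llama 8B (Groq/Cerebras)", "Gemini Flash"],
    "spideriq/free": ["Llama 70B (Groq)", "Gemini Flash"],
}

# Prefix rules for concrete model names, as listed in the Quickstart.
PREFIX_PROVIDERS = [
    ("claude-*", "Anthropic"),
    ("gpt-*", "OpenAI"),
    ("gemini-*", "Google"),
]


def resolve_alias(alias: str, unavailable: frozenset = frozenset()) -> str:
    """Return the first candidate for a task alias that is not unavailable."""
    for candidate in ALIAS_CANDIDATES[alias]:
        if candidate not in unavailable:
            return candidate
    raise LookupError(f"no available model for alias {alias!r}")


def provider_for(model: str) -> str:
    """Map a concrete model name to its provider by prefix."""
    for pattern, provider in PREFIX_PROVIDERS:
        if fnmatchcase(model, pattern):
            return provider
    raise LookupError(f"no prefix rule matches {model!r}")
```

Because the application only ever sends the alias, swapping the underlying model becomes a gateway-side change rather than a code change.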
Free Tier Stacking
Stack multiple free-tier API keys and let SpiderGate rotate between them. The example setup below adds up to 5,000+ free requests/day.
Provider Free Tiers
| Provider | Free Tier | Notes |
|---|---|---|
| OpenRouter | 1,000 req/day | Per key |
| Groq | Generous | Rate limited |
| Cerebras | Free tier | Rate limited |
| Google AI | 60 req/min | Free |
| Mistral | Free tier | Rate limited |
| Cloudflare AI | 10K neurons/day | Free |
Example Setup
Your SpiderGate Setup:
├── OpenRouter Key 1 → 1,000/day
├── OpenRouter Key 2 → 1,000/day
├── OpenRouter Key 3 → 1,000/day
├── Groq Key 1 → ~500/day
├── Groq Key 2 → ~500/day
└── Cerebras Key 1 → ~1,000/day
─────────
5,000+ req/day FREE
How Rotation Works
Request 1 → OpenRouter Key 1 (1/1000)
Request 2 → OpenRouter Key 2 (1/1000)
Request 3 → OpenRouter Key 3 (1/1000)
Request 4 → OpenRouter Key 1 (2/1000)
...
Request 3001 → Groq Key 1 (OpenRouter exhausted)
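The rotation trace above can be reproduced with a small simulation. This is a sketch under assumptions, not SpiderGate's implementation: `Key` and `pick_key` are hypothetical names, the provider order (OpenRouter, then Groq, then Cerebras) is inferred from the trace, and the approximate Groq and Cerebras quotas from the example setup are treated as exact numbers.

```python
from dataclasses import dataclass


@dataclass
class Key:
    label: str
    provider: str
    daily_limit: int
    used_today: int = 0


def pick_key(keys: list[Key], provider_order: list[str]) -> Key:
    """Pick the least-used key of the first provider that still has quota."""
    for provider in provider_order:
        available = [
            k for k in keys
            if k.provider == provider and k.used_today < k.daily_limit
        ]
        if available:
            # min() returns the first minimum, so ties follow list order.
            key = min(available, key=lambda k: k.used_today)
            key.used_today += 1
            return key
    raise RuntimeError("all keys exhausted for today")


def example_keys() -> list[Key]:
    """The six keys from the example setup, with quotas taken as exact."""
    return [
        Key("OpenRouter Key 1", "openrouter", 1000),
        Key("OpenRouter Key 2", "openrouter", 1000),
        Key("OpenRouter Key 3", "openrouter", 1000),
        Key("Groq Key 1", "groq", 500),
        Key("Groq Key 2", "groq", 500),
        Key("Cerebras Key 1", "cerebras", 1000),
    ]


keys = example_keys()
order = ["openrouter", "groq", "cerebras"]
picks = [pick_key(keys, order).label for _ in range(3001)]
print(picks[:4], picks[3000])
```

Under these assumptions the first four picks cycle through the three OpenRouter keys, and request 3001 is the first to land on Groq Key 1, matching the trace.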
Adding Keys
curl -X POST "https://spideriq.ai/api/v1/integrations" \
  -H "Authorization: Bearer $CLIENT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider_name": "openrouter",
    "credentials": {"api_key": "sk-or-v1-your-key"},
    "key_label": "OpenRouter Free Tier 1",
    "daily_limit": 1000
  }'
Key Vault
Your agent authenticates with a SpiderIQ token. The actual LLM API keys stay encrypted in the vault and are never transmitted to the agent.
Architecture
┌─────────────────────────────────────────────┐
│ Your AI Agent │
│ api_key = ??? ← never sees real key │
└───────────────────┬─────────────────────────┘
▼
┌─────────────────────────────────────────────┐
│ SpiderGate Key Vault │
│ 1. Authenticate via SpiderIQ token │
│ 2. Select best key (round-robin) │
│ 3. Inject key into request │
│ 4. Track usage against limits │
│ 5. Return response (key never exposed) │
│ │
│ Encrypted (AES-256): │
│ ┌────────┬────────┬────────┬──────┐ │
│ │ OpenAI │ Groq │Mistral │Google│ │
│ │ sk-*** │ gsk_** │ *** │ *** │ │
│ └────────┴────────┴────────┴──────┘ │
└─────────────────────────────────────────────┘
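The five steps in the diagram can be sketched as a toy in-memory vault. Everything here is illustrative: `KeyVault`, `handle`, and the `upstream` callable are hypothetical names, encryption at rest (AES-256 in the diagram) is left out entirely, and key selection is plain round-robin without limit checks.

```python
from typing import Callable


class KeyVault:
    """Toy in-memory vault; a real one would keep keys encrypted at rest."""

    def __init__(self, valid_tokens: set[str], provider_keys: dict[str, list[str]]):
        self._valid_tokens = valid_tokens
        self._provider_keys = provider_keys
        self._cursor = {provider: 0 for provider in provider_keys}
        self.usage: dict[str, int] = {}

    def handle(
        self,
        token: str,
        provider: str,
        payload: dict,
        upstream: Callable[[dict, str], dict],
    ) -> dict:
        # 1. Authenticate via SpiderIQ token.
        if token not in self._valid_tokens:
            raise PermissionError("invalid SpiderIQ token")
        # 2. Select a key (plain round-robin over the provider's pool).
        pool = self._provider_keys[provider]
        index = self._cursor[provider] % len(pool)
        self._cursor[provider] += 1
        # 3. Inject the key into the upstream call only.
        response = upstream(payload, pool[index])
        # 4. Track usage per key slot, not per secret value.
        slot = f"{provider}#{index}"
        self.usage[slot] = self.usage.get(slot, 0) + 1
        # 5. Return the provider response; the key is not part of it.
        return response
```

The point of the design is that the caller only ever holds the token: the provider key exists solely inside `handle` and in the upstream call.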
Adding a Provider Key
curl -X POST "https://spideriq.ai/api/v1/integrations" \
  -H "Authorization: Bearer $CLIENT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider_name": "openai",
    "credentials": {"api_key": "sk-..."},
    "key_label": "Production OpenAI",
    "daily_limit": 5000
  }'
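The same call can be prepared from Python with only the standard library. This is a minimal sketch: `build_integration_request` is a hypothetical helper, the payload fields are the ones shown in the curl example, and the function only builds the request object without sending it.

```python
import json
import urllib.request

INTEGRATIONS_URL = "https://spideriq.ai/api/v1/integrations"


def build_integration_request(
    client_token: str,
    provider_name: str,
    api_key: str,
    key_label: str,
    daily_limit: int,
) -> urllib.request.Request:
    """Build (but do not send) a POST that registers a provider key."""
    body = {
        "provider_name": provider_name,
        "credentials": {"api_key": api_key},
        "key_label": key_label,
        "daily_limit": daily_limit,
    }
    return urllib.request.Request(
        INTEGRATIONS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {client_token}",
            # Set explicitly because the body is JSON.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(...)` would send it. The shape of the response body is not documented in this section, so it is not parsed here.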
Features
- Scoped keys — Different keys for dev, staging, prod
- Per-key spend limits — Monthly budget caps per key or per agent
- Automatic rotation — Schedule key rotation without code changes
- Audit log — Every access logged with timestamp, agent, IP
Fallback & Retry
If the primary model fails, SpiderGate tries each fallback in order. Failed providers enter cooldown.
Configuration
curl -X POST "https://spideriq.ai/api/gate/v1/chat/completions" \
  -H "Authorization: Bearer sg_key_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "..."}],
    "spidergate_options": {
      "fallback_models": [
        "claude-3-5-sonnet",
        "llama-3.1-70b-groq",
        "gemini-2.0-flash"
      ],
      "retry_count": 2,
      "timeout_ms": 30000
    }
  }'
Options
| Option | Type | Description |
|---|---|---|
| fallback_models | array | Ordered list of fallback models |
| retry_count | integer | Retries per model before next fallback |
| timeout_ms | integer | Max wait time per attempt |
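How these options interact can be sketched as a nested loop. This is an illustration only: `complete_with_fallback` and the `call` parameter are hypothetical, and it assumes that each model gets one initial attempt plus `retry_count` retries, which the table does not spell out. Per-attempt timeouts (`timeout_ms`) and provider cooldown are omitted.

```python
from typing import Callable


def complete_with_fallback(
    call: Callable[[str], str],
    model: str,
    fallback_models: list[str],
    retry_count: int = 2,
) -> tuple[str, str]:
    """Try the primary model, then each fallback in order.

    Returns (model_that_answered, response). Each model gets one initial
    attempt plus `retry_count` retries (an assumption, see lead-in).
    """
    failures = []
    for candidate in [model, *fallback_models]:
        for attempt in range(1 + retry_count):
            try:
                return candidate, call(candidate)
            except Exception as exc:  # broad on purpose for the sketch
                failures.append((candidate, attempt, exc))
    raise RuntimeError(f"all models failed after {len(failures)} attempts")
```

With `retry_count` set to 2 and three fallbacks, a request can therefore cost up to 12 upstream attempts in the worst case under this reading.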
Cost Tracking
Per-request cost tracking with analytics, spend forecasting, and budget alerts.
What Gets Tracked
- Input/output token counts
- Dollar cost per request
- Model used — including fallback info
- Latency metrics
- Agent attribution
Response Headers
| Header | Description |
|---|---|
| X-SpiderGate-Cost | Estimated cost in USD |
| X-SpiderGate-Tokens-In | Prompt tokens consumed |
| X-SpiderGate-Tokens-Out | Completion tokens generated |
| X-SpiderGate-Provider | Which provider served the request |
| X-SpiderGate-Latency-Ms | End-to-end latency |
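On the client side, these headers can be collected into a small record for logging. A sketch, assuming the header values are plain numeric strings (their exact format is not specified here); `RequestCost` and `parse_cost_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RequestCost:
    provider: str
    cost_usd: float
    tokens_in: int
    tokens_out: int
    latency_ms: float


def parse_cost_headers(headers: Mapping[str, str]) -> RequestCost:
    """Read the X-SpiderGate-* headers into a typed record."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    return RequestCost(
        provider=h["x-spidergate-provider"],
        cost_usd=float(h["x-spidergate-cost"]),
        tokens_in=int(h["x-spidergate-tokens-in"]),
        tokens_out=int(h["x-spidergate-tokens-out"]),
        latency_ms=float(h["x-spidergate-latency-ms"]),
    )
```

With the OpenAI Python SDK, response headers should be reachable through `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` alongside `.parse()`.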
Dashboard
- Per-model spend breakdown
- Daily and monthly cost trends
- Spend forecasting and budget alerts
- Request volume and error rates
Observability (Langfuse)
Full end-to-end request tracing powered by Langfuse.
What Gets Logged
- Request metadata: timestamp, model, provider, tokens, latency
- Agent attribution: via the X-Spider-Agent header
- Cost breakdown: per-request cost
- Error traces: rate limits, timeouts, provider errors
- Prompt versioning: track prompt iterations
Retention
| Plan | Retention |
|---|---|
| Developer | 1 day |
| Production | 30 days |
| Enterprise (self-hosted) | Unlimited |
Chat Completions
OpenAI-compatible chat completions endpoint.
Request
curl -X POST "https://spideriq.ai/api/gate/v1/chat/completions" \
-H "Authorization: Bearer sg_key_..." \
-H "Content-Type: application/json" \
-H "X-Spider-Agent: MyAgent-v1" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are helpful."},
{"role": "user", "content": "Explain SpiderGate."}
],
"temperature": 0.7,
"max_tokens": 256
}'
Response
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"model": "gpt-4o-2024-08-06",
"choices": [{
"index": 0,
"message": {"role": "assistant", "content": "..."},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 42,
"total_tokens": 67
}
}
Response Headers
| Header | Description |
|---|---|
| X-SpiderGate-Request-Id | Unique trace ID |
| X-SpiderGate-Provider | Provider that served request |
| X-SpiderGate-Cost | Estimated cost (USD) |
| X-SpiderGate-Latency-Ms | End-to-end latency |
Models
List all available models across every connected provider.
Request
curl "https://spideriq.ai/api/gate/v1/models" \
-H "Authorization: Bearer sg_key_..."
Response
{
"object": "list",
"data": [
{"id": "gpt-4o", "owned_by": "openai"},
{"id": "claude-3-5-sonnet-latest", "owned_by": "anthropic"},
{"id": "gemini-1.5-pro", "owned_by": "google"},
{"id": "spideriq/coding", "owned_by": "spidergate-alias"},
{"id": "spideriq/fast", "owned_by": "spidergate-alias"},
{"id": "spideriq/free", "owned_by": "spidergate-alias"}
]
}
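In the response above, alias entries carry `"owned_by": "spidergate-alias"`, which makes them easy to separate from concrete models on the client. A minimal sketch; `split_models` is a hypothetical helper operating on the already-decoded JSON.

```python
def split_models(listing: dict) -> tuple[list[str], list[str]]:
    """Split a /models listing into (task aliases, concrete model ids)."""
    aliases: list[str] = []
    concrete: list[str] = []
    for entry in listing["data"]:
        if entry["owned_by"] == "spidergate-alias":
            aliases.append(entry["id"])
        else:
            concrete.append(entry["id"])
    return aliases, concrete
```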
Task aliases are listed with owned_by: "spidergate-alias".
Standalone API Keys
External apps access SpiderGate without a full SpiderIQ account using sg_key_* keys.
Usage
curl -X POST "https://spideriq.ai/api/gate/v1/chat/completions" \
  -H "Authorization: Bearer sg_key_prod_abc123" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-70b-groq", "messages": [...]}'
Key Properties
- Independent rate limits — requests/minute, requests/day
- Budget caps — monthly spend limit in dollars
- Per-request cost tracking — full analytics in dashboard
- Revocable — revoke instantly without affecting provider keys
Integration API
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/integrations | POST | Create integration |
| /api/v1/integrations | GET | List integrations |
| /api/v1/integrations/{id} | PATCH | Update limits/status |
| /api/v1/integrations/{id} | DELETE | Delete integration |
| /api/v1/integrations/health | GET | Health check |
| /api/v1/integrations/sync-billing | POST | Sync billing |
Key Selection Algorithm
How SpiderGate picks the optimal key for each request.
Round-Robin with Usage Awareness
Selection Priority:
1. Keys for the requested provider
2. Healthy keys only (not rate-limited)
3. Under daily limit
4. Least used today (round-robin)
Health Tracking
| Status | Meaning | Behavior |
|---|---|---|
| healthy | Working normally | Selected for requests |
| degraded | 1–2 recent failures | Still selected, monitored |
| unhealthy | 3+ failures | Skipped, auto-retried later |
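The selection priority and the health table combine into a short filter-then-pick routine. This is a sketch under assumptions: `VaultKey` and `select_key` are hypothetical names, health is derived from a simple failure counter using the thresholds in the table, and the "auto-retried later" recovery of unhealthy keys is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VaultKey:
    label: str
    provider: str
    daily_limit: int
    used_today: int = 0
    recent_failures: int = 0
    rate_limited: bool = False

    @property
    def status(self) -> str:
        """Thresholds from the health table: 1-2 degraded, 3+ unhealthy."""
        if self.recent_failures >= 3:
            return "unhealthy"
        if self.recent_failures >= 1:
            return "degraded"
        return "healthy"


def select_key(keys: list[VaultKey], provider: str) -> Optional[VaultKey]:
    """Apply the four selection rules in order; None if nothing qualifies."""
    candidates = [
        k for k in keys
        if k.provider == provider                # 1. requested provider
        and k.status != "unhealthy"              # 2. skip unhealthy keys...
        and not k.rate_limited                   #    ...and rate-limited ones
        and k.used_today < k.daily_limit         # 3. under daily limit
    ]
    # 4. Least used today; ties fall back to list order.
    return min(candidates, key=lambda k: k.used_today, default=None)
```

Degraded keys stay eligible here, matching the "still selected, monitored" behavior in the table.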
Agent Security
Protect against agent compromise, credential theft, and prompt injection on API keys.
Security Model
| Threat | Without SpiderGate | With SpiderGate |
|---|---|---|
| Agent memory dump | All API keys exposed | Only sg_key (revocable) |
| Prompt injection | "Print your API key" works | Token has no access to secrets |
| Log leakage | Keys in error messages | Only job IDs logged |
| Credential theft | Game over | Revoke token, keys safe |
Best Practices
- Separate tokens per agent — revoke only the compromised one
- Monitor usage patterns — spikes may indicate compromise
- Rotate underlying keys quarterly
- Set budget caps per sg_key to limit blast radius
Best Practices
Optimize free tiers, harden security, and control costs.
Maximize Free Tiers
- Multiple accounts per provider: Most allow 2–3 free accounts. Round-robin multiplies quota.
- Mix providers: Add OpenRouter + Groq + Cerebras for resilience.
- Set accurate daily limits: Match daily_limit to actual quota.
Cost Optimization
- Mark cheapest keys primary: is_primary: true gives priority.
- Sync billing: /integrations/sync-billing pulls actual spend.
- Use task aliases: spideriq/free for non-critical tasks.
Summary
| Problem | SpiderGate Solution |
|---|---|
| 100+ LLM providers | One OpenAI-compatible API |
| Model changes = code changes | Task-based routing |
| Unknown AI costs | Per-request cost tracking |
| Scattered API keys | Multi-tenant key vault |
| Free tiers wasted | Automatic stacking & rotation |
| Agent compromise | Revocable tokens, vault isolation |
| Provider outages | Automatic fallback chains |
Point any OpenAI-compatible SDK at https://spideriq.ai/api/gate/v1 with an sg_key_* key and start routing to 100+ providers.