Images, Audio & Embeddings
Beyond chat, SpiderGate exposes OpenAI-compatible endpoints for image generation, text-to-speech, transcription, and embeddings. They are drop-in replacements for the matching openai SDK calls: point the SDK's base URL at https://spideriq.ai/api/gate/v1 and use the real OpenAI model names.
Important: Multi-modal endpoints do not use task aliases. Pass the actual model name (for example dall-e-3, tts-1, whisper-1, text-embedding-3-large). They also require an OpenAI key in the vault. If none is registered, the call returns 503 with code no_openai_key. Add one via the contributor invite flow on The Key Vault.
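Because these endpoints accept only real model names, a client can catch a stray task alias before the request leaves the machine. Below is a minimal sketch, assuming a hypothetical helper (endpoint_url and MODELS_BY_ENDPOINT are illustrative names, not part of SpiderGate). It builds the full URL from the base URL on this page and rejects anything outside the model names listed in the sections of this page.

```python
# Hypothetical client-side allowlist built from the model names on this page.
# SpiderGate performs the real validation; this only catches typos and aliases early.
BASE_URL = "https://spideriq.ai/api/gate/v1"

MODELS_BY_ENDPOINT = {
    "/images/generations": {"dall-e-3", "dall-e-2", "gpt-image-1"},
    "/audio/speech": {"tts-1", "tts-1-hd", "gpt-4o-mini-tts"},
    "/audio/transcriptions": {"whisper-1", "gpt-4o-transcribe", "gpt-4o-mini-transcribe"},
    "/embeddings": {"text-embedding-3-large", "text-embedding-3-small", "text-embedding-ada-002"},
}


def endpoint_url(path: str, model: str) -> str:
    """Return the full URL for a multi-modal path, rejecting unknown models and task aliases."""
    allowed = MODELS_BY_ENDPOINT.get(path)
    if allowed is None:
        raise ValueError(f"unknown multi-modal endpoint: {path}")
    if model not in allowed:
        # Task aliases are not real OpenAI model names, so they land here.
        raise ValueError(f"{model!r} is not accepted by {path}; use one of {sorted(allowed)}")
    return BASE_URL + path
```

For example, endpoint_url("/embeddings", "text-embedding-3-small") yields the embeddings URL, while passing an alias such as "my-tts-alias" to "/audio/speech" raises ValueError locally instead of failing at the gateway.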
Image generation
POST /api/gate/v1/images/generations — generate images with dall-e-3, dall-e-2, or gpt-image-1.
```bash
curl -X POST "https://spideriq.ai/api/gate/v1/images/generations" \
  -H "Authorization: Bearer $SPIDERIQ_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A minimalist logo of a spider on a dark background",
    "n": 1,
    "size": "1024x1024",
    "quality": "standard"
  }'
```
model (required): dall-e-3, dall-e-2, or gpt-image-1.
prompt (required): up to 4000 characters.
n (default 1): number of images, 1–10 (OpenAI's dall-e-3 accepts only 1).
size: for example 1024x1024, 1024x1792, or 1792x1024 (dall-e-3); 256x256, 512x512, or 1024x1024 (dall-e-2).
quality / style: standard | hd and vivid | natural (dall-e-3 only).
response_format: url (default) or b64_json.
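The parameter limits above can be checked on the client before spending a request. Below is a minimal sketch of such a pre-check, assuming a hypothetical build_image_request helper that mirrors this page's parameter list. gpt-image-1 sizes are deliberately not checked because this page does not list them.

```python
# Hypothetical pre-check for POST /images/generations, following this page's parameter list.
IMAGE_MODELS = {"dall-e-3", "dall-e-2", "gpt-image-1"}
SIZES_BY_MODEL = {  # gpt-image-1 sizes are not listed on this page, so they are not checked
    "dall-e-3": {"1024x1024", "1024x1792", "1792x1024"},
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
}


def build_image_request(model, prompt, n=1, size=None, quality=None,
                        style=None, response_format=None):
    """Validate the documented limits and return the JSON body to POST."""
    if model not in IMAGE_MODELS:
        raise ValueError(f"unsupported model: {model}")
    if len(prompt) > 4000:
        raise ValueError("prompt exceeds 4000 characters")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    if model == "dall-e-3" and n != 1:
        # OpenAI's dall-e-3 generates a single image per request.
        raise ValueError("dall-e-3 supports only n=1")
    if size is not None and model in SIZES_BY_MODEL and size not in SIZES_BY_MODEL[model]:
        raise ValueError(f"size {size} is not listed for {model}")
    if quality is not None and (model != "dall-e-3" or quality not in {"standard", "hd"}):
        raise ValueError("quality must be standard or hd, and only with dall-e-3")
    if style is not None and (model != "dall-e-3" or style not in {"vivid", "natural"}):
        raise ValueError("style must be vivid or natural, and only with dall-e-3")
    if response_format is not None and response_format not in {"url", "b64_json"}:
        raise ValueError("response_format must be url or b64_json")

    body = {"model": model, "prompt": prompt, "n": n}
    optional = (("size", size), ("quality", quality), ("style", style),
                ("response_format", response_format))
    for key, value in optional:
        if value is not None:  # send only the optional fields the caller set
            body[key] = value
    return body
```

Leaving unset fields out of the body lets the gateway apply its own defaults, such as url for response_format.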
The response is the OpenAI image object: { "created": ..., "data": [{ "url": ... }] }.
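When response_format is b64_json, each entry in data carries a b64_json field with the encoded image instead of a url, following OpenAI's image object. Below is a minimal sketch for turning that into files. The helper names are hypothetical, and the .png extension is an assumption about the returned format.

```python
import base64
from pathlib import Path


def decode_b64_images(response: dict) -> list[bytes]:
    """Return the raw image bytes for every data[i].b64_json entry."""
    return [base64.b64decode(item["b64_json"]) for item in response["data"]]


def save_images(response: dict, out_dir: str = ".", prefix: str = "image") -> list[Path]:
    """Write decoded images as prefix-0.png, prefix-1.png, ... (PNG is an assumption)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, data in enumerate(decode_b64_images(response)):
        path = out / f"{prefix}-{i}.png"
        path.write_bytes(data)
        paths.append(path)
    return paths
```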
Text-to-speech
POST /api/gate/v1/audio/speech — synthesize speech with tts-1, tts-1-hd, or gpt-4o-mini-tts. The response body is the raw audio bytes (a streaming response), with Content-Type set from the requested format.
```bash
curl -X POST "https://spideriq.ai/api/gate/v1/audio/speech" \
  -H "Authorization: Bearer $SPIDERIQ_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Welcome to SpiderGate.",
    "voice": "nova",
    "response_format": "mp3"
  }' \
  --output speech.mp3
```
model (required): tts-1, tts-1-hd, or gpt-4o-mini-tts.
input (required): text to speak, up to 4096 characters.
voice (required): one of alloy, echo, fable, onyx, nova, or shimmer. Any other value returns 400 invalid_voice.
response_format: mp3 (default), opus, aac, flac, wav, or pcm. Any other value returns 400 invalid_response_format.
speed: playback speed, 0.25–4.0.
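Because invalid voices and formats are rejected with specific 400 codes, a client can apply the same checks before calling the endpoint. Below is a minimal sketch, assuming a hypothetical build_speech_request helper. The ValueError messages reuse the documented codes (invalid_voice, invalid_response_format) so local and remote failures read alike. The other checks simply mirror the limits listed above.

```python
from typing import Optional

TTS_MODELS = {"tts-1", "tts-1-hd", "gpt-4o-mini-tts"}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
AUDIO_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_request(text: str, voice: str, model: str = "tts-1",
                         response_format: str = "mp3",
                         speed: Optional[float] = None) -> dict:
    """Validate locally, then return the JSON body for POST /audio/speech."""
    if model not in TTS_MODELS:
        raise ValueError(f"unsupported model: {model}")
    if len(text) > 4096:
        raise ValueError("input exceeds 4096 characters")
    if voice not in VOICES:
        # The gateway would answer 400 invalid_voice.
        raise ValueError(f"invalid_voice: {voice!r}")
    if response_format not in AUDIO_FORMATS:
        # The gateway would answer 400 invalid_response_format.
        raise ValueError(f"invalid_response_format: {response_format!r}")
    if speed is not None and not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    body = {"model": model, "input": text, "voice": voice, "response_format": response_format}
    if speed is not None:
        body["speed"] = speed
    return body
```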
Transcription
POST /api/gate/v1/audio/transcriptions — transcribe audio with whisper-1, gpt-4o-transcribe, or gpt-4o-mini-transcribe. This is a multipart/form-data upload.
```bash
curl -X POST "https://spideriq.ai/api/gate/v1/audio/transcriptions" \
  -H "Authorization: Bearer $SPIDERIQ_TOKEN" \
  -F file="@meeting.mp3" \
  -F model="whisper-1" \
  -F response_format="json"
```
file (required): the audio file. Maximum 25 MB (OpenAI's hard limit); larger files return 413 file_too_large.
model (required): whisper-1, gpt-4o-transcribe, or gpt-4o-mini-transcribe.
language: ISO-639-1 code (en, de, …) to improve accuracy.
response_format: json (default), text, srt, verbose_json, or vtt.
temperature: 0.0–1.0.
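Uploads over the size limit are rejected only after they have been sent, so checking the file size first saves bandwidth. Below is a minimal sketch, assuming a hypothetical transcription_fields helper that takes the file size (for example from os.path.getsize) and returns the non-file multipart fields. Reading "25 MB" as 25,000,000 bytes is a conservative assumption, not a documented figure.

```python
import re
from typing import Optional

# Assumption: "25 MB" is read conservatively as 25,000,000 bytes.
MAX_UPLOAD_BYTES = 25_000_000
STT_MODELS = {"whisper-1", "gpt-4o-transcribe", "gpt-4o-mini-transcribe"}
STT_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def transcription_fields(file_size: int, model: str = "whisper-1",
                         language: Optional[str] = None,
                         response_format: str = "json",
                         temperature: Optional[float] = None) -> dict[str, str]:
    """Check an upload before sending it; return the non-file multipart fields."""
    if file_size > MAX_UPLOAD_BYTES:
        # SpiderGate would answer 413 file_too_large.
        raise ValueError(f"file_too_large: {file_size} bytes")
    if model not in STT_MODELS:
        raise ValueError(f"unsupported model: {model}")
    if language is not None and not re.fullmatch(r"[a-z]{2}", language):
        raise ValueError("language must be a two-letter ISO-639-1 code such as en or de")
    if response_format not in STT_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    fields = {"model": model, "response_format": response_format}
    if language is not None:
        fields["language"] = language
    if temperature is not None:
        fields["temperature"] = str(temperature)  # multipart form values travel as text
    return fields
```

A typical call would be transcription_fields(os.path.getsize("meeting.mp3"), language="en"), with the returned fields sent alongside the file part.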
Embeddings
POST /api/gate/v1/embeddings — vectorize text with text-embedding-3-large, text-embedding-3-small, or text-embedding-ada-002.
```bash
curl -X POST "https://spideriq.ai/api/gate/v1/embeddings" \
  -H "Authorization: Bearer $SPIDERIQ_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "SpiderGate routes requests across providers."
  }'
```
model (required): text-embedding-3-large, text-embedding-3-small, or text-embedding-ada-002.
input (required): a single string or an array of strings.
dimensions: output vector size, 1–3072 (supported by the text-embedding-3 models only; text-embedding-3-small tops out at 1536).
encoding_format: float (default) or base64.
The response is the OpenAI embeddings object: { "object": "list", "data": [{ "embedding": [...] }], "model": ..., "usage": ... }.
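With encoding_format set to base64, each embedding arrives as a string rather than a list of floats. Below is a minimal sketch for decoding it and comparing two vectors. It assumes the payload is packed little-endian float32 values, which is how OpenAI's base64 embeddings are commonly decoded; verify this against your own responses. The helper names are hypothetical.

```python
import base64
import math
import struct


def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 embedding, assuming packed little-endian float32 values."""
    raw = base64.b64decode(b64)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Base64 roughly halves the response size compared with JSON floats, which matters when embedding large input arrays.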
Cost tracking
All four endpoints are metered the same way chat is. Each request is tagged with its kind (image, audio_tts, audio_stt, embedding) and priced per the provider's published rates, so multi-modal spend shows up in Usage and Traces alongside chat.
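If you export usage rows for your own reporting, grouping them by the four kinds above keeps multi-modal spend separate from chat. Below is a minimal sketch. The kind names come from this page, but the path-to-kind table and the row shape ({"kind": ..., "cost_usd": ...}) are hypothetical. No prices are computed here, since rates come from the provider.

```python
from collections import defaultdict
from typing import Iterable, Optional

# Kind names are from this page; mapping them to request paths is an illustrative assumption.
KIND_BY_PATH = {
    "/api/gate/v1/images/generations": "image",
    "/api/gate/v1/audio/speech": "audio_tts",
    "/api/gate/v1/audio/transcriptions": "audio_stt",
    "/api/gate/v1/embeddings": "embedding",
}


def usage_kind(path: str) -> Optional[str]:
    """Return the multi-modal kind for a request path, or None for other routes."""
    return KIND_BY_PATH.get(path.rstrip("/"))


def spend_by_kind(rows: Iterable[dict]) -> dict[str, float]:
    """Sum cost per kind over hypothetical rows shaped {"kind": ..., "cost_usd": ...}."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["kind"]] += row["cost_usd"]
    return dict(totals)
```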
Next steps
Add an OpenAI key so these endpoints work — The Key Vault.
Track multi-modal spend in Traces.
See every endpoint in the API Reference.