Streaming
Set stream: true and SpiderGate streams the completion token-by-token over Server-Sent Events (SSE) — the same format as the OpenAI API, so any OpenAI-compatible streaming client works unchanged.
curl -N -X POST "https://spideriq.ai/api/gate/v1/chat/completions" \
  -H "Authorization: Bearer $SPIDERIQ_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "spideriq/chat",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a limerick about latency."}]
  }'
The -N flag disables curl's output buffering so you see chunks arrive live.
The chunk format
Each event is a line of the form data: {json} followed by a blank line. The JSON is a chat.completion.chunk object:
data: {"id":"chatcmpl-…","object":"chat.completion.chunk","created":1780690000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-…","object":"chat.completion.chunk","created":1780690000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"content":"There "},"finish_reason":null}]}
data: {"id":"chatcmpl-…","object":"chat.completion.chunk","created":1780690000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"content":"once "},"finish_reason":null}]}
data: [DONE]
Each chunk's choices[].delta carries the incremental piece: role on the first chunk, then content (or tool_calls) on subsequent ones. finish_reason is null until the final content chunk, where it becomes stop, length, tool_calls, or content_filter. The stream ends with the literal line data: [DONE].
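If you are not using an SDK, the chunk format described above can be consumed by hand. The following is a minimal sketch, not SpiderGate's own client code: the helper names parse_sse_lines and collect_text are hypothetical, and the sample lines are trimmed-down chunks (id, object, created, and model omitted) written for illustration only.

```python
import json

def parse_sse_lines(lines):
    """Yield parsed chunk dicts from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # literal end-of-stream sentinel
        yield json.loads(payload)

def collect_text(lines):
    """Return (full_text, finish_reason) assembled from the first choice."""
    parts, finish_reason = [], None
    for chunk in parse_sse_lines(lines):
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch only follows choice 0
            delta = choice.get("delta", {})
            parts.append(delta.get("content") or "")  # role-only deltas add nothing
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason

# Illustrative input only: abbreviated chunks, not captured from a real response.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    '',
    'data: {"choices":[{"index":0,"delta":{"content":"There "},"finish_reason":null}]}',
    '',
    'data: {"choices":[{"index":0,"delta":{"content":"once "},"finish_reason":"stop"}]}',
    '',
    'data: [DONE]',
]
text, reason = collect_text(sample)
print(repr(text), reason)
```

In a real client the lines would come from the HTTP response body rather than a list. Tool-call deltas would need their own accumulation logic, which this sketch leaves out.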
Including usage
Token counts are omitted from streamed responses by default. Ask for them with stream_options:
{
  "model": "spideriq/chat",
  "stream": true,
  "stream_options": {"include_usage": true},
  "messages": [{"role": "user", "content": "Hello"}]
}
When set, a final chunk carries the usage object (prompt_tokens, completion_tokens, total_tokens) before [DONE].
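A consumer that enables include_usage has to cope with a chunk whose purpose is to carry usage rather than text. The sketch below works on already-parsed chunk dicts and assumes, as in the OpenAI API, that the usage chunk may arrive with an empty choices list; that detail is an assumption, not something this page confirms. The helper name split_usage and the token counts in the sample are made up for illustration.

```python
def split_usage(chunks):
    """Return (text, usage) from parsed chunk dicts; usage is None if absent."""
    text_parts = []
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]  # expected on the final chunk only
        # Assumption: the usage chunk may have an empty or missing choices list.
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            text_parts.append(delta.get("content") or "")
    return "".join(text_parts), usage

# Illustrative chunks; the token counts are invented for this example.
sample_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "!"}, "finish_reason": "stop"}]},
    {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11}},
]
text, usage = split_usage(sample_chunks)
print(text, usage["total_tokens"])
```

Guarding against an empty choices list matters because indexing choices[0] on such a chunk would raise an IndexError.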
Streaming with the SDK
The OpenAI SDK handles SSE parsing for you:
from openai import OpenAI
client = OpenAI(
    base_url="https://spideriq.ai/api/gate/v1",
    api_key="spideriq_pat_…",
)
stream = client.chat.completions.create(
    model="agent/chat",
    messages=[{"role": "user", "content": "Stream a short poem."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
Response headers
Streaming responses are sent with Content-Type: text/event-stream and Cache-Control: no-cache, with proxy buffering disabled so chunks aren't held back.
Note: The X-SpiderGate-Fallback-From header and the ?include_route_trace=true body field are available on non-streaming responses only. For streamed requests, read the served model from each chunk's model field instead.
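One way to read the served model from a stream is to record each distinct model value as chunks arrive. This is a small sketch over already-parsed chunk dicts; the helper name served_models is hypothetical, and the sample reuses the model name shown in the chunk examples earlier on this page.

```python
def served_models(chunks):
    """Return the distinct model names seen across chunks, in arrival order."""
    seen = []
    for chunk in chunks:
        model = chunk.get("model")
        if model and model not in seen:
            seen.append(model)
    return seen

# Illustrative chunks, abbreviated to the fields this helper reads.
sample_chunks = [
    {"model": "llama-3.3-70b-versatile", "choices": [{"index": 0, "delta": {"role": "assistant"}}]},
    {"model": "llama-3.3-70b-versatile", "choices": [{"index": 0, "delta": {"content": "There "}}]},
]
models = served_models(sample_chunks)
print(models)
```

If the served name differs from the model you requested, the request was routed to a different model; compare the two in your own logging as needed.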
Next steps
Review the full parameter set in Chat Completions.
See per-request detail after the fact in Traces.