SpiderGate

Set stream: true and SpiderGate streams the completion token-by-token over Server-Sent Events (SSE) — the same format as the OpenAI API, so any OpenAI-compatible streaming client works unchanged.

curl -N -X POST "https://spideriq.ai/api/gate/v1/chat/completions" \
  -H "Authorization: Bearer $SPIDERIQ_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "spideriq/chat",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a limerick about latency."}]
  }'

The -N flag disables curl's output buffering so you see chunks arrive live.

The chunk format

Each event is a line of the form data: {json} followed by a blank line. The JSON is a chat.completion.chunk object:

data: {"id":"chatcmpl-…","object":"chat.completion.chunk","created":1780690000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-…","object":"chat.completion.chunk","created":1780690000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"content":"There "},"finish_reason":null}]}

data: {"id":"chatcmpl-…","object":"chat.completion.chunk","created":1780690000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"content":"once "},"finish_reason":null}]}

data: [DONE]

Each chunk's choices[].delta carries the incremental piece — role on the first chunk, then content (or tool_calls) on subsequent ones.
finish_reason is null until the final content chunk, where it becomes stop, length, tool_calls, or content_filter.
The stream ends with the literal line data: [DONE].
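
For clients that cannot use an SDK, the format above can be parsed by hand. The following is a minimal sketch, not SpiderGate's reference implementation: iter_chunks and collect_text are hypothetical helper names, and the sample lines are shortened illustrations rather than captured output. It assumes each event fits on a single data: line, as in the example above.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded chunk objects from raw SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        # Blank lines separate events; lines without the data: prefix
        # (for example SSE comments) are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Join the content deltas of choice 0 and return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            delta = choice.get("delta", {})
            # The role chunk has no content key, so fall back to "".
            parts.append(delta.get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason


# Illustrative input only; ids and other fields are omitted for brevity.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"There "},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"once"},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

text, reason = collect_text(sample)
print(text, reason)
```

In a real client the lines would come from the HTTP response body, read line by line, rather than from a list.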

Including usage

Token counts are omitted from streamed responses by default. Ask for them with stream_options:

{
  "model": "spideriq/chat",
  "stream": true,
  "stream_options": {"include_usage": true},
  "messages": [{"role": "user", "content": "Hello"}]
}

When set, a final chunk carries the usage object (prompt_tokens, completion_tokens, total_tokens) before [DONE].
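
A sketch of picking the usage object out of an already-decoded stream. It assumes SpiderGate mirrors the OpenAI behaviour, where the usage-bearing chunk has an empty choices array and earlier chunks omit usage or set it to null; this page does not confirm either detail. extract_usage is a hypothetical helper, and the token counts below are made up.

```python
from typing import Iterable, Optional


def extract_usage(chunks: Iterable[dict]) -> Optional[dict]:
    """Return the last non-empty usage object seen in the stream, if any."""
    usage = None
    for chunk in chunks:
        # Earlier chunks may omit usage or set it to null; keep the last real one.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


# Made-up chunks for illustration; the counts are not real measurements.
chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 8, "completion_tokens": 1, "total_tokens": 9}},
]

usage = extract_usage(chunks)
if usage is not None:
    print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
```

If the usage chunk does arrive with an empty choices array, any loop that indexes choices[0] needs a guard for that chunk.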

Streaming with the SDK

The OpenAI Python SDK handles SSE parsing for you:

from openai import OpenAI

client = OpenAI(
    base_url="https://spideriq.ai/api/gate/v1",
    api_key="spideriq_pat_…",
)

stream = client.chat.completions.create(
    model="spideriq/chat",
    messages=[{"role": "user", "content": "Stream a short poem."}],
    stream=True,
)
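for chunk in stream:
    # Skip chunks without choices (for example, a usage-only chunk).
    if not chunk.choices:
        continue
    # content may be None, e.g. on the initial role chunk.
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)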

Response headers

Streaming responses are sent with Content-Type: text/event-stream and Cache-Control: no-cache, with proxy buffering disabled so chunks aren't held back.

Note: The X-SpiderGate-Fallback-From header and the route-trace body field requested with ?include_route_trace=true are available on non-streaming responses only. For streamed requests, read the served model from each chunk's model field instead.
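
Because the fallback header is unavailable on streams, one way to notice a fallback is to record the model field from the chunks and compare it with the model you expected. A minimal sketch, assuming the chunks are already decoded into dicts; served_models is a hypothetical helper, and whether the field can change mid-stream is not stated on this page.

```python
from typing import Iterable, List


def served_models(chunks: Iterable[dict]) -> List[str]:
    """Return the distinct model values seen, in first-seen order."""
    seen: List[str] = []
    for chunk in chunks:
        model = chunk.get("model")
        if model and model not in seen:
            seen.append(model)
    return seen


# Shortened chunks for illustration; the model name is the one shown
# in the chunk format example on this page.
chunks = [
    {"model": "llama-3.3-70b-versatile", "choices": []},
    {"model": "llama-3.3-70b-versatile", "choices": []},
]

models = served_models(chunks)
print(models)
```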

Next steps

Review the full parameter set in Chat Completions.
See per-request detail after the fact in Traces.