WebSockets API
Beta
The AI Gateway WebSockets API provides a single persistent connection, enabling continuous communication. By using WebSockets, you can establish a single connection for multiple AI requests, eliminating the need for repeated handshakes and TLS negotiations, which enhances performance and reduces latency. This API supports all AI providers connected to AI Gateway, including those that do not natively support WebSockets.
WebSockets are long-lived TCP connections that enable bi-directional, real-time communication between client and server. Unlike HTTP connections, which require repeated handshakes for each request, WebSockets maintain the connection, supporting continuous data exchange with reduced overhead. WebSockets are ideal for applications needing low-latency, real-time data, such as voice assistants.
- Reduced Overhead: Avoid overhead of repeated handshakes and TLS negotiations by maintaining a single, persistent connection.
- Provider Compatibility: Works with all AI providers in AI Gateway. Even if your chosen provider does not support WebSockets, we handle it for you, managing the requests to your preferred AI provider.
- Generate an AI Gateway token with the AI Gateway Run permission and opt in to using an authenticated gateway.
- Modify your Universal Endpoint URL by replacing https:// with wss:// to initiate a WebSocket connection.
- Open a WebSocket connection authenticated with a Cloudflare token with the AI Gateway Run permission.
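The last two steps can be sketched roughly as follows. This is a minimal illustration, not a definitive client: the account and gateway IDs in the URL are placeholders, `toWebSocketUrl` and `buildAuthOptions` are hypothetical helpers, and the `cf-aig-authorization` header name is an assumption that is not stated in this section.

```typescript
// Rewrite an https:// Universal Endpoint URL to its wss:// equivalent.
function toWebSocketUrl(endpoint: string): string {
  const httpsPrefix = "https://";
  if (endpoint.slice(0, httpsPrefix.length) !== httpsPrefix) {
    throw new Error("Expected an https:// URL, got: " + endpoint);
  }
  return "wss://" + endpoint.slice(httpsPrefix.length);
}

// Hypothetical helper: build header-based auth options for a WebSocket
// client that accepts custom headers. The header name is an assumption.
function buildAuthOptions(token: string): { headers: { [name: string]: string } } {
  return { headers: { "cf-aig-authorization": "Bearer " + token } };
}

// Placeholder account ID, gateway ID, and token.
const gatewayUrl = toWebSocketUrl(
  "https://gateway.ai.cloudflare.com/v1/my-account-id/my-gateway/"
);
const authOptions = buildAuthOptions("AI_GATEWAY_TOKEN");
```

With a client that accepts custom headers, such as the `ws` package in Node.js, the connection could then be opened with `new WebSocket(gatewayUrl, authOptions)`.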
For streaming requests, AI Gateway sends an initial message with request metadata indicating the stream is starting.
After this initial message, all streaming chunks are relayed in real time to the WebSocket connection as they arrive from the inference provider. Only the eventId field is included in the metadata for these streaming chunks. The eventId allows AI Gateway to include a client-defined ID with each message, even in a streaming WebSocket environment.
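Because every chunk carries the eventId, a client can run several requests over one connection and sort the chunks back into per-request buffers. The sketch below shows one way to do that. Only the eventId field is described in this section; the `type` value, the shape of the `response` payload, and the `routeChunk` helper are assumptions made for illustration.

```typescript
// Assumed shape of a streaming chunk message. Only `eventId` comes from the
// text; the `type` value and the `response` payload are assumptions.
interface StreamChunk {
  type: string;
  eventId: string;
  response: { response: string };
}

// Per-request text buffers, keyed by the client-defined eventId.
type Buffers = { [eventId: string]: string };

// Append a chunk's text to the buffer of the request it belongs to.
function routeChunk(buffers: Buffers, raw: string): void {
  const chunk = JSON.parse(raw) as StreamChunk;
  const previous = buffers[chunk.eventId] || "";
  buffers[chunk.eventId] = previous + chunk.response.response;
}

// Simulated chunks from two requests interleaved on the same connection.
const incoming: string[] = [
  { type: "universal.stream", eventId: "joke", response: { response: "Why " } },
  { type: "universal.stream", eventId: "poem", response: { response: "Roses " } },
  { type: "universal.stream", eventId: "joke", response: { response: "did..." } },
].map((message) => JSON.stringify(message));

const buffers: Buffers = {};
incoming.forEach((raw) => routeChunk(buffers, raw));
```

In a live client, `routeChunk` would be called from the WebSocket message handler instead of looping over a fixed array.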
Once all chunks for a request have been streamed, AI Gateway sends a final message to signal the completion of the request. For added flexibility, this message includes all the metadata again, even though it was initially provided at the start of the streaming process.
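Since the final message repeats the full metadata, a handler can finalize a request from that message alone, without keeping the initial message around. The sketch below illustrates the idea under stated assumptions: the `type` values, the `cacheStatus` field, and the helper names are hypothetical, and only the eventId and the presence of metadata come from this section.

```typescript
// Assumed metadata shape: `eventId` plus arbitrary string fields.
interface Metadata {
  eventId: string;
  [key: string]: string;
}

// Assumed message envelope. The `type` values are assumptions.
interface GatewayMessage {
  type: string;
  eventId?: string;
  metadata?: Metadata;
}

const DONE_TYPE = "universal.done"; // assumed name of the final message type

// Requests still streaming, and metadata of requests that have completed.
const pending: { [eventId: string]: boolean } = { "my-request": true };
const finished: { [eventId: string]: Metadata } = {};

// Finalize a request when its final message arrives; ignore other messages.
function handleMessage(raw: string): void {
  const message = JSON.parse(raw) as GatewayMessage;
  if (message.type !== DONE_TYPE || !message.metadata) {
    return; // initial message or streaming chunk: nothing to finalize
  }
  delete pending[message.metadata.eventId];
  finished[message.metadata.eventId] = message.metadata;
}

// Simulated lifecycle of one streaming request (illustrative values).
const lifecycle: GatewayMessage[] = [
  { type: "universal.created", metadata: { eventId: "my-request", cacheStatus: "MISS" } },
  { type: "universal.stream", eventId: "my-request" },
  { type: "universal.done", metadata: { eventId: "my-request", cacheStatus: "MISS" } },
];
lifecycle.forEach((message) => handleMessage(JSON.stringify(message)));
```

After the loop, `pending` is empty and `finished` holds the metadata taken from the final message.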