Streaming Long-Running AI Operations with SSE
When your AI operation takes 30+ seconds, you can't just show a spinner. Here's how to use Server-Sent Events to stream progress in real time.
Some AI operations take a long time. Extracting data from a 50-page PDF? 30 seconds. Running a multi-step agent that calls 5 APIs? 20 seconds. Extended thinking? 15 seconds.
You can't show a spinner for 30 seconds. Users will assume the app is broken and hit refresh. You need to stream progress as the work happens.
Server-Sent Events (SSE) is the simplest way to do this. Simpler than WebSockets, works over regular HTTP, no special server config.
How SSE works
An SSE stream is just an HTTP response with Content-Type: text/event-stream that the server keeps open. The server sends messages whenever it wants, each prefixed with data: and separated by blank lines, until it closes the connection.
That's it. No handshakes, no frame parsing, no ping-pong.
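On the wire, a stream with one typed progress event and a final done event might look like this (the event: and data: field names come from the SSE spec; the JSON payload shape is illustrative):

```
event: progress
data: {"step":2,"total":5,"ts":1700000000000}

event: done
data: {}

```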
The pattern
Server side: set SSE headers, then emit events as the operation progresses: a start event when work begins, progress events during each step, a result event carrying the final data, and a done event to close the connection. I wrap this in a helper that takes an event type and a data object, serializes it to JSON, and formats it as SSE. Every event gets a timestamp so the client can track timing.
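A minimal sketch of that helper in TypeScript on Node (the names formatEvent, sendEvent, and startStream are mine, not from a published API):

```typescript
import type { ServerResponse } from "node:http";

// Serialize one SSE event: a typed event line, a JSON data line, and the
// blank-line terminator. A timestamp is added so the client can track timing.
function formatEvent(type: string, data: Record<string, unknown>): string {
  const payload = JSON.stringify({ ...data, ts: Date.now() });
  return `event: ${type}\ndata: ${payload}\n\n`;
}

function sendEvent(res: ServerResponse, type: string, data: Record<string, unknown>): void {
  res.write(formatEvent(type, data));
}

// Set the SSE headers once at the start of the response, then emit
// start / progress / result / done events as the operation runs.
function startStream(res: ServerResponse): void {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
}
```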
Client side (React): Fetch API with a ReadableStream reader. Parse SSE events from chunks as they arrive, update component state. The latest progress event drives the UI.
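The chunk parsing can be factored into a small incremental parser, independent of React, which the hook then feeds from the ReadableStream reader (a sketch; createSSEParser is my own name, not a published API):

```typescript
type SSEEvent = { type: string; data: string };

// Incremental SSE parser: feed it raw text chunks as they arrive, get back
// complete events. Partial events are buffered until the blank-line
// terminator shows up in a later chunk.
function createSSEParser() {
  let buffer = "";
  return function feed(chunk: string): SSEEvent[] {
    buffer += chunk;
    const events: SSEEvent[] = [];
    let idx: number;
    // Events are separated by a blank line.
    while ((idx = buffer.indexOf("\n\n")) !== -1) {
      const raw = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 2);
      let type = "message";
      const dataLines: string[] = [];
      for (const line of raw.split("\n")) {
        if (line.startsWith("event:")) type = line.slice(6).trim();
        else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
        // Lines starting with ":" are keep-alive comments; skip them.
      }
      if (dataLines.length > 0) events.push({ type, data: dataLines.join("\n") });
    }
    return events;
  };
}
```

In the React hook, each complete event returned by feed() updates component state; the latest progress event drives the UI.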
Proxying between services
In a multi-service setup, the API gateway often needs to proxy an SSE stream from a backend to the client. Open a fetch to the backend, read events as they arrive, optionally inspect or transform them, and forward them to the client response. Useful when the orchestrator needs to do something with events (save results to a database) while still passing them through.
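The pass-through loop can be separated from the HTTP plumbing so it works with any ReadableStream (a sketch; pumpSSE is my own name, and the inspect callback is where you'd hook in the database save):

```typescript
// Pump an upstream SSE body to a client write function, optionally
// inspecting each chunk on the way through before forwarding it unchanged.
async function pumpSSE(
  body: ReadableStream<Uint8Array>,
  write: (chunk: string) => void,
  inspect?: (chunk: string) => void,
): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const chunk = decoder.decode(value, { stream: true });
    inspect?.(chunk); // e.g. persist result events to a database
    write(chunk);     // forward the raw event text to the client response
  }
}

// In the gateway handler, with `res` an SSE response already headed up:
// const upstream = await fetch(backendUrl, { headers: { Accept: "text/event-stream" } });
// await pumpSSE(upstream.body!, (chunk) => res.write(chunk));
```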
Things that will bite you
Nginx buffering. If you're behind Nginx, it buffers proxied responses by default, so your events pile up and arrive all at once when the response finishes. Fix: send an X-Accel-Buffering: no response header, or set proxy_buffering off in the Nginx config. I lost a couple of hours to this the first time.
Timeouts. Proxies and load balancers have idle connection timeouts, usually 60 seconds. If your operation runs longer, the connection just drops. Increase the timeout or send keep-alive comments periodically. A comment line (starts with :) keeps the connection alive without being treated as an event.
Client disconnect. When a user closes the tab mid-operation, stop processing. In Node.js, listen for the close event on the request. In Go, use context cancellation. No point burning compute on responses nobody will read.
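In Node, the keep-alive timer and the disconnect cleanup fit together in one small helper (a sketch; withKeepAlive and the 15-second interval are my choices, not from the post):

```typescript
import type { IncomingMessage, ServerResponse } from "node:http";

// Periodically write an SSE comment line so proxies see traffic and don't
// drop the idle connection; a line starting with ":" never surfaces as an
// event on the client. Stop the timer when the client disconnects.
function withKeepAlive(
  req: IncomingMessage,
  res: ServerResponse,
  intervalMs = 15_000,
): () => void {
  const timer = setInterval(() => {
    res.write(": keep-alive\n\n");
  }, intervalMs);
  const stop = () => clearInterval(timer);
  // Fires when the user closes the tab; a real handler would also abort
  // the underlying operation here instead of burning compute.
  req.on("close", stop);
  return stop;
}
```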
Go needs explicit flushing. Go's net/http buffers response data. You have to assert the ResponseWriter as an http.Flusher and call Flush() after each event. Without this, events batch up and arrive all at once. Looks identical to the Nginx buffering problem, which makes it annoying to debug.
Why not WebSockets
The communication here is one-directional: server sends, client reads. WebSockets give you a bidirectional channel, but you don't need one. SSE works with standard HTTP infrastructure and reconnects automatically when the connection drops (if you use EventSource).
I'd only reach for WebSockets if the client needs to send messages back during the stream — cancelling an operation, providing input mid-process. For progress streaming, SSE is less code and fewer things to break.
I use this pattern for everything long-running in our system. Document processing, agent workflows, analytics queries. Same helper, same client hook.