Streaming LLM responses is not a mystic art
Content-Type: application/x-ndjson is kind of a dream.
Streaming stuff over HTTP is a lot easier than I thought!
When experimenting with various LLM-enabled ideas recently, I’ve stuck to a fairly simple backend API structure: the HTTP server has controllers, those controllers await some asynchronous request(s) to a model API, and then return the results.
I knew that streaming responses existed, of course, and can see from daily use (in ChatGPT, in coding agents) how useful it is for a nicer UX. In my head, though, streaming was this weird, arcane topic; I kept thinking “eh, let’s keep it simple” and never really dove in.
During a recent hackathon project at Mintlify, streaming was too obvious of an idea to pass up, and so I had Claude write up a basic version of streaming to get me started. The code was much shorter and easier to read than I’d anticipated! Here’s a version I stripped down to be even simpler, no helper functions or anything:
```ts
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

// server is a simple Express app
app.post("/api/stream", async (_req, res) => {
  // Tell the client to expect a stream of newline-delimited JSON
  res.writeHead(200, {
    // Also common to use "text/event-stream" for Server-Sent Events
    "Content-Type": "application/x-ndjson",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  // ai-sdk method
  const result = streamText({
    model: openai("gpt-4o-mini"),
    prompt: "Tell me a short story about a robot learning to paint.",
  });

  // data type 1: text to append on the frontend
  for await (const chunk of result.textStream) {
    res.write(JSON.stringify({ type: "text", content: chunk }) + "\n");
  }

  // data type 2: total token usage, available once the stream finishes
  const usage = await result.usage;
  res.write(JSON.stringify({ type: "usage", tokens: usage.totalTokens }) + "\n");
  res.end();
});
```

@ai-sdk is an unsung hero here. Being able to swap in different models with a single line change is obviously huge, but the streamText method is a simple abstraction while being flexible enough to do pretty much whatever you want it to. streamObject is also great for structured output via JSON Schema or (my preference) a Zod schema. Generally, I think the DX is super nice; highly recommend dropping it in instead of (my previous default) the OpenAI SDK. I haven’t even gotten into tool-definition/calling or agent-loop control stuff, but it feels really easy to pick up.
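For the curious, here’s a sketch of what streamObject with a Zod schema looks like. The schema and prompt are made up, and this reflects my reading of the ai-sdk API rather than the hackathon code, so double-check against the docs:

```ts
import { streamObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Hypothetical schema: a story outline, enforced via Zod
const OutlineSchema = z.object({
  title: z.string(),
  scenes: z.array(z.string()),
});

const result = streamObject({
  model: openai("gpt-4o-mini"),
  schema: OutlineSchema,
  prompt: "Outline a short story about a robot learning to paint.",
});

// partialObjectStream yields progressively more complete objects
// shaped like (a partial view of) the schema
for await (const partial of result.partialObjectStream) {
  console.log(partial);
}
```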
Both text/event-stream and application/x-ndjson are ways to send streams of data over HTTP, but they have distinct formats and typical uses. text/event-stream is used for Server-Sent Events (SSE), where the server sends specially-formatted text messages with data: prefixes and double newlines; this is one of the default transport options for MCP servers, incidentally. It’s natively supported by browsers via the EventSource API.
Personally, I kinda like NDJSON (Newline-Delimited JSON) better: each line is a valid JSON object, and that’s it, no special prefixing. There might be some reason to do SSE instead, but I haven’t hit it yet!
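To make the difference concrete, here’s a quick comparison of my own (using the same `{ type, content }` event shape as the server example) of one event serialized both ways:

```ts
// One stream event, shaped like the ones in the server example above
const event = { type: "text", content: "Once upon a time" };

// NDJSON: each line is a complete JSON object, terminated by "\n"
const ndjsonFrame = JSON.stringify(event) + "\n";

// SSE: a "data: " prefix, with a blank line terminating each message
const sseFrame = `data: ${JSON.stringify(event)}\n\n`;

// The payload is identical; only the framing differs, which is why
// the client-side parsing code changes between the two formats.
```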
The code to receive the stream on the frontend is very reasonable, too:
```ts
const [text, setText] = useState('')
const [tokens, setTokens] = useState<number | null>(null)
const [streaming, setStreaming] = useState(false)

const startStream = async () => {
  setText('')
  setTokens(null)
  setStreaming(true)

  const response = await fetch('http://localhost:3001/api/stream', {
    method: 'POST',
  })

  const reader = response.body?.getReader()
  const decoder = new TextDecoder()

  while (reader) {
    const { done, value } = await reader.read()
    if (done) break

    const chunk = decoder.decode(value)
    // NDJSON: one JSON event per non-empty line
    const lines = chunk.split('\n').filter(line => line.trim() !== '')

    for (const line of lines) {
      const data = JSON.parse(line)

      if (data.type === 'text') {
        setText(prev => prev + data.content)
      } else if (data.type === 'usage') {
        setTokens(data.tokens)
      }
    }
  }

  setStreaming(false)
}
```

This is without any kind of helper functions or type-safety! A couple super quick wrappers make this even nicer:
```ts
import { z } from "zod";
import type { Response } from "express";

// Event schemas for type safety
const TextEventSchema = z.object({
  type: z.literal("text"),
  content: z.string(),
});

const UsageEventSchema = z.object({
  type: z.literal("usage"),
  tokens: z.number(),
});

const StreamEventSchema = z.discriminatedUnion("type", [
  TextEventSchema,
  UsageEventSchema,
]);

// Helper functions for cleaner streaming
function withNDJSONHeaders(res: Response) {
  res.writeHead(200, {
    "Content-Type": "application/x-ndjson",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
}

const writeEvent = (res: Response, data: z.infer<typeof StreamEventSchema>) => {
  res.write(JSON.stringify(data) + "\n");
};

const writeText = (res: Response, content: string) => {
  writeEvent(res, { type: "text", content });
};

const writeUsage = (res: Response, tokens: number) => {
  writeEvent(res, { type: "usage", tokens });
};
```
```ts
// Updated controller:
app.post("/api/stream", async (_req, res) => {
  withNDJSONHeaders(res);

  // ...

  for await (const chunk of result.textStream) {
    writeText(res, chunk);
  }

  const usage = await result.usage;
  writeUsage(res, usage.totalTokens);
  res.end();
});
```

Now we’ve got type-safe events, and you can see how easy this would be to extend to a ton of different events, all being sent back from the same response.
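As a sketch of that extensibility (an "error" event is my hypothetical example, not something from the actual project): a new event kind is just another schema in the discriminated union plus another one-line writer, and on the wire it’s simply one more NDJSON line:

```ts
// Hypothetical third event type; with zod this would be one more
// z.object({...}) added to StreamEventSchema's discriminated union.
type ErrorEvent = { type: "error"; message: string };

// Same framing as writeEvent above: one JSON object, one line
const encodeEvent = (event: ErrorEvent): string =>
  JSON.stringify(event) + "\n";

const line = encodeEvent({ type: "error", message: "model call timed out" });
```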
And on the frontend:
```ts
import { StreamEventSchema } from "./schemas";

const parseStreamData = (line: string) => {
  if (line.trim() === '') return null;

  try {
    const rawData = JSON.parse(line);
    return StreamEventSchema.parse(rawData);
  } catch (error) {
    console.warn('Invalid stream data:', error);
    return null;
  }
};
```
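One caveat worth flagging (a wrinkle I’m adding, not covered in the loops in this post): a single read() chunk isn’t guaranteed to end on a line boundary, so a JSON object can be split across two chunks. A buffering line-splitter handles that; createLineSplitter is a hypothetical helper name:

```ts
// Accumulates chunks and emits only complete NDJSON lines,
// holding any trailing partial line in the buffer until the next chunk.
function createLineSplitter() {
  let buffer = "";
  return (chunk: string): string[] => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line
    return lines.filter((line) => line.trim() !== "");
  };
}

const split = createLineSplitter();
const first = split('{"type":"text","con'); // partial line: nothing emitted yet
const second = split('tent":"hi"}\n');      // completed line emitted
```

In the streaming loop, you’d run each decoded chunk through the splitter and pass each emitted line to parseStreamData.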
```ts
// Updated streaming logic
const startStream = async () => {
  setText('')
  setTokens(null)
  setStreaming(true)

  const response = await fetch('http://localhost:3001/api/stream', {
    method: 'POST',
  })

  const reader = response.body?.getReader()
  const decoder = new TextDecoder()

  while (reader) {
    const { done, value } = await reader.read()
    if (done) break

    const chunk = decoder.decode(value)

    for (const line of chunk.split('\n')) {
      const data = parseStreamData(line)
      if (!data) continue

      switch (data.type) {
        case 'text':
          setText(prev => prev + data.content)
          break
        case 'usage':
          setTokens(data.tokens)
          break
      }
    }
  }

  setStreaming(false)
}
```

So ends another tale of “code pattern I thought was hard/scary for no reason is actually simple and good.” You’d think I would learn at some point!