Streaming Lambda responses with API Gateway and Serverless Framework
Streaming support for Lambda
Lambda response streaming has been available since 2023, but until now the main way to use it was through Lambda Function URLs, which aren't ideal, especially if everything you've built so far is centered around API Gateway (validators, custom authorizers, and so on).
Fortunately, that changed recently: API Gateway REST APIs now support response streaming natively, and as of the v4.25.0 release we can configure it with Serverless Framework.
A quick refresher on response streaming
If you haven't used response streaming before, the idea is that instead of returning a single value at the end of the handler, we write to a `responseStream` object as we go. Lambda pushes those bytes to the caller as they're written, without waiting for the handler to finish:
```javascript
exports.handler = awslambda.streamifyResponse(
  async (event, responseStream, context) => {
    responseStream.setContentType("text/plain");
    responseStream.write("first chunk\n");
    await new Promise((resolve) => setTimeout(resolve, 1000));
    responseStream.write("second chunk\n");
    responseStream.end();
  }
);
```

The `streamifyResponse` wrapper comes from the Lambda Node.js runtime, so we don't need to install anything extra. We can stream up to 20 MB of payload, and the function timeout still stays at 15 minutes. The handler we need to write is the same as for Function URLs.
Enabling it in Serverless Framework
In Serverless Framework v4.25.0 and later, we can easily turn our `http` event into a streaming one via the `response.transferMode` setting:
```yaml
functions:
  streamer:
    handler: src/handler.handler
    events:
      - http:
          path: stream
          method: get
          response:
            transferMode: STREAM
```

A few things to keep in mind when wiring this up:
- It only applies to proxy integrations (`AWS_PROXY`/`HTTP_PROXY`).
- It can't be combined with `http.async`, since async invocations cannot stream a response back.
- The Lambda function still has to use the streaming API on its end (`awslambda.streamifyResponse` or the equivalent in our runtime), as in the example above.
A few things worth knowing
A couple of practical gotchas if this is the first time you're working with streaming responses:
- Status code is set after the first write. Once any byte has been sent to the client, the HTTP status code is fixed and can't be changed anymore. If our handler runs fine for the first 30 seconds and then throws, the response has already gone out as 200 OK, and we can only signal the failure inside the body. (In the Node.js runtime, the `awslambda.HttpResponseStream.from` helper lets us attach a status code and headers up front, before the first write.)
- Partial responses look like success. A truncated stream is hard to distinguish from a complete one unless the client has its own handling for it. For SSE that's usually fine, but for "generate a 15 MB CSV" cases we probably want a checksum or a similar marker at the end so the client knows it received everything from our stream.
There are more and more use cases where streaming makes a ton of sense: LLM completions, pushing trading quotes as they appear via Server-Sent Events, or generating large files. It's great that we can keep our trusty API Gateway setup and simply extend it to support streams instead of switching to Function URLs. Thanks for reading!