Wait, you can invoke Lambda from a SQL query in RDS PostgreSQL?
Introduction
A while ago, while exploring CDC options for our RDS PostgreSQL database, I came across a surprising capability. It turns out you can invoke an AWS Lambda function directly from a SQL query by taking advantage of the aws_lambda extension, available to us out of the box. In the end we decided not to use it in production, since there are other, better (imo) options, but it's still a really useful tool to remember and can be quite handy for some automation work.
Prerequisites
Before you can call Lambda from your database, you need to take care of:
- Network access from RDS to Lambda. Your RDS instance needs a path to the Lambda service endpoint. You can achieve this by either making the RDS instance publicly accessible, putting it in a private subnet with NAT access, or by setting up a VPC endpoint for Lambda and enabling `rds.custom_dns_resolution=1` in the RDS parameter group.
- An IAM role on the RDS instance. The role needs to allow `lambda:InvokeFunction` on the specific Lambda function (or set of functions) you want to call. The role is attached to the RDS instance the same way you'd attach a role for S3 import/export.
- The `aws_lambda` extension installed. Once the role is attached and networking is sorted, you can run `CREATE EXTENSION aws_lambda;` (along with `CREATE EXTENSION aws_commons;`, which is its dependency) inside the database you want to call Lambda from.
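As a sketch, the policy on that IAM role can be as narrow as a single `lambda:InvokeFunction` statement; the function ARN below is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-function"
    }
  ]
}
```

The role is then associated with the instance via `aws rds add-role-to-db-instance` with `--feature-name Lambda`, analogous to how the S3 import/export roles are attached.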
If any one of these is missing, you'll get a confusing error message and lose half an hour to debugging (don't ask me how I know that).
The function signature
Once the extension is installed, you get a new schema called aws_lambda with an invoke function:
```sql
aws_lambda.invoke(
    function_name    TEXT | aws_commons._lambda_function_arn_1,
    payload          JSONB,
    region           TEXT DEFAULT NULL,
    invocation_type  TEXT DEFAULT 'RequestResponse',
    log_type         TEXT DEFAULT 'None',
    context          JSONB DEFAULT NULL,
    qualifier        TEXT DEFAULT NULL
) RETURNS (status_code INT, payload JSONB, executed_version TEXT, log_result TEXT)
```

The `invocation_type` parameter is the most interesting one. With the default `'RequestResponse'`, the SQL query waits for the Lambda to finish and returns the response payload back to you. With `'Event'`, the call is fire-and-forget, so the database doesn't wait for a response, which is what you want for things like notifications or background processing where you don't care about the return value.
The function returns a row with the HTTP-style status code, the JSON payload from the Lambda, the version that was executed, and an optional log result. So you can do things like check `status_code = 200` directly in your SQL.
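As a minimal sketch of the async variant, here's a fire-and-forget call; the function ARN and payload are made up for illustration. Note that for `'Event'` invocations the Lambda service only acknowledges acceptance, so you should see a 202 status rather than 200:

```sql
-- Fire-and-forget: the query returns as soon as Lambda accepts the event.
-- The ARN and payload here are illustrative placeholders.
SELECT status_code
FROM aws_lambda.invoke(
    'arn:aws:lambda:us-east-1:123456789012:function:my-function',
    '{"kind": "notification"}'::jsonb,
    invocation_type := 'Event'
);
-- status_code: 202 (Accepted), since nothing waits for the actual result
```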
A concrete example
Here's a real example - a query that aggregates meter readings from a table and sends them to a Lambda function for processing:
```sql
SELECT * FROM aws_lambda.invoke(
    aws_commons.create_lambda_function_arn(
        'arn:aws:lambda:us-east-1:600238737408:function:rds-invoke',
        'us-east-1'
    ),
    (
        SELECT jsonb_build_object(
            'meter_readings',
            jsonb_agg(row_to_json(mr) ORDER BY mr.reading_timestamp DESC)
        )
        FROM (
            SELECT id, reading_timestamp, value
            FROM meter_readings
        ) AS mr
    )
);
```

The `aws_commons.create_lambda_function_arn` helper builds a properly-typed reference to your function (you can also pass a regular TEXT ARN as the first argument to `invoke` if you prefer). The second argument is the JSONB payload, which in this case is constructed by aggregating rows from `meter_readings` into a JSON object.
The Lambda receives this payload exactly as it would if you'd called Invoke from the AWS SDK, and it can return any JSON it wants. That JSON shows up in the payload column of the result.
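For instance, if the function responded with something like `{"processed_count": 42}`, you could pull that value straight out of the result set; the response field here is hypothetical, just to show the shape:

```sql
-- Read a field out of the Lambda's JSON response.
-- 'processed_count' is an assumed field, for illustration only.
SELECT (payload ->> 'processed_count')::int AS processed_count
FROM aws_lambda.invoke(
    aws_commons.create_lambda_function_arn(
        'arn:aws:lambda:us-east-1:600238737408:function:rds-invoke',
        'us-east-1'
    ),
    '{"meter_readings": []}'::jsonb
)
WHERE status_code = 200;
```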
Where it works well
If your use case fits, this is a pretty clean pattern:
- The setup is simple once the prerequisites are in place. No intermediate services, no glue Lambda just to forward data, no Kafka topic to manage.
- Both sync and async invocations are supported via the `invocation_type` parameter. Sync is great for "transform this data and write the result back" workflows. Async is great for fire-and-forget notifications.
- It's a regular SQL function, which means you can call it from triggers, stored procedures, scheduled jobs (e.g. with `pg_cron`), or just ad-hoc from a SQL client. Wherever you can run SQL, you can invoke a Lambda.
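To make the scheduled-job case concrete, a pg_cron pairing can be sketched roughly like this; the job name, schedule, and payload are all made up, and it assumes pg_cron is installed in the same database:

```sql
-- Hypothetical nightly job: kick off a Lambda at 02:00 every day.
-- cron.schedule(job_name, schedule, command) comes from pg_cron.
SELECT cron.schedule(
    'invoke-nightly-report',
    '0 2 * * *',
    $$
    SELECT aws_lambda.invoke(
        'arn:aws:lambda:us-east-1:600238737408:function:rds-invoke',
        '{"job": "nightly-report"}'::jsonb,
        invocation_type := 'Event'
    )
    $$
);
```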
Where it falls down
There are real limits, and they're the reason we ended up choosing a different approach for our specific use case:
- Concurrent Lambda invocations are a real concern. If your SQL is firing many invocations per second (e.g. a trigger on a high-write table), you can blow through your account's concurrent Lambda limit very quickly, and the failures will cascade back into your SQL transactions.
- High-throughput scenarios produce a lot of invocations. Every row-level call is a Lambda invocation, billed individually, and that adds up fast in busy systems. Batching helps, but it's tricky to express properly in existing SQL queries if they aren't already batched.
- You can't target other AWS services directly. The `aws_lambda` extension only invokes Lambda - if you want to push events to MSK, Kinesis, or EventBridge from inside the database, you have to go through a Lambda function as an intermediary. That's an extra component in the path you may not want.
For our CDC use case, where we needed to stream every change to multiple downstream consumers reliably, those limits made it the wrong choice and we ended up moving to MSK Connect + Debezium. But for lower-volume scenarios (e.g. "send a notification when this row gets inserted", or "transform this data via Lambda and write the result back"), the `aws_lambda` extension can be a really effective and simple choice compared to overengineering it with Step Functions or a full-blown CDC pipeline.
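To make the notify-on-insert case concrete, here's a rough trigger sketch against the meter_readings table from earlier; the function names are hypothetical, and the concurrency caveat above applies in full if the table is write-heavy:

```sql
-- Hypothetical notify-on-insert trigger. Every insert fires one async
-- Lambda invocation, so reserve this pattern for low-volume tables.
CREATE OR REPLACE FUNCTION notify_lambda_on_insert() RETURNS trigger AS $$
BEGIN
    PERFORM aws_lambda.invoke(
        'arn:aws:lambda:us-east-1:600238737408:function:rds-invoke',
        jsonb_build_object('new_reading', row_to_json(NEW)),
        invocation_type := 'Event'
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER meter_readings_notify
AFTER INSERT ON meter_readings
FOR EACH ROW EXECUTE FUNCTION notify_lambda_on_insert();
```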
Summary
While we didn't end up using it in prod, I still think the aws_lambda extension is a feature that can really shine in the right scenario, and it's really good to have in your toolbelt; just be aware of its limitations. Thanks for reading!