Getting started with serverless Rust in AWS Lambda

AWS Lambda and Rust

At Scanner, we use AWS Lambda functions and Rust in our log query engine. While Rust is technically supported in Lambda functions, it is not as easy to set up as the officially blessed languages: Node.js, Python, Ruby, Java, Go, C#, and PowerShell. In this post, we will share the information we wished someone had given us when we got started using Rust in Lambda functions.

Why use Rust in Lambda functions?

Rust is a systems programming language that provides strong guarantees about memory safety without the overhead of an interpreter, language virtual machine, or garbage collector.

We use Rust to get every ounce of performance that we can out of our log search product, and we love the memory and thread safety guarantees granted by the compiler. We can write multi-threaded code running SIMD accelerated functions with a high degree of confidence that we won’t introduce memory safety bugs that could crash a C or C++ program, or worse: expose a vulnerability attackers could exploit.

Since it does not need to boot up an interpreter or language virtual machine, Rust's cold-start time in Lambda functions is among the fastest of any language, typically 20-30 ms.

Getting started

In this post, we’ll do a quick walkthrough to get you started using Rust in a Lambda function. We’ll build a function to handle a simple use case: read 1 GB of logs from S3 stored in a json-lines file compressed with zstd, and decompress and parse its contents on-the-fly. If you need an example file to play with, you can use this one: it comprises fake logs generated using flog, and it is 178 MB compressed.

Use cargo-lambda

We first started using Rust with Lambda functions in early 2022, and documentation was sparse or outdated. We relied on blog posts (much like this one) to help us navigate the space of configuration possibilities.

Since then, Lambda tooling for Rust has gotten much better. The official AWS Rust docs for getting started with Lambda functions don’t mention it, but we highly recommend that you use cargo-lambda to get started. Our early approach required plenty of trial and error to create a fairly complex Docker image to install musl tools for static compilation. Thankfully, cargo-lambda handles all of this out-of-the-box, and you can create a static build with a single command. It just works.

To get started, we install cargo-lambda. There are a few ways to install it, but here’s the easiest:

pip3 install cargo-lambda

Once cargo-lambda is installed, you can create a new Rust project containing lambda boilerplate like this:

cargo lambda new rust-lambda-s3-test

You will be prompted for some details. Our function will be triggered manually by us for now, not by an HTTP request, so we answer no to the first prompt:

? Is this function an HTTP function? (y/N) N
[type `yes` if the Lambda function is triggered by an API Gateway, Amazon Load Balancer(ALB), or a Lambda URL]

For the next prompt, we are using our own custom request schema, not the request schema used by some other AWS service, so we press Enter to leave it blank:

? AWS Event type that this function receives
v codebuild::CodeBuildEvent
[↑↓ to move, tab to auto-complete, enter to submit. Leave it blank if you don't want to use any event from the aws_lambda_events crate]

Rust code to read, decompress, and parse JSON from S3

A new Rust project called rust-lambda-s3-test will be created, and its src/main.rs file will contain some helpful boilerplate to get you started.

Under the hood, the AWS lambda_runtime library simply fetches the next request via an HTTP GET to the Lambda Runtime API, executes the function_handler using that request as input, and then sends the response via an HTTP POST to the Lambda Runtime API.

Your job is to implement the function_handler.

Modify the Request object to contain bucket and key parameters. We will use these to read an object from S3.

#[derive(serde::Deserialize)]
struct Request {
    bucket: String,
    key: String,
}

Next, we will install some Rust crates (i.e. Rust library packages) to read from S3, decompress with zstd, and parse JSON. We also want an ergonomic way to deal with errors. Edit your Cargo.toml file to add these crates with the appropriate features: anyhow, async-compression, rusoto_core, rusoto_s3, and serde_json:

anyhow = "1"
async-compression = { version = "0.3", features = ["tokio", "zstd"] }
lambda_runtime = "0.7"
rusoto_core = { version = "0.48", default-features = false, features = ["rustls"] }
rusoto_s3 = { version = "0.48", default-features = false, features = ["rustls"] }
serde = "1.0.136"
serde_json = "1"
tokio = { version = "1", features = ["macros"] }
tracing = { version = "0.1", features = ["log"] }
tracing-subscriber = { version = "0.3", default-features = false, features = ["fmt"] }

Here is what each crate is used for:

  • anyhow – ergonomic error propagation with the ? syntax in Rust. Converts many types of errors into a single error.
  • async-compression with the tokio and zstd features enabled.
    • tokio is the library we are already using to execute our asynchronous code, and we want to run zstd decompression in an asynchronous way since we are also fetching the object asynchronously from S3.
  • rusoto_core and rusoto_s3
    • The rusoto library family is the most widely used AWS client library family at the time of writing, despite the existence of an official AWS Rust SDK. The official SDK is considered to be in “developer preview” and is not ready for production workloads yet, so we use rusoto.
    • We use the rustls feature so that all TLS operations in the HTTP clients are handled by the rustls library instead of the default OpenSSL. We do this because compiling a Rust Lambda function using OpenSSL is an experience riddled with obtuse compiler error messages, whereas rustls just works. Also, rustls has the strong memory safety guarantees of having been implemented in Rust, so it is probably more secure.
  • serde_json is used to serialize and deserialize JSON.

Add the necessary use directives to the top of your file to start using the libraries:

use rusoto_core::Region;
use rusoto_s3::{S3Client, S3};
use tokio::io::AsyncBufReadExt;

Update your function_handler to look like this, but customize your S3Client to use the region where your S3 bucket actually resides.

This code assumes that the S3 object is a zstd compressed file containing JSON objects separated by newlines.

async fn function_handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
    let bucket = &event.payload.bucket;
    let key = &event.payload.key;
    let started_at = std::time::Instant::now();
    // Customize with the region of your actual S3 bucket.
    let client = S3Client::new(Region::UsWest2);
    // Initiate a GetObject request to S3.
    let output = client
        .get_object(rusoto_s3::GetObjectRequest {
            bucket: bucket.to_string(),
            key: key.to_string(),
            ..Default::default()
        })
        .await?;
    let Some(body) = output.body else {
        return Err(anyhow::anyhow!("No body found in S3 response").into());
    };
    // Begin streaming the contents down, decompressing on the fly, and
    // iterating over each chunk split by newlines.
    let body = body.into_async_read();
    let body = tokio::io::BufReader::new(body);
    let decoder = async_compression::tokio::bufread::ZstdDecoder::new(body);
    let reader = tokio::io::BufReader::new(decoder);
    let mut lines = reader.lines();
    // For each line we encounter while asynchronously streaming down the
    // S3 data, parse the JSON object.
    let mut num_log_events = 0;
    while let Some(line) = lines.next_line().await? {
        let _value: serde_json::Value = serde_json::from_str(&line)?;
        num_log_events += 1;
        if num_log_events % 1000 == 0 {
            println!("num_log_events={}", num_log_events);
        }
    }
    let msg = format!(
        "elapsed={:?} num_log_events={}",
        started_at.elapsed(),
        num_log_events
    );
    let resp = Response {
        req_id: event.context.request_id,
        msg,
    };
    Ok(resp)
}

In the function_handler, we initiate an S3 GetObject request, stream the contents asynchronously, and decompress and parse the JSON on the fly.


Now we can deploy the Lambda function in a remarkably easy way using cargo-lambda.

We compile a new release build like this:

cargo lambda build --release --arm64

Note that we use the --arm64 architecture flag. The alternative is --x86-64, but we use arm64 so that our Lambda function uses the cheaper (and often faster) Graviton CPUs in AWS.

To deploy the compiled program to your Lambda function, you will need your local AWS credentials set up either in your local environment variables or in ~/.aws/credentials.

Here is an example with an AWS profile called dev set up in the local credentials file.


[dev]
aws_access_key_id = <your access key id>
aws_secret_access_key = <your secret access key>
region = <your region, e.g. us-west-2>

We deploy using the command below, and it uses a few important arguments.

cargo lambda deploy \
  --profile dev \
  --region us-west-2 \
  --timeout 45

  • We use the AWS profile named dev to deploy, which will use the matching profile from the local AWS credentials file.
  • We choose the region where our AWS account is situated.
  • We use a timeout of 45 seconds. This will allow our function plenty of time to read and parse 1 GB of logs from S3. As we will see shortly, with proper memory tuning, our function will only need 1.8 seconds to complete the task. Note: the default Lambda function timeout is 3 seconds.

You can also add an IAM role to associate with the lambda function, which is important if your S3 bucket needs special access permissions. See cargo lambda deploy --help for a full list of supported parameters.

If everything went well, you should now have a Lambda function named rust-lambda-s3-test in your AWS account.

To invoke the function, you can use the AWS CLI, or you can run it from the AWS Console in your browser, which allows you to see more details about the execution. We’ll use the AWS Console to view these details.

In your browser, log in to the AWS console, then visit Lambda > Functions > rust-lambda-s3-test.

Open up the Test tab, create a new event, and type this:

{
  "bucket": "<your s3 bucket here>",
  "key": "<the key for your zstd compressed json-lines file here>"
}

When you click the Test button, your Lambda function will be executed with the given event as its input. If all goes well, you should see a widget saying something like Execution result: succeeded.

[Screenshot: execution result succeeded]

Click Details to view more information about the execution.

You should see the last several lines of logs emitted by your function and some other interesting pieces of information.

[Screenshot: log output and detailed execution information]

Here are some notable details:

  • The lambda function took 43.79 ms to start up from a cold-start. In our experience, Rust has one of the fastest lambda cold-start times of any language, typically 20-30ms.
  • It used 25MB of memory out of the 128MB allocated.
  • In our example case, we read, decompressed, and parsed 1 GB of JSON data in roughly 32.7 seconds.
  • We were billed for the combined init duration (cold startup time) and the function’s duration.
  • Note: In our testing, we used 1 GB of real production logs from the backend. If you are using the example JSON logs file mentioned at beginning of the post, your performance results may vary.

Increasing memory allocation will increase performance

Even though we only used 25MB out of the 128MB of memory allocated, we will increase our performance by an order of magnitude if we increase the amount of memory we allocate.

Let’s deploy again but allocate 2GB of memory instead of the default 128MB.

cargo lambda deploy \
  --profile dev \
  --region us-west-2 \
  --timeout 900 \
  --memory 2048

If we run the test again, we’ll see a massive performance improvement.

[Screenshot: log output showing the performance improvement]

The duration was only 1.8 seconds – an 18x speedup.

Here is an important fact to remember. If you increase the memory capacity of a Lambda function, you will increase its CPU and network capacity as well. Specifically, different memory allocation values will cause AWS to run your Lambda function with different EC2 instance types, which have different CPU and network performance characteristics. This aspect of Lambda function performance tuning is unfortunately quite opaque, and the official documentation provides few details. We plan to share more useful tips for Lambda function performance tuning in future blog posts, including information about how many vCPUs you can access and which EC2 instances types are used at various memory allocation levels.

Here is what the performance of the Lambda function looks like using different memory allocation amounts:

[Chart: Lambda function duration at different memory allocation levels]

We see the best performance when we allocate 2GB of memory. Increasing memory allocation beyond this value does not improve performance.

Billing notes

In some ways, it is unfortunate that we need to allocate so much memory to our lambda function to get optimal performance. We only used 25MB of memory during execution, so allocating 2GB seems wasteful, particularly considering that AWS bills you using the formula (memory allocated) * (billed duration).

You will need to assess the performance and cost tradeoffs to decide what memory allocation you should use.

There is some consolation knowing that targeting the arm64 architecture instead of x86-64 gives us an immediate 30% discount on our Lambda function costs. We recommend doing this if you can. For the Rust program above, using arm64 also leads to a 30% performance improvement over x86-64, but results may vary depending on your program.

Future posts: Comparing Rust with other languages in Amazon Lambda functions

While there are good reasons to use Rust in Lambda functions, there are good reasons to use other languages as well.

In future posts, we will explain how to get started with Lambda functions with other languages:

  • Python
  • Go
  • Java

In a final post, we will show performance data comparing Rust, Python, Go, and Java when used in Lambda functions.

Amazon is continuing to add significant improvements to Lambda functions, like the new SnapStart feature which largely solves the problem of long JVM cold-start times. Developers are increasingly adopting serverless tools, including teams at large enterprises. We are excited to leverage new Lambda features in our products at Scanner and provide incredibly fast and powerful observability and security tools for large log data sets.

Try out Scanner for fast log search

If searching through logs at high speeds sounds useful to you, feel free to join our beta and try out Scanner. You can sign up and try it out here.

Upcoming Talk @ DeveloperWeek 2023

Join Scanner at DeveloperWeek on February 16, 2023 to hear more about comparing Rust with Go, Java, and Python in AWS Lambda functions!