We’re excited to announce the release of a few new features our customers have been asking for.
Even faster querying
Queries are now powered by a new monoid data structure server we built in Rust. For our specific use case, the monoid server is about 2x faster than Redis, and we'll share more on it in a future blog post, including lessons learned while tackling some hefty systems engineering problems, such as why it is important to experiment with alternative memory allocators like jemalloc.
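For readers unfamiliar with the term: a monoid is a type with an identity value and an associative combine operation, and associativity is what lets partial results computed by many workers over many index chunks be merged in any order. Here is a minimal Rust sketch of that idea; the trait and types below are illustrative, not Scanner's actual implementation.

```rust
/// A monoid: an identity value plus an associative combine operation.
/// Associativity means partial results can be merged in any grouping
/// and still produce the same answer.
trait Monoid {
    fn identity() -> Self;
    fn combine(self, other: Self) -> Self;
}

/// Hit counts form a monoid under addition.
#[derive(Clone, Copy, Debug)]
struct HitCount(u64);

impl Monoid for HitCount {
    fn identity() -> Self { HitCount(0) }
    fn combine(self, other: Self) -> Self { HitCount(self.0 + other.0) }
}

/// Maximums form a monoid too (identity = the smallest possible value).
#[derive(Clone, Copy, Debug)]
struct MaxValue(i64);

impl Monoid for MaxValue {
    fn identity() -> Self { MaxValue(i64::MIN) }
    fn combine(self, other: Self) -> Self { MaxValue(self.0.max(other.0)) }
}

/// Fold any number of partial results, regardless of which worker or
/// index shard produced them.
fn merge_partials<M: Monoid>(partials: impl IntoIterator<Item = M>) -> M {
    partials.into_iter().fold(M::identity(), M::combine)
}

fn main() {
    // Pretend three workers each scanned a slice of the index.
    let counts = vec![HitCount(120), HitCount(87), HitCount(5_000)];
    println!("total hits: {:?}", merge_partials(counts)); // HitCount(5207)

    let maxima = vec![MaxValue(42), MaxValue(901), MaxValue(17)];
    println!("max: {:?}", merge_partials(maxima)); // MaxValue(901)
}
```

Because combine is associative, per-chunk aggregates can be merged in whatever order they arrive, which is what makes highly parallel scanning practical.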
Performance on a 25TB data set
Needle-in-haystack: Searching for a UUID takes ~3 seconds.
Worst case query: A wildcard * query that matches all 25TB of logs takes a little less than one minute.

Compare with worst case query performance in AWS CloudWatch, which would take 300 minutes to scan 25TB, roughly 300x slower than Scanner.

Basic aggregations
Scanner now supports these basic query aggregations:
countdistinct: Return the number of distinct values for a field. Uses a HyperLogLog++ sketch to give a low-error estimate on high-cardinality data sets (a simplified illustration of how these sketches merge follows this list).
groupbycount: Group results by a key and display the number of results for each key. Also uses a HyperLogLog++ sketch to give a low-error estimate on high-cardinality data sets.
max: Return the maximum value of a field across all hits.
count: Return the number of hits.
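As a rough illustration of why these sketches combine so well across index shards, here is a simplified HyperLogLog-style sketch in Rust. It keeps only the register array and the element-wise-max merge; the actual cardinality estimate (harmonic mean plus bias correction) and Scanner's real implementation details are omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const P: u32 = 12;            // 2^12 = 4096 registers
const M: usize = 1 << P;

/// A toy HyperLogLog-style sketch: each register holds the maximum
/// "rank" (leading zeros + 1) observed among hashes routed to it.
struct Sketch {
    registers: [u8; M],
}

impl Sketch {
    fn new() -> Self {
        Sketch { registers: [0; M] }
    }

    fn add<T: Hash>(&mut self, value: &T) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let h = hasher.finish();
        let idx = (h >> (64 - P)) as usize;              // top P bits pick a register
        let rank = (h << P).leading_zeros() as u8 + 1;   // rank of the remaining bits
        if rank > self.registers[idx] {
            self.registers[idx] = rank;
        }
    }

    /// Merging two sketches is just an element-wise max. The estimate
    /// from the merged sketch reflects the union of the two inputs.
    fn merge(&mut self, other: &Sketch) {
        for (r, o) in self.registers.iter_mut().zip(other.registers.iter()) {
            *r = (*r).max(*o);
        }
    }
}

fn main() {
    let mut a = Sketch::new();
    let mut b = Sketch::new();
    for i in 0..10_000 { a.add(&format!("user-{i}")); }
    for i in 5_000..15_000 { b.add(&format!("user-{i}")); }
    b.merge(&a); // merged sketch now reflects ~15,000 distinct users
}
```

Register-wise max is associative and commutative, so per-shard sketches can be merged in any order without double-counting the way naive per-shard distinct counts would.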

Saved queries
Users can now save queries and share them with their team. This should reduce the need to tab-switch between internal wiki pages and the search tool, and it should also help new team members become productive faster.

You can learn more about our Query Syntax in our documentation here.
Coming soon
Over the next few months we’ll be rolling out the following new features:
Advanced aggregations
The new monoid data structure server will allow us to support more advanced aggregations in the coming months, including the ability to build more sophisticated result tables with many columns. This will make it easier to create advanced detection rules.
Real-time detection rules engine
Our new monoid data structures allow us to build a detection rules engine that runs on logs as they are indexed in real time. Alerts will be sent to custom webhooks, Slack, or PagerDuty.
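As a rough sketch of the shape this could take (not Scanner's actual rules engine, rule syntax, or alert payload), here is a minimal Rust example that evaluates a simple equality rule over incoming events and posts matches to a placeholder webhook URL.

```rust
// Cargo.toml (assumed): reqwest = { version = "0.11", features = ["blocking", "json"] }
//                       serde_json = "1"
use serde_json::{json, Value};

/// A deliberately simple, hypothetical detection rule: fire when a field
/// equals a given value.
struct Rule {
    name: &'static str,
    field: &'static str,
    equals: &'static str,
}

impl Rule {
    fn matches(&self, event: &Value) -> bool {
        event.get(self.field).and_then(Value::as_str) == Some(self.equals)
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let rule = Rule {
        name: "root console login",
        field: "userIdentity.type", // flattened field name, for illustration only
        equals: "Root",
    };

    // Stand-in for logs arriving as they are indexed.
    let incoming = vec![
        json!({ "userIdentity.type": "IAMUser", "eventName": "ConsoleLogin" }),
        json!({ "userIdentity.type": "Root", "eventName": "ConsoleLogin" }),
    ];

    let client = reqwest::blocking::Client::new();
    for event in &incoming {
        if rule.matches(event) {
            // Post the alert to a custom webhook (URL is a placeholder).
            client
                .post("https://example.com/hooks/scanner-alerts")
                .json(&json!({ "rule": rule.name, "event": event }))
                .send()?;
        }
    }
    Ok(())
}
```

Slack and PagerDuty integrations would follow the same pattern, just with different endpoints and payload formats.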
Querying across multiple AWS accounts
We’ve been getting requests to support querying logs across multiple AWS accounts from a single Scanner instance. Logs and index files will remain in the S3 buckets in each account, and the instance will read from them all and aggregate search results.
We would love to hear about your security data lake use cases
If you have security data lake use cases you would like to see supported, please reach out. We think Scanner is on its way to becoming the most useful security data lake tool in the world, and we want to hear from you so we can improve it even more.
Scanner is a security data lake tool designed for rapid search and detection over petabyte-scale log data sets in AWS S3. With highly efficient serverless compute, it is 10x cheaper than tools like Splunk and Datadog for large log volumes and 100x faster than tools like AWS Athena.