Faster Querying, Basic Aggregations, and Saved Queries


We’re excited to announce the release of a few new features our customers have been asking for.

Even faster querying

Queries are now powered by a new monoid data structure server we built in Rust. The monoid server is about 2x faster than Redis for our specific use case, and we’ll share more on that in a future blog post. We will likely cover interesting lessons learned while tackling some hefty systems engineering problems, including why it is important to experiment with alternative memory allocators like jemalloc.
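To give a feel for why a monoid structure helps here, below is a minimal, illustrative sketch (the names and layout are our own assumptions, not Scanner's actual API). A monoid is a set of values with an associative combine operation and an identity element; associativity means partial results computed by many workers over different shards can be merged in any order and still produce the same answer.

```python
# Illustrative sketch of the monoid idea behind merging partial query results.
# The types and names here are assumptions for explanation, not Scanner's API.
from dataclasses import dataclass

@dataclass
class CountMax:
    """Partial aggregate tracking a hit count and a maximum field value."""
    count: int = 0                   # identity: zero hits
    maximum: float = float("-inf")   # identity: no maximum seen yet

    def combine(self, other: "CountMax") -> "CountMax":
        # Associative merge: (a.combine(b)).combine(c) == a.combine(b.combine(c))
        return CountMax(self.count + other.count,
                        max(self.maximum, other.maximum))

def reduce_partials(partials):
    """Fold partial aggregates from many shards, starting from the identity."""
    acc = CountMax()
    for p in partials:
        acc = acc.combine(p)
    return acc

# Three workers each scanned one shard and produced a partial aggregate:
shards = [CountMax(10, 4.2), CountMax(7, 9.9), CountMax(3, 1.1)]
total = reduce_partials(shards)
# total.count == 20, total.maximum == 9.9
```

Because the merge is associative, the server can combine shard results in whatever order they arrive, which is what makes massively parallel scans easy to fan back in.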


Performance on a 25TB data set

Needle-in-haystack: Searching for a UUID takes ~3 seconds.

Worst case query: Wildcard * query that matches all 25TB of logs takes a little less than one minute.

Scanner worst-case query performance: Less than 1 minute to scan 25TB

Compare this with worst-case query performance in AWS CloudWatch, where scanning 25TB would take 300 minutes, roughly 300x slower than Scanner.

AWS CloudWatch worst-case query: 300+ minutes to scan 25TB
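As a quick back-of-the-envelope check on those numbers (assuming decimal terabytes), the implied sustained scan throughput is:

```python
# Rough throughput implied by the benchmark numbers above (decimal TB assumed).
data_bytes = 25 * 10**12                            # 25 TB
scanner_gbps = data_bytes / 60 / 10**9              # Scanner: under 1 minute
cloudwatch_gbps = data_bytes / (300 * 60) / 10**9   # CloudWatch: 300 minutes
# scanner_gbps is about 417 GB/s; cloudwatch_gbps is about 1.4 GB/s
```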
Basic aggregations

Scanner now supports these basic query aggregations:

  • countdistinct: Return the number of distinct values for a field. Uses a HyperLogLog++ sketch to give a low-error estimate on high-cardinality data sets.
  • groupbycount: Group results by a key and display the number of results for each key. Also uses a HyperLogLog++ sketch to give a low-error estimate on high-cardinality data sets.
  • max: Return the maximum value of a field across all hits.
  • count: Return the number of hits.
Basic aggregation: `groupbycount`
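To make the semantics of the four aggregations concrete, here is a hedged sketch in plain Python over an in-memory list of log events. Scanner itself uses sketches such as HyperLogLog++ for countdistinct and groupbycount; exact sets and counters are used here only to show what each aggregation computes.

```python
# Semantics of the four basic aggregations, shown with exact in-memory data.
# (Illustrative only -- Scanner uses probabilistic sketches at scale.)
from collections import Counter

events = [
    {"user": "alice", "status": 200, "bytes": 512},
    {"user": "bob",   "status": 404, "bytes": 128},
    {"user": "alice", "status": 200, "bytes": 2048},
    {"user": "carol", "status": 500, "bytes": 64},
]

count = len(events)                                  # count: number of hits
countdistinct = len({e["user"] for e in events})     # distinct values of a field
groupbycount = Counter(e["status"] for e in events)  # hits per group key
max_bytes = max(e["bytes"] for e in events)          # max of a field across hits

# count == 4, countdistinct == 3,
# groupbycount == {200: 2, 404: 1, 500: 1}, max_bytes == 2048
```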
Saved queries

Users can now save queries and share them with their team. This should reduce the need to tab-switch between internal wiki pages and the search tool, and it should also help new team members become productive faster.

Scanner Saved Query


You can learn more about query syntax in our documentation.

Coming soon

Over the next few months we’ll be rolling out the following new features:

Advanced aggregations
The new monoid data structure server will allow us to support more advanced aggregations in the coming months, including the ability to build more sophisticated result tables with many columns. This will make it easier to create advanced detection rules.
Real-time detection rules engine
Our new monoid data structures allow us to build a detection rules engine that runs on logs in real time as they are indexed. Alerts will be sent to custom webhooks, Slack, or PagerDuty.
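The sketch below illustrates why mergeable aggregates suit real-time detection: each new batch of logs folds into a running partial aggregate, and a rule fires when a threshold is crossed. This is our own illustrative example, not Scanner's actual engine; the rule, field names, and threshold are assumptions.

```python
# Illustrative sketch (not Scanner's engine): fold each indexed batch into a
# running aggregate, and fire an alert when a hypothetical rule's threshold
# (>= 5 failed logins per user) is crossed.
from collections import Counter

running = Counter()   # Counters merge associatively, so they form a monoid
THRESHOLD = 5         # hypothetical detection rule threshold
alerts = []           # in a real system this would notify Slack/PagerDuty/webhooks

def ingest(batch):
    # Build a partial aggregate for this batch, then merge it into the total.
    partial = Counter(e["user"] for e in batch if e["event"] == "login_failed")
    running.update(partial)
    for user in partial:
        if running[user] >= THRESHOLD and user not in alerts:
            alerts.append(user)

ingest([{"user": "mallory", "event": "login_failed"}] * 3)
ingest([{"user": "mallory", "event": "login_failed"}] * 3 +
       [{"user": "alice", "event": "login_ok"}])
# alerts == ["mallory"]
```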
Querying across multiple AWS accounts
We’ve been getting requests to support querying logs from multiple AWS accounts from one Scanner instance. Logs and index files will be stored locally in the S3 buckets in each account, and the instance will read from them all and aggregate search results.
We would love to hear about your security data lake use cases
If you have security data lake use cases you would like to see a tool support, please reach out. We think Scanner is on its way to being the most useful security data lake tool in the world, and we want to hear from you so we can improve it even more.

Scanner is a security data lake tool designed for rapid search and detection over petabyte-scale log data sets in AWS S3. With highly efficient serverless compute, it is 10x cheaper than tools like Splunk and Datadog for large log volumes and 100x faster than tools like AWS Athena.