A few days ago, Okta announced a breach of their support systems, which may have caused cookies and session tokens to be compromised. To help their users detect attacker activity, they published a list of threat indicators, specifically IP addresses and user agents, that may be associated with the attackers.
Many teams use AWS S3 to store their security logs and query them using data lake search tools like AWS Athena, but if their log volume is high, they might run into trouble with slow queries. During breaches like this, speed is of the essence, which is the primary reason we built Scanner to provide fast search for logs in AWS S3.
In this post, we’ll show how to use Athena to find the threat indicators from the Okta breach, and then we’ll show the same with Scanner and highlight the performance differences.
Find the threat indicators using AWS Athena
If you are storing Okta System logs in S3, you can use AWS Glue to create tables in AWS Athena that you can query using SQL. Here is an example of what the schema for the Okta System logs table might look like:
CREATE EXTERNAL TABLE okta_system_logs (
  id STRING,
  eventType STRING,
  ...,
  client STRUCT<
    ipAddress: STRING,
    userAgent: STRUCT<
      rawUserAgent: STRING,
      os: STRING,
      browser: STRING
    >,
    ...
  >,
  ...
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'serialization.format' = '1' )
LOCATION 's3://your-bucket-name/path-to-okta-system-logs/'
TBLPROPERTIES ('has_encrypted_data'='false');
Here is a way you can query your Okta System logs using Athena to find log events matching the published threat indicators:
SELECT *
FROM okta_system_logs
WHERE client.ipAddress IN (
  '126.96.36.199', '188.8.131.52', '184.108.40.206', '220.127.116.11',
  '18.104.22.168', '22.214.171.124', '126.96.36.199', '188.8.131.52',
  '184.108.40.206', '220.127.116.11', '18.104.22.168', '22.214.171.124',
  '126.96.36.199', '188.8.131.52', '184.108.40.206', '220.127.116.11',
  '18.104.22.168', '22.214.171.124', '126.96.36.199', '188.8.131.52',
  '184.108.40.206', '220.127.116.11', '18.104.22.168'
)
OR client.userAgent.rawUserAgent IN (
  'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.7113.93 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36'
);
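Indicator lists tend to change as an investigation unfolds, so it can help to generate the query rather than edit it by hand. The following is a minimal Python sketch that assembles the same kind of Athena query from two lists. The function names `sql_quote` and `build_okta_ioc_query` are hypothetical helpers, the IP addresses in the usage lines are placeholders from the documentation address ranges, and the table name is assumed to be trusted because it is not escaped.

```python
def sql_quote(value: str) -> str:
    """Render a string as a SQL string literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_okta_ioc_query(ips, user_agents, table="okta_system_logs"):
    """Build a query matching events whose IP or raw user agent is an indicator.

    The column paths follow the example schema shown earlier; adjust them
    if your table is laid out differently.
    """
    # De-duplicate while preserving order so the IN lists stay short.
    ips = list(dict.fromkeys(ips))
    user_agents = list(dict.fromkeys(user_agents))
    if not ips and not user_agents:
        raise ValueError("at least one indicator is required")

    clauses = []
    if ips:
        clauses.append(
            "client.ipAddress IN (" + ", ".join(map(sql_quote, ips)) + ")"
        )
    if user_agents:
        clauses.append(
            "client.userAgent.rawUserAgent IN ("
            + ", ".join(map(sql_quote, user_agents))
            + ")"
        )
    # The table name is interpolated as-is and must come from a trusted source.
    return f"SELECT * FROM {table} WHERE " + " OR ".join(clauses) + ";"


# Placeholder indicators, not the ones Okta published.
example_query = build_okta_ioc_query(
    ["192.0.2.1", "198.51.100.7"],
    ["Mozilla/5.0 (Windows NT 10.0) ExampleAgent/1.0"],
)
print(example_query)
```

Pasting the printed string into the Athena console runs the same search as the hand-written query, with whatever indicators are current.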
Athena downside: Slow queries
Unfortunately, this can be a very slow operation. For example, we ran this Athena query against a relatively small data set of only 1 TB of uncompressed JSON log data in S3, and it took 1 minute 35 seconds to complete.
And if you want to look across all log sources for the presence of these IP addresses, you’ll have to query each table either individually or using a large SQL query with many UNION ALL clauses. If you have 100+ TB of log data, this could take hundreds of minutes (that is, hours) to run.
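To give a sense of what the cross-table query involves, here is a short Python sketch that generates one SELECT per table and joins them with UNION ALL. Everything except `okta_system_logs` and `client.ipAddress` is an assumption made for illustration: the function name `build_union_query`, the other table names, and their IP columns are hypothetical, and each branch projects the same two columns because UNION ALL requires matching column lists.

```python
def build_union_query(ip_columns, ips):
    """Build one query that searches several tables for the same IP addresses.

    ip_columns maps a table name to the column expression that holds an
    IP address in that table. Names are interpolated as-is, so they must
    come from a trusted source.
    """
    if not ip_columns or not ips:
        raise ValueError("need at least one table and one IP address")

    # Quote each IP as a SQL string literal, doubling any single quotes.
    in_list = ", ".join("'" + ip.replace("'", "''") + "'" for ip in ips)

    selects = [
        f"SELECT '{table}' AS source_table, {column} AS ip_address "
        f"FROM {table} WHERE {column} IN ({in_list})"
        for table, column in ip_columns.items()
    ]
    return "\nUNION ALL\n".join(selects) + ";"


# Hypothetical tables and columns; only the Okta entry comes from this post.
example_union = build_union_query(
    {
        "okta_system_logs": "client.ipAddress",
        "cloudtrail_logs": "sourceipaddress",
        "vpc_flow_logs": "srcaddr",
    },
    ["192.0.2.1", "198.51.100.7"],  # placeholder addresses
)
print(example_union)
```

Each branch still scans its whole table, so generating the query saves typing but does nothing for the run time.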
Find the threat indicators using Scanner
Instead of requiring you to set up a table in AWS Glue for your Okta System logs in S3, Scanner automatically parses the log files in their original JSON format and adds the field %ingest.source_type: "okta:system" to each Okta log event in its index.
Here is how to query in Scanner for these threat indicators in your Okta System logs:
%ingest.source_type: "okta:system" and (
  (
    22.214.171.124 or 126.96.36.199 or 188.8.131.52 or 184.108.40.206 or
    220.127.116.11 or 18.104.22.168 or 22.214.171.124 or 126.96.36.199 or
    188.8.131.52 or 184.108.40.206 or 220.127.116.11 or 18.104.22.168 or
    22.214.171.124 or 126.96.36.199 or 188.8.131.52 or 184.108.40.206 or
    220.127.116.11 or 18.104.22.168 or 22.214.171.124 or 126.96.36.199 or
    188.8.131.52 or 184.108.40.206 or 220.127.116.11
  ) or (
    "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.7113.93 Safari/537.36" or
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36"
  )
)
Scanner was built to make it easy to perform fast search across all of your log sources simultaneously. Hence, if you want to look for the presence of the IP address threat indicators across all of your log sources, you can simply run this query instead:
18.104.22.168 or 22.214.171.124 or 126.96.36.199 or 188.8.131.52 or
184.108.40.206 or 220.127.116.11 or 18.104.22.168 or 22.214.171.124 or
126.96.36.199 or 188.8.131.52 or 184.108.40.206 or 220.127.116.11 or
18.104.22.168 or 22.214.171.124 or 126.96.36.199 or 188.8.131.52 or
184.108.40.206 or 220.127.116.11 or 18.104.22.168 or 22.214.171.124 or
126.96.36.199 or 188.8.131.52 or 184.108.40.206
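With this many terms, it is easy to drop an operator when editing by hand, so one option is to generate the query text. The sketch below is a hypothetical Python helper, `build_scanner_query`, that relies only on the syntax visible in the queries above: bare IP terms, double-quoted phrases, `or`, `and`, parentheses, and the `%ingest.source_type` field. How Scanner escapes a double quote inside a phrase is not covered here, so the helper rejects such values instead of guessing.

```python
def build_scanner_query(ips, user_agents=(), source_type=None):
    """Join indicators with explicit `or`, optionally scoped to one source type."""
    for user_agent in user_agents:
        if '"' in user_agent:
            # Escaping rules are outside what this sketch knows about.
            raise ValueError("user agents containing double quotes are not handled")

    # IPs are bare terms; user agents are wrapped in double quotes.
    terms = list(ips) + [f'"{user_agent}"' for user_agent in user_agents]
    if not terms:
        raise ValueError("at least one indicator is required")

    indicators = " or ".join(terms)
    if source_type is None:
        # No scope: search every log source.
        return indicators
    return f'%ingest.source_type: "{source_type}" and ({indicators})'


# Placeholder indicators, not the ones Okta published.
print(build_scanner_query(["192.0.2.1", "198.51.100.7"]))
print(build_scanner_query(["192.0.2.1"], ["ExampleAgent/1.0"], "okta:system"))
```

The first call produces the all-sources form, and the second produces the form scoped to Okta System logs.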
Scanner upside: Fast queries
Here is an example video where we run the IP address query across all log sources in a data set containing 25 TB of JSON. This would likely take 25 minutes in Athena – it takes only a few seconds in Scanner.
Scanner can provide such fast search performance because it analyzes log files in S3 buckets and creates small, compact index files, which it also stores in S3. At query time, a large number of Lambda functions spin up and traverse the index files rapidly, narrowing down the search space to the regions of logs that contain hits.
When you search for needles in your haystack (like the IP address threat indicators), Scanner quickly narrows down the search space in a few seconds, even if the data set contains dozens or hundreds of terabytes.
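The general idea of narrowing a search with an index can be shown with a toy example. The Python sketch below is not Scanner's index format or its Lambda fan-out, neither of which is described in this post. It is a simple token-to-chunk map, with hypothetical names and placeholder log lines, that shows why only the chunks containing a hit need to be read.

```python
import re
from collections import defaultdict

# Tokens are runs of letters, digits, dots, underscores, and hyphens,
# so an IPv4 address such as 192.0.2.1 stays in one piece.
TOKEN = re.compile(r"[A-Za-z0-9_.-]+")


def build_index(chunks):
    """Map each token to the set of chunk ids whose lines contain it."""
    index = defaultdict(set)
    for chunk_id, lines in chunks.items():
        for line in lines:
            for token in TOKEN.findall(line):
                index[token].add(chunk_id)
    return index


def search(chunks, index, needles):
    """Return the candidate chunk ids and the matching lines for the needles."""
    # Step 1: consult only the index to find chunks that can contain a hit.
    candidate_ids = set()
    for needle in needles:
        candidate_ids |= index.get(needle, set())

    # Step 2: read only those chunks and keep the lines that match.
    hits = [
        line
        for chunk_id in sorted(candidate_ids)
        for line in chunks[chunk_id]
        if any(needle in TOKEN.findall(line) for needle in needles)
    ]
    return candidate_ids, hits


# Placeholder log lines standing in for regions of log files.
example_chunks = {
    "chunk-0": ['{"client": {"ipAddress": "203.0.113.5"}}'],
    "chunk-1": ['{"client": {"ipAddress": "192.0.2.1"}}'],
    "chunk-2": ['{"client": {"ipAddress": "203.0.113.77"}}'],
}
example_index = build_index(example_chunks)
print(search(example_chunks, example_index, ["192.0.2.1"]))
```

In this toy data set, only one of the three chunks is read in the second step. The same effect at a much larger scale is what keeps a needle-in-a-haystack search from scanning every byte.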
S3 logs need to be searchable – fast
We built Scanner so that teams would no longer need to accept the status quo of slow search performance through their logs in S3. The volume of log data that security teams need to manage is rapidly increasing, and it only makes sense that the most scalable storage medium – cloud object storage – should be optimized as much as possible. Searching for threat indicators, like those from the Okta breach, should take seconds, not minutes or hours.
Scanner is a security data lake platform that supercharges security investigations with fast search and detections for petabyte-scale log data sets in AWS S3. It’s 100x faster than Athena and 10x cheaper than traditional tools like Splunk and Datadog.
Scanner can be deployed into your own AWS account or into an AWS account managed by Scanner with read-only permissions to the logs in your S3 buckets. This zero-cost data transfer gives users full control over their data with no vendor lock-in and avoids log shipping over the public internet.
Cliff is the CEO and co-founder of Scanner.dev, a security data lake product built for scale, speed, and cost efficiency. Prior to founding Scanner, he was a Principal Engineer at Cisco where he led the backend infrastructure team for the Webex People Graph. He was also the engineering lead for the data platform team at Accompany before its acquisition by Cisco. He has a love-hate relationship with Rust, but it’s mostly love these days.