Title
AWS re:Invent 2022 - What’s new in Amazon Athena (ANT208)
Summary
- Scott Rigney, a product manager on the Athena team, introduced new features and updates for Amazon Athena.
- Athena is a serverless, interactive query service that makes it easy to analyze data in data lakes.
- Athena is built on open-source engines such as Presto and supports multiple data formats.
- The talk highlighted Athena's cost-effectiveness: compression, partitioning, and columnar formats can deliver savings of up to 90%.
- Common use cases for Athena include interactive analytics, business intelligence, data workflows, and machine learning.
- Athena engine version 3 was announced; it incorporates Trino and offers faster queries and more features from open source.
- New features include 50 new functions, 90 enhancements to existing functions, and query result reuse (result caching) for faster execution of repeated queries.
- Ofer Eliasov from Mobileye presented how Mobileye uses Athena for its autonomous vehicle mapping technology.
- Mobileye's Road Experience Management (REM) uses crowd-sourced data to create high-definition maps for autonomous driving.
- Athena's federated query feature allows querying data from various sources without data movement.
- Athena's support for Apache Iceberg and AWS Lake Formation was discussed, enhancing data lake management and security.
- Amazon AppFlow's integration with AWS Glue Data Catalog was announced, facilitating data ingestion from SaaS sources to S3 for querying with Athena.
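
The "up to 90%" savings figure follows from Athena's pay-per-query model, where cost tracks the amount of data scanned. A minimal sketch of that arithmetic is below. The price of $5.00 per TB scanned, the decimal definition of a terabyte, and the workload sizes are assumptions for illustration, not figures from the talk; check current pricing for your region.

```python
# Rough estimate of per-query cost from bytes scanned.
# ASSUMPTIONS (not from the talk): $5.00 per TB scanned, 1 TB = 10**12 bytes.
TB = 10**12
ASSUMED_PRICE_PER_TB_USD = 5.00


def query_cost_usd(bytes_scanned, price_per_tb=ASSUMED_PRICE_PER_TB_USD):
    """Return the estimated cost of one query that scans `bytes_scanned` bytes."""
    return bytes_scanned / TB * price_per_tb


def savings_fraction(baseline_bytes, optimized_bytes):
    """Return the fraction of cost saved by scanning fewer bytes."""
    baseline = query_cost_usd(baseline_bytes)
    optimized = query_cost_usd(optimized_bytes)
    return 1 - optimized / baseline


# Hypothetical workload: a query over raw, uncompressed text scans 1 TB.
raw_scan_bytes = 1 * TB
# Hypothetical result of compressing, partitioning, and converting to a
# columnar format: the same query now reads one tenth of the data.
optimized_scan_bytes = TB // 10

print(f"raw:       ${query_cost_usd(raw_scan_bytes):.2f}")
print(f"optimized: ${query_cost_usd(optimized_scan_bytes):.2f}")
print(f"savings:   {savings_fraction(raw_scan_bytes, optimized_scan_bytes):.0%}")
```

The point of the sketch is that the savings come entirely from scanning less data: the price per terabyte stays fixed, so a tenfold reduction in bytes read is a tenfold reduction in cost.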
Insights
- Athena's serverless nature and pay-per-query model provide a cost-effective solution for data analysis, especially for organizations with variable query loads.
- Athena engine version 3, with its Trino integration, shows AWS keeping pace with open-source advancements and giving customers more powerful and efficient query capabilities.
- The use case of Mobileye demonstrates Athena's capability to handle large-scale, complex data workloads and its applicability in cutting-edge fields like autonomous driving.
- Query result reuse (result caching) can improve query performance and reduce costs by avoiding redundant data scans for repeated queries.
- Athena's federated query capability addresses the common challenge of data silos, enabling a more unified view of an organization's data without moving it.
- The support for Apache Iceberg and integration with AWS Lake Formation indicates a move towards more robust data governance and management within data lakes.
- Amazon AppFlow's new features show AWS's efforts to streamline data integration from various SaaS platforms, enhancing the ease of bringing external data into the AWS ecosystem for analysis with Athena.
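
To make the result caching point concrete, the sketch below builds the request parameters for a query that opts in to reusing recent results. It assumes the boto3 Athena client's `start_query_execution` call and its `ResultReuseConfiguration` parameter. The function name, database name, bucket, and SQL are hypothetical placeholders. The code only constructs the request dictionary, so it runs without AWS credentials.

```python
def build_query_request(sql, database, output_location, max_age_minutes=60):
    """Build keyword arguments for the Athena StartQueryExecution call.

    Results of an identical earlier query may be reused if they are no
    older than `max_age_minutes`, so a repeated query need not rescan data.
    """
    if max_age_minutes < 0:
        raise ValueError("max_age_minutes must be non-negative")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
        "ResultReuseConfiguration": {
            "ResultReuseByAgeConfiguration": {
                "Enabled": True,
                "MaxAgeInMinutes": max_age_minutes,
            }
        },
    }


# Hypothetical values for illustration only.
request = build_query_request(
    sql="SELECT region, count(*) FROM events GROUP BY region",
    database="example_db",
    output_location="s3://example-bucket/athena-results/",
    max_age_minutes=30,
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
print(request["ResultReuseConfiguration"])
```

Choosing the maximum age is a trade-off: a longer window avoids more scans, while a shorter one keeps results closer to the current state of the underlying data.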