Title
AWS re:Invent 2022 - Better, faster, and lower-cost storage: Optimizing Amazon S3 (STG202)
Summary
- Christoph Bartenstein, director of Amazon S3 Intelligent Storage, and Andrew Cutze, a product manager, discuss optimizing cost and performance in Amazon S3.
- Zane Reynolds from Torque Robotics shares how they optimize storage at petabyte scale.
- The session covers different S3 storage classes, features like S3 Storage Lens, lifecycle policies, and customer case studies.
- S3 has evolved from a backup and disaster recovery target into a foundation for a wide range of use cases, such as data lakes and media archives.
- IDC estimates that over 100 zettabytes of data will be created or replicated in 2022.
- The pillars of cost optimization are understanding the requirements of the use case, developing insight into existing storage, and taking optimization actions and measuring their impact.
- S3 Storage Lens provides organization-wide visibility into storage usage and has recently added 34 new metrics (an example configuration is sketched after this list).
- S3 Intelligent-Tiering automatically optimizes storage costs by moving objects between access tiers as their access patterns change (see the sketch after this list).
- Performance optimization involves tuning request rates and throughput, and monitoring performance with tools like CloudWatch and S3 Storage Lens.
- Torque Robotics uses S3 for its data lake, leveraging Intelligent-Tiering, prefix design, and Athena for cost-effective and efficient data management.
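
A minimal sketch of enabling an S3 Storage Lens configuration with boto3, as one way to get the organization- or account-wide visibility described above; the account ID and configuration name are hypothetical, and enabling activity metrics opts into the advanced (paid) metrics tier.

```python
import boto3

s3control = boto3.client("s3control")
ACCOUNT_ID = "111122223333"  # hypothetical AWS account ID

# Enable an account-level Storage Lens configuration with activity metrics,
# which surfaces usage patterns such as incomplete multipart upload bytes.
s3control.put_storage_lens_configuration(
    ConfigId="org-wide-visibility",
    AccountId=ACCOUNT_ID,
    StorageLensConfiguration={
        "Id": "org-wide-visibility",
        "IsEnabled": True,
        "AccountLevel": {
            "ActivityMetrics": {"IsEnabled": True},
            "BucketLevel": {"ActivityMetrics": {"IsEnabled": True}},
        },
    },
)
```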
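
A minimal boto3 sketch of opting data into S3 Intelligent-Tiering; the bucket name, key, and prefix are hypothetical, and the archive-tier configuration is optional.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-data-lake-bucket"  # hypothetical bucket name

# Store new objects directly in the Intelligent-Tiering storage class so S3
# moves them between frequent and infrequent access tiers automatically.
s3.put_object(
    Bucket=BUCKET,
    Key="telemetry/2022/11/run-0001.parquet",  # hypothetical key
    Body=b"...",
    StorageClass="INTELLIGENT_TIERING",
)

# Optionally enable the opt-in archive tiers for a prefix: objects not
# accessed for 90 / 180 days move to Archive / Deep Archive Access.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket=BUCKET,
    Id="archive-cold-telemetry",
    IntelligentTieringConfiguration={
        "Id": "archive-cold-telemetry",
        "Status": "Enabled",
        "Filter": {"Prefix": "telemetry/"},
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```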
Insights
- S3 Intelligent-Tiering is a significant cost-saving tool for customers with unpredictable or changing data access patterns, as it automatically moves data between tiers.
- S3 Storage Lens is a powerful tool for gaining insights into storage usage and can surface cost-saving opportunities such as cleaning up incomplete multipart uploads (a lifecycle rule for this cleanup is sketched after this list).
- Choosing the right S3 storage class based on data access patterns and retrieval needs is crucial for cost optimization.
- Lifecycle policies can automate moving data to more cost-effective storage classes based on predefined conditions such as object age (see the lifecycle configuration sketch after this list).
- Optimizing the key namespace and spreading requests across multiple prefixes helps avoid throttling and improves request-rate performance, since S3 request limits apply per prefix (at least 3,500 writes and 5,500 reads per second per prefix).
- Multipart uploads and ranged GETs parallelize large transfers and improve throughput (see the transfer sketch after this list).
- Server-side encryption does not impact Athena query performance, ensuring data security without sacrificing efficiency.
- Real-world examples like Torque Robotics illustrate the practical application of S3 features for managing large-scale data lakes effectively.
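
A minimal boto3 sketch of the kind of lifecycle configuration described above, combining storage-class transitions with cleanup of incomplete multipart uploads; the bucket name, prefix, and day thresholds are hypothetical.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-data-lake-bucket"  # hypothetical bucket name

# One lifecycle rule that transitions ageing objects to cheaper storage
# classes and aborts incomplete multipart uploads so abandoned parts stop
# accruing storage charges.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-and-clean-up",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},  # hypothetical prefix
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```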
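
A minimal boto3 sketch of the multipart upload and ranged GET techniques mentioned above; the file, bucket, and key names and the part sizes are illustrative choices, not values from the session.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")
BUCKET = "example-data-lake-bucket"  # hypothetical bucket name

# Multipart upload: upload_file splits large files into parts and sends them
# in parallel once the file size exceeds multipart_threshold.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # switch to multipart above 64 MB
    multipart_chunksize=16 * 1024 * 1024,  # 16 MB parts
    max_concurrency=10,                    # parallel part uploads
)
s3.upload_file("sensor-dump.bin", BUCKET, "uploads/sensor-dump.bin", Config=config)

# Ranged GET: fetch byte ranges of a large object (here, the first 16 MiB)
# so downloads can be parallelized or resumed instead of using one long
# single-stream transfer.
first_16_mib = s3.get_object(
    Bucket=BUCKET,
    Key="uploads/sensor-dump.bin",
    Range="bytes=0-16777215",
)["Body"].read()
```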