Title

AWS re:Invent 2023 - Analyze Amazon Aurora PostgreSQL data in Amazon Redshift with zero-ETL (DAT343)

Summary

AWS introduced a new capability for zero-ETL integration between Amazon Aurora PostgreSQL and Amazon Redshift.
Operational analytics, the near real-time analysis of transactional data, is becoming increasingly important for driving business decisions.
AWS aims to provide purpose-built databases for transactions (Amazon Aurora) and analytics (Amazon Redshift).
Zero-ETL integration allows for near real-time analytics on transactional data without the need for complex data pipelines.
The integration is based on change data capture (CDC) and supports DML and metadata operations.
Zero-ETL integration is available in preview in the US East (Ohio) region starting with Aurora PostgreSQL version 15.4.
Setting up the integration is simple and takes minutes, with AWS managing the underlying infrastructure.
Redshift offers a range of analytics capabilities, including data sharing, machine learning, and querying across various data sources.
The session included demonstrations of setting up the integration, managing it, and performing analytics on the data once in Redshift.
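
To make the change data capture (CDC) idea above concrete, here is a minimal sketch in Python of how an ordered stream of DML changes could be applied to a target copy of a table. It is only an illustration under stated assumptions, not how Aurora or Redshift implement the integration: the ChangeEvent type, its fields (lsn, op, key, row), and the apply_changes function are all hypothetical names invented for this example, and the "target" is a plain in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    """One captured change (hypothetical shape, for illustration only)."""
    lsn: int                               # position of the change in the source log
    op: str                                # "insert", "update", or "delete"
    key: int                               # primary key of the affected row
    row: Optional[Dict[str, Any]] = None   # full new row image; None for deletes


def apply_changes(target: Dict[int, Dict[str, Any]],
                  events: List[ChangeEvent],
                  last_applied_lsn: int = 0) -> int:
    """Apply events to target in log order and return the new checkpoint.

    Events at or below last_applied_lsn are skipped, so replaying the same
    batch twice leaves the target unchanged.
    """
    for event in sorted(events, key=lambda e: e.lsn):
        if event.lsn <= last_applied_lsn:
            continue  # already applied in an earlier batch
        if event.op in ("insert", "update"):
            if event.row is None:
                raise ValueError(f"{event.op} at lsn {event.lsn} has no row image")
            target[event.key] = dict(event.row)  # replace with the new row image
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
        last_applied_lsn = event.lsn
    return last_applied_lsn


# Usage: replay a small batch of changes into an empty target table.
replica: Dict[int, Dict[str, Any]] = {}
events = [
    ChangeEvent(lsn=1, op="insert", key=101, row={"status": "new"}),
    ChangeEvent(lsn=2, op="update", key=101, row={"status": "shipped"}),
    ChangeEvent(lsn=3, op="insert", key=102, row={"status": "new"}),
    ChangeEvent(lsn=4, op="delete", key=102),
]
checkpoint = apply_changes(replica, events)
```

After this batch, the target holds only row 101 in its latest state, and the returned checkpoint lets a later batch resume where this one stopped. In the managed integration this bookkeeping is handled by AWS; the sketch only shows why ordered, replayable change events are enough to keep an analytical copy current without a hand-built pipeline.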

Zero-ETL integration simplifies the process of moving data from operational databases to analytical stores, potentially saving months of development time.
The integration uses storage-level replication, so it does not impact the performance of production Aurora PostgreSQL clusters.
AWS's approach to zero-ETL leverages the separation of compute and storage in Aurora and Redshift, offloading as much processing as possible to the storage layer.
The integration supports a wide range of PostgreSQL operations, including DDL changes, which are traditionally difficult to handle with logical replication.
AWS is actively seeking feedback on the zero-ETL integration during its preview phase to improve and expand its capabilities before general availability.
The zero-ETL integration aligns with AWS's vision of enabling customers to focus on deriving value from their data rather than managing data pipelines.