Title
AWS re:Invent 2022 - How Starburst uses CockroachDB to power Starburst Galaxy (PRT099)
Summary
- Background on Starburst: Starburst is the company behind Trino (formerly PrestoSQL), a fast distributed SQL query engine for big data analytics. The engine was created as Presto at Facebook by members of the Starburst founding team and open-sourced in 2013. It is used for interactive analytics, BI tools, and ETL pipelines.
- Starburst Galaxy: Starburst Galaxy is Trino offered as a service, allowing users to query data where it lives without ingestion, create clusters, and choose deployment regions.
- Use of CockroachDB: Starburst Galaxy uses CockroachDB to store its operational data, including cluster information, authentication and authorization data, and metadata describing user data. CockroachDB was chosen for its reliability, low latency, developer efficiency, and operational simplicity.
- CockroachDB Benefits:
  - Reliability: designed to avoid downtime and data loss, and stays available even if a region goes down.
  - Developer efficiency: provides standard SQL semantics, unique indexes, constraints, and serializable transactions.
  - Operational ease: available as a managed service, with online schema changes and transparent scaling.
- CockroachDB Implementation:
  - Stores the operational data for Trino cluster management and user permissions, plus the metastore that describes user data.
  - Stores Trino query history, which is significantly larger in volume than the other kinds of data.
  - Uses global tables for geo-distributed data, which allows fast, low-latency reads.
- Schema Migration:
  - Starburst uses Flyway for schema migrations and has contributed to making it compatible with CockroachDB.
  - Developers are taught how transactions interact with schema changes, in particular to avoid mixing DML and DDL statements.
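The guidance on global tables and on keeping DDL apart from DML can be illustrated with a small check over migration scripts. This is a minimal sketch, not Starburst's tooling: the `roles` table, both sample scripts, and the helper names are hypothetical, the keyword-based classification is a rough heuristic rather than a SQL parser, and it assumes each migration file runs as a single transaction. `ALTER TABLE ... SET LOCALITY GLOBAL` is the CockroachDB statement that marks a table as global in a multi-region database.

```python
import re
from typing import List

# Leading keywords used to classify a statement. This is a deliberately
# simple heuristic for illustration, not a full SQL parser.
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP"}
DML_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "UPSERT"}

# Hypothetical schema-only migration: creates a table and makes it global.
SCHEMA_ONLY = """
-- hypothetical table used only for this example
CREATE TABLE roles (id UUID PRIMARY KEY, name STRING NOT NULL UNIQUE);
ALTER TABLE roles SET LOCALITY GLOBAL;
"""

# Hypothetical migration that mixes a schema change with a data backfill.
MIXED = """
ALTER TABLE roles ADD COLUMN description STRING;
UPDATE roles SET description = '' WHERE description IS NULL;
"""


def split_statements(script: str) -> List[str]:
    """Split a script on semicolons, dropping '--' comments and blank statements."""
    without_comments = re.sub(r"--[^\n]*", "", script)
    return [s.strip() for s in without_comments.split(";") if s.strip()]


def classify(statement: str) -> str:
    """Return 'DDL', 'DML', or 'OTHER' based on the statement's first keyword."""
    parts = statement.split(None, 1)
    if not parts:
        return "OTHER"
    keyword = parts[0].upper()
    if keyword in DDL_KEYWORDS:
        return "DDL"
    if keyword in DML_KEYWORDS:
        return "DML"
    return "OTHER"


def mixes_ddl_and_dml(script: str) -> bool:
    """True if the script contains at least one DDL and one DML statement."""
    kinds = {classify(s) for s in split_statements(script)}
    return {"DDL", "DML"} <= kinds


if __name__ == "__main__":
    print("schema-only mixes DDL and DML:", mixes_ddl_and_dml(SCHEMA_ONLY))
    print("mixed script mixes DDL and DML:", mixes_ddl_and_dml(MIXED))
```

A check like this could run in code review or CI to flag migrations such as `MIXED`, so that the schema change and the backfill are split into separate migration files.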
Insights
- Reliability and Global Reach: Starburst's choice of CockroachDB underscores the importance of a database that can provide high availability and consistent performance across global regions, which is crucial for SaaS offerings like Starburst Galaxy.
- Developer Productivity: The emphasis on developer efficiency through familiar SQL semantics and robust transaction support indicates a trend towards databases that minimize the learning curve and reduce the potential for errors.
- Operational Efficiency: The preference for managed services and the ability to handle schema changes without downtime reflect a broader industry move towards outsourcing database management to focus on core product development.
- Schema Migration Strategy: The use of Flyway and the collaboration with the open-source community to improve its compatibility with CockroachDB demonstrate a commitment to leveraging and contributing to open-source tools for better product integration.
- Transactional Caveats: The discussion of transaction interactions and the need to educate developers on specific cases (like mixing DML and DDL) highlight the complexities that can arise even with advanced databases, and the importance of thorough testing and developer awareness.
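One practical consequence of serializable transactions is that a transaction can be aborted with a serialization failure (SQLSTATE `40001`) and is then expected to be retried by the client. The sketch below shows that general pattern. It is not taken from the talk: `TransactionError`, `run_with_retry`, and the backoff parameters are hypothetical stand-ins so that the example runs without a database. With a real driver such as psycopg2, the SQLSTATE is available on the exception as `pgcode`.

```python
import random
import time

# SQLSTATE for serialization_failure, which signals a retryable transaction.
RETRYABLE_SQLSTATE = "40001"


class TransactionError(Exception):
    """Hypothetical stand-in for a driver error that carries a SQLSTATE code."""

    def __init__(self, sqlstate: str):
        super().__init__(f"transaction failed with SQLSTATE {sqlstate}")
        self.sqlstate = sqlstate


def run_with_retry(txn_fn, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call txn_fn until it succeeds, retrying only on serialization failures.

    txn_fn is assumed to run one complete transaction (begin, work, commit)
    and to raise TransactionError if the database rejects it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except TransactionError as err:
            retryable = err.sqlstate == RETRYABLE_SQLSTATE
            if not retryable or attempt == max_attempts:
                raise
            # Exponential backoff with jitter before the next attempt.
            sleep(base_delay * (2 ** (attempt - 1)) * random.random())


if __name__ == "__main__":
    attempts = []

    def flaky_transaction():
        # Simulate two serialization failures followed by a successful commit.
        attempts.append(1)
        if len(attempts) < 3:
            raise TransactionError(RETRYABLE_SQLSTATE)
        return "committed"

    print(run_with_retry(flaky_transaction), "after", len(attempts), "attempts")
```

Errors with any other SQLSTATE, such as a unique-constraint violation, are raised immediately, because retrying them would not change the outcome.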