Title
AWS re:Invent 2022 - Advancing security analytics with Snowflake and AWS at Snyk (PRT329)
Summary
- Snyk, a developer security company, has built a new customer-facing reporting product using Snowflake on AWS.
- The product addresses the need for better visibility, observability, and analytics for Snyk's customers, enabling data-informed security decisions.
- Five key challenges were identified: better reporting and monitoring, lower data latency, faster report loading, availability in every environment where Snyk is deployed, and a flexible, exploratory analytics experience.
- Snyk's solution involved creating a unified data model across products, a low-latency data pipeline, responsive dashboards, deployment in multiple environments, and an exploratory analytics interface.
- The stack includes Snowflake for data warehousing, dbt for data transformations, and Topcoat for the visualization layer.
- Data quality and correctness are ensured through testing with dbt, and performance is continuously optimized.
- The reporting infrastructure is deployed using infrastructure as code, allowing for deployment in various environments.
- The reporting product is designed to be flexible and exploratory, catering to different customer use cases and personas.
- Snyk chose Snowflake because it met these requirements and because its ecosystem lets data professionals participate directly in application development.
- Future plans include leveraging Snowflake's data sharing capabilities so customers can access and build upon Snyk's data in their own Snowflake environments (a sketch of how such a share could be set up follows this list).
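As a rough illustration of that data-sharing direction, the sketch below uses the Snowflake Python connector to create a provider-side share. The account, database, schema, and table names are hypothetical and not taken from the talk; this is not Snyk's actual implementation.

```python
# Minimal sketch of Snowflake secure data sharing from the provider side.
# All identifiers (accounts, database, schema, table) are illustrative.
import snowflake.connector

conn = snowflake.connector.connect(
    account="provider_account",   # hypothetical provider account
    user="reporting_admin",       # hypothetical admin user
    password="...",               # use a secrets manager in practice
    role="ACCOUNTADMIN",
)

statements = [
    # Create the share object that exposes curated reporting data.
    "CREATE SHARE IF NOT EXISTS reporting_share",
    # Grant read access to the database/schema holding the curated models.
    "GRANT USAGE ON DATABASE analytics TO SHARE reporting_share",
    "GRANT USAGE ON SCHEMA analytics.reporting TO SHARE reporting_share",
    # Expose one table of issue-level data through the share.
    "GRANT SELECT ON TABLE analytics.reporting.issues TO SHARE reporting_share",
    # Add a consumer (customer) account so it can mount the share.
    "ALTER SHARE reporting_share ADD ACCOUNTS = customer_account",
]

with conn.cursor() as cur:
    for stmt in statements:
        cur.execute(stmt)
conn.close()
```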
Insights
- Snyk's approach to data management emphasizes real-time data processing and the importance of data producers conforming to a common schema, which is a shift left in data akin to the shift left in security.
- The use of Kafka for streaming data into Snowflake and dbt for transformations shows how a modern data stack can handle real-time analytics (a producer-side sketch appears after this list).
- Data quality is treated as critical: Snyk writes dbt tests against the data so that issues are detected and corrected before they reach production (a sketch of the kind of check such tests compile to appears after this list).
- Performance optimization is an ongoing process, focused on clustering data in Snowflake to reduce the number of micro-partitions scanned per query (a clustering sketch appears after this list).
- The infrastructure as code approach for deployment reflects a best practice in cloud computing, ensuring consistency and scalability across different environments.
- The design of the reporting product shows a strong emphasis on user experience and the need for data visualization to be both flexible and purpose-built for specific use cases.
- Snyk's decision to use Snowflake is strategic, not only for meeting their current requirements but also for the platform's extensibility and the potential for customers to build upon Snyk's data in the future.
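To make the "shift left in data" idea concrete, here is a minimal, hypothetical producer-side sketch: an event is validated against a shared schema before being published to Kafka, from where a sink connector could land it in Snowflake. The topic name, field names, and broker address are assumptions, not details from the talk.

```python
# Minimal sketch of a schema-conforming Kafka producer (kafka-python).
# Field names, topic, and broker address are illustrative assumptions.
import json
from datetime import datetime, timezone

from kafka import KafkaProducer  # pip install kafka-python

# The "common schema" every producer agrees to emit; validating at the
# source is the shift-left step -- bad records never enter the pipeline.
REQUIRED_FIELDS = {"org_id", "project_id", "issue_id", "severity", "event_time"}


def validate(event: dict) -> dict:
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"event missing required fields: {sorted(missing)}")
    return event


producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda e: json.dumps(e).encode("utf-8"),
)

event = validate({
    "org_id": "org-123",
    "project_id": "proj-456",
    "issue_id": "SNYK-EXAMPLE-1",
    "severity": "high",
    "event_time": datetime.now(timezone.utc).isoformat(),
})

# Downstream, a sink connector would land these messages in a raw Snowflake
# table that dbt models then transform for the reporting product.
producer.send("issue-events", value=event)
producer.flush()
```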
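On data quality: dbt's built-in generic tests (such as not_null and unique) compile to queries that return offending rows and fail when any come back. The sketch below reproduces that pattern directly against Snowflake purely for illustration; table and column names are assumptions, and in the stack described here these checks would live in dbt itself.

```python
# Illustrative data-quality checks in the style dbt generic tests compile to:
# a query returns "bad" rows, and the test fails if any exist.
import snowflake.connector

NOT_NULL_TEST = """
    SELECT issue_id
    FROM analytics.reporting.issues
    WHERE issue_id IS NULL
"""

UNIQUE_TEST = """
    SELECT issue_id, COUNT(*) AS n
    FROM analytics.reporting.issues
    GROUP BY issue_id
    HAVING COUNT(*) > 1
"""


def run_test(cur, name: str, sql: str) -> None:
    cur.execute(sql)
    failures = cur.fetchall()
    if failures:
        raise AssertionError(f"data test '{name}' failed: {len(failures)} bad rows")
    print(f"data test '{name}' passed")


conn = snowflake.connector.connect(account="...", user="...", password="...")
with conn.cursor() as cur:
    run_test(cur, "issue_id_not_null", NOT_NULL_TEST)
    run_test(cur, "issue_id_unique", UNIQUE_TEST)
conn.close()
```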
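Finally, a sketch of the clustering idea: defining a clustering key on a large table so that queries filtered on those columns prune micro-partitions instead of scanning them all, then checking clustering health with Snowflake's SYSTEM$CLUSTERING_INFORMATION function. The table and column names are again illustrative assumptions, not Snyk's actual schema.

```python
# Illustrative example of setting and inspecting a Snowflake clustering key.
import snowflake.connector

conn = snowflake.connector.connect(account="...", user="...", password="...")
with conn.cursor() as cur:
    # Cluster the fact table on the columns most dashboard filters use, so
    # per-organization queries touch only the relevant micro-partitions.
    cur.execute(
        "ALTER TABLE analytics.reporting.issues CLUSTER BY (org_id, event_date)"
    )
    # SYSTEM$CLUSTERING_INFORMATION returns depth/overlap statistics that
    # show how well micro-partitions line up with the clustering key.
    cur.execute(
        "SELECT SYSTEM$CLUSTERING_INFORMATION("
        "'analytics.reporting.issues', '(org_id, event_date)')"
    )
    print(cur.fetchone()[0])
conn.close()
```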