Title
AWS re:Invent 2022 - Goldman Sachs: Accelerating time to value in data analytics (FSI201)
Summary
- Gerard Cowburn, Senior Solutions Architect at AWS, introduces the session on accelerating time to value in data analytics.
- Ram Rajamoni, VP and Tech Fellow in Data Engineering, and Francesco Pontrandolfo, Product Manager for Goldman Sachs Financial Cloud, join the session.
- The session covers how Goldman Sachs blends diverse hybrid data sources to inform investment hypotheses and deliver insights quickly.
- The session opens with an overview of Goldman Sachs and its data-driven investment process.
- The speakers walk through the architecture of Goldman Sachs Financial Cloud, focusing on high-speed, low-latency, real-time analytics for financial market data.
- They examine key data sourcing integrations and the use of the open-source Legend platform for data modeling and wrangling.
- The session concludes with insights and lessons learned to help others accelerate their data analytics journeys.
Insights
- Goldman Sachs invests heavily in technology, and engineers make up a significant portion of its workforce.
- Time and speed are critical factors for competitiveness in the financial industry.
- The Goldman Sachs Financial Cloud is a modular set of services addressing data curation, management, and analytics.
- Data curation includes access to GS proprietary data and third-party data sets with an added curation layer.
- Users can spin up compute instances on the GS Financial Cloud to ingest and enrich data under a consistent data model.
- Data analysis tools include REST endpoints, a Python SDK called GS Quant, and a data visualization tool called PlotTool Pro.
- The architecture of GS Financial Cloud on AWS includes ECS, DynamoDB, ElastiCache, OpenSearch, and a custom time series database optimized for AWS.
- Real-time market data is streamed into the platform using a bespoke API and a solution called Electron.
- Challenges in financial data include its evolving nature, the breadth and structure of data sources, and the need for real-time data streaming.
- Cloud scalability, managed serverless infrastructure, and early engagement in risk-managed data transfer are key enablers for overcoming these challenges.
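
The idea of ingesting data from different sources and enriching it under a consistent data model can be illustrated with a minimal sketch. This is not Goldman Sachs code and does not use Legend or GS Quant: the `Observation` class, the `FIELD_MAPS` table, and the source names `vendor_a` and `vendor_b` are all hypothetical, chosen only to show how differently shaped records might be mapped onto one schema and then blended by instrument and time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """Hypothetical common model: one value for one instrument at one time."""
    instrument: str
    timestamp: datetime
    field: str
    value: float


# Hypothetical per-source field mappings; real sources would differ.
FIELD_MAPS = {
    "vendor_a": {"instrument": "ticker", "timestamp": "ts", "value": "px"},
    "vendor_b": {"instrument": "symbol", "timestamp": "time", "value": "sentiment"},
}


def normalize(source: str, field: str, raw: dict) -> Observation:
    """Map one raw record from a named source onto the common model."""
    mapping = FIELD_MAPS[source]
    ts = datetime.fromisoformat(raw[mapping["timestamp"]])
    if ts.tzinfo is None:
        # Assumption for this sketch: naive timestamps are already in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    return Observation(
        instrument=raw[mapping["instrument"]].upper(),
        timestamp=ts.astimezone(timezone.utc),
        field=field,
        value=float(raw[mapping["value"]]),
    )


def blend(observations):
    """Group observations by (instrument, timestamp) into one row per key."""
    rows = {}
    for obs in observations:
        rows.setdefault((obs.instrument, obs.timestamp), {})[obs.field] = obs.value
    return rows


# Two records with different field names, symbol casing, and time zones.
price = normalize(
    "vendor_a", "price",
    {"ticker": "abc", "ts": "2022-11-29T14:30:00", "px": "101.5"},
)
mood = normalize(
    "vendor_b", "sentiment",
    {"symbol": "ABC", "time": "2022-11-29T09:30:00-05:00", "sentiment": 0.4},
)
blended = blend([price, mood])
```

Once both records share the same instrument spelling and UTC timestamps, they land in a single blended row, which is the property a consistent data model is meant to provide. A production system would also need to handle schema evolution and streaming arrival, which this sketch ignores.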