Title

AWS re:Invent 2023 - FINRA CAT: Overcoming challenges when big data becomes massive (FSI316)

Summary

  • Leah Crawford introduced the session, highlighting the importance of FINRA's Consolidated Audit Trail (CAT) for market integrity and investor protection.
  • CAT is a single source of truth for all exchange-listed equities and options trading activity in the U.S. markets.
  • Scott Donaldson, CTO of FINRA CAT, and Steven Diamond, Senior Director at FINRA, shared their experiences with CAT's evolution.
  • CAT manages nearly 700 petabytes of data and is expected to reach exabyte scale within three years.
  • AWS services such as Amazon EMR, Graviton-based Amazon EC2 instances, Amazon S3, and AWS Lambda have been instrumental in managing CAT's scale and complexity.
  • Scott detailed the history of CAT, its architecture, and the challenges faced during the pandemic when market volumes surged.
  • Steven discussed specific optimizations and keys to success, including a move to Graviton2 instances, which led to significant cost savings and performance improvements.
  • Improvements to the CAT system have enabled regulators to conduct more thorough examinations and improve market regulation; the speakers cited criminal cases in which CAT data was pivotal.
  • Future plans include exploring Amazon EKS, Amazon EMR on EKS, Graviton3, and Graviton4, and optimizing data management with S3 Intelligent-Tiering.
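
To put the storage figures above in perspective, the short sketch below works out the constant yearly growth rate implied by going from roughly 700 petabytes to one exabyte in three years. It is back-of-the-envelope arithmetic, not a FINRA forecast: it assumes decimal units (1 EB = 1000 PB), treats "nearly 700 petabytes" as exactly 700, and assumes smooth compound growth, none of which the session states.

```python
# Assumptions (not from the session): 1 EB = 1000 PB, starting size of
# exactly 700 PB, and a constant compound growth rate over three years.
CURRENT_PB = 700.0
TARGET_PB = 1000.0
YEARS = 3


def implied_annual_growth(current, target, years):
    """Constant yearly growth rate that takes `current` to `target` in `years`."""
    return (target / current) ** (1.0 / years) - 1.0


rate = implied_annual_growth(CURRENT_PB, TARGET_PB, YEARS)
print(f"Implied growth: {rate:.1%} per year")

# Project the size year by year using that constant rate.
size = CURRENT_PB
for year in range(1, YEARS + 1):
    size *= 1.0 + rate
    print(f"Year {year}: {size:,.0f} PB")
```

Under these assumptions the implied growth is roughly 12 to 13 percent per year; a volume surge like the one described during the pandemic would pull the exabyte date in.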

Insights

  • The transition to AWS Graviton2 instances and NVMe disks resulted in a 70% reduction in total compute time and $10 million in annual savings.
  • The use of S3 Intelligent-Tiering, including its Archive Instant Access tier, has led to a 65% reduction in storage costs.
  • The scalability and dynamic provisioning of AWS services have been critical in handling unpredictable market volumes and maintaining service level agreements (SLAs).
  • Continuous code optimization, staying current with software upgrades, and cost-saving mechanisms such as Compute Savings Plans and On-Demand Capacity Reservations are key to managing costs and performance.
  • The CAT system's robust data set is crucial not only for regulatory purposes but also for informing economic and regulatory policy decisions.
  • Open communication with AWS, such as opening tickets for issues, is essential for resolving problems and improving service stability.
  • The CAT team is continuously looking for new AWS features and technologies to further improve their system's scalability and cost-efficiency.
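
As an illustration of the S3 Intelligent-Tiering point above, here is a minimal sketch of a lifecycle rule that moves objects into the Intelligent-Tiering storage class. The session does not describe how FINRA configures its buckets, so the rule ID, prefix, and bucket name are hypothetical. Within Intelligent-Tiering, S3 moves objects between the frequent, infrequent, and Archive Instant Access tiers automatically based on access patterns, so no further rules are needed for those tiers.

```python
def intelligent_tiering_lifecycle(prefix=""):
    """Build a lifecycle configuration that transitions objects under
    `prefix` to S3 Intelligent-Tiering as soon as they are eligible."""
    return {
        "Rules": [
            {
                "ID": "move-to-intelligent-tiering",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Apply the configuration to a bucket. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the dict can be built without the SDK

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


# Hypothetical prefix, for illustration only.
config = intelligent_tiering_lifecycle(prefix="example-data/")
print(config)
```

Calling `apply_lifecycle("example-bucket", config)` would push the rule to a real bucket; the bucket name here is a placeholder, and the call is left out of the sketch so it runs without AWS access.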