Title

AWS re:Invent 2022 - [NEW] Accelerate workloads using parallelism w/Step Functions & Lambda (API205)

Summary

  • Adam Wagner, a Principal Solutions Architect at AWS, along with Justin Callison, General Manager of Step Functions, and Brian Zambrano, Specialist Solutions Architect, presented on accelerating workloads using parallelism with AWS Step Functions and Lambda.
  • The session covered the evolution of data processing, an overview of Step Functions, parallel data processing, and the new distributed map feature.
  • Step Functions is a serverless workflow service that integrates with over 220 AWS services and supports various use cases across industries.
  • Workflows in Step Functions are called state machines; each step is a state, and moving from one state to the next is a state transition.
  • Step Functions offers both optimized integrations and AWS SDK integrations with AWS services, and supports three integration patterns: request-response, run a job (.sync), and wait for callback.
  • There are two types of Step Functions workflows: standard workflows for long-running processes and express workflows for high-volume, short-duration workflows.
  • The session introduced the Distributed Map feature, which runs up to 10,000 child workflow executions in parallel and is suited to large-scale workloads such as iterating over the objects in an S3 bucket.
  • Brian Zambrano demonstrated the Distributed Map feature using a NOAA global dataset to find the highest average temperature by month.
  • Justin Callison provided a deeper dive into the Distributed Map functionality, discussing input sourcing, concurrency management, batching, failure handling, and results management.
  • The session concluded with resources for getting started with Step Functions and other serverless services.
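
The pieces above (a state machine, a Distributed Map state reading from S3, and a Lambda task using the request-response pattern) can be sketched as an Amazon States Language definition built in Python. This is a minimal illustration, not the definition shown in the session: the bucket, prefix, and function name are hypothetical placeholders, and the MaxConcurrency value is an arbitrary example.

```python
import json


def build_definition(bucket, prefix, function_name):
    """Return a state machine definition with one Distributed Map state.

    All argument values are hypothetical placeholders supplied by the caller.
    """
    return {
        "Comment": "Sketch: fan out over S3 objects with a Distributed Map",
        "StartAt": "ProcessObjects",
        "States": {
            "ProcessObjects": {
                "Type": "Map",
                # List the objects under a prefix and use them as map items.
                "ItemReader": {
                    "Resource": "arn:aws:states:::s3:listObjectsV2",
                    "Parameters": {"Bucket": bucket, "Prefix": prefix},
                },
                # Each item (or batch) runs as its own child workflow execution.
                "ItemProcessor": {
                    "ProcessorConfig": {
                        "Mode": "DISTRIBUTED",
                        "ExecutionType": "EXPRESS",
                    },
                    "StartAt": "ProcessOne",
                    "States": {
                        "ProcessOne": {
                            "Type": "Task",
                            # Request-response invocation of a Lambda function.
                            "Resource": "arn:aws:states:::lambda:invoke",
                            "Parameters": {
                                "FunctionName": function_name,
                                "Payload.$": "$",
                            },
                            "End": True,
                        }
                    },
                },
                # Illustrative cap; choose a value downstream systems can absorb.
                "MaxConcurrency": 1000,
                "End": True,
            }
        },
    }


definition = build_definition("example-bucket", "data/", "example-function")
print(json.dumps(definition, indent=2))
```

The printed JSON is what would be supplied as the state machine definition; building it as a Python dictionary simply makes the nesting of the parent workflow and the child workflow easier to see.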

Insights

  • The Distributed Map feature in Step Functions addresses limitations of the inline map state by running each iteration as a separate child workflow execution, so the iterations do not count toward the parent workflow's 25,000-event execution history limit.
  • The feature simplifies the processing of large datasets by automatically listing objects in S3 and handling large files line by line without loading the entire object into memory.
  • Concurrency control keeps a large fan-out from overwhelming downstream systems; Step Functions provides mechanisms to manage this, including a maximum concurrency setting and retries on tasks.
  • Batching is an optimization that processes items in groups rather than one at a time, which can reduce cost and latency by avoiding steps that would otherwise be repeated for every item.
  • Failure handling in Distributed Map allows for setting a failure tolerance, so the workflow can complete successfully even if some items fail; the failed items can be reprocessed later.
  • The session highlighted the importance of choosing the right AWS service for data processing needs and how Step Functions can integrate with other AWS services like Glue, EMR, and Athena.
  • Resources such as serverlessland.com, the Step Functions workshop, and the AWS learning path for serverless on Skill Builder were recommended for those interested in learning more about Step Functions and serverless architectures.
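
The concurrency, batching, and failure-tolerance controls described above map onto fields of the Map state. The sketch below sets those fields on a copy of a Map state and adds two small local helpers that mimic the ideas in plain Python. The helper names and all numeric values are assumptions made for illustration; the helpers model the concepts only and are not how Step Functions implements them.

```python
import copy


def with_controls(map_state, max_concurrency, max_items_per_batch,
                  tolerated_failure_percentage):
    """Return a copy of a Map state with concurrency, batching, and
    failure-tolerance fields set. The input state is left unmodified."""
    tuned = copy.deepcopy(map_state)
    tuned["MaxConcurrency"] = max_concurrency
    tuned["ItemBatcher"] = {"MaxItemsPerBatch": max_items_per_batch}
    tuned["ToleratedFailurePercentage"] = tolerated_failure_percentage
    return tuned


def make_batches(items, size):
    """Local illustration of batching: split items into groups of `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def within_tolerance(failed, total, tolerated_percentage):
    """Local illustration of failure tolerance: True while the share of
    failed items does not exceed the tolerated percentage."""
    return failed * 100 <= tolerated_percentage * total


# Hypothetical starting point: a bare Map state in distributed mode.
base_state = {
    "Type": "Map",
    "ItemProcessor": {"ProcessorConfig": {"Mode": "DISTRIBUTED"}},
}
tuned_state = with_controls(base_state, 500, 100, 5)

print(tuned_state["ItemBatcher"])
print(make_batches(list(range(5)), 2))
print(within_tolerance(failed=5, total=100, tolerated_percentage=5))
```

With batching enabled, each child workflow execution receives a group of items instead of a single one, so any per-execution setup cost is paid once per batch rather than once per item.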