Accelerate Secure Data Migrations at Scale with AWS DataSync (STG222)

Title

AWS re:Invent 2023 - Accelerate secure data migrations at scale with AWS DataSync (STG222)

Summary

  • Jeff Bartley, a product manager on the AWS DataSync team, provided an overview of AWS DataSync, its capabilities, and how it addresses common data migration challenges.
  • Raghavendra from Workday described how the company used DataSync to migrate its Hadoop cluster to AWS.
  • DataSync is designed to handle large-scale data transfers securely and efficiently, with built-in encryption and data verification.
  • It supports various AWS services, including S3, FSx, and EFS, and can be used for recurring business workflows, migrations, data protection, and archiving.
  • DataSync can move data from on-premises, edge locations, other clouds, and within AWS, with support for NFS, SMB, S3 API, and HDFS protocols.
  • Performance optimization techniques include deploying multiple agents and running tasks in parallel; bandwidth throttling is available to keep transfers from saturating shared network links.
  • Cost optimization strategies include incremental transfers, data filtering, and direct transfers into S3 storage classes.
  • Workday's migration to AWS using DataSync resulted in predictable data speeds, improved SLAs, and access to advanced analytics tools.
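The performance and cost levers in the summary above (bandwidth throttling, incremental transfers, data filtering) correspond to options on a DataSync task. The following is a minimal sketch, not the presenters' configuration: it only builds the keyword arguments that could be passed to the boto3 `create_task` call. The helper names, the ARNs, the task name, and the exclude patterns are hypothetical placeholders.

```python
from typing import Iterable, Optional


def mbps_to_bytes_per_second(mbps: float) -> int:
    """Convert a limit in megabits per second to bytes per second."""
    return int(mbps * 1_000_000 / 8)


def build_task_request(
    source_arn: str,
    destination_arn: str,
    name: str,
    bandwidth_limit_mbps: Optional[float] = None,
    exclude_patterns: Iterable[str] = (),
) -> dict:
    """Build keyword arguments for a DataSync CreateTask request (hypothetical helper)."""
    options = {
        # Incremental transfer: copy only data that differs from the destination.
        "TransferMode": "CHANGED",
        # Verify only the files that were transferred in this run.
        "VerifyMode": "ONLY_FILES_TRANSFERRED",
        # -1 means no bandwidth limit; otherwise a cap in bytes per second.
        "BytesPerSecond": -1
        if bandwidth_limit_mbps is None
        else mbps_to_bytes_per_second(bandwidth_limit_mbps),
    }
    request = {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": name,
        "Options": options,
    }
    patterns = list(exclude_patterns)
    if patterns:
        # Data filtering: multiple patterns go in one rule, separated by "|".
        request["Excludes"] = [
            {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(patterns)}
        ]
    return request


# Placeholder ARNs and patterns, for illustration only.
request = build_task_request(
    source_arn="arn:aws:datasync:us-east-1:111122223333:location/loc-EXAMPLE-SOURCE",
    destination_arn="arn:aws:datasync:us-east-1:111122223333:location/loc-EXAMPLE-DEST",
    name="example-incremental-task",
    bandwidth_limit_mbps=400,
    exclude_patterns=["*.tmp", "/scratch"],
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("datasync").create_task(**request)
```

The S3 storage class is not part of the task options in this sketch; it is chosen when the S3 destination location is created.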

Insights

  • DataSync's custom protocol built on TCP/IP and parallel streams allows for high throughput, with customers achieving up to 10 gigabits per second per task.
  • The fully managed service integrates with AWS monitoring and management tools, simplifying the migration process.
  • DataSync's incremental transfer approach ensures that only changed data is moved, reducing transfer volumes and costs.
  • The ability to deploy DataSync agents on-premises, in the cloud, or on edge devices like AWS Snowcone provides flexibility in various environments.
  • Workday's use case highlights the importance of planning and testing in large-scale migrations, as well as the benefits of cloud-based data lakes over on-premises solutions.
  • DataSync's ability to handle multi-cloud environments, together with its recent support for Google Cloud Storage and Azure Files/Blob Storage, indicates AWS's commitment to interoperability and to customer needs in multi-cloud scenarios.
  • The session emphasized the importance of considering network bandwidth, storage I/O, and latency when optimizing for performance, as well as the need to manage request charges when using S3 with DataSync.
  • Workday's migration challenges and solutions provide practical insights for organizations planning similar data migrations, emphasizing the need for thorough testing, checkpointing, and handling data integrity issues.
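The throughput and bandwidth points above can be turned into a rough planning estimate. The sketch below is back-of-the-envelope arithmetic rather than a DataSync feature: it assumes the per-task ceiling of 10 Gbps mentioned in the session, treats the network link as a hard cap, and applies a utilization factor to account for storage I/O and latency. The function name and the default utilization value are assumptions.

```python
def estimate_transfer_hours(
    total_bytes: float,
    link_gbps: float,
    tasks: int = 1,
    per_task_gbps: float = 10.0,
    utilization: float = 0.8,
) -> float:
    """Estimate wall-clock hours to move total_bytes (hypothetical helper).

    Effective throughput is the smaller of the link capacity and the combined
    per-task ceiling, scaled by an assumed utilization factor.
    """
    if total_bytes < 0 or link_gbps <= 0 or per_task_gbps <= 0:
        raise ValueError("sizes and rates must be positive")
    if tasks < 1:
        raise ValueError("tasks must be at least 1")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")

    effective_gbps = min(link_gbps, tasks * per_task_gbps) * utilization
    seconds = (total_bytes * 8) / (effective_gbps * 1e9)
    return seconds / 3600


# 100 TB over a 10 Gbps link, one task, at an assumed 80% utilization.
hours_single = estimate_transfer_hours(100e12, link_gbps=10)

# Same data over a 40 Gbps link with two parallel tasks.
hours_parallel = estimate_transfer_hours(100e12, link_gbps=40, tasks=2)

print(f"single task: {hours_single:.1f} h, two tasks: {hours_parallel:.1f} h")
```

An estimate like this only bounds the best case; testing against the real source storage, as the Workday example stresses, is what shows the achievable rate.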