How to Build Generative AI-Powered American Sign Language Avatars (BOA305)

Title

AWS re:Invent 2023 - How to build generative AI–powered American Sign Language avatars (BOA305)

Summary

  • The session was presented by Alak Eswaradass and Suresh Poopandi, senior solutions architects at AWS, and Rob Koch, a principal data engineer at Slalom who is deaf.
  • The talk focused on building generative AI-powered American Sign Language (ASL) avatars to aid communication for the deaf and hard of hearing.
  • The presenters discussed the importance of accessibility and assistive technology, highlighting the curb cut effect and its relevance to subtitles and other assistive tools.
  • They introduced GenASL, an application that translates speech or text into ASL animations or videos using AWS Cloud and generative AI capabilities.
  • The solution approach involves three steps: converting audio to English text using Amazon Transcribe, generating ASL gloss with Amazon Bedrock and Anthropic's Claude V2 LLM, and creating signed videos using MMPose and RTMPose models.
  • A live demo of the GenASL application was shown, demonstrating its ability to accept audio, speech, or text input and generate corresponding ASL videos.
  • The architecture behind GenASL was detailed, including the use of AWS Amplify, API Gateway, Step Functions, and other AWS services.
  • Future plans for GenASL include creating 3D avatars, improving video smoothness, and developing reverse translation from ASL video to English audio.
  • The session concluded with a call to action for building products with accessibility in mind and making the world more inclusive.
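
The second and third steps of the solution approach described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the presenters' implementation: the prompt wording, the `sign_index` lookup table, the helper names, and the letter-by-letter fingerspelling fallback are all hypothetical. The Bedrock call uses the text-completions request format that Claude V2 accepts (`prompt`, `max_tokens_to_sample`). The transcription step with Amazon Transcribe is omitted for brevity.

```python
import json

# Hypothetical prompt wording; the session summary does not give the actual prompt.
PROMPT_TEMPLATE = (
    "\n\nHuman: Translate the English sentence inside <text> tags into "
    "ASL gloss. Reply with uppercase gloss tokens only.\n"
    "<text>{text}</text>\n\nAssistant:"
)


def build_claude_v2_request(text: str) -> str:
    """Build the JSON body for Claude V2's text-completions format on Bedrock."""
    return json.dumps({
        "prompt": PROMPT_TEMPLATE.format(text=text),
        "max_tokens_to_sample": 200,
        "temperature": 0.0,
    })


def english_to_gloss(text: str, region: str = "us-east-1") -> str:
    """Ask Claude V2 on Amazon Bedrock for ASL gloss (needs AWS credentials)."""
    import boto3  # imported lazily so the pure helpers run without AWS access

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId="anthropic.claude-v2",
        body=build_claude_v2_request(text),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["completion"].strip()


def gloss_to_clips(gloss: str, sign_index: dict[str, str]) -> list[str]:
    """Map gloss tokens to clip keys.

    Assumption: tokens missing from the index are fingerspelled letter by
    letter, and letters without a clip are skipped.
    """
    clips: list[str] = []
    for token in gloss.upper().split():
        if token in sign_index:
            clips.append(sign_index[token])
        else:
            clips.extend(sign_index[ch] for ch in token if ch in sign_index)
    return clips
```

With a toy index such as `{"HELLO": "hello.mp4", "A": "a.mp4", "B": "b.mp4"}`, calling `gloss_to_clips("hello ab", index)` yields the clip for HELLO followed by the clips for A and B. In a real deployment the resulting clip list would then be stitched into a single avatar video.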

Insights

  • The GenASL application is a significant step towards bridging the communication gap for the deaf and hard of hearing community, demonstrating the potential of generative AI in creating assistive technologies.
  • The use of AWS services such as Amazon Transcribe, Amazon Bedrock, and AWS Amplify showcases the versatility and integration capabilities of AWS in developing complex applications.
  • The session highlighted the importance of considering accessibility in product development, emphasizing that inclusivity can lead to innovations beneficial to a broader audience, as illustrated by the curb cut effect.
  • The presenters' commitment to improving the GenASL application with 3D avatars and smoother video transitions indicates ongoing efforts to enhance the user experience and realism of the avatars.
  • The potential for reverse translation from ASL video to English audio suggests a future where two-way communication using ASL avatars could become a reality, further aiding inclusivity.
  • The session's focus on practical demonstrations and detailed architecture insights provides valuable knowledge for developers interested in building similar generative AI-powered applications.