Title

AWS re:Invent 2023 - How to build generative AI–powered American Sign Language avatars (BOA305)

Summary

The session was presented by Alak Eswaradass and Suresh Poopandi, senior solutions architects at AWS, and Rob Koch, a principal data engineer at Slalom who is deaf.
The talk focused on building generative AI-powered American Sign Language (ASL) avatars to aid communication for the deaf and hard of hearing.
The presenters discussed the importance of accessibility and assistive technology, highlighting the curb cut effect (features designed for people with disabilities often end up benefiting everyone) and its relevance to subtitles and other assistive tools.
They introduced GenASL, an application that translates speech or text into ASL animations or videos using AWS Cloud and generative AI capabilities.
The solution has three steps: Amazon Transcribe converts audio to English text, Anthropic's Claude v2 LLM on Amazon Bedrock translates that text into ASL gloss, and MMPose and RTMPose models are used to create the signed videos.
A live demo of the GenASL application was shown, demonstrating its ability to accept audio, speech, or text input and generate corresponding ASL videos.
The architecture behind GenASL was detailed, including the use of AWS Amplify, API Gateway, Step Functions, and other AWS services.
Future plans for GenASL include creating 3D avatars, improving video smoothness, and developing reverse translation from ASL video to English audio.
The session concluded with a call to action for building products with accessibility in mind and making the world more inclusive.
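
The text-to-gloss-to-video steps described above can be illustrated with a minimal Python sketch. It is not the presenters' implementation: the prompt wording, the function names, the `clip_index` lookup table, and the handling of unknown gloss tokens are all assumptions made for illustration. It also assumes the audio has already been transcribed to English text, and that the signed clips were generated ahead of time and can be looked up by gloss token.

```python
import json
from typing import Callable, Dict, List, Tuple

# Assumption: the Bedrock model ID for Claude v2; confirm the ID enabled in your account.
CLAUDE_V2_MODEL_ID = "anthropic.claude-v2"


def build_gloss_prompt(english_text: str) -> str:
    """Build a Claude v2 text-completion prompt (hypothetical wording)."""
    return (
        "\n\nHuman: Translate the following English sentence into ASL gloss. "
        "Reply with the gloss tokens only, in upper case, separated by spaces.\n"
        f"Sentence: {english_text}"
        "\n\nAssistant:"
    )


def english_to_gloss_bedrock(english_text: str) -> str:
    """Ask Claude v2 on Amazon Bedrock for a gloss (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs offline

    client = boto3.client("bedrock-runtime")
    body = json.dumps(
        {
            "prompt": build_gloss_prompt(english_text),
            "max_tokens_to_sample": 200,
            "temperature": 0.0,
        }
    )
    response = client.invoke_model(modelId=CLAUDE_V2_MODEL_ID, body=body)
    return json.loads(response["body"].read())["completion"].strip()


def gloss_to_clips(gloss: str, clip_index: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Map gloss tokens to clip names; tokens without a clip are reported, not guessed."""
    clips: List[str] = []
    missing: List[str] = []
    for token in gloss.upper().split():
        if token in clip_index:
            clips.append(clip_index[token])
        else:
            missing.append(token)
    return clips, missing


def text_to_asl_clips(
    english_text: str,
    translate: Callable[[str], str],
    clip_index: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Run the text -> gloss -> clip-list steps with a pluggable translator."""
    return gloss_to_clips(translate(english_text), clip_index)
```

Passing `english_to_gloss_bedrock` as `translate` would call Bedrock, while a stub function makes the flow testable without an AWS account. Stitching the returned clips into one video is left out of this sketch.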

The GenASL application is a significant step towards bridging the communication gap for the deaf and hard of hearing community, demonstrating the potential of generative AI in creating assistive technologies.
The use of AWS services such as Amazon Transcribe, Amazon Bedrock, and AWS Amplify showcases the versatility and integration capabilities of AWS in developing complex applications.
The session highlighted the importance of considering accessibility in product development, emphasizing that inclusivity can lead to innovations beneficial to a broader audience, as illustrated by the curb cut effect.
The presenters' commitment to improving the GenASL application with 3D avatars and smoother video transitions indicates ongoing efforts to enhance the user experience and realism of the avatars.
The potential for reverse translation from ASL video to English audio suggests a future where two-way communication using ASL avatars could become a reality, further aiding inclusivity.
The session's focus on practical demonstrations and detailed architecture insights provides valuable knowledge for developers interested in building similar generative AI-powered applications.