Title
AWS re:Invent 2023 - How to build generative AI–powered American Sign Language avatars (BOA305)
Summary
- The session was presented by Alak Eswaradas and Suresh Poopandi, senior solutions architects at AWS, and Rob Koch, a principal data engineer at Slalom who is deaf.
- The talk focused on building generative AI-powered American Sign Language (ASL) avatars to aid communication for the deaf and hard of hearing.
- The presenters discussed the importance of accessibility and assistive technology, highlighting the curb cut effect and its relevance to subtitles and other assistive tools.
- They introduced GenASL, an application that translates speech or text into ASL animations or videos using AWS Cloud and generative AI capabilities.
- The solution has three steps: converting audio to English text with Amazon Transcribe, generating ASL gloss with Amazon Bedrock and Anthropic's Claude V2 LLM, and creating signed videos using the MMPose toolkit and RTMPose models.
- A live demo of the GenASL application was shown, demonstrating its ability to accept audio, speech, or text input and generate corresponding ASL videos.
- The architecture behind GenASL was detailed, including the use of AWS Amplify, API Gateway, Step Functions, and other AWS services.
- Future plans for GenASL include creating 3D avatars, improving video smoothness, and developing reverse translation from ASL video to English audio.
- The session concluded with a call to action for building products with accessibility in mind and making the world more inclusive.
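
The three-step approach described in the summary can be sketched roughly as follows. This is a minimal illustration, not the presenters' implementation: it assumes the English text has already been produced by the transcription step, the prompt wording is invented for this example, and `clip_index`, `parse_gloss`, `clips_for_gloss`, and `text_to_clips` are hypothetical names. The summary does not say how gloss tokens are turned into video, so the dictionary lookup of pre-rendered clips is an assumption. The Bedrock call uses the Claude v2 text-completion request format and requires `boto3` and AWS credentials to run for real.

```python
import json
from typing import Callable, Dict, List, Tuple

CLAUDE_V2_MODEL_ID = "anthropic.claude-v2"


def build_gloss_prompt(english_text: str) -> str:
    """Wrap the sentence in the Human/Assistant format used by Claude v2."""
    # The instruction wording is an assumption made for this sketch.
    instruction = (
        "Translate the following English sentence into ASL gloss. "
        "Reply with only the gloss, in upper case, words separated by spaces."
    )
    return f"\n\nHuman: {instruction}\n\nSentence: {english_text}\n\nAssistant:"


def invoke_claude_v2(prompt: str) -> str:
    """Send the prompt to Claude v2 through Amazon Bedrock and return its text."""
    import boto3  # imported lazily so the pure helpers work without the AWS SDK

    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(
        modelId=CLAUDE_V2_MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 200}),
    )
    return json.loads(response["body"].read())["completion"]


def parse_gloss(completion: str) -> List[str]:
    """Split the model output into upper-case gloss tokens without punctuation."""
    tokens = (word.strip(".,!?").upper() for word in completion.split())
    return [token for token in tokens if token]


def clips_for_gloss(
    tokens: List[str], clip_index: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Map gloss tokens to clip keys; collect tokens that have no clip."""
    clips: List[str] = []
    missing: List[str] = []
    for token in tokens:
        if token in clip_index:
            clips.append(clip_index[token])
        else:
            missing.append(token)
    return clips, missing


def text_to_clips(
    english_text: str,
    clip_index: Dict[str, str],
    llm: Callable[[str], str] = invoke_claude_v2,
) -> Tuple[List[str], List[str]]:
    """English text -> ASL gloss -> ordered clip keys (plus unmatched tokens)."""
    completion = llm(build_gloss_prompt(english_text))
    return clips_for_gloss(parse_gloss(completion), clip_index)
```

Passing the model call in as the `llm` argument keeps the gloss parsing and clip lookup testable offline; a stub function can stand in for Bedrock during development. Tokens returned in the second list have no matching clip and would need separate handling.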
Insights
- The GenASL application is a significant step towards bridging the communication gap for the deaf and hard of hearing community, demonstrating the potential of generative AI in creating assistive technologies.
- Combining Amazon Transcribe, Amazon Bedrock, and AWS Amplify shows how managed AWS services can be composed into a single speech-to-ASL application.
- The session highlighted the importance of considering accessibility in product development, emphasizing that inclusivity can lead to innovations beneficial to a broader audience, as illustrated by the curb cut effect.
- The presenters' commitment to improving the GenASL application with 3D avatars and smoother video transitions indicates ongoing efforts to enhance the user experience and realism of the avatars.
- The potential for reverse translation from ASL video to English audio suggests a future where two-way communication using ASL avatars could become a reality, further aiding inclusivity.
- The session's focus on practical demonstrations and detailed architecture insights provides valuable knowledge for developers interested in building similar generative AI-powered applications.