Title: AWS re:Inforce 2024 - Use AWS WAF to help avoid cost-prohibitive traffic in LLM apps (NIS221)

Insights:

Introduction to AWS WAF and CloudFront: The session focuses on using AWS WAF and Amazon CloudFront to secure public-facing generative AI applications, particularly large language models (LLMs).
Prevalence of Non-Human Traffic: Approximately 47-50% of internet traffic is non-human (bots), which poses a significant cost issue for applications, including LLMs.
Model Denial of Service (DoS): The OWASP Top 10 for LLM Applications lists model DoS as a major threat, in which fraudulent traffic drives up the cost of running the model.
Cost Structure of LLMs: Understanding the cost structure of running LLMs, particularly using Amazon Bedrock, is crucial. Costs are calculated per thousand input and output tokens.
Retrieval Augmented Generation (RAG): This technique involves providing additional context and instructions to LLMs to generate more relevant responses. AWS offers a managed service for this called Knowledge Bases for Bedrock.
Cost Analysis: A single LLM request can cost about six-tenths of a cent ($0.006). Blocking unwanted bot traffic with AWS WAF Bot Control can significantly reduce that spend.
Cost Savings with AWS WAF: The session estimates that every dollar spent on AWS WAF Bot Control can save $277 in generative AI costs.
Architectural Best Practices: Adding WAF to load balancers and using CloudFront can improve security and performance. CloudFront also offers Layer 7 DDoS mitigation capabilities.
WAF Rules and Monitoring: AWS managed rule groups such as the Amazon IP reputation list and the core rule set provide basic protections. WAF can also run in monitor-only mode to observe traffic without affecting production.
Implementation and Monitoring: Enabling WAF is straightforward through the CloudFront console. Monitoring traffic for a few days can provide insights into bot activity, helping to plan further security measures.
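
The monitor-only setup described above can be sketched as rule definitions in the shape the AWS WAFv2 API expects. This is a minimal sketch, not the presenters' configuration: the helper name `managed_rule` and the priorities are assumptions for illustration. In WAF terms, monitor-only corresponds to overriding a managed rule group's action to Count.

```python
# Sketch: build WAFv2 rule definitions for AWS managed rule groups.
# `managed_rule` is a hypothetical helper, not part of any AWS SDK.

def managed_rule(name, priority, monitor_only=True):
    """Return a rule dict referencing an AWS managed rule group.

    monitor_only=True overrides the group's actions to Count, so matching
    requests are recorded but not blocked.
    """
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {"VendorName": "AWS", "Name": name}
        },
        # Count = observe only; None = let the rule group's own actions apply.
        "OverrideAction": {"Count": {}} if monitor_only else {"None": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }

# The two baseline rule groups named in the session, both in monitor-only mode.
RULES = [
    managed_rule("AWSManagedRulesAmazonIpReputationList", priority=0),
    managed_rule("AWSManagedRulesCommonRuleSet", priority=1),
]

if __name__ == "__main__":
    for rule in RULES:
        print(rule["Name"], rule["OverrideAction"])
```

A list like `RULES` could be passed as the `Rules` argument when creating a web ACL with scope `CLOUDFRONT` through an SDK. After a few days of observation, switching `monitor_only` to `False` would let the rule groups enforce their own actions.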

Quotes:

"About half of internet traffic is non-human, right? 47-50 percent, somewhere in that ballpark, non-human traffic."
"Model denial of service... is trying to drive costs up for us running large language models."
"For every dollar that we spend on AWS WAF Bot Control, we can save $277 in our generative AI costs."
"CloudFront automatically detected this and blocked these attacks before they ever caused any disruption to the customers that are using CloudFront."
"AWS WAF is a great tool that you should be thinking about using with any new large language model application that you're building out today."
"Turning on WAF gives you visibility into some of those known bad offenders, or it can block some of those known bad offenders."
"Make sure that you're thinking about turning on WAF so that you get that observability into those bot problems today."
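
The cost reasoning behind the per-request and savings figures can be sketched with simple arithmetic. Everything numeric below is an assumption for illustration: the token counts, per-thousand-token prices, blocked request volume, and WAF spend are placeholders, not figures from the session or from a current AWS price list.

```python
# Sketch: per-request LLM cost and savings per dollar of WAF spend.
# All numeric inputs used in the example run are hypothetical placeholders.

def request_cost(input_tokens, output_tokens,
                 input_price_per_1k, output_price_per_1k):
    """Cost of one request when pricing is per thousand input/output tokens."""
    return ((input_tokens / 1000) * input_price_per_1k
            + (output_tokens / 1000) * output_price_per_1k)

def savings_per_waf_dollar(blocked_requests, cost_per_request, waf_spend):
    """LLM cost avoided for each dollar spent on bot mitigation."""
    if waf_spend <= 0:
        raise ValueError("waf_spend must be positive")
    return blocked_requests * cost_per_request / waf_spend

if __name__ == "__main__":
    # Hypothetical request: 1,000 input tokens (prompt plus retrieved context)
    # and 200 output tokens, at placeholder prices per 1K tokens.
    cost = request_cost(1000, 200,
                        input_price_per_1k=0.003,
                        output_price_per_1k=0.015)
    print(f"cost per request: ${cost:.4f}")

    # Hypothetical month: 1,000,000 bot requests blocked for $20 of WAF spend.
    ratio = savings_per_waf_dollar(1_000_000, cost, waf_spend=20.0)
    print(f"saved per WAF dollar: ${ratio:.0f}")
```

Because retrieval augmented generation adds context to every prompt, input tokens tend to dominate the request size, which is why each bot request that reaches the model is costly relative to a request blocked at the edge.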
