Title: AWS re:Inforce 2024 - Use AWS WAF to help avoid cost-prohibitive traffic in LLM apps (NIS221)
Insights:
- Introduction to AWS WAF and CloudFront: The session focuses on using AWS WAF and Amazon CloudFront to secure public-facing generative AI applications, particularly large language models (LLMs).
- Prevalence of Non-Human Traffic: Approximately 47-50% of internet traffic is non-human (bots), which poses a significant cost issue for applications, including LLMs.
- Model Denial of Service (DoS): The OWASP Top 10 for LLM Applications lists model DoS as a major threat. In this context, fraudulent traffic drives up the cost of running the model.
- Cost Structure of LLMs: Understanding what it costs to run an LLM, particularly on Amazon Bedrock, is crucial. Costs are calculated per thousand input tokens and per thousand output tokens, so every request, including one sent by a bot, has a direct cost.
- Retrieval Augmented Generation (RAG): This technique supplies the LLM with additional context and instructions so that it generates more relevant responses. AWS offers a managed service for this called Knowledge Bases for Amazon Bedrock.
- Cost Analysis: A single LLM request can cost about six-tenths of a cent ($0.006), which adds up quickly when bots send requests at volume. AWS WAF Bot Control can significantly reduce that cost by blocking unwanted bot traffic.
- Cost Savings with AWS WAF: For every dollar spent on AWS WAF Bot Control, there can be a potential saving of $277 in generative AI costs.
- Architectural Best Practices: Adding WAF to load balancers and using CloudFront can improve security and performance. CloudFront also offers Layer 7 DDoS mitigation capabilities.
- WAF Rules and Monitoring: AWS Managed Rules such as the Amazon IP reputation list and the core rule set provide baseline protections. Rules can also run in monitor-only mode (the Count action) to observe traffic without affecting production.
- Implementation and Monitoring: Enabling WAF is straightforward through the CloudFront console. Monitoring traffic for a few days can provide insights into bot activity, helping to plan further security measures.
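
The per-token cost structure and the savings ratio described above can be sketched as simple arithmetic. This is a minimal illustration, not the speaker's calculation: the token counts, per-thousand-token prices, blocked-request volume, and WAF spend below are hypothetical placeholders, and the function names are made up for this example. Substitute the current Amazon Bedrock and AWS WAF prices for your model and Region.

```python
def request_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Return the cost in USD of one request billed per thousand tokens."""
    input_cost = (input_tokens / 1000) * price_in_per_1k
    output_cost = (output_tokens / 1000) * price_out_per_1k
    return input_cost + output_cost


def savings_per_waf_dollar(blocked_requests, cost_per_request, waf_spend):
    """Return the LLM spend avoided for each dollar spent on WAF."""
    if waf_spend <= 0:
        raise ValueError("waf_spend must be positive")
    return (blocked_requests * cost_per_request) / waf_spend


# Hypothetical inputs (assumptions, not figures from the session):
# a RAG-style prompt of 1,500 input tokens and a 100-token answer,
# priced at $0.003 per 1K input tokens and $0.015 per 1K output tokens.
cost = request_cost(1500, 100, 0.003, 0.015)
print(f"cost per request: ${cost:.4f}")

# Hypothetical month: 1,000,000 bot requests blocked for $50 of WAF spend.
ratio = savings_per_waf_dollar(1_000_000, cost, 50.0)
print(f"avoided LLM spend per WAF dollar: ${ratio:.2f}")
```

The ratio scales linearly with the number of blocked requests, so the figure for a real application depends entirely on how much bot traffic it actually receives.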
Quotes:
- "About half of internet traffic is non-human, right? 47-50 percent, somewhere in that ballpark, non-human traffic."
- "Model denial of service... is trying to drive costs up for us running large language models."
- "For every dollar that we spend on AWS WAF Bot Control, we can save $277 in our generative AI costs."
- "CloudFront automatically detected this and blocked these attacks before they ever caused any disruption to the customers that are using CloudFront."
- "AWS WAF is a great tool that you should be thinking about using with any new large language model application that you're building out today."
- "Turning on WAF gives you visibility into some of those known bad offenders, or it can block some of those known bad offenders."
- "Make sure that you're thinking about turning on WAF so that you get that observability into those bot problems today."
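
As a rough sketch of what "turning on WAF for observability" can look like in configuration, the snippet below builds rule entries in the shape the AWS WAFv2 API expects for managed rule groups, with the group's action overridden to Count so that matching requests are recorded but not blocked. The helper name `managed_rule_group` and the metric names are hypothetical. The snippet only prints JSON and calls no AWS API, so the output would still need to be passed to a web ACL definition (for example through the console, infrastructure-as-code, or an SDK).

```python
import json


def managed_rule_group(name, priority, count_only=True):
    """Build one web ACL rule entry that references an AWS managed rule group.

    With count_only=True the rule group's actions are overridden to Count,
    so matching requests are logged and metered but not blocked.
    """
    override = {"Count": {}} if count_only else {"None": {}}
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {"VendorName": "AWS", "Name": name}
        },
        "OverrideAction": override,
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": f"{name}-metric",  # hypothetical metric name
        },
    }


# Baseline groups named in the session, plus Bot Control, all in Count mode.
rules = [
    managed_rule_group("AWSManagedRulesAmazonIpReputationList", 0),
    managed_rule_group("AWSManagedRulesCommonRuleSet", 1),
    managed_rule_group("AWSManagedRulesBotControlRuleSet", 2),
]
print(json.dumps(rules, indent=2))
```

After a few days of observing the Count metrics, switching `count_only` to `False` for a given group would let that group's own block actions take effect.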