Scaling & Load Balancing
Elastic Load Balancing and Auto Scaling: how AWS architectures stay available as traffic rises and falls.
Two services turn a pile of EC2 instances into a resilient, elastic application: Elastic Load Balancing (ELB) spreads incoming traffic across healthy instances, and Amazon EC2 Auto Scaling adds or removes instances to match demand. Deployed together, usually across multiple Availability Zones (AZs), they form the standard high-availability pattern.
Elastic Load Balancing
Key points
- Distributes traffic across multiple targets (EC2, containers, IPs) in multiple AZs.
- Health checks stop sending traffic to unhealthy targets automatically.
- Application Load Balancer (ALB): HTTP/HTTPS (layer 7), path-based and host-based routing for web apps and microservices.
- Network Load Balancer (NLB): TCP/UDP/TLS (layer 4), millions of requests/sec, a static IP per AZ, ultra-low latency.
- Gateway Load Balancer (GWLB): layer 3, for deploying and scaling third-party virtual network appliances such as firewalls and intrusion detection systems.
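The health-check behavior in the key points above can be pictured with a toy model. This is a minimal sketch, not how ELB is implemented: the `Target` and `ToyLoadBalancer` classes, the target names, and the AZ labels are all hypothetical, and simple round robin stands in for whatever routing algorithm a real load balancer applies.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator, List


@dataclass
class Target:
    """Hypothetical stand-in for a registered target (e.g. an EC2 instance)."""
    name: str
    az: str
    healthy: bool = True


class ToyLoadBalancer:
    """Round robin over targets, skipping any marked unhealthy."""

    def __init__(self, targets: List[Target]) -> None:
        if not targets:
            raise ValueError("at least one target is required")
        self._targets = targets
        self._ring: Iterator[Target] = cycle(targets)

    def route(self) -> Target:
        # Try each target at most once per request; skip unhealthy ones.
        for _ in range(len(self._targets)):
            candidate = next(self._ring)
            if candidate.healthy:
                return candidate
        raise RuntimeError("no healthy targets available")


# Three targets spread over two (made-up) AZ labels.
targets = [
    Target("web-1", "az-a"),
    Target("web-2", "az-b"),
    Target("web-3", "az-a"),
]
lb = ToyLoadBalancer(targets)

targets[1].healthy = False  # simulate a failed health check on web-2
served = [lb.route().name for _ in range(4)]
print(served)  # ['web-1', 'web-3', 'web-1', 'web-3']
```

Traffic keeps flowing to the two healthy targets while web-2 is skipped. Setting `healthy` back to `True` would return it to the rotation on the next pass.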
EC2 Auto Scaling
Key points
- Keeps a fleet at the desired capacity: replaces unhealthy instances automatically (self-healing).
- Dynamic scaling reacts to metrics like CPU; scheduled scaling handles predictable patterns; predictive scaling uses ML forecasts.
- You set minimum, desired, and maximum instance counts.
- Scaling out (more instances) and back in (fewer) is how you pay only for capacity you need — elasticity in action.
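How the minimum, desired, and maximum counts interact with a metric can be sketched with a simple proportional rule. This is an illustrative assumption, not the algorithm EC2 Auto Scaling actually runs: the function `desired_capacity` and its parameters are hypothetical, and the rule just resizes the fleet so the average metric moves toward a target value, then clamps the result to the configured bounds.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     minimum: int, maximum: int) -> int:
    """Proportional scaling rule, clamped to [minimum, maximum].

    Illustrative only: a simplified stand-in for a dynamic scaling policy.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    # If the metric is 1.8x the target, ask for roughly 1.8x the instances.
    proposed = math.ceil(current * metric / target)
    return max(minimum, min(maximum, proposed))


# Fleet of 4 instances, target average CPU of 50%, bounds 2..10.
print(desired_capacity(4, 90.0, 50.0, 2, 10))   # 8  (scale out)
print(desired_capacity(4, 20.0, 50.0, 2, 10))   # 2  (scale in)
print(desired_capacity(4, 300.0, 50.0, 2, 10))  # 10 (capped at maximum)
```

The clamp is the important part: however extreme the metric, the fleet never drops below the minimum or grows past the maximum.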
The load balancer is the restaurant host seating guests evenly across open tables and skipping the broken one. Auto Scaling is the manager who calls in extra staff for the dinner rush and sends them home at 10pm.
"Distribute traffic across AZs" → ELB. "Automatically add/remove instances based on demand" → Auto Scaling. "Replace failed instances automatically" → Auto Scaling health checks. The trio *ELB + Auto Scaling + Multi-AZ* is AWS's canonical highly available architecture.
Which service automatically distributes incoming application traffic across multiple EC2 instances in different AZs?