Monitoring & Observability
CloudWatch, CloudTrail, X-Ray, and the health dashboards — knowing what's happening in your account at all times.
The observability lineup
Amazon CloudWatch: metrics, logs, alarms, and dashboards (CPU utilization, request counts, custom metrics). Alarms trigger notifications or Auto Scaling actions.
AWS CloudTrail: API call history for auditing (who did what, when, and from where).
AWS X-Ray: distributed tracing that follows a request through microservices to find bottlenecks and errors.
AWS Health Dashboard: the public service health view shows AWS-wide status; your account view shows events affecting *your* resources (e.g., scheduled maintenance).
AWS Trusted Advisor: automated best-practice checks in five categories: cost optimization, performance, security, fault tolerance, service limits.
Amazon EventBridge: serverless event bus that reacts to events (a schedule or a state change) and routes them to targets like Lambda.
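To make the two trigger types on the event bus concrete, here is a minimal sketch of how a schedule rule and a state-change rule might be written down as plain data in Python. The rule names and the sample instance ID are hypothetical. The `matches` helper is a deliberately simplified stand-in for illustration and is not EventBridge's actual matching engine, which supports more operators.

```python
import json

# Schedule-based rule: fires on a timer (rule name is hypothetical).
schedule_rule = {
    "Name": "nightly-report",
    "ScheduleExpression": "rate(1 day)",
}

# State-change rule: fires when an EC2 instance enters the "stopped" state.
state_change_rule = {
    "Name": "ec2-stopped",  # hypothetical rule name
    "EventPattern": json.dumps({
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": ["stopped"]},
    }),
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must be present in the event.
    A list means "any of these values"; a dict recurses into a nested object."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Made-up event in the shape of an EC2 state-change notification.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
pattern = json.loads(state_change_rule["EventPattern"])
print(matches(pattern, sample_event))
```

Assuming boto3 is installed and credentials are configured, either dict could be passed to `boto3.client("events").put_rule(**rule)`, with `put_targets` then attaching a Lambda function as the target.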
The classic trio, one more time: CloudWatch = performance monitoring (metrics/logs/alarms), CloudTrail = API auditing ("who deleted that bucket?"), Config = configuration compliance. Add X-Ray = tracing requests across services and you can answer nearly every monitoring question.
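The "who deleted that bucket?" question can be sketched in Python as a CloudTrail event-history lookup. The snippet below only builds the arguments for the `LookupEvents` API and summarizes one record; the helper names and the sample record are made up for illustration, and real records carry many more fields.

```python
import json
from datetime import datetime, timedelta, timezone


def delete_bucket_query(days_back: int = 7) -> dict:
    """Arguments for CloudTrail's LookupEvents API, filtered to DeleteBucket calls."""
    end = datetime.now(timezone.utc)
    return {
        "LookupAttributes": [
            {"AttributeKey": "EventName", "AttributeValue": "DeleteBucket"}
        ],
        "StartTime": end - timedelta(days=days_back),
        "EndTime": end,
    }


def summarize(record_json: str) -> str:
    """Reduce one CloudTrail record to who / what / when / from where."""
    rec = json.loads(record_json)
    who = rec.get("userIdentity", {}).get("arn", "unknown")
    return (f'{who} called {rec["eventName"]} '
            f'at {rec["eventTime"]} from {rec["sourceIPAddress"]}')


# Made-up record for illustration only.
sample = json.dumps({
    "eventName": "DeleteBucket",
    "eventTime": "2024-01-15T10:30:00Z",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"},
})
print(summarize(sample))
```

Assuming boto3 and valid credentials, `boto3.client("cloudtrail").lookup_events(**delete_bucket_query())` returns an `Events` list whose `CloudTrailEvent` field holds the JSON string that `summarize` expects.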
CloudWatch is the fitness tracker (vitals over time). CloudTrail is the security camera (who did what). Config is the building inspector (is everything up to code?). X-Ray is the GPS trace showing exactly where a delivery got stuck.
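The "vitals over time" idea becomes clearer with a concrete alarm. The following is a rough Python sketch of the settings a CPU alarm might use, expressed as keyword arguments for CloudWatch's `PutMetricAlarm` API. The alarm naming scheme, the 5-minute period, and the two-period evaluation window are assumptions chosen for illustration, and `is_breaching` is a simplified model of alarm evaluation, not CloudWatch's actual logic.

```python
from typing import Optional


def cpu_alarm_params(instance_id: str, threshold: float = 80.0,
                     sns_topic_arn: Optional[str] = None) -> dict:
    """Keyword arguments for CloudWatch's PutMetricAlarm API."""
    params = {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per datapoint (assumed)
        "EvaluationPeriods": 2,   # consecutive breaching periods (assumed)
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }
    if sns_topic_arn:
        params["AlarmActions"] = [sns_topic_arn]  # notify through an SNS topic
    return params


def is_breaching(datapoints: list, threshold: float, periods: int) -> bool:
    """Simplified check: the last `periods` datapoints all exceed the threshold."""
    recent = datapoints[-periods:]
    return len(recent) == periods and all(v > threshold for v in recent)


params = cpu_alarm_params("i-0123456789abcdef0")
print(is_breaching([42.0, 85.5, 91.2], params["Threshold"], params["EvaluationPeriods"]))
print(is_breaching([85.5, 91.2, 60.0], params["Threshold"], params["EvaluationPeriods"]))
```

Assuming boto3 and valid credentials, the dict could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. Requiring two consecutive periods is a common way to avoid alarming on a single short spike.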
Trusted Advisor specifics worth memorizing
- Basic & Developer support: core checks only (a handful of security/service-limit checks).
- Business support and above: all checks across the five categories.
- Typical findings: underutilized EC2 instances, open security groups (0.0.0.0/0), S3 bucket permissions, MFA on root, approaching service limits.
A team needs dashboards of EC2 CPU utilization and an alarm when it exceeds 80%. Which service provides this?