AMAZON CLOUDWATCH

AN OBSERVABILITY PLATFORM FOR YOUR AWS & ON-PREMISES SERVICES

INTRODUCTION👋

Amazon CloudWatch is AWS's central logging and metrics service.

The collected metrics include, for example, the number of Lambda invocations, the free space in your database, or how much CPU your ECS cluster uses. Based on these metrics you can create alarms for certain thresholds. If your free space drops below a threshold you defined, for instance, you can get notified.

ALL LOGS IN ONE PLACE💎

One of the core functionalities of CloudWatch is CloudWatch Logs. This is the centralized logging space in AWS. Services like Lambda, API Gateway, or ECS log directly into CloudWatch Logs.

This is a huge benefit when working with AWS. You have one central space where all your logs are stored.

LOG EVENTS📝

A log event is your actual log statement. It contains the timestamp of your log and the raw message you put into it.

In the log events of a Lambda function, you can see the start and the end of each execution indicated by START and END.

LOG STREAMS💫

A log stream contains one or more log events from the same source and can be seen as a sequence of logs. For example, a log stream of a Lambda function can contain multiple executions of the same Lambda.

In the case of Lambda, one log stream belongs to one warm Lambda container, that is, a container that hasn't been destroyed yet. This is reflected in the log events: all executions handled by that container end up in the same stream.
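To make the relationship between a stream and its executions more concrete, here is a minimal Python sketch that groups the messages of one log stream by Lambda request ID. The sample events and request IDs are made up for illustration, and the grouping is simplified: it only looks at the START and END markers mentioned above.

```python
# Hypothetical log events of ONE log stream. The shape (timestamp in
# milliseconds plus the raw message) mirrors what a log event contains.
EVENTS = [
    {"timestamp": 1700000000000, "message": "START RequestId: req-1 Version: $LATEST"},
    {"timestamp": 1700000000100, "message": "hello from the first execution"},
    {"timestamp": 1700000000200, "message": "END RequestId: req-1"},
    {"timestamp": 1700000005000, "message": "START RequestId: req-2 Version: $LATEST"},
    {"timestamp": 1700000005300, "message": "END RequestId: req-2"},
]


def group_by_invocation(events):
    """Group the messages of one log stream by Lambda request ID."""
    invocations = {}
    current = None
    for event in sorted(events, key=lambda e: e["timestamp"]):
        message = event["message"]
        if message.startswith("START RequestId: "):
            # The request ID is the third whitespace-separated token.
            current = message.split()[2]
            invocations[current] = []
        if current is not None:
            invocations[current].append(message)
        if message.startswith("END RequestId: "):
            current = None
    return invocations


if __name__ == "__main__":
    for request_id, messages in group_by_invocation(EVENTS).items():
        print(request_id, len(messages))
```

With the sample data, the single stream yields two executions, which is what you would expect from one warm container that served two requests.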

LOG GROUPS📂

A log group is a container that holds multiple log streams. Typically one log group is dedicated to one service. One Lambda function for example has one log group.

For AWS services, the name of a log group is typically prefixed with /aws/ and the service name. For Lambda it would be: /aws/lambda/FUNCTION_NAME.

Each Log Group contains different Log Streams. The retention settings are associated with the log group and define how long your logs are stored.
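As a small illustration, the naming scheme and the retention setting could be combined like this in Python. This is a sketch under assumptions: the function name is a placeholder, the allowed retention values are only a subset of what CloudWatch Logs accepts, and the real call needs the boto3 SDK plus AWS credentials.

```python
# Assumption: a subset of the retention periods (in days) that
# CloudWatch Logs accepts. Check the service documentation for the full list.
ALLOWED_RETENTION_DAYS = {1, 3, 5, 7, 14, 30, 60, 90, 180, 365}


def lambda_log_group_name(function_name):
    """Build the log group name of a Lambda function."""
    return f"/aws/lambda/{function_name}"


def retention_request(function_name, days):
    """Build the parameters for a put_retention_policy call."""
    if days not in ALLOWED_RETENTION_DAYS:
        raise ValueError(f"unsupported retention period: {days} days")
    return {
        "logGroupName": lambda_log_group_name(function_name),
        "retentionInDays": days,
    }


def apply_retention(function_name, days):
    """Send the request to AWS (requires boto3 and credentials)."""
    import boto3  # third-party AWS SDK, only needed for the real call

    boto3.client("logs").put_retention_policy(**retention_request(function_name, days))


if __name__ == "__main__":
    # "my-function" is a hypothetical function name.
    print(retention_request("my-function", 30))
```

Keeping the request building separate from the AWS call makes it possible to check the parameters locally before anything is changed in your account.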

Often you can jump straight into the correct log group from the service itself: open the monitoring tab of your service and click on the link that opens the logs in CloudWatch.

LOG INSIGHTS🔍

CloudWatch Logs Insights allows you to query your logs with a SQL-like query language. Without it, querying logs across different log groups can be quite challenging.

Logs Insights allows you to query multiple log groups with a single query.

BUILDING QUERIES🔨

You're able to build queries with commands like fields, filter, sort, limit, and many more. After executing a query, Logs Insights also shows you the discovered fields together with the percentage of log events that contain them. For example, a certain string field may have been present in 60% of all logs.

Let's have a look at an example query and inspect what it does.

  • fields - all fields we want to have - timestamp & message
  • filter - an expression-based filter - a certain correlation.id
  • sort - sorting based on a target field - the timestamp
  • limit - only showing a certain number of matches

Another example is a query that shows you a report of how much memory you over-provisioned in your Lambda functions.
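As a rough sketch, the two queries described above (the correlation lookup and the over-provisioned memory report) could be written as Python strings and sent to Logs Insights via boto3. The correlation ID value and the log group names are placeholders. The memory query follows the sample that the Logs Insights console offers for Lambda, shortened to the fields needed here. Running it for real requires boto3 and AWS credentials.

```python
import time

# Query 1: fields, filter, sort, and limit, as listed in the bullets above.
CORRELATION_QUERY_TEMPLATE = """fields @timestamp, @message
| filter correlation.id = "{correlation_id}"
| sort @timestamp desc
| limit {limit}"""

# Query 2: how much memory is over-provisioned, based on Lambda REPORT lines.
OVERPROVISIONED_MEMORY_QUERY = """filter @type = "REPORT"
| stats max(@memorySize / 1000 / 1000) as provisionedMemoryMB,
    max(@maxMemoryUsed / 1000 / 1000) as maxMemoryUsedMB,
    provisionedMemoryMB - maxMemoryUsedMB as overProvisionedMB"""


def build_correlation_query(correlation_id, limit=20):
    """Fill the placeholders of the correlation query."""
    return CORRELATION_QUERY_TEMPLATE.format(correlation_id=correlation_id, limit=limit)


def run_query(log_group_names, query, lookback_seconds=3600):
    """Run a query against one or more log groups and wait for the result."""
    import boto3  # third-party AWS SDK, only needed for the real call

    logs = boto3.client("logs")
    now = int(time.time())
    query_id = logs.start_query(
        logGroupNames=list(log_group_names),
        startTime=now - lookback_seconds,
        endTime=now,
        queryString=query,
    )["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return response["results"]
        time.sleep(1)  # simple polling, good enough for a sketch


if __name__ == "__main__":
    # "abc-123" is a hypothetical correlation ID.
    print(build_correlation_query("abc-123"))
```

A call such as run_query(["/aws/lambda/function-a", "/aws/lambda/function-b"], OVERPROVISIONED_MEMORY_QUERY) would query two (hypothetical) log groups at once.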

PRE-DEFINED QUERIES🎯

Logs Insights also has some pre-made queries. In the console, these queries are available on the right side. You'll find examples for different AWS services like Lambda or ECS.

SERVICE METRICS📊

CloudWatch is a metric repository. Each AWS service sends default metrics to CloudWatch. You can use these metrics to understand how your application behaves.

Example metrics for Lambda are:

  • Number of invocations
  • Number of errors
  • Execution time of the Lambda

You can also apply statistics like averages, sums, or percentiles (the median is the p50 percentile).
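The three Lambda metrics listed above map to the metric names Invocations, Errors, and Duration in the AWS/Lambda namespace. A minimal Python sketch for reading them with a statistic could look like this. The function name and the 5-minute period are assumptions for illustration, and the real call needs boto3 and AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def lambda_metric_request(function_name, metric_name, statistic, hours=1, end=None):
    """Build the parameters for a get_metric_statistics call."""
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": metric_name,  # e.g. "Invocations", "Errors", "Duration"
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # one datapoint per 5 minutes (assumed granularity)
        "Statistics": [statistic],  # e.g. "Sum", "Average", "Maximum"
    }


def fetch_datapoints(function_name, metric_name, statistic):
    """Fetch the datapoints from CloudWatch (requires boto3 and credentials)."""
    import boto3  # third-party AWS SDK, only needed for the real call

    request = lambda_metric_request(function_name, metric_name, statistic)
    return boto3.client("cloudwatch").get_metric_statistics(**request)["Datapoints"]


if __name__ == "__main__":
    # "my-function" is a hypothetical function name.
    print(lambda_metric_request("my-function", "Duration", "Average"))
```

Sum fits counters like Invocations and Errors, while Average or Maximum is usually more useful for Duration.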

DASHBOARDS📱

Dashboards give you the opportunity to get an overview of lots of different metrics and alarms at the same time. CloudWatch creates automatic dashboards for you.

You can see them if you open CloudWatch and head to the tab Automatic dashboards. They are available for different services like DynamoDB, Lambda, and CloudWatch itself. But you can also create custom ones.

Dashboards make it much easier to understand how your system behaves. You can also share dashboards across your entire organization.

COMPOSITE ALARMS🚨

A composite alarm takes several alarms and their states into account. For example, you can build an alarm that will only be triggered if two of three metric alarms are in the state ALARM.
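Composite alarms are defined by a rule expression that combines the states of other alarms with AND, OR, and NOT. The "two of three" example can be sketched in Python by generating such a rule from all pairs of alarms. The alarm names below are hypothetical, and creating the alarm for real requires boto3 and AWS credentials.

```python
from itertools import combinations


def at_least_rule(alarm_names, minimum):
    """Build a rule that fires when at least `minimum` alarms are in ALARM."""
    groups = []
    for combo in combinations(alarm_names, minimum):
        joined = " AND ".join(f'ALARM("{name}")' for name in combo)
        groups.append(f"({joined})")
    return " OR ".join(groups)


def create_composite_alarm(name, alarm_names, minimum):
    """Create the composite alarm in AWS (requires boto3 and credentials)."""
    import boto3  # third-party AWS SDK, only needed for the real call

    boto3.client("cloudwatch").put_composite_alarm(
        AlarmName=name,
        AlarmRule=at_least_rule(alarm_names, minimum),
    )


if __name__ == "__main__":
    # Hypothetical names of three existing metric alarms.
    print(at_least_rule(["errors", "latency", "dlq"], 2))
```

For many alarms the generated expression grows quickly, so this approach is best suited to small groups like the three alarms in the example.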

ALARMS🚨

CloudWatch Alarms notify you once a system or service crosses a pre-defined threshold, for example for errors or usage. Typical triggers are:

  • Messages available in your Dead Letter Queue
  • Errors in your Lambda function
  • API is throwing more HTTP 500 status codes than usual

There are three alarm states:

  • ALARM -> there is a need to take action
  • OK -> the alarm is not active
  • INSUFFICIENT_DATA -> there is not enough data to determine the state

If the alarm doesn't have sufficient data, you can define whether the missing data will trigger the alarm or not. Normally it won't trigger an alarm.
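As an illustration, an alarm on Lambda errors with an explicit missing-data setting could be sketched in Python as follows. The alarm name, threshold, period, and topic ARN are assumptions chosen for the example. The real call needs boto3 and AWS credentials.

```python
# The four ways CloudWatch can treat missing datapoints.
MISSING_DATA_OPTIONS = {"breaching", "notBreaching", "ignore", "missing"}


def lambda_error_alarm(function_name, topic_arn, treat_missing_data="notBreaching"):
    """Build the parameters for a put_metric_alarm call."""
    if treat_missing_data not in MISSING_DATA_OPTIONS:
        raise ValueError(f"unknown option: {treat_missing_data}")
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,  # evaluate one-minute windows (assumed)
        "EvaluationPeriods": 1,
        "Threshold": 1,  # one error is enough to alarm (assumed)
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": treat_missing_data,
        "AlarmActions": [topic_arn],  # e.g. an SNS topic to notify
    }


def create_alarm(function_name, topic_arn):
    """Create the alarm in AWS (requires boto3 and credentials)."""
    import boto3  # third-party AWS SDK, only needed for the real call

    boto3.client("cloudwatch").put_metric_alarm(**lambda_error_alarm(function_name, topic_arn))


if __name__ == "__main__":
    # Function name and topic ARN are placeholders.
    print(lambda_error_alarm("my-function", "arn:aws:sns:eu-west-1:123456789012:alerts"))
```

With notBreaching, periods without any datapoints count as healthy, so a function that is simply not invoked does not raise the alarm.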

PRICING💰

CloudWatch can be quite expensive. This is mainly because ingesting logs has a high price tag of roughly $0.50 per GB (the exact price depends on the region).

You'll also be charged for storing and analysing logs and for custom metrics. The standard metrics for each service come for free!
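To get a feeling for the ingestion cost, a back-of-the-envelope calculation in Python helps. It uses the approximate price of $0.50 per GB from above and ignores storage, queries, and custom metrics. The daily volume is a made-up example value.

```python
# Approximate ingestion price from the text; the real price depends on the region.
INGESTION_PRICE_PER_GB = 0.50


def monthly_ingestion_cost(gb_per_day, days=30, price_per_gb=INGESTION_PRICE_PER_GB):
    """Estimate the monthly log ingestion cost in USD."""
    return gb_per_day * days * price_per_gb


if __name__ == "__main__":
    # Hypothetical workload: 10 GB of logs per day.
    print(monthly_ingestion_cost(10))
```

At 10 GB per day this comes to 10 * 30 * 0.50 = 150 USD per month for ingestion alone.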

ALARM TOPICS📧

CloudWatch uses SNS to inform you of alarm state changes via different channels like email, SMS, or mobile push notifications.

Since SNS is really flexible, you can also attach a Lambda function to the SNS topic.
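A Lambda function subscribed to the alarm topic receives the alarm details as a JSON string inside the SNS record. A minimal Python handler could unpack it as sketched below. The sample event is shortened and its values are made up; what the function does with the state change afterwards (for example posting to a chat tool) is left open.

```python
import json


def handler(event, context=None):
    """Extract the alarm state changes from an SNS-triggered Lambda event."""
    changes = []
    for record in event["Records"]:
        # SNS delivers the CloudWatch alarm payload as a JSON string.
        alarm = json.loads(record["Sns"]["Message"])
        changes.append(
            {
                "alarm": alarm["AlarmName"],
                "old_state": alarm["OldStateValue"],
                "new_state": alarm["NewStateValue"],
                "reason": alarm["NewStateReason"],
            }
        )
    return changes


# Shortened, hypothetical sample event for local experiments.
SAMPLE_EVENT = {
    "Records": [
        {
            "Sns": {
                "Message": json.dumps(
                    {
                        "AlarmName": "my-function-errors",
                        "OldStateValue": "OK",
                        "NewStateValue": "ALARM",
                        "NewStateReason": "Threshold crossed",
                    }
                )
            }
        }
    ]
}

if __name__ == "__main__":
    print(handler(SAMPLE_EVENT))
```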

X-RAY🔍

X-Ray gives you the opportunity to build distributed traces for all requests. Every request gets an X-Ray trace ID which you can follow through the X-Ray system. You can then see how users interacted with your system and also check all logs attached to this trace.
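An X-Ray trace ID consists of a version number, the start time of the request as 8 hexadecimal digits, and a 24-digit random part, separated by hyphens. The following Python sketch builds and reads such an ID. In practice the ID is usually generated for you by the X-Ray SDK or the AWS service that receives the request, so the helper names here are purely illustrative.

```python
import os
import time


def new_trace_id(now=None):
    """Build a trace ID in the X-Ray format: 1-<epoch hex>-<24 hex digits>."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def trace_start_time(trace_id):
    """Return the epoch seconds encoded in a trace ID."""
    _version, epoch_hex, _unique = trace_id.split("-")
    return int(epoch_hex, 16)


if __name__ == "__main__":
    trace_id = new_trace_id()
    print(trace_id, trace_start_time(trace_id))
```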

SYNTHETICS🤖

CloudWatch Synthetics is a service that creates so-called Canaries. Canaries are scripts that run on a schedule. Their goal is to check endpoints and API calls.

There are different canary types, including:

  • Heartbeat monitor: Check URLs regularly
  • API Canary: Check API endpoints
  • Visual Monitoring: Open a webpage in the browser and check elements on that page like buttons
  • GUI Workflow builder: Verifies that actions can be done on your web page