re:Invent 2024: What's New in CloudWatch ✨

⌛ Reading time: 12 minutes
🎓 Main Learning: CloudWatch Launches at re:Invent 2024

re:Invent took place two weeks ago, and there were some amazing launches 👀

CloudWatch got a lot of love at this re:Invent, so we're showing you our top CloudWatch launches of the year. We've worked through all of them, tried to get them working with the example application of the CloudWatch Book, and are now busy updating the book ✍🏽.

Let's dive into CloudWatch.

TLDR;

The launches were categorized into 5 main topics:

More Coverage: Database Insights, Container enhanced visibility, new metrics
Easier Correlation: Unified Navigation
Less silos, more analytics: OpenSearch Integration
Deeper distributed tracing: Transaction Search
Aided investigations: AI Investigations

Infographic

Tired of reading the whole email? Check out the infographic.

CloudWatch Unified Navigation (easier correlation)

Let’s start with something very cool: CloudWatch Unified Navigation.

This feature aims to integrate CloudWatch into almost every service console on AWS. It is basically a new sidebar that you can open.

You will mostly find this feature behind a button labeled “Explore related” (naming is hard, yes).

The feature should help you find things that belong together. Often you are looking at a trace and know that something else belongs to it as well, e.g. another trace, a log, or a metric. This is exactly what the sidebar is meant for.

Finding this feature was harder than I thought. The documentation states that it is available on several CloudWatch pages, and the launch session showed a compass icon labeled “Explore related”. Somehow, I couldn’t find it that way.

Look for it in the top right corner. It is not the compass icon described in the documentation 🤷🏽‍♂️ but a laptop with a wrench. I’ve already submitted feedback.

Screenshot of a dashboard interface with an "Untitled graph" showing data, a toolbar for time selection, and various actions. A sidebar titled "Operational troubleshooting" includes a warning about retaining context in an AWS service console. — Open Unified Navigation from the top right corner

The pages you can access it from:

CloudWatch Metrics (navigation, legend, data points)
Console toolbar
Other service consoles (e.g. Lambda → Monitoring → … → Explore related)

A screenshot showing a CloudWatch dashboard with various metrics for a Lambda function. The left side displays graphs for duration, error count, success rate, and other metrics. The right side features an operational troubleshooting interface with a topology map of related AWS services. — Unified Navigation on Lambda

Once you open the pane, you see additional context, which is quite neat! First of all, the tracing overview got a nice overhaul. Let’s hope this comes to the general trace map as well.

From this pane, you can see all related metrics, logs, and traces. You can also go further by clicking on connected resources, for example another service or an API that this service calls, and then see the metrics, logs, and traces of that resource.

For everybody who knows how hard it can be to even find the correct log group name, this can be a lifesaver.

The documentation lists the services supported on the Explore related page. For some of the listed services it still doesn’t work for us, for example for our Step Functions state machine.

Overall, a very cool feature in our opinion, especially for quickly finding related logs, traces, and components.

Logs Insights News (less silos, more analytics)

We love Logs Insights. If you use CloudWatch as your main observability solution, you will use Logs Insights daily. There were a couple of launches for Logs Insights itself; here is a summary.

New Languages to analyze logs - SQL and PPL

You can now use two more languages to analyze logs: Piped Processing Language (PPL) and SQL.

PPL follows a pipe-based approach, similar to chaining commands with pipes in a Linux shell.

And SQL, well, is SQL.

In SQL you can use familiar features such as:

joins
aggregations

and all the other stuff SQL has to offer 😉

For example, we joined the logs of a Lambda log group with our API access logs on the requestId.
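If you'd rather run such a join from code, here is a minimal sketch using boto3's `start_query`. It assumes a boto3 release recent enough to know the new `queryLanguage` parameter, and that for SQL the log groups are named in the FROM clause instead of being passed as separate parameters. The log group names and the `requestId`, `status`, and `message` fields are hypothetical placeholders for your own structured logs.

```python
import time


def build_join_query(api_log_group: str, lambda_log_group: str, limit: int = 50) -> str:
    """Build a Logs Insights SQL query joining API access logs with Lambda logs.

    The field names (requestId, status, message) are hypothetical: they assume both
    log groups emit structured JSON that shares a requestId.
    """
    return (
        "SELECT a.requestId, a.status, l.message "
        f"FROM `{api_log_group}` AS a "
        f"INNER JOIN `{lambda_log_group}` AS l "
        "ON a.requestId = l.requestId "
        f"LIMIT {limit}"
    )


def start_sql_query(query: str, lookback_seconds: int = 3600) -> str:
    """Start the query over the last `lookback_seconds` and return its query ID."""
    import boto3  # imported lazily so build_join_query works without AWS access

    logs = boto3.client("logs")
    end = int(time.time())
    response = logs.start_query(
        queryLanguage="SQL",  # assumption: needs a boto3 version that supports this parameter
        queryString=query,
        startTime=end - lookback_seconds,
        endTime=end,
    )
    return response["queryId"]
```

`start_sql_query` only returns a query ID; you then poll `get_query_results(queryId=...)` until its status is `Complete`.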

10,000 Log Groups

Previously, a single query could include at most 50 log groups. If you select log groups by name prefix or query all log groups, a query can now match up to 10,000 log groups.

A dropdown menu for selecting log groups with options for "Log group name" and "Name prefix," along with a selected option for "All log groups." — Match up to 10k Log Groups on Prefixes

Field Indexes

You can now also index fields in the logs you analyze. Queries that filter on an indexed field can skip log events that don’t match, which makes them faster and, because Logs Insights charges by the amount of data scanned, cheaper.

Screenshot of an interface for configuring index policy details. The section includes fields for policy name and log group selection, with options for "All standard log groups" or "Select log group(s) by prefix match." A text box for entering a prefix name is filled with "/aws/lambda/dev." It also shows field index details, with "correlationIds.requestId" as the field path. Options to add or remove field paths are present. Buttons for "Cancel" and "Save changes" are at the bottom. — Index your fields to increase the performance of queries

For example, I’ve created an index policy for all our Lambda log groups (prefix /aws/lambda/dev) on the request ID in our correlation IDs (correlationIds.requestId).
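The same index can probably be created from code, too. This sketch assumes that an account-level policy of type `FIELD_INDEX_POLICY`, created with `put_account_policy` and a `LogGroupNamePrefix` selection criterion, mirrors the prefix-match option in the console; please verify the selection-criteria syntax in the CloudWatch Logs API reference. The policy name in the usage line is hypothetical. For a single log group, `put_index_policy` with the same policy document should do the job.

```python
import json


def build_field_index_policy(field_paths: list[str]) -> str:
    """Return a field index policy document, e.g. {"Fields": ["correlationIds.requestId"]}."""
    if not field_paths:
        raise ValueError("at least one field path is required")
    return json.dumps({"Fields": field_paths})


def put_prefix_index_policy(policy_name: str, prefix: str, field_paths: list[str]) -> None:
    """Create an account-level field index policy for log groups matching `prefix`.

    Assumption: FIELD_INDEX_POLICY account policies select log groups with a
    LogGroupNamePrefix criterion, like the console's prefix-match option.
    """
    import boto3  # lazy import so the policy builder can be tested offline

    boto3.client("logs").put_account_policy(
        policyName=policy_name,
        policyDocument=build_field_index_policy(field_paths),
        policyType="FIELD_INDEX_POLICY",
        scope="ALL",
        selectionCriteria=f'LogGroupNamePrefix IN ["{prefix}"]',
    )
```

Usage for our setup would look like `put_prefix_index_policy("lambda-dev-request-id", "/aws/lambda/dev", ["correlationIds.requestId"])`.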

OpenSearch ❤️ CloudWatch (less silos, more analytics)

OpenSearch now natively integrates with CloudWatch. You can create dashboards for some pre-defined use cases like:

VPC Flow Logs
CloudTrail Logs
WAF Logs

The idea is quite cool. It works wherever you can use OpenSearch Direct Query, which is a kind of serverless variant of OpenSearch: you only pay for what you use (but it isn’t cheap).

The pricing still seems a bit steep and hard to calculate. Here is a pricing example from the AWS landing page:

Total monthly charges = $732:

$3 (Direct Query OCU)
$350 (Serverless Indexing)
$29 (Serverless Storage)
$350 (Serverless Search)

This is with a monthly ingest of over 1 TB!
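Breaking that example down shows where the money goes. A small sketch using only the four components from the AWS pricing example:

```python
from decimal import Decimal

# Monthly cost components from the AWS pricing example (USD).
PRICING_EXAMPLE = {
    "Direct Query OCU": Decimal("3"),
    "Serverless Indexing": Decimal("350"),
    "Serverless Storage": Decimal("29"),
    "Serverless Search": Decimal("350"),
}


def monthly_total(components: dict[str, Decimal]) -> Decimal:
    return sum(components.values(), Decimal("0"))


def share_percent(components: dict[str, Decimal]) -> dict[str, Decimal]:
    """Percentage of the total bill per component, rounded to one decimal place."""
    total = monthly_total(components)
    return {
        name: (cost * 100 / total).quantize(Decimal("0.1"))
        for name, cost in components.items()
    }


if __name__ == "__main__":
    print(monthly_total(PRICING_EXAMPLE))  # 732
    for name, pct in share_percent(PRICING_EXAMPLE).items():
        print(f"{name}: {pct}%")
```

Indexing and search each account for 47.8% of the bill, so together they make up roughly 96%, while the Direct Query part itself is well under 1%.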

Great feature, especially for getting an ELK stack-like experience. Let’s see if we can build dashboards ourselves soon without the need to use a pre-defined dashboard.

Transaction Search (deeper distributed tracing)

Transaction Search is another very interesting piece! Once you enable it, it transforms your X-Ray traces into OpenTelemetry spans. These spans give you visibility into your application.

For now, this simply looks like distributed tracing to us. But maybe this is AWS’s way of supporting OpenTelemetry more broadly instead of only X-Ray. Maybe it will even replace X-Ray at some point? 🤔
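If you want to enable Transaction Search from code instead of the console, here is a sketch of the steps as we understand them: allow X-Ray to write into the `aws/spans` and `/aws/application-signals/data` log groups via a CloudWatch Logs resource policy, then switch the trace segment destination to CloudWatch Logs. The policy shape is our assumption modeled on the setup docs and the policy name is hypothetical, so double-check the ARNs before using it; `update_trace_segment_destination` also needs a recent boto3.

```python
import json


def build_spans_resource_policy(region: str, account_id: str, partition: str = "aws") -> str:
    """Resource policy letting X-Ray write spans to CloudWatch Logs.

    Assumption: shape modeled on the Transaction Search setup docs; verify before use.
    """
    log_group_arn = f"arn:{partition}:logs:{region}:{account_id}:log-group"
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "TransactionSearchXRayAccess",
            "Effect": "Allow",
            "Principal": {"Service": "xray.amazonaws.com"},
            "Action": "logs:PutLogEvents",
            "Resource": [
                f"{log_group_arn}:aws/spans:*",
                f"{log_group_arn}:/aws/application-signals/data:*",
            ],
            "Condition": {
                "ArnLike": {"aws:SourceArn": f"arn:{partition}:xray:{region}:{account_id}:*"},
                "StringEquals": {"aws:SourceAccount": account_id},
            },
        }],
    })


def enable_transaction_search(region: str, account_id: str) -> None:
    """Grant X-Ray write access to the span log groups, then route segments to CloudWatch Logs."""
    import boto3  # lazy import so the policy builder can be tested offline

    logs = boto3.client("logs", region_name=region)
    xray = boto3.client("xray", region_name=region)
    logs.put_resource_policy(
        policyName="TransactionSearchAccess",  # hypothetical policy name
        policyDocument=build_spans_resource_policy(region, account_id),
    )
    xray.update_trace_segment_destination(Destination="CloudWatchLogs")
```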

View of the visual editor for spans — Transaction Search Dashboard

We’ve enabled Transaction Search for our GitHub repository tracker (the example application of the CloudWatch Book) and got a few spans:

Screenshot of a log analysis interface displaying spans with filters applied. The table lists 15 records for the duration, environment, status code, and service related to API requests. HTTP status code 200 is highlighted with visualizations on the right. — Spans of the GitHub Tracker

When you open one of them, you are redirected to the actual X-Ray trace.

You can also do some basic aggregations:

A screenshot of a web interface showing span query results with a visualization. It includes search filters, a query section, and a horizontal bar graph displaying counts of HTTP response status codes 403 and 202. — Aggregation query in transaction search

However, some of our services are missing, which we still need to investigate.

Application Signals

This one made us think first, because Application Signals already exists as a category of services.

Services like Evidently (RIP), RUM, and Synthetics fall into the Application Signals category. However, Application Signals is also the name of a specific feature, and that feature is what this launch is about. Yes, naming things is hard. The feature itself isn’t new; it was launched at last year’s re:Invent.

Application Signals aims to give you an overall view of your application and full visibility into it. The launch post promises three main features for developers:

Developers can answer any question related to performance through an interactive visual editor
Developers can diagnose rarely occurring issues
Logs offer advanced features for transaction spans

With Application Signals, you can also define Service Level Objectives (SLOs). They help you understand whether you meet the goals you’ve set for yourself, for example for availability, latency, or errors.
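To make the SLO idea concrete, here is generic error-budget arithmetic for a request-based availability goal. This is a sketch of the underlying math, not of how Application Signals evaluates SLOs internally: with a 99.9% goal over 10,000 requests, the error budget is 10 failed requests.

```python
from dataclasses import dataclass


@dataclass
class SloResult:
    attainment: float        # fraction of good requests in the evaluation window
    budget_remaining: float  # share of the error budget left; negative means overspent
    met: bool


def evaluate_request_slo(good: int, total: int, goal: float) -> SloResult:
    """Generic request-based SLO math, e.g. goal=0.999 for 99.9% availability."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < goal < 1:
        raise ValueError("goal must be strictly between 0 and 1")
    attainment = good / total
    allowed_bad = (1 - goal) * total  # the error budget, in requests
    actual_bad = total - good
    return SloResult(
        attainment=attainment,
        budget_remaining=1 - actual_bad / allowed_bad,
        met=attainment >= goal,
    )
```

With 5 failed requests out of 10,000, half of the budget is left; with 20, the SLO is breached and the budget is overspent by 100% (`budget_remaining` is about -1.0).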

Application Signals works at the level of whole services. You can enable it for:

ECS
EKS
Lambda

You can also enable it for anything the CloudWatch agent can run on. Either way, you enable it by installing the CloudWatch agent or the AWS Distro for OpenTelemetry (ADOT).

AWS Distro for OpenTelemetry Lambda layer diagram flow converting X-Ray tracing to OpenTelemetry and X-Ray again. — CloudWatch ADOT Layer that collects OTEL spans

We’ve activated Transaction Search for the example web application of the CloudWatch Book, and an Application Signals service was created automatically as well:

Dashboard displaying metrics for "dev-ApiStack-WebsocketApiConstructwshandlerC4E7E85-JGOU7NayoSKs" over a 3-hour period. Sections include operations, dependencies, Synthetics Canaries, and client pages. Graphs show metrics such as latency, request count, availability, fault rate, and error rate by time. No faults or unhealthy states are reported. — Application Signals for our GitHub Tracker

Our canary (we have one) isn’t connected yet, but we already get an overview like this.

If you want to learn more about Application Signals, make sure to check out the amazing One Observability workshop.

X-Ray to OTEL

We think one key takeaway from all of these launches is that AWS now supports OpenTelemetry more and more! It seems that AWS is basing its new features on OTEL spans instead of its own X-Ray format. This is quite cool because it also lets you use third-party software for traces.

AI Investigation

Investigations is the first 👆🏽 AI feature of CloudWatch at this re:Invent. The idea is to help you debug and investigate issues. You can connect it to your chat applications via SNS, and you can also connect your ticketing system, like Linear, Jira, or whatever you use.

You can trigger a sample investigation to get an idea of what it looks like:

Dashboard showing a sample investigation in Amazon Q. The left pane contains a feed with observations noting high latency in PutItem operations on DynamoDB. A chart shows availability and latency over time, indicating possible throttling. The right pane includes suggestions and observations related to DynamoDB deployment and traffic throttling. — AI Sample Investigation

There are different panes you can see:

Feed: The feed is the kind of overview you know from a ticketing system. You can see what other developers posted to this investigation.
Suggestions: Suggestions are generated automatically by Q. It looks at recent deployments, configurations, and much more to give you ideas on how to improve. This looks quite nice!

Overall, the idea is amazing, but a lot depends on how well it works in practice. I’m impressed and will make use of it. Let’s see how well it works in a production app with lots of traffic!

Auditing Tracing Configuration

CloudWatch now gives you an overview of your telemetry settings, including tracing. You can turn it on for your whole account or organization. Once activated, it discovers the resources in your account.

It then shows you the logs, metrics, and traces coverage for the following resource types:

EC2 Instances
VPCs
Lambda Functions

The idea is to give you an overview of the different tracing settings across your infrastructure, because you don’t want to miss traces from a crucial application. This is especially helpful since AWS clearly recommends sampling 100% of your traces for OTEL spans.

Screenshot of a dashboard showing resource metrics. It includes sections for AWS EC2 Instances, VPC, and Lambda Functions, all with 0% coverage in logs, metrics, and traces. The update was 0 minutes ago. — Overview of your trace configs

Unfortunately, it didn’t work for our accounts yet: it couldn’t find any resources.

Synthetics

Synthetics also got two minor updates. With Synthetics you can build end-to-end (E2E) web tests, typically with a headless browser, i.e. a browser you control from code. There is now a new Playwright runtime for that, which is quite nice! It also lets you store your logs directly in CloudWatch Logs instead of as text files in S3. That’s quite cool!

Synthetics will now also finally delete the Lambda resources when you delete a canary. This was always a hassle: after removing a canary, you had to remove the CloudWatch log group, the Lambda function, and everything else yourself. This should now be automated!

New Metrics (more coverage)

CloudWatch announced several new metrics for some services.

Event Source Mapping Metrics for Lambda

There are now metrics for the event source mapping (ESM) itself in Lambda. This is quite useful: if you connect SQS to a Lambda function, for example, the main magic happens within the event source mapping, and until now that was kind of a black box. Now you can see metrics like:

PolledEventCount (events read by ESM)
InvokedEventCount (events invoking Lambda function)
FilteredOutEventCount (events filtered out)
FailedInvokeEventCount (events failing to invoke)
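Here is a sketch of how you could pull these metrics for one mapping and turn them into ratios. We assume the metrics live in the `AWS/Lambda` namespace with an `EventSourceMappingUUID` dimension, and that they may need to be enabled on the mapping first; check the Lambda documentation before relying on either assumption.

```python
ESM_METRICS = [
    "PolledEventCount",        # events read by the event source mapping
    "InvokedEventCount",       # events that invoked the Lambda function
    "FilteredOutEventCount",   # events dropped by filter criteria
    "FailedInvokeEventCount",  # events that failed to invoke the function
]


def build_esm_metric_queries(esm_uuid: str, period: int = 300) -> list[dict]:
    """MetricDataQueries for CloudWatch GetMetricData, one Sum per ESM metric.

    Assumption: namespace AWS/Lambda with an EventSourceMappingUUID dimension.
    """
    return [
        {
            "Id": name.lower(),  # query IDs must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": name,
                    "Dimensions": [{"Name": "EventSourceMappingUUID", "Value": esm_uuid}],
                },
                "Period": period,
                "Stat": "Sum",
            },
        }
        for name in ESM_METRICS
    ]


def esm_ratios(sums: dict[str, float]) -> dict[str, float]:
    """Share of polled events that were filtered out or failed to invoke the function."""
    polled = sums.get("PolledEventCount", 0)
    if polled == 0:
        return {"filtered_out": 0.0, "failed_invoke": 0.0}
    return {
        "filtered_out": sums.get("FilteredOutEventCount", 0) / polled,
        "failed_invoke": sums.get("FailedInvokeEventCount", 0) / polled,
    }
```

Pass the queries to `boto3.client("cloudwatch").get_metric_data(MetricDataQueries=queries, StartTime=..., EndTime=...)`, sum the returned values per metric, and feed the sums into `esm_ratios`. A high `filtered_out` ratio, for example, tells you that most polled messages never reach your function.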

ECS Container Insights enhanced observability

ECS Container Insights now has an additional mode called enhanced observability. Previously there was only ECS Container Insights; the enhanced observability mode gives you additional metrics.

You can set it up very easily:

```shell
aws ecs put-account-setting --name containerInsights --value enhanced
```

Some of the additional metrics:

ContainerMemoryUtilization
ContainerCpuUtilization
ContainerCpuReserved

Database Insights

Database Insights gives you more insights into your database (🥁). Right now, only Aurora MySQL and Aurora PostgreSQL are supported. It mainly summarizes logs and metrics from your database in a dashboard.

There are two modes: Standard and Advanced.

Comparison table of database features showing support in Standard and Advanced modes. Advanced mode supports more features, such as visualizing per-query statistics and analyzing slow SQL queries, while both modes support defining access control policies and analyzing DB load contributors.
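Switching a cluster between the two modes can presumably be scripted as well. This sketch is based on our reading of the RDS `ModifyDBCluster` API: we assume a `DatabaseInsightsMode` parameter and that Advanced mode requires Performance Insights with 465 days (15 months) of retention. Please verify the parameter names against the RDS API reference; the cluster identifier in the test is a placeholder.

```python
def database_insights_kwargs(cluster_id: str, mode: str = "advanced") -> dict:
    """Keyword arguments for RDS ModifyDBCluster to switch the Database Insights mode.

    Assumptions: DatabaseInsightsMode accepts "standard" or "advanced", and Advanced
    mode needs Performance Insights with 465 days (15 months) of retention.
    """
    if mode not in ("standard", "advanced"):
        raise ValueError(f"unknown Database Insights mode: {mode!r}")
    kwargs = {
        "DBClusterIdentifier": cluster_id,
        "DatabaseInsightsMode": mode,
        "ApplyImmediately": True,
    }
    if mode == "advanced":
        kwargs["EnablePerformanceInsights"] = True
        kwargs["PerformanceInsightsRetentionPeriod"] = 465
    return kwargs


def set_database_insights_mode(cluster_id: str, mode: str = "advanced") -> None:
    import boto3  # lazy import so the argument builder can be tested offline

    boto3.client("rds").modify_db_cluster(**database_insights_kwargs(cluster_id, mode))
```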

Network Flow Monitoring

Network flow monitoring lets you send network data to CloudWatch. You need to install an agent for that; in return, you get near real-time information about your network traffic. While this is a bit bigger than “we’ve added some new metrics”, in the end you’ll have new metrics 😉

Summary

This re:Invent had some amazing launches. The CloudWatch launches alone were amazing!

TLDR;

More Coverage: More Metrics
Easier Correlation: CloudWatch Unified Navigation
Less silos, more analytics: OpenSearch integration
Deeper distributed tracing: X-Ray → OTEL spans
Aided investigations: AI Q Developer Assistant

In our opinion, improving the CloudWatch user experience should be one of AWS’s top priorities. CloudWatch is often the main reason developers still log into the console so much. Unified Navigation is a great first step.

Using OTEL spans instead of AWS’s own X-Ray format is also a great idea from our perspective. It allows AWS to support more observability tools and lets customers export spans to third-party tools and correlate them with more systems.

Let’s see what the future brings!

See you in two weeks ✌🏽

Sandro & Tobi

P.S. Sandro was also interviewed about this on the podcast Living in the Cloud. The episode is not out yet, so keep your eyes open.

Tobias Schmidt & Sandro Volpicella from AWS Fundamentals
Cloud Engineers • Fullstack Developers • Educators
