Let’s start with something very cool, the CloudWatch Unified Navigation.
This feature aims to integrate CloudWatch into almost every service pane available on AWS. It is basically a new sidebar that you can trigger.
You will mostly find this feature under the name “Explore related” (naming is hard, yes).
The new feature should help you find things that belong together. Often you will find yourself looking at certain traces, knowing that something else belongs to them as well, e.g. another trace, a log, or metrics. This is what the feature is meant for.
Finding this feature was harder than I thought. In the documentation, it states that it is available on different pages of CloudWatch. In the launch session, there was also a compass icon with the name “explore related” available. Somehow, that wasn’t the case for me.
You need to look for it in the top right corner. It is not the compass icon described in the documentation 🤷🏽‍♂️ but a laptop with a wrench. I already submitted feedback.
The pages you can access it from:
CloudWatch Metrics (navigation, legend, data points)
Console toolbar
In different services (e.g. Lambda → Monitoring → … → Explore related)
Once you open up this pane, you will see additional information. This is quite neat! First of all the tracing overview page got a nice overhaul. Let’s hope this comes to the general trace map as well.
From this pane, you can see all related metrics, logs, and traces. You can also go further by clicking on the connected resources. For example, on another service or API that is used from these services. Then you can see the metrics, logs, and traces of this resource.
For everybody who knows how hard it can be to even find the correct log group name, this can be a lifesaver.
Here is a list of supported services within the explore-related page. For some services that are mentioned, it somehow doesn’t work anyway. For example, for our Step Function.
Overall, a very cool feature in our opinion, especially for quickly finding related logs, traces, and components.
Logs Insights News (less silos, more analytics)
We love Logs Insights. And if you use CloudWatch as your main observability solution, you will use Logs Insights daily. There were a couple of launches for Logs Insights itself. I’ll summarize them here.
New Languages to analyze logs - SQL and PPL
You can now use two more languages to analyze logs. Piped Processing Language (PPL) and SQL.
PPL follows a typical pipe approach, like you’re used to in Linux:
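As a rough illustration of the pipe idea (our own sketch, not the query from the launch), the Python snippet below assembles a PPL query from individual stages. The field `level` and the value `ERROR` are hypothetical, and the exact set of PPL commands that CloudWatch supports should be checked against the documentation.

```python
def build_ppl_query(stages):
    """Join individual PPL stages with the pipe character."""
    return " | ".join(stage.strip() for stage in stages)


# Hypothetical stages: pick fields, filter, sort newest first, keep 20 rows.
ppl_query = build_ppl_query([
    "fields `@timestamp`, `@message`",
    "where level = 'ERROR'",
    "sort - `@timestamp`",
    "head 20",
])
print(ppl_query)
```

Each stage receives the output of the previous one, just like commands chained with `|` in a shell.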
And SQL, well, is SQL.
In SQL you can use cool SQL functions like
join
aggregations
and all the other stuff SQL has to offer 😉
Here, for example, we join the logs of a Lambda Log Group with API Access logs on the requestId.
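A join like this could look roughly as follows. This is a minimal sketch under assumptions: the log group names and the `status` field are made up, and the function only builds the parameters we would expect to pass to boto3's `logs.start_query` (with `queryLanguage` set to `SQL`) without calling AWS.

```python
import time


def build_join_query(lambda_group, api_group):
    """Return a SQL query that joins two log groups on requestId."""
    return (
        "SELECT l.`@timestamp`, l.`@message`, a.status "
        f"FROM `{lambda_group}` AS l "
        f"INNER JOIN `{api_group}` AS a "
        "ON l.requestId = a.requestId "
        "LIMIT 50"
    )


def build_start_query_params(query, now=None, window_seconds=3600):
    """Build keyword arguments for a Logs Insights query over the last window."""
    end = int(time.time()) if now is None else int(now)
    return {
        "queryLanguage": "SQL",  # the log groups are named inside the query
        "queryString": query,
        "startTime": end - window_seconds,
        "endTime": end,
    }


# Hypothetical log group names, for illustration only.
sql_query = build_join_query("/aws/lambda/dev-api", "/aws/apigateway/dev-access")
params = build_start_query_params(sql_query)
print(params["queryString"])
```

With credentials configured, the assumed next step would be `boto3.client("logs").start_query(**params)` followed by polling `get_query_results`.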
10,000 Log Groups
There used to be a limit of 50 log groups per query. That limit has been raised to 10,000 if you select log groups by a prefix or query all available log groups.
Field Indexes
You can now also index fields of logs that you are analyzing. This will improve the performance of queries and hence reduce the costs.
For example, here I’ve created a new index on all our Lambda log groups (/aws/lambda/dev prefix) on the request ID in our correlation IDs.
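If you prefer to script this instead of clicking through the console, the sketch below builds the policy document for a field index. It rests on assumptions: the field name `requestId` is a placeholder for whatever your logs actually contain, and we assume the document is passed to the `PutIndexPolicy` API (boto3: `logs.put_index_policy` with `logGroupIdentifier` and `policyDocument`) for a single log group.

```python
import json


def build_index_policy(fields):
    """Serialize field names into a {"Fields": [...]} policy document."""
    # Drop duplicates while keeping the original order.
    unique_fields = list(dict.fromkeys(fields))
    if not unique_fields:
        raise ValueError("at least one field is required")
    return json.dumps({"Fields": unique_fields})


# Hypothetical field name; use the field that carries your request ID.
policy_document = build_index_policy(["requestId", "requestId"])
print(policy_document)
```

Indexing every log group under a prefix, as described above, would be an account-level policy rather than a per-log-group call; we have left that out because we are less sure about its exact parameters.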
OpenSearch ❤️ CloudWatch (less silos, more analytics)
The idea is quite cool. You can use it everywhere where you can use OpenSearch Direct Query. This is kind of a serverless variant of OpenSearch. You only pay for the usage (but not too little).
Their pricing still seems a bit harsh and hard to calculate. Here is a pricing example from their landing page:
Great feature, especially for getting an ELK stack-like experience. Let’s see if we can build dashboards ourselves soon without the need to use a pre-defined dashboard.
Transaction Search (deeper, distributed tracing)
Transaction Search is another very interesting piece! Once you enable it, it will transform your X-Ray traces into OpenTelemetry spans. These spans help you gain visibility into your application.
For us, this simply looks like distributed tracing for now. But maybe this is AWS’s way of supporting more OpenTelemetry instead of only supporting X-Ray. Maybe this will even replace X-Ray at some point? 🤔
We’ve enabled transaction search for our GitHub repository tracker (our example CloudWatch Book application) and got a few spans:
Once you open one of those you will be redirected to the actual X-Ray trace.
You can also do some basic aggregations:
But for us some services are missing, so that needs to be further investigated.
Application Signals
With this one, we needed to think first, because Application Signals already exists as a category of services.
Services like Evidently (RIP), RUM, and Synthetics fall into the category of Application Signals. However, this launch also describes the service or feature called Application Signals. Yes, naming things is hard. This feature already existed and was launched last year at re:Invent.
Application Signals wants to give you an overall view of your application with full visibility. The launch post promises three main features for developers:
Developers can answer any question related to performance through an interactive visual editor
Developers can diagnose rarely occurring issues
Logs offer advanced features for transaction spans
With Application Signals, you can also define Service Level Objectives (SLO). These can help you understand if you meet the goals you’ve set for yourself or not. These can for example be availability, latency, errors, etc.
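To make the SLO idea concrete, here is a small worked example of the arithmetic behind an availability objective. It is our own illustration with made-up numbers, not how Application Signals computes attainment internally.

```python
def slo_report(good_requests, total_requests, goal_percent):
    """Compare measured availability with an SLO goal and compute the error budget."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    attainment = 100.0 * good_requests / total_requests
    # The error budget is the number of requests that may fail within the goal.
    allowed_bad = total_requests * (1 - goal_percent / 100.0)
    actual_bad = total_requests - good_requests
    return {
        "attainment_percent": attainment,
        "goal_met": attainment >= goal_percent,
        "error_budget_remaining": allowed_bad - actual_bad,
    }


# Made-up numbers: 99,950 of 100,000 requests succeeded against a 99.9% goal.
report = slo_report(99_950, 100_000, 99.9)
print(report)
```

In this example roughly half of the error budget is still available, so the goal is met with some room to spare.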
Application Signals are there for whole services. You can enable it for:
ECS
EKS
Lambda
But you can also enable it for everything that the CloudWatch agent can run on. You need to enable them by installing the CloudWatch Agent or AWS Distro for OpenTelemetry.
We’ve activated Transaction Search for our example web application for the CloudWatch Book, and an Application Signals service was automatically created as well:
The canaries (we have one) are not connected yet, but we already get an overview like that.
If you want to learn more about Application Signals, make sure to check out the amazing One Observability workshop.
X-Ray to OTEL
We think one main insight from all of these launches is that AWS supports more and more OpenTelemetry now! It seems that AWS is basing its new services on OTEL spans instead of its own format. This is quite cool because it allows you to use third-party software for traces as well.
AI Investigation
Investigations is the first 👆🏽 AI feature of CloudWatch at this re:Invent. The idea is to help you debug and investigate any issues you have. You can connect it with your chat applications via SNS. It also allows you to connect your ticketing system like Linear, Jira, or whatever you use.
You can trigger a sample investigation to get an idea of what it looks like:
There are different panes you can see:
Feed: The feed is the overview you are often used to in a ticketing system. You can see what your other developers posted to this investigation.
Suggestions: Suggestions are auto-generated by Q. It looks at recent deployments, configs, and much more to give you an idea of how you can improve. This looks quite nice!
Overall, the idea is amazing, though it heavily depends on how well it will work. I’m amazed by it and will make use of it. Let’s see how well it works in a production app with lots of traffic!
Auditing Tracing Configuration
CloudWatch gives you a new overview of your tracing settings. You can turn it on for your whole account or organization. Once activated it will search for resources in your account.
It then shows you an overview of activated traces of the following resource types:
EC2 Instances
VPCs
Lambda Functions
The idea here is to give you an overview of all the different tracing settings within your infrastructure. You don’t want to miss traces of a crucial application. Since AWS clearly recommends sampling 100% of your traces for the OTEL spans, this will help you with that!
Unfortunately, for our accounts, it didn’t work yet and we couldn’t find any resources.
Synthetics
Synthetics also got two minor updates. With Synthetics you can build E2E web tests. Typically, you use a headless browser for that, which is a browser that you can control from code. There is now a new runtime for that: Playwright. This is quite nice! It also means you can store your logs directly in CloudWatch instead of storing them as text files in S3. That’s quite cool!
Synthetics will now also finally delete Lambda resources when canaries are removed. This was always quite a hassle: if you removed a canary, you needed to remove the CloudWatch Log Group, the Lambda function, and everything else yourself. This should now be automated!
New Metrics (more coverage)
CloudWatch announced several new metrics for some services.
There are now metrics available for the actual event source mapping (ESM) in Lambda. This is quite useful. If you connect SQS with a Lambda, for example, the main magic happens within the event source mapping. Until now this was kind of a black box. Now you can see metrics like
ECS now has an additional mode called enhanced observability. Before it was only called ECS Container Insights and the enhanced observability bit gives you some more metrics.
You can set it up very easily: aws ecs put-account-setting --name containerInsights --value enhanced
Database Insights gives you more insights into your database (🥁). Only Aurora MySQL and Aurora PostgreSQL are supported right now. It will mainly summarize logs and metrics from your DB in a dashboard.
Network flow monitoring allows you to get network data to CloudWatch. You need to install an agent for that. If you do that you get near real-time information about your network traffic. While this is a bit bigger than “we’ve added some new metrics”, in the end, you’ll have new metrics 😉
Summary
This re:Invent had some amazing launches. The CloudWatch launches alone were amazing!
TLDR;
More Coverage: More Metrics
Easier Correlation: CloudWatch Unified Navigation
Less silos, more analytics: OpenSearch integration
Deeper distributed tracing: X-Ray → OTEL spans
Aided investigations: AI Q Developer Assistant
In our opinion, improving the user experience of CloudWatch should be one of AWS’s top priorities. CloudWatch is often the only reason developers still log into the console regularly. The unified navigation is a great first step.
Making use of OTEL spans instead of their own X-Ray format is a great idea as well from our perspective. It allows AWS to support more observability tools and gives customers the ability to export them into third-party tools and correlate with more systems.
Let’s see what the future brings!
See you in two weeks ✌🏽
Sandro & Tobi
P.S. Sandro was also interviewed about this on the podcast Living in the Cloud. The episode is not out yet, so keep your eyes open.