
AWS FinOps - Real-Time Cost Monitoring with CloudTrail and EventBridge

by Debojyoti Mahapatra

Introduction

AWS billing cycles create a delay between resource usage and cost visibility. This delay can lead to surprise bills when infrastructure scales unexpectedly or when costly resources are launched by mistake or without anyone noticing.

AWS cost forecasting also lags significantly, so expensive resources can accumulate costs before AWS budget forecasts flag them.

Proactive monitoring bridges this gap by detecting cost-impacting events as they happen.

This article shows how to build a proactive cost monitoring system using native AWS services. The solution detects EC2 instance launches and other infrastructure changes immediately. We'll learn to use EventBridge, Lambda, and SNS to create instant cost alerts.

This approach works for any AWS service that generates CloudTrail events. The architecture can also be used across multiple accounts in AWS Organizations.


Problem Scenario

Our application uses Auto Scaling Groups to handle traffic spikes. During normal business hours, scaling follows predictable patterns. Then a DDoS attack triggers unexpected scaling events.

Our infrastructure scales up to handle the attack traffic. We only discover the cost impact after the billing cycle completes or through AWS cost forecasting (which can be delayed by up to 24 hours!). By then, significant costs have already accumulated, and we will likely have to pay for them.

Proactive monitoring detects these events as they happen and alerts us immediately. This allows us to investigate and respond before costs escalate.

Solution Architecture

The monitoring system uses five core AWS services:

  • 🔍 CloudTrail captures all API calls and infrastructure changes.
  • 🛣️ EventBridge routes specific events to our processing function.
  • ⚡ Lambda analyzes events and determines if alerts are needed.
  • 📢 SNS delivers notifications to our team. (We can also use AWS Chatbot or direct webhooks.)
  • 🔐 IAM controls permissions for secure event processing.

This architecture provides near-real-time detection. Each service has a specific role in the monitoring pipeline.

Step-by-Step Walkthrough Guide

This guide creates a monitoring system that alerts us when EC2 instances launch. The architecture is shown below:

[Figure: Architecture Overview]

It's a deliberately simple architecture that only includes serverless components, so there is very little for us to operate.

We'll only be charged for actual events that occur and not for idling resources.

Step 1: Enable CloudTrail to Capture API Events

CloudTrail is AWS's service for logging and monitoring API calls across our AWS infrastructure. It creates a comprehensive audit trail of all actions taken in our AWS account.

We'll need to create a trail that logs management events to capture these API calls. Management events cover control-plane operations such as launching an instance and other infrastructure changes.

The trail must be enabled before EventBridge can access the events.

[Figure: CloudTrail Management Events]

On any infrastructure change, CloudTrail will create an event and send it to EventBridge. With EventBridge, we can filter the events and route them to our Lambda function.

Step 2: Configure EventBridge for Event Filtering

EventBridge is AWS's event routing service that connects different AWS services and applications. It acts as a central event bus that can receive, filter, and route events to appropriate targets. It's one of the main building blocks of event-driven architectures.

It receives all CloudTrail events and filters them based on our rules. The filtering happens before events reach our Lambda function, reducing Lambda execution costs.

Basic EC2 Monitoring Pattern

Start with a simple pattern that captures all EC2 instance launches.

{
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["RunInstances"]
    }
}

We filter for the EC2 RunInstances event. This means our Lambda function is invoked whenever a new EC2 instance is launched, regardless of what triggered the launch. It can be an Auto Scaling action or a user launching an instance manually.
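To make the matching behavior concrete, here is a minimal local sketch of how such a pattern selects events. It is a simplification written for this article and only handles exact-value matching, while EventBridge itself supports more operators. The two sample events are hypothetical and heavily trimmed.

```python
# The rule pattern from above, expressed as a Python dict.
PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["RunInstances"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: dicts recurse, lists mean 'any of these values'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: the event must have a nested object that matches.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Hypothetical, trimmed events for illustration only.
launch_event = {
    "source": "aws.ec2",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "ec2.amazonaws.com", "eventName": "RunInstances"},
}
stop_event = {
    "source": "aws.ec2",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "ec2.amazonaws.com", "eventName": "StopInstances"},
}

print(matches(PATTERN, launch_event))  # True
print(matches(PATTERN, stop_event))    # False
```

Only the launch event passes the filter, so only launches would reach the Lambda function.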

Advanced Filtering Options

EventBridge patterns are highly configurable and can target specific scenarios. We can filter by instance types, regions, accounts, or multiple services. This flexibility allows us to create fine-grained monitoring rules that match exactly what we want to be alerted about.
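As a sketch of what a narrower rule could look like, the following builds a pattern that only matches launches of selected instance families in two regions. The field paths (detail.awsRegion and detail.requestParameters.instanceType) are assumptions about the RunInstances event layout, and the chosen regions and families are purely illustrative.

```python
import json

# Hypothetical narrower rule: only certain instance families in two regions.
advanced_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["RunInstances"],
        # Assumed field: the region recorded in the CloudTrail event.
        "awsRegion": ["us-east-1", "eu-west-1"],
        "requestParameters": {
            # Prefix matching catches a whole instance family at once.
            "instanceType": [{"prefix": "p4d."}, {"prefix": "g5."}],
        },
    },
}

# Serialize to the JSON form that an EventBridge rule expects.
pattern_json = json.dumps(advanced_pattern, indent=2)
print(pattern_json)
```

The printed JSON can be pasted into the rule's event pattern editor. Filtering here rather than in the function keeps invocations, and therefore Lambda cost, low.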

Step 3: Create SNS Topic for Alert Delivery

SNS (Simple Notification Service) is AWS's messaging service for sending notifications to distributed systems. It provides a reliable way to deliver messages to multiple subscribers through various protocols.

Let's create a topic that will deliver cost alerts to our team (or just ourselves). We can subscribe email addresses, phone numbers, or other endpoints to receive notifications.

The topic acts as a central distribution point for all cost-related alerts. It's a pub/sub model. We can add multiple subscribers without changing our Lambda function. This decoupled design makes it easy to add new notification channels later.
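A minimal sketch of the publishing side, assuming the function receives an SNS client such as the one returned by boto3.client("sns"). The FakeSns class and the topic ARN are hypothetical stand-ins so the example runs without AWS access.

```python
def publish_alert(sns_client, topic_arn: str, subject: str, message: str) -> str:
    """Publish one alert; every subscriber of the topic receives it."""
    response = sns_client.publish(
        TopicArn=topic_arn,
        Subject=subject[:100],  # SNS subjects are limited to 100 characters
        Message=message,
    )
    return response["MessageId"]


class FakeSns:
    """Hypothetical stand-in for boto3.client("sns"), used for local runs."""

    def __init__(self):
        self.sent = []

    def publish(self, **kwargs):
        # Record the call instead of contacting AWS.
        self.sent.append(kwargs)
        return {"MessageId": f"fake-{len(self.sent)}"}


fake = FakeSns()
message_id = publish_alert(
    fake,
    "arn:aws:sns:us-east-1:123456789012:cost-alerts",  # hypothetical topic ARN
    "Cost alert: EC2 launch",
    "1 x t3.large launched in us-east-1",
)
print(message_id)  # fake-1
```

Because the function only knows the topic ARN, adding or removing subscribers never requires a code change.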

We can also use AWS Chatbot or direct webhooks and skip the SNS topic altogether.

Step 4: Build a Lambda Function for Event Analysis & Alerting

AWS Lambda is a serverless compute service that runs code in response to events. It automatically scales and manages the underlying infrastructure for us, so there are no servers to operate. Lambda is the backbone of serverless architectures and is useful far beyond our cost monitoring solution.

In our scenario, it analyzes each event to determine if it represents a cost risk. When thresholds are exceeded, it formats and sends alerts via SNS (or other channels). As mentioned earlier, we don't have to use SNS for this; we can also send alerts to our team's Slack or Teams channel using AWS Chatbot or direct webhooks.

[Figure: Lambda Trigger]

As seen in the dependency overview of Lambda, our function should be triggered by EventBridge and allowed to send messages to SNS.

The diagram omits the Pricing API because the basic solution only filters by instance size. It's better to configure dollar thresholds that are based on prices from the AWS Pricing API.

Key Functions:

  1. Parse CloudTrail event details
  2. Calculate estimated costs based on resource specifications
  3. Compare against configured thresholds
  4. Format alert messages with relevant information
  5. Send notifications through SNS
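The parsing and formatting steps could be sketched as follows. The field names under detail (requestParameters.instanceType, responseElements.instancesSet.items, awsRegion, userIdentity.arn) are assumptions about the RunInstances event layout, the sample event is hypothetical and trimmed, and the cost figure passed to format_alert is a placeholder rather than a real price.

```python
from dataclasses import dataclass


@dataclass
class Launch:
    instance_type: str
    count: int
    region: str
    principal: str


def parse_launch(event: dict) -> Launch:
    """Pull the cost-relevant fields out of a RunInstances event."""
    detail = event["detail"]
    params = detail.get("requestParameters") or {}
    # responseElements can be null (for example on a failed call), hence "or {}".
    response = detail.get("responseElements") or {}
    items = (response.get("instancesSet") or {}).get("items") or []
    return Launch(
        instance_type=params.get("instanceType", "unknown"),
        count=len(items),
        region=detail.get("awsRegion", "unknown"),
        principal=(detail.get("userIdentity") or {}).get("arn", "unknown"),
    )


def format_alert(launch: Launch, est_monthly_usd: float) -> str:
    """Build a message that tells the reader what launched, where, and by whom."""
    return (
        f"EC2 launch detected: {launch.count} x {launch.instance_type} "
        f"in {launch.region} by {launch.principal}. "
        f"Estimated cost: ${est_monthly_usd:.2f}/month."
    )


# Hypothetical, trimmed event for illustration only.
sample_event = {
    "detail": {
        "awsRegion": "us-east-1",
        "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"},
        "requestParameters": {"instanceType": "t3.large"},
        "responseElements": {
            "instancesSet": {"items": [{"instanceId": "i-0aaa"}, {"instanceId": "i-0bbb"}]}
        },
    }
}

launch = parse_launch(sample_event)
print(format_alert(launch, 100.0))  # 100.0 is a placeholder estimate
```

Including the principal in the message makes it easier to tell an Auto Scaling action apart from a manual launch.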

The second function in this list, the cost estimate, is the most important one in this architecture. We can use the AWS Pricing API to get the price of our target resource. In our example case, we'll need to look up the price by EC2 instance type (e.g. t3.large), operating system (e.g. Windows), and region (e.g. us-east-1).

With the pricing API, we can easily retrieve the price of the resource and compare it with the configured threshold.

Let's have a look at our example:

# Query the Pricing API for a Linux t3.micro in us-east-1 and
# extract the description and hourly USD price with jq.
aws pricing get-products \
    --service-code AmazonEC2 \
    --filters "Type=TERM_MATCH,Field=instanceType,Value=t3.micro" \
    "Type=TERM_MATCH,Field=location,Value=US East (N. Virginia)" \
    "Type=TERM_MATCH,Field=tenancy,Value=Shared" \
    "Type=TERM_MATCH,Field=operatingSystem,Value=Linux" \
    "Type=TERM_MATCH,Field=preInstalledSw,Value=NA" \
    --region us-east-1 --output json | \
    jq '.PriceList[] | fromjson | select(.terms.OnDemand) |
    .terms.OnDemand | to_entries[0].value.priceDimensions |
    to_entries[0].value |
    {description: .description, pricePerUnit: .pricePerUnit.USD}'

This will return something like this:

{
  "description": "$0.00 per Reservation Linux t3.micro Instance Hour",
  "pricePerUnit": "0.0000000000"
}
{
  "description": "$0.0104 per On Demand Linux t3.micro Instance Hour",
  "pricePerUnit": "0.0104000000"
}
{
  "description": "$0.0104 per Unused Reservation Linux t3.micro Instance Hour",
  "pricePerUnit": "0.0104000000"
}

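The three entries differ by reservation status, and the "On Demand" entry is the one that matters for a regular launch. We only send an alert if our configured threshold is breached. Inside the function, we should of course use the AWS SDK for our preferred language rather than the AWS CLI.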
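Here is a minimal sketch of the same extraction in Python, followed by a threshold check. It mirrors the jq filter above by taking the first OnDemand term and its first price dimension. The 730-hour month, the threshold value, and the keys "TERM" and "RATE" in the trimmed sample entry are assumptions made for this illustration.

```python
import json
from decimal import Decimal

HOURS_PER_MONTH = 730  # common approximation, an assumption of this sketch


def hourly_usd(price_list_item: str) -> Decimal:
    """Mirror of the jq filter: first OnDemand term, first price dimension."""
    product = json.loads(price_list_item)  # each PriceList entry is a JSON string
    term = next(iter(product["terms"]["OnDemand"].values()))
    dimension = next(iter(term["priceDimensions"].values()))
    return Decimal(dimension["pricePerUnit"]["USD"])


def exceeds_threshold(hourly: Decimal, count: int, monthly_limit_usd: Decimal) -> bool:
    """Estimate a month of runtime for all launched instances and compare it."""
    return hourly * count * HOURS_PER_MONTH > monthly_limit_usd


# Trimmed, hypothetical PriceList entry; "TERM" and "RATE" are placeholder keys.
sample_item = json.dumps({
    "terms": {"OnDemand": {"TERM": {"priceDimensions": {"RATE": {
        "description": "$0.0104 per On Demand Linux t3.micro Instance Hour",
        "pricePerUnit": {"USD": "0.0104000000"},
    }}}}}
})

price = hourly_usd(sample_item)
print(price)                                        # 0.0104000000
print(exceeds_threshold(price, 1, Decimal("100")))   # False (about 7.59 USD/month)
print(exceeds_threshold(price, 20, Decimal("100")))  # True (about 151.84 USD/month)
```

Using Decimal instead of float avoids rounding surprises when prices are multiplied by instance counts and hours.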

Step 5: Configure IAM Permissions

IAM (Identity and Access Management) is AWS's service for managing permissions and access control. It defines who can access which AWS resources and what actions they can perform.

Lambda receives events from EventBridge, so it doesn't need direct CloudTrail permissions. The Lambda function needs permissions to send SNS messages and access the AWS Pricing API. EventBridge requires permission to invoke our Lambda function.

Required Permissions:

  • Lambda execution role with SNS publish permissions (sns:Publish) and AWS Pricing API access (pricing:GetProducts). We should also give it permissions to write to CloudWatch Logs.
  • A resource-based permission on the Lambda function that allows the EventBridge rule to invoke it.

Follow the principle of least privilege (don't grant more permissions than necessary) when configuring permissions.
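As a sketch of what least privilege could look like here, the following builds an execution role policy document for the function. The statement IDs, the topic ARN, and the broad log resource are illustrative assumptions and should be narrowed to the real resources in an actual deployment.

```python
import json


def execution_role_policy(topic_arn: str) -> dict:
    """Policy for the Lambda execution role: publish, read prices, write logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublishAlerts",
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": topic_arn,  # only our alert topic, not all topics
            },
            {
                "Sid": "ReadPrices",
                "Effect": "Allow",
                "Action": "pricing:GetProducts",
                "Resource": "*",
            },
            {
                "Sid": "WriteLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                # Assumption: kept broad for brevity; scope to the function's log group.
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


policy = execution_role_policy(
    "arn:aws:sns:us-east-1:123456789012:cost-alerts"  # hypothetical topic ARN
)
print(json.dumps(policy, indent=2))
```

Note that the policy contains no CloudTrail or EC2 actions, because the function only reads the event it is handed.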

Step 6: Deploy and Test the System

Let's deploy all components and test the system. Testing can be done by launching a new EC2 instance. We can verify that the event was processed by looking at our Lambda function logs, and that the alert was delivered by checking our subscribed endpoint.

A small testing checklist:

  • Lambda function processes events without errors.
  • SNS delivers notifications to our configured subscribers. The easiest way is to set up an email subscription for ourselves.
  • Alert content includes relevant cost information (so we can actually take actions based on the alert).
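To avoid launching real instances for every test run, we could also invoke the function with a synthetic event. The following sketch builds one. The envelope fields follow the general EventBridge event structure, while the detail fields, account ID, and instance IDs are hypothetical and trimmed to what a cost check would plausibly read.

```python
import json


def make_test_event(instance_type: str, region: str, count: int) -> dict:
    """Build a trimmed, synthetic RunInstances event for manual test invocations."""
    return {
        "version": "0",
        "id": "00000000-0000-0000-0000-000000000000",  # placeholder event ID
        "detail-type": "AWS API Call via CloudTrail",
        "source": "aws.ec2",
        "account": "123456789012",  # hypothetical account ID
        "region": region,
        "resources": [],
        "detail": {
            "eventSource": "ec2.amazonaws.com",
            "eventName": "RunInstances",
            "awsRegion": region,
            "requestParameters": {"instanceType": instance_type},
            "responseElements": {
                "instancesSet": {
                    # One placeholder entry per launched instance.
                    "items": [{"instanceId": f"i-test{n:04d}"} for n in range(count)]
                }
            },
        },
    }


test_event = make_test_event("t3.large", "us-east-1", 3)
print(json.dumps(test_event, indent=2))
```

The printed JSON can be used as a test event for the function, which exercises the parsing and alerting path without incurring EC2 cost.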

Complete Architecture

Let's have a look at the complete architecture.

[Figure: Detailed Architecture]

The system works across multiple AWS accounts in an Organization. Centralized monitoring reduces management overhead. Cross-account event routing enables organization-wide cost control.
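One possible way to set up the cross-account routing is to let member accounts forward matching events to an event bus in a central monitoring account. The sketch below builds a resource policy for such a bus that accepts events from any account in the organization. The bus ARN, organization ID, and statement ID are hypothetical, and this is only one of several ways to wire up multi-account routing.

```python
import json


def org_event_bus_policy(bus_arn: str, org_id: str) -> dict:
    """Resource policy letting accounts of one organization put events on the bus."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOrgAccountsToPutEvents",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "events:PutEvents",
                "Resource": bus_arn,
                # Restrict the wildcard principal to our own organization.
                "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
            }
        ],
    }


bus_policy = org_event_bus_policy(
    "arn:aws:events:us-east-1:123456789012:event-bus/cost-monitoring",  # hypothetical
    "o-exampleorgid",  # hypothetical organization ID
)
print(json.dumps(bus_policy, indent=2))
```

With this in place, the Lambda function and SNS topic only need to exist once, in the central account.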

Benefits and Considerations

This architecture is a good starting point for proactive cost monitoring and for avoiding unexpected cost explosions caused by small mistakes or a lack of awareness.

Benefits:

  • Immediate cost detection helps prevent budget overruns.
  • Low to zero operational overhead with serverless components.

Considerations:

  • Requires some development and configuration effort for the custom Lambda logic.

We can start with a simple filter set in EventBridge and Lambda and then gradually add more filters and thresholds as we go.

Reminder: we can disable regions and services we don't want to use via AWS Organizations (using service control policies). This way, we can restrict the scope significantly and sleep better at night. For example, if we're solely using a serverless architecture, why not disable EC2 and EBS completely?

Conclusion

This proactive monitoring solution provides immediate rather than delayed visibility into AWS cost changes.

The serverless architecture adds little to no operational overhead or cost. We set it up once and it keeps working in the background.

We can extend this pattern to monitor any CloudTrail-enabled AWS service. This approach directly addresses the billing cycle delay problem mentioned in the introduction.
