ClaimedAccountConcurrency: The Lambda Metric That Predicts Throttling

AWS Lambda concurrency monitoring overview with ClaimedAccountConcurrency

What is AWS Lambda?

AWS Lambda is a compute service that runs your code in response to events (API requests, queue messages, file uploads) without requiring you to manage servers. It scales automatically as traffic increases.

The problem

When you scale serverless architectures, observability gets noisy fast. Dozens of metrics, dashboards per function, alarms on individual errors. Then one night at 3am you get an alert: regional concurrency limit reached. Requests are being dropped across the entire account. Which metric should you have been watching? What could you have done before it got to this point?

The answer is ClaimedAccountConcurrency: the one metric that reflects how much of your regional capacity is truly unavailable for new invocations. This guide shows you how to monitor it and set up an alarm that fires at 70% utilization, giving you time to react before throttling begins.

Topics:

How Lambda concurrency works (just enough to understand the metric choice)
Why ClaimedAccountConcurrency is the right metric to monitor
Setting up a CloudWatch alarm step by step
Deploying a dashboard to see the whole Region at a glance
Automating quota increases when the alarm fires

Important considerations before you start:

This guide includes an automated quota increase, which makes sense when the pressure on the limit comes from organic traffic growth. There are cases where raising the limit will not help:

Reserved concurrency over a limit increase. If you run several functions in the same Region and expect on-demand growth across them, use reserved concurrency to protect the critical ones first. It guarantees their slice of the pool so no other function can take it. Size the reservation to the function's expected peak traffic, since reserved concurrency also caps the function.
Runaway error loops. If a function is erroring and retrying in a loop, raising the limit just gives it more room to fail. When the alarm fires, always check whether the consumer is healthy before requesting more capacity. If it is broken, cap it with reserved concurrency instead.
Async invocation silent delays. Synchronous invocations get throttled visibly. Asynchronous invocations (CloudFormation custom resources, S3 triggers, EventBridge) are queued and retried for up to 6 hours by default, so nothing fails loudly. For these, check AsyncEventAge and set MaximumEventAgeInSeconds for critical async functions.

In every one of these cases, throttling works in your favor as a circuit breaker. Only raise the limit when the consumer is healthy and the traffic is legitimate.

For functions that must never be delayed, set a small reserved concurrency (e.g. 5) as a permanent guardrail regardless of monitoring. Reserved concurrency guarantees dedicated capacity that no other function can consume.
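
As a rough sketch, the two guardrails mentioned above (a small reserved concurrency and a shorter async event age) could be applied with boto3 like this. The helper names, the function name `checkout`, and the 15-minute event age are assumptions for illustration; they do not come from the companion repo.

```python
def guardrail_requests(function_name, reserved=5, max_event_age_seconds=3600):
    """Build the payloads for both guardrails (defaults are illustrative)."""
    if reserved < 0:
        raise ValueError("reserved concurrency cannot be negative")
    # Lambda accepts 60..21600 seconds (6 hours) for the maximum event age.
    if not 60 <= max_event_age_seconds <= 21600:
        raise ValueError("max_event_age_seconds must be between 60 and 21600")
    concurrency = {
        "FunctionName": function_name,
        "ReservedConcurrentExecutions": reserved,
    }
    event_config = {
        "FunctionName": function_name,
        "MaximumEventAgeInSeconds": max_event_age_seconds,
    }
    return concurrency, event_config


def apply_guardrails(function_name, **kwargs):
    """Send both requests to Lambda. Needs AWS credentials and boto3."""
    import boto3  # imported lazily so the builder above runs without AWS access

    client = boto3.client("lambda")
    concurrency, event_config = guardrail_requests(function_name, **kwargs)
    client.put_function_concurrency(**concurrency)
    # Note: this call overwrites any existing async invoke config on the function.
    client.put_function_event_invoke_config(**event_config)


if __name__ == "__main__":
    # "checkout" is a hypothetical function name.
    print(guardrail_requests("checkout", reserved=5, max_event_age_seconds=900))
```

Only `apply_guardrails` touches AWS; the builder can be reviewed or unit-tested on its own before anything changes in the account.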

Note: This guide uses the AWS Console intentionally. While Infrastructure as Code (CloudFormation, CDK, Terraform) is more efficient, console-first instructions make the concepts easier to learn. Once you understand the mechanics, translating to IaC is straightforward. Ready-to-deploy CDK examples (a monitoring dashboard and the auto-increase automation) are available in the companion repo.

How Lambda concurrency works

Before building the alarm, you need to understand how Lambda allocates capacity and where the limits come from.

From AWS documentation:

Concurrency is the number of in-flight requests that your AWS Lambda function is handling at the same time.

Each execution environment handles one request at a time. These environments are secure, isolated sandboxes running on hardware-virtualized micro virtual machines (MicroVMs); each one manages the resources for your function and the lifecycle of its runtime and any extensions. If an environment is busy (during both the Init and Invoke phases), Lambda provisions another. When an environment finishes processing, it can serve the next request without re-initializing (warm start).

When multiple requests arrive simultaneously, Lambda spins up as many environments as needed. Draw a vertical line at any point in time and count the active environments. That number is your concurrency at that moment.

Lambda concurrency timeline showing cold and warm starts across requests over time

Follow the green lines. At t1 there are 3 active environments serving 3 concurrent requests. At t2 there are 5, so concurrency at that moment is 5. Requests 1 through 5 each ran their Init phase, so all five were cold starts, each consuming one unit of concurrency.

Between t3 and t4, requests 6, 7, and 8 skip Init. They reuse environments that started earlier, so they are warm starts. Around t4, request 9 needs a fresh environment (cold start), and request 10 reuses an existing one (warm start). Across the whole window, 6 requests needed new environments and 4 reused warm ones.
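
The "draw a vertical line and count" idea can be sketched in a few lines of Python. The request timings below are invented for illustration and do not reproduce the figure; each request is modeled as a half-open interval (start, end) during which it occupies one execution environment.

```python
def concurrency_at(intervals, t):
    """Count requests in flight at time t; each interval is (start, end)."""
    return sum(1 for start, end in intervals if start <= t < end)


def peak_concurrency(intervals):
    """Sweep over start/end events and track the running count."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a request begins: +1 in flight
        events.append((end, -1))    # a request finishes: -1 in flight
    # Process ends before starts at the same instant, so back-to-back
    # requests on one environment are not double-counted.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    # Hypothetical (start, end) times in seconds.
    requests = [(0, 4), (1, 5), (2, 6), (4, 7), (5, 8)]
    print(concurrency_at(requests, 3))   # requests in flight at t=3
    print(peak_concurrency(requests))    # highest concurrency in the window
```

In this made-up trace the fourth request starts exactly when the first one ends, so it can reuse that environment and the peak stays at 3 rather than 4.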

Concurrency is regional and shared

Concurrency is not per function. All Lambda functions in an account share the same concurrency pool, scoped to a single AWS Region.

By default, every account gets 1,000 concurrent executions per Region. This is a soft limit you can increase via Service Quotas. New accounts start lower; AWS raises the quota automatically as your usage grows. Lambda also always keeps 100 units of the pool available for functions without reserved concurrency, so no single function can claim every last unit.

Lambda also enforces a requests per second limit equal to 10x your concurrency limit (e.g. 10,000 RPS at 1,000 concurrency). You can be throttled by request rate even if concurrency is not maxed out.

Good to know: When your regional concurrency limit is hit, throttling affects everything using Lambda in that account and Region, not just your API functions: SQS consumers, Kinesis stream pollers, DynamoDB Streams processors, EventBridge targets, and scheduled jobs. Every team and every service relying on Lambda in that Region gets impacted simultaneously.

The limit is not the only ceiling: scaling rate

The account concurrency limit is one quota. There is a second: the concurrency scaling rate (often still called burst concurrency), the rate at which Lambda creates new execution environments.

In each Region, for each function, the scaling rate is 1,000 new execution environments every 10 seconds (equivalently, 10,000 requests per second every 10 seconds). It exists to protect against runaway over-scaling when traffic spikes.

The catch: you can hit this ceiling even when account concurrency is nowhere near full. A sharp spike gets throttled while the dashboard still shows plenty of available capacity. So watch both. ClaimedAccountConcurrency tells you how full the pool is; the scaling rate governs how fast you are allowed to fill it. See Lambda scaling behavior for details.
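
To make the two ceilings concrete, here is a deliberately simplified model of the scaling rate. It assumes discrete 10-second windows with a fixed allowance of 1,000 new environments per function per window; real Lambda behavior is more nuanced, and the function names and sample numbers are invented for this sketch.

```python
import math

SCALING_STEP = 1000     # new execution environments per window, per function
WINDOW_SECONDS = 10     # length of one scaling window


def windows_to_reach(target, warm=0):
    """10-second windows needed to grow from `warm` to `target` environments."""
    needed = max(0, target - warm)
    return math.ceil(needed / SCALING_STEP)


def first_window_throttles(spike, warm, pool_headroom):
    """Requests from a sudden spike that cannot start in the first window.

    spike:         concurrent requests arriving at once
    warm:          environments the function already has
    pool_headroom: account limit minus ClaimedAccountConcurrency
    """
    new_needed = max(0, spike - warm)
    # New environments are capped by BOTH the scaling rate and the pool.
    new_allowed = min(new_needed, SCALING_STEP, pool_headroom)
    return new_needed - new_allowed


if __name__ == "__main__":
    # Plenty of pool headroom (5,000), yet the scaling rate still throttles.
    print(first_window_throttles(spike=3000, warm=500, pool_headroom=5000))
    print(windows_to_reach(3000, warm=500) * WINDOW_SECONDS, "seconds of windows")
```

With these numbers, 1,500 requests are throttled in the first window even though the pool has room, which is exactly the case a pool-only dashboard would miss.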

Why ClaimedAccountConcurrency is the right metric

Now that we know concurrency is shared and finite, the question becomes: which metric actually tells us how close we are to the limit?

Lambda exposes three concurrency metrics in CloudWatch:

Metric	What it measures
ConcurrentExecutions	Actively running invocations
UnreservedConcurrentExecutions	Invocations using the shared (unreserved) pool
ClaimedAccountConcurrency	Total concurrency unavailable for new on-demand invocations

Looking at these three metrics, the intuitive choice seems obvious: just track ConcurrentExecutions and you are done. Not quite.

Why ConcurrentExecutions is not enough

ConcurrentExecutions only counts what is actively running. It ignores concurrency that has been allocated through reserved or provisioned concurrency. That allocated capacity is blocked from other functions even when idle, so the real available capacity is lower than ConcurrentExecutions suggests.

What ClaimedAccountConcurrency captures

ClaimedAccountConcurrency = UnreservedConcurrentExecutions + Allocated Concurrency

Allocated concurrency is the sum of:

Reserved concurrency (RC) across all functions in the Region. Reserved concurrency sets both a guaranteed minimum and a hard maximum for a function. The function gets a dedicated slice of the pool, but it cannot exceed that amount or use unreserved capacity. No other function can use it, even when idle. No additional charge.
Provisioned concurrency (PC) across functions that do not have reserved concurrency. Provisioned concurrency pre-initializes environments to eliminate cold starts. These environments count against the pool even when not processing requests. Incurs additional charges.

Important: If a function has both RC and PC, Lambda only counts RC toward allocated concurrency (since RC is always >= PC for a given function). This avoids double-counting. In the example below, each function uses one or the other, so the values simply add up.

Try it yourself: In us-east-1, set a high reserved concurrency on one function (just below your limit), invoke a different function, then watch the metrics. ClaimedAccountConcurrency jumps within a few seconds. You can open a pre-filled CloudWatch metrics view with the expressions already wired up.

Scenario 1: small account, almost full

Configuration	Value
Account concurrency limit	10
Reserved concurrency (function A)	3
Reserved concurrency (function B)	3
Provisioned concurrency (function C, no RC)	2
Active executions (unreserved, function D)	1

ClaimedAccountConcurrency = 1 (unreserved) + 8 (allocated: 3 + 3 + 2) = 9. Only 1 unit of capacity remains for new invocations.

Scenario 1 breakdown: account limit 10, ClaimedAccountConcurrency = 9, only 1 unit of headroom left

Scenario 2: production account at steady state

Configuration	Value
Account concurrency limit	1,000
Reserved concurrency (function A)	400
Reserved concurrency (function B)	400
Provisioned concurrency (function C, no RC)	100
Active executions (unreserved, across functions D, E, F)	60

ClaimedAccountConcurrency = 60 (unreserved) + 900 (allocated: 400 + 400 + 100) = 960

Only 60 on-demand invocations are running, but 900 additional units are allocated. Total claimed is 960. Available for new invocations: 40.

Scenario 2 breakdown: account limit 1,000, 60 unreserved + 900 allocated = 960 claimed, 40 available

Scenario 3: traffic spike causes throttling

Using the same account from Scenario 2, a sudden spike hits. Functions G, H, and I receive 150 new concurrent requests on unreserved concurrency. At that point, only 40 units are available.

Available = Regional limit - ClaimedAccountConcurrency
Available = 1,000 - 960 = 40

Throttled = New requests - Available
Throttled = 150 - 40 = 110

Only 40 of the 150 requests can run immediately. The remaining 110 are throttled (visible in the Throttles metric).

Scenario 3 breakdown: 150 concurrent requests arrive on unreserved functions, 40 run immediately and 110 are throttled

For more worked examples, see the AWS docs on the reserved concurrency diagram and the provisioned plus reserved concurrency diagram.
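
The arithmetic in the three scenarios can be reproduced with a small Python sketch. The class and function names are made up for this example, and the model only covers the simple accounting described above (RC counts when set, PC counts only for functions without RC).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FunctionConfig:
    reserved: Optional[int] = None   # reserved concurrency (RC), None if unset
    provisioned: int = 0             # provisioned concurrency (PC)


def allocated_concurrency(functions):
    """RC counts when set; PC counts only for functions without RC."""
    return sum(
        f.reserved if f.reserved is not None else f.provisioned
        for f in functions
    )


def claimed_concurrency(unreserved_in_flight, functions):
    """Unreserved executions plus everything allocated via RC or PC."""
    return unreserved_in_flight + allocated_concurrency(functions)


def throttled(limit, claimed, new_requests):
    """New on-demand requests that do not fit into the remaining pool."""
    available = max(0, limit - claimed)
    return max(0, new_requests - available)


if __name__ == "__main__":
    scenario_1 = [FunctionConfig(reserved=3), FunctionConfig(reserved=3),
                  FunctionConfig(provisioned=2)]
    scenario_2 = [FunctionConfig(reserved=400), FunctionConfig(reserved=400),
                  FunctionConfig(provisioned=100)]
    print(claimed_concurrency(1, scenario_1))     # Scenario 1
    claimed = claimed_concurrency(60, scenario_2)
    print(claimed)                                # Scenario 2
    print(throttled(1000, claimed, 150))          # Scenario 3
```

A function with both settings, such as `FunctionConfig(reserved=10, provisioned=4)`, contributes only its 10 reserved units, which mirrors the no-double-counting rule.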

This is why Lambda uses ClaimedAccountConcurrency, not ConcurrentExecutions, to determine whether capacity is available.

Setting up the CloudWatch alarm

With the right metric identified, let's wire it into a CloudWatch alarm that fires before throttling begins.

Step 1: Configure the metrics

Go to CloudWatch → All metrics
Click the Source tab
Paste the following JSON:

{
    "metrics": [
        [
            "AWS/Lambda",
            "ConcurrentExecutions",
            {
                "id": "m1",
                "yAxis": "left",
                "label": "ConcurrentExecutionsMetric",
                "visible": false
            }
        ],
        [
            {
                "expression": "SERVICE_QUOTA(m1)",
                "label": "Current Concurrent Limit",
                "id": "e1",
                "period": 60,
                "yAxis": "left",
                "color": "#9467bd"
            }
        ],
        [
            "AWS/Lambda",
            "ClaimedAccountConcurrency",
            {
                "id": "m2",
                "yAxis": "left",
                "color": "#ff7f0e"
            }
        ],
        [
            {
                "expression": "(m2/e1) * 100",
                "label": "% Claimed",
                "id": "e2",
                "period": 60,
                "yAxis": "left"
            }
        ],
        [
            {
                "expression": "e1 - m2",
                "label": "Available",
                "id": "e5",
                "period": 60,
                "yAxis": "left",
                "color": "#2ca02c"
            }
        ]
    ],
    "sparkline": false,
    "view": "pie",
    "stacked": false,
    "region": "us-east-1",
    "period": 60,
    "stat": "Maximum",
    "liveData": false,
    "labels": { "visible": true },
    "legend": { "position": "bottom" }
}

Click Update

What each metric does

ID	Type	Purpose
m1	Metric	ConcurrentExecutions, used as input for SERVICE_QUOTA(). Hidden from graph
e1	Expression	SERVICE_QUOTA(m1), dynamically fetches your actual regional concurrency limit
m2	Metric	ClaimedAccountConcurrency, the metric we want to monitor
e2	Expression	(m2/e1) * 100, utilization as a percentage
e5	Expression	e1 - m2, remaining available concurrency

Why SERVICE_QUOTA(m1) instead of hardcoding 1,000? The concurrency limit is a soft limit. If you request an increase, SERVICE_QUOTA() dynamically reflects your actual current limit. No need to update the alarm every time your quota changes.

Step 2: Verify the metrics

After pasting the JSON and clicking Update, you should see the metrics table populated with all five entries. The table shows each metric's ID, label, details (source metric or expression), statistic, and period:

CloudWatch metrics console with pie chart view and metrics table showing m1, e1, m2, e2, and e5

In the Pie view, select only ClaimedAccountConcurrency and Available (checkboxes on the left) to get an instant visual of how much of your concurrency pool is claimed vs. free.

Switch to the Line view and select all metrics to see the values over time. Hovering over the chart reveals the actual numbers at any point. Here, the tooltip shows a Current Concurrent Limit of 1,000, Available at 999, ClaimedAccountConcurrency at 1, and % Claimed at 0.1%:

CloudWatch metrics console with line chart view and tooltip showing current values

This confirms the metrics and expressions are working correctly before creating the alarm.

Step 3: Create the alarm

Click the bell icon next to the % Claimed expression (e2)
Configure the alarm condition:

Setting	Value	Why
Metric	% Claimed (e2)	The utilization percentage we calculated
Threshold type	Static	Fixed threshold value
Condition	Greater than 70	70% gives headroom before hitting the limit
Period	1 minute	Matches Lambda's metric emission granularity
Statistic	Maximum	Catches spikes (average would smooth them out)
Datapoints to alarm	1 out of 1	Triggers on the first breach

Step 4: Configure actions

Configure an SNS topic as the notification target. This can deliver alerts via:

Email
Slack (via AWS Chatbot or a Lambda-backed integration)
PagerDuty, Opsgenie, or any HTTP endpoint

Step 5: Name the alarm

Give the alarm a descriptive name and optionally add a Markdown description (rendered in the CloudWatch console):

CloudWatch alarm details showing name and description setup

Step 6: Review and create

Review the configuration and click Create alarm.
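
For readers who prefer a script to the console, the same alarm can be approximated with a single PutMetricAlarm call. This is a minimal sketch, not the companion repo's implementation: the alarm name and SNS topic ARN are placeholders, and the helper names are invented here.

```python
def build_alarm_kwargs(alarm_name, sns_topic_arn, threshold_percent=70.0):
    """Arguments for CloudWatch put_metric_alarm, mirroring Steps 1 to 5."""

    def account_metric(metric_id, name):
        # Account-level Lambda metrics: no dimensions, 1-minute Maximum.
        return {
            "Id": metric_id,
            "ReturnData": False,
            "MetricStat": {
                "Metric": {"Namespace": "AWS/Lambda", "MetricName": name},
                "Period": 60,
                "Stat": "Maximum",
            },
        }

    return {
        "AlarmName": alarm_name,
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_percent,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,          # 1 out of 1
        "AlarmActions": [sns_topic_arn],
        "Metrics": [
            account_metric("m1", "ConcurrentExecutions"),
            {"Id": "e1", "Expression": "SERVICE_QUOTA(m1)",
             "Label": "Current Concurrent Limit", "ReturnData": False},
            account_metric("m2", "ClaimedAccountConcurrency"),
            # Only the expression the alarm evaluates returns data.
            {"Id": "e2", "Expression": "(m2/e1) * 100",
             "Label": "% Claimed", "ReturnData": True},
        ],
    }


def create_alarm(alarm_name, sns_topic_arn, region="us-east-1"):
    """Create or update the alarm. Needs AWS credentials and boto3."""
    import boto3  # imported lazily so the builder above runs without AWS access

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_alarm_kwargs(alarm_name, sns_topic_arn))


if __name__ == "__main__":
    # Placeholder name and ARN; replace with your own before calling create_alarm.
    kwargs = build_alarm_kwargs(
        "lambda-claimed-concurrency",
        "arn:aws:sns:us-east-1:123456789012:lambda-alerts",
    )
    print([m["Id"] for m in kwargs["Metrics"] if m["ReturnData"]])
```

Period and statistic live inside each metric entry rather than at the top level, because a metric-math alarm takes them per query.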

Alarm in action

Once active, the alarm graph shows your utilization over time:

Blue line → % Claimed utilization
Threshold → 70%
The alarm bar at the bottom transitions from OK (green) to In alarm (red) when the threshold is breached

CloudWatch alarm graph showing % Claimed with threshold greater than 70

See the whole Region at a glance: the dashboard

The alarm tells you when you cross 70%. It does not tell you what is eating the pool. When the page fires at 3am, you still have to dig through functions to find the culprit.

The companion repo ships a CDK app that deploys a CloudWatch dashboard for exactly this. One cdk deploy and you get the alarm above plus a single view of regional capacity.

The top of the dashboard shows alarm state, % Claimed, claimed vs. available, the utilization trend, and your top consumers, throttles, and errors:

Lambda concurrency dashboard showing alarm state, % Claimed, claimed vs available, trend, and top consumers

A sortable per-function table breaks down reserved and provisioned concurrency, peak concurrency, invocations, errors, and throttles. This is how you spot a function sitting on a large reserved slice it never uses. There is also a button to log a quota increase request straight from the dashboard, no trip to the Service Quotas console:

Lambda concurrency dashboard per-function table with reserved and provisioned concurrency, peak concurrency, invocations, errors, and throttles

A third panel focuses on provisioned concurrency utilization and spillover per function, with a short decision guide on when to reclaim, cap, or increase:

Lambda concurrency dashboard showing provisioned concurrency utilization and spillover with a reclaim, cap, or increase decision guide

Deploy it into the Region you want to watch:

cd dashboard
npm install
npx cdk bootstrap   # once per account/Region
npx cdk deploy

Pass an email to auto-subscribe it to the alarm topic (confirm the SNS subscription afterwards):

npx cdk deploy -c alertEmail=you@example.com

Then open CloudWatch → Dashboards → lambda-concurrency. The alarm here is notification-only. The SNS message includes a direct link back to the dashboard, so whoever gets paged lands on the full picture immediately.

Going further: automate limit increases

The alarm tells you there is a problem. The next step is fixing it automatically.

Instead of only alerting, you can add a Lambda function as a direct alarm action that automatically submits a Service Quotas increase request. The alarm triggers two independent actions:

SNS → notifies your team (email, Slack)
Lambda → requests a concurrency limit increase

CloudWatch Alarm --(ALARM state)--> SNS Topic --(email / Slack)--> Team
CloudWatch Alarm --(ALARM state)--> Lambda --(request_service_quota_increase())--> Service Quotas API

How often does the Lambda fire? CloudWatch Alarm actions trigger on state transitions, not continuously. The function is invoked once when the alarm transitions from OK to In alarm. It will not fire again while the alarm stays in ALARM state. If the alarm recovers to OK and breaches again, it fires once more. To keep repeated breaches from compounding increases, the function makes itself a one-shot: after submitting a request it sets its own reserved concurrency to 0, so the next alarm action is throttled until a human re-enables it.

Important caveat: Quota increases are not instant. Smaller increases may be approved automatically, but many requests are routed to AWS Support for review, which can take hours or days. This automation gets the request submitted immediately, but it does not provide more capacity right away. Treat it as a proactive request mechanism, not a real-time scaling solution.

The Lambda function

Create a new Lambda function with the Python 3.14 runtime. It receives the alarm event directly and checks whether a quota request is already pending. If one is, it skips the submission; otherwise it submits a proportional increase. In both cases it then disables itself:

import os
import boto3
import logging
import math

logger = logging.getLogger()
logger.setLevel(logging.INFO)

SERVICE_CODE = "lambda"
QUOTA_CODE = "L-B99A9384"  # Concurrent executions
INCREMENT_PERCENT = float(os.environ.get("INCREMENT_PERCENT", "0.10"))

quotas = boto3.client("service-quotas")
lambda_client = boto3.client("lambda")


def has_pending_request():
    paginator = quotas.get_paginator(
        "list_requested_service_quota_change_history_by_quota"
    )
    for page in paginator.paginate(ServiceCode=SERVICE_CODE, QuotaCode=QUOTA_CODE):
        for r in page.get("RequestedQuotas", []):
            if r["Status"] in ("PENDING", "CASE_OPENED"):
                return True
    return False


def throttle_self(function_name):
    """Set this function's reserved concurrency to 0 so future alarm
    invocations are throttled by Lambda itself. Re-enable with:
        aws lambda delete-function-concurrency --function-name <name>
    """
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=0,
    )
    logger.warning(
        f"Auto-increase DISABLED: set {function_name} RC=0. "
        f"A human must re-enable the function to allow future auto-increases."
    )


def lambda_handler(event, context):
    alarm_name = event.get("alarmData", {}).get("alarmName", "unknown")
    logger.info(f"Alarm triggered: {alarm_name}")

    if has_pending_request():
        logger.info("Skipping: a quota increase request is already pending")
        throttle_self(context.function_name)
        return {"status": "SKIPPED", "reason": "pending request exists"}

    current = quotas.get_service_quota(
        ServiceCode=SERVICE_CODE, QuotaCode=QUOTA_CODE
    )
    current_value = current["Quota"]["Value"]

    increment = math.ceil(current_value * INCREMENT_PERCENT)
    desired_value = current_value + increment

    response = quotas.request_service_quota_increase(
        ServiceCode=SERVICE_CODE,
        QuotaCode=QUOTA_CODE,
        DesiredValue=desired_value,
    )

    status = response["RequestedQuota"]["Status"]
    logger.info(
        f"Requested increase: {current_value} -> {desired_value} "
        f"(+{increment}, {INCREMENT_PERCENT * 100:.0f}%) | Status: {status}"
    )

    throttle_self(context.function_name)

    return {
        "current": current_value,
        "desired": desired_value,
        "increment": increment,
        "increment_percent": INCREMENT_PERCENT * 100,
        "status": status,
    }

Key points about this function:

L-B99A9384 is the quota code for Lambda concurrent executions
INCREMENT_PERCENT = 0.10 requests 10% more than the current limit (rounded up), so the bump scales with your account instead of a flat number. Override it with the INCREMENT_PERCENT environment variable
throttle_self() sets the function's own reserved concurrency to 0 after it runs. That makes it a one-shot: the next alarm action is throttled until you re-enable it with aws lambda delete-function-concurrency --function-name <name>. This is the guardrail against runaway auto-increases that quietly compound cost
has_pending_request() paginates the request history and skips submission if an increase is already PENDING or CASE_OPENED
SERVICE_QUOTA() in CloudWatch dynamically reflects the new limit after approval, so the alarm threshold adjusts automatically
The event comes directly from the CloudWatch Alarm action (not via SNS), so the alarm name is at event["alarmData"]["alarmName"]
No external dependencies. boto3 is included in the Lambda runtime
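
Before wiring the function to a real alarm, it can help to dry-run its two pure pieces locally: reading the alarm name and sizing the request. The sketch below repeats that logic outside the handler. The sample event is trimmed to the fields used here, and the alarm name in it is hypothetical.

```python
import math

# Trimmed example of what a CloudWatch alarm passes to a Lambda action.
SAMPLE_EVENT = {
    "source": "aws.cloudwatch",
    "alarmData": {
        "alarmName": "lambda-claimed-concurrency-70",  # hypothetical name
        "state": {"value": "ALARM"},
    },
}


def alarm_name(event):
    """Same lookup as the handler: tolerate a missing alarmData block."""
    return event.get("alarmData", {}).get("alarmName", "unknown")


def next_quota(current_value, increment_percent=0.10):
    """Current limit plus a proportional bump, rounded up to a whole unit."""
    increment = math.ceil(current_value * increment_percent)
    return current_value + increment


if __name__ == "__main__":
    print(alarm_name(SAMPLE_EVENT))
    for limit in (10, 1000, 1234):
        print(limit, "->", next_quota(limit))
```

With the default 10%, a limit of 1,000 becomes a request for 1,100 and a limit of 1,234 becomes 1,358, since the increment is rounded up.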

IAM permissions

Attach a policy to the function's execution role with the minimum required permissions. The lambda:PutFunctionConcurrency action is scoped to the function itself, which is all the self-disable step needs:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "servicequotas:GetServiceQuota",
                "servicequotas:RequestServiceQuotaIncrease",
                "servicequotas:ListRequestedServiceQuotaChangeHistoryByQuota"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "lambda:PutFunctionConcurrency",
            "Resource": "arn:aws:lambda:*:*:function:limit-increase-request"
        },
        {
            "Effect": "Allow",
            "Action": "iam:CreateServiceLinkedRole",
            "Resource": "arn:aws:iam::*:role/aws-service-role/servicequotas.amazonaws.com/*",
            "Condition": {
                "StringEquals": {
                    "iam:AWSServiceName": "servicequotas.amazonaws.com"
                }
            }
        }
    ]
}

Wiring it up in the console

Create the function: Go to Lambda → Create function → Author from scratch. Name it (e.g. limit-increase-request), select Python 3.14 as the runtime, and paste the code above.
Attach the IAM policy: Go to the function's Configuration → Permissions → click the execution role → Add permissions → Create inline policy → paste the JSON above.
Grant CloudWatch permission to invoke the function. CloudWatch Alarms use a specific service principal and need explicit permission on the Lambda function. Run:

aws lambda add-permission \
  --function-name "limit-increase-request" \
  --statement-id "AllowCloudWatchAlarmInvoke" \
  --action "lambda:InvokeFunction" \
  --principal "lambda.alarms.cloudwatch.amazonaws.com" \
  --source-arn "arn:aws:cloudwatch:<REGION>:<ACCOUNT_ID>:alarm:<ALARM_NAME>"

Replace <REGION>, <ACCOUNT_ID>, and <ALARM_NAME> with your values. Without this, CloudWatch will silently fail to invoke the function.

Add the Lambda alarm action: Go back to your CloudWatch alarm → Edit → Configure actions → Add Lambda action. Select In alarm as the trigger state and choose your function:

CloudWatch alarm showing a Lambda action triggered on In alarm state

The SNS action you configured in Step 4 stays as-is for notifications. The Lambda action runs independently alongside it.

Verify the quota request and support case

After a successful invocation, go to Service Quotas → Recent quota increase requests and confirm a new request appears for AWS Lambda / Concurrent executions with a status like Case Opened:

Service Quotas showing recent quota increase requests with Lambda concurrent executions case opened

Click the support case ID to open the case details page and confirm the request metadata (subject, status, category, and creation time):

AWS Support case details for Lambda quota increase request

That covers the full setup. When concurrency crosses 70%, the alarm fires. Your team gets notified via SNS. The Lambda function requests a limit increase, then disables itself so the next breach waits for a human decision. AWS opens a support case for quota review. Monitoring becomes proactive instead of reactive.

Key takeaways

Concurrency = number of execution environments active at the same time
Concurrency is regional and shared across all functions in the same account and region
ConcurrentExecutions only shows active invocations. It misses reserved and provisioned capacity
ClaimedAccountConcurrency reflects real capacity usage, which is what Lambda uses to determine availability
A separate burst scaling rate (1,000 new environments per 10 seconds) can throttle sudden spikes even when the pool still has room
SERVICE_QUOTA() dynamically fetches your actual limit. Do not hardcode it
Set alarms at 70% to give yourself time to react before throttling
Quota increases are not instant. Combine the automated request with alerting so your team can also take manual action if needed

References: AWS Lambda: Understanding function scaling · Lambda scaling behavior · Monitoring concurrency · Companion repo (dashboard + auto-increase CDK)