
AWS Amazon Bedrock Service Limits

Amazon Bedrock has 1,125 service quotas; 410 of them can be increased.
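
Adjustable quotas can be raised through the AWS Service Quotas console or API. Below is a minimal sketch, assuming the boto3 Service Quotas client and Bedrock's service code "bedrock": it lists the Bedrock quotas flagged as adjustable and shows, commented out, roughly what an increase request looks like. The quota code L-12345678 is a placeholder; look up the real code in the listing first.

```python
import boto3

# Sketch: enumerate adjustable Amazon Bedrock quotas via the Service Quotas API.
# Assumption: "bedrock" is the service code; verify with list_services() if needed.
quotas = boto3.client("service-quotas", region_name="us-east-1")

adjustable = []
for page in quotas.get_paginator("list_service_quotas").paginate(ServiceCode="bedrock"):
    for q in page["Quotas"]:
        if q.get("Adjustable"):
            adjustable.append((q["QuotaCode"], q["QuotaName"], q["Value"]))

for code, name, value in adjustable:
    print(f"{code}  {name}: default {value}")

# Requesting an increase (hypothetical quota code and target value):
# quotas.request_service_quota_increase(
#     ServiceCode="bedrock",
#     QuotaCode="L-12345678",  # placeholder - take the code from the listing above
#     DesiredValue=200_000,
# )
```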

QuotaDefaultStatus
Minimum number of records per batch inference job for GLM 4.7 Flash

Minimum number of records per batch inference job for GLM 4.7 Flash

general
100
count
Fixed
On-demand model inference tokens per minute for Amazon Titan Image Generator G1

On-demand model inference tokens per minute for Amazon Titan Image Generator G1

general
2,000
count
Fixed
Records per batch inference job for Claude Opus 4.5

Records per batch inference job for Claude Opus 4.5

general
100,000
count
Adjustable
On-demand model inference requests per minute for Mistral Large 3

On-demand model inference requests per minute for Mistral Large 3

throughput
10,000
count
Fixed
(Model customization) Custom models per account

(Model customization) Custom models per account

general
100
count
Adjustable
Cross-region model inference tokens per minute for Anthropic Claude Opus 4.6 V1

Cross-region model inference tokens per minute for Anthropic Claude Opus 4.6 V1

general
3,000,000
count
Adjustable
On-demand model inference tokens per minute for Mistral Large 3

On-demand model inference tokens per minute for Mistral Large 3

general
100,000,000
count
Fixed
On-demand model inference tokens per minute for Cohere Command R Plus

On-demand model inference tokens per minute for Cohere Command R Plus

general
300,000
count
Fixed
On-demand model inference requests per minute for Meta Llama 2 Chat 70B

On-demand model inference requests per minute for Meta Llama 2 Chat 70B

throughput
400
count
Fixed
Model invocation max tokens per day for Nemotron Nano 3 30B (doubled for cross-region calls)

Model invocation max tokens per day for Nemotron Nano 3 30B (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
(Model customization) Sum of training and validation records for a Claude 3-5-Haiku v1 Fine-tuning job

(Model customization) Sum of training and validation records for a Claude 3-5-Haiku v1 Fine-tuning job

general
10,000
count
Adjustable
Batch inference job size (in GB) for Gemma 3 12B

Batch inference job size (in GB) for Gemma 3 12B

storage
5
count
Fixed
Model invocation max tokens per day for Z.ai GLM-4.7 (doubled for cross-region calls)

Model invocation max tokens per day for Z.ai GLM-4.7 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Batch inference job size (in GB) for Claude Sonnet 4.5

Batch inference job size (in GB) for Claude Sonnet 4.5

storage
5
count
Fixed
Global cross-region model inference requests per minute for Anthropic Claude Sonnet 4.6

Global cross-region model inference requests per minute for Anthropic Claude Sonnet 4.6

throughput
10,000
count
Adjustable
(Model customization) Total number of custom model deployments

(Model customization) Total number of custom model deployments

general
10
count
Adjustable
ListAgentVersions requests per second

ListAgentVersions requests per second

throughput
10
count
Fixed
Records per batch inference job for Llama 3.1 405B Instruct

Records per batch inference job for Llama 3.1 405B Instruct

general
100,000
count
Adjustable
Cross-region model inference requests per minute for Anthropic Claude Haiku 4.5

Cross-region model inference requests per minute for Anthropic Claude Haiku 4.5

throughput
10,000
count
Adjustable
On-demand model inference tokens per minute for Amazon Titan Text Express

On-demand model inference tokens per minute for Amazon Titan Text Express

general
300,000
count
Fixed
Cross-region model inference requests per minute for Amazon Nova Lite

Cross-region model inference requests per minute for Amazon Nova Lite

throughput
4,000
count
Fixed
Model units per provisioned model for Anthropic Claude 3 Haiku 200K

Model units per provisioned model for Anthropic Claude 3 Haiku 200K

general
0
count
Adjustable
Model invocation max tokens per day for Gemma 3 4B (doubled for cross-region calls)

Model invocation max tokens per day for Gemma 3 4B (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Claude 3.7 Sonnet

Sum of in-progress and submitted batch inference jobs using a base model for Claude 3.7 Sonnet

general
100
count
Adjustable
DisassociateAgentKnowledgeBase requests per second

DisassociateAgentKnowledgeBase requests per second

throughput
4
count
Fixed
Records per batch inference job for Llama 3.2 1B Instruct

Records per batch inference job for Llama 3.2 1B Instruct

general
100,000
count
Adjustable
(Flows) Conditions per condition node

(Flows) Conditions per condition node

capacity
5
count
Fixed
Cross-region model inference requests per minute for Anthropic Claude 3.7 Sonnet V1

Cross-region model inference requests per minute for Anthropic Claude 3.7 Sonnet V1

throughput
250
count
Fixed
On-demand model inference requests per minute for AI21 Labs Jamba Instruct

On-demand model inference requests per minute for AI21 Labs Jamba Instruct

throughput
100
count
Fixed
Batch inference input file size (in GB) for OpenAI GPT OSS 20b

Batch inference input file size (in GB) for OpenAI GPT OSS 20b

storage
1
count
Fixed
Model units no-commitment Provisioned Throughputs across base models

Model units no-commitment Provisioned Throughputs across base models

general
0
count
Adjustable
Batch inference job size (in GB) for OpenAI GPT OSS 120b

Batch inference job size (in GB) for OpenAI GPT OSS 120b

storage
5
count
Fixed
Records per input file per batch inference job for Nova 2 Lite

Records per input file per batch inference job for Nova 2 Lite

storage
100,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Magistral Small 2509

Sum of in-progress and submitted batch inference jobs using a base model for Magistral Small 2509

general
100
count
Adjustable
Global cross-region model inference tokens per day for Amazon Nova 2 Pro Preview

Global cross-region model inference tokens per day for Amazon Nova 2 Pro Preview

general
1,440,000,000
count
Fixed
(Flows) DeleteFlowVersion requests per second

(Flows) DeleteFlowVersion requests per second

throughput
2
count
Fixed
(Advanced Prompt Optimization) Active jobs per account

(Advanced Prompt Optimization) Active jobs per account

general
20
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.1 405B Instruct

Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.1 405B Instruct

general
100
count
Adjustable
Batch inference input file size (in GB) for Claude 3.5 Sonnet v2

Batch inference input file size (in GB) for Claude 3.5 Sonnet v2

storage
1
count
Fixed
On-demand model inference requests per minute for NVIDIA Nemotron Nano 2

On-demand model inference requests per minute for NVIDIA Nemotron Nano 2

throughput
10,000
count
Fixed
Batch inference input file size (in GB) for Claude Sonnet 4

Batch inference input file size (in GB) for Claude Sonnet 4

storage
1
count
Adjustable
Records per input file per batch inference job for Llama 4 Maverick

Records per input file per batch inference job for Llama 4 Maverick

storage
100,000
count
Adjustable
(Data Automation) Maximum number of Blueprints per Start Inference request (Audios)

(Data Automation) Maximum number of Blueprints per Start Inference request (Audios)

throughput
1
count
Fixed
Cross-region model inference tokens per minute for Meta Llama 4 Maverick V1

Cross-region model inference tokens per minute for Meta Llama 4 Maverick V1

general
600,000
count
Adjustable
Minimum number of records per batch inference job for Claude 3 Opus

Minimum number of records per batch inference job for Claude 3 Opus

general
100
count
Fixed
Throttle rate limit for GetDataAutomationProject

Throttle rate limit for GetDataAutomationProject

throughput
5
count
Fixed
Batch inference input file size (in GB) for Llama 3.1 8B Instruct

Batch inference input file size (in GB) for Llama 3.1 8B Instruct

storage
1
count
Fixed
(Model customization) Sum of training and validation records for a Claude 3 Haiku v1 Fine-tuning job

(Model customization) Sum of training and validation records for a Claude 3 Haiku v1 Fine-tuning job

general
10,000
count
Adjustable
Global cross-region model inference requests per minute for Amazon Nova 2 Lite

Global cross-region model inference requests per minute for Amazon Nova 2 Lite

throughput
2,000
count
Adjustable
Batch inference job size (in GB) for OpenAI GPT OSS Safeguard 20b

Batch inference job size (in GB) for OpenAI GPT OSS Safeguard 20b

storage
5
count
Fixed
On-demand model inference tokens per minute for Amazon Titan Text Premier

On-demand model inference tokens per minute for Amazon Titan Text Premier

general
300,000
count
Fixed
Batch inference job size (in GB) for Llama 4 Maverick

Batch inference job size (in GB) for Llama 4 Maverick

storage
5
count
Fixed
Model invocation max tokens per day for Minimax M2 (doubled for cross-region calls)

Model invocation max tokens per day for Minimax M2 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Cross-region model inference tokens per minute for Anthropic Claude Opus 4.7

Cross-region model inference tokens per minute for Anthropic Claude Opus 4.7

general
15,000,000
count
Adjustable
On-demand model inference requests per minute for GPT OSS Safeguard 20B

On-demand model inference requests per minute for GPT OSS Safeguard 20B

throughput
10,000
count
Fixed
Batch inference job size (in GB) for Llama 3.2 3B Instruct

Batch inference job size (in GB) for Llama 3.2 3B Instruct

storage
5
count
Fixed
Records per input file per batch inference job for Qwen3 Coder Next

Records per input file per batch inference job for Qwen3 Coder Next

storage
100,000
count
Adjustable
On-demand model inference requests per minute for Meta Llama 3 8B Instruct

On-demand model inference requests per minute for Meta Llama 3 8B Instruct

throughput
800
count
Fixed
On-demand model inference requests per minute for Minimax M2

On-demand model inference requests per minute for Minimax M2

throughput
10,000
count
Fixed
On-demand model inference requests per minute for DeepSeek V3.2

On-demand model inference requests per minute for DeepSeek V3.2

throughput
10,000
count
Fixed
(Knowledge Bases) RetrieveAndGenerate requests per second

(Knowledge Bases) RetrieveAndGenerate requests per second

throughput
20
count
Fixed
Records per input file per batch inference job for Llama 3.2 90B Instruct

Records per input file per batch inference job for Llama 3.2 90B Instruct

storage
100,000
count
Adjustable
Batch inference job size (in GB) for Llama 3.1 8B Instruct

Batch inference job size (in GB) for Llama 3.1 8B Instruct

storage
5
count
Fixed
On-demand model inference requests per minute for AI21 Labs Jamba 1.5 Large

On-demand model inference requests per minute for AI21 Labs Jamba 1.5 Large

throughput
100
count
Fixed
Model invocation max tokens per day for Ministral 3B 3.0 (doubled for cross-region calls)

Model invocation max tokens per day for Ministral 3B 3.0 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
(Model customization) Sum of on demand custom model deployment requests per minute for Amazon Nova Micro

(Model customization) Sum of on demand custom model deployment requests per minute for Amazon Nova Micro

throughput
2,000
count
Fixed
Characters in Agent instructions

Characters in Agent instructions

general
20,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for DeepSeek V3.2

Sum of in-progress and submitted batch inference jobs using a base model for DeepSeek V3.2

general
100
count
Adjustable
Cross-region model inference requests per minute for Anthropic Claude Opus 4 V1

Cross-region model inference requests per minute for Anthropic Claude Opus 4 V1

throughput
200
count
Fixed
Cross-region model inference tokens per minute for Anthropic Claude Sonnet 4.5 V1 1M Context Length

Cross-region model inference tokens per minute for Anthropic Claude Sonnet 4.5 V1 1M Context Length

general
1,000,000
count
Adjustable
Minimum number of records per batch inference job for Ministral 3B

Minimum number of records per batch inference job for Ministral 3B

general
100
count
Fixed
Model units per provisioned model for Amazon Titan Text Premier V1 32K

Model units per provisioned model for Amazon Titan Text Premier V1 32K

general
0
count
Adjustable
GetAgentActionGroup requests per second

GetAgentActionGroup requests per second

throughput
20
count
Fixed
Model invocation max tokens per day for Anthropic Claude Sonnet 4.5 V1 1M Context Length (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude Sonnet 4.5 V1 1M Context Length (doubled for cross-region calls)

general
720,000,000
count
Fixed
Global cross-region model inference tokens per day for Anthropic Claude Haiku 4.5

Global cross-region model inference tokens per day for Anthropic Claude Haiku 4.5

general
7,200,000,000
count
Fixed
Custom models with a creating status per account

Custom models with a creating status per account

general
2
count
Adjustable
Minimum number of records per batch inference job for Claude Sonnet 4.6

Minimum number of records per batch inference job for Claude Sonnet 4.6

general
100
count
Fixed
Model invocation max tokens per day for Voxtral Mini 1.0 (doubled for cross-region calls)

Model invocation max tokens per day for Voxtral Mini 1.0 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Batch inference job size (in GB) for Ministral 3 8B

Batch inference job size (in GB) for Ministral 3 8B

storage
5
count
Fixed
Model invocation max tokens per day for Amazon Nova Pro (doubled for cross-region calls)

Model invocation max tokens per day for Amazon Nova Pro (doubled for cross-region calls)

general
1,440,000,000
count
Fixed
On-demand model inference tokens per minute for Cohere Embed English

On-demand model inference tokens per minute for Cohere Embed English

general
300,000
count
Fixed
Model invocation max tokens per day for Anthropic Claude Opus 4.5 (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude Opus 4.5 (doubled for cross-region calls)

general
1,440,000,000
count
Fixed
Global cross-region model inference tokens per day for Anthropic Claude Opus 4.6 V1

Global cross-region model inference tokens per day for Anthropic Claude Opus 4.6 V1

general
4,320,000,000
count
Fixed
On-demand InvokeModel concurrent requests for Amazon Nova Reel1.0

On-demand InvokeModel concurrent requests for Amazon Nova Reel1.0

compute
10
count
Fixed
Model units per provisioned model for Amazon Titan Text Embeddings V2

Model units per provisioned model for Amazon Titan Text Embeddings V2

general
0
count
Adjustable
On-demand model inference tokens per minute for Meta Llama 3.2 3B Instruct

On-demand model inference tokens per minute for Meta Llama 3.2 3B Instruct

general
300,000
count
Fixed
Model units per provisioned model for the 300k context length variant for Amazon Nova Lite

Model units per provisioned model for the 300k context length variant for Amazon Nova Lite

general
0
count
Adjustable
Batch inference input file size (in GB) for Claude Opus 4.5

Batch inference input file size (in GB) for Claude Opus 4.5

storage
1
count
Fixed
On-demand model inference tokens per minute for Z.ai GLM-4.7

On-demand model inference tokens per minute for Z.ai GLM-4.7

general
100,000,000
count
Fixed
Batch inference input file size (in GB) for Titan Text Embeddings V2

Batch inference input file size (in GB) for Titan Text Embeddings V2

storage
1
count
Fixed
(Data Automation) InvokeDataAutomationAsync - Audio - Max number of concurrent jobs

(Data Automation) InvokeDataAutomationAsync - Audio - Max number of concurrent jobs

compute
20
count
Adjustable
Batch inference job size (in GB) for Nova Pro V1

Batch inference job size (in GB) for Nova Pro V1

storage
100
count
Fixed
Batch inference input file size (in GB) for Claude 3 Opus

Batch inference input file size (in GB) for Claude 3 Opus

storage
1
count
Fixed
Batch inference input file size (in GB) for Mistral Large 2 (24.07)

Batch inference input file size (in GB) for Mistral Large 2 (24.07)

storage
1
count
Fixed
(Knowledge Bases) ListKnowledgeBases requests per second

(Knowledge Bases) ListKnowledgeBases requests per second

throughput
10
count
Fixed
(Model customization) Minimum number of prompts for distillation customization jobs

(Model customization) Minimum number of prompts for distillation customization jobs

general
100
count
Fixed
(Automated Reasoning) ListAutomatedReasoningPolicyBuildWorkflows requests per second

(Automated Reasoning) ListAutomatedReasoningPolicyBuildWorkflows requests per second

throughput
5
count
Adjustable
Model invocation max tokens per day for Z.ai GLM-4.7 Flash (doubled for cross-region calls)

Model invocation max tokens per day for Z.ai GLM-4.7 Flash (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Records per input file per batch inference job for OpenAI GPT OSS Safeguard 120b

Records per input file per batch inference job for OpenAI GPT OSS Safeguard 120b

storage
100,000
count
Adjustable
Records per batch inference job for Nova Lite V1

Records per batch inference job for Nova Lite V1

general
100,000
count
Adjustable
Global cross-region model inference tokens per day for Anthropic Claude Sonnet 4 V1

Global cross-region model inference tokens per day for Anthropic Claude Sonnet 4 V1

general
288,000,000
count
Fixed
Records per input file per batch inference job for Claude Sonnet 4

Records per input file per batch inference job for Claude Sonnet 4

storage
100,000
count
Adjustable
On-demand model inference requests per minute for Anthropic Claude 3 Opus

On-demand model inference requests per minute for Anthropic Claude 3 Opus

throughput
50
count
Fixed
On-demand model inference requests per minute for Anthropic Claude 3.5 Sonnet

On-demand model inference requests per minute for Anthropic Claude 3.5 Sonnet

throughput
50
count
Fixed
(Knowledge Bases) DeleteKnowledgeBase requests per second

(Knowledge Bases) DeleteKnowledgeBase requests per second

throughput
2
count
Fixed
Cross-region model inference tokens per minute for Amazon Nova Micro

Cross-region model inference tokens per minute for Amazon Nova Micro

general
8,000,000
count
Adjustable
(Evaluation) Number of prompts in a custom prompt dataset

(Evaluation) Number of prompts in a custom prompt dataset

general
1,000
count
Fixed
On-demand model inference requests per minute for Amazon Titan Text Lite

On-demand model inference requests per minute for Amazon Titan Text Lite

throughput
800
count
Fixed
Records per batch inference job for Qwen3 Next 80B

Records per batch inference job for Qwen3 Next 80B

general
100,000
count
Adjustable
On-demand model inference requests per minute for Stable Image Creative Upscale

On-demand model inference requests per minute for Stable Image Creative Upscale

throughput
2
count
Fixed
Batch inference input file size (in GB) for Ministral 3 8B

Batch inference input file size (in GB) for Ministral 3 8B

storage
1
count
Fixed
On-demand model inference requests per minute for NVIDIA Nemotron 3 Super 120B A12B

On-demand model inference requests per minute for NVIDIA Nemotron 3 Super 120B A12B

throughput
10,000
count
Fixed
(Flows) GetFlow requests per second

(Flows) GetFlow requests per second

throughput
10
count
Fixed
Batch inference job size (in GB) for Amazon Nova Premier

Batch inference job size (in GB) for Amazon Nova Premier

storage
5
count
Fixed
Batch inference job size (in GB) for Llama 3.2 11B Instruct

Batch inference job size (in GB) for Llama 3.2 11B Instruct

storage
5
count
Fixed
Model invocation max tokens per day for GPT OSS Safeguard 20B (doubled for cross-region calls)

Model invocation max tokens per day for GPT OSS Safeguard 20B (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.2 3B Instruct

Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.2 3B Instruct

general
100
count
Adjustable
Batch inference job size (in GB) for GLM 4.7 Flash

Batch inference job size (in GB) for GLM 4.7 Flash

storage
5
count
Fixed
(Model customization) Sum of training and validation records for a Titan Multimodal Embeddings G1 v1 Fine-tuning job

(Model customization) Sum of training and validation records for a Titan Multimodal Embeddings G1 v1 Fine-tuning job

general
50,000
count
Adjustable
Enabled action groups per agent

Enabled action groups per agent

general
15
count
Adjustable
Records per batch inference job for Writer Palmyra Vision 7B

Records per batch inference job for Writer Palmyra Vision 7B

general
100,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Llama 4 Scout

Sum of in-progress and submitted batch inference jobs using a base model for Llama 4 Scout

general
100
count
Adjustable
(Evaluation) Number of models in a model evaluation job that uses human workers

(Evaluation) Number of models in a model evaluation job that uses human workers

general
2
count
Fixed
Cross-region model inference requests per minute for Anthropic Claude Sonnet 4.5 V1 1M Context Length

Cross-region model inference requests per minute for Anthropic Claude Sonnet 4.5 V1 1M Context Length

throughput
1,000
count
Adjustable
Model units per provisioned model for Meta Llama 3 70B Instruct

Model units per provisioned model for Meta Llama 3 70B Instruct

general
0
count
Adjustable
(Flows) DeleteFlow requests per second

(Flows) DeleteFlow requests per second

throughput
2
count
Fixed
Records per input file per batch inference job for GLM 5

Records per input file per batch inference job for GLM 5

storage
100,000
count
Adjustable
On-demand model inference tokens per minute for Z.ai GLM 5

On-demand model inference tokens per minute for Z.ai GLM 5

general
100,000,000
count
Fixed
(Model customization) Sum of on demand custom model deployment requests per minute for Amazon Nova Lite

(Model customization) Sum of on demand custom model deployment requests per minute for Amazon Nova Lite

throughput
2,000
count
Fixed
Minimum number of records per batch inference job for OpenAI GPT OSS Safeguard 120b

Minimum number of records per batch inference job for OpenAI GPT OSS Safeguard 120b

general
100
count
Fixed
Batch inference input file size (in GB) for NVIDIA Nemotron Nano 12B

Batch inference input file size (in GB) for NVIDIA Nemotron Nano 12B

storage
1
count
Fixed
Batch inference job size (in GB) for Writer Palmyra Vision 7B

Batch inference job size (in GB) for Writer Palmyra Vision 7B

storage
5
count
Fixed
(Knowledge Bases) Maximum number of files for Foundation Models as a parser

(Knowledge Bases) Maximum number of files for Foundation Models as a parser

storage
1,000
count
Fixed
(Knowledge Bases) Concurrent IngestKnowledgeBaseDocuments and DeleteKnowledgeBaseDocuments requests per account

(Knowledge Bases) Concurrent IngestKnowledgeBaseDocuments and DeleteKnowledgeBaseDocuments requests per account

compute
10
count
Fixed
On-demand model inference tokens per minute for AI21 Labs Jurassic-2 Ultra

On-demand model inference tokens per minute for AI21 Labs Jurassic-2 Ultra

general
300,000
count
Fixed
Model invocation max tokens per day for NVIDIA Nemotron Nano 2 (doubled for cross-region calls)

Model invocation max tokens per day for NVIDIA Nemotron Nano 2 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Cross-region model inference tokens per minute for Meta Llama 3.1 70B Instruct

Cross-region model inference tokens per minute for Meta Llama 3.1 70B Instruct

general
600,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for GLM 4.7 Flash

Sum of in-progress and submitted batch inference jobs using a base model for GLM 4.7 Flash

general
100
count
Adjustable
Batch inference job size (in GB) for Mistral Large 3

Batch inference job size (in GB) for Mistral Large 3

storage
5
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Llama 4 Maverick

Sum of in-progress and submitted batch inference jobs using a base model for Llama 4 Maverick

general
100
count
Adjustable
Model invocation max tokens per day for Mistral AI Mistral Small (doubled for cross-region calls)

Model invocation max tokens per day for Mistral AI Mistral Small (doubled for cross-region calls)

general
432,000,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a custom model for Titan Multimodal Embeddings G1

Sum of in-progress and submitted batch inference jobs using a custom model for Titan Multimodal Embeddings G1

general
3
count
Fixed
Cross-region model inference requests per minute for Anthropic Claude Opus 4.5

Cross-region model inference requests per minute for Anthropic Claude Opus 4.5

throughput
10,000
count
Adjustable
Cross-region model inference requests per minute for Mistral Pixtral Large 25.02 V1

Cross-region model inference requests per minute for Mistral Pixtral Large 25.02 V1

throughput
10
count
Fixed
Batch inference job size (in GB) for NVIDIA Nemotron Nano 3 30B

Batch inference job size (in GB) for NVIDIA Nemotron Nano 3 30B

storage
5
count
Fixed
Model units per provisioned model for Anthropic Claude 3.5 Sonnet 200K

Model units per provisioned model for Anthropic Claude 3.5 Sonnet 200K

general
0
count
Adjustable
Model units per provisioned model for Amazon Nova 2 Lite V1.0 256K

Model units per provisioned model for Amazon Nova 2 Lite V1.0 256K

general
0
count
Adjustable
Model units per provisioned model for Anthropic Claude V2 100K

Model units per provisioned model for Anthropic Claude V2 100K

general
0
count
Adjustable
(Model customization) Sum of on demand custom model deployment tokens per minute for Amazon Nova Micro

(Model customization) Sum of on demand custom model deployment tokens per minute for Amazon Nova Micro

general
4,000,000
count
Fixed
(Automated Reasoning) GetAutomatedReasoningPolicyTestResult requests per second

(Automated Reasoning) GetAutomatedReasoningPolicyTestResult requests per second

throughput
10
count
Adjustable
On-demand model inference requests per minute for Anthropic Claude 3.5 Haiku

On-demand model inference requests per minute for Anthropic Claude 3.5 Haiku

throughput
1,000
count
Fixed
Cross-region model inference tokens per minute for Meta Llama 3.2 1B Instruct

Cross-region model inference tokens per minute for Meta Llama 3.2 1B Instruct

general
600,000
count
Adjustable
Records per input file per batch inference job for GLM 4.7

Records per input file per batch inference job for GLM 4.7

storage
100,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Voxtral Small 24B 2507

Sum of in-progress and submitted batch inference jobs using a base model for Voxtral Small 24B 2507

general
100
count
Adjustable
On-demand model inference requests per minute for Amazon Titan Text Embeddings

On-demand model inference requests per minute for Amazon Titan Text Embeddings

throughput
2,000
count
Fixed
(Flows) Agent nodes per flow

(Flows) Agent nodes per flow

capacity
20
count
Fixed
(Knowledge Bases) Data sources per knowledge base

(Knowledge Bases) Data sources per knowledge base

general
5
count
Fixed
On-demand model inference requests per minute for Meta Llama 3.1 70B Instruct

On-demand model inference requests per minute for Meta Llama 3.1 70B Instruct

throughput
400
count
Fixed
(Automated Reasoning) GetAutomatedReasoningPolicy requests per second

(Automated Reasoning) GetAutomatedReasoningPolicy requests per second

throughput
10
count
Adjustable
Model units per provisioned model for Anthropic Claude 3.5 Haiku 16K

Model units per provisioned model for Anthropic Claude 3.5 Haiku 16K

general
0
count
Adjustable
(Automated Reasoning) ExportAutomatedReasoningPolicyVersion requests per second

(Automated Reasoning) ExportAutomatedReasoningPolicyVersion requests per second

throughput
5
count
Adjustable
Model invocation max tokens per day for NVIDIA Nemotron Nano 2 VL (doubled for cross-region calls)

Model invocation max tokens per day for NVIDIA Nemotron Nano 2 VL (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Cross-region model inference tokens per minute for Anthropic Claude Opus 4.5

Cross-region model inference tokens per minute for Anthropic Claude Opus 4.5

general
2,000,000
count
Adjustable
Global cross-region model inference tokens per day for Cohere Embed V4

Global cross-region model inference tokens per day for Cohere Embed V4

general
432,000,000
count
Fixed
Records per batch inference job for GLM 5

Records per batch inference job for GLM 5

general
100,000
count
Adjustable
On-demand model inference requests per minute for Ministral 3B 3.0

On-demand model inference requests per minute for Ministral 3B 3.0

throughput
10,000
count
Fixed
Cross-Region model inference tokens per minute for Anthropic Claude 3.5 Sonnet V2

Cross-Region model inference tokens per minute for Anthropic Claude 3.5 Sonnet V2

general
800,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Nova Lite V1

Sum of in-progress and submitted batch inference jobs using a base model for Nova Lite V1

general
100
count
Adjustable
Cross-region model inference requests per minute for Stable Image Fast Upscale

Cross-region model inference requests per minute for Stable Image Fast Upscale

throughput
20
count
Fixed
Batch inference input file size (in GB) for Magistral Small 2509

Batch inference input file size (in GB) for Magistral Small 2509

storage
1
count
Fixed
On-demand model inference requests per minute for Stable Image Control Sketch

On-demand model inference requests per minute for Stable Image Control Sketch

throughput
10
count
Fixed
Records per batch inference job for Claude 3.5 Sonnet

Records per batch inference job for Claude 3.5 Sonnet

general
100,000
count
Adjustable
Global cross-region model inference tokens per minute for Amazon Nova 2 Omni

Global cross-region model inference tokens per minute for Amazon Nova 2 Omni

general
8,000,000
count
Adjustable
On-demand model inference tokens per minute for Amazon Nova Micro

On-demand model inference tokens per minute for Amazon Nova Micro

general
4,000,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Qwen3 Next 80B

Sum of in-progress and submitted batch inference jobs using a base model for Qwen3 Next 80B

general
100
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for NVIDIA Nemotron Nano 9B

Sum of in-progress and submitted batch inference jobs using a base model for NVIDIA Nemotron Nano 9B

general
100
count
Adjustable
On-demand InvokeModel concurrent requests for Amazon Nova 2 Sonic

On-demand InvokeModel concurrent requests for Amazon Nova 2 Sonic

compute
20
count
Fixed
On-demand model inference tokens per minute for AI21 Labs Jamba Instruct

On-demand model inference tokens per minute for AI21 Labs Jamba Instruct

general
300,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Qwen3 Coder Next

Sum of in-progress and submitted batch inference jobs using a base model for Qwen3 Coder Next

general
100
count
Adjustable
Cross-region model inference requests per minute for Stable Image Search and Recolor

Cross-region model inference requests per minute for Stable Image Search and Recolor

throughput
20
count
Fixed
On-demand model inference requests per minute for Amazon Nova Canvas

On-demand model inference requests per minute for Amazon Nova Canvas

throughput
100
count
Fixed
Model invocation max tokens per day for Amazon Nova 2 Pro Preview (doubled for cross-region calls)

Model invocation max tokens per day for Amazon Nova 2 Pro Preview (doubled for cross-region calls)

general
720,000,000
count
Fixed
On-demand model inference requests per minute for Amazon Titan Text Premier

On-demand model inference requests per minute for Amazon Titan Text Premier

throughput
100
count
Fixed
Minimum number of records per batch inference job for Llama 3.3 70B Instruct

Minimum number of records per batch inference job for Llama 3.3 70B Instruct

general
100
count
Fixed
Records per input file per batch inference job for Nova Pro V1

Records per input file per batch inference job for Nova Pro V1

storage
100,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Mistral Small

Sum of in-progress and submitted batch inference jobs using a base model for Mistral Small

general
100
count
Adjustable
(Knowledge Bases) GetKnowledgeBase requests per second

(Knowledge Bases) GetKnowledgeBase requests per second

throughput
10
count
Fixed
Records per input file per batch inference job for Qwen3 Coder 30B

Records per input file per batch inference job for Qwen3 Coder 30B

storage
100,000
count
Adjustable
Batch inference input file size (in GB) for Amazon Nova 2 Multimodal Embeddings V1

Batch inference input file size (in GB) for Amazon Nova 2 Multimodal Embeddings V1

storage
1
count
Fixed
Cross-region model inference tokens per minute for Amazon Nova 2 Pro Preview

Cross-region model inference tokens per minute for Amazon Nova 2 Pro Preview

general
1,000,000
count
Adjustable
Model units per provisioned model for Anthropic Claude 3.5 Sonnet 18K

Model units per provisioned model for Anthropic Claude 3.5 Sonnet 18K

general
0
count
Adjustable
Records per batch inference job for Gemma 3 4B

Records per batch inference job for Gemma 3 4B

general
100,000
count
Adjustable
Minimum number of records per batch inference job for Nova 2 Lite

Minimum number of records per batch inference job for Nova 2 Lite

general
100
count
Fixed
Minimum number of records per batch inference job for GLM 4.7

Minimum number of records per batch inference job for GLM 4.7

general
100
count
Fixed
On-demand model inference tokens per minute for Minimax M2.1

On-demand model inference tokens per minute for Minimax M2.1

general
100,000,000
count
Fixed
Records per input file per batch inference job for Llama 3.1 405B Instruct

Records per input file per batch inference job for Llama 3.1 405B Instruct

storage
100,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Titan Text Embeddings V2

Sum of in-progress and submitted batch inference jobs using a base model for Titan Text Embeddings V2

general
100
count
Adjustable
(Guardrails) On-demand ApplyGuardrail Content filter policy text units per second

(Guardrails) On-demand ApplyGuardrail Content filter policy text units per second

identity
200
count
Adjustable
Model invocation max tokens per day for Meta Llama 3.2 1B Instruct (doubled for cross-region calls)

Model invocation max tokens per day for Meta Llama 3.2 1B Instruct (doubled for cross-region calls)

general
432,000,000
count
Fixed
Records per input file per batch inference job for Mistral Large 2 (24.07)

Records per input file per batch inference job for Mistral Large 2 (24.07)

storage
100,000
count
Adjustable
Batch inference input file size (in GB) for Gemma 3 27B

Batch inference input file size (in GB) for Gemma 3 27B

storage
1
count
Fixed
Records per input file per batch inference job for Voxtral Small 24B 2507

Records per input file per batch inference job for Voxtral Small 24B 2507

storage
100,000
count
Adjustable
Records per input file per batch inference job for Ministral 3 14B

Records per input file per batch inference job for Ministral 3 14B

storage
100,000
count
Adjustable
(Knowledge Bases) GetIngestionJob requests per second

(Knowledge Bases) GetIngestionJob requests per second

throughput
10
count
Fixed
Model invocation max tokens per day for Meta Llama 4 Maverick V1 (doubled for cross-region calls)

Model invocation max tokens per day for Meta Llama 4 Maverick V1 (doubled for cross-region calls)

general
432,000,000
count
Fixed
(Prompt management) ListPrompts requests per second

(Prompt management) ListPrompts requests per second

throughput
10
count
Fixed
On-demand model inference requests per minute for OpenAI GPT OSS 20B

On-demand model inference requests per minute for OpenAI GPT OSS 20B

throughput
10,000
count
Fixed
Batch inference job size (in GB) for Mistral Small

Batch inference job size (in GB) for Mistral Small

storage
5
count
Fixed
Sum of in-progress and submitted batch inference jobs using a custom model for Titan Text Embeddings V2

Sum of in-progress and submitted batch inference jobs using a custom model for Titan Text Embeddings V2

general
3
count
Fixed
Batch inference input file size (in GB) for Voxtral Mini 3B 2507

Batch inference input file size (in GB) for Voxtral Mini 3B 2507

storage
1
count
Fixed
On-demand model inference requests per minute for Minimax M2.1

On-demand model inference requests per minute for Minimax M2.1

throughput
10,000
count
Fixed
(Model customization) Sum of training and validation records for a Titan Text G1 - Lite v1 Continued Pre-Training job

(Model customization) Sum of training and validation records for a Titan Text G1 - Lite v1 Continued Pre-Training job

general
100,000
count
Adjustable
Batch inference job size (in GB) for Ministral 3B

Batch inference job size (in GB) for Ministral 3B

storage
5
count
Fixed
Cross-region model inference requests per minute for Twelve Labs Pegasus

Cross-region model inference requests per minute for Twelve Labs Pegasus

throughput
120
count
Adjustable
On-demand model inference requests per minute for Kimi K2 Thinking

On-demand model inference requests per minute for Kimi K2 Thinking

throughput
10,000
count
Fixed
(Model customization) Maximum student model fine tuning context length for Amazon Nova V1 distillation customization jobs

(Model customization) Maximum student model fine tuning context length for Amazon Nova V1 distillation customization jobs

general
32,000
count
Fixed
Cross-region model inference tokens per minute for Anthropic Claude Haiku 4.5

Cross-region model inference tokens per minute for Anthropic Claude Haiku 4.5

general
5,000,000
count
Adjustable
Records per input file per batch inference job for OpenAI GPT OSS Safeguard 20b

Records per input file per batch inference job for OpenAI GPT OSS Safeguard 20b

storage
100,000
count
Adjustable
Model invocation max tokens per day for Anthropic Claude Opus 4.6 V1 (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude Opus 4.6 V1 (doubled for cross-region calls)

general
2,160,000,000
count
Fixed
(Data Automation) InvokeDataAutomationAsync - Document - Max number of concurrent jobs

(Data Automation) InvokeDataAutomationAsync - Document - Max number of concurrent jobs

compute
25
count
Adjustable
(Data Automation) Maximum number of Blueprints per Start Inference request (Documents)

(Data Automation) Maximum number of Blueprints per Start Inference request (Documents)

throughput
10
count
Fixed
On-demand model inference requests per minute for Magistral Small 1.2

On-demand model inference requests per minute for Magistral Small 1.2

throughput
10,000
count
Fixed
Batch inference input file size (in GB) for Qwen3 Coder 30B

Batch inference input file size (in GB) for Qwen3 Coder 30B

storage
1
count
Fixed
Records per batch inference job for MiniMax M2.5

Records per batch inference job for MiniMax M2.5

general
100,000
count
Adjustable
(Automated Reasoning) Annotations in policy

(Automated Reasoning) Annotations in policy

identity
10
count
Fixed
Minimum number of records per batch inference job for Llama 3.2 3B Instruct

Minimum number of records per batch inference job for Llama 3.2 3B Instruct

general
100
count
Fixed
Records per batch inference job for Qwen3 32B

Records per batch inference job for Qwen3 32B

general
100,000
count
Adjustable
(Flows) ListFlowAliases requests per second

(Flows) ListFlowAliases requests per second

throughput
10
count
Fixed
Cross-region model inference tokens per minute for Amazon Nova Premier V1

Cross-region model inference tokens per minute for Amazon Nova Premier V1

general
2,000,000
count
Adjustable
(Guardrails) Word length in characters

(Guardrails) Word length in characters

general
100
count
Fixed
Records per input file per batch inference job for Kimi K2 Thinking

Records per input file per batch inference job for Kimi K2 Thinking

storage
100,000
count
Adjustable
On-demand model inference tokens per minute for Amazon Titan Text Embeddings

On-demand model inference tokens per minute for Amazon Titan Text Embeddings

general
300,000
count
Fixed
Records per batch inference job for MiniMax M2

Records per batch inference job for MiniMax M2

general
100,000
count
Adjustable
Global cross-region model inference tokens per minute for Anthropic Claude Haiku 4.5

Global cross-region model inference tokens per minute for Anthropic Claude Haiku 4.5

general
5,000,000
count
Adjustable
Batch inference input file size (in GB) for Ministral 3 14B

Batch inference input file size (in GB) for Ministral 3 14B

storage
1
count
Fixed
Minimum number of records per batch inference job for Ministral 3 8B

Minimum number of records per batch inference job for Ministral 3 8B

general
100
count
Fixed
Cross-region model inference requests per minute for Anthropic Claude 3.5 Sonnet

Cross-region model inference requests per minute for Anthropic Claude 3.5 Sonnet

throughput
100
count
Fixed
(Flows) ListFlowVersions requests per second

(Flows) ListFlowVersions requests per second

throughput
10
count
Fixed
Model units per provisioned model for Anthropic Claude V2.1 200K

Model units per provisioned model for Anthropic Claude V2.1 200K

general
0
count
Adjustable
(Knowledge Bases) GetKnowledgeBaseDocuments requests per second

(Knowledge Bases) GetKnowledgeBaseDocuments requests per second

throughput
5
count
Fixed
Batch inference input file size (in GB) for Claude 3 Sonnet

Batch inference input file size (in GB) for Claude 3 Sonnet

storage
1
count
Fixed
Records per input file per batch inference job for Claude 3.5 Sonnet v2

Records per input file per batch inference job for Claude 3.5 Sonnet v2

storage
100,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Voxtral Mini 3B 2507

Sum of in-progress and submitted batch inference jobs using a base model for Voxtral Mini 3B 2507

general
100
count
Adjustable
Cross-region model inference requests per minute for Anthropic Claude 3 Opus

Cross-region model inference requests per minute for Anthropic Claude 3 Opus

throughput
100
count
Fixed
Throttle rate limit for UpdateBlueprint

Throttle rate limit for UpdateBlueprint

throughput
5
count
Fixed
On-Demand, latency-optimized model inference tokens per minute for Amazon Nova Pro V1

On-Demand, latency-optimized model inference tokens per minute for Amazon Nova Pro V1

general
40,000
count
Fixed
On-demand model inference tokens per minute for Qwen3 Next 80B A3B

On-demand model inference tokens per minute for Qwen3 Next 80B A3B

general
100,000,000
count
Fixed
Global cross-region model inference tokens per minute for Anthropic Claude Sonnet 4.5 V1 1M Context Length

Global cross-region model inference tokens per minute for Anthropic Claude Sonnet 4.5 V1 1M Context Length

general
1,000,000
count
Adjustable
Batch inference job size (in GB) for Claude 3 Opus

Batch inference job size (in GB) for Claude 3 Opus

storage
5
count
Fixed
(Automated Reasoning) Source document size (MB)

(Automated Reasoning) Source document size (MB)

storage
5
count
Fixed
Records per input file per batch inference job for Kimi K2.5

Records per input file per batch inference job for Kimi K2.5

storage
100,000
count
Adjustable
Records per input file per batch inference job for Magistral Small 2509

Records per input file per batch inference job for Magistral Small 2509

storage
100,000
count
Adjustable
(Data Automation) Maximum audio length (Minutes)

(Data Automation) Maximum audio length (Minutes)

general
240
count
Fixed
(Data Automation) InvokeBlueprintOptimizationAsync - Max number of blueprint optimization jobs per day

(Data Automation) InvokeBlueprintOptimizationAsync - Max number of blueprint optimization jobs per day

general
30
count
Fixed
On-demand model inference tokens per minute for Cohere Command R

On-demand model inference tokens per minute for Cohere Command R

general
300,000
count
Fixed
On-demand model inference tokens per minute for Writer Palmyra Vision 7B

On-demand model inference tokens per minute for Writer Palmyra Vision 7B

general
100,000,000
count
Fixed
Model invocation max tokens per day for Amazon Nova 2 Lite (doubled for cross-region calls)

Model invocation max tokens per day for Amazon Nova 2 Lite (doubled for cross-region calls)

general
5,760,000,000
count
Fixed
Records per batch inference job for MiniMax M2.1

Records per batch inference job for MiniMax M2.1

general
100,000
count
Adjustable
Records per batch inference job for Nova Micro V1

Records per batch inference job for Nova Micro V1

general
100,000
count
Adjustable
Records per batch inference job for Llama 3.1 8B Instruct

Records per batch inference job for Llama 3.1 8B Instruct

general
100,000
count
Adjustable
(Evaluation) Number of concurrent automatic model evaluation jobs

(Evaluation) Number of concurrent automatic model evaluation jobs

compute
20
count
Fixed
On-demand model inference requests per minute for Mistral AI Mistral Small

On-demand model inference requests per minute for Mistral AI Mistral Small

throughput
400
count
Fixed
(Prompt management) CreatePromptVersion requests per second

(Prompt management) CreatePromptVersion requests per second

throughput
2
count
Fixed
(Prompt management) Versions per prompt

(Prompt management) Versions per prompt

general
10
count
Fixed
Concurrent model import jobs

Concurrent model import jobs

compute
1
count
Fixed
Global cross-region model inference requests per minute for Cohere Embed V4

Global cross-region model inference requests per minute for Cohere Embed V4

throughput
2,000
count
Adjustable
Records per batch inference job for GLM 4.7 Flash

Records per batch inference job for GLM 4.7 Flash

general
100,000
count
Adjustable
(Model customization) Scheduled customization jobs

(Model customization) Scheduled customization jobs

general
10
count
Fixed
Model invocation max tokens per day for MiniMax M2.5 (doubled for cross-region calls)

Model invocation max tokens per day for MiniMax M2.5 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
(Knowledge Bases) UpdateDataSource requests per second

(Knowledge Bases) UpdateDataSource requests per second

throughput
2
count
Fixed
(Model customization) Maximum number of prompts for distillation customization jobs

(Model customization) Maximum number of prompts for distillation customization jobs

general
15,000
count
Fixed
Cross-region model inference requests per minute for Stable Image Outpaint

Cross-region model inference requests per minute for Stable Image Outpaint

throughput
4
count
Fixed
Records per batch inference job for Claude 3.7 Sonnet

Records per batch inference job for Claude 3.7 Sonnet

general
100,000
count
Adjustable
Number of custom prompt routers per account

Number of custom prompt routers per account

general
500
count
Fixed
Batch inference input file size (in GB) for Claude Haiku 4.5

Batch inference input file size (in GB) for Claude Haiku 4.5

storage
1
count
Fixed
On-demand model inference tokens per minute for Meta Llama 3.1 70B Instruct

On-demand model inference tokens per minute for Meta Llama 3.1 70B Instruct

general
300,000
count
Fixed
On-demand model inference requests per minute for Stable Image Search and Recolor

On-demand model inference requests per minute for Stable Image Search and Recolor

throughput
10
count
Fixed
On-demand model inference tokens per minute for Z.ai GLM-4.7 Flash

On-demand model inference tokens per minute for Z.ai GLM-4.7 Flash

general
100,000,000
count
Fixed
Minimum number of records per batch inference job for Llama 3.2 11B Instruct

Minimum number of records per batch inference job for Llama 3.2 11B Instruct

general
100
count
Fixed
(Model customization) Sum of training and validation records for a Amazon Nova Micro Fine-tuning job

(Model customization) Sum of training and validation records for a Amazon Nova Micro Fine-tuning job

general
20,000
count
Adjustable
Batch inference input file size (in GB) for Gemma 3 4B

Batch inference input file size (in GB) for Gemma 3 4B

storage
1
count
Fixed
(Data Automation) CreateBlueprintVersion - Max number of Blueprint versions per Blueprint

(Data Automation) CreateBlueprintVersion - Max number of Blueprint versions per Blueprint

general
10
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Claude 3 Haiku

Sum of in-progress and submitted batch inference jobs using a base model for Claude 3 Haiku

general
100
count
Adjustable
Records per batch inference job for NVIDIA Nemotron Nano 9B

Records per batch inference job for NVIDIA Nemotron Nano 9B

general
100,000
count
Adjustable
Model invocation max tokens per day for Mistral Devstral 2 123b (doubled for cross-region calls)

Model invocation max tokens per day for Mistral Devstral 2 123b (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Batch inference input file size (in GB) for Devstral 2 123B

Batch inference input file size (in GB) for Devstral 2 123B

storage
1
count
Fixed
Records per batch inference job for Qwen3 VL 235B

Records per batch inference job for Qwen3 VL 235B

general
100,000
count
Adjustable
Cross-region model inference tokens per minute for Mistral Pixtral Large 25.02 V1

Cross-region model inference tokens per minute for Mistral Pixtral Large 25.02 V1

general
80,000
count
Adjustable
(Data Automation) Maximum instruction field length for Audio Blueprint - (Characters)

(Data Automation) Maximum instruction field length for Audio Blueprint - (Characters)

general
500
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Kimi K2 Thinking

Sum of in-progress and submitted batch inference jobs using a base model for Kimi K2 Thinking

general
100
count
Adjustable
Cross-region model inference requests per minute for Stable Image Style Transfer

Cross-region model inference requests per minute for Stable Image Style Transfer

throughput
20
count
Fixed
Cross-Region model inference requests per minute for Anthropic Claude 3.5 Sonnet V2

Cross-Region model inference requests per minute for Anthropic Claude 3.5 Sonnet V2

throughput
100
count
Fixed
Records per batch inference job for Claude Sonnet 4.5

Records per batch inference job for Claude Sonnet 4.5

general
100,000
count
Adjustable
Model invocation max tokens per day for Anthropic Claude 3.5 Haiku (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude 3.5 Haiku (doubled for cross-region calls)

general
2,880,000,000
count
Fixed
Batch inference input file size (in GB) for Qwen3 VL 235B

Batch inference input file size (in GB) for Qwen3 VL 235B

storage
1
count
Fixed
Batch inference input file size (in GB) for Nova 2 Lite

Batch inference input file size (in GB) for Nova 2 Lite

storage
1
count
Fixed
Batch inference input file size (in GB) for Voxtral Small 24B 2507

Batch inference input file size (in GB) for Voxtral Small 24B 2507

storage
1
count
Fixed
(Knowledge Bases) Knowledge bases per account

(Knowledge Bases) Knowledge bases per account

general
100
count
Fixed
(Automated Reasoning) GetAutomatedReasoningPolicyNextScenario requests per second

(Automated Reasoning) GetAutomatedReasoningPolicyNextScenario requests per second

throughput
10
count
Adjustable
ListAgentAliases requests per second

ListAgentAliases requests per second

throughput
10
count
Fixed
Minimum number of records per batch inference job for Nova Pro V1

Minimum number of records per batch inference job for Nova Pro V1

general
100
count
Fixed
Records per input file per batch inference job for NVIDIA Nemotron Nano 3 30B

Records per input file per batch inference job for NVIDIA Nemotron Nano 3 30B

storage
100,000
count
Adjustable
Batch inference job size (in GB) for Llama 3.3 70B Instruct

Batch inference job size (in GB) for Llama 3.3 70B Instruct

storage
5
count
Fixed
Records per input file per batch inference job for Llama 3.2 1B Instruct

Records per input file per batch inference job for Llama 3.2 1B Instruct

storage
100,000
count
Adjustable
Global cross-region model inference tokens per day for Anthropic Claude Sonnet 4.5 V1

Global cross-region model inference tokens per day for Anthropic Claude Sonnet 4.5 V1

general
7,200,000,000
count
Fixed
Records per input file per batch inference job for Claude 3.5 Sonnet

Records per input file per batch inference job for Claude 3.5 Sonnet

storage
100,000
count
Adjustable
Cross-region model inference requests per minute for Anthropic Claude Sonnet 4 V1

Cross-region model inference requests per minute for Anthropic Claude Sonnet 4 V1

throughput
200
count
Adjustable
Global cross-region model inference requests per minute for Anthropic Claude Sonnet 4 V1

Global cross-region model inference requests per minute for Anthropic Claude Sonnet 4 V1

throughput
200
count
Adjustable
Cross-region model inference tokens per minute for Anthropic Claude Opus 4.1

Cross-region model inference tokens per minute for Anthropic Claude Opus 4.1

general
500,000
count
Adjustable
Minimum number of records per batch inference job for Llama 4 Maverick

Minimum number of records per batch inference job for Llama 4 Maverick

general
100
count
Fixed
Records per input file per batch inference job for Claude 3.5 Haiku

Records per input file per batch inference job for Claude 3.5 Haiku

storage
100,000
count
Adjustable
(Knowledge Bases) Concurrent ingestion jobs per account

(Knowledge Bases) Concurrent ingestion jobs per account

compute
5
count
Fixed
(Guardrails) Words per word policy

(Guardrails) Words per word policy

identity
10,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Nova Pro V1

Sum of in-progress and submitted batch inference jobs using a base model for Nova Pro V1

general
100
count
Adjustable
Model units per provisioned model for Amazon Titan Image Generator G2

Model units per provisioned model for Amazon Titan Image Generator G2

general
0
count
Adjustable
Batch inference job size (in GB) for Mistral Large 2 (24.07)

Batch inference job size (in GB) for Mistral Large 2 (24.07)

storage
5
count
Fixed
Model units per provisioned model for AI21 Labs Jurassic-2 Ultra

Model units per provisioned model for AI21 Labs Jurassic-2 Ultra

general
0
count
Adjustable
Model invocation max tokens per day for Mistral AI Mistral 7B Instruct (doubled for cross-region calls)

Model invocation max tokens per day for Mistral AI Mistral 7B Instruct (doubled for cross-region calls)

general
432,000,000
count
Fixed
Records per batch inference job for Claude 3 Haiku

Records per batch inference job for Claude 3 Haiku

general
100,000
count
Adjustable
(Model customization) Maximum input file size for distillation customization jobs

(Model customization) Maximum input file size for distillation customization jobs

storage
2
Gigabytes
Fixed
(Evaluation) Number of evaluation jobs

(Evaluation) Number of evaluation jobs

general
5,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Claude 3.5 Sonnet v2

Sum of in-progress and submitted batch inference jobs using a base model for Claude 3.5 Sonnet v2

general
100
count
Adjustable
On-demand model inference tokens per minute for Meta Llama 2 Chat 70B

On-demand model inference tokens per minute for Meta Llama 2 Chat 70B

general
300,000
count
Fixed
Records per input file per batch inference job for Titan Multimodal Embeddings G1

Records per input file per batch inference job for Titan Multimodal Embeddings G1

storage
100,000
count
Adjustable
PrepareAgent requests per second

PrepareAgent requests per second

throughput
2
count
Fixed
Cross-region model inference requests per minute for Meta Llama 4 Maverick V1

Cross-region model inference requests per minute for Meta Llama 4 Maverick V1

throughput
800
count
Fixed
On-demand model inference tokens per minute for Anthropic Claude 3.5 Sonnet

On-demand model inference tokens per minute for Anthropic Claude 3.5 Sonnet

general
400,000
count
Fixed
On-demand model inference requests per minute for Moonshot AI Kimi K2.5

On-demand model inference requests per minute for Moonshot AI Kimi K2.5

throughput
10,000
count
Fixed
Throttle rate limit for CreateDataAutomationProject

Throttle rate limit for CreateDataAutomationProject

throughput
5
count
Fixed
(Data Automation) InvokeDataAutomationAsync - Max number of open jobs

(Data Automation) InvokeDataAutomationAsync - Max number of open jobs

general
1,800
count
Fixed
Model units, with commitment, for Provisioned Throughput created for Meta Maverick 4 Scout 17B Instruct 1M

Model units, with commitment, for Provisioned Throughput created for Meta Maverick 4 Scout 17B Instruct 1M

general
0
count
Adjustable
Records per batch inference job for Kimi K2.5

Records per batch inference job for Kimi K2.5

general
100,000
count
Adjustable
Cross-region model inference requests per minute for Amazon Nova Premier V1

Cross-region model inference requests per minute for Amazon Nova Premier V1

throughput
500
count
Fixed
Model units per provisioned model for Meta Llama 3 8B Instruct

Model units per provisioned model for Meta Llama 3 8B Instruct

general
0
count
Adjustable
Records per batch inference job for Llama 3.2 11B Instruct

Records per batch inference job for Llama 3.2 11B Instruct

general
100,000
count
Adjustable
On-demand model inference tokens per minute for Meta Llama 3 8B Instruct

On-demand model inference tokens per minute for Meta Llama 3 8B Instruct

general
300,000
count
Fixed
Batch inference job size (in GB) for Llama 3.1 405B Instruct

Batch inference job size (in GB) for Llama 3.1 405B Instruct

storage
5
count
Fixed
Minimum number of records per batch inference job for Gemma 3 4B

Minimum number of records per batch inference job for Gemma 3 4B

general
100
count
Fixed
Throttle rate limit for ListDataAutomationProjects

Throttle rate limit for ListDataAutomationProjects

throughput
5
count
Fixed
(Flows) UpdateFlowAlias requests per second

(Flows) UpdateFlowAlias requests per second

throughput
2
count
Fixed
Cross-region model inference tokens per minute for Anthropic Claude Sonnet 4 V1 1M Context Length

Cross-region model inference tokens per minute for Anthropic Claude Sonnet 4 V1 1M Context Length

general
1,000,000
count
Adjustable
(Model customization) Maximum number of training records for an Amazon Nova Canvas Fine-tuning job

(Model customization) Maximum number of training records for an Amazon Nova Canvas Fine-tuning job

general
10,000
count
Adjustable
(Model customization) Sum of training and validation records for a Meta Llama 2 70B v1 Fine-tuning job

(Model customization) Sum of training and validation records for a Meta Llama 2 70B v1 Fine-tuning job

general
10,000
count
Adjustable
On-demand model inference tokens per minute for MiniMax M2.5

On-demand model inference tokens per minute for MiniMax M2.5

general
100,000,000
count
Fixed
On-demand model inference requests per minute for Stable Image Conservative Upscale

On-demand model inference requests per minute for Stable Image Conservative Upscale

throughput
2
count
Fixed
Batch inference job size (in GB) for NVIDIA Nemotron Nano 12B

Batch inference job size (in GB) for NVIDIA Nemotron Nano 12B

storage
5
count
Fixed
Records per input file per batch inference job for Qwen3 VL 235B

Records per input file per batch inference job for Qwen3 VL 235B

storage
100,000
count
Adjustable
Records per batch inference job for Nova 2 Lite

Records per batch inference job for Nova 2 Lite

general
100,000
count
Adjustable
Cross-region model inference tokens per minute for Anthropic Claude Sonnet 4 V1

Cross-region model inference tokens per minute for Anthropic Claude Sonnet 4 V1

general
200,000
count
Adjustable
Model units per provisioned model for Anthropic Claude 3.5 Haiku 64K

Model units per provisioned model for Anthropic Claude 3.5 Haiku 64K

general
0
count
Adjustable
Minimum number of records per batch inference job for Claude Opus 4.6

Minimum number of records per batch inference job for Claude Opus 4.6

general
100
count
Fixed
GetAgent requests per second

GetAgent requests per second

throughput
15
count
Fixed
Records per batch inference job for DeepSeek V3.2

Records per batch inference job for DeepSeek V3.2

general
100,000
count
Adjustable
ListAgents requests per second

ListAgents requests per second

throughput
10
count
Fixed
Batch inference input file size (in GB) for Llama 3.2 3B Instruct

Batch inference input file size (in GB) for Llama 3.2 3B Instruct

storage
1
count
Fixed
(Data Automation) Minimum Audio Sample Rate (Hz)

(Data Automation) Minimum Audio Sample Rate (Hz)

throughput
8,000
count
Fixed
Model invocation max tokens per day for Amazon Nova Premier V1 (doubled for cross-region calls)

Model invocation max tokens per day for Amazon Nova Premier V1 (doubled for cross-region calls)

general
1,440,000,000
count
Fixed
On-demand latency-optimized model inference requests per minute for Anthropic Claude 3.5 Haiku

On-demand latency-optimized model inference requests per minute for Anthropic Claude 3.5 Haiku

throughput
100
count
Fixed
Model units per provisioned model for Anthropic Claude V2 18K

Model units per provisioned model for Anthropic Claude V2 18K

general
0
count
Adjustable
Records per batch inference job for Qwen3 Coder Next

Records per batch inference job for Qwen3 Coder Next

general
100,000
count
Adjustable
Records per batch inference job for Kimi K2 Thinking

Records per batch inference job for Kimi K2 Thinking

general
100,000
count
Adjustable
(Model customization) Sum of training and validation records for a Meta Llama 2 13B v1 Fine-tuning job

(Model customization) Sum of training and validation records for a Meta Llama 2 13B v1 Fine-tuning job

general
10,000
count
Adjustable
Model invocation max tokens per day for AI21 Labs Jamba 1.5 Mini (doubled for cross-region calls)

Model invocation max tokens per day for AI21 Labs Jamba 1.5 Mini (doubled for cross-region calls)

general
432,000,000
count
Fixed
Batch inference input file size (in GB) for Claude 3.5 Sonnet

Batch inference input file size (in GB) for Claude 3.5 Sonnet

storage
1
count
Fixed
Batch inference job size (in GB) for MiniMax M2.1

Batch inference job size (in GB) for MiniMax M2.1

storage
5
count
Fixed
Cross-region model inference requests per minute for Meta Llama 4 Scout V1

Cross-region model inference requests per minute for Meta Llama 4 Scout V1

throughput
800
count
Fixed
Cross-region model inference requests per minute for Anthropic Claude Opus 4.1

Cross-region model inference requests per minute for Anthropic Claude Opus 4.1

throughput
50
count
Fixed
(Model customization) Sum of on demand custom model deployment requests per minute for Amazon Nova Pro

(Model customization) Sum of on demand custom model deployment requests per minute for Amazon Nova Pro

throughput
200
count
Fixed
Minimum number of records per batch inference job for GLM 5

Minimum number of records per batch inference job for GLM 5

general
100
count
Fixed
On-demand latency-optimized model inference tokens per minute for Anthropic Claude 3.5 Haiku

On-demand latency-optimized model inference tokens per minute for Anthropic Claude 3.5 Haiku

general
500,000
count
Fixed
On-demand InvokeModel concurrent requests for Twelve Labs Marengo

On-demand InvokeModel concurrent requests for Twelve Labs Marengo

compute
30
count
Fixed
Batch inference job size (in GB) for Qwen3 Next 80B

Batch inference job size (in GB) for Qwen3 Next 80B

storage
5
count
Fixed
Batch inference input file size (in GB) for Mistral Small

Batch inference input file size (in GB) for Mistral Small

storage
1
count
Fixed
On-demand model inference requests per minute for Amazon Nova Pro

On-demand model inference requests per minute for Amazon Nova Pro

throughput
250
count
Fixed
Records per batch inference job for Llama 3.3 70B Instruct

Records per batch inference job for Llama 3.3 70B Instruct

general
100,000
count
Adjustable
On-demand model inference requests per minute for Anthropic Claude 3.5 Sonnet V2

On-demand model inference requests per minute for Anthropic Claude 3.5 Sonnet V2

throughput
50
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Amazon Nova 2 Multimodal Embeddings V1

Sum of in-progress and submitted batch inference jobs using a base model for Amazon Nova 2 Multimodal Embeddings V1

general
100
count
Adjustable
Records per input file per batch inference job for Mistral Large 3

Records per input file per batch inference job for Mistral Large 3

storage
100,000
count
Adjustable
Records per input file per batch inference job for DeepSeek V3.2

Records per input file per batch inference job for DeepSeek V3.2

storage
100,000
count
Adjustable
Model invocation max tokens per day for Mistral AI Mixtral 8X7B Instruct (doubled for cross-region calls)

Model invocation max tokens per day for Mistral AI Mixtral 8X7B Instruct (doubled for cross-region calls)

general
432,000,000
count
Fixed
Batch inference job size (in GB) for Claude 3.5 Sonnet

Batch inference job size (in GB) for Claude 3.5 Sonnet

storage
5
count
Fixed
Records per input file per batch inference job for Amazon Nova Premier

Records per input file per batch inference job for Amazon Nova Premier

storage
100,000
count
Adjustable
Model invocation max tokens per day for Anthropic Claude Haiku 4.5 (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude Haiku 4.5 (doubled for cross-region calls)

general
3,600,000,000
count
Fixed
(Guardrails) Automated Reasoning policies per guardrail

(Guardrails) Automated Reasoning policies per guardrail

general
2
count
Fixed
Global cross-region model inference tokens per minute for Anthropic Claude Opus 4.6 V1

Global cross-region model inference tokens per minute for Anthropic Claude Opus 4.6 V1

general
3,000,000
count
Adjustable
Minimum number of records per batch inference job for Devstral 2 123B

Minimum number of records per batch inference job for Devstral 2 123B

general
100
count
Fixed
Records per input file per batch inference job for Claude 3 Sonnet

Records per input file per batch inference job for Claude 3 Sonnet

storage
100,000
count
Adjustable
UpdateAgent requests per second

UpdateAgent requests per second

throughput
4
count
Fixed
(Guardrails) On-demand ApplyGuardrail Denied topic policy text units per second (standard)

(Guardrails) On-demand ApplyGuardrail Denied topic policy text units per second (standard)

identity
200
count
Adjustable
(Data Automation) InvokeDataAutomation(Sync) - Image - Max number of requests

(Data Automation) InvokeDataAutomation(Sync) - Image - Max number of requests

throughput
200
count
Adjustable
(Flows) ListFlows requests per second

(Flows) ListFlows requests per second

throughput
10
count
Fixed
Model invocation max tokens per day for Meta Llama 3.2 90B Instruct (doubled for cross-region calls)

Model invocation max tokens per day for Meta Llama 3.2 90B Instruct (doubled for cross-region calls)

general
432,000,000
count
Fixed
On-demand model inference requests per minute for AI21 Labs Jamba 1.5 Mini

On-demand model inference requests per minute for AI21 Labs Jamba 1.5 Mini

throughput
100
count
Fixed
Cross-region model inference requests per minute for Meta Llama 3.3 70B Instruct

Cross-region model inference requests per minute for Meta Llama 3.3 70B Instruct

throughput
800
count
Fixed
Batch inference input file size (in GB) for Titan Multimodal Embeddings G1

Batch inference input file size (in GB) for Titan Multimodal Embeddings G1

storage
1
count
Fixed
(Advanced Prompt Optimization) Inactive jobs per account

(Advanced Prompt Optimization) Inactive jobs per account

general
5,000
count
Fixed
(Automated Reasoning) UpdateAutomatedReasoningPolicyAnnotations requests per second

(Automated Reasoning) UpdateAutomatedReasoningPolicyAnnotations requests per second

throughput
5
count
Adjustable
Associated knowledge bases per Agent

Associated knowledge bases per Agent

general
2
count
Adjustable
(Automated Reasoning) StartAutomatedReasoningPolicyTestWorkflow requests per second

(Automated Reasoning) StartAutomatedReasoningPolicyTestWorkflow requests per second

throughput
1
count
Adjustable
(Guardrails) Contextual grounding response length in text units

(Guardrails) Contextual grounding response length in text units

general
5
count
Fixed
Minimum number of records per batch inference job for Nova Micro V1

Minimum number of records per batch inference job for Nova Micro V1

general
100
count
Fixed
Cross-region model inference tokens per minute for Anthropic Claude 3.7 Sonnet V1

Cross-region model inference tokens per minute for Anthropic Claude 3.7 Sonnet V1

general
1,000,000
count
Adjustable
Batch inference job size (in GB) for NVIDIA Nemotron 3 Super 120B A12B

Batch inference job size (in GB) for NVIDIA Nemotron 3 Super 120B A12B

storage
5
count
Fixed
Cross-region model inference requests per minute for Anthropic Claude Sonnet 4.5 V1

Cross-region model inference requests per minute for Anthropic Claude Sonnet 4.5 V1

throughput
10,000
count
Adjustable
On-demand model inference requests per minute for Amazon Titan Text Embeddings V2

On-demand model inference requests per minute for Amazon Titan Text Embeddings V2

throughput
6,000
count
Fixed
(Knowledge Bases) GetDataSource requests per second

(Knowledge Bases) GetDataSource requests per second

throughput
10
count
Fixed
(Model customization) Sum of training and validation records for a Titan Image Generator G1 V2 Fine-tuning job

(Model customization) Sum of training and validation records for a Titan Image Generator G1 V2 Fine-tuning job

general
10,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Nova 2 Lite

Sum of in-progress and submitted batch inference jobs using a base model for Nova 2 Lite

general
100
count
Adjustable
(Model customization) Sum of on demand custom model deployment requests per minute for Amazon Nova 2 Lite

(Model customization) Sum of on demand custom model deployment requests per minute for Amazon Nova 2 Lite

throughput
2,000
count
Fixed
Minimum number of records per batch inference job for Qwen3 32B

Minimum number of records per batch inference job for Qwen3 32B

general
100
count
Fixed
(Model customization) Sum of training and validation records for an Amazon Nova Pro Fine-tuning job

(Model customization) Sum of training and validation records for an Amazon Nova Pro Fine-tuning job

general
20,000
count
Adjustable
Records per batch inference job for Claude Sonnet 4

Records per batch inference job for Claude Sonnet 4

general
100,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Claude 3 Opus

Sum of in-progress and submitted batch inference jobs using a base model for Claude 3 Opus

general
100
count
Adjustable
(Automated Reasoning) GetAutomatedReasoningPolicyAnnotations requests per second

(Automated Reasoning) GetAutomatedReasoningPolicyAnnotations requests per second

throughput
10
count
Adjustable
On-demand model inference tokens per minute for OpenAI GPT OSS 120B

On-demand model inference tokens per minute for OpenAI GPT OSS 120B

general
100,000,000
count
Fixed
Batch inference input file size (in GB) for DeepSeek V3.2

Batch inference input file size (in GB) for DeepSeek V3.2

storage
1
count
Fixed
Model invocation max tokens per day for Gemma 3 27B (doubled for cross-region calls)

Model invocation max tokens per day for Gemma 3 27B (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Minimum number of records per batch inference job for Claude 3 Sonnet

Minimum number of records per batch inference job for Claude 3 Sonnet

general
100
count
Fixed
Cross-region model inference requests per minute for Stable Image Creative Upscale

Cross-region model inference requests per minute for Stable Image Creative Upscale

throughput
4
count
Fixed
Minimum number of records per batch inference job for NVIDIA Nemotron Nano 3 30B

Minimum number of records per batch inference job for NVIDIA Nemotron Nano 3 30B

general
100
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for NVIDIA Nemotron 3 Super 120B A12B

Sum of in-progress and submitted batch inference jobs using a base model for NVIDIA Nemotron 3 Super 120B A12B

general
100
count
Adjustable
Model units per provisioned model for the 24k context length variant for Amazon Nova Pro

Model units per provisioned model for the 24k context length variant for Amazon Nova Pro

general
0
count
Adjustable
(Automated Reasoning) Concurrent policy builds per account

(Automated Reasoning) Concurrent policy builds per account

compute
5
count
Fixed
Minimum number of records per batch inference job for Claude 3.7 Sonnet

Minimum number of records per batch inference job for Claude 3.7 Sonnet

general
100
count
Adjustable
(Guardrails) Regex length in characters

(Guardrails) Regex length in characters

general
500
count
Fixed
Model invocation max tokens per day for Ministral 8B 3.0 (doubled for cross-region calls)

Model invocation max tokens per day for Ministral 8B 3.0 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Batch inference input file size (in GB) for Claude Sonnet 4.6

Batch inference input file size (in GB) for Claude Sonnet 4.6

storage
1
count
Fixed
Records per input file per batch inference job for NVIDIA Nemotron 3 Super 120B A12B

Records per input file per batch inference job for NVIDIA Nemotron 3 Super 120B A12B

storage
100,000
count
Adjustable
On-demand model inference requests per minute for Gemma 3 27B

On-demand model inference requests per minute for Gemma 3 27B

throughput
10,000
count
Fixed
(Flows) Inline code nodes per flow

(Flows) Inline code nodes per flow

capacity
5
count
Fixed
Batch inference job size (in GB) for Qwen3 32B

Batch inference job size (in GB) for Qwen3 32B

storage
5
count
Fixed
Batch inference input file size (in GB) for Qwen3 Next 80B

Batch inference input file size (in GB) for Qwen3 Next 80B

storage
1
count
Fixed
(Flows) S3 storage nodes per flow

(Flows) S3 storage nodes per flow

storage
10
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Amazon Nova Premier

Sum of in-progress and submitted batch inference jobs using a base model for Amazon Nova Premier

general
100
count
Adjustable
Batch inference job size (in GB) for Llama 3.2 90B Instruct

Batch inference job size (in GB) for Llama 3.2 90B Instruct

storage
5
count
Fixed
(Model customization) Sum of training and validation records for a Titan Text G1 - Express v1 Continued Pre-Training job

(Model customization) Sum of training and validation records for a Titan Text G1 - Express v1 Continued Pre-Training job

general
100,000
count
Adjustable
Records per batch inference job for NVIDIA Nemotron Nano 3 30B

Records per batch inference job for NVIDIA Nemotron Nano 3 30B

general
100,000
count
Adjustable
Records per batch inference job for NVIDIA Nemotron Nano 12B

Records per batch inference job for NVIDIA Nemotron Nano 12B

general
100,000
count
Adjustable
On-demand model inference requests per minute for Meta Llama 3.2 90B Instruct

On-demand model inference requests per minute for Meta Llama 3.2 90B Instruct

throughput
400
count
Fixed
Minimum number of records per batch inference job for Claude Sonnet 4.5

Minimum number of records per batch inference job for Claude Sonnet 4.5

general
100
count
Fixed
Batch inference job size (in GB) for Kimi K2 Thinking

Batch inference job size (in GB) for Kimi K2 Thinking

storage
5
count
Fixed
Model invocation max tokens per day for AI21 Labs Jamba 1.5 Large (doubled for cross-region calls)

Model invocation max tokens per day for AI21 Labs Jamba 1.5 Large (doubled for cross-region calls)

general
432,000,000
count
Fixed
On-demand model inference tokens per minute for Ministral 8B 3.0

On-demand model inference tokens per minute for Ministral 8B 3.0

general
100,000,000
count
Fixed
Batch inference input file size (in GB) for Claude 3.5 Haiku

Batch inference input file size (in GB) for Claude 3.5 Haiku

storage
1
count
Fixed
Minimum number of records per batch inference job for Nova Lite V1

Minimum number of records per batch inference job for Nova Lite V1

general
100
count
Fixed
On-demand model inference requests per minute for Qwen3 32B V1

On-demand model inference requests per minute for Qwen3 32B V1

throughput
10,000
count
Fixed
Minimum number of records per batch inference job for Qwen3 Coder Next

Minimum number of records per batch inference job for Qwen3 Coder Next

general
100
count
Fixed
(Automated Reasoning) CreateAutomatedReasoningPolicy requests per second

(Automated Reasoning) CreateAutomatedReasoningPolicy requests per second

throughput
5
count
Adjustable
On-demand model inference tokens per minute for AI21 Labs Jamba 1.5 Large

On-demand model inference tokens per minute for AI21 Labs Jamba 1.5 Large

general
300,000
count
Fixed
Records per batch inference job for Qwen3 Coder 30B

Records per batch inference job for Qwen3 Coder 30B

general
100,000
count
Adjustable
Model units per provisioned model for Anthropic Claude 3 Sonnet 28K

Model units per provisioned model for Anthropic Claude 3 Sonnet 28K

general
0
count
Adjustable
(Knowledge Bases) DeleteKnowledgeBaseDocuments requests per second

(Knowledge Bases) DeleteKnowledgeBaseDocuments requests per second

throughput
5
count
Fixed
Model units, with commitment, for Provisioned Throughput created for Meta Maverick 4 Scout 17B Instruct 128K

Model units, with commitment, for Provisioned Throughput created for Meta Maverick 4 Scout 17B Instruct 128K

general
0
count
Adjustable
Model invocation max tokens per day for Anthropic Claude Sonnet 4 V1 1M Context Length (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude Sonnet 4 V1 1M Context Length (doubled for cross-region calls)

general
720,000,000
count
Fixed
Throttle rate limit for Bedrock Data Automation: ListTagsForResource

Throttle rate limit for Bedrock Data Automation: ListTagsForResource

throughput
25
count
Fixed
Model units per provisioned model for Meta Llama 2 Chat 70B

Model units per provisioned model for Meta Llama 2 Chat 70B

general
0
count
Adjustable
Throttle rate limit for Bedrock Data Automation: UntagResource

Throttle rate limit for Bedrock Data Automation: UntagResource

throughput
25
count
Fixed
Records per input file per batch inference job for Claude Haiku 4.5

Records per input file per batch inference job for Claude Haiku 4.5

storage
100,000
count
Adjustable
(Flows) Flow versions per flow

(Flows) Flow versions per flow

general
10
count
Fixed
On-demand model inference requests per minute for Nemotron Nano 3 30B

On-demand model inference requests per minute for Nemotron Nano 3 30B

throughput
10,000
count
Fixed
On-demand model inference tokens per minute for Anthropic Claude 3 Sonnet

On-demand model inference tokens per minute for Anthropic Claude 3 Sonnet

general
1,000,000
count
Fixed
(Data Automation) Maximum Blueprints per Project (Videos)

(Data Automation) Maximum Blueprints per Project (Videos)

general
1
count
Fixed
On-demand model inference tokens per minute for Amazon Nova Lite

On-demand model inference tokens per minute for Amazon Nova Lite

general
4,000,000
count
Fixed
(Data Automation) Maximum JSON Blueprint Size (Characters)

(Data Automation) Maximum JSON Blueprint Size (Characters)

storage
100,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for OpenAI GPT OSS 20b

Sum of in-progress and submitted batch inference jobs using a base model for OpenAI GPT OSS 20b

general
100
count
Adjustable
On-demand model inference requests per minute for Meta Llama 3 70B Instruct

On-demand model inference requests per minute for Meta Llama 3 70B Instruct

throughput
400
count
Fixed
Records per batch inference job for Gemma 3 27B

Records per batch inference job for Gemma 3 27B

general
100,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Devstral 2 123B

Sum of in-progress and submitted batch inference jobs using a base model for Devstral 2 123B

general
100
count
Adjustable
Minimum number of records per batch inference job for MiniMax M2

Minimum number of records per batch inference job for MiniMax M2

general
100
count
Fixed
Minimum number of records per batch inference job for OpenAI GPT OSS 120b

Minimum number of records per batch inference job for OpenAI GPT OSS 120b

general
100
count
Fixed
(Model customization) In-progress custom model deployments

(Model customization) In-progress custom model deployments

general
2
count
Adjustable
Model units per provisioned model for Stability.ai Stable Diffusion XL 0.8

Model units per provisioned model for Stability.ai Stable Diffusion XL 0.8

general
0
count
Adjustable
On-demand model inference tokens per minute for Meta Llama 3 70B Instruct

On-demand model inference tokens per minute for Meta Llama 3 70B Instruct

general
300,000
count
Fixed
Global cross-region model inference tokens per minute for Anthropic Claude Opus 4.7

Global cross-region model inference tokens per minute for Anthropic Claude Opus 4.7

general
15,000,000
count
Adjustable
(Knowledge Bases) ListKnowledgeBaseDocuments requests per second

(Knowledge Bases) ListKnowledgeBaseDocuments requests per second

throughput
5
count
Fixed
Batch inference job size (in GB) for Gemma 3 27B

Batch inference job size (in GB) for Gemma 3 27B

storage
5
count
Fixed
(Model customization) Sum of on demand custom model deployment tokens per day for Amazon Nova 2 Lite

(Model customization) Sum of on demand custom model deployment tokens per day for Amazon Nova 2 Lite

general
5,760,000,000
count
Fixed
(Guardrails) On-demand ApplyGuardrail requests per second

(Guardrails) On-demand ApplyGuardrail requests per second

throughput
50
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for MiniMax M2.1

Sum of in-progress and submitted batch inference jobs using a base model for MiniMax M2.1

general
100
count
Adjustable
Cross-region model inference tokens per minute for Anthropic Claude 3 Opus

Cross-region model inference tokens per minute for Anthropic Claude 3 Opus

general
800,000
count
Adjustable
Records per batch inference job for Claude Haiku 4.5

Records per batch inference job for Claude Haiku 4.5

general
100,000
count
Adjustable
(Model customization) Sum of training and validation records for a Titan Text G1 - Lite v1 Fine-tuning job

(Model customization) Sum of training and validation records for a Titan Text G1 - Lite v1 Fine-tuning job

general
10,000
count
Adjustable
(Automated Reasoning) Versions per policy

(Automated Reasoning) Versions per policy

identity
1,000
count
Fixed
Cross-region model inference requests per minute for Meta Llama 3.2 1B Instruct

Cross-region model inference requests per minute for Meta Llama 3.2 1B Instruct

throughput
1,600
count
Fixed
Model invocation max tokens per day for Z.ai GLM 5 (doubled for cross-region calls)

Model invocation max tokens per day for Z.ai GLM 5 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Batch inference input file size (in GB) for GLM 5

Batch inference input file size (in GB) for GLM 5

storage
1
count
Fixed
Minimum number of records per batch inference job for NVIDIA Nemotron 3 Super 120B A12B

Minimum number of records per batch inference job for NVIDIA Nemotron 3 Super 120B A12B

general
100
count
Fixed
On-demand model inference tokens per minute for Anthropic Claude 3.5 Sonnet V2

On-demand model inference tokens per minute for Anthropic Claude 3.5 Sonnet V2

general
400,000
count
Fixed
Model units per provisioned model for the 128k context length variant for Amazon Nova Micro

Model units per provisioned model for the 128k context length variant for Amazon Nova Micro

general
0
count
Adjustable
Model units per provisioned model for Stability.ai Stable Diffusion XL 1.0

Model units per provisioned model for Stability.ai Stable Diffusion XL 1.0

general
0
count
Adjustable
On-demand latency-optimized model inference requests per minute for Meta Llama 3.1 70B Instruct

On-demand latency-optimized model inference requests per minute for Meta Llama 3.1 70B Instruct

throughput
100
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Nova Micro V1

Sum of in-progress and submitted batch inference jobs using a base model for Nova Micro V1

general
100
count
Adjustable
Cross-region model inference tokens per minute for Anthropic Claude Sonnet 4.6

Cross-region model inference tokens per minute for Anthropic Claude Sonnet 4.6

general
6,000,000
count
Adjustable
Throttle rate limit for CreateBlueprint

Throttle rate limit for CreateBlueprint

throughput
5
count
Fixed
(Knowledge Bases) Ingestion job file size

(Knowledge Bases) Ingestion job file size

storage
50
count
Fixed
(Automated Reasoning) StartAutomatedReasoningPolicyBuildWorkflow requests per second

(Automated Reasoning) StartAutomatedReasoningPolicyBuildWorkflow requests per second

throughput
1
count
Adjustable
On-demand model inference requests per minute for Meta Llama 2 Chat 13B

On-demand model inference requests per minute for Meta Llama 2 Chat 13B

throughput
800
count
Fixed
Records per input file per batch inference job for Ministral 3B

Records per input file per batch inference job for Ministral 3B

storage
100,000
count
Adjustable
(Automated Reasoning) Source document tokens

(Automated Reasoning) Source document tokens

general
122,880
count
Fixed
Global cross-region model inference tokens per minute for Amazon Nova 2 Lite

Global cross-region model inference tokens per minute for Amazon Nova 2 Lite

general
8,000,000
count
Adjustable
On-demand model inference tokens per minute for Amazon Titan Multimodal Embeddings G1

On-demand model inference tokens per minute for Amazon Titan Multimodal Embeddings G1

general
300,000
count
Fixed
On-demand model inference requests per minute for OpenAI GPT OSS 120B

On-demand model inference requests per minute for OpenAI GPT OSS 120B

throughput
10,000
count
Fixed
On-demand model inference requests per minute for Stable Image Search and Replace

On-demand model inference requests per minute for Stable Image Search and Replace

throughput
10
count
Fixed
On-demand model inference requests per minute for Qwen3 Next 80B A3B

On-demand model inference requests per minute for Qwen3 Next 80B A3B

throughput
10,000
count
Fixed
Batch inference job size (in GB) for Claude Haiku 4.5

Batch inference job size (in GB) for Claude Haiku 4.5

storage
5
count
Fixed
Global cross-region model inference tokens per minute for Anthropic Claude Sonnet 4 V1

Global cross-region model inference tokens per minute for Anthropic Claude Sonnet 4 V1

general
200,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Ministral 3B

Sum of in-progress and submitted batch inference jobs using a base model for Ministral 3B

general
100
count
Adjustable
Records per input file per batch inference job for Nova Lite V1

Records per input file per batch inference job for Nova Lite V1

storage
100,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Ministral 3 8B

Sum of in-progress and submitted batch inference jobs using a base model for Ministral 3 8B

general
100
count
Adjustable
Cross-region model inference requests per minute for Meta Llama 3.1 8B Instruct

Cross-region model inference requests per minute for Meta Llama 3.1 8B Instruct

throughput
1,600
count
Fixed
Model invocation max tokens per day for Meta Llama 3.2 11B Instruct (doubled for cross-region calls)

Model invocation max tokens per day for Meta Llama 3.2 11B Instruct (doubled for cross-region calls)

general
432,000,000
count
Fixed
Batch inference input file size (in GB) for Llama 3.2 1B Instruct

Batch inference input file size (in GB) for Llama 3.2 1B Instruct

storage
1
count
Fixed
(Guardrails) Versions per guardrail

(Guardrails) Versions per guardrail

general
20
count
Fixed
On-demand model inference requests per minute for Qwen3 VL 235B A22B

On-demand model inference requests per minute for Qwen3 VL 235B A22B

throughput
10,000
count
Fixed
Cross-region model inference requests per minute for Meta Llama 3.2 3B Instruct

Cross-region model inference requests per minute for Meta Llama 3.2 3B Instruct

throughput
1,600
count
Fixed
(Evaluation) Number of custom metrics

(Evaluation) Number of custom metrics

general
10
count
Fixed
Records per input file per batch inference job for Qwen3 Next 80B

Records per input file per batch inference job for Qwen3 Next 80B

storage
100,000
count
Adjustable
Cross-region model inference requests per minute for Stable Image Conservative Upscale

Cross-region model inference requests per minute for Stable Image Conservative Upscale

throughput
4
count
Fixed
UpdateAgentKnowledgeBase requests per second

UpdateAgentKnowledgeBase requests per second

throughput
4
count
Fixed
(Model customization) Maximum line length for distillation customization jobs

(Model customization) Maximum line length for distillation customization jobs

general
16
Kilobytes
Fixed
Cross-region model inference requests per minute for Anthropic Claude 3 Haiku

Cross-region model inference requests per minute for Anthropic Claude 3 Haiku

throughput
2,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Claude Sonnet 4.6

Sum of in-progress and submitted batch inference jobs using a base model for Claude Sonnet 4.6

general
100
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Qwen3 Coder 30B

Sum of in-progress and submitted batch inference jobs using a base model for Qwen3 Coder 30B

general
100
count
Adjustable
Records per batch inference job for Claude 3.5 Sonnet v2

Records per batch inference job for Claude 3.5 Sonnet v2

general
100,000
count
Adjustable
(Knowledge Bases) Concurrent ingestion jobs per knowledge base

(Knowledge Bases) Concurrent ingestion jobs per knowledge base

compute
1
count
Fixed
Cross-region model inference requests per minute for Cohere Embed V4

Cross-region model inference requests per minute for Cohere Embed V4

throughput
2,000
count
Fixed
Batch inference job size (in GB) for Claude 3.7 Sonnet

Batch inference job size (in GB) for Claude 3.7 Sonnet

storage
5
count
Adjustable
Global cross-region model inference tokens per minute for Cohere Embed V4

Global cross-region model inference tokens per minute for Cohere Embed V4

general
300,000
count
Adjustable
On-demand model inference tokens per minute for Meta Llama 3.2 11B Instruct

On-demand model inference tokens per minute for Meta Llama 3.2 11B Instruct

general
300,000
count
Fixed
On-demand model inference requests per minute for Mistral 7B Instruct

On-demand model inference requests per minute for Mistral 7B Instruct

throughput
800
count
Fixed
Records per input file per batch inference job for Voxtral Mini 3B 2507

Records per input file per batch inference job for Voxtral Mini 3B 2507

storage
100,000
count
Adjustable
Minimum number of records per batch inference job for Claude 3.5 Haiku

Minimum number of records per batch inference job for Claude 3.5 Haiku

general
100
count
Fixed
On-demand model inference requests per minute for AI21 Labs Jurassic-2 Mid

On-demand model inference requests per minute for AI21 Labs Jurassic-2 Mid

throughput
400
count
Fixed
On-demand model inference requests per minute for Anthropic Claude 3 Sonnet

On-demand model inference requests per minute for Anthropic Claude 3 Sonnet

throughput
500
count
Fixed
(Flows) DeleteFlowAlias requests per second

(Flows) DeleteFlowAlias requests per second

throughput
2
count
Fixed
(Flows) Flow aliases per flow

(Flows) Flow aliases per flow

general
10
count
Fixed
(Knowledge Bases) Files to add or update per ingestion job

(Knowledge Bases) Files to add or update per ingestion job

storage
5,000,000
count
Fixed
Records per input file per batch inference job for OpenAI GPT OSS 20b

Records per input file per batch inference job for OpenAI GPT OSS 20b

storage
100,000
count
Adjustable
Model invocation max tokens per day for Ministral 14B 3.0 (doubled for cross-region calls)

Model invocation max tokens per day for Ministral 14B 3.0 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
(Knowledge Bases) IngestKnowledgeBaseDocuments requests per second

(Knowledge Bases) IngestKnowledgeBaseDocuments requests per second

throughput
5
count
Fixed
Model invocation max tokens per day for MiniMax M2.1 (doubled for cross-region calls)

Model invocation max tokens per day for MiniMax M2.1 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Batch inference input file size (in GB) for Kimi K2.5

Batch inference input file size (in GB) for Kimi K2.5

storage
1
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Claude Opus 4.5

Sum of in-progress and submitted batch inference jobs using a base model for Claude Opus 4.5

general
100
count
Adjustable
ListAgentKnowledgeBases requests per second

ListAgentKnowledgeBases requests per second

throughput
10
count
Fixed
(Model customization) Sum of on demand custom model deployment tokens per day for Amazon Nova Micro

(Model customization) Sum of on demand custom model deployment tokens per day for Amazon Nova Micro

general
5,760,000,000
count
Fixed
Records per batch inference job for Titan Text Embeddings V2

Records per batch inference job for Titan Text Embeddings V2

general
100,000
count
Adjustable
On-demand model inference tokens per minute for Mistral Devstral 2 123b

On-demand model inference tokens per minute for Mistral Devstral 2 123b

general
100,000,000
count
Fixed
(Flows) ValidateFlowDefinition requests per second

(Flows) ValidateFlowDefinition requests per second

throughput
2
count
Fixed
Inference profiles per account

Inference profiles per account

storage
1,000
count
Adjustable
On-demand model inference requests per minute for Twelve Labs Marengo

On-demand model inference requests per minute for Twelve Labs Marengo

throughput
100
count
Fixed
CreateAgent requests per second

CreateAgent requests per second

throughput
6
count
Fixed
(Knowledge Bases) Files to ingest per IngestKnowledgeBaseDocuments job

(Knowledge Bases) Files to ingest per IngestKnowledgeBaseDocuments job

storage
25
count
Fixed
Cross-region model inference requests per minute for Amazon Nova 2 Lite

Cross-region model inference requests per minute for Amazon Nova 2 Lite

throughput
2,000
count
Fixed
Minimum number of records per batch inference job for Llama 3.1 8B Instruct

Minimum number of records per batch inference job for Llama 3.1 8B Instruct

general
100
count
Fixed
Minimum number of records per batch inference job for Titan Text Embeddings V2

Minimum number of records per batch inference job for Titan Text Embeddings V2

general
100
count
Fixed
Records per batch inference job for Llama 3.1 70B Instruct

Records per batch inference job for Llama 3.1 70B Instruct

general
100,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Titan Multimodal Embeddings G1

Sum of in-progress and submitted batch inference jobs using a base model for Titan Multimodal Embeddings G1

general
100
count
Adjustable
Batch inference job size (in GB) for Nova Lite V1

Batch inference job size (in GB) for Nova Lite V1

storage
100
count
Fixed
Model units per provisioned model for Amazon Titan Embeddings G1 - Text

Model units per provisioned model for Amazon Titan Embeddings G1 - Text

general
0
count
Adjustable
On-demand InvokeModel concurrent requests for Amazon Nova Sonic

On-demand InvokeModel concurrent requests for Amazon Nova Sonic

compute
20
count
Fixed
Model units per provisioned model for Cohere Embed English

Model units per provisioned model for Cohere Embed English

general
0
count
Adjustable
(Model customization) Sum of on demand custom model deployment tokens per minute for Amazon Nova 2 Lite

(Model customization) Sum of on demand custom model deployment tokens per minute for Amazon Nova 2 Lite

general
4,000,000
count
Fixed
(Knowledge Bases) User query size

(Knowledge Bases) User query size

storage
1,000
count
Fixed
(Flows) GetFlowVersion requests per second

(Flows) GetFlowVersion requests per second

throughput
10
count
Fixed
On-demand model inference tokens per minute for Amazon Nova Pro

On-demand model inference tokens per minute for Amazon Nova Pro

general
1,000,000
count
Fixed
Records per batch inference job for Claude 3.5 Haiku

Records per batch inference job for Claude 3.5 Haiku

general
100,000
count
Adjustable
No-commitment model units for Provisioned Throughput created for base model Amazon Nova 2 Lite V1.0 256K

No-commitment model units for Provisioned Throughput created for base model Amazon Nova 2 Lite V1.0 256K

general
0
count
Fixed
On-demand model inference requests per minute for MiniMax M2.5

On-demand model inference requests per minute for MiniMax M2.5

throughput
10,000
count
Fixed
Model units per provisioned model for Mistral Small

Model units per provisioned model for Mistral Small

general
0
count
Adjustable
(Data Automation) InvokeBlueprintOptimizationAsync - Max number of blueprint optimization concurrent jobs

(Data Automation) InvokeBlueprintOptimizationAsync - Max number of blueprint optimization concurrent jobs

compute
3
count
Adjustable
On-demand model inference requests per minute for Z.ai GLM-4.7 Flash

On-demand model inference requests per minute for Z.ai GLM-4.7 Flash

throughput
10,000
count
Fixed
On-demand model inference tokens per minute for NVIDIA Nemotron Nano 2 VL

On-demand model inference tokens per minute for NVIDIA Nemotron Nano 2 VL

general
100,000,000
count
Fixed
(Knowledge Bases) RetrieveAndGenerateStream requests per second

(Knowledge Bases) RetrieveAndGenerateStream requests per second

throughput
20
count
Fixed
Records per batch inference job for Claude Sonnet 4.6

Records per batch inference job for Claude Sonnet 4.6

general
100,000
count
Adjustable
(Knowledge Bases) Files to delete per ingestion job

(Knowledge Bases) Files to delete per ingestion job

storage
5,000,000
count
Fixed
(Automated Reasoning) ListAutomatedReasoningPolicies requests per second

(Automated Reasoning) ListAutomatedReasoningPolicies requests per second

throughput
5
count
Adjustable
(Data Automation) Maximum number of Blueprints per Start Inference request (Images)

(Data Automation) Maximum number of Blueprints per Start Inference request (Images)

throughput
1
count
Fixed
Batch inference job size (in GB) for NVIDIA Nemotron Nano 9B

Batch inference job size (in GB) for NVIDIA Nemotron Nano 9B

storage
5
count
Fixed
Model invocation max tokens per day for DeepSeek V3.2 (doubled for cross-region calls)

Model invocation max tokens per day for DeepSeek V3.2 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Batch inference job size (in GB) for Nova Micro V1

Batch inference job size (in GB) for Nova Micro V1

storage
5
count
Fixed
(Flows) GetFlowAlias requests per second

(Flows) GetFlowAlias requests per second

throughput
10
count
Fixed
On-demand model inference requests per minute for Amazon Titan Image Generator G1

On-demand model inference requests per minute for Amazon Titan Image Generator G1

throughput
60
count
Fixed
On-demand model inference tokens per minute for Ministral 14B 3.0

On-demand model inference tokens per minute for Ministral 14B 3.0

general
100,000,000
count
Fixed
Records per input file per batch inference job for Gemma 3 27B

Records per input file per batch inference job for Gemma 3 27B

storage
100,000
count
Adjustable
Batch inference input file size (in GB) for OpenAI GPT OSS Safeguard 20b

Batch inference input file size (in GB) for OpenAI GPT OSS Safeguard 20b

storage
1
count
Fixed
Batch inference job size (in GB) for Claude 3 Haiku

Batch inference job size (in GB) for Claude 3 Haiku

storage
5
count
Fixed
(Automated Reasoning) DeleteAutomatedReasoningPolicyBuildWorkflow requests per second

(Automated Reasoning) DeleteAutomatedReasoningPolicyBuildWorkflow requests per second

throughput
5
count
Adjustable
(Automated Reasoning) Types per policy

(Automated Reasoning) Types per policy

identity
50
count
Fixed
(Model customization) Sum of on demand custom model deployment tokens per day for Amazon Nova Pro

(Model customization) Sum of on demand custom model deployment tokens per day for Amazon Nova Pro

general
1,152,000,000
count
Fixed
Records per batch inference job for Voxtral Mini 3B 2507

Records per batch inference job for Voxtral Mini 3B 2507

general
100,000
count
Adjustable
Model invocation max tokens per day for Anthropic Claude Opus 4.1 (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude Opus 4.1 (doubled for cross-region calls)

general
360,000,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Mistral Large 2 (24.07)

Sum of in-progress and submitted batch inference jobs using a base model for Mistral Large 2 (24.07)

general
100
count
Adjustable
Cross-region model inference tokens per minute for Meta Llama 3.2 3B Instruct

Cross-region model inference tokens per minute for Meta Llama 3.2 3B Instruct

general
600,000
count
Adjustable
On-demand model inference requests per minute for Writer Palmyra Vision 7B

On-demand model inference requests per minute for Writer Palmyra Vision 7B

throughput
10,000
count
Fixed
(Automated Reasoning) Variables in policy

(Automated Reasoning) Variables in policy

identity
200
count
Fixed
Model invocation max tokens per day for Qwen3 Coder Next (doubled for cross-region calls)

Model invocation max tokens per day for Qwen3 Coder Next (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Batch inference job size (in GB) for Claude Opus 4.6

Batch inference job size (in GB) for Claude Opus 4.6

storage
5
count
Fixed
(Automated Reasoning) ListAutomatedReasoningPolicyTestResults requests per second

(Automated Reasoning) ListAutomatedReasoningPolicyTestResults requests per second

throughput
5
count
Adjustable
Batch inference job size (in GB) for Claude 3.5 Haiku

Batch inference job size (in GB) for Claude 3.5 Haiku

storage
5
count
Fixed
On-demand model inference requests per minute for Stable Image Inpaint

On-demand model inference requests per minute for Stable Image Inpaint

throughput
10
count
Fixed
(Flows) Flow executions per account

(Flows) Flow executions per account

compute
1,000
count
Adjustable
Cross-region model inference requests per minute for Twelve Labs Marengo

Cross-region model inference requests per minute for Twelve Labs Marengo

throughput
200
count
Fixed
(Prompt management) UpdatePrompt requests per second

(Prompt management) UpdatePrompt requests per second

throughput
2
count
Fixed
Model invocation max tokens per day for DeepSeek R1 V1 (doubled for cross-region calls)

Model invocation max tokens per day for DeepSeek R1 V1 (doubled for cross-region calls)

general
144,000,000
count
Fixed
AssociateAgentKnowledgeBase requests per second

AssociateAgentKnowledgeBase requests per second

throughput
6
count
Fixed
Global cross-region model inference requests per minute for Amazon Nova 2 Omni

Global cross-region model inference requests per minute for Amazon Nova 2 Omni

throughput
2,000
count
Adjustable
(Flows) Prompt nodes per flow

(Flows) Prompt nodes per flow

capacity
20
count
Adjustable
Global cross-region model inference tokens per minute for Amazon Nova 2 Pro Preview

Global cross-region model inference tokens per minute for Amazon Nova 2 Pro Preview

general
1,000,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Gemma 3 27B

Sum of in-progress and submitted batch inference jobs using a base model for Gemma 3 27B

general
100
count
Adjustable
(Data Automation) Maximum video file size (MB)

(Data Automation) Maximum video file size (MB)

storage
10,240
count
Fixed
Throttle rate limit for UpdateDataAutomationProject

Throttle rate limit for UpdateDataAutomationProject

throughput
5
count
Fixed
(Data Automation) Minimum audio length (Milliseconds)

(Data Automation) Minimum audio length (Milliseconds)

general
500
count
Fixed
(Flows) Collector nodes per flow

(Flows) Collector nodes per flow

capacity
1
count
Fixed
Batch inference job size (in GB) for Llama 3.1 70B Instruct

Batch inference job size (in GB) for Llama 3.1 70B Instruct

storage
5
count
Fixed
Global cross-region model inference tokens per day for Anthropic Claude Sonnet 4.6

Global cross-region model inference tokens per day for Anthropic Claude Sonnet 4.6

general
8,640,000,000
count
Fixed
Batch inference input file size (in GB) for MiniMax M2

Batch inference input file size (in GB) for MiniMax M2

storage
1
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Gemma 3 12B

Sum of in-progress and submitted batch inference jobs using a base model for Gemma 3 12B

general
100
count
Adjustable
Model invocation max tokens per day for Amazon Nova 2 Omni (doubled for cross-region calls)

Model invocation max tokens per day for Amazon Nova 2 Omni (doubled for cross-region calls)

general
5,760,000,000
count
Fixed
On-demand model inference tokens per minute for Qwen3 Coder Next

On-demand model inference tokens per minute for Qwen3 Coder Next

general
100,000,000
count
Fixed
Model invocation max tokens per day for Amazon Nova Lite (doubled for cross-region calls)

Model invocation max tokens per day for Amazon Nova Lite (doubled for cross-region calls)

general
5,760,000,000
count
Fixed
(Prompt management) GetPrompt requests per second

(Prompt management) GetPrompt requests per second

throughput
10
count
Fixed
(Model customization) Sum of on demand custom model deployment tokens per day for Amazon Nova Lite

(Model customization) Sum of on demand custom model deployment tokens per day for Amazon Nova Lite

general
5,760,000,000
count
Fixed
(Data Automation) Maximum Levels of Field Hierarchy

(Data Automation) Maximum Levels of Field Hierarchy

general
1
count
Fixed
Batch inference input file size (in GB) for Claude 3 Haiku

Batch inference input file size (in GB) for Claude 3 Haiku

storage
1
count
Fixed
Cross-region model inference tokens per minute for Anthropic Claude Opus 4 V1

Cross-region model inference tokens per minute for Anthropic Claude Opus 4 V1

general
200,000
count
Adjustable
Minimum number of records per batch inference job for Mistral Large 2 (24.07)

Minimum number of records per batch inference job for Mistral Large 2 (24.07)

general
100
count
Fixed
Model units per provisioned model for Meta Llama 2 70B

Model units per provisioned model for Meta Llama 2 70B

general
0
count
Adjustable
Minimum number of records per batch inference job for Claude 3 Haiku

Minimum number of records per batch inference job for Claude 3 Haiku

general
100
count
Fixed
Minimum number of records per batch inference job for Claude 3.5 Sonnet

Minimum number of records per batch inference job for Claude 3.5 Sonnet

general
100
count
Fixed
Minimum number of records per batch inference job for Amazon Nova 2 Multimodal Embeddings V1

Minimum number of records per batch inference job for Amazon Nova 2 Multimodal Embeddings V1

general
100
count
Fixed
Global cross-region model inference requests per minute for Amazon Nova 2 Pro Preview

Global cross-region model inference requests per minute for Amazon Nova 2 Pro Preview

throughput
100
count
Adjustable
Model units per provisioned model for Cohere Command R

Model units per provisioned model for Cohere Command R

general
0
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Gemma 3 4B

Sum of in-progress and submitted batch inference jobs using a base model for Gemma 3 4B

general
100
count
Adjustable
On-demand model inference requests per minute for Twelve Labs Pegasus

On-demand model inference requests per minute for Twelve Labs Pegasus

throughput
60
count
Adjustable
Batch inference job size (in GB) for GLM 4.7

Batch inference job size (in GB) for GLM 4.7

storage
5
count
Fixed
On-demand model inference tokens per minute for Mistral AI Mistral Small

On-demand model inference tokens per minute for Mistral AI Mistral Small

general
300,000
count
Fixed
(Model customization) Maximum student model fine tuning context length for Amazon Nova Micro V1 distillation customization jobs

(Model customization) Maximum student model fine tuning context length for Amazon Nova Micro V1 distillation customization jobs

general
32,000
count
Fixed
On-demand model inference tokens per minute for GPT OSS Safeguard 120B

On-demand model inference tokens per minute for GPT OSS Safeguard 120B

general
100,000,000
count
Fixed
Cross-region model inference requests per minute for TwelveLabs Marengo Embed 3.0

Cross-region model inference requests per minute for TwelveLabs Marengo Embed 3.0

throughput
1,000
count
Adjustable
Model units per provisioned model for Meta Llama 2 13B

Model units per provisioned model for Meta Llama 2 13B

general
0
count
Adjustable
On-demand model inference tokens per minute for DeepSeek V3.2

On-demand model inference tokens per minute for DeepSeek V3.2

general
100,000,000
count
Fixed
On-demand model inference tokens per minute for Meta Llama 3.1 8B Instruct

On-demand model inference tokens per minute for Meta Llama 3.1 8B Instruct

general
300,000
count
Fixed
Batch inference input file size (in GB) for Llama 3.1 405B Instruct

Batch inference input file size (in GB) for Llama 3.1 405B Instruct

storage
1
count
Fixed
On-demand model inference requests per minute for Stable Image Fast Upscale

On-demand model inference requests per minute for Stable Image Fast Upscale

throughput
10
count
Fixed
On-demand model inference requests per minute for Stability.ai Stable Diffusion XL 0.8

On-demand model inference requests per minute for Stability.ai Stable Diffusion XL 0.8

throughput
60
count
Fixed
UpdateAgentAlias requests per second

UpdateAgentAlias requests per second

throughput
2
count
Fixed
(Guardrails) Regex entities in Sensitive Information Filter

(Guardrails) Regex entities in Sensitive Information Filter

general
30
count
Fixed
Cross-region model inference requests per minute for Amazon Nova Pro

Cross-region model inference requests per minute for Amazon Nova Pro

throughput
500
count
Fixed
On-demand model inference tokens per minute for Gemma 3 12B

On-demand model inference tokens per minute for Gemma 3 12B

general
100,000,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Claude Sonnet 4

Sum of in-progress and submitted batch inference jobs using a base model for Claude Sonnet 4

general
100
count
Adjustable
Model invocation max tokens per day for Anthropic Claude 3.5 Sonnet V1 (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude 3.5 Sonnet V1 (doubled for cross-region calls)

general
2,880,000,000
count
Fixed
Records per input file per batch inference job for Llama 4 Scout

Records per input file per batch inference job for Llama 4 Scout

storage
100,000
count
Adjustable
Model invocation max tokens per day for Meta Llama 4 Scout V1 (doubled for cross-region calls)

Model invocation max tokens per day for Meta Llama 4 Scout V1 (doubled for cross-region calls)

general
432,000,000
count
Fixed
(Data Automation) Maximum document file size (MB)

(Data Automation) Maximum document file size (MB)

storage
500
count
Fixed
(Automated Reasoning) Policies per account

(Automated Reasoning) Policies per account

general
100
count
Fixed
Cross-region model inference tokens per minute for Amazon Nova Pro

Cross-region model inference tokens per minute for Amazon Nova Pro

general
2,000,000
count
Adjustable
Model invocation max tokens per day for Qwen3 Next 80B A3B (doubled for cross-region calls)

Model invocation max tokens per day for Qwen3 Next 80B A3B (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Cross-region model inference requests per minute for Amazon Nova 2 Omni

Cross-region model inference requests per minute for Amazon Nova 2 Omni

throughput
2,000
count
Fixed
(Flows) Condition nodes per flow

(Flows) Condition nodes per flow

capacity
5
count
Fixed
(Automated Reasoning) CreateAutomatedReasoningPolicyTestCase requests per second

(Automated Reasoning) CreateAutomatedReasoningPolicyTestCase requests per second

throughput
5
count
Adjustable
(Flows) PrepareFlow requests per second

(Flows) PrepareFlow requests per second

throughput
2
count
Fixed
Batch inference job size (in GB) for Ministral 3 14B

Batch inference job size (in GB) for Ministral 3 14B

storage
5
count
Fixed
Throttle rate limit for Bedrock Data Automation Runtime: ListTagsForResource

Throttle rate limit for Bedrock Data Automation Runtime: ListTagsForResource

throughput
25
count
Fixed
(Automated Reasoning) ListAutomatedReasoningPolicyTestCases requests per second

(Automated Reasoning) ListAutomatedReasoningPolicyTestCases requests per second

throughput
5
count
Adjustable
(Guardrails) On-demand ApplyGuardrail Denied topic policy text units per second

(Guardrails) On-demand ApplyGuardrail Denied topic policy text units per second

identity
50
count
Adjustable
(Data Automation) Maximum image file size (MB)

(Data Automation) Maximum image file size (MB)

storage
5
count
Fixed
Records per input file per batch inference job for Claude 3.7 Sonnet

Records per input file per batch inference job for Claude 3.7 Sonnet

storage
100,000
count
Adjustable
Minimum number of records per batch inference job for Mistral Small

Minimum number of records per batch inference job for Mistral Small

general
100
count
Fixed
Batch inference job size (in GB) for Llama 3.2 1B Instruct

Batch inference job size (in GB) for Llama 3.2 1B Instruct

storage
5
count
Fixed
Records per input file per batch inference job for Nova Micro V1

Records per input file per batch inference job for Nova Micro V1

storage
100,000
count
Adjustable
On-demand model inference tokens per minute for OpenAI GPT OSS 20B

On-demand model inference tokens per minute for OpenAI GPT OSS 20B

general
100,000,000
count
Fixed
On-demand model inference requests per minute for Ministral 14B 3.0

On-demand model inference requests per minute for Ministral 14B 3.0

throughput
10,000
count
Fixed
(Evaluation) Number of datasets per job

(Evaluation) Number of datasets per job

general
5
count
Fixed
Cross-region model inference tokens per minute for Anthropic Claude 3 Haiku

Cross-region model inference tokens per minute for Anthropic Claude 3 Haiku

general
4,000,000
count
Adjustable
Model invocation max tokens per day for Anthropic Claude Opus 4 V1 (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude Opus 4 V1 (doubled for cross-region calls)

general
144,000,000
count
Fixed
Records per batch inference job for OpenAI GPT OSS Safeguard 20b

Records per batch inference job for OpenAI GPT OSS Safeguard 20b

general
100,000
count
Adjustable
On-demand InvokeModel concurrent requests for Amazon Nova Reel 1.1

On-demand InvokeModel concurrent requests for Amazon Nova Reel 1.1

compute
3
count
Fixed
Model invocation max tokens per day for GPT OSS Safeguard 120B (doubled for cross-region calls)

Model invocation max tokens per day for GPT OSS Safeguard 120B (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
On-demand model inference requests per minute for Meta Llama 2 70B

On-demand model inference requests per minute for Meta Llama 2 70B

throughput
400
count
Fixed
Batch inference input file size (in GB) for GLM 4.7

Batch inference input file size (in GB) for GLM 4.7

storage
1
count
Fixed
Batch inference input file size (in GB) for Nova Micro V1

Batch inference input file size (in GB) for Nova Micro V1

storage
1
count
Fixed
Cross-region model inference requests per minute for Stable Image Inpaint

Cross-region model inference requests per minute for Stable Image Inpaint

throughput
20
count
Fixed
Model invocation max tokens per day for Mistral Large 3 (doubled for cross-region calls)

Model invocation max tokens per day for Mistral Large 3 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Records per batch inference job for Claude 3 Sonnet

Records per batch inference job for Claude 3 Sonnet

general
100,000
count
Adjustable
On-demand model inference tokens per minute for Qwen3 VL 235B A22B

On-demand model inference tokens per minute for Qwen3 VL 235B A22B

general
100,000,000
count
Fixed
Model units per provisioned model for Amazon Titan Text G1 - Express 8K

Model units per provisioned model for Amazon Titan Text G1 - Express 8K

general
0
count
Adjustable
Batch inference job size (in GB) for Titan Multimodal Embeddings G1

Batch inference job size (in GB) for Titan Multimodal Embeddings G1

storage
5
count
Fixed
(Evaluation) Number of metrics per dataset

(Evaluation) Number of metrics per dataset

general
3
count
Fixed
On-demand model inference requests per minute for Cohere Embed V4

On-demand model inference requests per minute for Cohere Embed V4

throughput
1,000
count
Fixed
Batch inference job size (in GB) for Claude Sonnet 4

Batch inference job size (in GB) for Claude Sonnet 4

storage
5
count
Adjustable
DeleteAgentActionGroup requests per second

DeleteAgentActionGroup requests per second

throughput
2
count
Fixed
(Knowledge Bases) Maximum number of files for BDA parser

(Knowledge Bases) Maximum number of files for BDA parser

storage
1,000
count
Fixed
(Knowledge Bases) ListDataSources requests per second

(Knowledge Bases) ListDataSources requests per second

throughput
10
count
Fixed
(Knowledge Bases) CreateKnowledgeBase requests per second

(Knowledge Bases) CreateKnowledgeBase requests per second

throughput
2
count
Fixed
Model invocation max tokens per day for OpenAI GPT OSS 20B (doubled for cross-region calls)

Model invocation max tokens per day for OpenAI GPT OSS 20B (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Batch inference job size (in GB) for Claude 3.5 Sonnet v2

Batch inference job size (in GB) for Claude 3.5 Sonnet v2

storage
5
count
Fixed
Model invocation max tokens per day for Moonshot AI Kimi K2.5 (doubled for cross-region calls)

Model invocation max tokens per day for Moonshot AI Kimi K2.5 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Batch inference input file size (in GB) for Llama 4 Scout

Batch inference input file size (in GB) for Llama 4 Scout

storage
1
count
Fixed
Model invocation max tokens per day for Meta Llama 3.2 3B Instruct (doubled for cross-region calls)

Model invocation max tokens per day for Meta Llama 3.2 3B Instruct (doubled for cross-region calls)

general
432,000,000
count
Fixed
Batch inference input file size (in GB) for Claude 3.7 Sonnet

Batch inference input file size (in GB) for Claude 3.7 Sonnet

storage
1
count
Adjustable
(Model customization) Sum of on demand custom model deployment tokens per minute for Amazon Nova Lite

(Model customization) Sum of on demand custom model deployment tokens per minute for Amazon Nova Lite

general
4,000,000
count
Fixed
Records per input file per batch inference job for Ministral 3 8B

Records per input file per batch inference job for Ministral 3 8B

storage
100,000
count
Adjustable
Records per batch inference job for Magistral Small 2509

Records per batch inference job for Magistral Small 2509

general
100,000
count
Adjustable
Batch inference input file size (in GB) for OpenAI GPT OSS 120b

Batch inference input file size (in GB) for OpenAI GPT OSS 120b

storage
1
count
Fixed
Batch inference job size (in GB) for Llama 4 Scout

Batch inference job size (in GB) for Llama 4 Scout

storage
5
count
Fixed
Cross-region model inference requests per minute for Anthropic Claude 3.5 Haiku

Cross-region model inference requests per minute for Anthropic Claude 3.5 Haiku

throughput
2,000
count
Fixed
On-demand model inference tokens per minute for Moonshot AI Kimi K2.5

On-demand model inference tokens per minute for Moonshot AI Kimi K2.5

general
100,000,000
count
Fixed
On-demand, latency-optimized model inference requests per minute for Amazon Nova Pro V1

On-demand, latency-optimized model inference requests per minute for Amazon Nova Pro V1

throughput
10
count
Fixed
Throttle rate limit for Bedrock Data Automation Runtime: TagResource

Throttle rate limit for Bedrock Data Automation Runtime: TagResource

throughput
25
count
Fixed
Model units per provisioned model for Meta Llama 2 Chat 13B

Model units per provisioned model for Meta Llama 2 Chat 13B

general
0
count
Adjustable
Model units per provisioned model for Amazon Titan Image Generator G1

Model units per provisioned model for Amazon Titan Image Generator G1

general
0
count
Adjustable
On-demand model inference tokens per minute for Anthropic Claude 3.5 Haiku

On-demand model inference tokens per minute for Anthropic Claude 3.5 Haiku

general
2,000,000
count
Fixed
Records per batch inference job for Ministral 3 14B

Records per batch inference job for Ministral 3 14B

general
100,000
count
Adjustable
Records per batch inference job for OpenAI GPT OSS Safeguard 120b

Records per batch inference job for OpenAI GPT OSS Safeguard 120b

general
100,000
count
Adjustable
(Model customization) Sum of on demand custom model deployment tokens per minute for Amazon Nova Pro

(Model customization) Sum of on demand custom model deployment tokens per minute for Amazon Nova Pro

general
800,000
count
Fixed
Global cross-region model inference tokens per minute for Anthropic Claude Sonnet 4.5 V1

Global cross-region model inference tokens per minute for Anthropic Claude Sonnet 4.5 V1

general
5,000,000
count
Adjustable
(Guardrails) On-demand ApplyGuardrail Sensitive information filter policy text units per second

(Guardrails) On-demand ApplyGuardrail Sensitive information filter policy text units per second

identity
200
count
Adjustable
Minimum number of records per batch inference job for NVIDIA Nemotron Nano 12B

Minimum number of records per batch inference job for NVIDIA Nemotron Nano 12B

general
100
count
Fixed
On-demand model inference requests per minute for Amazon Nova 2 Multimodal Embeddings V1

On-demand model inference requests per minute for Amazon Nova 2 Multimodal Embeddings V1

throughput
2,000
count
Fixed
Minimum number of records per batch inference job for Voxtral Small 24B 2507

Minimum number of records per batch inference job for Voxtral Small 24B 2507

general
100
count
Fixed
(Data Automation) Maximum audio file size (MB)

(Data Automation) Maximum audio file size (MB)

storage
2,048
count
Fixed
(Knowledge Bases) UpdateKnowledgeBase requests per second

(Knowledge Bases) UpdateKnowledgeBase requests per second

throughput
2
count
Fixed
(Automated Reasoning) CancelAutomatedReasoningPolicyBuildWorkflow requests per second

(Automated Reasoning) CancelAutomatedReasoningPolicyBuildWorkflow requests per second

throughput
5
count
Adjustable
On-demand InvokeModel concurrent requests for Twelve Labs Pegasus

On-demand InvokeModel concurrent requests for Twelve Labs Pegasus

compute
30
count
Adjustable
Throttle rate limit for ListBlueprints

Throttle rate limit for ListBlueprints

throughput
5
count
Fixed
Minimum number of records per batch inference job for Gemma 3 27B

Minimum number of records per batch inference job for Gemma 3 27B

general
100
count
Fixed
Model units per provisioned model for the 24k context length variant for Amazon Nova Lite

Model units per provisioned model for the 24k context length variant for Amazon Nova Lite

general
0
count
Adjustable
Model invocation max tokens per day for Magistral Small 1.2 (doubled for cross-region calls)

Model invocation max tokens per day for Magistral Small 1.2 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
On-demand model inference requests per minute for Voxtral Mini 1.0

On-demand model inference requests per minute for Voxtral Mini 1.0

throughput
10,000
count
Fixed
Parameters per function

Parameters per function

general
5
count
Adjustable
Minimum number of records per batch inference job for Kimi K2 Thinking

Minimum number of records per batch inference job for Kimi K2 Thinking

general
100
count
Fixed
On-demand model inference tokens per minute for Magistral Small 1.2

On-demand model inference tokens per minute for Magistral Small 1.2

general
100,000,000
count
Fixed
Throttle rate limit for Bedrock Data Automation: TagResource

Throttle rate limit for Bedrock Data Automation: TagResource

throughput
25
count
Fixed
On-demand model inference tokens per minute for Anthropic Claude 3 Opus

On-demand model inference tokens per minute for Anthropic Claude 3 Opus

general
400,000
count
Fixed
On-demand model inference tokens per minute for Mistral AI Mistral Large

On-demand model inference tokens per minute for Mistral AI Mistral Large

general
300,000
count
Fixed
On-demand model inference tokens per minute for Qwen3 Coder 30B a3b V1

On-demand model inference tokens per minute for Qwen3 Coder 30B a3b V1

general
100,000,000
count
Fixed
GetAgentVersion requests per second

GetAgentVersion requests per second

throughput
10
count
Fixed
Cross-region model inference requests per minute for DeepSeek R1 V1

Cross-region model inference requests per minute for DeepSeek R1 V1

throughput
200
count
Fixed
(Data Automation) Maximum Blueprints per Project (Audios)

(Data Automation) Maximum Blueprints per Project (Audios)

general
1
count
Fixed
Batch inference input file size (in GB) for Claude Opus 4.6

Batch inference input file size (in GB) for Claude Opus 4.6

storage
1
count
Fixed
Model invocation max tokens per day for NVIDIA Nemotron 3 Super 120B A12B (doubled for cross-region calls)

Model invocation max tokens per day for NVIDIA Nemotron 3 Super 120B A12B (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
(Data Automation) Maximum Number of pages per document

(Data Automation) Maximum Number of pages per document

general
3,000
count
Fixed
(Automated Reasoning) Values per type in policy

(Automated Reasoning) Values per type in policy

identity
50
count
Fixed
(Knowledge Bases) Concurrent ingestion jobs per data source

(Knowledge Bases) Concurrent ingestion jobs per data source

compute
1
count
Fixed
Batch inference job size (in GB) for Claude 3 Sonnet

Batch inference job size (in GB) for Claude 3 Sonnet

storage
5
count
Fixed
Cross-region model inference requests per minute for Stable Image Style Guide

Cross-region model inference requests per minute for Stable Image Style Guide

throughput
20
count
Fixed
(Knowledge Bases) Rerank requests per second

(Knowledge Bases) Rerank requests per second

throughput
10
count
Fixed
On-demand model inference requests per minute for Stable Image Style Guide

On-demand model inference requests per minute for Stable Image Style Guide

throughput
10
count
Fixed
Global cross-region model inference tokens per day for Anthropic Claude Sonnet 4.5 V1 1M Context Length

Global cross-region model inference tokens per day for Anthropic Claude Sonnet 4.5 V1 1M Context Length

general
1,440,000,000
count
Fixed
Minimum number of records per batch inference job for Claude Opus 4.5

Minimum number of records per batch inference job for Claude Opus 4.5

general
100
count
Fixed
Model units per provisioned model for Anthropic Claude 3.5 Sonnet V2 200K

Model units per provisioned model for Anthropic Claude 3.5 Sonnet V2 200K

general
0
count
Adjustable
Cross-region model inference tokens per minute for Amazon Nova 2 Lite

Cross-region model inference tokens per minute for Amazon Nova 2 Lite

general
8,000,000
count
Adjustable
On-demand model inference tokens per minute for Meta Llama 3.2 1B Instruct

On-demand model inference tokens per minute for Meta Llama 3.2 1B Instruct

general
300,000
count
Fixed
Records per input file per batch inference job for Llama 3.3 70B Instruct

Records per input file per batch inference job for Llama 3.3 70B Instruct

storage
100,000
count
Adjustable
CreateAgentAlias requests per second

CreateAgentAlias requests per second

throughput
2
count
Fixed
DeleteAgentAlias requests per second

DeleteAgentAlias requests per second

throughput
2
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for MiniMax M2

Sum of in-progress and submitted batch inference jobs using a base model for MiniMax M2

general
100
count
Adjustable
Batch inference job size (in GB) for Devstral 2 123B

Batch inference job size (in GB) for Devstral 2 123B

storage
5
count
Fixed
Batch inference input file size (in GB) for Claude Sonnet 4.5

Batch inference input file size (in GB) for Claude Sonnet 4.5

storage
1
count
Fixed
(Guardrails) On-demand ApplyGuardrail Word filter policy text units per second

(Guardrails) On-demand ApplyGuardrail Word filter policy text units per second

identity
200
count
Adjustable
On-demand model inference requests per minute for Z.ai GLM-4.7

On-demand model inference requests per minute for Z.ai GLM-4.7

throughput
10,000
count
Fixed
Minimum number of records per batch inference job for NVIDIA Nemotron Nano 9B

Minimum number of records per batch inference job for NVIDIA Nemotron Nano 9B

general
100
count
Fixed
Minimum number of records per batch inference job for Llama 3.1 70B Instruct

Minimum number of records per batch inference job for Llama 3.1 70B Instruct

general
100
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for NVIDIA Nemotron Nano 12B

Sum of in-progress and submitted batch inference jobs using a base model for NVIDIA Nemotron Nano 12B

general
100
count
Adjustable
Minimum number of records per batch inference job for OpenAI GPT OSS Safeguard 20b

Minimum number of records per batch inference job for OpenAI GPT OSS Safeguard 20b

general
100
count
Fixed
Minimum number of records per batch inference job for Voxtral Mini 3B 2507

Minimum number of records per batch inference job for Voxtral Mini 3B 2507

general
100
count
Fixed
Records per batch inference job for Ministral 3 8B

Records per batch inference job for Ministral 3 8B

general
100,000
count
Adjustable
Cross-region model inference tokens per minute for Anthropic Claude 3 Sonnet

Cross-region model inference tokens per minute for Anthropic Claude 3 Sonnet

general
2,000,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Claude 3 Sonnet

Sum of in-progress and submitted batch inference jobs using a base model for Claude 3 Sonnet

general
100
count
Adjustable
Batch inference input file size (in GB) for MiniMax M2.1

Batch inference input file size (in GB) for MiniMax M2.1

storage
1
count
Fixed
Records per input file per batch inference job for OpenAI GPT OSS 120b

Records per input file per batch inference job for OpenAI GPT OSS 120b

storage
100,000
count
Adjustable
(Guardrails) On-demand ApplyGuardrail contextual grounding policy text units per second

(Guardrails) On-demand ApplyGuardrail contextual grounding policy text units per second

identity
106
count
Adjustable
On-demand model inference requests per minute for Meta Llama 2 13B

On-demand model inference requests per minute for Meta Llama 2 13B

throughput
800
count
Fixed
Cross-region model inference requests per minute for Stable Image Remove Background

Cross-region model inference requests per minute for Stable Image Remove Background

throughput
20
count
Fixed
Model invocation max latency-optimized tokens per day for Amazon Nova Pro V1

Model invocation max latency-optimized tokens per day for Amazon Nova Pro V1

general
57,600,000
count
Fixed
Cross-region model inference tokens per minute for Meta Llama 3.3 70B Instruct

Cross-region model inference tokens per minute for Meta Llama 3.3 70B Instruct

general
600,000
count
Adjustable
(Data Automation) InvokeDataAutomationAsync - Image - Max number of concurrent jobs

(Data Automation) InvokeDataAutomationAsync - Image - Max number of concurrent jobs

compute
20
count
Adjustable
On-demand model inference requests per minute for Ministral 8B 3.0

On-demand model inference requests per minute for Ministral 8B 3.0

throughput
10,000
count
Fixed
Cross-region model inference requests per minute for Anthropic Claude Opus 4.6 V1

Cross-region model inference requests per minute for Anthropic Claude Opus 4.6 V1

throughput
10,000
count
Adjustable
On-demand model inference tokens per minute for Kimi K2 Thinking

On-demand model inference tokens per minute for Kimi K2 Thinking

general
100,000,000
count
Fixed
On-demand model inference requests per minute for Stable Image Control Structure

On-demand model inference requests per minute for Stable Image Control Structure

throughput
10
count
Fixed
Model units per provisioned model for Cohere Command R Plus

Model units per provisioned model for Cohere Command R Plus

general
0
count
Adjustable
On-demand model inference requests per minute for Meta Llama 3.2 11B Instruct

On-demand model inference requests per minute for Meta Llama 3.2 11B Instruct

throughput
400
count
Fixed
On-demand model inference tokens per minute for Qwen3 32B V1

On-demand model inference tokens per minute for Qwen3 32B V1

general
100,000,000
count
Fixed
Records per batch inference job for Llama 4 Maverick

Records per batch inference job for Llama 4 Maverick

general
100,000
count
Adjustable
(Flows) Lex nodes per flow

(Flows) Lex nodes per flow

capacity
5
count
Fixed
Minimum number of records per batch inference job for Gemma 3 12B

Minimum number of records per batch inference job for Gemma 3 12B

general
100
count
Fixed
Throttle rate limit for Bedrock Data Automation Runtime: UntagResource

Throttle rate limit for Bedrock Data Automation Runtime: UntagResource

throughput
25
count
Fixed
(Data Automation) Maximum number of Blueprints per Start Inference request (Videos)

(Data Automation) Maximum number of Blueprints per Start Inference request (Videos)

throughput
1
count
Fixed
(Automated Reasoning) CreateAutomatedReasoningPolicyVersion requests per second

(Automated Reasoning) CreateAutomatedReasoningPolicyVersion requests per second

throughput
5
count
Adjustable
Cross-region model inference requests per minute for Anthropic Claude 3 Sonnet

Cross-region model inference requests per minute for Anthropic Claude 3 Sonnet

throughput
1,000
count
Fixed
APIs per Agent

APIs per Agent

general
11
count
Adjustable
(Prompt management) DeletePrompt requests per second

(Prompt management) DeletePrompt requests per second

throughput
2
count
Fixed
Cross-region model inference requests per minute for Anthropic Claude Sonnet 4.6

Cross-region model inference requests per minute for Anthropic Claude Sonnet 4.6

throughput
10,000
count
Adjustable
On-demand model inference tokens per minute for Cohere Embed V4

On-demand model inference tokens per minute for Cohere Embed V4

general
150,000
count
Fixed
Model invocation max tokens per day for Mistral AI Mistral Large (doubled for cross-region calls)

Model invocation max tokens per day for Mistral AI Mistral Large (doubled for cross-region calls)

general
432,000,000
count
Fixed
(Knowledge Bases) StartIngestionJob requests per second

(Knowledge Bases) StartIngestionJob requests per second

throughput
0.1
count
Fixed
(Data Automation) Maximum Resolution

(Data Automation) Maximum Resolution

general
8,000
count
Fixed
(Automated Reasoning) DeleteAutomatedReasoningPolicy requests per second

(Automated Reasoning) DeleteAutomatedReasoningPolicy requests per second

throughput
5
count
Adjustable
Model invocation max tokens per day for Anthropic Claude Sonnet 4.5 V1 (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude Sonnet 4.5 V1 (doubled for cross-region calls)

general
3,600,000,000
count
Fixed
Records per batch inference job for Amazon Nova 2 Multimodal Embeddings V1

Records per batch inference job for Amazon Nova 2 Multimodal Embeddings V1

general
100,000
count
Adjustable
On-demand model inference requests per minute for Stable Image Outpaint

On-demand model inference requests per minute for Stable Image Outpaint

throughput
2
count
Fixed
Minimum number of records per batch inference job for Claude Sonnet 4

Minimum number of records per batch inference job for Claude Sonnet 4

general
100
count
Adjustable
Imported models per account

Imported models per account

general
3
count
Adjustable
(Guardrails) Contextual grounding query length in text units

(Guardrails) Contextual grounding query length in text units

general
1
count
Fixed
On-demand model inference tokens per minute for Voxtral Small 1.0

On-demand model inference tokens per minute for Voxtral Small 1.0

general
100,000,000
count
Fixed
Cross-region model inference requests per minute for Stable Image Control Structure

Cross-region model inference requests per minute for Stable Image Control Structure

throughput
20
count
Fixed
On-demand model inference tokens per minute for Amazon Titan Text Lite

On-demand model inference tokens per minute for Amazon Titan Text Lite

general
300,000
count
Fixed
Batch inference job size (in GB) for Claude Opus 4.5

Batch inference job size (in GB) for Claude Opus 4.5

storage
5
count
Fixed
Cross-region model inference tokens per minute for Amazon Nova Lite

Cross-region model inference tokens per minute for Amazon Nova Lite

general
8,000,000
count
Adjustable
On-demand model inference requests per minute for Amazon Titan Text Express

On-demand model inference requests per minute for Amazon Titan Text Express

throughput
400
count
Fixed
Batch inference input file size (in GB) for Nova Lite V1

Batch inference input file size (in GB) for Nova Lite V1

storage
1
count
Fixed
Model units per provisioned model for Amazon Titan Lite V1 4K

Model units per provisioned model for Amazon Titan Lite V1 4K

general
0
count
Adjustable
(Data Automation) InvokeDataAutomation(Sync) - Document - Max number of requests

(Data Automation) InvokeDataAutomation(Sync) - Document - Max number of requests

throughput
60
count
Adjustable
(Flows) CreateFlow requests per second

(Flows) CreateFlow requests per second

throughput
2
count
Fixed
Batch inference input file size (in GB) for Qwen3 32B

Batch inference input file size (in GB) for Qwen3 32B

storage
1
count
Fixed
Model invocation max tokens per day for Voxtral Small 1.0 (doubled for cross-region calls)

Model invocation max tokens per day for Voxtral Small 1.0 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Cross-region model inference requests per minute for Amazon Nova Micro

Cross-region model inference requests per minute for Amazon Nova Micro

throughput
4,000
count
Fixed
On-demand model inference requests per minute for Cohere Rerank 3.5

On-demand model inference requests per minute for Cohere Rerank 3.5

throughput
250
count
Fixed
No-commitment model units for Provisioned Throughput created for custom model Amazon Nova 2 Lite V1.0 256K

No-commitment model units for Provisioned Throughput created for custom model Amazon Nova 2 Lite V1.0 256K

general
0
count
Fixed
On-demand model inference tokens per minute for AI21 Labs Jurassic-2 Mid

On-demand model inference tokens per minute for AI21 Labs Jurassic-2 Mid

general
300,000
count
Fixed
Minimum number of records per batch inference job for DeepSeek V3.2

Minimum number of records per batch inference job for DeepSeek V3.2

general
100
count
Fixed
Records per input file per batch inference job for Claude Opus 4.5

Records per input file per batch inference job for Claude Opus 4.5

storage
100,000
count
Adjustable
(Model customization) Sum of training and validation records for a Titan Text G1 - Premier v1 Fine-tuning job

(Model customization) Sum of training and validation records for a Titan Text G1 - Premier v1 Fine-tuning job

general
20,000
count
Adjustable
Cross-region model inference requests per minute for Stable Image Erase Object

Cross-region model inference requests per minute for Stable Image Erase Object

throughput
20
count
Fixed
Throttle rate limit for InvokeDataAutomationAsync

Throttle rate limit for InvokeDataAutomationAsync

throughput
10
count
Fixed
(Automated Reasoning) Concurrent builds per policy

(Automated Reasoning) Concurrent builds per policy

compute
2
count
Fixed
(Flows) Iterator nodes per flow

(Flows) Iterator nodes per flow

capacity
1
count
Fixed
Records per batch inference job for Amazon Nova Premier

Records per batch inference job for Amazon Nova Premier

general
100,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Kimi K2.5

Sum of in-progress and submitted batch inference jobs using a base model for Kimi K2.5

general
100
count
Adjustable
Agents per account

Agents per account

general
1,000
count
Adjustable
Minimum number of records per batch inference job for Qwen3 VL 235B

Minimum number of records per batch inference job for Qwen3 VL 235B

general
100
count
Fixed
On-demand model inference tokens per minute for Voxtral Mini 1.0

On-demand model inference tokens per minute for Voxtral Mini 1.0

general
100,000,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Qwen3 VL 235B

Sum of in-progress and submitted batch inference jobs using a base model for Qwen3 VL 235B

general
100
count
Adjustable
Model units per provisioned model for Anthropic Claude 3.5 Sonnet 51K

Model units per provisioned model for Anthropic Claude 3.5 Sonnet 51K

general
0
count
Adjustable
Batch inference job size (in GB) for Gemma 3 4B

Batch inference job size (in GB) for Gemma 3 4B

storage
5
count
Fixed
Batch inference input file size (in GB) for GLM 4.7 Flash

Batch inference input file size (in GB) for GLM 4.7 Flash

storage
1
count
Fixed
Records per input file per batch inference job for Titan Text Embeddings V2

Records per input file per batch inference job for Titan Text Embeddings V2

storage
100,000
count
Adjustable
Global cross-region model inference tokens per day for Anthropic Claude Opus 4.5

Global cross-region model inference tokens per day for Anthropic Claude Opus 4.5

general
2,880,000,000
count
Fixed
Global cross-region model inference requests per minute for Anthropic Claude Sonnet 4.5 V1 1M Context Length

Global cross-region model inference requests per minute for Anthropic Claude Sonnet 4.5 V1 1M Context Length

throughput
1,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for MiniMax M2.5

Sum of in-progress and submitted batch inference jobs using a base model for MiniMax M2.5

general
100
count
Adjustable
Model units per provisioned model for Anthropic Claude 3.5 Haiku 200K

Model units per provisioned model for Anthropic Claude 3.5 Haiku 200K

general
0
count
Adjustable
Batch inference input file size (in GB) for OpenAI GPT OSS Safeguard 120b

Batch inference input file size (in GB) for OpenAI GPT OSS Safeguard 120b

storage
1
count
Fixed
Cross-region model inference requests per minute for Anthropic Claude Sonnet 4 V1 1M Context Length

Cross-region model inference requests per minute for Anthropic Claude Sonnet 4 V1 1M Context Length

throughput
5
count
Adjustable
Minimum number of records per batch inference job for Llama 4 Scout

Minimum number of records per batch inference job for Llama 4 Scout

general
100
count
Fixed
Batch inference job size (in GB) for OpenAI GPT OSS 20b

Batch inference job size (in GB) for OpenAI GPT OSS 20b

storage
5
count
Fixed
Batch inference job size (in GB) for GLM 5

Batch inference job size (in GB) for GLM 5

storage
5
count
Fixed
On-demand model inference tokens per minute for Ministral 3B 3.0

On-demand model inference tokens per minute for Ministral 3B 3.0

general
100,000,000
count
Fixed
(Model customization) Sum of training and validation records for an Amazon Nova Lite Fine-tuning job

(Model customization) Sum of training and validation records for an Amazon Nova Lite Fine-tuning job

general
20,000
count
Adjustable
Batch inference job size (in GB) for Amazon Nova 2 Multimodal Embeddings V1

Batch inference job size (in GB) for Amazon Nova 2 Multimodal Embeddings V1

storage
100
count
Fixed
Records per input file per batch inference job for MiniMax M2

Records per input file per batch inference job for MiniMax M2

storage
100,000
count
Adjustable
Batch inference input file size (in GB) for Nova Pro V1

Batch inference input file size (in GB) for Nova Pro V1

storage
1
count
Fixed
Records per input file per batch inference job for Claude Sonnet 4.5

Records per input file per batch inference job for Claude Sonnet 4.5

storage
100,000
count
Adjustable
(Automated Reasoning) DeleteAutomatedReasoningPolicyTestCase requests per second

(Automated Reasoning) DeleteAutomatedReasoningPolicyTestCase requests per second

throughput
5
count
Adjustable
Batch inference job size (in GB) for Magistral Small 2509

Batch inference job size (in GB) for Magistral Small 2509

storage
5
count
Fixed
Cross-region model inference tokens per minute for Meta Llama 3.1 8B Instruct

Cross-region model inference tokens per minute for Meta Llama 3.1 8B Instruct

general
600,000
count
Adjustable
On-demand model inference tokens per minute for Amazon Titan Image Generator G1 V2

On-demand model inference tokens per minute for Amazon Titan Image Generator G1 V2

general
2,000
count
Fixed
(Data Automation) Maximum Blueprints per Project (Documents)

(Data Automation) Maximum Blueprints per Project (Documents)

general
40
count
Fixed
Minimum number of records per batch inference job for Titan Multimodal Embeddings G1

Minimum number of records per batch inference job for Titan Multimodal Embeddings G1

general
100
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.1 8B Instruct

Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.1 8B Instruct

general
100
count
Adjustable
Batch inference input file size (in GB) for Writer Palmyra Vision 7B

Batch inference input file size (in GB) for Writer Palmyra Vision 7B

storage
1
count
Fixed
Batch inference job size (in GB) for Voxtral Mini 3B 2507

Batch inference job size (in GB) for Voxtral Mini 3B 2507

storage
5
count
Fixed
Global cross-region model inference requests per minute for Anthropic Claude Haiku 4.5

Global cross-region model inference requests per minute for Anthropic Claude Haiku 4.5

throughput
10,000
count
Adjustable
(Guardrails) Example phrases per Topic

(Guardrails) Example phrases per Topic

general
5
count
Fixed
Batch inference input file size (in GB) for Gemma 3 12B

Batch inference input file size (in GB) for Gemma 3 12B

storage
1
count
Fixed
DeleteAgent requests per second

DeleteAgent requests per second

throughput
2
count
Fixed
(Knowledge Bases) GenerateQuery requests per second

(Knowledge Bases) GenerateQuery requests per second

throughput
2
count
Fixed
Minimum number of records per batch inference job for Llama 3.2 1B Instruct

Minimum number of records per batch inference job for Llama 3.2 1B Instruct

general
100
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.3 70B Instruct

Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.3 70B Instruct

general
100
count
Adjustable
Global cross-region model inference tokens per minute for Anthropic Claude Sonnet 4.6

Global cross-region model inference tokens per minute for Anthropic Claude Sonnet 4.6

general
6,000,000
count
Adjustable
Cross-region model inference tokens per minute for Anthropic Claude Sonnet 4.5 V1

Cross-region model inference tokens per minute for Anthropic Claude Sonnet 4.5 V1

general
5,000,000
count
Adjustable
Batch inference input file size (in GB) for Llama 3.2 11B Instruct

Batch inference input file size (in GB) for Llama 3.2 11B Instruct

storage
1
count
Fixed
Records per batch inference job for GLM 4.7

Records per batch inference job for GLM 4.7

general
100,000
count
Adjustable
On-demand InvokeModel async concurrent requests for Amazon Nova 2 Multimodal Embeddings V1

On-demand InvokeModel async concurrent requests for Amazon Nova 2 Multimodal Embeddings V1

compute
30
count
Fixed
(Automated Reasoning) UpdateAutomatedReasoningPolicyTestCase requests per second

(Automated Reasoning) UpdateAutomatedReasoningPolicyTestCase requests per second

throughput
5
count
Adjustable
Records per input file per batch inference job for Gemma 3 4B

Records per input file per batch inference job for Gemma 3 4B

storage
100,000
count
Adjustable
Cross-region model inference tokens per minute for Meta Llama 4 Scout V1

Cross-region model inference tokens per minute for Meta Llama 4 Scout V1

general
600,000
count
Adjustable
On-demand model inference requests per minute for Gemma 3 4B

On-demand model inference requests per minute for Gemma 3 4B

throughput
10,000
count
Fixed
Batch inference job size (in GB) for Claude Sonnet 4.6

Batch inference job size (in GB) for Claude Sonnet 4.6

storage
5
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Writer Palmyra Vision 7B

Sum of in-progress and submitted batch inference jobs using a base model for Writer Palmyra Vision 7B

general
100
count
Adjustable
(Guardrails) Topics per guardrail

(Guardrails) Topics per guardrail

general
30
count
Fixed
Batch inference job size (in GB) for DeepSeek V3.2

Batch inference job size (in GB) for DeepSeek V3.2

storage
5
count
Fixed
On-demand model inference requests per minute for Cohere Command R Plus

On-demand model inference requests per minute for Cohere Command R Plus

throughput
400
count
Fixed
(Data Automation) Maximum video length (Minutes)

(Data Automation) Maximum video length (Minutes)

general
240
count
Fixed
Cross-region model inference requests per minute for Meta Llama 3.1 70B Instruct

Cross-region model inference requests per minute for Meta Llama 3.1 70B Instruct

throughput
800
count
Fixed
Records per batch inference job for Voxtral Small 24B 2507

Records per batch inference job for Voxtral Small 24B 2507

general
100,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Mistral Large 3

Sum of in-progress and submitted batch inference jobs using a base model for Mistral Large 3

general
100
count
Adjustable
(Guardrails) Guardrails per account

(Guardrails) Guardrails per account

general
100
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Claude Opus 4.6

Sum of in-progress and submitted batch inference jobs using a base model for Claude Opus 4.6

general
100
count
Adjustable
Records per input file per batch inference job for Claude Opus 4.6

Records per input file per batch inference job for Claude Opus 4.6

storage
100,000
count
Adjustable
Global cross-region model inference requests per minute for Anthropic Claude Opus 4.5

Global cross-region model inference requests per minute for Anthropic Claude Opus 4.5

throughput
10,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.2 1B Instruct

Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.2 1B Instruct

general
100
count
Adjustable
Records per batch inference job for Titan Multimodal Embeddings G1

Records per batch inference job for Titan Multimodal Embeddings G1

general
100,000
count
Adjustable
Cross-region model inference requests per minute for Stable Image Control Sketch

Cross-region model inference requests per minute for Stable Image Control Sketch

throughput
20
count
Fixed
On-demand model inference requests per minute for Voxtral Small 1.0

On-demand model inference requests per minute for Voxtral Small 1.0

throughput
10,000
count
Fixed
(Knowledge Bases) DeleteDataSource requests per second

(Knowledge Bases) DeleteDataSource requests per second

throughput
2
count
Fixed
Model invocation max tokens per day for Qwen3 32B V1 (doubled for cross-region calls)

Model invocation max tokens per day for Qwen3 32B V1 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
(Knowledge Bases) ListIngestionJobs requests per second

(Knowledge Bases) ListIngestionJobs requests per second

throughput
10
count
Fixed
Records per input file per batch inference job for Llama 3.2 11B Instruct

Records per input file per batch inference job for Llama 3.2 11B Instruct

storage
100,000
count
Adjustable
Minimum number of records per batch inference job for Qwen3 Coder 30B

Minimum number of records per batch inference job for Qwen3 Coder 30B

general
100
count
Fixed
Records per input file per batch inference job for MiniMax M2.1

Records per input file per batch inference job for MiniMax M2.1

storage
100,000
count
Adjustable
GetAgentAlias requests per second

GetAgentAlias requests per second

throughput
10
count
Fixed
Cross-region model inference tokens per minute for Cohere Embed V4

Cross-region model inference tokens per minute for Cohere Embed V4

general
300,000
count
Adjustable
Records per batch inference job for Mistral Large 2 (24.07)

Records per batch inference job for Mistral Large 2 (24.07)

general
100,000
count
Adjustable
On-demand model inference tokens per minute for Meta Llama 2 70B

On-demand model inference tokens per minute for Meta Llama 2 70B

general
300,000
count
Fixed
(Flows) Input nodes per flow

(Flows) Input nodes per flow

capacity
1
count
Fixed
On-demand model inference requests per minute for Gemma 3 12B

On-demand model inference requests per minute for Gemma 3 12B

throughput
10,000
count
Fixed
Minimum number of records per batch inference job for OpenAI GPT OSS 20b

Minimum number of records per batch inference job for OpenAI GPT OSS 20b

general
100
count
Fixed
On-demand model inference requests per minute for Cohere Command R

On-demand model inference requests per minute for Cohere Command R

throughput
400
count
Fixed
Records per batch inference job for Ministral 3B

Records per batch inference job for Ministral 3B

general
100,000
count
Adjustable
(Evaluation) Number of models in automated model evaluation job

(Evaluation) Number of models in automated model evaluation job

general
1
count
Fixed
On-demand model inference requests per minute for NVIDIA Nemotron Nano 2 VL

On-demand model inference requests per minute for NVIDIA Nemotron Nano 2 VL

throughput
10,000
count
Fixed
Records per input file per batch inference job for Claude 3 Haiku

Records per input file per batch inference job for Claude 3 Haiku

storage
100,000
count
Adjustable
Batch inference input file size (in GB) for Llama 3.2 90B Instruct

Batch inference input file size (in GB) for Llama 3.2 90B Instruct

storage
1
count
Fixed
On-demand model inference requests per minute for Mistral Mixtral 8x7b Instruct

On-demand model inference requests per minute for Mistral Mixtral 8x7b Instruct

throughput
400
count
Fixed
(Flows) Knowledge base nodes per flow

(Flows) Knowledge base nodes per flow

capacity
20
count
Fixed
Records per input file per batch inference job for Writer Palmyra Vision 7B

Records per input file per batch inference job for Writer Palmyra Vision 7B

storage
100,000
count
Adjustable
Model invocation max tokens per day for Qwen3 VL 235B A22B (doubled for cross-region calls)

Model invocation max tokens per day for Qwen3 VL 235B A22B (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Model invocation max tokens per day for Anthropic Claude 3.5 Sonnet V2 (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude 3.5 Sonnet V2 (doubled for cross-region calls)

general
2,880,000,000
count
Fixed
(Knowledge Bases) IngestKnowledgeBaseDocuments total payload size

(Knowledge Bases) IngestKnowledgeBaseDocuments total payload size

storage
6
count
Fixed
On-demand model inference tokens per minute for Mistral AI Mistral 7B Instruct

On-demand model inference tokens per minute for Mistral AI Mistral 7B Instruct

general
300,000
count
Fixed
Records per input file per batch inference job for Mistral Small

Records per input file per batch inference job for Mistral Small

storage
100,000
count
Adjustable
Batch inference input file size (in GB) for Qwen3 Coder Next

Batch inference input file size (in GB) for Qwen3 Coder Next

storage
1
count
Fixed
Batch inference input file size (in GB) for Ministral 3B

Batch inference input file size (in GB) for Ministral 3B

storage
1
count
Fixed
Associated aliases per Agent

Associated aliases per Agent

general
10
count
Fixed
Records per batch inference job for OpenAI GPT OSS 20b

Records per batch inference job for OpenAI GPT OSS 20b

general
100,000
count
Adjustable
(Flows) Lambda function nodes per flow

(Flows) Lambda function nodes per flow

capacity
20
count
Fixed
(Prompt management) CreatePrompt requests per second

(Prompt management) CreatePrompt requests per second

throughput
2
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for NVIDIA Nemotron Nano 3 30B

Sum of in-progress and submitted batch inference jobs using a base model for NVIDIA Nemotron Nano 3 30B

general
100
count
Adjustable
Records per input file per batch inference job for Claude Sonnet 4.6

Records per input file per batch inference job for Claude Sonnet 4.6

storage
100,000
count
Adjustable
On-demand model inference requests per minute for Amazon Nova Lite

On-demand model inference requests per minute for Amazon Nova Lite

throughput
2,000
count
Fixed
On-demand model inference tokens per minute for Gemma 3 27B

On-demand model inference tokens per minute for Gemma 3 27B

general
100,000,000
count
Fixed
Model invocation max tokens per day for Anthropic Claude Sonnet 4.6 (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude Sonnet 4.6 (doubled for cross-region calls)

general
4,320,000,000
count
Fixed
(Knowledge Bases) Retrieve requests per second

(Knowledge Bases) Retrieve requests per second

throughput
20
count
Fixed
On-demand model inference requests per minute for Stable Image Remove Background

On-demand model inference requests per minute for Stable Image Remove Background

throughput
10
count
Fixed
Model units per provisioned model for Cohere Embed Multilingual

Model units per provisioned model for Cohere Embed Multilingual

general
0
count
Adjustable
Records per batch inference job for Claude Opus 4.6

Records per batch inference job for Claude Opus 4.6

general
100,000
count
Adjustable
On-demand model inference requests per minute for Meta Llama 3.1 8B Instruct

On-demand model inference requests per minute for Meta Llama 3.1 8B Instruct

throughput
800
count
Fixed
On-demand model inference requests per minute for Cohere Embed Multilingual

On-demand model inference requests per minute for Cohere Embed Multilingual

throughput
2,000
count
Fixed
(Evaluation) Number of concurrent model evaluation jobs that use human workers

(Evaluation) Number of concurrent model evaluation jobs that use human workers

compute
10
count
Fixed
On-demand model inference tokens per minute for Amazon Titan Text Embeddings V2

On-demand model inference tokens per minute for Amazon Titan Text Embeddings V2

general
300,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for OpenAI GPT OSS 120b

Sum of in-progress and submitted batch inference jobs using a base model for OpenAI GPT OSS 120b

general
100
count
Adjustable
(Flows) UpdateFlow requests per second

(Flows) UpdateFlow requests per second

throughput
2
count
Fixed
On-demand model inference requests per minute for Stable Image Erase Object

On-demand model inference requests per minute for Stable Image Erase Object

throughput
10
count
Fixed
(Data Automation) Maximum number of list fields per Blueprint

(Data Automation) Maximum number of list fields per Blueprint

general
15
count
Fixed
Model units no-commitment Provisioned Throughputs across custom models

Model units no-commitment Provisioned Throughputs across custom models

general
0
count
Adjustable
Records per batch inference job for Nova Pro V1

Records per batch inference job for Nova Pro V1

general
100,000
count
Adjustable
Minimum number of records per batch inference job for Claude 3.5 Sonnet v2

Minimum number of records per batch inference job for Claude 3.5 Sonnet v2

general
100
count
Fixed
(Evaluation) Size of prompt

(Evaluation) Size of prompt

storage
4
count
Fixed
Global cross-region model inference requests per minute for Anthropic Claude Opus 4.6 V1

Global cross-region model inference requests per minute for Anthropic Claude Opus 4.6 V1

throughput
10,000
count
Adjustable
(Evaluation) Task time for workers

(Evaluation) Task time for workers

general
30
count
Fixed
Throttle rate limit for GetBlueprint

Throttle rate limit for GetBlueprint

throughput
5
count
Fixed
Batch inference input file size (in GB) for Llama 3.1 70B Instruct

Batch inference input file size (in GB) for Llama 3.1 70B Instruct

storage
1
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for GLM 5

Sum of in-progress and submitted batch inference jobs using a base model for GLM 5

general
100
count
Adjustable
On-demand model inference requests per minute for Amazon Nova Micro

On-demand model inference requests per minute for Amazon Nova Micro

throughput
2,000
count
Fixed
On-demand model inference requests per minute for Amazon Titan Multimodal Embeddings G1

On-demand model inference requests per minute for Amazon Titan Multimodal Embeddings G1

throughput
2,000
count
Fixed
Model invocation max tokens per day for Anthropic Claude 3.7 Sonnet V1 (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude 3.7 Sonnet V1 (doubled for cross-region calls)

general
720,000,000
count
Fixed
Records per batch inference job for Mistral Large 3

Records per batch inference job for Mistral Large 3

general
100,000
count
Adjustable
Batch inference job size (in GB) for Kimi K2.5

Batch inference job size (in GB) for Kimi K2.5

storage
5
count
Fixed
Endpoints per inference profile

Endpoints per inference profile

storage
5
count
Fixed
Action groups per Agent

Action groups per Agent

general
20
count
Adjustable
Model units per provisioned model for Anthropic Claude V2.1 18K

Model units per provisioned model for Anthropic Claude V2.1 18K

general
0
count
Adjustable
Records per batch inference job for Gemma 3 12B

Records per batch inference job for Gemma 3 12B

general
100,000
count
Adjustable
Minimum number of records per batch inference job for Writer Palmyra Vision 7B

Minimum number of records per batch inference job for Writer Palmyra Vision 7B

general
100
count
Fixed
Cross-region model inference requests per minute for Amazon Nova 2 Pro Preview

Cross-region model inference requests per minute for Amazon Nova 2 Pro Preview

throughput
100
count
Fixed
On-demand model inference tokens per minute for Meta Llama 2 Chat 13B

On-demand model inference tokens per minute for Meta Llama 2 Chat 13B

general
300,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.1 70B Instruct

Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.1 70B Instruct

general
100
count
Adjustable
Records per input file per batch inference job for Amazon Nova 2 Multimodal Embeddings V1

Records per input file per batch inference job for Amazon Nova 2 Multimodal Embeddings V1

storage
100,000
count
Adjustable
Model units per provisioned model for Anthropic Claude 3 Sonnet 200K

Model units per provisioned model for Anthropic Claude 3 Sonnet 200K

general
0
count
Adjustable
Model invocation max tokens per day for Cohere Embed V4 (doubled for cross-region calls)

Model invocation max tokens per day for Cohere Embed V4 (doubled for cross-region calls)

general
216,000,000
count
Fixed
(Data Automation) Description length for fields (Characters)

(Data Automation) Description length for fields (Characters)

general
300
count
Fixed
On-demand model inference tokens per minute for NVIDIA Nemotron Nano 2

On-demand model inference tokens per minute for NVIDIA Nemotron Nano 2

general
100,000,000
count
Fixed
Throttle rate limit for GetDataAutomationStatus

Throttle rate limit for GetDataAutomationStatus

throughput
10
count
Fixed
(Data Automation) InvokeDataAutomationAsync - Video - Max number of concurrent jobs

(Data Automation) InvokeDataAutomationAsync - Video - Max number of concurrent jobs

compute
20
count
Adjustable
Records per input file per batch inference job for Qwen3 32B

Records per input file per batch inference job for Qwen3 32B

storage
100,000
count
Adjustable
On-demand model inference requests per minute for Z.ai GLM 5

On-demand model inference requests per minute for Z.ai GLM 5

throughput
10,000
count
Fixed
Records per batch inference job for Llama 3.2 3B Instruct

Records per batch inference job for Llama 3.2 3B Instruct

general
100,000
count
Adjustable
Model units per provisioned model for Anthropic Claude Instant V1 100K

Model units per provisioned model for Anthropic Claude Instant V1 100K

general
0
count
Adjustable
On-demand model inference requests per minute for Cohere Embed English

On-demand model inference requests per minute for Cohere Embed English

throughput
2,000
count
Fixed
Records per batch inference job for Mistral Small

Records per batch inference job for Mistral Small

general
100,000
count
Adjustable
(Knowledge Bases) Ingestion job size

(Knowledge Bases) Ingestion job size

storage
100
count
Fixed
On-demand model inference requests per minute for Meta Llama 3.2 1B Instruct

On-demand model inference requests per minute for Meta Llama 3.2 1B Instruct

throughput
800
count
Fixed
Records per input file per batch inference job for Llama 3.2 3B Instruct

Records per input file per batch inference job for Llama 3.2 3B Instruct

storage
100,000
count
Adjustable
Records per batch inference job for Claude 3 Opus

Records per batch inference job for Claude 3 Opus

general
100,000
count
Adjustable
Minimum number of records per batch inference job for MiniMax M2.5

Minimum number of records per batch inference job for MiniMax M2.5

general
100
count
Fixed
Global cross-region model inference tokens per day for Amazon Nova 2 Lite

Global cross-region model inference tokens per day for Amazon Nova 2 Lite

general
11,520,000,000
count
Fixed
(Knowledge Bases) CreateDataSource requests per second

(Knowledge Bases) CreateDataSource requests per second

throughput
2
count
Fixed
On-demand model inference tokens per minute for Meta Llama 3.2 90B Instruct

On-demand model inference tokens per minute for Meta Llama 3.2 90B Instruct

general
300,000
count
Fixed
On-demand model inference requests per minute for Meta Llama 3.2 3B Instruct

On-demand model inference requests per minute for Meta Llama 3.2 3B Instruct

throughput
800
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for OpenAI GPT OSS Safeguard 20b

Sum of in-progress and submitted batch inference jobs using a base model for OpenAI GPT OSS Safeguard 20b

general
100
count
Adjustable
Batch inference job size (in GB) for Qwen3 Coder Next

Batch inference job size (in GB) for Qwen3 Coder Next

storage
5
count
Fixed
On-demand model inference tokens per minute for NVIDIA Nemotron 3 Super 120B A12B

On-demand model inference tokens per minute for NVIDIA Nemotron 3 Super 120B A12B

general
100,000,000
count
Fixed
(Data Automation) CreateBlueprint - Max number of blueprints per account

(Data Automation) CreateBlueprint - Max number of blueprints per account

general
350
count
Adjustable
Cross-region model inference requests per minute for Stable Image Search and Replace

Cross-region model inference requests per minute for Stable Image Search and Replace

throughput
20
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Claude 3.5 Haiku

Sum of in-progress and submitted batch inference jobs using a base model for Claude 3.5 Haiku

general
100
count
Adjustable
Batch inference job size (in GB) for OpenAI GPT OSS Safeguard 120b

Batch inference job size (in GB) for OpenAI GPT OSS Safeguard 120b

storage
5
count
Fixed
Batch inference job size (in GB) for Titan Text Embeddings V2

Batch inference job size (in GB) for Titan Text Embeddings V2

storage
5
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Claude Haiku 4.5

Sum of in-progress and submitted batch inference jobs using a base model for Claude Haiku 4.5

general
100
count
Adjustable
Cross-region model inference tokens per minute for Anthropic Claude 3.5 Sonnet

Cross-region model inference tokens per minute for Anthropic Claude 3.5 Sonnet

general
800,000
count
Adjustable
Batch inference input file size (in GB) for NVIDIA Nemotron Nano 3 30B

Batch inference input file size (in GB) for NVIDIA Nemotron Nano 3 30B

storage
1
count
Fixed
Global cross-region model inference tokens per minute for Anthropic Claude Opus 4.5

Global cross-region model inference tokens per minute for Anthropic Claude Opus 4.5

general
2,000,000
count
Adjustable
On-demand model inference requests per minute for Mistral Large

On-demand model inference requests per minute for Mistral Large

throughput
400
count
Fixed
Throttle rate limit for DeleteDataAutomationProject

Throttle rate limit for DeleteDataAutomationProject

throughput
5
count
Fixed
Batch inference job size (in GB) for MiniMax M2.5

Batch inference job size (in GB) for MiniMax M2.5

storage
5
count
Fixed
Batch inference input file size (in GB) for Amazon Nova Premier

Batch inference input file size (in GB) for Amazon Nova Premier

storage
1
count
Fixed
(Flows) Output nodes per flow

(Flows) Output nodes per flow

capacity
20
count
Fixed
Model units, with commitment, for Provisioned Throughput created for Meta Llama 4 Scout 17B Instruct 10M

Model units, with commitment, for Provisioned Throughput created for Meta Llama 4 Scout 17B Instruct 10M

general
0
count
Adjustable
Batch inference job size (in GB) for Qwen3 Coder 30B

Batch inference job size (in GB) for Qwen3 Coder 30B

storage
5
count
Fixed
Batch inference input file size (in GB) for Llama 3.3 70B Instruct

Batch inference input file size (in GB) for Llama 3.3 70B Instruct

storage
1
count
Fixed
Batch inference job size (in GB) for Voxtral Small 24B 2507

Batch inference job size (in GB) for Voxtral Small 24B 2507

storage
5
count
Fixed
(Model customization) Sum of training and validation records for a Titan Image Generator G1 V1 Fine-tuning job

(Model customization) Sum of training and validation records for a Titan Image Generator G1 V1 Fine-tuning job

general
10,000
count
Adjustable
(Automated Reasoning) Rules in policy

(Automated Reasoning) Rules in policy

identity
500
count
Fixed
Minimum number of records per batch inference job for Amazon Nova Premier

Minimum number of records per batch inference job for Amazon Nova Premier

general
100
count
Fixed
On-demand model inference requests per minute for Amazon Rerank 1.0

On-demand model inference requests per minute for Amazon Rerank 1.0

throughput
200
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Qwen3 32B

Sum of in-progress and submitted batch inference jobs using a base model for Qwen3 32B

general
100
count
Adjustable
Model invocation max tokens per day for Gemma 3 12B (doubled for cross-region calls)

Model invocation max tokens per day for Gemma 3 12B (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
DeleteAgentVersion requests per second

DeleteAgentVersion requests per second

throughput
2
count
Fixed
Batch inference input file size (in GB) for MiniMax M2.5

Batch inference input file size (in GB) for MiniMax M2.5

storage
1
count
Fixed
On-demand model inference tokens per minute for Gemma 3 4B

On-demand model inference tokens per minute for Gemma 3 4B

general
100,000,000
count
Fixed
Minimum number of records per batch inference job for Llama 3.2 90B Instruct

Minimum number of records per batch inference job for Llama 3.2 90B Instruct

general
100
count
Fixed
Records per batch inference job for OpenAI GPT OSS 120b

Records per batch inference job for OpenAI GPT OSS 120b

general
100,000
count
Adjustable
Minimum number of records per batch inference job for Ministral 3 14B

Minimum number of records per batch inference job for Ministral 3 14B

general
100
count
Fixed
(Prompt management) Prompts per account

(Prompt management) Prompts per account

general
500
count
Adjustable
(Data Automation) Maximum Audio Sample Rate (Hz)

(Data Automation) Maximum Audio Sample Rate (Hz)

throughput
48,000
count
Fixed
(Automated Reasoning) GetAutomatedReasoningPolicyBuildWorkflow requests per second

(Automated Reasoning) GetAutomatedReasoningPolicyBuildWorkflow requests per second

throughput
10
count
Adjustable
On-demand model inference tokens per minute for Cohere Embed Multilingual

On-demand model inference tokens per minute for Cohere Embed Multilingual

general
300,000
count
Fixed
Cross-region model inference tokens per minute for DeepSeek R1 V1

Cross-region model inference tokens per minute for DeepSeek R1 V1

general
200,000
count
Adjustable
On-demand model inference requests per minute for Stability.ai Stable Diffusion XL 1.0

On-demand model inference requests per minute for Stability.ai Stable Diffusion XL 1.0

throughput
60
count
Fixed
On-demand model inference requests per minute for TwelveLabs Marengo Embed 3.0

On-demand model inference requests per minute for TwelveLabs Marengo Embed 3.0

throughput
500
count
Adjustable
Records per input file per batch inference job for GLM 4.7 Flash

Records per input file per batch inference job for GLM 4.7 Flash

storage
100,000
count
Adjustable
On-demand model inference requests per minute for AI21 Labs Jurassic-2 Ultra

On-demand model inference requests per minute for AI21 Labs Jurassic-2 Ultra

throughput
100
count
Fixed
Records per batch inference job for Devstral 2 123B

Records per batch inference job for Devstral 2 123B

general
100,000
count
Adjustable
On-demand InvokeModel async concurrent requests for TwelveLabs Marengo Embed 3.0

On-demand InvokeModel async concurrent requests for TwelveLabs Marengo Embed 3.0

compute
10
count
Adjustable
(Model customization) Sum of training and validation records for a Titan Text G1 - Express v1 Fine-tuning job

(Model customization) Sum of training and validation records for a Titan Text G1 - Express v1 Fine-tuning job

general
10,000
count
Adjustable
Minimum number of records per batch inference job for Llama 3.1 405B Instruct

Minimum number of records per batch inference job for Llama 3.1 405B Instruct

general
100
count
Fixed
Records per batch inference job for Llama 4 Scout

Records per batch inference job for Llama 4 Scout

general
100,000
count
Adjustable
(Guardrails) Contextual grounding source length in text units

(Guardrails) Contextual grounding source length in text units

general
100
count
Fixed
On-demand model inference tokens per minute for AI21 Labs Jamba 1.5 Mini

On-demand model inference tokens per minute for AI21 Labs Jamba 1.5 Mini

general
300,000
count
Fixed
Records per input file per batch inference job for NVIDIA Nemotron Nano 12B

Records per input file per batch inference job for NVIDIA Nemotron Nano 12B

storage
100,000
count
Adjustable
Model units per provisioned model for the 24k context length variant for Amazon Nova Micro

Model units per provisioned model for the 24k context length variant for Amazon Nova Micro

general
0
count
Adjustable
Cross-region model inference tokens per minute for Anthropic Claude 3.5 Haiku

Cross-region model inference tokens per minute for Anthropic Claude 3.5 Haiku

general
4,000,000
count
Adjustable
Records per input file per batch inference job for Claude 3 Opus

Records per input file per batch inference job for Claude 3 Opus

storage
100,000
count
Adjustable
Batch inference job size (in GB) for Qwen3 VL 235B

Batch inference job size (in GB) for Qwen3 VL 235B

storage
5
count
Fixed
Records per input file per batch inference job for Gemma 3 12B

Records per input file per batch inference job for Gemma 3 12B

storage
100,000
count
Adjustable
On-demand model inference tokens per minute for Nemotron Nano 3 30B

On-demand model inference tokens per minute for Nemotron Nano 3 30B

general
100,000,000
count
Fixed
Minimum number of records per batch inference job for Mistral Large 3

Minimum number of records per batch inference job for Mistral Large 3

general
100
count
Fixed
Minimum number of records per batch inference job for Claude Haiku 4.5

Minimum number of records per batch inference job for Claude Haiku 4.5

general
100
count
Fixed
On-demand model inference requests per minute for Stable Image Style Transfer

On-demand model inference requests per minute for Stable Image Style Transfer

throughput
10
count
Fixed
Global cross-region model inference requests per minute for Anthropic Claude Sonnet 4.5 V1

Global cross-region model inference requests per minute for Anthropic Claude Sonnet 4.5 V1

throughput
10,000
count
Adjustable
(Data Automation) (Console) Maximum document file size (MB)

(Data Automation) (Console) Maximum document file size (MB)

storage
200
count
Fixed
Model units per provisioned model for the 300k context length variant for Amazon Nova Pro

Model units per provisioned model for the 300k context length variant for Amazon Nova Pro

general
0
count
Adjustable
On-demand model inference requests per minute for Qwen3 Coder 30B a3b V1

On-demand model inference requests per minute for Qwen3 Coder 30B a3b V1

throughput
10,000
count
Fixed
On-demand model inference requests per minute for Anthropic Claude 3 Haiku

On-demand model inference requests per minute for Anthropic Claude 3 Haiku

throughput
1,000
count
Fixed
Records per input file per batch inference job for MiniMax M2.5

Records per input file per batch inference job for MiniMax M2.5

storage
100,000
count
Adjustable
(Model customization) Sum of training and validation records for an Amazon Nova 2 Lite Fine-tuning job

(Model customization) Sum of training and validation records for an Amazon Nova 2 Lite Fine-tuning job

general
20,000
count
Adjustable
GetAgentKnowledgeBase requests per second

GetAgentKnowledgeBase requests per second

throughput
15
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for OpenAI GPT OSS Safeguard 120b

Sum of in-progress and submitted batch inference jobs using a base model for OpenAI GPT OSS Safeguard 120b

general
100
count
Adjustable
(Data Automation) (Console) Maximum number of pages per document file

(Data Automation) (Console) Maximum number of pages per document file

storage
20
count
Fixed
Records per batch inference job for Llama 3.2 90B Instruct

Records per batch inference job for Llama 3.2 90B Instruct

general
100,000
count
Adjustable
Model invocation max tokens per day for Amazon Nova Micro (doubled for cross-region calls)

Model invocation max tokens per day for Amazon Nova Micro (doubled for cross-region calls)

general
5,760,000,000
count
Fixed
Throttle rate limit for CreateBlueprintVersion

Throttle rate limit for CreateBlueprintVersion

throughput
5
count
Fixed
Model invocation max tokens per day for OpenAI GPT OSS 120B (doubled for cross-region calls)

Model invocation max tokens per day for OpenAI GPT OSS 120B (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Model units per provisioned model for Anthropic Claude 3.5 Sonnet V2 18K

Model units per provisioned model for Anthropic Claude 3.5 Sonnet V2 18K

general
0
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Claude Sonnet 4.5

Sum of in-progress and submitted batch inference jobs using a base model for Claude Sonnet 4.5

general
100
count
Adjustable
Minimum number of records per batch inference job for Kimi K2.5

Minimum number of records per batch inference job for Kimi K2.5

general
100
count
Fixed
(Flows) CreateFlowAlias requests per second

(Flows) CreateFlowAlias requests per second

throughput
2
count
Fixed
Model units per provisioned model for AI21 Labs Jurassic-2 Mid

Model units per provisioned model for AI21 Labs Jurassic-2 Mid

general
0
count
Adjustable
Batch inference input file size (in GB) for NVIDIA Nemotron Nano 9B

Batch inference input file size (in GB) for NVIDIA Nemotron Nano 9B

storage
1
count
Fixed
Records per input file per batch inference job for Devstral 2 123B

Records per input file per batch inference job for Devstral 2 123B

storage
100,000
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.2 11B Instruct

Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.2 11B Instruct

general
100
count
Adjustable
On-demand model inference tokens per minute for Anthropic Claude 3 Haiku

On-demand model inference tokens per minute for Anthropic Claude 3 Haiku

general
2,000,000
count
Fixed
(Data Automation) Maximum Blueprints per Project (Images)

(Data Automation) Maximum Blueprints per Project (Images)

general
1
count
Fixed
Global cross-region model inference tokens per day for Amazon Nova 2 Omni

Global cross-region model inference tokens per day for Amazon Nova 2 Omni

general
11,520,000,000
count
Fixed
Records per input file per batch inference job for NVIDIA Nemotron Nano 9B

Records per input file per batch inference job for NVIDIA Nemotron Nano 9B

storage
100,000
count
Adjustable
Agent Collaborators per Agent

Agent Collaborators per Agent

general
1,000
count
Adjustable
UpdateAgentActionGroup requests per second

UpdateAgentActionGroup requests per second

throughput
6
count
Fixed
(Automated Reasoning) GetAutomatedReasoningPolicyBuildWorkflowResultAssets requests per second

(Automated Reasoning) GetAutomatedReasoningPolicyBuildWorkflowResultAssets requests per second

throughput
10
count
Adjustable
Model units, with commitment, for Provisioned Throughput created for Meta Llama 4 Scout 17B Instruct 128K

Model units, with commitment, for Provisioned Throughput created for Meta Llama 4 Scout 17B Instruct 128K

general
0
count
Adjustable
Sum of in-progress and submitted batch inference jobs using a base model for GLM 4.7

Sum of in-progress and submitted batch inference jobs using a base model for GLM 4.7

general
100
count
Adjustable
(Automated Reasoning) GetAutomatedReasoningPolicyTestCase requests per second

(Automated Reasoning) GetAutomatedReasoningPolicyTestCase requests per second

throughput
10
count
Adjustable
Batch inference job size (in GB) for MiniMax M2

Batch inference job size (in GB) for MiniMax M2

storage
5
count
Fixed
Model units per provisioned model for Amazon Titan Multimodal Embeddings G1

Model units per provisioned model for Amazon Titan Multimodal Embeddings G1

general
0
count
Adjustable
(Flows) Total nodes per flow

(Flows) Total nodes per flow

capacity
40
count
Fixed
(Automated Reasoning) UpdateAutomatedReasoningPolicy requests per second

(Automated Reasoning) UpdateAutomatedReasoningPolicy requests per second

throughput
5
count
Adjustable
Model invocation max tokens per day for Mistral Pixtral Large 25.02 V1 (doubled for cross-region calls)

Model invocation max tokens per day for Mistral Pixtral Large 25.02 V1 (doubled for cross-region calls)

general
57,600,000
count
Fixed
(Guardrails) On-demand ApplyGuardrail Content filter policy text units per second (standard)

(Guardrails) On-demand ApplyGuardrail Content filter policy text units per second (standard)

identity
200
count
Adjustable
Records per batch inference job for NVIDIA Nemotron 3 Super 120B A12B

Records per batch inference job for NVIDIA Nemotron 3 Super 120B A12B

general
100,000
count
Adjustable
On-demand, latency-optimized model inference tokens per minute for Meta Llama 3.1 70B Instruct

On-demand, latency-optimized model inference tokens per minute for Meta Llama 3.1 70B Instruct

general
40,000
count
Fixed
On-demand model inference tokens per minute for Meta Llama 2 13B

On-demand model inference tokens per minute for Meta Llama 2 13B

general
300,000
count
Fixed
On-demand model inference requests per minute for GPT OSS Safeguard 120B

On-demand model inference requests per minute for GPT OSS Safeguard 120B

throughput
10,000
count
Fixed
(Flows) S3 retrieval nodes per flow

(Flows) S3 retrieval nodes per flow

capacity
10
count
Fixed
On-demand model inference requests per minute for Qwen3 Coder Next

On-demand model inference requests per minute for Qwen3 Coder Next

throughput
10,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.2 90B Instruct

Sum of in-progress and submitted batch inference jobs using a base model for Llama 3.2 90B Instruct

general
100
count
Adjustable
Model invocation max tokens per day for Anthropic Claude 3 Haiku (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude 3 Haiku (doubled for cross-region calls)

general
2,880,000,000
count
Fixed
Model units per provisioned model for Amazon Nova Canvas

Model units per provisioned model for Amazon Nova Canvas

general
0
count
Adjustable
Model invocation max tokens per day for Kimi K2 Thinking (doubled for cross-region calls)

Model invocation max tokens per day for Kimi K2 Thinking (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
On-demand model inference tokens per minute for Mistral AI Mixtral 8x7B Instruct

On-demand model inference tokens per minute for Mistral AI Mixtral 8x7B Instruct

general
300,000
count
Fixed
Minimum number of records per batch inference job for MiniMax M2.1

Minimum number of records per batch inference job for MiniMax M2.1

general
100
count
Fixed
On-demand model inference requests per minute for Mistral Devstral 2 123b

On-demand model inference requests per minute for Mistral Devstral 2 123b

throughput
10,000
count
Fixed
ListAgentActionGroups requests per second

ListAgentActionGroups requests per second

throughput
10
count
Fixed
Batch inference input file size (in GB) for NVIDIA Nemotron 3 Super 120B A12B

Batch inference input file size (in GB) for NVIDIA Nemotron 3 Super 120B A12B

storage
1
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Ministral 3 14B

Sum of in-progress and submitted batch inference jobs using a base model for Ministral 3 14B

general
100
count
Adjustable
Batch inference input file size (in GB) for Kimi K2 Thinking

Batch inference input file size (in GB) for Kimi K2 Thinking

storage
1
count
Fixed
Batch inference input file size (in GB) for Mistral Large 3

Batch inference input file size (in GB) for Mistral Large 3

storage
1
count
Fixed
Model invocation max tokens per day for Anthropic Claude Sonnet 4 V1 (doubled for cross-region calls)

Model invocation max tokens per day for Anthropic Claude Sonnet 4 V1 (doubled for cross-region calls)

general
144,000,000
count
Fixed
On-demand model inference tokens per minute for GPT OSS Safeguard 20B

On-demand model inference tokens per minute for GPT OSS Safeguard 20B

general
100,000,000
count
Fixed
Model invocation max tokens per day for Qwen3 Coder 30B a3b V1 (doubled for cross-region calls)

Model invocation max tokens per day for Qwen3 Coder 30B a3b V1 (doubled for cross-region calls)

general
144,000,000,000
count
Fixed
Sum of in-progress and submitted batch inference jobs using a base model for Claude 3.5 Sonnet

Sum of in-progress and submitted batch inference jobs using a base model for Claude 3.5 Sonnet

general
100
count
Adjustable
CreateAgentActionGroup requests per second

CreateAgentActionGroup requests per second

throughput
12
count
Fixed
(Flows) CreateFlowVersion requests per second

(Flows) CreateFlowVersion requests per second

throughput
2
count
Fixed
(Automated Reasoning) Tests per policy

(Automated Reasoning) Tests per policy

identity
100
count
Fixed
Cross-region model inference tokens per minute for Amazon Nova 2 Omni

Cross-region model inference tokens per minute for Amazon Nova 2 Omni

general
8,000,000
count
Adjustable
Minimum number of records per batch inference job for Qwen3 Next 80B

Minimum number of records per batch inference job for Qwen3 Next 80B

general
100
count
Fixed
Records per input file per batch inference job for Llama 3.1 70B Instruct

Records per input file per batch inference job for Llama 3.1 70B Instruct

storage
100,000
count
Adjustable
Minimum number of records per batch inference job for Magistral Small 2509

Minimum number of records per batch inference job for Magistral Small 2509

general
100
count
Fixed
On-demand model inference requests per minute for Amazon Titan Image Generator G1 V2

On-demand model inference requests per minute for Amazon Titan Image Generator G1 V2

throughput
60
count
Fixed
(Flows) Flows per account

(Flows) Flows per account

general
100
count
Adjustable
Throttle rate limit for DeleteBlueprint

Throttle rate limit for DeleteBlueprint

throughput
5
count
Fixed
Model units per provisioned model for Anthropic Claude 3 Haiku 48K

Model units per provisioned model for Anthropic Claude 3 Haiku 48K

general
0
count
Adjustable
(Evaluation) Number of custom prompt datasets in a human-based model evaluation job

(Evaluation) Number of custom prompt datasets in a human-based model evaluation job

general
1
count
Fixed
Records per input file per batch inference job for Llama 3.1 8B Instruct

Records per input file per batch inference job for Llama 3.1 8B Instruct

storage
100,000
count
Adjustable
Model units per provisioned model for Anthropic Claude 3.5 Sonnet V2 51K

Model units per provisioned model for Anthropic Claude 3.5 Sonnet V2 51K

general
0
count
Adjustable
Batch inference input file size (in GB) for Llama 4 Maverick

Batch inference input file size (in GB) for Llama 4 Maverick

storage
1
count
Fixed
On-demand model inference tokens per minute for MiniMax M2

On-demand model inference tokens per minute for MiniMax M2

general
100,000,000
count
Fixed
Model invocation max tokens per day for Writer Palmyra Vision 7B (doubled for cross-region calls)

Model invocation max tokens per day for Writer Palmyra Vision 7B (doubled for cross-region calls)

general
144,000,000,000
count
Fixed

How to Request a Quota Increase

  1. Open the AWS Service Quotas console.
  2. Select Amazon Bedrock from the service list.
  3. Find the quota and click "Request increase".
  4. Enter the desired value and submit. Most increases are approved within a few hours. A programmatic alternative is sketched below.
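If you prefer to script the same workflow, the Service Quotas API covers it. The minimal boto3 sketch below lists the adjustable Bedrock quotas in an account and then submits one increase request; it assumes the Service Quotas service code for Bedrock is "bedrock", and QUOTA_CODE and DESIRED_VALUE are hypothetical placeholders you would replace with a real quota code (L-...) and target value taken from the listing step.

# Sketch: list adjustable Amazon Bedrock quotas, then request an increase.
# Assumes the "bedrock" service code; QUOTA_CODE and DESIRED_VALUE are placeholders.
import boto3

client = boto3.client("service-quotas")

# Step 1: print every Bedrock quota that can be raised, with its current value.
paginator = client.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="bedrock"):
    for quota in page["Quotas"]:
        if quota["Adjustable"]:
            print(f'{quota["QuotaCode"]}  {quota["QuotaName"]}: {quota["Value"]}')

# Step 2: submit an increase request for one quota.
QUOTA_CODE = "L-XXXXXXXX"   # hypothetical quota code copied from the listing above
DESIRED_VALUE = 200_000.0   # hypothetical target value

response = client.request_service_quota_increase(
    ServiceCode="bedrock",
    QuotaCode=QUOTA_CODE,
    DesiredValue=DESIRED_VALUE,
)
print("Request status:", response["RequestedQuota"]["Status"])

The AWS CLI exposes the same operation as "aws service-quotas request-service-quota-increase" with matching --service-code, --quota-code, and --desired-value flags.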

Learn AWS the Practical Way

Our bi-weekly newsletter teaches hands-on AWS fundamentals. No certification fluff - just practical knowledge.

Subscribe to Newsletter

Quick Stats

Total Quotas: 1125
Adjustable: 410
Fixed: 715
Commonly Hit: 0