Gemini models are now available in Batch Mode
Today, we’re excited to introduce Batch Mode in the Gemini API, a new asynchronous endpoint designed specifically for high-throughput, non-latency-critical workloads. Batch Mode allows you to submit large jobs, offload the scheduling and processing to us, and retrieve your results within 24 hours, all at a 50% discount compared to our synchronous APIs.
Process more for less
Batch Mode is the perfect tool for any task where you have your data ready upfront and don’t need an immediate response. By separating these large jobs from your real-time traffic, you unlock three key benefits:
- Cost savings: Batch jobs are priced at 50% less than the standard rate for the same model.
- Higher throughput: Batch jobs run under separate, higher rate limits than your real-time traffic.
- Easy API calls: No need to manage complex client-side queuing or retry logic; results are returned within a 24-hour window.
A simple workflow for large jobs
We’ve designed the API to be simple and intuitive. You package all your requests into a single file, submit it, and retrieve your results once the job is complete (a sketch of building such a file follows the list below). Here are some ways developers are already putting Batch Mode to work:
- Bulk content generation and processing: Specializing in deep video understanding, Reforged Labs uses Gemini 2.5 Pro to analyze and label vast quantities of video ads monthly. Implementing Batch Mode has revolutionized their operations by significantly cutting costs, accelerating client deliverables, and enabling the massive scalability needed for meaningful market insights.
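As a concrete illustration of the "single file" step, the batch input is JSON Lines: one request per line, each tagged with a key so you can match results back to inputs. Here is a minimal sketch of building such a file in Python; the prompts and the batch_requests.json filename are illustrative, and the request shape follows the example in the snippet further down:

import json

# Each line of the batch file is one request, identified by a "key"
# so results can be matched back to inputs. Prompts are illustrative.
prompts = {
    "request_1": "Explain how AI works in a few words",
    "request_2": "Explain how quantum computing works in a few words",
}

with open("batch_requests.json", "w") as f:
    for key, prompt in prompts.items():
        request = {"key": key, "request": {"contents": [{"parts": [{"text": prompt}]}]}}
        f.write(json.dumps(request) + "\n")  # one JSON object per line (JSONL)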
Get started in just a few lines of code
You can start using Batch Mode today with the Google GenAI Python SDK:
# Create a JSONL file ("batch_requests.json") that contains these lines:
# {"key": "request_1", "request": {"contents": [{"parts": [{"text": "Explain how AI works in a few words"}]}]}}
# {"key": "request_2", "request": {"contents": [{"parts": [{"text": "Explain how quantum computing works in a few words"}]}]}}
import time

from google import genai

client = genai.Client()

# Upload the file containing your batched requests
uploaded_batch_requests = client.files.upload(file="batch_requests.json")

batch_job = client.batches.create(
    model="gemini-2.5-flash",
    src=uploaded_batch_requests.name,
    config={
        'display_name': "batch_job-1",
    },
)

print(f"Created batch job: {batch_job.name}")

# Poll until the job finishes (results are returned within 24 hours)
while batch_job.state.name not in ('JOB_STATE_SUCCEEDED', 'JOB_STATE_FAILED', 'JOB_STATE_CANCELLED'):
    time.sleep(60)
    batch_job = client.batches.get(name=batch_job.name)

if batch_job.state.name == 'JOB_STATE_SUCCEEDED':
    # Download the results file; each line is one JSON response
    result_file_name = batch_job.dest.file_name
    file_content_bytes = client.files.download(file=result_file_name)
    file_content = file_content_bytes.decode('utf-8')
    for line in file_content.splitlines():
        print(line)
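Each line of the results file is a JSON object that echoes the key from your request alongside the model's response. Assuming the response field carries the usual GenerateContentResponse structure (an assumption worth verifying against the docs), a minimal sketch that continues from the snippet above and extracts just the generated text:

import json

# Continues from the snippet above: file_content holds the downloaded JSONL.
# Assumes each line looks like {"key": ..., "response": {...}} with the usual
# GenerateContentResponse structure (candidates -> content -> parts -> text).
for line in file_content.splitlines():
    result = json.loads(line)
    text = result["response"]["candidates"][0]["content"]["parts"][0]["text"]
    print(f'{result["key"]}: {text}')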
To learn more, check out the official documentation and pricing pages.
We’re rolling out Batch Mode for the Gemini API to all users starting today. This is just the start for batch processing, and we’re actively working on expanding its capabilities. Stay tuned for more powerful and flexible options!