Gemma 3

Name: Gemma 3
Author: Google

Gemma 3 is a collection of lightweight, state-of-the-art open models built from the same research and technology that powers the Gemini 2.0 models. Google describes it as the most capable model that can run on a single GPU or TPU, delivering state-of-the-art performance for its size and outperforming Llama3-405B, DeepSeek-V3 and o3-mini in preliminary human preference evaluations on LMArena's leaderboard. It features advanced text and visual reasoning capabilities, a 128K-token context window, function calling, and support for over 140 languages.

Release Date: 2025-03-12

Specifications

Parameters: 1B, 4B, 12B, 27B
Architecture: Decoder-only Transformer
License: Apache-2.0
Context Window: 128,000 tokens
Type: multimodal
Modalities: text, image, video

Benchmark Scores

MMLU: 72.8

Massive Multitask Language Understanding (MMLU) tests knowledge across 57 subjects including mathema...

GSM8K: 78.5

Grade School Math 8K (GSM8K) consists of 8.5K high-quality grade school math word problems....

HumanEval: 65.2

Evaluates code generation capabilities by asking models to complete Python functions based on docstr...

HellaSwag: 82.1

Tests common sense natural language inference through completion of scenarios....

MATH: 45.3

A dataset of 12,500 challenging competition mathematics problems requiring multi-step reasoning....

TruthfulQA: 68.7

Measures a model's tendency to reproduce falsehoods commonly believed by humans....

DROP: 71.2

Discrete Reasoning Over Paragraphs (DROP) requires models to resolve references in a passage and per...

Global-MMLU: 75.4

A multilingual evaluation set spanning 42 languages that combines machine translations for MMLU ques...
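
To make the HumanEval description above more concrete, here is a minimal sketch of a docstring-completion check. The toy problem, the candidate completion, and the helper names `passes` and `pass_rate` are all hypothetical and are not taken from the benchmark itself; the page also does not say how its HumanEval figure was computed, so treat this only as an illustration of the general idea.

```python
# Toy illustration of a HumanEval-style check (hypothetical problem, not from the benchmark).
# The model sees PROMPT (signature + docstring) and must produce the function body.

PROMPT = '''def add(a, b):
    """Return the sum of a and b."""
'''

# Assumed model output: only the body that completes the function.
CANDIDATE = "    return a + b\n"

# Unit tests that decide whether the completion counts as correct.
TEST = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"


def passes(prompt: str, completion: str, test_code: str) -> bool:
    """Return True if prompt + completion defines a function that passes the tests."""
    namespace = {}
    try:
        exec(prompt + completion, namespace)  # define the completed function
        exec(test_code, namespace)            # run the assertions against it
    except Exception:
        return False
    return True


def pass_rate(results: list) -> float:
    """Fraction of problems whose completion passed."""
    return sum(results) / len(results)


if __name__ == "__main__":
    results = [
        passes(PROMPT, CANDIDATE, TEST),
        passes(PROMPT, "    return a - b\n", TEST),  # a wrong completion
    ]
    print(pass_rate(results))
```

Running generated code with `exec` is unsafe outside a toy setting; a real evaluation harness would isolate each completion in a sandbox with time and resource limits.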

Advanced Specifications

Model Family: Gemma
API Access: Available
Chat Interface: Available
Multilingual Support: Yes
Variants: Full 32-bit, BF16 (16-bit), SFP8 (8-bit), Q4_0 (4-bit), INT4 (4-bit)
Hardware Support: CUDA, TPU, CPU, Metal, NVIDIA H100, NVIDIA Jetson Nano, NVIDIA Blackwell, AMD ROCm, Google Cloud TPU
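
The listed variants differ mainly in bits per weight, so a rough weight-only memory estimate follows from parameters × bits ÷ 8. The sketch below assumes the nominal sizes from this page (1B, 4B, 12B, 27B) as parameter counts and the nominal bit widths of each variant. Real checkpoints will differ somewhat, since exact parameter counts, quantization scale overhead, activations, and the KV cache are all ignored here.

```python
# Rough weight-only memory estimate: parameters x bits per weight / 8 bytes.
# Assumptions: nominal parameter counts and nominal bit widths from this page;
# quantization metadata, activations, and KV cache are not included.

NOMINAL_PARAMS = {"1B": 1e9, "4B": 4e9, "12B": 12e9, "27B": 27e9}
BITS_PER_WEIGHT = {"Full 32-bit": 32, "BF16": 16, "SFP8": 8, "Q4_0": 4, "INT4": 4}


def weight_gigabytes(num_params: float, bits: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


def memory_table() -> dict:
    """Map (model size, variant) to approximate weight storage in GB."""
    return {
        (size, variant): weight_gigabytes(n, bits)
        for size, n in NOMINAL_PARAMS.items()
        for variant, bits in BITS_PER_WEIGHT.items()
    }


if __name__ == "__main__":
    for (size, variant), gb in memory_table().items():
        print(f"{size:>3}  {variant:<12} ~{gb:.1f} GB")
```

Under these assumptions, the 27B model needs about 54 GB for weights in BF16 and about 13.5 GB at 4 bits, which is why the quantized variants matter for single-GPU deployment.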

Capabilities & Limitations

Capabilities: question answering, summarization, reasoning, multimodal understanding, function calling, structured output, code generation, image analysis, video analysis, text extraction, object identification, large context processing, STEM reasoning, agentic experiences, workflow automation
Known Limitations: 1B model is text-only without image support; lower-parameter models have reduced capabilities; performance varies with quantization level
Notable Use Cases: single GPU/TPU deployment, on-device AI applications, question answering systems, document summarization, visual content analysis, video analysis, programming interfaces, multilingual applications, large document processing, AI-driven workflows, agentic applications, image safety checking
Function Calling Support: Yes
Tool Use Support: Yes
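
Function calling generally means the model emits a structured request that the application parses and executes. The sketch below shows only the application side, assuming the model was prompted to reply with JSON of the form `{"name": ..., "arguments": {...}}`. That format, the `get_weather` tool, and the `dispatch` helper are hypothetical illustrations, not a documented Gemma interface.

```python
import json


# Hypothetical tool; a real application would call an actual service here.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


# Registry mapping tool names the model may request to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Run the requested tool if the output is a JSON tool call, else return the text.

    Assumes an illustrative call format: {"name": <tool>, "arguments": {...}}.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain-text answer, no tool call
    if not isinstance(call, dict):
        return model_output
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return model_output  # unknown or malformed request
    return TOOLS[name](**call.get("arguments", {}))


if __name__ == "__main__":
    print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
    print(dispatch("The capital of France is Paris."))
```

Keeping an explicit registry means the model can only trigger functions the application has chosen to expose, which is a common safeguard in agentic workflows.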

Related Models

Grok 4

xAI

xAI's latest-generation model, with enhanced mathematical reasoning capabilities and significant improvements in competition-level mathematics benchmarks. It features 2x faster end-to-end latency, supports 5 different voices, and achieves 10x the daily user seconds of previous models.

Gemini 2.5 Flash-Lite

Google

Google's most cost-efficient and fastest 2.5 model yet. It delivers higher quality than 2.0 Flash-Lite on coding, math, science, reasoning and multimodal benchmarks, and excels at high-volume, latency-sensitive tasks like translation and classification.

Claude Opus 4

Anthropic

Claude Opus 4 is Anthropic's most powerful model and the best coding model in the world, leading on SWE-bench (72.5%) and Terminal-bench (43.2%). It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, with the ability to work continuously for several hours. Features include extended thinking with tool use, parallel tool execution, improved memory capabilities, and significantly reduced shortcut behaviors.