MedGemma

Name: MedGemma
Author: Google

GoogleOpen WeightsVerified

The MedGemma collection contains Google's most capable open models for medical text and image comprehension, built on Gemma 3. Developers can use MedGemma to accelerate building healthcare-based AI applications. MedGemma comes in two variants: a 4B multimodal version and a 27B text-only version. The 4B version utilizes a SigLIP image encoder specifically pre-trained on de-identified medical data including chest X-rays, dermatology images, ophthalmology images, and histopathology slides.

2025-05-20

4B, 27B

Decoder-only Transformer (based on Gemma 3)

Health AI Developer Foundations terms of use

Compare with other models

Specifications

Parameters: 4B, 27B
Architecture: Decoder-only Transformer (based on Gemma 3)
License: Health AI Developer Foundations terms of use
Context Window: 128,000 tokens
Max Output: 8,192 tokens
Training Data Cutoff: 2025-05-20
Type: multimodal
Modalities: textimage

Benchmark Scores

view all (+13)

Advanced Specifications

Model Family: Gemma
Finetuned From: Gemma 3
API Access: Available
Chat Interface: Not Available
Multilingual Support: No
Variants: 4B instruction-tuned (medgemma-4b-it)4B pre-trained (medgemma-4b-pt)27B text instruction-tuned (medgemma-27b-text-it)
Hardware Support: CUDATPUCPU

Capabilities & Limitations

Capabilities: medical image classificationmedical image interpretationchest X-ray interpretationdermatology image analysishistopathology analysisophthalmology image analysisfundus image analysisradiology report generationmedical visual question answeringmedical text comprehensionclinical reasoningmedical knowledge retrievalmultimodal medical understandingpatient interviewingmedical triagingclinical decision supportmedical summarizationprompt engineering adaptationfine-tuning capabilityagentic orchestration
Known Limitations: Not intended for direct clinical diagnosis or treatment recommendationsDeveloper model that requires validation on intended use caseBaseline performance may need improvement through adaptationNot clinical grade without additional fine-tuningPrimarily evaluated on single-image tasksNot evaluated for multi-turn applicationsMay be more sensitive to specific prompts than base Gemma 327B model is text-only without image supportEvaluation primarily in English language
Notable Use Cases: medical image classificationmedical image interpretation and reportingradiology image analysisdigital pathology classificationfundus image analysisskin image classificationmedical visual question answeringpatient interviewing systemsmedical triaging applicationsclinical decision support toolsmedical summarizationhealthcare AI application developmentagentic medical systemsFHIR data processingprivate health data parsingmedical education platforms
Function Calling Support: Yes
Tool Use Support: Yes

Resources

Related Models

FunctionGemma

Google

FunctionGemma is a specialized version of Gemma 3 270M fine-tuned for function calling and designed to run on edge devices. It bridges natural language and software execution, translating user commands into executable API actions. The model excels at unified action and chat capabilities, switching seamlessly between generating structured function calls and conversational responses. Built specifically for customization through fine-tuning, it demonstrated 85% accuracy on Mobile Actions after training (up from 58% baseline). Small enough to run on mobile phones and edge devices like NVIDIA Jetson Nano, it uses Gemma's 256k vocabulary to efficiently tokenize JSON and multilingual inputs.

Gemini 3 Pro

Google

Gemini 3 Pro is Google's flagship multimodal foundation model, released in November 2025. Built on a sparse Mixture-of-Experts (MoE) Transformer architecture, it features a 1 million token context window and native understanding of text, images, audio, and video. The model introduces 'Deep Think' capabilities for enhanced reasoning, controlled via a 'thinking_level' parameter, and is optimized for 'agentic' workflows and 'vibe coding'—the generation of full applications from natural language. It supports advanced function calling and tool use, making it suitable for complex software engineering and long-context analysis tasks.

Typemultimodal

ParametersProprietary

2025-11-18

Proprietary

Details Compare

Gemini 2.5 Flash-Lite

Google

Google's most cost-efficient and fastest 2.5 model yet. Higher quality than 2.0 Flash-Lite on coding, math, science, reasoning and multimodal benchmarks. Excels at high-volume, latency-sensitive tasks like translation and classification.