Gemini Nano Banana Image Generation: The Complete Guide

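Nano Banana is Gemini's native image generation capability. It generates and processes images conversationally from text, images, or a combination of both. Developers can use the Gemini API to create, edit, and iterate on visual content with fine-grained control over the result.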

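Introducing the Nano Banana Models

The Gemini API offers two image generation models:

Nano Banana (gemini-2.5-flash-image): designed for speed and efficiency, and optimized for high-throughput, low-latency tasks.

Nano Banana Pro (gemini-3-pro-image-preview): designed for professional asset production. It uses advanced reasoning ("Thinking") to follow complex instructions and render high-fidelity text.

All generated images include a SynthID watermark.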

Image Generation (Text-to-Image)

A basic example that generates an image from a text prompt:

python

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

prompt = ("Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme")

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[prompt],
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("generated_image.png")
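
A response is not guaranteed to contain an image part, so it can help to collect the saved files and check the result. The following is a minimal sketch, not part of the SDK: `save_response_parts` and `EXTENSIONS` are hypothetical names, and the code assumes each part exposes `text` and `inline_data` (with a bytes `data` field and a `mime_type` string), mirroring the attributes used in the example above.

```python
from pathlib import Path

# Hypothetical mapping from MIME type to file extension; extend as needed.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_response_parts(parts, out_dir, stem="generated_image"):
    """Print text parts, write image parts to numbered files, return the paths.

    Assumption: every part has `.text` and `.inline_data`, and `.inline_data`
    carries raw bytes in `.data` plus a `.mime_type` string.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in parts:
        inline = getattr(part, "inline_data", None)
        if getattr(part, "text", None) is not None:
            print(part.text)
        elif inline is not None:
            # Fall back to .png when the MIME type is missing or unknown.
            ext = EXTENSIONS.get(getattr(inline, "mime_type", None), ".png")
            path = out_dir / f"{stem}_{len(saved)}{ext}"
            path.write_bytes(inline.data)
            saved.append(path)
    return saved
```

With a real response this would be called as `save_response_parts(response.parts, "out")`; an empty return value means the model answered with text only.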

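Image Editing (Text-and-Image-to-Image)

Provide an image along with a text prompt to add, remove, or modify elements, change the style, or adjust the colors. The following example loads a local image with PIL and passes it together with the prompt: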

python

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

# No trailing comma here: a trailing comma would turn the prompt into a tuple.
prompt = (
    "Create a picture of my cat eating a nano-banana in a "
    "fancy restaurant under the Gemini constellation"
)

image = Image.open("/path/to/cat_image.png")

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[prompt, image],
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("generated_image.png")
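
The SDK call above accepts a PIL image directly. When a request body is built by hand instead (for example for a REST call), the image is typically sent as base64-encoded inline data. The sketch below shows that encoding step only. `build_edit_payload` is a hypothetical helper, and the field names `inline_data`, `mime_type`, and `data` are assumptions about the request schema that should be checked against the API reference.

```python
import base64
import json


def build_edit_payload(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Return a JSON-serializable request body with the image as base64 text.

    Assumption: the endpoint expects a `contents` list whose parts hold either
    `text` or `inline_data` with `mime_type` and base64 `data`.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Tiny stand-in for real image bytes, just to show the output shape.
    body = build_edit_payload("Add a small red hat to the cat.", b"\x89PNG\r\n")
    print(json.dumps(body, indent=2))
```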

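Multi-Turn Image Editing

Chatting is the recommended way to generate and edit images iteratively. The following example sends a prompt that generates an infographic about photosynthesis: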

python

from google import genai
from google.genai import types

client = genai.Client()

chat = client.chats.create(
    model="gemini-3-pro-image-preview",
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        tools=[{"google_search": {}}]
    )
)

message = "Create a vibrant infographic that explains photosynthesis as if it were a recipe for a plant's favorite food. Show the \"ingredients\" (sunlight, water, CO2) and the \"finished dish\" (sugar/energy). The style should be like a page from a colorful kids' cookbook, suitable for a 4th grader."

response = chat.send_message(message)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image:= part.as_image():
        image.save("photosynthesis.png")

You can then use the same chat session to change the language on the graphic to Spanish:

python

message = "Update this infographic to be in Spanish. Do not change any other elements of the image."

aspect_ratio = "16:9"  # "1:1","2:3","3:2","3:4","4:3","4:5","5:4","9:16","16:9","21:9"
resolution = "2K"      # "1K", "2K", "4K"

response = chat.send_message(message, config=types.GenerateContentConfig(
    image_config=types.ImageConfig(
        aspect_ratio=aspect_ratio,
        image_size=resolution
    ),
))

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image:= part.as_image():
        image.save("photosynthesis_spanish.png")

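What's New in Gemini 3 Pro Image

Gemini 3 Pro Image (gemini-3-pro-image-preview) is the most advanced image generation and editing model, optimized for professional asset production. It uses advanced reasoning to handle the most demanding workflows and excels at complex, multi-turn creation and modification tasks.

Key features:

High-resolution output: built-in support for generating 1K, 2K, and 4K visuals
Advanced text rendering: generates legible, stylized text for infographics, menus, diagrams, and marketing assets
Grounding with Google Search: the model can use Google Search as a tool to verify facts and generate images from real-time data (such as current weather maps, stock charts, and recent events)
Thinking mode: the model uses a "thinking" process to reason through complex prompts. It generates interim "thought images" (visible in the backend but not charged) to refine the composition before producing the final high-quality output
Up to 14 reference images: you can now mix up to 14 reference images to produce the final image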

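Using Up to 14 Reference Images

Gemini 3 Pro Image Preview lets you mix up to 14 reference images, including:

Up to 6 high-fidelity images of objects to include in the final image
Up to 5 images of people to maintain character consistency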

python

from google import genai
from google.genai import types
from PIL import Image

prompt = "An office group photo of these people, they are making funny faces."

aspect_ratio = "5:4"  # "1:1","2:3","3:2","3:4","4:3","4:5","5:4","9:16","16:9","21:9"
resolution = "2K"     # "1K", "2K", "4K"

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        prompt,
        Image.open('person1.png'),
        Image.open('person2.png'),
        Image.open('person3.png'),
        Image.open('person4.png'),
        Image.open('person5.png'),
    ],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio=aspect_ratio,
            image_size=resolution
        ),
    )
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image:= part.as_image():
        image.save("office.png")
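
The limits above can be checked before a request is sent. This is a minimal sketch under stated assumptions: `check_reference_images` is a hypothetical helper, and sorting images into "object", "person", and "other" groups is the caller's own bookkeeping, not something the API requires.

```python
# Limits taken from the text above.
MAX_REFERENCE_IMAGES = 14
MAX_OBJECT_IMAGES = 6
MAX_PERSON_IMAGES = 5


def check_reference_images(object_images, person_images, other_images=()):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(object_images) > MAX_OBJECT_IMAGES:
        problems.append(
            f"too many object images: {len(object_images)} > {MAX_OBJECT_IMAGES}"
        )
    if len(person_images) > MAX_PERSON_IMAGES:
        problems.append(
            f"too many person images: {len(person_images)} > {MAX_PERSON_IMAGES}"
        )
    total = len(object_images) + len(person_images) + len(other_images)
    if total > MAX_REFERENCE_IMAGES:
        problems.append(
            f"too many reference images: {total} > {MAX_REFERENCE_IMAGES}"
        )
    return problems
```

For the group photo above, `check_reference_images([], ["person1.png", "person2.png", "person3.png", "person4.png", "person5.png"])` returns an empty list, since five person images is exactly the limit.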

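Grounding with Google Search

Use the Google Search tool to generate images based on real-time information such as weather forecasts, stock charts, or recent events.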

python

from google import genai
from google.genai import types

prompt = "Visualize the current weather forecast for the next 5 days in San Francisco as a clean, modern weather chart. Add a visual on what I should wear each day"

aspect_ratio = "16:9"

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=prompt,
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio=aspect_ratio,
        ),
        tools=[{"google_search": {}}]
    )
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image:= part.as_image():
        image.save("weather.png")

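Generating Images up to 4K Resolution

Gemini 3 Pro Image generates 1K images by default, but it can also output 2K and 4K images. To generate higher-resolution assets, specify image_size in the generation config; in the Python SDK this is the image_config field of types.GenerateContentConfig.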

python

from google import genai
from google.genai import types

prompt = "Da Vinci style anatomical sketch of a dissected Monarch butterfly. Detailed drawings of the head, wings, and legs on textured parchment with notes in English."

aspect_ratio = "1:1"
resolution = "1K"  # "1K", "2K", "4K"

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=prompt,
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio=aspect_ratio,
            image_size=resolution
        ),
    )
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image:= part.as_image():
        image.save("butterfly.png")

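Thinking Process

The thinking process is available only in Gemini 3 Pro Image. When it is enabled, the model generates up to two interim images to test composition and logic. The last image produced during thinking is also the final rendered image.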
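
When a response mixes interim thought content with the final output, it can be useful to separate the two. The sketch below is duck-typed and does not import the SDK. It assumes that thought parts carry a truthy `thought` attribute and that image parts carry a non-empty `inline_data` attribute. Both are assumptions to verify against the SDK reference, and the helper names are hypothetical.

```python
def split_thought_parts(parts):
    """Split parts into (thought_parts, final_parts).

    Assumption: a part produced during thinking has a truthy `.thought`.
    """
    thoughts, final = [], []
    for part in parts:
        if getattr(part, "thought", False):
            thoughts.append(part)
        else:
            final.append(part)
    return thoughts, final


def last_image_part(parts):
    """Return the last part that carries image data, or None if there is none."""
    images = [p for p in parts if getattr(p, "inline_data", None) is not None]
    return images[-1] if images else None
```

A typical use would be `thoughts, final = split_thought_parts(response.parts)` followed by `last_image_part(final)` to pick the image to save.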

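Other Image Generation Modes

Gemini supports other image interaction modes that depend on the structure and context of the prompt, including:

Image understanding: use Gemini to analyze image content
Video understanding: use Gemini to analyze video content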

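Generating Images in Batches

The Gemini API supports batch processing of multiple image generation requests, which suits workloads that need to generate images in large volumes.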
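
Batch jobs are usually fed from a file with one request per line. The sketch below only prepares such lines and does not submit anything. The record layout (a `key` plus a `request` object with `contents` and `generation_config`) is an assumption about the batch input format and should be checked against the Batch API documentation. `build_batch_lines` is a hypothetical helper.

```python
import json


def build_batch_lines(prompts, key_prefix="request"):
    """Return one JSON string per prompt, suitable for writing to a .jsonl file.

    Assumption: each line needs a unique `key` and a `request` body that asks
    for both text and image output.
    """
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        record = {
            "key": f"{key_prefix}-{index}",
            "request": {
                "contents": [{"parts": [{"text": prompt}]}],
                "generation_config": {"responseModalities": ["TEXT", "IMAGE"]},
            },
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


if __name__ == "__main__":
    for line in build_batch_lines(["A red apple on a desk", "A blue bicycle"]):
        print(line)
```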

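Prompting Guide and Strategies

The core principle of image generation is specificity: the more detail you provide, the better the result.

Prompts for Generating Images

1. Photorealistic scenes

For realistic images, use photography terms. Mention camera angles, lens types, lighting, and fine details to steer the model toward a photorealistic result.

Prompt template: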

A photorealistic [camera angle] portrait of [subject], [age] years old, wearing [clothing details]. The scene is set in [location] with [lighting conditions]. Use a [lens type] lens with [aperture] aperture to create [depth of field effect]. Focus on [specific details]. [Color grading description].
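
Bracketed templates like this one can be filled programmatically. The following is a small sketch, assuming placeholders are written as `[name]` without nested brackets. `fill_template` is a hypothetical helper and raises an error for any slot left unfilled, so that a half-filled prompt is never sent.

```python
import re

# Matches one [placeholder] with no nested brackets inside it.
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_template(template: str, values: dict) -> str:
    """Replace every [name] in the template with values[name]."""
    missing = []

    def substitute(match):
        name = match.group(1)
        if name not in values:
            missing.append(name)
            return match.group(0)  # leave the slot untouched for the error report
        return str(values[name])

    filled = PLACEHOLDER.sub(substitute, template)
    if missing:
        raise KeyError(f"missing values for placeholders: {missing}")
    return filled


if __name__ == "__main__":
    template = "A photorealistic [camera angle] portrait of [subject]."
    print(fill_template(template, {"camera angle": "close-up", "subject": "an elderly potter"}))
```

The same helper works for the other templates in this guide, since they use the same bracket convention.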

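2. Stylized illustrations and stickers

To create stickers, icons, or assets, state the style explicitly and request a transparent background.

Prompt template: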

A [art style] sticker of [subject], [mood/pose]. The character has [distinctive features]. [Background specification]. [Color palette]. [Additional style details].

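3. Accurate text in images

Gemini excels at rendering text. Be explicit about the text content, the font style (described in words), and the overall design. For professional asset production, use Gemini 3 Pro Image Preview.

Prompt template: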

Create a [style] [asset type] for [purpose]. The text "[exact text]" should be [text description - size, font style, color, placement]. [Background description]. [Additional design elements].

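4. Product mockups and commercial photography

Well suited to creating clean, professional product shots for e-commerce, advertising, or branding.

Prompt template: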

A high-resolution, studio-lit product photograph of [product description]. The product is [positioning]. [Background description]. [Lighting details]. Shot with [camera/lens specs]. [Mood/atmosphere].

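5. Minimalist and negative-space design

Well suited to creating backgrounds for websites, presentations, or marketing materials where text will be overlaid.

Prompt template: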

A minimalist composition featuring [main subject]. [Background description]. [Color palette]. [Mood/atmosphere]. [Negative space usage]. [Text overlay consideration].

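6. Sequential art (comic panels / storyboards)

Builds on character consistency and scene description to create panels for visual storytelling. For text accuracy and storytelling ability, these prompts work best with Gemini 3 Pro Image Preview.

7. Grounding with Google Search

Use Google Search to generate images based on recent or real-time information. This is useful for news, weather, and other time-sensitive topics.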

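Prompts for Editing Images

1. Adding and removing elements

Provide an image and describe your change. The model will match the original image's style, lighting, and perspective.

2. Inpainting (semantic masking)

Define a "mask" conversationally to edit a specific part of an image while leaving the rest unchanged.

3. Style transfer

Provide an image and ask the model to recreate its content in a different artistic style.

4. Advanced composition: combining multiple images

Provide multiple images as context to create a new composite scene. This works well for product mockups or creative collages.

5. High-fidelity detail preservation

To make sure critical details (such as a face or a logo) are preserved during an edit, describe them in detail along with your edit request.

6. Bring something to life

Upload a rough sketch or drawing and ask the model to refine it into a finished image.

7. Character consistency: 360-degree view

You can generate a 360-degree view of a character by iteratively prompting for different angles. For best results, include previously generated images in later prompts to maintain consistency. For complex poses, include a reference image of the desired pose.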
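
The iterative approach can be sketched as a list of prompts, one per angle, sent in order through a single chat session so that earlier images stay in context. The angle names, the wording of the prompts, and the helper `turnaround_prompts` are illustrative assumptions, not a prescribed format.

```python
# Hypothetical default set of viewing angles for a full turnaround.
DEFAULT_ANGLES = (
    "front",
    "three-quarter left",
    "left profile",
    "back",
    "right profile",
    "three-quarter right",
)


def turnaround_prompts(character: str, angles=DEFAULT_ANGLES):
    """Return one prompt per angle; later prompts refer back to the previous image."""
    prompts = []
    for index, angle in enumerate(angles):
        if index == 0:
            # The first prompt establishes the character.
            prompts.append(f"A {angle} view of {character} on a plain white background.")
        else:
            # Follow-up prompts ask for the same character from a new angle.
            prompts.append(
                f"Using the previous image as reference, show the same character "
                f"from a {angle} view. Keep outfit, proportions, and lighting unchanged."
            )
    return prompts


if __name__ == "__main__":
    for prompt in turnaround_prompts("a robot gardener with a copper watering can"):
        print(prompt)
```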

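Best Practices

Build the following strategies into your workflow to take your results from good to great:

Be specific: the more detail you provide, the better the result
Use photography terms: for realistic images, mention camera angles, lens types, and lighting
State the style: for illustrations, specify the art style and background requirements
Iterate: refine the image step by step over multiple conversation turns
Use reference images: they help keep characters and style consistent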

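Limitations

All generated images include a SynthID watermark
Some complex prompts may need several iterations to reach the best result
Generating people is subject to specific safety restrictions
High-resolution image generation takes more time and resources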

Optional Configuration

Output types

You can configure the response modalities to receive text, images, or both:

python

config=types.GenerateContentConfig(
    response_modalities=['TEXT', 'IMAGE']
)

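Aspect ratios and image size

Supported aspect ratios:

1:1 - square
2:3, 3:2 - portrait / landscape
3:4, 4:3 - standard photo ratios
4:5, 5:4 - social media ratios
9:16, 16:9 - video ratios
21:9 - ultrawide

Supported image sizes:

1K - standard resolution
2K - high resolution
4K - ultra-high resolution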

python

config=types.GenerateContentConfig(
    image_config=types.ImageConfig(
        aspect_ratio="16:9",
        image_size="2K"
    ),
)

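Model Selection

Choose the model that best fits your specific use case:

Model	Best for	Characteristics
`gemini-2.5-flash-image`	Fast iteration, high-throughput tasks	Speed first, low latency
`gemini-3-pro-image-preview`	Professional asset production	High fidelity, thinking mode, 4K support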
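
The table can be turned into a simple selection rule. This is one possible heuristic rather than official guidance: the function name and its flags are hypothetical, and the rule simply routes any request that needs a Pro-only feature described in this guide to the Pro model.

```python
FLASH_IMAGE = "gemini-2.5-flash-image"
PRO_IMAGE = "gemini-3-pro-image-preview"


def choose_image_model(
    *,
    high_resolution: bool = False,
    needs_thinking: bool = False,
    needs_grounding: bool = False,
    text_heavy: bool = False,
) -> str:
    """Pick a model ID: Pro when any Pro-oriented feature is needed, else Flash."""
    if high_resolution or needs_thinking or needs_grounding or text_heavy:
        return PRO_IMAGE
    # Default to the faster model for quick iteration and high throughput.
    return FLASH_IMAGE
```

For example, `choose_image_model(text_heavy=True)` returns the Pro model ID, while a call with no flags returns the Flash model ID.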

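When to Use Imagen

Imagen 4 should be your go-to model when you start generating images with Imagen. Choose Imagen 4 Ultra for advanced use cases or when you need the best image quality (note that it can only generate one image at a time).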
