Qwen-Image Edit
By Alibaba Cloud Qwen Team
Qwen-Image Edit is now open-source!
Advanced Image-to-Image Generative Model for Precise Editing

Introduction
We are thrilled to release Qwen-Image Edit, an image editing foundation model in the Qwen series that achieves significant advances in transforming existing images with precise control. Experiments show strong capabilities in image-to-image generation, with exceptional performance in maintaining original image structure while applying creative transformations. Qwen-Image Edit is now available on Hugging Face and can be used locally with the diffusers library.

🚀 Multimodal AI Capabilities
Part of the Qwen (Tongyi Qianwen) model series, offering powerful image-to-image generation with exceptional understanding of complex editing requirements
🌟 Open Source Innovation
Part of Alibaba's commitment to open-source AI development, allowing researchers and developers to build upon and extend its capabilities
🔍 Comprehensive Model Family
Works alongside other Qwen models for text, vision, and multimodal applications, providing a complete ecosystem for AI development
Quick Start
Choose your preferred Qwen image model:
Option 1: Using Qwen-Image-Edit with diffusers
Install the latest version of diffusers from source:
pip install git+https://github.com/huggingface/diffusers
import os
from PIL import Image
import torch
from diffusers import QwenImageEditPipeline

# Load the editing pipeline from the Hugging Face Hub
pipeline = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit")
print("pipeline loaded")
pipeline.to(torch.bfloat16)
pipeline.to("cuda")
pipeline.set_progress_bar_config(disable=None)

# Load the input image and describe the desired edit
image = Image.open("./input.png").convert("RGB")
prompt = "Change the rabbit's color to purple, with a flash light background."
inputs = {
    "image": image,
    "prompt": prompt,
    "generator": torch.manual_seed(0),
    "true_cfg_scale": 4.0,
    "negative_prompt": " ",
    "num_inference_steps": 50,
}

# Run the edit and save the result
with torch.inference_mode():
    output = pipeline(**inputs)
    output_image = output.images[0]
    output_image.save("output_image_edit.png")
    print("image saved at", os.path.abspath("output_image_edit.png"))
Option 2: Using the latest Qwen VLo model
The new Qwen VLo model specializes in image-to-image generation with progressive editing features.
pip install "dashscope>=1.20.7"
import dashscope
from dashscope import ImageSynthesis

# Set your API key
dashscope.api_key = "YOUR_API_KEY"

# Image-to-image generation
response = ImageSynthesis.call(
    model='qwen-vlo',
    prompt='Transform this coffee shop into a futuristic cyber cafe with neon lights',
    negative_prompt='blurry, low quality',
    n=1,                             # Number of images to generate
    size='1024*1024',                # Image size
    steps=50,                        # Diffusion steps
    image='path/to/input_image.jpg'  # Input image for editing
)

# Save the generated image
if response.status_code == 200:
    with open('qwen_vlo_result.png', 'wb') as f:
        f.write(response.output.images[0].image)
    print('Image saved successfully!')
else:
    print(f'Failed to generate image: {response.message}')
Show Cases
Semantic Editing
One of the highlights of Qwen-Image Edit lies in its powerful capabilities for semantic editing. It can modify image content while perfectly preserving the original visual semantics. For example, when editing character images like Qwen's mascot Capybara, the model maintains character consistency even when most pixels in the image are changed. This enables effortless and diverse creation of original IP content, such as MBTI-themed emoji packs based on mascot characters.

Novel View Synthesis
Qwen-Image Edit excels at novel view synthesis, a key application in semantic editing. The model can rotate objects by various angles, including 90-degree and even full 180-degree rotations, allowing users to see different sides of objects. This capability is particularly valuable for product visualization, architectural rendering, and creative content production where multiple perspectives are needed.
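As a rough illustration, a view change can be requested with an ordinary instruction prompt, reusing the pipeline from the Quick Start above. The file names and prompt wording here are illustrative assumptions, not values prescribed by the model:

# Continuing from the Quick Start pipeline; file names and prompt are illustrative.
product = Image.open("./product_front.png").convert("RGB")
rotated = pipeline(
    image=product,
    prompt="Rotate the object 180 degrees to show its back side, keeping the lighting and background unchanged.",
    true_cfg_scale=4.0,
    negative_prompt=" ",
    num_inference_steps=50,
    generator=torch.manual_seed(0),
).images[0]
rotated.save("product_back.png")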

Appearance Editing
Appearance editing is another powerful capability of Qwen-Image Edit. The model can keep certain regions of an image completely unchanged while adding, removing, or modifying specific elements. For example, it can insert signboards into scenes with corresponding reflections, remove fine details like hair strands, change the color of specific elements, or adjust backgrounds and clothing in portraits—all with exceptional attention to detail and realism.

Text Editing Excellence
A standout feature of Qwen-Image Edit is its accurate text editing capability, which stems from Qwen-Image's deep expertise in text rendering. The model excels at editing both English and Chinese text in images, enabling modifications to large headline text as well as precise adjustments to small and intricate text elements. This makes it particularly valuable for poster design, advertisement creation, and multilingual content production.
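A text correction can likewise be phrased as a plain edit instruction. The sketch below reuses the Quick Start pipeline; the poster file and the replacement text are illustrative assumptions:

# Continuing from the Quick Start pipeline; input file and text are illustrative.
poster = Image.open("./poster.png").convert("RGB")
edited = pipeline(
    image=poster,
    prompt='Replace the headline text "GRAND OPENING" with "NOW OPEN", keeping the original font and color.',
    true_cfg_scale=4.0,
    negative_prompt=" ",
    num_inference_steps=50,
    generator=torch.manual_seed(0),
).images[0]
edited.save("poster_edited.png")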

Progressive Editing
Qwen-Image Edit supports chained, step-by-step editing approaches that allow users to progressively refine and correct images. For example, when editing complex calligraphy artwork, users can draw bounding boxes to mark specific regions that need correction and instruct the model to fix these areas one by one. This iterative approach enables precise control over the editing process, ensuring the desired final result is achieved even for challenging edits.
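A chained workflow can be scripted by feeding each output back in as the next input. The sketch below assumes the Quick Start pipeline and a hand-written list of correction instructions; the input image and instruction text are illustrative:

# Continuing from the Quick Start pipeline; the instruction list is illustrative.
current = Image.open("./calligraphy.png").convert("RGB")
corrections = [
    "Fix the stroke order of the character inside the red bounding box.",
    "Thicken the faint strokes in the lower-left region.",
    "Remove the ink smudge near the top-right corner.",
]
for step, instruction in enumerate(corrections, start=1):
    current = pipeline(
        image=current,
        prompt=instruction,
        true_cfg_scale=4.0,
        negative_prompt=" ",
        num_inference_steps=50,
        generator=torch.manual_seed(0),
    ).images[0]
    current.save(f"calligraphy_step_{step}.png")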

Together, these features make Qwen-Image Edit not just a tool for basic image editing, but a comprehensive foundation model for intelligent visual transformation—where existing images become the canvas for sophisticated artistic and creative manipulation.