Question 1

What is HiDream-O1-Image?

Accepted Answer

HiDream-O1-Image is an open-source, 8-billion-parameter AI image generation model built on a Pixel-level Unified Transformer (UiT). Released by HiDream-ai under the MIT License on May 8, 2026, it supports text-to-image generation, instruction-based image editing, and subject-driven personalization — all in a single model without external VAEs or disjoint text encoders, at up to 2,048 × 2,048 native resolution.

Question 2

Can I run HiDream-O1-Image online without a GPU?

Accepted Answer

Yes. You can generate directly on this page — no GPU, no install required. For local deployment, a CUDA-capable GPU is necessary; the FP8-quantized variant runs on GPUs with as little as ~10GB VRAM (e.g., RTX 3080, RTX 4070), making it accessible outside data-center environments.

Question 3

What's the difference between HiDream-O1-Image Full and the Dev variant?

Accepted Answer

The Full model uses 50 sampling steps with classifier-free guidance (CFG 5.0) and produces the highest photographic detail and realism — best for final production outputs. The Dev variant uses a distilled 28-step schedule with CFG 0.0, converging faster for rapid iteration and prototyping. Both checkpoints share the same 8B parameter count and MIT license.

Question 4

How does HiDream-O1-Image compare to FLUX and DALL-E 3?

Accepted Answer

HiDream-O1-Image outperforms FLUX.2 Dev on GenEval (0.90 vs. 0.66) and DPG-Bench (89.83 vs. 83.79) while using 7× fewer parameters. It also outscores GPT Image 2 and DALL-E 3 on HPSv3 human preference (10.37 vs. 10.21 and lower). Unlike those models, HiDream-O1-Image is open-weight, MIT-licensed, and natively generates at 2K resolution without upscaling.

Question 5

Can I use HiDream-O1-Image-generated images commercially?

Accepted Answer

Yes. HiDream-O1-Image model weights and code are released under the MIT License, which permits personal, research, and commercial use. You should review the full MIT License terms in the GitHub repository and ensure your use case complies with applicable content policies on any inference platform you use.

Question 6

Does HiDream-O1-Image support image editing, not just generation?

Accepted Answer

Yes. HiDream-O1-Image natively supports instruction-based image editing — you provide a reference image and a plain-English instruction (e.g., "remove the earphones"), and the model applies the change while preserving the original composition. This is built into the same model that handles text-to-image, with no separate editing pipeline required.

Question 7

How does the built-in Reasoning Agent work?

Accepted Answer

The Reasoning-Driven Prompt Agent is a separate wrapper — not part of the diffusion model itself — that runs a language model (Gemma-4-31B or any OpenAI-compatible API) over your raw instruction before image generation begins. It analyzes your prompt's implied spatial logic, object relationships, and attributes, then rewrites it into a detailed, structured directive. This dramatically reduces prompt engineering effort for complex scenes.

Question 8

Will my images or prompts be used to train the model?

Accepted Answer

HiDream-O1-Image is an open-weight model you run locally or via cloud inference. If you self-host using the GitHub weights, your prompts and generated images are fully under your control and are not transmitted anywhere. When using our hosted service, please review our Privacy Policy for data handling practices.

Question 9

Does HiDream-O1-Image support languages other than English?

Accepted Answer

Yes. HiDream-O1-Image achieves 0.979 (English) and 0.978 (Chinese) on LongText-Bench for visual text rendering within generated images. The Reasoning-Driven Prompt Agent currently operates in English; non-English instructions may benefit from translation before input. Generated multilingual text inside images — banners, labels, posters — is natively supported.

Metric	HiDream-O1-Image	FLUX.2 Dev	DALL-E 3	GPT Image 2
Parameters	8B	56B	Closed	Closed
Native Resolution	2048 × 2048	1024px	1024px	1024px
Architecture	Pixel-level UiT (no VAE)	Latent DiT + VAE	Closed	Closed
GenEval Score	0.90	0.66	0.67	0.89
DPG-Bench Score	89.83	83.79	83.50	—
HPSv3 Score	10.37	—	—	10.21
Text Rendering	Native (0.979 EN)	Poor	Moderate	Moderate
Image Editing	Native instruction-based	Separate model	Via API	Yes
Subject Personalization	Multi-reference (up to 12)	No	No	Limited
License	MIT (commercial)	Apache 2.0	Closed / paid	Closed / paid
ComfyUI Support	Yes	Yes	No	No
Prompt Reasoning Agent	Built-in	No	No	No
Open Weights	Yes (GitHub)	Yes	No	No

HiDream-O1-Image: Generate, Edit & Personalize Images at 2K Resolution

What Is HiDream-O1-Image and Why It Outperforms Bigger Models

Pixel-Native Architecture — No VAE, No Compromise

Benchmark-Leading Performance at 7× Smaller Footprint

Built-In Reasoning Agent for Complex Prompt Understanding

HiDream-O1-Image Features That Replace an Entire Image Production Pipeline

Text-to-Image Generation at Native 2048 × 2048

Instruction-Based Image Editing with Natural Language

Subject-Driven Personalization Across Multiple References

Precise Multilingual Visual Text Rendering

Reasoning-Driven Prompt Agent for Complex Scenes

ComfyUI & API Integration for Professional Workflows

What Can You Create With HiDream-O1-Image?

Product Photography at Scale

On-Brand Ad Creatives in Minutes

Character-Consistent Storyboards

Multilingual Visuals Without Retouching

Natural Language Photo Editing

Open-Weight Model Integration

HiDream-O1-Image vs. FLUX, DALL-E 3 & GPT Image 2: Full Comparison

Frequently Asked Questions

Experience HiDream-O1-Image in Your Browser — No GPU Required