Introducing 4o Image Generation


March 25, 2025

Product Release

Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs.


At OpenAI, we have long believed image generation should be a primary capability of our language models. That’s why we’ve built our most advanced image generator yet into GPT‑4o. The result—image generation that is not only beautiful, but useful.


Useful image generation

From the first cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze—not just to decorate. Today’s generative models can conjure surreal, breathtaking scenes, but struggle with the workhorse imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience.

GPT‑4o image generation excels at accurately rendering text, precisely following prompts, and leveraging 4o’s inherent knowledge base and chat context—including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you envision, helping you communicate more effectively through visuals and advancing image generation into a practical tool with precision and power.

Improved capabilities

We trained our models on the joint distribution of online images and text, learning not just how images relate to language, but how they relate to each other. Combined with aggressive post-training, the resulting model has surprising visual fluency, capable of generating images that are useful, consistent, and context-aware.

Text rendering

A picture is worth a thousand words, but sometimes generating a few words in the right place can elevate the meaning of an image. 4o’s ability to blend precise symbols with imagery turns image generation into a tool for visual communication.


Multi-turn generation

Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build upon images and text in chat context, ensuring consistency throughout. For example, if you’re designing a video game character, the character’s appearance remains coherent across multiple iterations as you refine and experiment.


Instruction following

GPT‑4o’s image generation follows detailed prompts closely. While other systems struggle with roughly 5–8 objects, GPT‑4o can handle 10–20 different objects. The tighter binding of objects to their traits and relations allows for better control.


In-context learning

GPT‑4o can analyze and learn from user-uploaded images, seamlessly integrating their details into its context to inform image generation.


World knowledge

Native image generation enables 4o to link its knowledge between text and images, resulting in a model that feels smarter and more efficient.


Photorealism and style

Training on images reflecting a vast variety of image styles allows the model to create or transform images convincingly.

A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel.
A cat looking into a puddle of water on a street, but its reflection is that of a tiger, and both reflections are realistically distorted by ripples in the water
Generate a candid, Polaroid-style photograph of four diverse friends in their early 20s at a gritty dive bar. The lighting features a very harsh, direct flash, creating sharp shadows and giving the photo a very overexposed, vintage instant-camera feel. Colors should be slightly muted, evoking nostalgic, early-2000s party vibes. The aesthetic is casually emo. No border or logos or signs. There's an interesting looking wall behind them with some light graffiti. Quality of the image should be very sharp and detailed (very little grain). The energy should be silly and chaotic. They're either playfully grimacing, smiling, or pretending to look tough. One of them should have their friend in a silly, playful headlock. Their mouths are closed.
Generate a photorealistic image of farmer's market in toronto on a saturday in summer 2006, it's a beautiful late june day, people are shopping and eating sandwiches. in focus should be a young asian girl wearing denim overalls and sipping on a strawberry banana smoothie - rest can be blurred. the photo should be reminiscent of that a digital camera from 2006 would take, with a timestamp like a printed photo would have. aspect ratio should be 3:2
blurry old analog film photograph, picture of parked car on side street, quiet night. credit creator: [Roope Rainisto](https://www.instagram.com/never_ever_never_land/?igsh=MXh3N3EyOWdoMmNubg%3D%3D#)
Create image super-realistic picture of these 4 creatures playing poker on a picnic blanket, zoomed out, in dolores park. photorealistic. The tabby long haired cat is holding a hand; right next to it are 2 tall vertical black chips (with stripes) as it has been raking in the dough.  Tabby's pupils are large and cute, and ii looking down and scrutinizing its cards, focused. Derpy black cat went all in. Two dogs are peering over cat's shoulder to see their cards. All cards are face down and of the same back color except for an exposed three of diamonds. small stack of poker chips are in front of each creature, but black cat went all in. the two dogs folded. All chips are from the same set and all cards have same color. photorealistic, shot on iphone, raw format.
A lone astronaut floats inside a vast space station, painting swirling galaxies onto a massive canvas that hangs weightlessly in the air. Their paintbrush leaves behind trails of cosmic dust, and their suit is stained with nebula-colored hues. Their helmet is off, revealing eyes filled with the reflection of distant planets. Outside the glass window, a black hole looms, twisting light into mesmerizing patterns.
Realistic photograph of a horse galloping from right to left across a vast, calm ocean surface, accurately depicting splashes, reflections, and subtle ripple patterns beneath their hooves. Exaggerate horse movements but everything else should be still, quiet to show contrast with the horse's strength. clean composition, cinematographic. A wide, panoramic composition showcasing a distant horizon. Atmospheric perspective creating depth. zoomed out so the horse appears minuscule compared to vast ocean.

horse is right at the horizon where ocean meets sky. use rule of thirds to position horse. size of horse is 1% size of entire image because camera is so far away from subject. camera view is super close to the ground/ocean like a worm's eye view. horse is galloping right where ocean meets the sky
A realistic underwater scene with dolphins swimming through the windows of an abandoned subway car, with bubbles and detailed water flow accurately simulated.
Photo of a fruit bowl consisting of real fruits mixed with miniature planets (Jupiter, Saturn, Mars, Earth), maintaining realistic reflections, lighting, and shadows consistent with original photo, clean composition, authentic textures, crisp detailed rendering

Limitations

Our model isn’t perfect. We’re aware of several current limitations, which we will work to address through model improvements after the initial launch.

Cropping

We’ve noticed that GPT‑4o can occasionally crop longer images, like posters, too tightly, especially near the bottom.

Hallucinations

Like our text models, image generation can make up information, especially when prompts provide little context.

High binding problems

When generating images that rely on its knowledge base, the model may struggle to accurately render more than 10–20 distinct concepts at once, such as a full periodic table.

Precise graphing

Multilingual text rendering

The model sometimes struggles to render non-Latin scripts; characters can be inaccurate or hallucinated, especially as the text grows more complex.

Editing precision

We’ve noticed that requests to edit specific portions of a generated image, such as fixing typos, are not always effective and may alter other parts of the image in ways that were not requested or introduce new errors. We’re currently working on improving the model’s editing precision.

We’re aware of a bug where the model struggles to keep edits consistent when working with faces in user-uploaded images, and we expect it to be fixed within the week.

Dense information with small text

The model is known to struggle when asked to render detailed information at a very small size.

Safety

In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases like game development, historical exploration, and education—while maintaining strong safety standards. At the same time, it remains as important as ever to block requests that violate those standards. Below are evaluations of additional risk areas where we’re working to enable safe, high-utility content and support broader creative expression for users.

Provenance via C2PA and internal reversible search
All generated images come with C2PA metadata, which will identify an image as coming from GPT‑4o, to provide transparency. We’ve also built an internal search tool that uses technical attributes of generations to help verify if content came from our model.
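To make the C2PA point concrete, here is a minimal sketch of checking whether a downloaded image carries an embedded C2PA manifest store. It assumes the file is a PNG and relies on the C2PA specification's convention of storing the manifest in a PNG chunk of type `caBX` (JPEG files use APP11 segments instead, which this sketch does not handle); nothing here is specific to GPT‑4o. The helper name `has_c2pa_chunk` is hypothetical. It only detects the chunk's presence: it does not parse the manifest or verify its signatures, and a missing chunk proves little, since metadata is easily stripped by re-encoding or screenshots.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def has_c2pa_chunk(data: bytes) -> bool:
    """Return True if the PNG bytes contain a chunk of type 'caBX'.

    Assumption: 'caBX' is the chunk type the C2PA spec uses for PNG.
    This is a presence check only; no manifest parsing or signature
    validation is performed.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    # Each PNG chunk is: 4-byte big-endian length, 4-byte type,
    # `length` bytes of data, then a 4-byte CRC.
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        if chunk_type == b"caBX":
            return True
        if chunk_type == b"IEND":
            break  # IEND marks the end of the image stream
        pos += 8 + length + 4  # skip header, data, and CRC
    return False
```

For a file on disk, a call such as `has_c2pa_chunk(open("image.png", "rb").read())` returns `True` when the chunk is present; full validation of the manifest's provenance claims requires dedicated C2PA tooling.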

Blocking the bad stuff
We’re continuing to block requests for generated images that may violate our content policies, such as child sexual abuse materials and sexual deepfakes. When images of real people are in context, we have heightened restrictions regarding what kind of imagery can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished; it is an ongoing area of investment. As we learn more about real-world use of this model, we’ll adjust our policies accordingly.

For more on our approach, visit the image generation addendum to the GPT‑4o system card.

Using reasoning to power safety
Similar to our deliberative alignment work, we’ve trained a reasoning LLM to work directly from human-written and interpretable safety specifications. We used this reasoning LLM during development to help us identify and address ambiguities in our policies. Together with our multimodal advancements and existing safety techniques developed for ChatGPT and Sora, this allows us to moderate both input text and output images against our policies.

Access and availability

4o image generation rolls out starting today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. It’s also available to use in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out in the next few weeks.

Creating and customizing images is as simple as chatting with GPT‑4o: just describe what you need, including any specifics like aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute.

credit creator: [Alex Duffy](https://every.to/@AlxAi)
credit creator: [August Kamp](https://www.instagram.com/august.kamp/?igsh=MTRpeG9xd3F2MzEyeg#)
credit creator: Cassandra Ansara
credit creator: [Isa](https://www.instagram.com/isabelitavirtual/?igsh=ZHdoYjFwYzV6dzFi#)
credit creator: Les Morgan
credit creator: [Derya Unatmaz](https://x.com/deryatr_)
credit creator: [Elene Chekurishvili](https://www.instagram.com/th_ene_ighbor/?igsh=eDh2Z2kyOGhnaXA0#)
credit creator: [Eugenio Marongiu](https://www.instagram.com/katsukokoiso.ai/?igsh=YTduZnNjZ2RhdTM3#)
credit creator: Jesse Kramme
credit creator: Matthew Dear
credit creator: [Minh Do](https://www.instagram.com/minhsmind/?igsh=MTFscDRqZ3JiZHVveA%3D%3D#)
credit creator: [Niceaunties](https://www.instagram.com/niceaunties/?igsh=Nm1jZmV4YTF6MTQ%3D#)
credit creator: Eskcanta
credit creator: [Roope Rainisto](https://www.instagram.com/never_ever_never_land/?igsh=MXh3N3EyOWdoMmNubg%3D%3D#)
credit creator: Shane Copenhagen
credit creator: Will Maberry
credit creator: Manuel Sainsily

Livestream replay
