Explained: DALL·E 2
OpenAI’s latest image-generating AI system
Something I learned while writing this: The real Salvador Dalí used to draw on the back of his cheques when paying for meals, knowing that restaurants would never cash a cheque with his original artwork on it. THE AUDACITY 🤣
OpenAI’s new work of art
What do you get if you mix the creativity of Salvador Dalí with the intelligence of WALL-E? OpenAI’s new brainchild: DALL·E 2.
The AI research lab just introduced the latest version of its image-generating AI system to the world. The first version, DALL·E, was introduced back in 2021. DALL·E 2 is a significant improvement compared to its predecessor. It can better understand words and create more photorealistic and high-resolution images.
When asked to generate an image of “bears shopping for groceries in Ancient Egypt”, DALL·E 2 generated the following image:
You can even specify which art style you would like.
“An astronaut playing basketball with cats in space as a children’s book illustration” returns this image:
DALL·E 2 is still a research project and is not available to the public yet. OpenAI hasn’t outlined any specific or intended applications for it:
Our hope is that DALL·E 2 will empower people to express themselves creatively. DALL·E 2 also helps us understand how advanced AI systems see and understand our world, which is critical to our mission of creating AI that benefits humanity.
– OpenAI
Drawing and stealing like an artist
DALL·E 2 can perform 3 types of tasks:
- Create brand new images
- Edit existing images
- Create variations of existing images
#1: Creating brand new images
DALL·E 2 can create brand new images from a text description, as long as it understands the words you enter. It doesn’t just mash different concepts together in one image; it understands the relationships between items and can represent actions visually.
In the “koala dunking a basketball” example, DALL·E 2 needs to understand and put together three concepts: koalas, basketball, and the act of dunking. It correctly generates an image of an airborne koala dunking like it’s at the NBA All-Star Weekend.
#2: Editing existing images
When you don’t need DALL·E 2 to channel its inner artist, it can make realistic edits to existing images while maintaining consistent textures, shadows, and reflections.
The researchers at OpenAI used DALL·E 2 to give the Mona Lisa a mohawk. If you look closely at the image, you can see how consistently the hair is lit: the light is coming from the left, making the front of the mohawk lighter than the side. The top seems a bit blurry, but it’s still impressive.
At least it doesn’t edit paintings like Mr. Bean.
#3: Creating variations of existing images
Finally, DALL·E 2 can copy something and change it up a bit. The AI system can take an existing image and create new variations of it. An example:
OpenAI wants to minimize potential misuse
Like any other technology, AI can be used for harmful purposes.
According to OpenAI, the research group took several measures to minimize potential misuse:
- Preventing harmful generations: Data containing violence, hate, or adult images was removed from the training data so DALL·E 2 wouldn’t be exposed to these concepts and start understanding them.
- OpenAI also says they used “advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures”. I couldn’t find more information on how they did this exactly.
- Preventing misuse: DALL·E 2 doesn’t generate images when it’s given a text description containing violent, adult, or political content. You can read OpenAI’s full content policy here.
- Phased deployment: OpenAI decided to roll out DALL·E 2 in phases as it works with a select group of experts to understand its capabilities and limitations in more depth. I signed up for the waitlist so maybe I’ll get access soon and experiment with it.
The risks of DALL·E 2
Despite these measures, OpenAI still found multiple risks and limitations with DALL·E 2 when testing the system:
- Explicit content
- Bias and representation
- Harassment, bullying, and exploitation
- Dis- and misinformation
- Economic
- Copyright and trademarks
I’m summarizing the main risks below but I’ve included a link to the detailed analysis provided by OpenAI in the Deep Dive section.
#1: Explicit content
Although DALL·E 2 won’t generate an image when given a text prompt that includes violence or nudity, it can still create images that suggest these topics when visual synonyms are used.
For example:
- A man with blood all over his shirt → No image generated ❌
- A man with ketchup all over his shirt → Image generated ✅
Even though ketchup is harmless, DALL·E 2 would still generate an image containing what most of us would assume to be blood in that context.
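To make the loophole concrete, here is a toy sketch of a keyword-based prompt filter. OpenAI hasn’t published how its filter actually works, so everything here is an assumption for illustration: the `BLOCKED_TERMS` list and the `is_prompt_allowed` function are hypothetical names, and a real system is almost certainly more sophisticated than a word blocklist.

```python
# Toy illustration only: NOT OpenAI's actual filter.
# BLOCKED_TERMS and is_prompt_allowed are hypothetical names made up for this sketch.
BLOCKED_TERMS = {"blood", "gore", "nude"}


def is_prompt_allowed(prompt: str) -> bool:
    """Return False if any word in the prompt appears on the blocklist."""
    # Lowercase each word and strip common punctuation before comparing.
    words = {word.strip(".,!?").lower() for word in prompt.split()}
    return words.isdisjoint(BLOCKED_TERMS)


if __name__ == "__main__":
    prompts = [
        "A man with blood all over his shirt",
        "A man with ketchup all over his shirt",
    ]
    for prompt in prompts:
        verdict = "allowed" if is_prompt_allowed(prompt) else "blocked"
        print(f"{prompt} -> {verdict}")
```

A filter like this only looks at the words, not at what the resulting image would look like, which is why a visual synonym such as ketchup slips through.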
#2: Bias and representation
DALL·E 2 may reinforce existing gender, racial, or cultural stereotypes due to bias in the model’s training data. Testing of the model uncovered different types of biases:
- Racial bias: It overrepresented people who are white.
- Gender bias: It overrepresented certain genders based on professions. Images of nurses contained mostly women, while images of CEOs contained mostly men.
- Cultural bias: It defaults to Western culture, customs, and traditions when generating images of things like weddings, restaurants, and homes.
#3: Harassment, bullying, and exploitation
Since DALL·E 2 tries to maintain consistent textures, reflections, and shadows when editing images, it can become hard to distinguish the edited images from reality.
Although images can be edited and altered with many other tools, DALL·E 2 makes the process much easier and faster than something like Photoshop, which takes more time and effort to learn. It might even give you a more realistic image than the one you tried editing in Photoshop.
#4: Dis- and misinformation
This is somewhat related to the previous point but it has wider and more serious implications.
Editing or creating photorealistic images to deceive or mislead people can be extremely manipulative. We’re already facing widespread misinformation with something as rudimentary as fake articles, and more recently with other AI applications like deepfakes.
#5: Economic
DALL·E’s super-charged creation and editing skills could replace some of the work done by designers, photographers, models, and artists.
I can envision applications to generate custom art and logos for individuals at a fraction of the price of hiring a designer. It would be harder to replace an entire creative team for a bigger project since DALL·E 2 gives you little control over the art direction.
Ownership is another problem. Who owns the art generated by DALL·E 2? OpenAI says that commercial use of these generated images is not allowed but that would be difficult, if not impossible, to track. This reminds me of the previous dilemma I discussed in the Artificial Inventor episode.
#6: Copyright and trademarks
Finally, OpenAI says that the model can generate images with trademarked logos or copyrighted characters. The model was trained on large and public datasets that may contain references to IP-protected elements or concepts which are hard to filter out.
Final thoughts
This is one of those innovations that make you go “this is cool!” until you start learning about its equally harmful applications.
That was my reaction in the process of discovering and learning more about DALL·E 2. Koalas dunking basketballs and Mona Lisa with a mohawk are fun and creative visualizations that get me excited about trying the system out. But altering images to harm and deceive people makes me hope that it’s never released to the public.
I think there’s a middle ground, however. Almost all of DALL·E 2’s risks come from generating photorealistic images of real people because they can be hard to separate from reality. Such images can completely ruin our trust in the content we consume online.
Many of these risks could be eliminated if DALL·E 2 were trained only to generate images in artistic styles like line drawings, cartoons, and watercolour. These would enable fun and creative experiments that aren’t competing with reality. And I believe this would better preserve OpenAI’s goal of empowering people to express themselves creatively.