OpenAI has debuted DALL·E, an AI model that draws images from the text prompts it receives.
While AI is relatively good at reproducing what it has seen, creating something new is a significant leap, especially when working from nothing more than a text prompt. DALL·E, “a portmanteau of the artist Salvador Dalí and Pixar’s WALL·E,” can do just that, drawing images from the descriptions given to it.
As OpenAI describes it: “DALL·E is a 12-billion-parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs. We’ve found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.”
This breakthrough opens the door to using language to manipulate images.
The company continues: “GPT-3 showed that language can be used to instruct a large neural network to perform a variety of text generation tasks. Image GPT showed that the same type of neural network can also be used to generate images with high fidelity. We extend these findings to show that manipulating visual concepts through language is now within reach.”
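The approach OpenAI describes treats an image as a sequence of discrete tokens and models text and image tokens jointly with one autoregressive transformer. A minimal sketch of that generation loop follows; this is a toy illustration, not OpenAI's code, and the model stand-in, function names, and the small vocabulary sizes here are all assumptions for demonstration (DALL·E itself uses a BPE text vocabulary, an 8,192-code dVAE image codebook, and a 32×32 grid of image tokens).

```python
# Toy sketch of DALL·E-style autoregressive text-to-image generation.
# All names and sizes are illustrative stand-ins, not OpenAI's implementation.
import random

TEXT_VOCAB = 256    # toy text-token vocabulary
IMAGE_VOCAB = 512   # toy image-token codebook (DALL·E's dVAE uses 8,192 codes)
IMAGE_TOKENS = 16   # toy image length (DALL·E predicts a 32x32 = 1,024 token grid)

def toy_transformer_logits(sequence):
    """Stand-in for the transformer: deterministic pseudo-logits per position."""
    rng = random.Random(hash(tuple(sequence)) & 0xFFFFFFFF)
    return [rng.random() for _ in range(IMAGE_VOCAB)]

def generate_image_tokens(text_tokens):
    """Autoregressively sample image tokens conditioned on the text prefix."""
    sequence = list(text_tokens)
    image_tokens = []
    for _ in range(IMAGE_TOKENS):
        logits = toy_transformer_logits(sequence)
        next_token = max(range(IMAGE_VOCAB), key=logits.__getitem__)  # greedy decode
        image_tokens.append(next_token)
        sequence.append(TEXT_VOCAB + next_token)  # image ids live in a shifted range
    return image_tokens  # a real system decodes these tokens back into pixels

tokens = generate_image_tokens([12, 7, 99])  # a toy "text prompt"
print(len(tokens))
```

The key design idea this illustrates is that once both modalities are reduced to one token stream, image generation becomes the same next-token prediction task GPT-3 performs on text.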
The Holy Grail is the ability to engage in verbal communication with an AI and have it understand and respond accordingly. OpenAI’s latest breakthrough is a step in that direction.