AI method "DragGAN" promises to revolutionize digital image processing
Imagine being able to try on different clothes on a virtual avatar and see how they look from every angle. Or adjusting the direction your pet is looking in your favorite photo. You could even change the perspective of a landscape picture. These types of photo edits have always been challenging, even for experts.
A novel AI tool now promises that, with just a few mouse clicks, anyone can achieve edits like these effortlessly. The method is being developed by a research team led by the Max Planck Institute for Informatics in Saarbrücken, in particular by the Saarbrücken Research Center for Visual Computing, Interaction and Artificial Intelligence (VIA) based there.
This groundbreaking method has the potential to revolutionize digital image processing. "With 'DragGAN,' we are currently creating a user-friendly tool that allows even non-professionals to perform complex image editing. All you need to do is mark the areas in the photo that you want to change and specify the desired edits in a menu. Thanks to the support of AI, with just a few clicks of the mouse anyone can adjust things like the pose, facial expression, direction of gaze, or viewing angle, for example in a pet photo," explains Christian Theobalt, Managing Director of the Max Planck Institute for Informatics, Director of the Saarbrücken Research Center for Visual Computing, Interaction and Artificial Intelligence, and Professor at Saarland University at Saarland Informatics Campus.
This is made possible through the use of artificial intelligence, specifically a type of model called a "Generative Adversarial Network," or GAN.
"As the name suggests, GANs are capable of generating new content, such as images. The term 'adversarial' refers to the fact that GANs involve two networks competing against each other," explains Xingang Pan, a postdoctoral researcher at the MPI for Informatics and the first author of the paper.
A GAN consists of a generator, responsible for creating images, and a discriminator, whose task is to determine whether an image is real or produced by the generator. These two networks are trained in competition with each other until the generator produces images that the discriminator can no longer distinguish from real ones.
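To make this competition concrete, here is a minimal training-step sketch in PyTorch. The toy networks, dimensions, and data are illustrative assumptions, not the models behind DragGAN (which builds on the much larger StyleGAN):

```python
import torch
import torch.nn as nn

# Toy networks for illustration only; real GANs such as StyleGAN are far larger.
latent_dim, image_dim = 16, 64

generator = nn.Sequential(                    # turns random noise into an "image"
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, image_dim), nn.Tanh(),
)
discriminator = nn.Sequential(                # outputs a real-vs-fake logit
    nn.Linear(image_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def training_step(real_images):
    batch = real_images.size(0)
    fake_images = generator(torch.randn(batch, latent_dim))

    # Discriminator step: learn to score real images as 1 and generated ones as 0.
    d_loss = (bce(discriminator(real_images), torch.ones(batch, 1))
              + bce(discriminator(fake_images.detach()), torch.zeros(batch, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to fool the discriminator into scoring fakes as 1.
    g_loss = bce(discriminator(fake_images), torch.ones(batch, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# Random vectors stand in for a dataset of real photos.
for _ in range(100):
    training_step(torch.randn(32, image_dim))
```

Each side improves in response to the other: as the discriminator gets better at spotting fakes, the generator is pushed to produce ever more realistic images.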
There are many uses for GANs. Besides the obvious case of image generation, GANs are good at predicting images: so-called video frame prediction can reduce the data requirements of video streaming by anticipating the next frame of a video. GANs can also upscale low-resolution images, improving image quality by inferring what the additional pixels of the larger image should show.
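As a rough illustration of the upscaling idea, the sketch below shows a tiny, untrained generator that invents four new pixels for every input pixel. In practice such a network is trained adversarially against a discriminator on real high-resolution photos (as in SRGAN-style methods); the architecture and names here are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Minimal super-resolution generator (toy architecture, untrained).
upscaler = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3 * 4, 3, padding=1),   # predict 4 sub-pixel values per pixel
    nn.PixelShuffle(2),                   # rearrange them into a 2x larger image
)

low_res = torch.rand(1, 3, 32, 32)        # stand-in for a small photo
high_res = upscaler(low_res)              # shape (1, 3, 64, 64): the network
                                          # decides what the new pixels show
```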
"In our case, this property of GANs proves advantageous when, for example, the direction of a dog's gaze is to be changed in an image. The GAN then basically recalculates the whole image, anticipating where which pixel must land in the image with a new viewing direction. A side effect of this is that DragGAN can calculate things that were previously occluded by the dog's head position, for example. Or if the user wants to show the dog's teeth, he can open the dog's muzzle on the image," explains Xingang Pan.
DragGAN could also find applications in professional settings. For instance, fashion designers could use it to adjust the cut of clothing in photographs after the initial capture, and vehicle manufacturers could efficiently explore different design configurations for planned vehicles. While DragGAN works on diverse object categories such as animals, cars, people, and landscapes, most of the results so far have been achieved on GAN-generated synthetic images.
"How to apply it to any user-input images is still a challenging problem that we are looking into," adds Xingang Pan.
Just a few days after the release of the preprint, the new tool from the Saarbrücken-based computer scientists is already causing a stir in the international tech community and is considered by many to be the next big step in AI-assisted image processing. While tools like Midjourney can be used to create completely new images, DragGAN could massively simplify their post-processing.
The new method is being developed at the Max Planck Institute for Informatics in collaboration with the Saarbrücken Research Center for Visual Computing, Interaction and Artificial Intelligence (VIA), which was opened there in partnership with Google. The research consortium also includes experts from the Massachusetts Institute of Technology (MIT) and the University of Pennsylvania.
More information:
Xingang Pan et al, Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold, arXiv (2023). DOI: 10.48550/arXiv.2305.10973
Provided by Saarland University