Can an AI Learn the Concept of Pose and Appearance? 👱‍♀️


Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. I apologize for my voice today; I am trapped in this frail human body, and sometimes it falters. But as you remember from the previous episode, the papers must go on.

In the last few years, we have seen a bunch of new AI-based techniques specialized in generating novel images. This is mainly done through learning-based techniques, typically a Generative Adversarial Network, or GAN in short, which is an architecture where a generator neural network creates new images and passes them to a discriminator network, which learns to distinguish real photos from these fake, generated images. These two networks learn and improve together, so much so that the outputs of many of these techniques have become so realistic that we sometimes can't even tell they are synthetic images unless we look really closely. You see some examples here from BigGAN, a previous technique that is based on this architecture.
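To make this adversarial setup a bit more concrete, here is a minimal sketch of one GAN training step in PyTorch. The tiny fully connected networks, the latent size, and the loss choice are all illustrative assumptions, nothing like the scale of BigGAN, but the tug-of-war between the two networks is the same.

```python
import torch
import torch.nn as nn

# Illustrative, tiny networks; image GANs like BigGAN use far
# larger convolutional architectures. All sizes are assumptions.
latent_dim, image_dim = 64, 28 * 28

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh(),        # fake image in [-1, 1]
)
discriminator = nn.Sequential(
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),                           # real-vs-fake logit
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images):
    n = real_images.size(0)
    real, fake = torch.ones(n, 1), torch.zeros(n, 1)

    # Discriminator step: learn to tell real photos from generated ones.
    fakes = generator(torch.randn(n, latent_dim)).detach()  # freeze G here
    loss_d = bce(discriminator(real_images), real) + \
             bce(discriminator(fakes), fake)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: produce images the discriminator calls "real".
    loss_g = bce(discriminator(generator(torch.randn(n, latent_dim))), real)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

Running `train_step` over many batches of real photos is the "learning and improving together" part: each network's progress raises the bar for the other.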
Now, normally, if we are looking to generate a specific human face, we have to generate hundreds and hundreds of these images, and our best bet is to hope that sooner or later, we'll find something close to what we were looking for. So, of course, scientists were interested in exerting control over the outputs, and with follow-up works, we can now roughly control the appearance, but, in return, we have to accept the pose in which the results are given.

This new project is about teaching a learning algorithm to separate pose from identity. Now, that sounds possible with proper supervision. What does that mean exactly? Well, we have to train these GANs on a large number of images so they can learn what a human face looks like, what landmarks to expect, and how to form them properly when generating new images. However, when the input images come in different poses, we would normally need to give the discriminator additional information that describes the rotations of these people and objects.
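To make that "additional information" concrete, here is a minimal sketch of the conventional, supervised route, in the spirit of conditional GANs: a discriminator that also receives a pose annotation. Every size and name here is an assumption for illustration; the point is the cost, namely that this only works if the training images come with pose labels.

```python
import torch
import torch.nn as nn

class PoseConditionedDiscriminator(nn.Module):
    """The supervised alternative: the discriminator sees the image
    AND a pose label, so the dataset must carry rotation annotations.
    All sizes here are illustrative assumptions."""

    def __init__(self, image_dim=28 * 28, pose_dim=3):
        super().__init__()
        self.embed_pose = nn.Linear(pose_dim, 32)    # e.g. yaw, pitch, roll
        self.classify = nn.Sequential(
            nn.Linear(image_dim + 32, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),                       # real-vs-fake logit
        )

    def forward(self, image, pose):
        # Concatenating the pose embedding means "real" now also implies
        # "consistent with this annotated rotation".
        features = torch.cat([image, self.embed_pose(pose)], dim=1)
        return self.classify(features)
```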
Well, hold on to your papers, because that is exactly what is not happening in this new work. This paper proposes an architecture that contains a 3D transform and a projection unit, shown here in red and blue, and these help us separate pose from identity. As a result, we have much finer artistic control over both during image generation. That is amazing.
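Here is a minimal sketch of what those two blocks could look like. This is not the paper's exact architecture; the layer sizes, the yaw-only rotation, and names such as `PoseAwareGenerator` are all illustrative assumptions. The idea is that identity lives in a 3D feature volume, the 3D transform unit rotates that volume, and the projection unit collapses it into 2D features that are rendered into the image.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseAwareGenerator(nn.Module):
    """Sketch: identity is a latent-modulated 3D feature volume,
    pose is an explicit rotation applied to that volume, so no
    pose labels are needed. All sizes and names are assumptions."""

    def __init__(self, latent_dim=128, feat=32, size=16):
        super().__init__()
        # Identity: a learned 3D feature volume, modulated by z.
        self.volume = nn.Parameter(torch.randn(1, feat, size, size, size))
        self.modulate = nn.Linear(latent_dim, feat)
        # Projection unit: depth axis -> channels, then render to RGB.
        self.project = nn.Conv2d(feat * size, 64, kernel_size=1)
        self.render = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, z, yaw):
        n = z.size(0)
        # Appearance: scale the volume's channels with the latent code.
        style = self.modulate(z).view(n, -1, 1, 1, 1)
        vol = self.volume.expand(n, -1, -1, -1, -1) * style

        # 3D transform unit: rotate the volume about its vertical axis.
        cos, sin = torch.cos(yaw), torch.sin(yaw)
        zero, one = torch.zeros_like(yaw), torch.ones_like(yaw)
        theta = torch.stack([
            torch.stack([cos,  zero, sin,  zero], dim=-1),
            torch.stack([zero, one,  zero, zero], dim=-1),
            torch.stack([-sin, zero, cos,  zero], dim=-1),
        ], dim=-2)                                    # shape (n, 3, 4)
        grid = F.affine_grid(theta, list(vol.shape), align_corners=False)
        vol = F.grid_sample(vol, grid, align_corners=False)

        # Projection unit: collapse depth into channels, render an image.
        n, c, d, h, w = vol.shape
        return self.render(self.project(vol.reshape(n, c * d, h, w)))

# Same identity code z, two different poses:
g = PoseAwareGenerator()
z = torch.randn(1, 128)
front = g(z, torch.tensor([0.0]))
turned = g(z, torch.tensor([0.6]))   # roughly 34 degrees of yaw
```

Because the rotation is applied inside the generator, changing `yaw` moves the pose while `z` keeps the identity fixed, which is the kind of separation the results demonstrate.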
So as you see here, it enables a really nice workflow where we can also set up the poses. Don't like the camera position for this generated bedroom? No problem. Need to rotate the chairs? No problem. And we are not even finished yet, because once we set up the pose correctly, we are not stuck with these images; we can also choose from several different appearances. And all this comes from the fact that this technique was able to learn the intricacies of these objects. Love it.

Now, it is abundantly clear that as we rotate these cars, or change the camera viewpoint for the bedroom, a flickering effect is still present. And this is how research works: we try to solve a new problem, one step at a time. Then we find flaws in the solution and improve upon that. As a result, we always say: two more papers down the line, and we'll likely have smooth and creamy transitions between these images.
The Lambda sponsorship spot is coming in a moment, and I don't know if you noticed at the start, but they were also part of this research project. I think that is as relevant a sponsor as it gets. Thanks for watching and for your generous support, and I'll see you next time!
