LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

1 Kuaishou Technology    2 University of Science and Technology of China    3 Fudan University

arXiv Paper
Online Demo
Code

Comparisons with existing methods

Self-reenactment


The results of our first-stage base model, without applying the stitching and retargeting modules, are presented here.



Cross-reenactment


The results of our first-stage base model are presented in Ours w/o stitching, while Ours shows the results with the stitching module applied.



Radar Comparisons


The quantitative radar comparisons with existing methods are presented above, with metrics normalized for better visualization.

Portrait animation from a still image with stitching

Across various styles (realistic, oil painting, sculpture, 3D rendering) and different portrait sizes

Portrait video editing with stitching

All source videos are generated from a single image by Kling

Eyes and lip retargeting


Control the eyes-open extent conditioned on a given scalar

Control the lip-open extent conditioned on a given scalar


Effect of eyes retargeting



Effect of lip retargeting
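The retargeting demos above condition the eyes-open and lip-open extent on a single scalar. In the paper these offsets come from small learned MLPs; the sketch below only mocks that behavior with a fixed per-keypoint "opening direction" to illustrate the interface (all names and the keypoint format are hypothetical, not the released code):

```python
# Hedged sketch of scalar-conditioned retargeting, NOT the paper's implementation.
# A learned MLP is replaced by a fixed per-keypoint opening direction.

def retarget(keypoints, open_direction, extent):
    """Shift keypoints along a per-keypoint opening direction.

    keypoints:      list of (x, y, z) implicit keypoints (hypothetical format)
    open_direction: list of (dx, dy, dz) offsets that open the eyes/lips
    extent:         scalar in [0, 1]; 0 = fully closed, 1 = fully open
    """
    return [
        (x + extent * dx, y + extent * dy, z + extent * dz)
        for (x, y, z), (dx, dy, dz) in zip(keypoints, open_direction)
    ]

# Example: two lip keypoints pushed apart vertically as `extent` grows.
lips = [(0.0, 0.10, 0.0), (0.0, -0.10, 0.0)]
open_dir = [(0.0, 0.05, 0.0), (0.0, -0.05, 0.0)]
closed = retarget(lips, open_dir, 0.0)  # unchanged
half = retarget(lips, open_dir, 0.5)    # lips half open
```

Because the offset is linear in the scalar, sweeping `extent` from 0 to 1 produces the smooth open/close interpolation shown in the demos.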

🐱🐶 Generalization to Animals 🐼🐱

By fine-tuning on animal data, our model can precisely drive these cute cats, dogs, and pandas with human motion






Abstract

Portrait Animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation.


Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability. Building upon this, we develop a video-driven portrait animation framework named LivePortrait with a focus on better generalization, controllability, and efficiency for practical usage. To enhance the generation quality and generalization ability, we scale up the training data to about 69 million high-quality frames, adopt a mixed image-video training strategy, upgrade the network architecture, and design better motion transformation and optimization objectives. Additionally, we discover that compact implicit keypoints can effectively represent a kind of blendshapes, and we carefully design a stitching module and two retargeting modules, implemented as small MLPs with negligible computational overhead, to enhance controllability. Experimental results demonstrate the efficacy of our framework even compared to diffusion-based methods. The generation speed reaches a remarkable 12.8 ms per frame on an RTX 4090 GPU with PyTorch. The inference code and models are available at https://github.com/KwaiVGI/LivePortrait.

Method

Pipeline of the first stage: base model training. The appearance and motion extractors \(\mathcal{F}\) and \(\mathcal{M}\), the warping module \(\mathcal{W}\), and the decoder \(\mathcal{G}\) are optimized. In this stage, models are trained from scratch. Please see the paper for more details.
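In implicit-keypoint-based frameworks of this kind, the motion extractor's outputs (scale, rotation, expression deltas, translation) transform a set of canonical 3D keypoints into posed driving keypoints. As a rough sketch of that transformation, of the form x = s·(x_c R + δ) + t (variable names are illustrative; see the paper for the exact formulation):

```python
# Hedged sketch of the keypoint motion transformation used by
# implicit-keypoint animation frameworks; names are illustrative.

def transform_keypoints(canonical, scale, rotation, deltas, translation):
    """Apply x = s * (x_c @ R + delta) + t per keypoint."""
    out = []
    for (x, y, z), (dx, dy, dz) in zip(canonical, deltas):
        # rotate: row vector times 3x3 rotation matrix
        rx = x * rotation[0][0] + y * rotation[1][0] + z * rotation[2][0]
        ry = x * rotation[0][1] + y * rotation[1][1] + z * rotation[2][1]
        rz = x * rotation[0][2] + y * rotation[1][2] + z * rotation[2][2]
        out.append((
            scale * (rx + dx) + translation[0],
            scale * (ry + dy) + translation[1],
            scale * (rz + dz) + translation[2],
        ))
    return out

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
pts = transform_keypoints([(1.0, 2.0, 3.0)], 2.0, identity,
                          [(0.5, 0.0, 0.0)], (0.1, 0.1, 0.1))
```

Source and driving frames each yield one such keypoint set; the warping module then deforms the source appearance features from the source keypoints to the driving keypoints.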

Pipeline of the second stage: stitching and retargeting modules training. After training the base model in the first stage, we freeze the appearance and motion extractors, the warping module, and the decoder. Only the stitching module and the retargeting modules are optimized in the second stage. Please see the paper for more details.
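The stage-2 freezing described above amounts to marking only the stitching and retargeting modules as trainable. A minimal sketch of that schedule, with hypothetical module names (in a real PyTorch setup this would map to setting `requires_grad = False` on the frozen modules' parameters):

```python
# Hedged sketch of the two-stage training schedule: in stage 2, only the
# stitching and retargeting heads receive gradient updates.
# Module names are illustrative, not the released code's.

STAGE2_TRAINABLE = {"stitching", "eye_retargeting", "lip_retargeting"}

def stage2_param_flags(modules):
    """Return {module_name: trainable?} for the second training stage."""
    return {name: name in STAGE2_TRAINABLE for name in modules}

flags = stage2_param_flags(
    ["appearance_extractor", "motion_extractor", "warping", "decoder",
     "stitching", "eye_retargeting", "lip_retargeting"]
)
# base model frozen; stitching/retargeting heads trainable
```

Keeping the base model frozen means the small MLPs can be trained cheaply without disturbing the generation quality established in stage 1.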

BibTeX


@article{guo2024liveportrait,
    title   = {LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control},
    author  = {Guo, Jianzhu and Zhang, Dingyun and Liu, Xiaoqiang and Zhong, Zhizhou and Zhang, Yuan and Wan, Pengfei and Zhang, Di},
    journal = {arXiv preprint arXiv:2407.03168},
    year    = {2024}
}