SPLAM: Accelerating Image Generation with
Sub-Path Linear Approximation Model

ECCV 2024
Chen Xu1,2, Tianhui Song1,2, Weixin Feng2, Xubin Li2,
Tiezheng Ge2, Bo Zheng2, and Limin Wang1,3*
1State Key Laboratory for Novel Software Technology, Nanjing University,
2Alibaba Group, 3Shanghai AI Lab
*Corresponding Author

Abstract

Diffusion models have significantly advanced the state of the art in image, audio, and video generation tasks. However, their applications in practical scenarios are hindered by slow inference speed. Drawing inspiration from the approximation strategies utilized in consistency models, we propose the Sub-path Linear Approximation Model (SPLAM), which accelerates diffusion models while maintaining high-quality image generation. SPLAM treats the PF-ODE trajectory as a series of PF-ODE sub-paths divided by sampled points, and harnesses sub-path linear (SL) ODEs to form a progressive and continuous error estimation along each individual PF-ODE sub-path. Optimizing on these SL-ODEs allows SPLAM to construct denoising mappings with smaller cumulative approximation errors. An efficient distillation method is also developed to facilitate the incorporation of more advanced diffusion models, such as latent diffusion models. Our extensive experimental results demonstrate that SPLAM is efficient to train, requiring only 6 A100 GPU days to produce a high-quality generative model capable of 2- to 4-step generation. Comprehensive evaluations on the LAION, MS COCO 2014, and MS COCO 2017 datasets also show that SPLAM surpasses existing acceleration methods in few-step generation tasks, achieving state-of-the-art performance in both FID and the quality of the generated images.

Method


Our Sub-path Linear Approximation Model employs Sub-path Linear ODEs (SL-ODEs) to approximate the sub-paths of the PF-ODE trajectories. Each SL-ODE is determined by the linear interpolation of the endpoints of its corresponding sub-path. SPLAM is then trained to enforce a consistent mapping along the SL-ODEs, which minimizes the approximation error.
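
To make the endpoint interpolation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a sub-path is bounded by two consecutive sampled timesteps and that a point on the corresponding SL-ODE is a plain convex combination of the two endpoint states. The function names `split_into_subpaths` and `sl_interpolate`, the parameter `gamma`, and the flat-list representation of a state are all hypothetical. The paper's actual parameterization and its consistency training objective are not reproduced here.

```python
from typing import List, Sequence, Tuple


def split_into_subpaths(timesteps: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair consecutive sampled timesteps into (start, end) sub-path intervals.

    Assumption: the sampled points that divide the trajectory are given as an
    ordered sequence of timesteps.
    """
    return list(zip(timesteps[:-1], timesteps[1:]))


def sl_interpolate(
    x_start: Sequence[float], x_end: Sequence[float], gamma: float
) -> List[float]:
    """Return a point on the straight line between two sub-path endpoints.

    Assumption: gamma = 0 gives x_start and gamma = 1 gives x_end. This is an
    illustrative linear interpolation, not the exact form used in the paper.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if len(x_start) != len(x_end):
        raise ValueError("endpoint states must have the same length")
    return [(1.0 - gamma) * a + gamma * b for a, b in zip(x_start, x_end)]


if __name__ == "__main__":
    # Three sampled timesteps define two sub-paths.
    print(split_into_subpaths([1.0, 0.5, 0.0]))
    # Midpoint of the straight line between two toy endpoint states.
    print(sl_interpolate([0.0, 2.0], [1.0, 4.0], 0.5))
```

In this sketch, sweeping `gamma` from 0 to 1 traces the straight line between the two endpoints of one sub-path, which is the kind of path along which a consistent mapping would be enforced during training.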