Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement


Recently, dataset distillation has paved the way towards efficient machine learning, especially for image datasets. However, distillation for videos, which carry an additional temporal dimension, remains an underexplored domain. In this work, we provide the first systematic study of video distillation and introduce a taxonomy to categorize temporal compression. Our investigation reveals that temporal information is usually not well learned during distillation, and that the temporal dimension of the synthetic data contributes little. These observations motivate our unified framework for disentangling the dynamic and static information in videos: it first distills the videos into still images as a static memory, and then compensates for the dynamic and motion information with a learnable dynamic memory block. Our method achieves state-of-the-art performance on video datasets at different scales, with notably smaller storage expenditure. Our code will be publicly available.

Method and Results

Overview of our method


The synthetic videos are learned in two stages:
Stage 1: static memory learning, where one frame per video is distilled with a standard image distillation objective.
Stage 2: the static memory is frozen, and a learnable dynamic memory is combined with it to restore motion information.
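The two stages above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the shapes, the name `synthesize`, and the additive combination of static and dynamic memory are all simplifying assumptions (the paper uses a learnable dynamic memory block rather than a plain residual sum).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: N synthetic videos, T frames, C x H x W pixels.
N, T, C, H, W = 2, 4, 3, 8, 8

# Stage 1: static memory -- one distilled still image per synthetic video
# (learned with an image-distillation objective, then frozen).
static_memory = rng.standard_normal((N, 1, C, H, W))

# Stage 2: dynamic memory -- a small learnable store of per-frame motion
# information, trained while the static memory stays frozen.
dynamic_memory = rng.standard_normal((N, T, C, H, W)) * 0.1

def synthesize(static_mem, dynamic_mem):
    """Combine frozen static frames with dynamic motion terms into videos.

    Broadcasting over the frame axis repeats each still image T times;
    the addition stands in for the paper's learnable combination block.
    """
    return static_mem + dynamic_mem

videos = synthesize(static_memory, dynamic_memory)
print(videos.shape)  # each synthetic video now has a full temporal dimension
```

Only the dynamic memory (and the combination block) would receive gradients in Stage 2, which is what keeps the extra storage cost small relative to distilling full videos.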




If you find our paper, data or code useful, please cite:

  @article{wang2023dancing,
    title={Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement},
    author={Wang, Ziyu and Xu, Yue and Lu, Cewu and Li, Yong-Lu},
    journal={arXiv preprint arXiv:2312.00362},
    year={2023}
  }
© Copyright 2022 MVIG-RHOS • Based on tbakerx