Pangea: Unified Semantic Space for Human Action Understanding


CVPR 2024, Highlight

Action understanding is important and has attracted wide attention. It can be formulated as a mapping from the physical space of actions to a semantic space. Typically, researchers have built action datasets around idiosyncratic class definitions, each pushing the envelope of its own benchmark. As a result, datasets are incompatible with each other, like “isolated islands”, because of semantic gaps and differing class granularities, e.g., “do housework” in dataset A versus “wash plate” in dataset B. We argue that a more principled semantic space is urgently needed to concentrate community efforts and to let all datasets be used together in pursuit of generalizable action learning. To this end, we design a Poincaré action semantic space that follows the verb taxonomy hierarchy and covers a massive number of actions. By aligning the classes of previous datasets to this semantic space, we gather image, video, skeleton, and MoCap datasets into a unified database under a unified label system, i.e., bridging the “isolated islands” into a “Pangea”. Accordingly, we propose a bidirectional mapping model between the physical and semantic spaces to make full use of Pangea. In extensive experiments, our method shows significant superiority, especially in transfer learning.
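
To make the idea of a hierarchical semantic space more concrete, here is a minimal sketch of the standard geodesic distance on the Poincaré ball, where coarse classes tend to sit near the origin and fine-grained classes near the boundary. This is not the released Pangea code: the 2-D coordinates, the EMBEDDINGS table, and the nearest_label helper are hypothetical and hand-picked purely for illustration. Only the distance formula itself is standard.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Hypothetical embeddings (assumption, not taken from the paper or its data).
EMBEDDINGS = {
    "do housework": (0.10, 0.00),   # coarse class, placed near the origin
    "wash plate": (0.80, 0.10),     # fine class, placed near the boundary
    "ride horse": (-0.75, 0.30),    # unrelated fine class in another direction
}


def nearest_label(query, embeddings=EMBEDDINGS):
    """Return the label whose embedding is closest to `query` in Poincare distance."""
    return min(embeddings, key=lambda name: poincare_distance(query, embeddings[name]))


if __name__ == "__main__":
    parent_child = poincare_distance(EMBEDDINGS["do housework"], EMBEDDINGS["wash plate"])
    unrelated = poincare_distance(EMBEDDINGS["wash plate"], EMBEDDINGS["ride horse"])
    print(f"housework <-> wash plate: {parent_child:.3f}")
    print(f"wash plate <-> ride horse: {unrelated:.3f}")
    print(nearest_label((0.78, 0.12)))
```

With these toy coordinates, the coarse class “do housework” ends up closer to “wash plate” than the unrelated fine class “ride horse” does. This is the kind of structure a hierarchy-aware semantic space is meant to capture across datasets with different class granularities.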


Top: video frames

Bottom left: semantic prediction visualization

Bottom right: semantic prediction details

News and Olds

[2024.1] Our code and data will be available at Code.
[2023.4] Our paper is available at arXiv.
[2023.3] Trial run




[arXiv], [Code]


Before using our data and code in your project, please cite:
@article{li2023isolated,
  title={From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding},
  author={Li, Yong-Lu and Wu, Xiaoqian and Liu, Xinpeng and Dou, Yiming and Ji, Yikun
    and Zhang, Junyi and Li, Yixing and Tan, Jingru and Lu, Xudong and Lu, Cewu},
  journal={arXiv preprint arXiv:2304.00553},
}
© Copyright 2022 MVIG-RHOS • Based on tbakerx