Robot • Human • Object • Scene



Hi, this is the website of the RHOS team at MVIG. We study Human Activity Understanding, Visual Reasoning, and Embodied AI. We are building a knowledge-driven system that enables intelligent agents to perceive human activities, reason about the logic of human behavior, learn skills from human activities, and interact with the environment.

Research Interests:

(S) Embodied AI: how to make agents learn skills from humans and interact with humans, scenes, and objects.
(S-1) Human Activity Understanding: how to learn and ground complex/ambiguous human activity concepts (body motion, human-object/human/scene interaction) and object concepts from multi-modal information (2D-3D-4D).
(S-2) Visual Reasoning: how to mine, capture, and embed the logic and causal relations from human activities.
(S-3) General Multi-Modal Foundation Models: especially for human-centric perception tasks.
(S-4) Activity Understanding from a Cognitive Perspective: work with multidisciplinary researchers to study how the brain perceives activities.
(E) Human-Robot Interaction (e.g. for Smart Hospital): work with the healthcare team (doctors and engineers) at SJTU to develop intelligent robots to help people.


Yong-Lu Li
Email: yonglu_li[at]sjtu[dot]edu[dot]cn
Office: SEIEE-3-301
Shanghai Jiao Tong University


We are actively looking for self-motivated students (Master/PhD, 2025 spring & fall) and interns / engineers / visitors (CV/ML/ROB/NLP/Math/Phys background, always welcome) to join us in the Machine Vision and Intelligence Group (MVIG). If you share the same or similar interests, feel free to drop me an email with your resume.


News and Olds

[2024.2] Our works Pangea and Video Distillation will appear at CVPR 2024.
[2023.12] Our work on primitive-based HOI reconstruction (P3HAOI) will appear at AAAI 2024!
[2023.9] The advanced HAKE reasoning engine based on LLM (Symbol-LLM) will appear at NeurIPS'23!
[2023.7] Our works on ego-centric video understanding and object concept learning will appear at ICCV'23!
[2023.7] The upgraded version of DCR will appear at IJCV!
[2022.12] HAKE 2.0 will appear at TPAMI!
[2022.12] OCL (Object Concept Learning) is released on arXiv. Please visit the project page for details.
[2022.11] We release the human body part states and interactive object bounding box annotations upon AVA (2.1 & 2.2): [HAKE-AVA], and a CLIP-based human part state & verb recognizer: [CLIP-Activity2Vec].
[2022.11] AlphaPose will appear at TPAMI!
[2022.07] Two papers on long-tailed learning and HOI detection are accepted by ECCV'22; arXiv versions and code are coming soon.
[2022.03] Five papers on HOI detection/prediction, trajectory prediction, and 3D detection/keypoints are accepted by CVPR'22; papers and code are coming soon.
[2022.02] We release the human body part state labels based on AVA: HAKE-AVA and HAKE 2.0.
[2021.12] Our work on HOI generalization will appear at AAAI'22.
[2021.10] Learning Single/Multi-Attribute of Object with Symmetry and Group is accepted by TPAMI.
[2021.09] Our work Localization with Sampling-Argmax will appear at NeurIPS'21.
[2021.02] Upgraded HAKE-Activity2Vec is released! Images/Videos --> human box + ID + skeleton + part states + action + representation. [Demo] [Description]
[2021.01] TIN (Transferable Interactiveness Network) is accepted by TPAMI.
[2020.12] DecAug is accepted by AAAI'21.
[2020.09] Our work HOI Analysis will appear at NeurIPS 2020.
[2020.06] The larger HAKE-Large (>120K images with activity and part state labels) is released.
[2020.02] Three papers (Image-based HAKE: PaSta-Net, 2D-3D Joint HOI Learning, and Symmetry-based Attribute-Object Learning) are accepted at CVPR'20! Papers and corresponding resources (code, data) will be released soon.
[2019.07] Our paper InstaBoost is accepted in ICCV'19.
[2019.06] Part I of our HAKE, HAKE-HICO, which contains the image-level part-state annotations, is released.
[2019.04] Our project HAKE (Human Activity Knowledge Engine) begins trial operation.
[2019.02] Our paper on Interactiveness is accepted in CVPR'19.
[2018.07] Our paper on GAN & Annotation Generation is accepted in ECCV'18.
[2018.05] Presentation (Kaibot Team) in TIDY UP MY ROOM CHALLENGE | ICRA'18.
[2018.02] Our paper on Object Part States is accepted in CVPR'18.



Human Activity Knowledge Engine (2018) [Project Page]

Human Activity Knowledge Engine (HAKE) is a knowledge-driven system that aims to enable intelligent agents to perceive human activities, reason about the logic of human behavior, learn skills from human activities, and interact with objects and environments.


Object Concept Learning (2022) [Project Page]

We propose a challenging Object Concept Learning (OCL) task to push the envelope of object understanding. It requires machines to reason out object affordances and simultaneously give the reason: what attributes make an object possess these affordances.


Unified Action Semantic Space (2023) [Project Page]

We design an action semantic space based on the verb taxonomy hierarchy that covers a massive number of actions. With it, we can gather multi-modal datasets into a unified database under a unified label system, i.e., bridging “isolated islands” into a “Pangea”. Accordingly, we propose a bidirectional mapping model between physical and semantic space to fully use Pangea.


EgoPCA: A New Framework for EgoHOI (2023) [Project Page]

We rethink Ego-HOI recognition and propose a new framework as an infrastructure to advance it through Probing, Curation and Adaption (EgoPCA). We contribute comprehensive pre-train sets, balanced test sets and a new baseline, which are complete with a training-finetuning strategy and several new and effective mechanisms and settings to advance further research.


Shadow Hand Teleoperation System (2023) [Project Page]

A teleoperation system for the Shadow Dexterous Hand.


Video Distillation via Static-Dynamic Disentanglement (2023) [Project Page]

We provide the first systematic study of video distillation and introduce a taxonomy to categorize temporal compression, which motivates our unified framework of disentangling the dynamic and static information in the videos. It first distills the videos into still images as static memory and then compensates the dynamic and motion information with a learnable dynamic memory block.


*=equal contribution
#=corresponding author

Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement

Ziyu Wang*, Yue Xu*, Cewu Lu, Yong-Lu Li#

From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Yong-Lu Li*, Xiaoqian Wu*, Xinpeng Liu, Yiming Dou, Yikun Ji, Junyi Zhang, Yixing Li, Jingru Tan, Xudong Lu, Cewu Lu
CVPR 2024, Highlight (11.9%) [arXiv] [PDF] [Project] [Code]

Primitive-based 3D Human-Object Interaction Modelling and Programming

Siqi Liu, Yong-Lu Li#, Zhou Fang, Xinpeng Liu, Yang You, Cewu Lu#

Revisit Human-Scene Interaction via Space Occupancy

Xinpeng Liu*, Haowen Hou*, Yanchao Yang, Yong-Lu Li#, Cewu Lu
arXiv 2023 [arXiv] [PDF] [Project] [Code]

Bridging the Gap between Human Motion and Action Semantics via Kinematic Phrases

Xinpeng Liu, Yong-Lu Li#, Ailing Zeng, Zizheng Zhou, Yang You, Cewu Lu#
arXiv 2023 [arXiv] [PDF] [Project] [Code]

Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

Yue Xu, Yong-Lu Li#, Kaitong Cui, Ziyu Wang, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang
arXiv 2023 [arXiv] [PDF] [Code]

Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning

Xiaoqian Wu, Yong-Lu Li#, Jianhua Sun, Cewu Lu#
NeurIPS 2023 [arXiv] [PDF] [Project] [Code]

Beyond Object Recognition: A New Benchmark towards Object Concept Learning

Yong-Lu Li, Yue Xu, Xinyu Xu, Xiaohan Mao, Yuan Yao, Siqi Liu, Cewu Lu

EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding

Yue Xu, Yong-Lu Li#, Zhemin Huang, Michael Xu LIU, Cewu Lu, Yu-Wing Tai, Chi Keung Tang

Dynamic Context Removal: A General Training Strategy for Robust Models on Video Action Predictive Tasks

Xinyu Xu, Yong-Lu Li#, Cewu Lu#
IJCV 2023 [arXiv] [PDF] [Code]

Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions

Yong-Lu Li*, Hongwei Fan*, Zuoyu Qiu, Yiming Dou, Liang Xu, Hao-Shu Fang, Peiyang Guo, Haisheng Su, Dongliang Wang, Wei Wu, Cewu Lu
A part of the HAKE Project

HAKE: Human Activity Knowledge Engine

Yong-Lu Li, Liang Xu, Xinpeng Liu, Xijie Huang, Yue Xu, Mingyang Chen, Ze Ma, Shiyi Wang, Hao-Shu Fang, Cewu Lu
Tech Report, HAKE1.0 [arXiv] [PDF] [Project] [Code]
Main Repo: HAKE
Sub-repos: Torch, TF, HAKE-AVA, Halpe, HOI List

HAKE: A Knowledge Engine Foundation for Human Activity Understanding

Yong-Lu Li, Xinpeng Liu, Xiaoqian Wu, Yizhuo Li, Zuoyu Qiu, Liang Xu, Yue Xu, Hao-Shu Fang, Cewu Lu
TPAMI 2023, HAKE2.0 [arXiv] [PDF] [Project] [Code] [Press]

AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time

Hao-Shu Fang*, Jiefeng Li*, Hongyang Tang, Chao Xu, Haoyi Zhu, Yuliang Xiu, Yong-Lu Li, Cewu Lu
TPAMI 2022 [arXiv] [PDF] [Code]

Constructing Balance from Imbalance for Long-tailed Image Recognition

Yue Xu*, Yong-Lu Li*, Jiefeng Li, Cewu Lu
ECCV 2022, DLSA [arXiv] [PDF] [Code]

Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection

Xiaoqian Wu*, Yong-Lu Li*, Xinpeng Liu, Junyi Zhang, Yuzhe Wu, Cewu Lu
ECCV 2022, PartMap [arXiv] [PDF] [Code]

Interactiveness Field of Human-Object Interactions

Xinpeng Liu*, Yong-Lu Li*, Xiaoqian Wu, Yu-Wing Tai, Cewu Lu, Chi Keung Tang
CVPR 2022 [arXiv] [PDF] [Code]

Human Trajectory Prediction with Momentary Observation

Jianhua Sun, Yuxuan Li, Liang Chai, Hao-Shu Fang, Yong-Lu Li, Cewu Lu

Learn to Anticipate Future with Dynamic Context Removal

Xinyu Xu, Yong-Lu Li, Cewu Lu
CVPR 2022, DCR [arXiv] [PDF] [Code]

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

Yang You, Zelin Ye, Yujing Lou, Chengkun Li, Yong-Lu Li, Lizhuang Ma, Weiming Wang, Cewu Lu
CVPR 2022 [arXiv] [PDF] [Code]

UKPGAN: Unsupervised KeyPoint GANeration

Yang You, Wenhai Liu, Yong-Lu Li, Weiming Wang, Cewu Lu
CVPR 2022 [arXiv] [PDF] [Code]

Highlighting Object Category Immunity for the Generalization of Human-Object Interaction Detection

Xinpeng Liu*, Yong-Lu Li*, Cewu Lu
AAAI 2022 [arXiv] [PDF] [Code]

Learning Single/Multi-Attribute of Object with Symmetry and Group

Yong-Lu Li, Yue Xu, Xinyu Xu, Xiaohan Mao, Cewu Lu
TPAMI 2021, SymNet [arXiv] [PDF] [Code]
An extension of our CVPR 2020 work (Symmetry and Group in Attribute-Object Compositions, SymNet).

Localization with Sampling-Argmax

Jiefeng Li, Tong Chen, Ruiqi Shi, Yujing Lou, Yong-Lu Li, Cewu Lu
NeurIPS 2021 [arXiv] [PDF] [Code]

Transferable Interactiveness Knowledge for Human-Object Interaction Detection

Yong-Lu Li, Xinpeng Liu, Xiaoqian Wu, Xijie Huang, Liang Xu, Cewu Lu
TPAMI 2021, TIN++ [arXiv] [PDF] [Code]
An extension of our CVPR 2019 work (Transferable Interactiveness Network, TIN).

DecAug: Augmenting HOI Detection via Decomposition

Yichen Xie, Hao-Shu Fang, Dian Shao, Yong-Lu Li, Cewu Lu
AAAI 2021 [arXiv] [PDF]

HOI Analysis: Integrating and Decomposing Human-Object Interaction

Yong-Lu Li*, Xinpeng Liu*, Xiaoqian Wu, Yizhuo Li, Cewu Lu

PaStaNet: Toward Human Activity Knowledge Engine

Yong-Lu Li, Liang Xu, Xinpeng Liu, Xijie Huang, Yue Xu, Shiyi Wang, Hao-Shu Fang, Ze Ma, Mingyang Chen, Cewu Lu

Oral Talk: Compositionality in Computer Vision in CVPR 2020

Detailed 2D-3D Joint Representation for Human-Object Interaction

Yong-Lu Li, Xinpeng Liu, Han Lu, Shiyi Wang, Junqi Liu, Jiefeng Li, Cewu Lu

Symmetry and Group in Attribute-Object Compositions

Yong-Lu Li, Yue Xu, Xiaohan Mao, Cewu Lu
CVPR 2020, SymNet [arXiv] [PDF] [Video] [Slides] [Code]

InstaBoost: Boosting Instance Segmentation Via Probability Map Guided Copy-Pasting

Hao-Shu Fang*, Jianhua Sun*, Runzhong Wang*, Minghao Gou, Yong-Lu Li, Cewu Lu
ICCV 2019 [arXiv] [PDF] [Code]

Transferable Interactiveness Knowledge for Human-Object Interaction Detection

Yong-Lu Li, Siyuan Zhou, Xijie Huang, Liang Xu, Ze Ma, Hao-Shu Fang, Yan-Feng Wang, Cewu Lu
CVPR 2019, TIN [arXiv] [PDF] [Code]

SRDA: Generating Instance Segmentation Annotation via Scanning, Reasoning and Domain Adaptation

Wenqiang Xu*, Yong-Lu Li*, Cewu Lu

Beyond Holistic Object Recognition: Enriching Image Understanding with Part States

Cewu Lu, Hao Su, Yong-Lu Li, Yongyi Lu, Li Yi, Chi-Keung Tang, Leonidas J. Guibas

Optimization of Radial Distortion Self-Calibration for Structure from Motion from Uncalibrated UAV Images

Yong-Lu Li, Yinghao Cai, Dayong Wen, Yiping Yang


Cewu Lu
Yong-Lu Li
Assistant Professor
Xinpeng Liu
Ph.D. Student
Yue Xu
Ph.D. Student
Xiaoqian Wu
Ph.D. Student
Siqi Liu
Ph.D. Student
Hong Li
Ph.D. Student
Yusong Qiu
Master Student
Yushun Xiang
Master Student


Yixing Li: CUHK, Ph.D.
Mingyu Liu: ZJU, Ph.D.
Kaitong Cui: HKU, Intern
Junyi Zhang: UC Merced, Intern
Yiming Dou: UMich, Ph.D.
Xiaohan Mao: Shanghai AI Lab & SJTU, Ph.D.
Zhemin Huang: Stanford University, MS
Shaopeng Guo: UCSD, Ph.D.
Xudong Lu: CUHK, Ph.D.
Hongwei Fan: Sensetime, Research Engineer
Yuan Yao: U of Rochester, Ph.D.
Zuoyu Qiu: SJTU, MS
Han Lu: SJTU, Ph.D.
Zhanke Zhou: HKBU, Ph.D.
Mingyang Chen: UCSD, MS
Liang Xu: EIAS & SJTU, Ph.D.
Ze Ma: Columbia University, MS
Xijie Huang: HKUST, Ph.D.
© Copyright 2022 MVIG-RHOS • Based on tbakerx