RHOS

About

Hi, this is the website of the RHOS team at MVIG. We study Embodied AI, Physical Reasoning, and Human Activity Understanding. We are building a knowledge-driven and reasoning-driven system that enables intelligent agents and robots to perceive human activities, reason about the logic of human behavior, learn skills from human activities, and interact with the environment.

Research Interests:

(S) Embodied AI: how to make agents learn skills from humans and interact with humans, scenes, and objects.
(S-1) Human Activity Understanding: how to learn and ground complex/ambiguous human activity concepts (body motion, human-object/human/scene interaction) and object concepts from multi-modal information (2D-3D-4D).
(S-2) Visual Reasoning: how to mine, capture, and embed the logic and causal relations from human activities.
(S-3) General Multi-Modal Foundation Models: especially for human-centric perception tasks.
(S-4) Activity Understanding from a Cognitive Perspective: work with multidisciplinary researchers to study how the brain perceives activities.
(E) Human-Robot Interaction for hospitals, homes, factories, etc.: work with experts in different domains to develop intelligent robots that help people.

Yong-Lu Li
Email: yonglu_li[at]sjtu[dot]edu[dot]cn
Shanghai Jiao Tong University
Shanghai Innovation Institute

Check out some of our work

Main Repo:	HAKE
Sub-repos:	Torch	TF	HAKE-AVA
	Halpe	HOI List

Projects

HAKE

OCL

Pangea

EgoPCA

Human-Robot Joint Learning

Video-Distillation

Check out some of our work

HAKE 2.0

PartMap

DLSA

Interactiveness-Field

DCR

OC-Immunity

TIN & TIN++

SymNet

HOI Analysis

PaStaNet

DJ-RN

HAKE 1.0

Publications

---Robot Demos---

Dense Policy: Bidirectional Autoregressive Learning of Actions

SIME: Enhancing Policy Self-Improvement with Modal-level Exploration

Motion Before Action: Diffusing Object Motion as Manipulation Condition

Reconstructing In-the-Wild Open-Vocabulary Human-Object Interactions

Homogeneous Dynamics Space for Heterogeneous Humans

GaPT-DAR: Category-level Garments Pose Tracking via Integrated 2D Deformation and 3D Reconstruction

M3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation

Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis

Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition

ImDy: Human Inverse Dynamics from Imitated Observations

The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs

exUMI: Extensible System for Robot Teaching with Precise Proprioception and Multi-Modalities

Interacted Object Grounding in Spatio-Temporal Human-Object Interactions

Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models

General Articulated Objects Manipulation in Real Images via Part-Aware Diffusion Process

HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid

Take A Step Back: Rethinking the Two Stages in Visual Reasoning

Revisit Human-Scene Interaction via Space Occupancy

Bridging the Gap between Human Motion and Action Semantics via Kinematic Phrases

Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control

Low-Rank Similarity Mining for Multimodal Dataset Distillation

Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement

From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Primitive-based 3D Human-Object Interaction Modelling and Programming

Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning

Beyond Object Recognition: A New Benchmark towards Object Concept Learning

EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding

Dynamic Context Removal: A General Training Strategy for Robust Models on Video Action Predictive Tasks

Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions

HAKE: Human Activity Knowledge Engine

HAKE: A Knowledge Engine Foundation for Human Activity Understanding

AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time

Constructing Balance from Imbalance for Long-tailed Image Recognition

Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection

Interactiveness Field of Human-Object Interactions

Human Trajectory Prediction with Momentary Observation

Learn to Anticipate Future with Dynamic Context Removal

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

UKPGAN: Unsupervised KeyPoint GANeration

Highlighting Object Category Immunity for the Generalization of Human-Object Interaction Detection

Learning Single/Multi-Attribute of Object with Symmetry and Group

Localization with Sampling-Argmax

Transferable Interactiveness Knowledge for Human-Object Interaction Detection

DecAug: Augmenting HOI Detection via Decomposition

HOI Analysis: Integrating and Decomposing Human-Object Interaction

PaStaNet: Toward Human Activity Knowledge Engine

Detailed 2D-3D Joint Representation for Human-Object Interaction

Symmetry and Group in Attribute-Object Compositions

InstaBoost: Boosting Instance Segmentation Via Probability Map Guided Copy-Pasting

Transferable Interactiveness Knowledge for Human-Object Interaction Detection

SRDA: Generating Instance Segmentation Annotation via Scanning, Reasoning and Domain Adaptation