OCL: Object Concept Learning

MVIG-RHOS, SJTU

Understanding objects is a central building block of artificial intelligence, especially for embodied AI. Even though object recognition excels with deep learning, current machines still struggle to learn higher-level knowledge, e.g., what attributes an object has and what we can do with it. In this work, we propose a challenging Object Concept Learning (OCL) task to push the envelope of object understanding. It requires machines to reason out object affordances and simultaneously give the reason: what attributes make an object possess these affordances. To support OCL, we build a densely annotated knowledge base including extensive labels for three levels of object concepts: categories, attributes, and affordances, together with their causal relations. By analyzing the causal structure of OCL, we present a strong baseline, the Object Concept Reasoning Network (OCRN). It leverages causal intervention and concept instantiation to infer the three levels following their causal relations.
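For intuition, the sketch below illustrates the category → attribute → affordance ordering described above as a tiny PyTorch module: affordances are predicted from the attribute belief, so the model can point to the attributes as the "reason". This is a minimal, hypothetical sketch, not the official OCRN (which additionally uses causal intervention and concept instantiation); all layer sizes and names are placeholders.

```python
import torch
import torch.nn as nn

class ToyConceptReasoner(nn.Module):
    """Toy three-level reasoner following the causal order
    category -> attribute -> affordance. Illustrative only; not OCRN."""

    def __init__(self, feat_dim=512, n_categories=10, n_attributes=20, n_affordances=15):
        super().__init__()
        self.category_head = nn.Linear(feat_dim, n_categories)
        # attributes are inferred from the object feature and the category belief
        self.attribute_head = nn.Linear(feat_dim + n_categories, n_attributes)
        # affordances are inferred from the attribute belief (the "reason")
        self.affordance_head = nn.Linear(feat_dim + n_attributes, n_affordances)

    def forward(self, feat):
        cat_logits = self.category_head(feat)
        attr_in = torch.cat([feat, cat_logits.softmax(dim=-1)], dim=-1)
        attr_logits = self.attribute_head(attr_in)
        aff_in = torch.cat([feat, attr_logits.sigmoid()], dim=-1)
        aff_logits = self.affordance_head(aff_in)
        return cat_logits, attr_logits, aff_logits

# toy usage with a random object feature
model = ToyConceptReasoner()
cat, attr, aff = model(torch.randn(1, 512))
```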

Demo

Left-top: object (in yellow box)

Right-top: key causal graph

Left-bottom: affordance prediction score

Right-bottom: key causal relations

News and Olds

[2022.12] Our paper is available on arXiv.
[2022.11] Trial run

Download

Our data (image sources and annotations) and code will be released very soon!

Publications

Before using our data and code in your project, please cite:
@ARTICLE{ocl,
  title={Beyond Object Recognition: A New Benchmark towards Object Concept Learning},
  author={Li, Yong-Lu and Xu, Yue and Xu, Xinyu and Mao, Xiaohan 
          and Yao, Yuan and Liu, Siqi and Lu, Cewu},
  journal={arXiv preprint arXiv:2212.02710},
  year={2022},
}

Disclaimer

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
In our database, 75,578 images and their annotations are extracted from existing datasets (COCOa, ImageNet-150K, aPY, SUN), and 4,885 images are collected from the Internet. We only provide image links for research purposes.

© Copyright 2022 MVIG-RHOS • Based on tbakerx