One-vs-rest training is a pervasive optimization regime in deep learning, whether the problem is supervised, self-supervised, or multi-modal in nature. The real world is, however, not binary, but governed by hierarchies. Hierarchies provide key information about the semantic relation between concepts, about which mistakes to avoid, and about the inherent organization of vision and language itself. Hierarchical learning, therefore, has a long history in computer vision and has gained further traction with the rise of hyperbolic deep learning. Currently, however, hierarchies are neither standardized nor centrally organized. Instead, such knowledge is scattered across various repositories, with inconsistent formatting, organization, and availability. The lack of a central hub for hierarchies in vision datasets harms the utility and reproducibility of hierarchical learning. This paper introduces HierVision, a central hub for hierarchical knowledge in vision datasets. This hub contains 60+ hierarchical sources, spanning actions, concepts, fine-grained categories, vision-language, and more. We outline a uniform coding of the hierarchies and procedures to embed them in existing pipelines. With this hub, we hope to positively impact the broad use and re-use of hierarchies for deep learning in computer vision.
HierVision: Standardized and Reproducible Hierarchical Sources for Vision Datasets / Kasarla, Tejaswi; Hulikal Rooparaghunath, Ruthu; D'Arrigo, Stefano; Mago, Gowreesh; Jha, Abhishek; Ayoughi, Melika; Shreya Mishra, Swasti; Manzano Rodríguez, Ana; Long, Teng; Ghadimi Atigh, Mina; Van Spengler, Max; Mettes, Pascal. - (2025), pp. 671-684. (IEEE International Conference on Computer Vision, Honolulu, Hawaii, USA).
HierVision: Standardized and Reproducible Hierarchical Sources for Vision Datasets
Stefano D'Arrigo (Data Curation)
2025


