UMRetail: A Unified Multimodal Dataset for Hyper-Dense Shelves in Smart Retail

  • University of Coimbra
  • Guangzhou College of Commerce
  • Macao Polytechnic University

Research output: Conference contribution, peer-reviewed

Abstract

Vision-language models and multi-task learning are advancing scene understanding toward unified multimodal frameworks. However, retail datasets are fragmented: most target a single task, and heterogeneous annotation protocols and semantic granularity impede joint training, inference, and fair benchmarking. We present UMRetail, a unified multimodal dataset of real-world retail shelves with human-verified annotations. It comprises 17,697 high-resolution images covering 3,812 product types and provides instance-level segmentation masks, product detection bounding boxes, shelf-vacancy labels, and hierarchical product descriptions (short, medium, long) ranging from concise names to detailed specifications. These harmonized, cross-task labels enable integrated training and consistent evaluation for detection, segmentation, and vacancy detection. Experimental results demonstrate that UMRetail's rich labels provide a reliable basis for rigorous evaluation: YOLOv11 Medium achieves state-of-the-art edge-device product detection (mAP 0.551, mAP50 0.806); UMRetail-MTArch raises image-to-text retrieval R@1 by 143.1% over zero-shot Chinese-CLIP and reaches 74.55% Top-1 accuracy in zero-shot classification over 3,812 classes, which is 5.5 times that of Chinese-CLIP (13.43%) and 31 times that of CLIP (ViT-B, 2.39%). This establishes UMRetail as a research-deployment bridge for retail scene perception.
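
The ratios quoted in the abstract can be checked with a few lines of arithmetic. The following is a minimal sketch and not part of the paper: the dictionary keys and helper names are hypothetical, and reading "raises R@1 by 143.1%" as a relative gain (a factor of roughly 2.43) is an assumption, since the abstract does not state the underlying R@1 values.

```python
# Top-1 zero-shot classification accuracies (%) as quoted in the abstract.
# The dictionary keys are illustrative labels, not identifiers from the paper.
TOP1 = {
    "UMRetail-MTArch": 74.55,
    "Chinese-CLIP": 13.43,
    "CLIP (ViT-B)": 2.39,
}


def ratio_to(baseline: str, model: str = "UMRetail-MTArch") -> float:
    """Return how many times higher the model's Top-1 is than the baseline's."""
    return TOP1[model] / TOP1[baseline]


def gain_to_factor(gain_percent: float) -> float:
    """Convert a relative gain in percent to a multiplicative factor.

    Assumption: "+143.1%" means new = old * (1 + 1.431).
    """
    return 1.0 + gain_percent / 100.0


if __name__ == "__main__":
    for name in ("Chinese-CLIP", "CLIP (ViT-B)"):
        print(f"UMRetail-MTArch vs {name}: {ratio_to(name):.2f}x")
    print(f"R@1 factor under the relative-gain reading: {gain_to_factor(143.1):.3f}x")
```

Under these readings, the computed ratios agree with the rounded "5.5 times" and "31 times" figures given in the abstract.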

Original language: English
Title of host publication: Proceedings - 2025 International Conference on Virtual Reality and Visualization, ICVRV 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 336-341
Number of pages: 6
ISBN (electronic): 9798331556297
DOIs
Publication status: Published - 2025
Event: 2025 International Conference on Virtual Reality and Visualization, ICVRV 2025 - Bogota, Colombia
Duration: 19 Dec 2025 to 21 Dec 2025

Publication series

Name: Proceedings - 2025 International Conference on Virtual Reality and Visualization, ICVRV 2025

Conference

Conference: 2025 International Conference on Virtual Reality and Visualization, ICVRV 2025
Country/Territory: Colombia
City: Bogota
Period: 19/12/25 to 21/12/25
