Abstract
Multi-label image classification models must often be trained on partially labeled datasets, in which a considerable proportion of labels is missing. However, the popular PyTorch deep learning ecosystem offers limited support for training on such datasets: many built-in components, such as loss functions and metrics, produce incorrect results or raise errors when unknown labels are present. To address this, we present mlcpl, an original and easy-to-install Python package that extends the PyTorch ecosystem with a friendly environment for learning from partially labeled datasets. The package provides a series of multi-label loss functions and metrics that are compatible with unknown labels. Seven recently proposed approaches are also implemented so that cutting-edge techniques are convenient to use. In addition, eleven dataset loading functions, together with three partial-label simulation schemes, expedite the development of experiments. These functions are simple to use, have a PyTorch-like interface, and interoperate well with other PyTorch components. Several example experiments with mlcpl are also provided for demonstration. We hope that the release of this package will facilitate relevant academic research and real-world applications. The source code is available at https://github.com/maxium0526/mlcpl.
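As an illustration of what "compatible with unknown labels" can mean in practice, the following is a minimal sketch of a binary cross-entropy loss that averages over known labels only. It is not taken from mlcpl and does not use its API. The function name `partial_bce_with_logits` and the convention of marking unknown labels with NaN are assumptions made for this example.

```python
import torch
import torch.nn.functional as F


def partial_bce_with_logits(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Mean BCE over known labels only.

    Assumption for this sketch: unknown labels are encoded as NaN in `targets`.
    """
    known = ~torch.isnan(targets)
    # Replace NaN with 0 so the elementwise loss is finite everywhere;
    # those positions are masked out below and do not affect the result.
    safe_targets = torch.where(known, targets, torch.zeros_like(targets))
    per_label = F.binary_cross_entropy_with_logits(
        logits, safe_targets, reduction="none"
    )
    per_label = per_label * known
    # clamp avoids division by zero when a batch contains no known labels.
    return per_label.sum() / known.sum().clamp(min=1)


# Two samples, three categories; NaN marks an unknown label (assumed convention).
logits = torch.tensor([[2.0, -1.0, 0.5], [0.0, 3.0, -2.0]])
targets = torch.tensor(
    [[1.0, float("nan"), 0.0], [float("nan"), 1.0, float("nan")]]
)
loss = partial_bce_with_logits(logits, targets)
```

Passing the same `targets` directly to the built-in `F.binary_cross_entropy_with_logits` would propagate the NaN values into the loss, which is one form of the incompatibility described above.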
| Original language | English |
|---|---|
| Article number | 132588 |
| Journal | Neurocomputing |
| Volume | 670 |
| DOIs | |
| Publication status | Published - 14 Mar 2026 |
Keywords
- Data augmentation
- Image recognition
- Missing data
- Weak supervision