A Deep Learning Library for X-Risk Optimization

An open-source library that translates theories to real-world applications


Why LibAUC?

LibAUC is a novel deep learning library that offers an easier way to directly optimize commonly used performance measures and losses through user-friendly APIs. LibAUC has broad applications in AI for tackling both classic and emerging challenges, such as Classification of Imbalanced Data (CID), Learning to Rank (LTR), and Contrastive Learning of Representations (CLR).

LibAUC provides a unified framework that abstracts the optimization of a family of risk functions called X-Risk. This family includes surrogate losses for AUROC, AUPRC/AP, and partial AUROC, which are suitable for CID; surrogate losses for NDCG and top-K NDCG, along with listwise losses, which are used in LTR; and global contrastive losses for CLR. For more details, please check our LibAUC paper.

Key Features

Easy Installation

LibAUC is easy to install and to integrate into existing training pipelines built on popular deep learning frameworks such as PyTorch.

Users can train different neural network architectures (e.g., linear models, MLPs, CNNs, GNNs, and transformers) that support their data types.

Efficient Algorithms

Stochastic algorithms with provable convergence that support learning with millions of data points without a large batch size.
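
To give some intuition for why a large batch is not strictly required, consider estimating an inner quantity (here, the mean of a large reference set) from small mini-batches. The sketch below uses a moving-average estimator, a common device in stochastic compositional optimization. It is an illustrative assumption, not a description of LibAUC's internals, and the names `track_inner_value`, `gamma`, and `batch_size` are hypothetical.

```python
import random
from typing import List


def track_inner_value(reference: List[float], batch_size: int, gamma: float,
                      steps: int, seed: int = 0) -> float:
    """Track mean(reference) using only small random mini-batches.

    Update rule (assumed for illustration):
        u <- (1 - gamma) * u + gamma * (mini-batch mean)
    """
    rng = random.Random(seed)
    u = 0.0
    for _ in range(steps):
        batch = rng.sample(reference, batch_size)  # sample without replacement
        batch_mean = sum(batch) / batch_size
        u = (1.0 - gamma) * u + gamma * batch_mean
    return u


# Synthetic reference values in [0, 1); these stand in for per-example losses.
data_rng = random.Random(1)
reference = [data_rng.random() for _ in range(1000)]
full_value = sum(reference) / len(reference)

# A batch of 4 out of 1000 points still tracks the full-data value over time.
estimate = track_inner_value(reference, batch_size=4, gamma=0.05, steps=2000)
print(f"full-data value: {full_value:.3f}, moving-average estimate: {estimate:.3f}")
```

A smaller `gamma` averages over more mini-batches, which lowers the variance of the estimate at the cost of reacting more slowly when the underlying value changes.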

Hands-on Tutorials

Hands-on tutorials are provided for optimizing a variety of measures and objectives belonging to the family of X-risks.

What is X-Risk?

LibAUC is powered by Deep X-Risk Optimization (DXO), where X-Risk formally refers to a family of compositional measures in which the loss function of each data point is defined in a way that contrasts the data point with a large number of others. Mathematically, X-Risk optimization can be cast into the following abstract optimization problem:

$\min _{\mathbf{w} \in \mathbb{R}^d} F(\mathbf{w})=\frac{1}{|\mathcal{S}|} \sum_{\mathbf{z}_i \in \mathcal{S}} f_i\left(g\left(\mathbf{w} ; \mathbf{z}_i, \mathcal{S}_i\right)\right)$

where $$g: \mathbb{R}^d \mapsto \mathcal{R}$$ is a mapping, $$f_i: \mathcal{R} \mapsto \mathbb{R}$$ is a simple deterministic function, $$\mathcal{S}=\left\{\mathbf{z}_1, \ldots, \mathbf{z}_m\right\}$$ denotes a target set of data points, and $$\mathcal{S}_i$$ denotes a reference set of data points dependent or independent of $$\mathbf{z}_i$$. For mathematical derivations, please check our DXO paper.
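
As a concrete illustration of the abstract problem above, the sketch below instantiates it for a pairwise AUROC surrogate: the target set holds the positive examples, every reference set is the set of negatives, g averages a squared-hinge loss over the positive-negative pairs, and each f_i is the identity. The linear scorer, the squared-hinge surrogate, the margin, and all function names are assumptions chosen for illustration. None of them are LibAUC APIs.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def score(w: Vector, x: Vector) -> float:
    """Linear scorer h_w(x) = <w, x> (an assumed model, for illustration)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def g_pairwise(w: Vector, z_i: Vector, refs: Sequence[Vector],
               margin: float = 1.0) -> float:
    """Inner function g(w; z_i, S_i): mean squared-hinge loss of z_i against S_i."""
    s_i = score(w, z_i)
    losses = [max(0.0, margin - (s_i - score(w, z_j))) ** 2 for z_j in refs]
    return sum(losses) / len(losses)


def x_risk(w: Vector, targets: Sequence[Vector],
           reference_sets: Sequence[Sequence[Vector]],
           g: Callable[..., float],
           f: Callable[[float], float] = lambda u: u) -> float:
    """F(w) = (1 / |S|) * sum over z_i in S of f(g(w; z_i, S_i))."""
    values = [f(g(w, z_i, refs)) for z_i, refs in zip(targets, reference_sets)]
    return sum(values) / len(values)


positives = [(2.0, 0.0), (1.0, 1.0)]   # target set S
negatives = [(0.0, 1.0), (0.5, 0.0)]   # shared reference set S_i
w = (1.0, 0.0)

risk = x_risk(w, positives, [negatives] * len(positives), g_pairwise)
print(risk)  # 0.0625
```

Only the pair with score gap 0.5 violates the margin, so it alone contributes to the risk. Replacing `f` with a nonlinear function is what makes the problem compositional.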

3+

Challenge-winning solutions (e.g., Stanford CheXpert, MIT AICures, OGB Graph Property Prediction).

4+

Collaborations and deployments at multiple industrial units (e.g., Google, Uber, Tencent).

25+

Scientific publications at top-tier AI conferences (e.g., ICML, NeurIPS, ICLR).

37000+

Recognized by AI researchers across the world.

Applications

CheXpert

Our Deep AUROC Maximization method achieved 1st place in the Stanford CheXpert Competition, organized by Andrew Ng’s ML group, in August 2020. CheXpert is a large dataset of chest X-rays and a competition for automated chest X-ray interpretation, which aims to automatically detect related diseases from chest X-ray images.

Self-Supervised Learning

Our SogCLR achieves a top-1 linear evaluation accuracy of 69.4% using ResNet-50 with a small batch size of 256, which is on par with SimCLR (69.3%) with a large batch size of 8,192, for self-supervised learning on the ImageNet1000 dataset. The pretrained model can be widely used in many downstream computer vision tasks.

MovieLens

Our Deep NDCG and top-K NDCG maximization algorithms (SONG and K-SONG) improve NDCG@10 by 11.7% and 12.4% over baseline methods implemented in Google's TensorFlow Ranking library on MovieLens20M, a dataset of 20 million user movie ratings. The prediction model can help build powerful recommender systems that make personalized movie recommendations.
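
For readers unfamiliar with the metric, the sketch below shows one standard way to compute NDCG@K for a single user. It assumes the common exponential gain (2^rel - 1) with a log2 rank discount. Other gain conventions exist, and the function names are hypothetical rather than taken from LibAUC or TensorFlow Ranking.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items, given in ranked order."""
    return sum((2.0 ** rel - 1.0) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores: Sequence[float], relevances: Sequence[float], k: int) -> float:
    """NDCG@k: DCG of the predicted ranking divided by DCG of the ideal ranking."""
    # Rank items by predicted score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    # By convention here, a list with no relevant items scores 0.
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


ratings = [3, 2, 0, 1]                     # hypothetical ratings for four movies
good_scores = [0.9, 0.8, 0.1, 0.3]         # ranks movies in the ideal order
poor_scores = [0.1, 0.8, 0.3, 0.9]         # puts the best movie last

print(ndcg_at_k(good_scores, ratings, k=10))
print(ndcg_at_k(poor_scores, ratings, k=10))
```

The sort inside `ndcg_at_k` is not differentiable, which is why learning-to-rank methods optimize surrogate losses instead of the metric itself.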

Drug Discovery

Our LibAUC (AUROC, AUPRC) helped the team achieve 1st place in the MIT AI Cures Open Challenge, whose goal is to predict antibacterial properties for fighting secondary effects of COVID-19. Our AUC maximization algorithms improve the AUROC by 3%+ and AUPRC by 5%+ over the baseline models. Our framework can help tackle many practical health challenges.

Melanoma

Our Deep AUROC Maximization method outperforms standard deep learning methods that optimize a class-weighted imbalanced loss for detecting Melanoma from skin images. We achieved SOTA performance on the 2020 Kaggle Melanoma Competition by improving on the winner’s performance by 0.2%.

Stroke

Our Deep AUROC maximization method improves on the baseline models by 4% for detecting Stroke on an internal dataset. Stroke is the 2nd leading cause of death globally, responsible for approximately 11% of total deaths. We collaborate with the University of Iowa Hospitals & Clinics (UIHC) to build AI models for predicting Stroke based on CT perfusion data.

Tissue

Our Deep AUROC maximization methods achieve an improvement of 3% over baseline methods on the PatchCamelyon dataset for identifying metastatic tissue in microscopic images, which is a challenging diagnosis task even for pathologists. Building an automated AI detection system is essential for places that are short of pathological diagnosis services.

Citations

If you have any questions, please reach out to Zhuoning Yuan and Prof. Tianbao Yang. If LibAUC is helpful in your work, please cite the following papers:

@inproceedings{yuan2023libauc,
title={LibAUC: A Deep Learning Library for X-risk Optimization},
author={Zhuoning Yuan and Dixian Zhu and Zi-Hao Qiu and Gang Li and Xuanhui Wang and Tianbao Yang},
booktitle={29th SIGKDD Conference on Knowledge Discovery and Data Mining},
year={2023}
}

@article{yang2022algorithmic,
title={Algorithmic Foundation of Deep X-risk Optimization},
author={Yang, Tianbao},
journal={arXiv preprint arXiv:2206.00439},
year={2022}