# Introduction to LibAUC

#### An Overview for Understanding LibAUC

*by Zhuoning Yuan and Tianbao Yang*

## Overview

Traditional risk functions, such as the cross-entropy loss, are
limited in modeling a wide range of problems and tasks, e.g.,
classification with imbalanced data (CID), learning to rank (LTR),
and contrastive learning of representations (CLR). **X-risk**
refers to a family of compositional measures in which the loss function
of each data point is defined in a way that contrasts the data point
with a large number of others. It covers a family of widely used
measures/losses, including but not limited to the following three
interconnected categories:

- **Areas Under the Curves**, including areas under ROC curves (AUROC), areas under Precision-Recall curves (AUPRC), and one-way and two-way partial areas under ROC curves.
- **Ranking Measures/Objectives**, including p-norm push for bipartite ranking, listwise losses for learning to rank (e.g., ListNet), mean average precision (mAP), normalized discounted cumulative gain (NDCG), etc.
- **Contrastive Objectives**, including supervised contrastive objectives (e.g., NCA) and global self-supervised contrastive objectives improving upon SimCLR and CLIP.

## Relationships between X-Risks

The following figure demonstrates the relationships between different
X-risks. AUROC is a special case of one-way pAUC and two-way pAUC.
One-way pAUC with FPR in a range (0, *α*) is a special case of two-way
pAUC. Top Push is a special case of one-way pAUC and p-norm push. AP is
a non-parametric estimator of AUPRC. mAP and NDCG are similar in the
sense that they are functions of ranks. Top-K mAP, Top-K NDCG, Recall@K (R@K), Precision@K (P@K), partial AUC + Precision@K (pAp@K), and Precision@Recall (P@R) are similar in the sense that they all
involve the computation of the K-th largest scores in a set. Listwise losses, supervised contrastive losses, and self-supervised contrastive losses are similar in the sense that they all involve a sum of log-sum
terms.
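To make the first of these relationships precise, one common formulation (standard notation, not necessarily the library's) writes AUROC and one-way pAUC as probabilities over positive–negative pairs:

$$\mathrm{AUROC}=\Pr\left(s(\mathbf{x}^{+})>s(\mathbf{x}^{-})\right),\qquad \mathrm{pAUC}(0,\alpha)=\Pr\left(s(\mathbf{x}^{+})>s(\mathbf{x}^{-})\mid \mathbf{x}^{-}\in\mathcal{N}_{\alpha}\right)$$

where \(s(\cdot)\) is the model's score, \(\mathbf{x}^{+}\)/\(\mathbf{x}^{-}\) are positive/negative examples, and \(\mathcal{N}_{\alpha}\) is the top-\(\alpha\) fraction of negatives ranked by score, i.e., those contributing an FPR in \((0,\alpha)\). Setting \(\alpha=1\) makes \(\mathcal{N}_{1}\) the whole negative set, recovering AUROC as the special case.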

## Mathematical Definition & Optimization Challenges

The mathematical formulation behind X-Risk Optimization is defined as follows:

$$\min _{\mathbf{w} \in \mathbb{R}^d} F(\mathbf{w})=\frac{1}{|\mathcal{S}|} \sum_{\mathbf{z}_i \in \mathcal{S}} f_i\left(g\left(\mathbf{w}; \mathbf{z}_i, \mathcal{S}_i\right)\right)$$

The gradient of the above objective involves computing \(\nabla f_i(g(\mathbf{w}; \mathbf{z}_i, \mathcal{S}_i))\nabla g(\mathbf{w}; \mathbf{z}_i, \mathcal{S}_i)\). The challenge lies in the inner function: neither \(g(\mathbf{w}; \mathbf{z}_i, \mathcal{S}_i)\) nor its gradient \(\nabla g(\mathbf{w}; \mathbf{z}_i, \mathcal{S}_i)\) is easy to compute, since \(\mathcal{S}_i\) could be a large set, and the gradient of the inner function might also involve an implicit gradient. Hence, proper estimators for \(g(\mathbf{w}; \mathbf{z}_i, \mathcal{S}_i)\) and \(\nabla g(\mathbf{w}; \mathbf{z}_i, \mathcal{S}_i)\) are needed to ensure the convergence of optimization. As a result, the training pipeline for X-risk optimization must be specially designed so that the algorithms converge.
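To see why estimators are needed, the sketch below (plain Python with illustrative functions, not LibAUC internals) tracks the inner value \(g\) with a moving-average estimator \(u\), the core idea behind stochastic compositional methods: each mini-batch gives an unbiased estimate of \(g\), but plugging that noisy estimate directly into a nonlinear \(f\) biases the gradient, whereas the moving average converges to the full-set value.

```python
import random

# Toy compositional objective in the shape of the X-risk above:
#   F(w) = (1/n) * sum_i f( g_i(w) ),   g_i(w) = (1/n) * sum_{z_j in S} h(w, z_i, z_j)
# A mini-batch gives an unbiased estimate of g_i, but because f is
# nonlinear, the outer derivative evaluated at a single-batch estimate
# is biased. Stochastic compositional methods therefore maintain a
# moving-average estimator u of g_i across iterations.

def h(w, zi, zj):              # hypothetical pairwise inner term
    return (w * (zi - zj)) ** 2

def f_prime(g):                # derivative of a nonlinear outer f(g) = -1/(1+g)
    return 1.0 / (1.0 + g) ** 2

random.seed(0)
data = [random.random() for _ in range(100)]
w, i = 0.5, 0                  # fixed model parameter and anchor point z_i
u, gamma = 0.0, 0.1            # estimator and its moving-average step size

for _ in range(200):
    batch = random.sample(data, 10)
    g_batch = sum(h(w, data[i], zj) for zj in batch) / len(batch)
    u = (1 - gamma) * u + gamma * g_batch   # moving-average update

# u tracks the full-set inner value, so f'(u) approximates f'(g_exact).
g_exact = sum(h(w, data[i], zj) for zj in data) / len(data)
print(u, g_exact, f_prime(u), f_prime(g_exact))
```

With the parameter held fixed, the estimator concentrates around the exact inner value; during training, LibAUC-style algorithms update such estimators alongside the model so the outer gradient stays well-behaved.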

## LibAUC Training Pipeline

The LibAUC training pipeline is shown below. It has two unique
components, namely **Controlled Data Sampler** and
**Dynamic Mini-batch Loss**, which are highlighted in
yellow.
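As a sketch of what a controlled data sampler buys you (illustrative code, not LibAUC's actual DualSampler implementation): with heavy class imbalance, a uniformly random mini-batch may contain no positives at all, leaving pairwise losses such as AUC surrogates undefined on that batch. A controlled sampler instead guarantees a fixed number of positives per batch.

```python
import random

def controlled_batches(labels, batch_size, pos_per_batch, seed=0):
    """Yield mini-batches of indices with exactly `pos_per_batch` positives.

    A minimal sketch in the spirit of a controlled data sampler; the
    function name and signature are hypothetical, not LibAUC's API.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    neg_per_batch = batch_size - pos_per_batch
    n_batches = min(len(pos) // pos_per_batch, len(neg) // neg_per_batch)
    for b in range(n_batches):
        batch = (pos[b * pos_per_batch:(b + 1) * pos_per_batch]
                 + neg[b * neg_per_batch:(b + 1) * neg_per_batch])
        rng.shuffle(batch)          # avoid a fixed positive/negative order
        yield batch

# Even at 5% positives overall, every batch contains exactly 4 positives.
labels = [1] * 50 + [0] * 950
batches = list(controlled_batches(labels, batch_size=32, pos_per_batch=4))
assert all(sum(labels[i] for i in b) == 4 for b in batches)
```

The dynamic mini-batch loss then consumes these balanced batches, updating its internal estimators (as in the previous section) on the guaranteed positives.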

## Loss & Optimizer

We list the dynamic losses, their corresponding controlled data samplers, and optimizer wrappers in the LibAUC library, along with references, in the following table.

| Loss Function | Data Sampler | Optimizer Wrapper | Reference |
|---|---|---|---|
| AUCMLoss | DualSampler | PESG | yuan2021large |
| CompositionalAUCLoss | DualSampler | PDSCA | yuan2022compositional |
| APLoss | DualSampler | SOAP | qi2021stochastic |
| pAUCLoss('1w') | DualSampler | SOPAs | zhu2022auc |
| pAUCLoss('2w') | DualSampler | SOTAs | zhu2022auc |
| MultiLabelAUCMLoss | TriSampler | PESG | yuan2023libauc |
| mAPLoss | TriSampler | SOTAs | yuan2023libauc |
| MultiLabelpAUCLoss | TriSampler | SOPAs | yuan2023libauc |
| NDCGLoss | TriSampler | SONG | qiu2022largescale |
| NDCGLoss(topk=5) | TriSampler | SONG | qiu2022largescale |
| ListwiseCELoss | TriSampler | SONG | qiu2022largescale |
| GCLoss('unimodal') | RandomSampler | SogCLR | yuan2022provable |
| GCLoss('bimodal') | RandomSampler | SogCLR | yuan2022provable |
| GCLoss('unimodal', enable_isogclr=True) | RandomSampler | iSogCLR | qiu2023provable |
| GCLoss('bimodal', enable_isogclr=True) | RandomSampler | iSogCLR | qiu2023provable |
| MIDAMLoss('attention') | DualSampler | MIDAM | zhu2023provable |
| MIDAMLoss('softmax') | DualSampler | MIDAM | zhu2023provable |

## History of LibAUC

The development of the library originated from a project in the OptMAI Lab at the University of Iowa, led by Zhuoning Yuan under the supervision of Dr. Tianbao Yang, focusing on deep AUROC maximization. Zhuoning made original and significant contributions by achieving 1st place in the Stanford CheXpert Competition in 2020, demonstrating the success of deep AUROC maximization. This success prompted the decision to develop the library, which is why it is named the LibAUC library. The first version was released in Spring 2021.

In Spring 2021, the OptMAI lab collaborated with Dr. Shuiwang Ji's research group to explore deep AUPRC maximization for improving classification performance in the MIT AICURES Challenge. Later, the finite-sum coupled compositional optimization framework for AUPRC maximization was extended to solving a broad range of problems. Additionally, multi-block bilevel optimization techniques were introduced for optimizing top-K performance measures.

In June 2022, a major update was implemented, incorporating optimization algorithms for AP, NDCG, partial AUC, and global contrastive loss into the library. During this period, Dixian Zhu, Gang Li, and Zi-Hao Qiu joined the development team, with Zhuoning Yuan continuing to lead the development. In Summer 2022, the OptMAI lab moved to Texas A&M University, and Yang introduced the generic X-risk optimization framework, expanding the library's scope from AUC maximization to X-risk optimization. However, the name LibAUC was retained.

In June 2023, the team conducted another significant update to the library, including enhancements to the codebase, the launch of a new documentation website, and the debut of a redesigned logo. This release also added two algorithms to the library: iSogCLR for contrastive learning and MIDAM for multi-instance deep AUC maximization.

## Acknowledgments

Acknowledgments go to other students who made original contributions, such as Qi Qi, who conducted early studies on AP maximization, Bokun Wang, who improved convergence analysis of finite-sum coupled compositional optimization, Zhishuai Guo, who contributed original convergence analysis of compositional AUROC maximization, and Quanqi Hu, who performed original convergence analysis of multi-block bilevel optimization.

Recognition is also given to previous lab members, including Dr. Mingrui Liu, who conducted original analysis of minimax optimization for AUROC maximization, Dr. Yan Yan, who simplified the minimax optimization algorithms and improved their analysis, and Yongjian Zhong, who conducted some experiments.

The team expresses gratitude to collaborators, including Dr. Milan Sonka (IEEE Fellow, University of Iowa), Dr. Nitesh Chawla (IEEE Fellow, ACM Fellow, University of Notre Dame), Dr. Shuiwang Ji (IEEE Fellow, Texas A&M University), Dr. Jiebo Luo (IEEE Fellow, ACM Fellow, University of Rochester), Dr. Xiaodong Wu (University of Iowa), Dr. Qihang Lin (University of Iowa), Dr. Yiming Ying (University at Albany), Dr. Denny Zhou (Google Brain), Dr. Xuanhui Wang (Google), Dr. Rong Jin (Alibaba Group), Dr. Yi Xu (Alibaba Group), Dr. Yuexin Wu (Google), Dr. Xianzhi Du (Google), Dr. Lijun Zhang (Nanjing University), Yao Yao (University of Iowa), Guanghui Wang (Georgia Tech), Youzhi Luo (Texas A&M University), Zhao Xu (Texas A&M University).

The OptMAI Lab will continue to maintain and update the library. We welcome students and researchers to collaborate on this exciting journey.