Publications

(* denotes equal contribution, † denotes corresponding author)

Tree-Guided Identify-Then-Exploit: A Unified Framework of Best Arm Identification and Regret Minimization for Dueling Bandits
Pu Wang, Yao-Xiang Ding
Preprint: arXiv:2606.01799
tl;dr: We propose Tree-Guided Identify-Then-Exploit (TG-ITE), the first unified framework for N-armed dueling bandits that jointly handles best arm identification (BAI), weak regret, and strong regret under only the Condorcet-winner assumption. A shared identification primitive, TreeAscent, uses a tree-based tournament decomposition to locate the Condorcet winner. It is then paired with objective-specific exploitation strategies whose bounds closely match the state of the art.