Xiaoqi Tan
Assistant Professor
Department of Computing Science
University of Alberta
Email: xiaoqi.tan@ualberta.ca
Office: UCOMM 6-121
Research Interests
Algorithms and decision-making under uncertainty, especially online algorithms, economic aspects of algorithms, and learning based on different forms of information access, as well as their broader societal implications in systems and networks shaped by dynamics and strategic behavior.
Selected Recent Publications
Computational Hardness of Reinforcement Learning with Partial qπ-RealizabilityShayan Karimi, and Xiaoqi TanNeurIPS 2025Abstract
This paper investigates the computational complexity of reinforcement learning within a novel linear function approximation regime, termed partial qπ-realizability. In this framework, the objective is to learn an ϵ-optimal policy with respect to a predefined policy set Π, under the assumption that all value functions corresponding to policies in Π are linearly realizable. This framework adopts assumptions that are weaker than those in the qπ-realizability setting yet stronger than those in the q∗-realizability setup. As a result, it provides a more practical model for reinforcement learning scenarios where function approximation naturally arise. We prove that learning an ϵ-optimal policy in this newly defined setting is computationally hard. More specifically, we establish NP-hardness under a parameterized greedy policy set (i.e., argmax) and, further, show that—unless NP = RP—an exponential lower bound (exponential in feature vector dimension) holds when the policy set contains softmax policies, under the Randomized Exponential Time Hypothesis. Our hardness results mirror those obtained in the q∗-realizability settings, and suggest that computational difficulty persists even when the policy class Π is expanded beyond the optimal policy, reinforcing the unbreakable nature of the computational hardness result regarding partial qπ-realizability under two important policy sets. To establish our negative result, our primary technical contribution is a reduction from two complexity problems, δ-Max-3SAT and δ-Max-3SAT(b), to instances of our problem settings: GLinear-κ-RL (under the greedy policy set) and SLinear-κ-RL (under the softmax policy set), respectively. Our findings indicate that positive computational results are generally unattainable in the context of partial qπ-realizability, in sharp contrast to the qπ-realizability setting under a generative access model.
Online Multi-Class Selection with Group Fairness GuaranteeNeurIPS 2025Abstract
We study the online multi-class selection problem with group fairness guarantees, where limited resources must be allocated to sequentially arriving agents. Our work addresses two key limitations in the existing literature. First, we introduce a novel lossless rounding scheme that ensures the integral algorithm achieves the same expected performance as any fractional solution. Second, we explicitly address the challenges introduced by agents who belong to multiple classes. To this end, we develop a randomized algorithm based on a relax-and-round framework. The algorithm first computes a fractional solution using a resource reservation approach---referred to as the extit{set-aside} mechanism---to enforce fairness across classes. The subsequent rounding step preserves these fairness guarantees without degrading performance. Additionally, we propose a learning-augmented variant that incorporates untrusted machine-learned predictions to better balance fairness and efficiency in practical settings.
Online Allocation with Multi-Class Arrivals: Group Fairness vs Individual WelfareFaraz Zargari, Hossein Nekouyan, Bo Sun, and Xiaoqi TanACM SIGMETRICS 2025 [PDF]Abstract
We introduce and study a multi-class online resource allocation problem with group fairness guarantees. The problem involves allocating a fixed amount of resources to a sequence of agents, each belonging to a specific group. The primary objective is to ensure fairness across different groups in an online setting. We focus on three fairness notions: one based on quantity and two based on utility. To achieve fair allocations, we develop two threshold-based online algorithms, proving their optimality under two fairness notions and near-optimality for the more challenging one. Additionally, we demonstrate a fundamental trade-off between group fairness and individual welfare using a novel representative function-based approach. To address this trade-off, we propose a set-aside multi-threshold algorithm that reserves a portion of the resource to ensure fairness across groups while utilizing the remaining resource to optimize efficiency under utility-based fairness notions. This algorithm is proven to achieve the Pareto-optimal trade-off. We also demonstrate that our problem can model a wide range of real-world applications, including network caching and cloud computing, and empirically evaluate our proposed algorithms in the network caching problem using real datasets.
Posted Price Mechanisms for Online Allocation with Diseconomies of ScaleHossein Nekouyan, Bo Sun, Raouf Boutaba, and Xiaoqi TanACM Web Conference (WWW) 2025 [PDF] (Track: Economics, Online Markets and Human Computation)Abstract
This paper addresses the online k-selection problem with diseconomies of scale (OSDoS), where a seller seeks to maximize social welfare by optimally pricing items for sequentially arriving buyers, accounting for increasing marginal production costs. Previous studies have investigated deterministic dynamic pricing mechanisms for such settings. However, significant challenges remain, particularly in achieving optimality with small or finite inventories and developing effective randomized posted price mechanisms. To bridge this gap, we propose a novel randomized dynamic pricing mechanism for OSDoS, providing a tighter lower bound on the competitive ratio compared to prior work. Our approach ensures optimal performance in small inventory settings (i.e., whenis small) and surpasses existing online mechanisms in large inventory settings (i.e., whenis large), leading to the best-known posted price mechanism for optimizing online selection and allocation with diseconomies of scale across varying inventory sizes.
Threshold Policies with Tight Guarantees for Online Selection with Convex CostsXiaoqi Tan, Siyuan Yu, Raouf Boutaba, and Alberto Leon-GarciaACM Transactions on Economics and Computation, vol. 13, no. 2, June 2025. [PDF] (Earlier version was presented in WINE 2023)Abstract
This paper provides threshold policies with tight guarantees for online selection with convex cost (OSCC). In OSCC, a seller wants to sell some asset to a sequence of buyers with the goal of maximizing his/her profit. The seller can produce additional units of the asset, but at non-decreasing marginal costs. At each time, a buyer arrives and offers a price, and the seller must make an immediate and irrevocable decision in terms of whether to accept the offer and produce/sell one unit of the asset to this buyer. The goal is to develop an online algorithm that selects a subset of buyers to maximize the seller’s profit, namely, the total selling revenue minus the total production cost. Our main result is the development of a class of simple threshold policies that are logistically-simple and easy to implement, but have provable optimality guarantees among all deterministic algorithms. We also derive a lower bound on competitive ratios of randomized algorithms, and prove that the competitive ratio of our threshold policy asymptotically converges to this lower bound when the total production output is sufficiently large. Our results generalize and unify various online search, pricing, and auction problems, and provide a new perspective on the impact of non-decreasing marginal costs on real-world online resource allocation problems.
Static Pricing for Online Selection Problem and its VariantsBo Sun, Hossein Nekouyan, Xiaoqi Tan, and Raouf BoutabaWINE 2024 [PDF]Abstract
This paper studies an online selection problem, where a seller seeks to sequentially sell multiple copies of an item to arriving buyers. We consider an adversarial setting, making no modeling assumptions about buyers’ valuations for the items except acknowledging a finite support. In this paper, we focus on a class of static pricing algorithms that sample a price from a pre-determined distribution and sell items to buyers whose valuations exceed the sampled price. Such algorithms are of practical interests due to their advantageous properties, such as ease of implementation and non-discrimination over prices. Our work shows that the simple static pricing strategy can achieve strong guarantees comparable to the best known dynamic pricing algorithms. Particularly, we design the optimal static pricing algorithms for the adversarial online selection problem and its two important variants: the online assignment problem and the online selection with convex cost. The static pricing algorithms can even attain the optimal competitive ratios among all online algorithms for the online selection problem and the online assignment problem. To achieve these results, we propose an economics-based approach in the competitive analysis of static pricing algorithms, and develop a novel representative function-based approach to derive the lower bounds. We expect these approaches will be useful in related problems such as online matching.
Almost Free: Self-concordance in Natural Exponential Families and an Application to BanditsShuai Liu, Alex Ayoub, Flore Sentenac, Xiaoqi Tan, and Csaba SzepesváriNeurIPS 2024 [PDF]Abstract
We prove that single-parameter natural exponential families with subexponential tails are self-concordant with polynomial-sized parameters. For subgaussian natural exponential families we establish an exact characterization of the growth rate of the self-concordance parameter. Applying these findings to bandits allows us to fill gaps in the literature: We show that optimistic algorithms for generalized linear bandits enjoy regret bounds that are both second-order (scale with the variance of the optimal arm's reward distribution) and free of an exponential dependence on the bound of the problem parameter in the leading term. To the best of our knowledge, ours is the first regret bound for generalized linear bandits with subexponential tails, broadening the class of problems to include Poisson, exponential and gamma bandits.
For Prospective Students
I am always looking for highly motivated students at all levels (undergraduate, MSc, and PhD). Below are some useful links to learn more about my research:
SODALab @UofA
page — contains up-to-date information about my research group, including our latest publications, member highlights, and details on how to join depending on your current status.Letter to Prospective Students
— outlines my core values in research and mentoring, as well as guidelines on how best to reach out. If you decide to email me, please review this letter first.CMPUT 676
course page — provides an overview of the research themes that align closely with the ongoing work in my group. Interested students may find it helpful to review the page; the course was most recently offered in Fall 2024.If you’re already at UofA: My group meets weekly at the SODALab Seminar, which is open to anyone interested in attending.
Click here
to view our past and upcoming schedules.If you’ve never been to UofA: Here are some links to help you virtually explore the city of
Edmonton
and the province ofAlberta
.