Paper Misalignment, Learning, and Ranking: Harnessing Users Limited Attention

A central challenge in recommendation systems is incentivizing exploration, encouraging users to select options that help the platform learn the information needed for better future decisions. In some settings, however, the structure of the problem allows exploration to occur naturally, without explicit incentives. Motivated by this observation and by applications in product ranking on online platforms, this study introduces and analyzes the online learning problem of Ranking with Limited-Attention Users. In the proposed model, a platform displays a ranked list of items to a sequence of arriving users. Each user considers only a prefix window of the ranked list and selects their most preferred item within that window. User preferences are known to the platform upon arrival, but item payoffs, such as the likelihood of a purchase, are unknown and must be learned over time as users make selections. Two complementary settings are analyzed. In the first, window sizes are adversarial and payoffs are drawn from a fixed distribution. An active-elimination algorithm is developed that exploits the combinatorial structure of the problem to simultaneously extract high payoffs and gather informative samples. This yields an optimal instance-dependent regret bound of O(log T), supported by matching upper and lower bounds. In the second setting, payoffs are adversarial and window sizes are drawn from a fixed distribution. The study characterizes the set of achievable item selection probabilities induced by ranked lists, showing it admits a compact polynomial-size representation. This structure enables standard adversarial online learning algorithms to operate efficiently over permutations, achieving regret of order √T relative to the best fixed ranking in hindsight.

Get the Paper