The bandits with knapsacks (BwK) framework models online decision-making problems in which an agent makes a sequence of decisions subject to resource consumption constraints. The traditional model assumes that each action consumes a non-negative amount of resources and the process ends when the initial budgets are fully depleted. We study a natural generalization of the BwK framework which allows non-monotonic resource utilization, i.e., resources can be replenished by a positive amount. We propose a best-of-both-worlds primal-dual template that can handle any online learning problem with replenishment for which a suitable primal regret minimizer exists. In particular, we provide the first positive results for the case of adversarial inputs by showing that our framework guarantees a constant competitive ratio α when B = Ω(T) or when the possible per-round replenishment is a positive constant. Moreover, under a stochastic input model, our algorithm yields an instance-independent Õ(T1/2) regret bound which complements existing instance-dependent bounds for the same setting. Finally, we provide applications of our framework to some economic problems of practical relevance.

BANDITS WITH REPLENISHABLE KNAPSACKS: THE BEST OF BOTH WORLDS / Bernasconi, M.; Castiglioni, M.; Celli, A.; Fusco, F.. - (2024). (Intervento presentato al convegno https://arxiv.org/licenses/nonexclusive-distrib/1.0/license.html tenutosi a Vienna).

BANDITS WITH REPLENISHABLE KNAPSACKS: THE BEST OF BOTH WORLDS

Celli A.;Fusco F.
2024

Abstract

The bandits with knapsacks (BwK) framework models online decision-making problems in which an agent makes a sequence of decisions subject to resource consumption constraints. The traditional model assumes that each action consumes a non-negative amount of resources and the process ends when the initial budgets are fully depleted. We study a natural generalization of the BwK framework which allows non-monotonic resource utilization, i.e., resources can be replenished by a positive amount. We propose a best-of-both-worlds primal-dual template that can handle any online learning problem with replenishment for which a suitable primal regret minimizer exists. In particular, we provide the first positive results for the case of adversarial inputs by showing that our framework guarantees a constant competitive ratio α when B = Ω(T) or when the possible per-round replenishment is a positive constant. Moreover, under a stochastic input model, our algorithm yields an instance-independent Õ(T1/2) regret bound which complements existing instance-dependent bounds for the same setting. Finally, we provide applications of our framework to some economic problems of practical relevance.
2024
https://arxiv.org/licenses/nonexclusive-distrib/1.0/license.html
Bandits with Knapsack; online learning
04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
BANDITS WITH REPLENISHABLE KNAPSACKS: THE BEST OF BOTH WORLDS / Bernasconi, M.; Castiglioni, M.; Celli, A.; Fusco, F.. - (2024). (Intervento presentato al convegno https://arxiv.org/licenses/nonexclusive-distrib/1.0/license.html tenutosi a Vienna).
File allegati a questo prodotto
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1717196
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? ND
social impact