Bandits with Replenishable Knapsacks: the Best of both Worlds / Bernasconi, M.; Castiglioni, M.; Celli, A.; Fusco, F. (2024). 12th International Conference on Learning Representations, ICLR 2024, Vienna.

Bandits with Replenishable Knapsacks: the Best of both Worlds

Celli A.; Fusco F.
2024

Abstract

The bandits with knapsacks (BwK) framework models online decision-making problems in which an agent makes a sequence of decisions subject to resource consumption constraints. The traditional model assumes that each action consumes a non-negative amount of resources and the process ends when the initial budgets are fully depleted. We study a natural generalization of the BwK framework which allows non-monotonic resource utilization, i.e., resources can be replenished by a positive amount. We propose a best-of-both-worlds primal-dual template that can handle any online learning problem with replenishment for which a suitable primal regret minimizer exists. In particular, we provide the first positive results for the case of adversarial inputs by showing that our framework guarantees a constant competitive ratio α when B = Ω(T) or when the possible per-round replenishment is a positive constant. Moreover, under a stochastic input model, our algorithm yields an instance-independent Õ(T^{1/2}) regret bound which complements existing instance-dependent bounds for the same setting. Finally, we provide applications of our framework to some economic problems of practical relevance.
12th International Conference on Learning Representations, ICLR 2024
Bandits with Knapsack; online learning
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item

File: Bernasconi_Bandits-with_Replenishabl_2024.pdf
Access: open access
Note: https://openreview.net/forum?id=yBIJRIYTqa&nesting=2&sort=date-desc
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 379.49 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1717196
Citations
  • Scopus 5