Introducing strategic measure actions in multi-armed bandits

Boldrini, Stefano; Fiorina, Jocelyn; Di Benedetto, Maria Gabriella
2013

Abstract

Multi-armed bandits can be used to model the process of selecting one of several wireless networks, subject to a set of system constraints typically given by user-perceived network quality indicators. This work proposes a novel multi-armed bandit, adapted to this context by distinguishing between two actions, measuring and using, so as to better reflect real communication scenarios. The impact of this distinction is analysed through simulations that compare a traditional multi-armed bandit algorithm against methods that incorporate the new measure-vs-use concept. Results show that the proposed algorithms can significantly reduce regret if the measuring period is at least 3 times shorter than the using period. To reach the same regret, the classical method would require a much shorter measuring period, i.e. much stricter constraints on the allowed duration of the measure action. © 2013 IEEE.
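To make the regret metric concrete, here is a minimal sketch of a classical baseline: the standard UCB1 index run on simulated Bernoulli arms. It is not the measure/use algorithm proposed in the paper, whose details are not given in this record. The function name `ucb1_regret`, the arm means, and the horizon are illustrative assumptions; each arm can be read as a candidate network whose "reward" is a hypothetical quality score.

```python
import math
import random


def ucb1_regret(means, horizon, seed=0):
    """Run standard UCB1 on Bernoulli arms (illustrative sketch, not the paper's method).

    means: expected reward of each arm, e.g. a hypothetical per-network quality score.
    Returns (cumulative pseudo-regret, number of plays per arm).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of times each arm was played
    sums = [0.0] * k      # total reward collected from each arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # play each arm once to initialise its estimate
        else:
            # UCB1 index: empirical mean plus exploration bonus sqrt(2 ln t / n_i)
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best expected reward and the chosen arm's
        regret += best - means[arm]
    return regret, counts


regret, counts = ucb1_regret([0.3, 0.5, 0.8], 10000)
print(f"pseudo-regret: {regret:.1f}, plays per arm: {counts}")
```

In this sketch every step is a full-length play, so each exploratory choice is paid for at the full cost of a "use" action. The abstract's point is that a separate, shorter measure action changes this trade-off, with the reported gain appearing when measuring is at least 3 times shorter than using.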
2013 IEEE 24th International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC Workshops 2013
multi-armed bandit; UCB; exploitation; regret; learning; exploration; wireless network selection
04 Publication in conference proceedings::04b Conference paper in volume
Introducing strategic measure actions in multi-armed bandits / Boldrini, Stefano; Fiorina, Jocelyn; Di Benedetto, Maria Gabriella. - (2013), pp. 41-45. (Paper presented at the 2013 IEEE 24th International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC Workshops 2013, held in London, United Kingdom, 8 to 9 September 2013) [10.1109/pimrcw.2013.6707833].

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/556475
Note: the data displayed have not been validated by the university.

Citations
  • Scopus: 2