Learning on the Edge: Online Learning with Stochastic Feedback Graphs / Esposito, E.; van der Hoeven, D.; Fusco, F.; Cesa-Bianchi, N. - 35 (2022). (Paper presented at the Advances in Neural Information Processing Systems (formerly NIPS) conference, held in New Orleans, USA.)
Learning on the Edge: Online Learning with Stochastic Feedback Graphs
Fusco F.; 2022
Abstract
The framework of feedback graphs is a generalization of sequential decision-making with bandit or full-information feedback. In this work, we study an extension where the directed feedback graph is stochastic, following a distribution similar to the classical Erdős-Rényi model. Specifically, in each round every edge in the graph is either realized or not, with a distinct probability for each edge. We prove nearly optimal regret bounds of order $\min\{\min_\varepsilon \sqrt{(\alpha_\varepsilon/\varepsilon)T},\ \min_\varepsilon (\delta_\varepsilon/\varepsilon)^{1/3}T^{2/3}\}$ (ignoring logarithmic factors), where $\alpha_\varepsilon$ and $\delta_\varepsilon$ are graph-theoretic quantities (the independence and weak domination numbers, respectively) measured on the support of the stochastic feedback graph $\mathcal{G}$ with edge probabilities thresholded at $\varepsilon$. Our result, which holds without any prior knowledge of $\mathcal{G}$, requires the learner to observe only the realized out-neighborhood of the chosen action. When the learner is allowed to observe the realization of the entire graph (but only the losses in the out-neighborhood of the chosen action), we derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers that exhibits improved bounds for some special cases.
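To make the stochastic feedback graph model and the $\varepsilon$-thresholding above concrete, here is a minimal Python sketch under several assumptions of our own, none of which come from the paper: the graph is given as a hypothetical $K \times K$ matrix `P` of edge probabilities, the support at threshold $\varepsilon$ keeps edges with probability at least $\varepsilon$, and $\alpha_\varepsilon$ is taken as the independence number of the undirected version of that support with self-loops ignored. The helper names (`sample_realization`, `thresholded_support`, `independence_number`, `alpha_term`) are illustrative only. The sketch evaluates just the $\min_\varepsilon \sqrt{(\alpha_\varepsilon/\varepsilon)T}$ term with constants and logarithmic factors dropped; it does not check observability, omits the $\delta_\varepsilon$ term, and is not the learning algorithm from the paper.

```python
import itertools
import math
import random


def sample_realization(P, rng):
    """Draw one round of the stochastic feedback graph: each edge (i, j)
    is realized independently with probability P[i][j]."""
    K = len(P)
    return [[rng.random() < P[i][j] for j in range(K)] for i in range(K)]


def thresholded_support(P, eps):
    """Edges kept after thresholding at eps (assumed convention: P >= eps)."""
    K = len(P)
    return {(i, j) for i in range(K) for j in range(K) if P[i][j] >= eps}


def independence_number(K, edges):
    """Brute-force independence number of the undirected version of the
    graph, ignoring self-loops. Exponential in K: small graphs only."""
    adjacent = {(i, j) for (i, j) in edges if i != j}
    adjacent |= {(j, i) for (i, j) in adjacent}
    for size in range(K, 0, -1):
        for subset in itertools.combinations(range(K), size):
            if all((u, v) not in adjacent
                   for u, v in itertools.combinations(subset, 2)):
                return size
    return 0


def alpha_term(P, T):
    """Return (value, eps, alpha_eps) minimizing sqrt(alpha_eps * T / eps).

    alpha_eps only changes at the distinct edge probabilities, and 1/eps
    shrinks as eps grows, so it suffices to try those values plus eps = 1.
    """
    K = len(P)
    candidates = {p for row in P for p in row if p > 0} | {1.0}
    best = None
    for eps in sorted(candidates):
        a = independence_number(K, thresholded_support(P, eps))
        value = math.sqrt(a * T / eps)
        if best is None or value < best[0]:
            best = (value, eps, a)
    return best
```

For example, with three actions, self-loops of probability 1, and every other edge realized with probability 0.5, `alpha_term(P, 10000)` picks $\varepsilon = 0.5$: the thresholded support is complete, so $\alpha_{0.5} = 1$ and the term is $\sqrt{2T}$, which beats the $\sqrt{3T}$ obtained at $\varepsilon = 1$, where only self-loops survive.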
File | Access | Type | License | Size | Format
---|---|---|---|---|---
Esposito_preprint_Learning_2022.pdf | Open access | Pre-print (manuscript submitted to the publisher, prior to peer review) | All rights reserved | 574.14 kB | Adobe PDF
Esposito_Learning_2022.pdf | Open access | Publisher's version (published version with the publisher's layout) | All rights reserved | 490.32 kB | Adobe PDF

Note (pre-print): https://proceedings.neurips.cc/paper_files/paper/2022/file/e0e956681b04ac126679e8c7dd706b2e-Paper-Conference.pdf
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.