Cardinality estimation, also known as count-distinct, is the problem of finding the number of different elements in a set with repeated elements. Among the many approximate algorithms proposed for this task, HyperLogLog (HLL) has established itself as the state of the art due to its ability to accurately estimate cardinality over a large range of values using a small memory footprint. When elements arrive in a stream, as in the case of most networking applications, improved techniques are possible. We specifically propose a new algorithm that improves the accuracy of cardinality estimation by grouping counters, and by using their new organization to further track all updates within a given counter size range (compared with just the last update as in the standard HLL). Results show that when using the same number of counters, one configuration of the new scheme reduces the relative error by approximately 0.86x using the same amount of memory as the streaming HLL and another configuration achieves a similar accuracy reducing the memory needed by approximately 0.85x.

More Accurate Streaming Cardinality Estimation With Vectorized Counters / Bruschi, Valerio; Reviriego, Pedro; Pontarelli, Salvatore; Ting, Daniel; Bianchi, Giuseppe. - In: IEEE NETWORKING LETTERS. - ISSN 2576-3156. - 3:2(2021), pp. 75-79. [10.1109/LNET.2021.3076048]

More Accurate Streaming Cardinality Estimation With Vectorized Counters

Pontarelli, Salvatore;
2021

Abstract

Cardinality estimation, also known as count-distinct, is the problem of finding the number of different elements in a set with repeated elements. Among the many approximate algorithms proposed for this task, HyperLogLog (HLL) has established itself as the state of the art due to its ability to accurately estimate cardinality over a large range of values using a small memory footprint. When elements arrive in a stream, as in the case of most networking applications, improved techniques are possible. We specifically propose a new algorithm that improves the accuracy of cardinality estimation by grouping counters, and by using their new organization to further track all updates within a given counter size range (compared with just the last update as in the standard HLL). Results show that when using the same number of counters, one configuration of the new scheme reduces the relative error by approximately 0.86x using the same amount of memory as the streaming HLL and another configuration achieves a similar accuracy reducing the memory needed by approximately 0.85x.
2021
Network monitoring , high speed networks , cardinality , hyperloglog
01 Pubblicazione su rivista::01a Articolo in rivista
More Accurate Streaming Cardinality Estimation With Vectorized Counters / Bruschi, Valerio; Reviriego, Pedro; Pontarelli, Salvatore; Ting, Daniel; Bianchi, Giuseppe. - In: IEEE NETWORKING LETTERS. - ISSN 2576-3156. - 3:2(2021), pp. 75-79. [10.1109/LNET.2021.3076048]
File allegati a questo prodotto
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1591643
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact