Scalability and robustness are not an issue when compression is applied for massive data storage, in the context of distributed computing. Speeding up on-the-fly compression for data transmission is more controversial. In such case, a compression technique merging together an adaptive and a non-adaptive approach has to be considered. A practical implementation of LZW (Lempel, Ziv and Welch) compression, called LZC (C indicates the Unix command 'compress'), has this characteristic. The non-adaptive phases work with bounded size prefix dictionaries built by LZW factorizations during the adaptive ones. Both phases employ the greedy approach. We show a worst case analysis of the greedy approach with respect to the optimal solution decodable by the LZC decompressor and give worst case examples for which the solution cost produced by the distributed implementations reaches the $\Theta(d)$ theoretical upper bound to the optimal cost approximation factor, where $d$ is the dictionary size. We prove that for the worst case examples of the totally adaptive version of LZW compression the approximation factor is $\Theta(\sqrt{d})$. Such analysis suggests that parallelization of on the fly compression is not suitable for highly disseminated data since the non-adaptive phases are too far from optimal. In order to improve the compression effectiveness, we suggest to apply LZMW (Lempel, Ziv, Miller and Wegman) factorization to LZC compression (LZCMW) during the adaptive phases. Such factorization was originally thought with a dictionary bounded by the least recently used strategy. We introduce LZCMW in order to have non-adaptive phases.

Greedy versus optimal analysis of bounded size dictionary compression and on-the-fly distributed computing / DE AGOSTINO, Sergio. - In: DISCRETE APPLIED MATHEMATICS. - ISSN 0166-218X. - 342:(2024), pp. 200-206. [10.1016/j.dam.2023.09.017]

Greedy versus optimal analysis of bounded size dictionary compression and on-the-fly distributed computing

Sergio De Agostino
2024

Abstract

Scalability and robustness are not an issue when compression is applied for massive data storage, in the context of distributed computing. Speeding up on-the-fly compression for data transmission is more controversial. In such case, a compression technique merging together an adaptive and a non-adaptive approach has to be considered. A practical implementation of LZW (Lempel, Ziv and Welch) compression, called LZC (C indicates the Unix command 'compress'), has this characteristic. The non-adaptive phases work with bounded size prefix dictionaries built by LZW factorizations during the adaptive ones. Both phases employ the greedy approach. We show a worst case analysis of the greedy approach with respect to the optimal solution decodable by the LZC decompressor and give worst case examples for which the solution cost produced by the distributed implementations reaches the $\Theta(d)$ theoretical upper bound to the optimal cost approximation factor, where $d$ is the dictionary size. We prove that for the worst case examples of the totally adaptive version of LZW compression the approximation factor is $\Theta(\sqrt{d})$. Such analysis suggests that parallelization of on the fly compression is not suitable for highly disseminated data since the non-adaptive phases are too far from optimal. In order to improve the compression effectiveness, we suggest to apply LZMW (Lempel, Ziv, Miller and Wegman) factorization to LZC compression (LZCMW) during the adaptive phases. Such factorization was originally thought with a dictionary bounded by the least recently used strategy. We introduce LZCMW in order to have non-adaptive phases.
2024
factorization; scalability; distributed; on-the-fly
01 Pubblicazione su rivista::01a Articolo in rivista
Greedy versus optimal analysis of bounded size dictionary compression and on-the-fly distributed computing / DE AGOSTINO, Sergio. - In: DISCRETE APPLIED MATHEMATICS. - ISSN 0166-218X. - 342:(2024), pp. 200-206. [10.1016/j.dam.2023.09.017]
File allegati a questo prodotto
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1689266
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact