Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to Online Indexing

Amir, A.; Franceschini, Gianni; Grossi, R.; Kopelowitz, T.; Lewenstein, M.; Lewenstein, N.

doi:10.1137/110836377

This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons, into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multidimensional points, multiple-precision numbers, multikey data (e.g., records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work as no particular exploitation of the underlying structure is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, online suffix tree construction can be done in worst case time $O(\log n)$ per input symbol (as opposed to amortized $O(\log n)$ time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves $O(\log n)$ worst case time per input symbol. Searching for a pattern of length $m$ in the resulting suffix tree takes $O(\min(m \log |\Sigma|, m + \log n) + tocc)$ time, where $tocc$ is the number of occurrences of the pattern. The paper also describes more applications and shows how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors, and order maintenance. The technical features of the proposed technique for a given data structure $\mathscr{D}$ are the following ones. The new data structure $\mathscr{D}'$ is obtained from $\mathscr{D}$ by augmenting the latter with an oracle for strings, extending the functionalities of the Dietz--Sleator list for order maintenance [P. F. Dietz and D. D. Sleator, Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, ACM, New York, 1987, pp. 365--372; A. Tsakalidis, Acta Inform., 21 (1984), pp. 101--112]. The space complexity of $\mathscr{D}'$ is $\mathscr{S}(n) + O(n)$ memory cells for storing $n$ keys, where $\mathscr{S}(n)$ denotes the space complexity of $\mathscr{D}$. Then, each operation involving $O(1)$ keys taken from $\mathscr{D}'$ requires $O(\mathscr{T}(n))$ time, where $\mathscr{T}(n)$ denotes the time complexity of the corresponding operation originally supported in $\mathscr{D}$. Each operation involving a key $y$ not stored in $\mathscr{D}'$ takes $O(\mathscr{T}(n) + |y|)$ time, where $|y|$ denotes the length of $y$. For the special case where the oracle handles suffixes of a string, the achieved insertion time is $O(\mathscr{T}(n))$.

Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to Online Indexing / A., Amir; Franceschini, Gianni; R., Grossi; T., Kopelowitz; M., Lewenstein; N., Lewenstein. - In: SIAM JOURNAL ON COMPUTING. - ISSN 0097-5397. - STAMPA. - 43:4(2014), pp. 1396-1416. [10.1137/110836377]

Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to Online Indexing

A. Amir;FRANCESCHINI, GIANNI;R. Grossi;T. Kopelowitz;M. Lewenstein;N. Lewenstein

2014

Abstract

This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons, into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multidimensional points, multiple-precision numbers, multikey data (e.g., records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work as no particular exploitation of the underlying structure is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, online suffix tree construction can be done in worst case time $O(\log n)$ per input symbol (as opposed to amortized $O(\log n)$ time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves $O(\log n)$ worst case time per input symbol. Searching for a pattern of length $m$ in the resulting suffix tree takes $O(\min(m \log |\Sigma|, m + \log n) + tocc)$ time, where $tocc$ is the number of occurrences of the pattern. The paper also describes more applications and shows how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors, and order maintenance. The technical features of the proposed technique for a given data structure $\mathscr{D}$ are the following ones. The new data structure $\mathscr{D}'$ is obtained from $\mathscr{D}$ by augmenting the latter with an oracle for strings, extending the functionalities of the Dietz--Sleator list for order maintenance [P. F. Dietz and D. D. Sleator, Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, ACM, New York, 1987, pp. 365--372; A. Tsakalidis, Acta Inform., 21 (1984), pp. 101--112]. The space complexity of $\mathscr{D}'$ is $\mathscr{S}(n) + O(n)$ memory cells for storing $n$ keys, where $\mathscr{S}(n)$ denotes the space complexity of $\mathscr{D}$. Then, each operation involving $O(1)$ keys taken from $\mathscr{D}'$ requires $O(\mathscr{T}(n))$ time, where $\mathscr{T}(n)$ denotes the time complexity of the corresponding operation originally supported in $\mathscr{D}$. Each operation involving a key $y$ not stored in $\mathscr{D}'$ takes $O(\mathscr{T}(n) + |y|)$ time, where $|y|$ denotes the length of $y$. For the special case where the oracle handles suffixes of a string, the achieved insertion time is $O(\mathscr{T}(n))$.

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2014
			
	Parole chiave
	
				suffix sorting; text indexing; suffix tree; search trees; strings
			
	Tipologia
	
				01 Pubblicazione su rivista::01a Articolo in rivista
			
	Citazione
	
				Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to Online Indexing / A., Amir; Franceschini, Gianni; R., Grossi; T., Kopelowitz; M., Lewenstein; N., Lewenstein. - In: SIAM JOURNAL ON COMPUTING. - ISSN 0097-5397. - STAMPA. - 43:4(2014), pp. 1396-1416. [10.1137/110836377]
			
	Appartiene alla tipologia:
	
				01a Articolo in rivista

File allegati a questo prodotto

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/595180

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

20

17

Catalogo dei prodotti della ricerca