
Ontology-Based Search of Genomic Metadata / Fernández, Javier D.; Lenzerini, Maurizio; Masseroli, Marco; Venco, Francesco; Ceri, Stefano. - In: IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS. - ISSN 1545-5963. - Print. - 13:2(2016), pp. 233-247. [10.1109/TCBB.2015.2495179]

Ontology-Based Search of Genomic Metadata

LENZERINI, Maurizio; 2016

Abstract

The Encyclopedia of DNA Elements (ENCODE) is a huge and still expanding public repository of more than 4,000 experiments and 25,000 data files, assembled by a large international consortium since 2007; previously unknown biological knowledge can be extracted from these huge and largely unexplored data, leading to data-driven genomic, transcriptomic, and epigenomic discoveries. Yet the search for datasets relevant to knowledge discovery is poorly supported: the metadata describing ENCODE datasets are quite simple and incomplete, and they are not described by a coherent underlying ontology. Here, we show how to overcome this limitation by adopting an ENCODE metadata search approach that uses high-quality ontological knowledge and state-of-the-art indexing technologies. Specifically, we developed S.O.S. GeM (http://www.bioinformatics.deib.polimi.it/SOSGeM/), a system supporting effective semantic search and retrieval of ENCODE datasets. First, we constructed a Semantic Knowledge Base starting from concepts extracted from ENCODE metadata, matched to and expanded on biomedical ontologies integrated in the well-established Unified Medical Language System. We prove that this inference method is sound and complete. Then, we leveraged the Semantic Knowledge Base to semantically search ENCODE data from arbitrary biologists' queries. This approach correctly finds more datasets than the purely syntactic search supported by the other available systems. We empirically show the relevance of the found datasets to the biologists' queries.
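The abstract contrasts semantic search over an ontology-expanded knowledge base with purely syntactic search. A minimal sketch of that contrast follows, assuming a toy "is-a" hierarchy and made-up dataset IDs. The concept names, the dictionaries `IS_A` and `DATASETS`, and the helper functions are hypothetical illustrations; they do not reproduce the S.O.S. GeM implementation or actual UMLS content. Each dataset is indexed under its annotated concepts and all of their ancestors (the transitive closure of "is-a"), so a query for a general concept also retrieves datasets annotated only with more specific ones.

```python
from collections import deque

# Hypothetical toy "is-a" hierarchy (child concept -> parent concepts).
# Illustrative only: not taken from UMLS or from S.O.S. GeM.
IS_A = {
    "chronic myelogenous leukemia": ["leukemia"],
    "leukemia": ["hematologic cancer"],
    "hematologic cancer": ["cancer"],
    "lymphoblastoid cell": ["blood cell"],
}

# Hypothetical dataset annotations, as if extracted from metadata (IDs are made up).
DATASETS = {
    "ds1": {"chronic myelogenous leukemia"},
    "ds2": {"lymphoblastoid cell"},
    "ds3": {"leukemia"},
}


def ancestors(concept, is_a):
    """Return the concept plus every concept reachable through is-a edges (BFS)."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for parent in is_a.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def build_index(datasets, is_a):
    """Index each dataset under its annotated concepts and all their ancestors."""
    index = {}
    for ds_id, concepts in datasets.items():
        for concept in concepts:
            for ancestor in ancestors(concept, is_a):
                index.setdefault(ancestor, set()).add(ds_id)
    return index


def syntactic_search(term, datasets):
    """Return datasets whose annotations contain the query term as an exact string."""
    return {ds_id for ds_id, concepts in datasets.items() if term in concepts}


def semantic_search(term, index):
    """Return datasets indexed under the query concept, including via descendants."""
    return index.get(term, set())


if __name__ == "__main__":
    index = build_index(DATASETS, IS_A)
    print(sorted(syntactic_search("hematologic cancer", DATASETS)))  # []
    print(sorted(semantic_search("hematologic cancer", index)))      # ['ds1', 'ds3']
```

In this toy setup the syntactic query for "hematologic cancer" finds nothing, because no dataset carries that exact label, while the expanded index returns both leukemia datasets. Expanding annotations at indexing time, rather than expanding each query, is one possible design choice; it trades a larger index for simple lookups at query time.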
Year: 2016
Keywords: Encyclopedia of DNA Elements; Genomic data and knowledge management; Genomic data retrieval; Semantic search; Algorithms; Data Mining; Databases, Genetic; Genomics; User-Computer Interface; Gene Ontology; Metadata; Semantics; Biotechnology; Genetics; Applied Mathematics
Type: 01 Journal publication::01a Journal article
Files attached to this item

Fernández_Postprint_Ontology-Based_2016.pdf
Access: open access
Note: http://doi.ieeecomputersociety.org/10.1109/TCBB.2015.2495179
Type: Post-print (version following peer review, accepted for publication)
License: All rights reserved
Size: 2.17 MB
Format: Adobe PDF

Fernández_Ontology-Based_2016.pdf
Access: archive managers only
Type: Publisher's version (published with the publisher's layout)
License: All rights reserved
Size: 774.61 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/951112
Citations
  • PubMed Central 2
  • Scopus 14
  • Web of Science (ISI) 10