Domain-oriented topic discovery based on features extraction and topic clustering / Lu, X.; Zhou, X.; Wang, W.; Lio, P.; Hui, P.. - In: IEEE ACCESS. - ISSN 2169-3536. - 8:(2020), pp. 93648-93662. [10.1109/ACCESS.2020.2994516]
Domain-oriented topic discovery based on features extraction and topic clustering
Lio, P.
2020
Abstract
Topic detection technology can automatically discover new topics on the Internet. This paper investigates domain-oriented feature extraction methods and proposes a keyword feature extraction method, ITFIDF-LP; a subject word feature extraction method, LDA-SLP; and a topic clustering model based on vector product similarity. A novel Domain-oriented Topic Discovery based on Features Extraction and Topic Clustering (DTD-FETC) model is proposed to analyze open-source web content of a domain and identify emerging topics in that domain in real time. This article describes a DTD-FETC system built for the cyber security domain. It filters and aggregates web content for specific security threat topics such as vulnerabilities and malware, helping security staff respond quickly and defend against emerging cyber threats as early as possible. The recall, accuracy, and F1 values of the DTD-FETC method on the cyber security dataset are all above 0.99.

File | Size | Format
---|---|---
Lu_Domain_2020.pdf | 9.22 MB | Adobe PDF

Access: open access
Note: DOI 10.1109/ACCESS.2020.2994516
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.