Identifying e-Commerce in Enterprises by means of Text Mining and Classification Algorithms / Bianchi, Gianpiero; Bruni, Renato; Scalfati, Francesco. - In: MATHEMATICAL PROBLEMS IN ENGINEERING. - ISSN 1024-123X. - 2018:(2018), pp. 1-8. [10.1155/2018/7231920]

Identifying e-Commerce in Enterprises by means of Text Mining and Classification Algorithms

Gianpiero Bianchi; Renato Bruni; Francesco Scalfati
2018

Abstract

Monitoring specific features of enterprises, for example the adoption of e-commerce, is an important and basic task for several economic activities. This type of information is usually obtained by means of surveys, which are costly because of the amount of personnel involved. Automatic detection of this information would allow considerable savings. This can be done by relying on computer engineering, since this information is generally publicly available online through corporate websites. This work describes how to convert the detection of e-commerce into a supervised classification problem, where each record is obtained from the automatic analysis of one corporate website, and the class is the presence or absence of e-commerce facilities. The automatic generation of such data records requires several Text Mining phases; in particular, we compare six strategies based on the selection of the best words and the best n-grams. We then classify the resulting dataset with four classification algorithms: Support Vector Machines, Random Forest, Statistical and Logical Analysis of Data, and Logistic Classifier. This turns out to be a difficult classification problem. However, after careful design and set-up of the whole procedure, the results on a practical case of Italian enterprises are encouraging.
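The abstract does not state how the "best" words and n-grams are scored, nor how the classifiers are implemented. The following is a minimal, standard-library-only sketch of the overall pipeline under explicit assumptions. One record is built per website, with class 1 when e-commerce facilities are present. The website texts are hypothetical toy strings. The hypothetical helper `select_best_ngrams` scores n-grams by the absolute difference in relative document frequency between the two classes, which is an assumed criterion and not necessarily one of the six strategies compared in the paper. A plain logistic regression fitted by stochastic gradient descent stands in for the Logistic Classifier; SVM, Random Forest and SLAD are not shown.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Set of word n-grams (n = 1..n_max) occurring in a lower-cased text."""
    tokens = re.findall(r"\w+", text.lower())
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def select_best_ngrams(docs, labels, k, n_max=2):
    """Keep the k n-grams whose relative document frequency differs most
    between e-commerce (label 1) and non-e-commerce (label 0) websites.
    ASSUMPTION: this scoring rule is illustrative, not the paper's."""
    pos = [ngrams(d, n_max) for d, y in zip(docs, labels) if y == 1]
    neg = [ngrams(d, n_max) for d, y in zip(docs, labels) if y == 0]
    df_pos = Counter(g for s in pos for g in s)
    df_neg = Counter(g for s in neg for g in s)

    def score(g):
        return abs(df_pos[g] / max(len(pos), 1) - df_neg[g] / max(len(neg), 1))

    # Ties are broken alphabetically so that the selection is deterministic.
    return sorted(set(df_pos) | set(df_neg), key=lambda g: (-score(g), g))[:k]


def to_record(text, features, n_max=2):
    """One data record per website: 1.0 if a selected n-gram occurs, else 0.0."""
    present = ngrams(text, n_max)
    return [1.0 if f in present else 0.0 for f in features]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression fitted by stochastic gradient descent on the log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b


def predict(w, b, x):
    """Class 1 (e-commerce present) when the linear score is positive."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


# Hypothetical toy corpus: text scraped from six corporate websites.
docs = [
    "add to cart secure checkout online shop",
    "online shop add to cart free shipping",
    "buy now add to cart checkout",
    "company history contact us our team",
    "about us contact us location map",
    "our team company news contact us",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = e-commerce facilities present

features = select_best_ngrams(docs, labels, k=5)
X = [to_record(d, features) for d in docs]
w, b = train_logistic(X, labels)
print(features)  # → ['add', 'add to', 'cart', 'contact', 'contact us']
print([predict(w, b, to_record(t, features))
       for t in ["shop online and add to cart", "contact us for info"]])  # → [1, 0]
```

On real websites the classes are far less cleanly separated than in this toy corpus, which is consistent with the abstract's remark that this is a difficult classification problem. The number of selected features `k` and the n-gram length `n_max` are the kind of design choices that the compared strategies vary.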
Keywords: Machine Learning; Big Data; Data Engineering; Data Mining
Publication type: 01 Journal publication::01a Journal article
Files attached to this record
Bianchi_Identifying_2018.pdf (open access)
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 1.4 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1164997
Citations
  • PubMed Central: not available
  • Scopus: 14
  • Web of Science (ISI): 9