Identifying e-Commerce in Enterprises by means of Text Mining and Classification Algorithms

Bianchi, Gianpiero; Bruni, Renato; Scalfati, Francesco

doi:10.1155/2018/7231920

Monitoring specific features of enterprises, such as the adoption of e-commerce, is an important and basic task for several economic activities. This information is usually obtained through surveys, which are costly because of the personnel they require. Detecting it automatically would allow substantial savings. This can be done with computer engineering techniques, since the information is in general publicly available online through corporate websites. This work describes how to convert the detection of e-commerce into a supervised classification problem, where each record is obtained from the automatic analysis of one corporate website and the class is the presence or absence of e-commerce facilities. Automatically generating such data records requires several Text Mining phases; in particular, we compare six strategies based on the selection of the best words and best n-grams. We then classify the resulting dataset with four classification algorithms: Support Vector Machines, Random Forest, Statistical and Logical Analysis of Data, and Logistic Classifier. This turns out to be a difficult classification problem. However, after careful design and set-up of the whole procedure, the results on a practical case of Italian enterprises are encouraging.
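The idea of turning each website into one data record can be illustrated with a minimal sketch. It is not the authors' procedure: the scoring rule (difference in document frequency between the two classes), the helper names `ngrams`, `select_terms` and `to_record`, and the four toy texts are all assumptions made for illustration. The abstract does not specify how the best words and n-grams are chosen.

```python
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercase the text, split it into word tokens, return all 1..n_max-grams."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def select_terms(docs, labels, k):
    """Hypothetical selection rule: keep the k terms whose document
    frequency differs most between the positive and negative class."""
    pos, neg = Counter(), Counter()
    for text, y in zip(docs, labels):
        (pos if y == 1 else neg).update(set(ngrams(text)))
    n_pos = max(1, sum(labels))
    n_neg = max(1, len(labels) - sum(labels))
    score = {t: abs(pos[t] / n_pos - neg[t] / n_neg) for t in set(pos) | set(neg)}
    # Sort by descending score, then alphabetically so ties are deterministic.
    return sorted(score, key=lambda t: (-score[t], t))[:k]


def to_record(text, terms):
    """Encode one website text as a binary vector over the selected terms."""
    present = set(ngrams(text))
    return [1 if t in present else 0 for t in terms]


# Toy stand-ins for text scraped from corporate websites (invented data).
docs = [
    "Add to cart and pay online with credit card",
    "Shop online: add to cart, secure checkout",
    "Company history and contact information",
    "Our mission, our team, contact us",
]
labels = [1, 1, 0, 0]  # 1 = e-commerce present, 0 = absent

terms = select_terms(docs, labels, k=5)
records = [to_record(d, terms) for d in docs]
```

Each row of `records`, paired with its entry in `labels`, is one training example that could then be passed to any of the classifiers named above.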

Identifying e-Commerce in Enterprises by means of Text Mining and Classification Algorithms / Bianchi, G., Bruni, R., Scalfati, F. - In: MATHEMATICAL PROBLEMS IN ENGINEERING. - ISSN 1024-123X. - 2018:(2018), pp. 1-8. [10.1155/2018/7231920]

Year of publication: 2018
Keywords: Machine Learning; Big Data; Data Engineering; Data Mining
Type: 01 Journal publication::01a Journal article

Files attached to this record

File	Size	Format
Bianchi_Identifying_2018.pdf (open access; type: publisher's version, published with the publisher's layout; license: Creative Commons)	1.4 MB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1164997
