Data Collection and Preprocessing in Web Usage Mining: Implementation and Analysis

Bibliographic Details
Main Authors: Mohammed Ali Mohammed, Rula A. Hamid, Reem Razzaq AbdulHussein
Format: Article
Language: Arabic
Published: University of Information Technology and Communications 2024-11-01
Series: Iraqi Journal for Computers and Informatics
Online Access: https://ijci.uoitc.edu.iq/index.php/ijci/article/view/486
Description
Summary: Data collection and data preprocessing are crucial stages in web usage mining because log data are unstructured, diverse, and noisy. During data collection, log file datasets are loaded and merged. Thorough preprocessing is essential to the efficiency and scalability of the algorithms used in the pattern discovery phase of web usage mining. This work addresses both stages and introduces two new approaches. The first determines the device used to access the web, distinguishing computers from mobile devices. The second determines user sessions and complete paths using the referrer URL. The entire preprocessing pipeline is implemented in C#, and the source code is available on GitHub: https://github.com/Mohammed91/Web-Usage-Mining.
ISSN: 2313-190X, 2520-4912
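
Illustrative sketch: the two approaches named in the summary can be pictured roughly as follows. This is not the authors' C# implementation (see the GitHub repository for that); it is a minimal Python sketch under stated assumptions. The names `MOBILE_HINTS`, `Hit`, `classify_device`, and `sessionize`, the keyword list, the 30-minute timeout, and the rule that a referrer must match a page already in the session are all hypothetical choices made for illustration, and the request URL and referrer are assumed to be stored in the same form so they can be compared directly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical keyword list; real User-Agent strings need a fuller ruleset.
MOBILE_HINTS = ("mobi", "android", "iphone", "ipad", "ipod", "windows phone")


def classify_device(user_agent: str) -> str:
    """Label a request as 'mobile' or 'computer' from its User-Agent."""
    ua = user_agent.lower()
    return "mobile" if any(hint in ua for hint in MOBILE_HINTS) else "computer"


@dataclass
class Hit:
    ip: str
    time: datetime
    url: str
    referrer: str  # "-" when the log has no referrer
    user_agent: str


def sessionize(hits, timeout=timedelta(minutes=30)):
    """Group hits into sessions and complete paths using the referrer.

    A hit joins the most recent session of the same (ip, user_agent) if it
    arrives within `timeout` and its referrer is a page already in that
    session; otherwise it opens a new session. When the referrer is not the
    last page visited, the pages presumably revisited via the back button
    are appended first (path completion).
    """
    sessions = []
    for hit in sorted(hits, key=lambda h: h.time):
        key = (hit.ip, hit.user_agent)
        target = None
        for s in reversed(sessions):
            if (s["key"] == key
                    and hit.time - s["last"] <= timeout
                    and hit.referrer in s["pages"]):
                target = s
                break
        if target is None:
            target = {"key": key, "pages": [], "last": hit.time}
            sessions.append(target)
        pages = target["pages"]
        if pages and hit.referrer != pages[-1]:
            # Index of the last occurrence of the referrer in the path.
            idx = len(pages) - 1 - pages[::-1].index(hit.referrer)
            # Walk back from the page before the current one to the referrer.
            pages.extend(pages[idx:len(pages) - 1][::-1])
        pages.append(hit.url)
        target["last"] = hit.time
    return sessions
```

For example, if one visitor requests /a, then /b (referrer /a), then /c (referrer /b), then /d (referrer /b), the sketch records the completed path /a, /b, /c, /b, /d, on the assumption that the visitor returned from /c to /b before following the link to /d.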