An Artificial Ant-Based Approach Using Polynomial Algorithms to Tackle the Text Aspect of Clustering Web Pages
Abstract
Nowadays, web clustering is an expanding research area that relies on in-depth study and efficient analysis of users' browsing behavior. Managing the huge amounts of unstructured data contained in web pages is a hard but essential task. In this article, we build clusters by grouping users according to the similarity of the web pages they have visited. Our work focuses on cleaning, analyzing, and clustering web data to facilitate users' access to relevant content. To this end, we propose a novel algorithm, called WCLARTANT, for clustering web pages; it finds groups of sessions according to their Web access patterns. The approach is based on the ANTTREE algorithm, which is inspired by the self-assembling behavior observed in real ants, combined with the binary search tree concept. The combination presented in our approach is applied for the first time to clustering in web usage mining. More precisely, different topologies are built using different similarity measures, such as SBS, Euclidean, Jaccard, and Cosine. Afterward, the clusters are extracted from the binary tree, which is built by the prefix depth algorithm. In other words, the algorithms proposed in this manuscript build the binary tree corresponding to the session matrix, where each node models a Web session and each branch represents a cluster. In addition, we use the Silhouette index to evaluate and analyze the clustering performance of WCLARTANT relative to the DBSCAN algorithm. Combined with the SBS similarity measure, WCLARTANT provides the best results and outperforms DBSCAN. The scores of our algorithm range from 0.39 to 0.62, which is considered good. The log files considered come from NASA and contain all HTTP requests for a one-month period, from July 1, 1995, to July 31, 1995, for a total of 65,194 entries.
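To make the measures named in the abstract concrete, the following is a minimal Python sketch of the Jaccard, Cosine, and Euclidean measures between sessions, together with a plain Silhouette computation. It is not the authors' implementation: the function names, the encoding of sessions as sets of pages or as numeric vectors, and the toy data are all assumptions made for illustration, and the SBS measure is left out because the abstract does not define it.

```python
import math


def jaccard(a, b):
    """Jaccard similarity between two sessions given as collections of pages."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cosine(u, v):
    """Cosine similarity between two sessions given as numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def euclidean(u, v):
    """Euclidean distance between two sessions given as numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def silhouette(points, labels, dist):
    """Mean silhouette coefficient of a labeling under the distance `dist`."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, per cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            # Singleton cluster or only one cluster: score 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance inside own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy sessions: each vector marks which of four pages was visited.
sessions = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
labels = [0, 0, 1, 1]
score = silhouette(sessions, labels, euclidean)
```

A Silhouette value lies between -1 and 1, with higher values indicating tighter and better separated clusters, which is the scale on which the scores reported in the abstract should be read.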
Journal of Applied Data Sciences
ISSN | : | 2723-6471 (Online) |
Organized by | : | Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia. |
Website | : | http://bright-journal.org/JADS |
Email | : | taqwa@amikompurwokerto.ac.id (principal contact), support@bright-journal.org (technical issues) |
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0