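Contribution to Data Clustering: Suitability of Approaches and Challenges of Application Domains (original title: Contribution au Clustering de données : Adéquation d’approches et défis de domaines d’application)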
Date
2024-04-30
Publisher
Université 20 août 1955 Skikda
Abstract
We are swamped in our daily lives by an ever-growing torrent of information, handled by
intricate systems with insatiable demands for speed and efficiency. Gleaning knowledge and
insights from this deluge requires adaptable techniques that flex with the harsh, ever-shifting
realities of data processing. Data clustering, though classified as an NP-hard problem, stands out
as one such technique. To tackle this computationally prohibitive problem, we turn to
metaheuristics inspired by the living world. These approximate methods make it
possible to keep proposing new approaches, and to optimize
and combine existing ones, to achieve high-performance, high-quality clustering.
This thesis offers a concise yet comprehensive exploration of data clustering, delving into
its fundamental concepts, motivations, and diverse applications across scientific fields. It
provides an in-depth analysis of the state-of-the-art in data clustering, encompassing its
objectives, applications, techniques, and the various measures used to evaluate clustering
results. Furthermore, the thesis introduces optimization methods inspired by natural
phenomena, such as genetic algorithms, the rat swarm optimizer, and the grey wolf optimizer, and
explores their potential for effective data clustering.
This work makes two notable contributions: the Rat Swarm Optimizer for Data Clustering
(RSOC) and the Enhanced Grey Wolf Optimizer for Data Clustering (EGWAC). Both aim to
address the limitations of some existing clustering techniques and enhance their performance.
RSOC adapts the swarm intelligence metaheuristic RSO to data clustering challenges.
It leverages RSO’s ability to escape local optima and avoid premature convergence while exploring
a broad solution space to find optimal cluster centers. The discussion section showcases
RSOC’s performance on diverse datasets using various measures, including homogeneity,
completeness, v-measure, purity, and error rate. Comparisons with state-of-the-art and recent
algorithms demonstrate RSOC’s adaptability and superior performance in most cases.
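As an illustration of what searching for cluster centers and scoring the result can look like, here is a minimal Python sketch. It assumes a candidate solution is a list of centers, uses the sum of squared distances to the nearest center as a stand-in objective, and computes purity against known class labels. The function names (`assign`, `sse`, `purity`), the objective, and the toy data are illustrative assumptions, not taken from the thesis.

```python
from collections import Counter


def sq_dist(p, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def sse(points, centers):
    """Assumed objective: sum of squared distances to the nearest center."""
    labels = assign(points, centers)
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))


def purity(labels, truth):
    """Fraction of points that belong to the majority class of their cluster."""
    clusters = {}
    for l, t in zip(labels, truth):
        clusters.setdefault(l, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(truth)


# Toy data (hypothetical): two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
truth = ["a", "a", "b", "b"]

good = [(0.0, 0.5), (5.5, 5.0)]   # one center per group
poor = [(0.0, 0.0), (0.0, 1.0)]   # both centers inside the first group

good_score = (sse(points, good), purity(assign(points, good), truth))
poor_score = (sse(points, poor), purity(assign(points, poor), truth))
```

On this toy data the well-placed centers give a lower objective value and a higher purity than the poorly placed ones, which is the kind of signal a metaheuristic would follow when moving candidate solutions.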
The second contribution (EGWAC) identifies and addresses an issue in the position update mechanism of the original Grey Wolf Algorithm-based Clustering technique (GWAC).
This issue arises from treating the order of cluster centers as significant for finding clusters,
when in reality it only affects wolf position updates and can lead to inaccurate solutions.
EGWAC optimizes this key element. Experiments on various data clustering benchmarks,
comparing EGWAC against GWAC and other well-known algorithms, using measures like
precision, recall, g-measure, purity, and entropy, demonstrate its overall capability to identify
optimal clusters.
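The center-ordering issue can be illustrated with a small Python sketch. It assumes each wolf's position is an ordered list of cluster centers, and it reduces the position update to a plain component-wise average of two positions, leaving out the random coefficients of the grey wolf optimizer. The greedy `align` step is a hypothetical remedy shown only for illustration; the abstract does not describe EGWAC's actual mechanism.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def partition(points, centers):
    """Group point indices by nearest center, ignoring the order of centers."""
    groups = {}
    for i, p in enumerate(points):
        j = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        groups.setdefault(j, set()).add(i)
    return {frozenset(g) for g in groups.values()}


def average(a, b):
    """Component-wise mean of two center lists, slot by slot."""
    return [tuple((x + y) / 2 for x, y in zip(ca, cb)) for ca, cb in zip(a, b)]


def align(reference, centers):
    """Hypothetical fix: greedily reorder `centers` to match `reference` slots."""
    remaining = list(centers)
    ordered = []
    for r in reference:
        best = min(remaining, key=lambda c: sq_dist(r, c))
        remaining.remove(best)
        ordered.append(best)
    return ordered


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
wolf_a = [(0.0, 0.5), (5.5, 5.0)]
wolf_b = [(5.5, 5.0), (0.0, 0.5)]   # same centers as wolf_a, swapped order

# Both wolves describe the same clustering of the data.
same_clustering = partition(points, wolf_a) == partition(points, wolf_b)

# Averaging slot by slot mixes unrelated centers and collapses them.
naive = average(wolf_a, wolf_b)

# Matching centers before averaging keeps the shared solution intact.
aligned = average(wolf_a, align(wolf_a, wolf_b))
```

Here the two wolves encode the same partition, yet the slot-by-slot average places both centers at the midpoint between the groups, while the aligned average preserves the original centers. This is one way an order-sensitive update can lead to inaccurate solutions.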
Keywords
clustering, hard clustering, Contribution to Data Clustering