Image Classification as an Image Mining Task : A Machine Learning Approach
Loading...
Date
2025
Journal Title
Journal ISSN
Volume Title
Publisher
Faculty of Sciences
Abstract
In today’s digital world, the volume of data textual, visual, and audio has increased
exponentially, significantly transforming the way information is processed and analyzed across
domains. This massive growth has driven researchers to explore advanced methods for
extracting useful knowledge, giving rise to the field of Data Mining. Initially focused on
structured data from relational databases, data mining has evolved to handle complex data types
such as images, videos, and data from social networks, ushering in the Big Data era. Big Data
is characterized by five key properties: volume, variety, velocity, veracity, and value each
requiring tailored techniques for effective processing.
This thesis focuses on image classification, a core task within the specialized field of Image
Mining, which itself extends traditional data mining techniques to visual content. Image mining
involves extracting meaningful patterns from large image collections by analyzing their visual
characteristics and semantic content. These patterns can then be used for various tasks, such as
classification, similarity detection, and anomaly identification.
To address the image classification problem, this research explores two primary approaches:
• Machine learning, using classifiers such as K-Nearest Neighbors (KNN), Support
Vector Machines (SVM), Naive Bayes, Decision Trees, and Random Forests.
• Deep learning, particularly Convolutional Neural Networks (CNNs), which
automatically learn hierarchical representations from image data.
A comparative study was conducted using two datasets with distinct characteristics:
• Corel-1K, a general-purpose dataset of 1,000 images grouped into 10 semantic
categories,
• KimiaPath960, a medical dataset composed of digital pathology images.
The experiments tested the performance of the classifiers in terms of accuracy and execution
time using five feature extraction techniques: Local Binary Patterns (LBP), Haralick
descriptors, HSV histograms, Color Moments, and Fourier Descriptors.
The key findings are:
1. The choice of classifier, feature type, color space, and dataset structure all significantly
influence classification performance.
4
2. CNNs were more effective on semantically rich images (Corel-1K) than on semantically
poor ones (KimiaPath960).
3. Certain images are inherently easier to classify due to the richness or simplicity of their
visual and semantic content.
This research contributes to a deeper understanding of how intelligent systems can extract and
classify visual data effectively using both classical and deep learning-based techniques.