Image Classification as an Image Mining Task: A Machine Learning Approach

KIFADJI, Taissir El Amel; BOUCHEHAM, Bachir

Files

M-006-00022-1.pdf (3.41 MB)

Date

2025

Authors

KIFADJI, Taissir El Amel

BOUCHEHAM, Bachir

Publisher

Faculty of Sciences

Abstract

In today’s digital world, the volume of textual, visual, and audio data has increased exponentially, transforming the way information is processed and analyzed across domains. This growth has driven researchers to explore advanced methods for extracting useful knowledge, giving rise to the field of Data Mining. Initially focused on structured data from relational databases, data mining has evolved to handle complex data types such as images, videos, and social network data, ushering in the Big Data era. Big Data is characterized by five key properties (volume, variety, velocity, veracity, and value), each requiring tailored techniques for effective processing.

This thesis focuses on image classification, a core task within the specialized field of Image Mining, which extends traditional data mining techniques to visual content. Image mining extracts meaningful patterns from large image collections by analyzing their visual characteristics and semantic content. These patterns can then be used for tasks such as classification, similarity detection, and anomaly identification.

To address the image classification problem, this research explores two primary approaches:
• Machine learning, using classifiers such as K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Naive Bayes, Decision Trees, and Random Forests.
• Deep learning, particularly Convolutional Neural Networks (CNNs), which automatically learn hierarchical representations from image data.

A comparative study was conducted using two datasets with distinct characteristics:
• Corel-1K, a general-purpose dataset of 1,000 images grouped into 10 semantic categories.
• KimiaPath960, a medical dataset composed of digital pathology images.

The experiments measured classifier performance in terms of accuracy and execution time using five feature extraction techniques: Local Binary Patterns (LBP), Haralick descriptors, HSV histograms, Color Moments, and Fourier Descriptors.

The key findings are:
1. The choice of classifier, feature type, color space, and dataset structure all significantly influence classification performance.
2. CNNs were more effective on semantically rich images (Corel-1K) than on semantically poor ones (KimiaPath960).
3. Certain images are inherently easier to classify due to the richness or simplicity of their visual and semantic content.

This research contributes to a deeper understanding of how intelligent systems can extract and classify visual data effectively using both classical and deep learning-based techniques.
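
As an illustration only, the kind of experiment the abstract describes (extract a feature vector per image, classify it, then record accuracy and execution time) might be sketched as follows. This is not the thesis's implementation. It assumes Color Moments are the mean, standard deviation, and cube-root skewness of each HSV channel, uses a plain Euclidean KNN written from scratch, and replaces Corel-1K and KimiaPath960 with synthetic two-class "images". The names `color_moments`, `knn_predict`, `synthetic_image`, and `run_experiment` are hypothetical.

```python
import math
import random
import time


def color_moments(pixels):
    """Return 9 features: mean, std, and cube-root skewness per channel.

    `pixels` is a list of (h, s, v) tuples with values in [0, 1];
    the conversion to HSV is assumed to have happened already.
    """
    features = []
    n = len(pixels)
    for c in range(3):
        values = [p[c] for p in pixels]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Signed cube root keeps the direction of the asymmetry.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        features.extend([mean, math.sqrt(var), skew])
    return features


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean)."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def synthetic_image(label, rng, n_pixels=64):
    """Toy stand-in for an image: class 0 is darker, class 1 is brighter."""
    center = 0.3 if label == 0 else 0.7
    return [
        tuple(min(1.0, max(0.0, rng.gauss(center, 0.1))) for _ in range(3))
        for _ in range(n_pixels)
    ]


def run_experiment(seed=0, n_train=40, n_test=20):
    """Return (accuracy, seconds spent classifying the test set)."""
    rng = random.Random(seed)
    train_y = [i % 2 for i in range(n_train)]
    test_y = [i % 2 for i in range(n_test)]
    train_x = [color_moments(synthetic_image(y, rng)) for y in train_y]
    test_x = [color_moments(synthetic_image(y, rng)) for y in test_y]

    start = time.perf_counter()
    predictions = [knn_predict(train_x, train_y, x) for x in test_x]
    elapsed = time.perf_counter() - start

    correct = sum(p == y for p, y in zip(predictions, test_y))
    return correct / n_test, elapsed


if __name__ == "__main__":
    accuracy, seconds = run_experiment()
    print(f"accuracy={accuracy:.2f} time={seconds:.4f}s")
```

Swapping `color_moments` for another descriptor, or `knn_predict` for another classifier, while keeping `run_experiment` fixed gives the same accuracy-versus-time comparison structure. The toy classes here are deliberately easy to separate, so the numbers say nothing about the real datasets.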

URI

http://dspace.univ-skikda.dz:4000/handle/123456789/5326

Collections

Informatique
