HYBRID FEATURE SELECTION BASED DEEP LEARNING MODEL FOR ENHANCED EMAIL SPAM DETECTION
Keywords:
Email Spam Detection, Deep Learning, Feature Selection, Multilayer Perceptron, Genetic Algorithm, Correlation-Based Filtering, Machine Learning, Spam Classification, Spam Filtering.
Abstract
Email has become an indispensable communication tool, but it also faces the persistent issue of spam, which poses a significant threat to user privacy, productivity, and system security. As spam continues to grow and evolve, detecting and filtering it has become increasingly challenging. Traditional spam detection systems, often rule-based, struggle to keep up with sophisticated spam tactics. Recent advancements in machine learning (ML) and deep learning (DL) offer promising solutions, particularly through the integration of feature selection techniques and deep learning models.
This paper proposes a hybrid deep learning model for email spam detection that combines feature selection methods with a multilayer perceptron (MLP) classifier. The hybrid approach begins with correlation-based filtering to eliminate irrelevant features, followed by a genetic algorithm (GA) to select the most informative features. These selected features are then used to train an MLP for spam classification. The approach addresses the challenges of high-dimensional, noisy, and imbalanced datasets, which are common in spam detection tasks.
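The two-stage selection described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the correlation threshold, the GA settings (population size, number of generations, mutation rate, size penalty), the synthetic toy data, and all function names are hypothetical. The abstract does not specify the GA fitness function, so the sketch uses the accuracy of a simple nearest-centroid classifier as a cheap stand-in. It also assumes that the correlation filter scores each feature against the spam label, which the abstract does not state.

```python
import random
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)

def correlation_filter(X, y, threshold=0.1):
    """Stage 1: keep features whose |correlation with the label| >= threshold."""
    return [j for j in range(len(X[0]))
            if abs(pearson([row[j] for row in X], y)) >= threshold]

def centroid_accuracy(X, y, cols):
    """Stand-in fitness (assumption): nearest-centroid accuracy on columns `cols`.
    Scored on the same data it is fitted on; a real run would use held-out data."""
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [row for row, t in zip(X, y) if t == label]
        centroids[label] = [sum(row[j] for row in rows) / len(rows) for j in cols]

    def predict(row):
        dist = {label: sum((row[j] - m) ** 2 for j, m in zip(cols, c))
                for label, c in centroids.items()}
        return min(dist, key=dist.get)

    return sum(predict(row) == t for row, t in zip(X, y)) / len(y)

def genetic_select(X, y, candidates, pop_size=20, generations=15,
                   mutation_rate=0.1, penalty=0.01, seed=0):
    """Stage 2: evolve bit masks over the features that survived stage 1."""
    rng = random.Random(seed)
    n = len(candidates)

    def fitness(mask):
        cols = [c for c, bit in zip(candidates, mask) if bit]
        # Small penalty per selected feature favours compact subsets.
        return centroid_accuracy(X, y, cols) - penalty * len(cols)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]      # truncation selection
        children = [ranked[0]]                # elitism: carry over the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0   # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]        # bit-flip mutation
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return [c for c, bit in zip(candidates, best) if bit]

def make_toy_data(n=200, seed=1):
    """Synthetic data: features 0-2 carry the label signal, 3-9 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [label + rng.gauss(0, 0.5) for _ in range(3)]
        noise = [rng.gauss(0, 1) for _ in range(7)]
        X.append(informative + noise)
        y.append(label)
    return X, y

X, y = make_toy_data()
kept = correlation_filter(X, y, threshold=0.2)
selected = genetic_select(X, y, kept)
print("after correlation filter:", kept)
print("after genetic algorithm:", selected)
```

In a full pipeline, the indices in `selected` would define the input columns passed to the MLP classifier, and the stand-in fitness could be replaced by the validation accuracy of that classifier.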
Experimental results on the Enron email dataset demonstrate the effectiveness of the proposed method. The model outperforms traditional classifiers, such as Naïve Bayes and Support Vector Machines (SVM), as well as other deep learning models, achieving an accuracy of 96.3% and an AUC of 0.985. The integration of feature selection significantly improves performance by reducing overfitting and enhancing generalization. Furthermore, the model's ability to adapt to evolving spam patterns underscores the potential of combining feature selection with deep learning for robust and efficient spam detection.
In conclusion, the proposed hybrid model offers a promising solution to the growing challenges of email spam detection, providing both high accuracy and efficiency in real-world applications. Future work could explore real-time classification, multilingual adaptation, and the incorporation of explainable AI techniques to further enhance the model's applicability and transparency.
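For readers less familiar with the two metrics reported in the results, the sketch below shows how accuracy and AUC are commonly computed from labels and classifier scores. It is illustrative only: the function names, the 0.5 decision threshold, and the six toy labels and scores are assumptions and are unrelated to the Enron experiments. AUC is computed in its pairwise (Mann-Whitney) form, as the fraction of spam/legitimate pairs in which the spam message receives the higher score.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative,
    counting ties as one half (pairwise Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical values): 1 = spam, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))  # 4 of 6 predictions correct
print("AUC:", roc_auc(y_true, scores))        # 8 of 9 pairs ranked correctly
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together on imbalanced data.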