A MACHINE LEARNING-BASED FRAMEWORK FOR AUTOMATED MULTI-VIEW DOCUMENT STRUCTURE CLASSIFICATION

Authors

  • T. Vani

Keywords:

supervised, unsupervised, semi-supervised

Abstract

In multi-view document classification, existing systems apply a range of machine learning approaches, including supervised, unsupervised, and semi-supervised techniques. To categorize document objects effectively, background knowledge and metadata must first be extracted from the documents. Different machine learning algorithms offer distinct classification methods based on features such as short text content, metadata, and heading structures. Typically, an expert can determine whether a document follows a supervised, unsupervised, or semi-supervised learning approach by reading it and analyzing its structure. However, this manual process is time-consuming and prone to ambiguity. To address these challenges, we propose the IDS (Identifying Document Structure) model, a machine learning-based approach that automatically identifies document structure and categorizes each document accordingly. In this model, keywords are trained on labeled data, clustering techniques handle unlabeled data, and a combination of both methods is used for semi-supervised classification. The dataset is split into 60% for training and 40% for testing, and the model shows improved classification performance and efficiency compared to existing techniques.
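
As a rough illustration of the 60/40 split and the keyword training on labeled data described in the abstract, the following minimal Python sketch may help. It is not the IDS model itself: the function names (split_60_40, train_keywords, classify), the toy DOCS data, and the word-count scoring rule are all assumptions made for this example, and the clustering and semi-supervised steps are omitted.

```python
import random
from collections import Counter, defaultdict

# Toy labeled documents (hypothetical data, for illustration only).
DOCS = [
    ("labeled training data with a classifier", "supervised"),
    ("classifier accuracy on labeled examples", "supervised"),
    ("training a classifier from labeled headings", "supervised"),
    ("labeled metadata used for training", "supervised"),
    ("supervised classifier with labeled keywords", "supervised"),
    ("clustering unlabeled documents into groups", "unsupervised"),
    ("unlabeled text grouped by clustering", "unsupervised"),
    ("clustering of unlabeled metadata", "unsupervised"),
    ("groups found by clustering headings", "unsupervised"),
    ("unlabeled keywords and clustering", "unsupervised"),
]

def split_60_40(samples, seed=0):
    """Shuffle the samples and split them 60% training / 40% testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.6 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def train_keywords(train):
    """Count how often each word occurs under each label."""
    counts = defaultdict(Counter)
    for text, label in train:
        counts[label].update(text.lower().split())
    return counts

def classify(text, counts):
    """Return the label whose keyword counts best match the text.

    Ties are broken alphabetically by label name.
    """
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(sorted(scores), key=scores.get)

if __name__ == "__main__":
    train, test = split_60_40(DOCS, seed=0)
    model = train_keywords(train)
    correct = sum(classify(text, model) == label for text, label in test)
    print(f"{correct} of {len(test)} test documents classified correctly")
```

The simple word-count score stands in for whatever keyword weighting the model actually uses; any real evaluation would need a far larger dataset than these ten toy documents.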

Published

2025-10-01

Section

Articles