Please use this identifier to cite or link to this item:
http://hdl.handle.net/123456789/29233
Title: | MULTI-CLASS TEXT CLASSIFICATION USING PRE-TRAINED BERT MODEL |
Authors: | Haider Zaman Khan |
Keywords: | Electronics |
Issue Date: | 2023 |
Publisher: | Quaid-i-Azam University Islamabad |
Abstract: | Natural Language Processing (NLP) has gained immense popularity due to its fundamental application in automatically categorizing text documents into predefined groups or classes, enabling the extraction of valuable insights from unstructured text data. This research aims to create a robust and efficient text classification model using state-of-the-art techniques while minimizing time and resource requirements. Leveraging the Transformer framework, specifically the BERT model for NLP tasks, this work focuses on augmenting pre-trained BERT models with cross-validation techniques for text classification. The model is fine-tuned on the 20 Newsgroups dataset, in both preprocessed and raw versions, each with 20 distinct classes. The achieved accuracy rates of 92% on the preprocessed dataset and 90% on the raw dataset highlight the superiority of pre-trained BERT models in real-world text classification challenges. This research contributes by exploring the synergy of BERT models with cross-validation and fine-tuning strategies, aiming to outperform traditional baseline models in multi-class text classification tasks. It also employs 5-fold cross-validation to mitigate overfitting and class imbalance, yielding better accuracy with low computational resources. (A minimal code sketch of this fine-tuning and cross-validation setup appears below.) |
URI: | http://hdl.handle.net/123456789/29233 |
Appears in Collections: | M.Phil |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
ELE 568.pdf | ELE 568 | 1.41 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
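
The abstract outlines fine-tuning a pre-trained BERT model under 5-fold cross-validation on the 20 Newsgroups dataset. Below is a minimal sketch of that setup, assuming the Hugging Face `transformers` Trainer and scikit-learn's `StratifiedKFold`; the checkpoint (`bert-base-uncased`), epoch count, batch size, learning rate, and sequence length are illustrative assumptions, since the record does not report the thesis's exact hyperparameters.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score
import numpy as np
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"  # assumed checkpoint; the abstract does not name one

# Load the 20 Newsgroups corpus (20 classes). Stripping headers/footers/quotes
# stands in for the thesis's "preprocessed" variant; omit `remove` for the raw one.
data = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
texts, labels = data.data, np.array(data.target)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

class NewsgroupsDataset(Dataset):
    """Tokenizes each document on access, so each fold holds only raw strings."""
    def __init__(self, texts, labels):
        self.texts, self.labels = texts, labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        enc = tokenizer(self.texts[idx], truncation=True,
                        max_length=128, padding="max_length")
        item = {k: torch.tensor(v) for k, v in enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

def compute_metrics(eval_pred):
    logits, gold = eval_pred
    return {"accuracy": accuracy_score(gold, logits.argmax(axis=-1))}

# 5-fold stratified cross-validation, matching the setup in the abstract;
# stratification keeps per-fold class proportions even, easing class imbalance.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_accuracies = []
for fold, (train_idx, val_idx) in enumerate(skf.split(texts, labels)):
    # A fresh classification head over pre-trained BERT for every fold.
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=20)
    args = TrainingArguments(
        output_dir=f"bert-fold-{fold}",
        num_train_epochs=3,              # assumed; not reported in the abstract
        per_device_train_batch_size=16,  # assumed
        learning_rate=2e-5,              # assumed
        save_strategy="no",
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=NewsgroupsDataset([texts[i] for i in train_idx],
                                        labels[train_idx]),
        eval_dataset=NewsgroupsDataset([texts[i] for i in val_idx],
                                       labels[val_idx]),
        compute_metrics=compute_metrics,
    )
    trainer.train()
    fold_accuracies.append(trainer.evaluate()["eval_accuracy"])

print("Mean 5-fold accuracy:", np.mean(fold_accuracies))
```

Re-initializing the model inside the loop ensures each fold starts from the same pre-trained weights, so the five accuracy scores estimate generalization rather than carry-over from earlier folds.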