Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/29233
Title: MULTI-CLASS TEXT CLASSIFICATION USING PRE-TRAINED BERT MODEL
Authors: Haider Zaman Khan
Keywords: Electronics
Issue Date: 2023
Publisher: Quaid-i-Azam University, Islamabad
Abstract: Natural Language Processing (NLP) has gained immense popularity due to its fundamental application in automatically categorizing text documents into predefined groups or classes, enabling the extraction of valuable insights from unstructured text data. This research aims to build a robust and efficient text classification model using state-of-the-art techniques while minimizing time and resource requirements. Leveraging the Transformer architecture, specifically the BERT model for NLP tasks, this work focuses on augmenting pre-trained BERT models with cross-validation techniques for text classification. The model is fine-tuned on the 20 Newsgroups dataset, in both preprocessed and raw form, each with 20 distinct classes. The achieved accuracy of 92% on the preprocessed dataset and 90% on the raw dataset highlights the strength of pre-trained BERT models on real-world text classification challenges. This research contributes by exploring the synergy of BERT models with cross-validation and fine-tuning strategies, aiming to outperform traditional baseline models on multi-class text classification tasks. It also employs 5-fold cross-validation to mitigate overfitting and class imbalance, yielding better accuracy with low computational resources.
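
As a rough illustration of the approach the abstract describes, the following is a minimal Python sketch (not the thesis's actual code) of fine-tuning a pre-trained BERT model on 20 Newsgroups with stratified 5-fold cross-validation. The library choices (Hugging Face transformers, scikit-learn, PyTorch) and hyperparameters such as epoch count, batch size, sequence length, and learning rate are illustrative assumptions.

import numpy as np
import torch
from torch.utils.data import Dataset
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import StratifiedKFold
from transformers import (BertTokenizerFast, BertForSequenceClassification,
                          Trainer, TrainingArguments)

class NewsDataset(Dataset):
    """Wraps tokenized texts and labels for the Trainer API."""
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             max_length=256)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

# "raw" vs. "preprocessed" split: removing headers/footers/quotes is one
# common preprocessing choice; the thesis's exact pipeline may differ.
data = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
texts, labels = data.data, np.array(data.target)
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_acc = []
for fold, (tr, va) in enumerate(skf.split(texts, labels)):
    # Re-initialize from the pre-trained checkpoint for each fold so folds
    # do not leak information into one another.
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=20)
    train_ds = NewsDataset([texts[i] for i in tr], labels[tr], tokenizer)
    val_ds = NewsDataset([texts[i] for i in va], labels[va], tokenizer)
    args = TrainingArguments(output_dir=f"fold{fold}", num_train_epochs=2,
                             per_device_train_batch_size=16,
                             learning_rate=2e-5, logging_steps=100)
    trainer = Trainer(model=model, args=args, train_dataset=train_ds)
    trainer.train()
    preds = trainer.predict(val_ds).predictions.argmax(axis=-1)
    acc = (preds == labels[va]).mean()
    fold_acc.append(acc)
    print(f"fold {fold}: accuracy = {acc:.3f}")
print(f"mean 5-fold accuracy: {np.mean(fold_acc):.3f}")

Averaging per-fold validation accuracy, as done on the last line, is one common way a k-fold setup is reduced to a single headline figure like the 92% quoted in the abstract; stratified folds also preserve the class distribution in each split, which is how cross-validation helps with the class imbalance the abstract mentions.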
URI: http://hdl.handle.net/123456789/29233
Appears in Collections: M.Phil

Files in This Item:
File          Description  Size     Format
ELE 568.pdf   ELE 568      1.41 MB  Adobe PDF

