Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/30195
Title: Data-Driven Transition-Based Dependency Parsing of Free Word Order Languages
Authors: Fatima Tuz Zuhra
Keywords: Computer Sciences
Issue Date: 2024
Publisher: Quaid I Azam University Islamabad
Abstract: A parser is at the heart of many natural language processing (NLP) tasks, e.g. machine translation, text generation, sentiment analysis, part-of-speech (POS) tagging, named entity recognition and text summarization. The accuracy of this component has a direct impact on the accuracy of any NLP application. There are two types of parsing: constituency parsing and dependency parsing; this research work deals with dependency parsing. The successes of neural network models such as multilayer perceptrons (MLPs) and transformers in NLP and other areas of artificial intelligence since 2010 have caused the NLP community to shift its focus towards these models. The accuracy of these models, and of any other machine learning model, however, depends on how accurately the input words are represented in the input vectors, i.e. the word embeddings. Hence word embeddings, which are pre-trained representations of the words in a text, are an essential resource for any natural language. The languages under study, namely Arabic, Persian, Urdu and Uyghur, lack such resources. Even where such resources exist for these languages, they need improvement, as the word embeddings are, for example, learnt from a limited amount of text. These languages are therefore categorized as low-resourced languages, which is a hurdle in the way of successful NLP applications for them. This research work is conducted to achieve an accurate dependency parsing architecture for such languages and to develop language resources for these resource-poor languages. The languages under study have a rich morphology and strong agreement patterns, and these features need to be part of the word representations (word embeddings). This research work proposes a novel type of word embedding, called hybrid word embeddings (hybrid embeddings for short), which captures morphological and agreement patterns as part of the word embeddings.
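The abstract does not spell out how morphological and agreement information enters the embedding; one plausible sketch, not the thesis's actual implementation, is to concatenate a pretrained distributional vector with a vector of morphological features. The feature inventories, dimensions and function names below are hypothetical:

```python
import numpy as np

# Hypothetical sketch of a "hybrid" embedding: a pretrained word vector
# concatenated with one-hot morphological/agreement features (gender,
# number, case), so agreement patterns become part of the input vector.
# These inventories are illustrative, not the thesis's actual feature set.
GENDERS = ["masc", "fem"]
NUMBERS = ["sing", "plur"]
CASES = ["nom", "acc", "gen"]

def one_hot(value, inventory):
    vec = np.zeros(len(inventory))
    vec[inventory.index(value)] = 1.0
    return vec

def hybrid_embedding(word_vec, gender, number, case):
    """Concatenate the distributional vector with morphological features."""
    morph = np.concatenate([one_hot(gender, GENDERS),
                            one_hot(number, NUMBERS),
                            one_hot(case, CASES)])
    return np.concatenate([word_vec, morph])

word_vec = np.random.randn(100)  # stand-in for a pretrained embedding
emb = hybrid_embedding(word_vec, "fem", "sing", "nom")
print(emb.shape)  # (107,) = 100 distributional + 2 + 2 + 3 morphological
```

A parser consuming such vectors sees agreement-relevant features directly, rather than having to infer them from distributional context alone.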
Experiments with these word embeddings show an improvement in the accuracy of the dependency parser. We have adopted a transition-based dependency parsing approach for this work. Our experiments suggest that hybrid embeddings yield a significant increase in the accuracy, and decrease in the loss, of the parser. We have further analysed the impact of these morphology- and agreement-based hybrid word embeddings and have proposed a novel 'agreement' layer inside a transformer model, the state-of-the-art attention-based architecture underlying large language models (LLMs). Dependency parsing experiments performed on the transformer with 'attention and agreement' show that the novel 'agreement' layer improves both the accuracy and the loss of the transformer model.
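The abstract does not describe the mechanics of the 'agreement' layer; as a purely hypothetical sketch of how agreement could be injected into attention, one might add a bias to the attention scores for token pairs that share morphological features. Every name and formula below is an assumption for illustration, not the thesis's method:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_agreement(Q, K, V, morph_feats, strength=1.0):
    """Scaled dot-product attention plus an additive 'agreement' bias.

    morph_feats: (n, f) binary morphological feature matrix. Token pairs
    sharing more features (e.g. same gender/number) receive a larger
    bias, nudging attention toward agreeing tokens. This is a sketch of
    one possible mechanism, not the thesis's actual layer.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)           # standard attention scores
    agreement = morph_feats @ morph_feats.T  # shared-feature counts
    scores = scores + strength * agreement   # bias toward agreeing pairs
    return softmax(scores, axis=-1) @ V

n, d, f = 4, 8, 5
rng = np.random.default_rng(0)
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
morph = rng.integers(0, 2, size=(n, f)).astype(float)
out = attention_with_agreement(Q, K, V, morph)
print(out.shape)  # (4, 8)
```

Whatever its exact form, the design intent stated in the abstract is the same: make agreement patterns, which are strong in these languages, visible to the attention mechanism rather than leaving them implicit.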
URI: http://hdl.handle.net/123456789/30195
Appears in Collections:Ph.D

Files in This Item:
File: COM 2543.pdf
Description: COM 2543
Size: 5.38 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.