APPLICATION OF HYBRID APPROACH FOR WOLAITA LANGUAGE PART OF SPEECH TAGGING

BIRHANESH FIKRE SHIRKO

Files

Thesis-final-BirhaneshFikre.pdf (1.72 MB)

Date

2020-03-24

Authors

BIRHANESH FIKRE SHIRKO

Publisher

Hawassa University

Abstract

The aim of this research is to develop a part-of-speech (PoS) tagger for the Wolaita language using a hybrid approach. PoS tagging is a subtask of natural language processing (NLP) that is vital for other NLP tasks such as parsing, machine translation, speech recognition, and search. It is the process of labeling each word with the PoS tag that describes how the word is used in a sentence. PoS tagging for the Wolaita language is not yet mature enough to serve as a component in other NLP applications. In this thesis, a PoS tagger is developed for the Wolaita language using a hybrid approach that combines a hidden Markov model (HMM) with transformation-based learning (TBL). In general, an HMM needs a large amount of data to perform well, whereas a TBL model learns rules based on the features of the language. The HMM tags words by finding the optimal tag path for a given sequence of words. TBL is a rule-based model that tags words using rules it learns directly from the training corpus, without expert knowledge. The developed hybrid Wolaita PoS tagger uses the HMM tagger as the initial annotator and the rule-based tagger as a corrector, based on fixed threshold values. Python and NLTK were used for implementation and experiments. For training and testing the model, 14,358 untagged Wolaita words were collected from three sources: the Bible, social media in the Wolaita language (Wogetta FM 96.6), and the Wolaita language department. The corpus was annotated manually by two language experts. For tagging, 26 PoS tags were identified based on the work of Berhanu H., the work of Wakasa (2008), and the help of language experts. Of the entire corpus, 90% was used for training and the remaining 10% for testing.
The performance of the three taggers was tested in different experiments. The HMM, rule-based, and hybrid taggers scored 88.14%, 92.96%, and 94.82%, respectively. Overall, the hybrid approach performed best at assigning PoS tags to Wolaita sentences.
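The pipeline described in the abstract (an HMM as the initial annotator, followed by learned transformation rules as a corrector) can be illustrated with a minimal, dependency-free Python sketch. This is not the author's implementation: the thesis used NLTK, and every name here (`train_hmm`, `viterbi`, `learn_rules`, `hybrid_tag`, `accuracy`, the `<s>` start marker) is hypothetical. The sketch assumes add-one smoothing, a single rule template ("change tag A to B when the previous tag is C"), and it interprets the "fixed threshold" as a minimum net score (`min_score`) a rule must reach to be kept. The abstract does not specify any of these details.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker

def train_hmm(tagged_sents):
    """Collect transition and emission counts from (word, tag) sentences."""
    trans, emit, vocab = defaultdict(Counter), defaultdict(Counter), set()
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), len(vocab) + 1  # +1 slot for unseen words

def _logp(counter, key, size):
    """Add-one smoothed log probability (a smoothing choice assumed here)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + size))

def viterbi(words, model):
    """Return the most probable tag sequence for a list of words."""
    trans, emit, tags, v = model
    if not words:
        return []
    n = len(tags)
    # best[tag] = (log score of best path ending in tag, that path)
    best = {t: (_logp(trans[START], t, n) + _logp(emit[t], words[0], v), [t])
            for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            score, path = max(
                ((best[p][0] + _logp(trans[p], t, n), best[p][1]) for p in tags),
                key=lambda x: x[0])
            new[t] = (score + _logp(emit[t], w, v), path + [t])
        best = new
    return max(best.values(), key=lambda x: x[0])[1]

def apply_rule(tags, rule):
    """Rule (a, b, prev): change tag a to b where the preceding tag is prev."""
    a, b, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == a and tags[i - 1] == prev:
            out[i] = b
    return out

def learn_rules(initial, gold, min_score=2, max_rules=10):
    """Greedily pick rules whose net gain (fixes minus breaks) >= min_score."""
    current, rules = [list(s) for s in initial], []
    while len(rules) < max_rules:
        scores = Counter()
        for cur, ref in zip(current, gold):          # count errors a rule fixes
            for i in range(1, len(cur)):
                if cur[i] != ref[i]:
                    scores[(cur[i], ref[i], cur[i - 1])] += 1
        for (a, b, prev) in list(scores):            # subtract tags it would break
            for cur, ref in zip(current, gold):
                for i in range(1, len(cur)):
                    if cur[i] == a and cur[i - 1] == prev and ref[i] == a:
                        scores[(a, b, prev)] -= 1
        if not scores:
            break
        rule, score = max(scores.items(), key=lambda kv: kv[1])
        if score < min_score:                        # assumed threshold semantics
            break
        rules.append(rule)
        current = [apply_rule(s, rule) for s in current]
    return rules

def hybrid_tag(words, model, rules):
    """HMM output first, then each learned rule applied in order."""
    tags = viterbi(words, model)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return list(zip(words, tags))

def accuracy(pred, gold):
    """Fraction of tags in pred that match gold, over all sentences."""
    pairs = [(p, g) for ps, gs in zip(pred, gold) for p, g in zip(ps, gs)]
    return sum(p == g for p, g in pairs) / len(pairs)
```

To mirror the setup in the abstract, one would train `train_hmm` on the 90% training portion, tag those same training sentences with `viterbi`, pass the resulting tag sequences and the expert annotations to `learn_rules`, and then score `hybrid_tag` on the held-out 10% with `accuracy`. Raising `min_score` yields fewer, more reliable correction rules; lowering it lets the corrector fix rarer errors at the risk of overfitting a small corpus.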

Keywords

NLP, HMM, TBL, NLTK, Hybrid

URI

https://etd.hu.edu.et/handle/123456789/1402

Collections

Computer Science
