CONTEXT-BASED SPELL CHECKER FOR SIDAAMU AFOO USING HYBRID APPROACH
Date
2024-04-08
Publisher
Hawassa University
Abstract
Spell checking is the task of detecting misspelled words in a text and suggesting corrections for
them. It is used in applications such as correcting handwritten text after digitization and assisting
users with word correction during information retrieval. This thesis describes the design,
implementation, and evaluation of a model that corrects both non-word and real-word
errors. The central objective of this research is to develop a context-based spell checker for Sidaamu
afoo. The system relies on the language's error patterns, as inferred from word sequences within
input sentences. It uses an unsupervised statistical method, which suits languages like Sidaamu
afoo because it does not require large annotated datasets. Spelling correction proceeds through
three distinct phases: detecting errors, generating candidate corrections, and ranking those
candidates. Error detection combines dictionary lookup with bigram analysis. The data for the
dictionary and the bigram model, on which both detection and correction depend, were collected by
the researcher from diverse sources. Correcting non-word errors
involves measuring the similarity between the misspelled word and each dictionary token with the
Levenshtein distance; the candidates are then ranked by this distance to produce correction
suggestions. For real-word errors, bigram frequency is used to detect the error and bigram
probability guides the choice of correction. The experiments used 52,093 tokens for training and
5,788 tokens for testing. The results
showed a recall of 92.4% and an accuracy of 92.5% for both non-word and real-word errors. These
findings, consistent with the gated result accuracy of 92.5%, indicate that the system can correct
Sidaamu afoo misspellings. Future work could explore advanced neural architectures to further
improve model quality.
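The hybrid pipeline described in the abstract (dictionary lookup and Levenshtein ranking for non-word errors, bigram frequency for detecting real-word errors, bigram probability for choosing their corrections) can be illustrated with the minimal sketch below. It is not the thesis implementation. The class and method names (HybridSpellChecker, suggest, correct), whitespace tokenization with lowercasing, add-one smoothing, the edit-distance thresholds, and the use of only the preceding word as context are all assumptions made for illustration. A tiny English toy corpus stands in for the Sidaamu afoo training data, which is not reproduced here.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


class HybridSpellChecker:
    """Hypothetical sketch of a dictionary + Levenshtein + bigram spell checker."""

    def __init__(self, sentences, max_distance=2):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = sentence.lower().split()  # assumption: whitespace tokenization
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.dictionary = set(self.unigrams)  # lexicon = every training token
        self.max_distance = max_distance      # assumed cutoff for non-word candidates

    def bigram_prob(self, prev, word):
        """Add-one smoothed P(word | prev); the smoothing choice is an assumption."""
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + len(self.dictionary))

    def suggest(self, word, n=3):
        """Non-word candidates: rank by edit distance, then by corpus frequency."""
        scored = sorted((levenshtein(word, w), -self.unigrams[w], w) for w in self.dictionary)
        return [w for dist, _, w in scored if dist <= self.max_distance][:n]

    def correct(self, sentence):
        out = []
        for tok in sentence.lower().split():
            prev = out[-1] if out else None  # context = already-corrected previous word
            if tok not in self.dictionary:
                # Non-word error: dictionary lookup failed, take the best-ranked suggestion.
                cands = self.suggest(tok)
                out.append(cands[0] if cands else tok)
            elif prev is not None and self.bigrams[(prev, tok)] == 0:
                # Possible real-word error: this bigram never occurred in training.
                # Candidates are the token itself plus dictionary words one edit away;
                # listing tok first means it is kept when probabilities tie.
                cands = [tok] + sorted(w for w in self.dictionary
                                       if w != tok and levenshtein(tok, w) <= 1)
                out.append(max(cands, key=lambda c: self.bigram_prob(prev, c)))
            else:
                out.append(tok)
        return " ".join(out)


if __name__ == "__main__":
    checker = HybridSpellChecker([
        "the cat sat on the mat",
        "the dog sat on the rug",
        "the cat ate the fish",
    ])
    print(checker.correct("the cst sat on the mat"))  # non-word "cst" -> "cat"
    print(checker.correct("the cat mat on the rug"))  # real-word "mat" -> "sat"
```

In the first call, "cst" fails dictionary lookup and "cat" is the closest entry (distance 1). In the second, "mat" is a valid word, but the bigram ("cat", "mat") never appears in the toy corpus, so the sketch compares one-edit alternatives and picks "sat", which has the highest smoothed probability after "cat". A real system would likely also score the following word and use a much larger corpus.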
Keywords
context-based spellchecker, Levenshtein distance, bigram model, real-word errors, non-word errors, N-gram methods, dictionary lookup
