CONTEXT-BASED SPELL CHECKER FOR SIDAAMU AFOO
Date
2022-03-04
Publisher
Hawassa University
Abstract
A spell checker is one of the applications of natural language processing that is used to detect and
correct spelling errors in written text. Spelling errors that occur in the written text can be non-word
errors or real-word errors. A non-word error is a misspelled word that is not found in the language
and has no meaning, whereas a real-word error is a valid word in the language that
does not fit the context of the sentence. We designed and implemented a spell checker for Sidaamu
Afoo that can detect and correct both non-word and real-word errors. Sidaamu Afoo is one of the
languages spoken in the Sidaama region in the south-central part of Ethiopia. It is an official working
language and is used as a medium of instruction in primary schools of the Sidaama national regional
state in Ethiopia. A spell checker is therefore needed to address spelling errors in Sidaamu Afoo
text.
In this study, a dictionary look-up approach with a hashing algorithm is used to detect non-word
errors, and a character-based encoder-decoder model is used to correct them. An
LSTM model with an attention mechanism and edit distance is used to detect and correct context-based
spelling errors. The experiment used 55,440 sentences, of which 90% (49,896) were used for
training and 10% (5,544) for testing. According to the experimental results, for
the isolated-word spell checker, dictionary look-up with hashing achieved an accuracy of 93.05%, a recall of
correct words of 91.51%, and a precision of incorrect words of 72.37% for detection. The encoder-decoder
model achieved a recall of 91.76% for correction. For the context-sensitive spell checker, the
LSTM model with attention and edit distance achieved an accuracy of 88.8%, a recall of correct
words of 86.18%, and a precision of incorrect words of 62.84% for detection. It achieved a recall
of 74.28% for correction. The results of the experiment show that the model used to detect and
correct both non-word and real-word spelling errors in written Sidaamu Afoo text performed well.
To improve performance further, we recommend using a larger dataset and a
state-of-the-art transformer model.
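The two simplest building blocks named in the abstract, dictionary look-up with hashing and edit distance, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the `max_distance` threshold, and the toy English lexicon are assumptions made for illustration, a Python `set` stands in for the hashing step, and edit-distance ranking is shown in isolation (the abstract combines it with an LSTM model with attention, and corrects non-word errors with an encoder-decoder model).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def find_non_words(tokens, lexicon):
    """Flag tokens missing from the lexicon (hash-based membership test)."""
    return [t for t in tokens if t.lower() not in lexicon]


def rank_candidates(word, lexicon, max_distance=2):
    """Return lexicon entries within max_distance edits, closest first."""
    scored = sorted((levenshtein(word.lower(), w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance]


# Hypothetical placeholder lexicon; a real system would load Sidaamu Afoo words.
LEXICON = {"school", "language", "text", "word"}

if __name__ == "__main__":
    flagged = find_non_words(["text", "wrod"], LEXICON)
    for token in flagged:
        print(token, rank_candidates(token, LEXICON))
```

A look-up of this kind can only flag non-word errors. A real-word error passes the membership test by definition, which is why the abstract relies on a context-aware model for that case.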
Keywords
Spell checker, isolated word spell checker, context-based spell checker, encoder-decoder model, LSTM, Sidaamu Afoo
