QUERY EXPANSION FOR AFAAN OROMO INFORMATION RETRIEVAL USING AUTOMATIC THESAURUS
Date
2021-03-05
Publisher
Hawassa University
Abstract
Recently, the amount of textual information written in the Afaan Oromo language has been increasing
rapidly, and so has the need to access it. However, users often find it difficult to retrieve
documents that satisfy their information needs, both because they struggle to formulate a good
query and because of terminological variation, or term mismatch, between the vocabulary of readers
and that of authors. Query expansion is an effective mechanism for reducing term mismatch and
improving the retrieval performance of IR systems. The idea behind query expansion is to
reformulate the user's original query by adding related terms. In this study, an automatic Afaan
Oromo thesaurus is constructed from manually collected documents. After text preprocessing is
performed on the document corpus, the preprocessed words are vectorized in a multidimensional
space using Word2Vec's skip-gram model, in which words that share similar contexts have similar
vector representations. The cosine similarity measure was then applied to construct the thesaurus.
A one-to-many association approach was employed to select expansion terms: the five terms with the
highest similarity score to the entire query were selected from the thesaurus and added to the
user's original query. The reformulated query was then used to retrieve more relevant documents.
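The term-selection step described above can be illustrated with a minimal sketch. It is not the
study's implementation: the function names, the toy three-dimensional vectors, and the choice to
score each candidate by its mean cosine similarity to all query terms are assumptions made for
illustration. In practice the vectors would come from a skip-gram model trained on the corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(query_terms, vectors, top_k=5):
    """Append the top_k terms most similar to the query as a whole.

    One-to-many association (assumed form): each candidate is scored
    against every query term, and the scores are averaged.
    """
    known = [t for t in query_terms if t in vectors]
    if not known:
        # No query term has a vector, so there is nothing to expand with.
        return list(query_terms)
    scores = {}
    for candidate, vec in vectors.items():
        if candidate in query_terms:
            continue  # do not re-add terms already in the query
        scores[candidate] = sum(cosine(vec, vectors[q]) for q in known) / len(known)
    best = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return list(query_terms) + best


# Hypothetical vectors invented for this example; they are not taken
# from the study's corpus or model.
toy_vectors = {
    "barumsa":   [0.90, 0.10, 0.00],
    "barataa":   [0.80, 0.20, 0.10],
    "barsiisaa": [0.85, 0.15, 0.05],
    "kitaaba":   [0.70, 0.30, 0.10],
    "bishaan":   [0.00, 0.10, 0.90],
    "nyaata":    [0.10, 0.20, 0.80],
}

expanded = expand_query(["barumsa"], toy_vectors, top_k=2)
print(expanded)
```

Scoring candidates against the whole query, rather than expanding each query term separately,
keeps terms that are related to only one query word from dominating the expansion.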
Experiments were performed to assess the quality of the constructed thesaurus and the effect
of integrating query expansion into the Afaan Oromo IR system. The results show that the
constructed thesaurus generates related terms with an average relatedness accuracy of 62.1%.
Integrating query expansion improved recall by 14.3% and F-measure by 2.9%, while precision
decreased by 5.5%.
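The reported trade-off, higher recall at the cost of lower precision, is typical of query
expansion. The sketch below shows how the three measures are computed for a single query. It
assumes the balanced F-measure (F1), which the abstract does not specify, and the document
identifiers are invented for illustration.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and balanced F-measure for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical relevance judgments and result sets.
relevant_docs = {"d1", "d2", "d3", "d4"}
baseline = precision_recall_f1({"d1", "d2"}, relevant_docs)
expanded = precision_recall_f1({"d1", "d2", "d3", "d5"}, relevant_docs)

print("baseline:", baseline)
print("expanded:", expanded)
```

In this toy case the expanded query retrieves one more relevant document and one irrelevant
one, so recall rises while precision falls, the same direction of change the experiments report.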
Keywords
Query expansion, information retrieval, thesaurus, Word2Vec, skip-gram
