QUERY EXPANSION FOR AFAAN OROMO INFORMATION RETRIEVAL USING AUTOMATIC THESAURUS

Date

2021-03-05

Publisher

Hawassa University

Abstract

The amount of textual information written in Afaan Oromo has been growing rapidly in recent years, and so has the need to access it. Retrieving relevant information remains difficult, however, for two reasons: users often cannot formulate a good query, and the vocabulary used by readers frequently differs from the vocabulary used by authors (term mismatch). Query expansion is an effective mechanism for reducing term mismatch and improving the retrieval performance of information retrieval (IR) systems. The idea is to reformulate the user's original query by adding related terms. In this study, an automatic Afaan Oromo thesaurus was constructed from manually collected documents. After text preprocessing, the words of the corpus were represented as vectors in a multidimensional space using Word2Vec's skip-gram model, in which words that share similar contexts have similar vector representations. The cosine similarity measure was then applied to construct the thesaurus. A one-to-many association approach was employed to select expansion terms: the five terms with the highest similarity score to the entire query were selected from the thesaurus and added to the user's original query. The reformulated query was then used to retrieve more relevant documents. Experiments were conducted to assess the quality of the constructed thesaurus and the effect of integrating query expansion into the Afaan Oromo IR system. The results show that the constructed thesaurus generates related terms with an average relatedness accuracy of 62.1%. Integrating query expansion improved recall by 14.3% and F-measure by 2.9%, while precision decreased by 5.5%.
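
The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation. It assumes that "similarity with the entire query" means the average cosine similarity between a candidate term and each query term, which is one possible reading of the one-to-many approach. The function names (cosine, expand_query), the toy two-dimensional vectors, and the placeholder terms are all hypothetical; in practice the vectors would come from a skip-gram model trained on the corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(query_terms, vectors, top_k=5):
    """Append the top_k terms most similar to the query as a whole.

    Assumption: a candidate's score is its average cosine similarity
    to every query term that has a vector (one-to-many association).
    """
    known = [t for t in query_terms if t in vectors]
    if not known:
        # No query term has a vector, so there is nothing to expand with.
        return list(query_terms)
    scores = {}
    for candidate, vec in vectors.items():
        if candidate in query_terms:
            continue  # never re-add a term already in the query
        total = sum(cosine(vec, vectors[t]) for t in known)
        scores[candidate] = total / len(known)
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return list(query_terms) + ranked[:top_k]


# Hypothetical toy vectors standing in for trained skip-gram embeddings.
TOY_VECTORS = {
    "q1": [1.0, 0.0],
    "q2": [0.0, 1.0],
    "near_both": [1.0, 1.0],
    "near_q1": [1.0, 0.1],
    "far": [-1.0, -1.0],
}

expanded = expand_query(["q1", "q2"], TOY_VECTORS, top_k=2)
print(expanded)
```

Scoring against the whole query, rather than expanding each query term separately, favors terms related to the query's overall topic. In the toy data, near_both ranks above near_q1 because it is moderately close to both query terms instead of very close to only one.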

Keywords

Query expansion, information retrieval, thesaurus, Word2Vec, skip-gram
