QUERY EXPANSION FOR SIDAMA LANGUAGE INFORMATION RETRIEVAL SYSTEM

AEMRO NOKOLA MASALE

Files

Aemro Thesis - Final Modified.pdf (1.62 MB)

Date

2024-10-14

Authors

AEMRO NOKOLA MASALE

Publisher

Hawassa University

Abstract

Information retrieval (IR) has become a vital research topic in the current computing era. It is the process of searching for and retrieving knowledge-based information from a database, and a wide range of users rely on some form of IR in their everyday activities. However, IR still faces several challenges, such as short user queries, the ambiguity of natural language, and vocabulary mismatch between query terms and relevant documents, all of which reduce the performance of IR systems. The goal of this study was to design and develop manual query expansion for Sidama language IR using the Vector Space Model, in order to improve the Sidama language IR system by minimizing the short-query and query-document mismatch problems. To attain this goal, we reviewed the IR literature to gain a better understanding of IR, search models, indexing, query expansion techniques, and the fundamental morphology and structure of the Sidama language. We designed the manual query expansion for Sidama IR in two subsections. The first subsection performs text preprocessing and indexing using the inverted file indexing technique. The second subsection performs comparison, query expansion, searching, and ranking according to the cosine similarity measure. The implementation was done in the Python programming language. To evaluate the implemented prototype system, we collected 500 documents from different sources as the document corpus, along with 20 initial queries. Query-document relevance judgements were made manually by domain experts, and documents were categorized according to query. Two experiments were conducted for each of the twenty queries. The first experiment searched with the initial user query (without query expansion). The second experiment searched with the expanded query (with query expansion). Performance was calculated using common evaluation measures such as precision, recall, and F-measure.
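The two-stage design described above (preprocessing and inverted-file indexing, then query expansion, searching, and cosine-similarity ranking) can be sketched roughly as follows. This is a minimal illustration and not the thesis implementation: the toy English documents, the whitespace tokenizer, the TF-IDF weighting with log(N/df), and all function names are assumptions made for the example, since the abstract does not specify them.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercasing and whitespace splitting stand in for the
    # Sidama-specific preprocessing, which the abstract does not detail.
    return text.lower().split()


def build_inverted_index(docs):
    # Inverted file: term -> {doc_id: term frequency}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def idf(index, n_docs, term):
    # Assumed weighting: log(N / df); 0.0 for terms absent from the corpus.
    df = len(index.get(term, {}))
    return math.log(n_docs / df) if df else 0.0


def doc_norms(index, n_docs):
    # Euclidean length of every document's TF-IDF vector.
    squares = defaultdict(float)
    for term, postings in index.items():
        weight = idf(index, n_docs, term)
        for doc_id, tf in postings.items():
            squares[doc_id] += (tf * weight) ** 2
    return {doc_id: math.sqrt(total) for doc_id, total in squares.items()}


def search(query_terms, index, norms, n_docs):
    # Rank documents by cosine similarity between query and document vectors.
    q_weights = {t: tf * idf(index, n_docs, t)
                 for t, tf in Counter(query_terms).items()}
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    if q_norm == 0:
        return []
    dots = defaultdict(float)
    for term, q_weight in q_weights.items():
        if q_weight == 0:
            continue
        term_idf = idf(index, n_docs, term)
        for doc_id, tf in index[term].items():
            dots[doc_id] += q_weight * tf * term_idf
    scores = {d: dot / (q_norm * norms[d])
              for d, dot in dots.items() if norms[d] > 0}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def expand_query(query, extra_terms):
    # Manual expansion: the extra terms are chosen by a person, not derived.
    return tokenize(query) + [t for e in extra_terms for t in tokenize(e)]


# Hypothetical toy corpus (English placeholders, not Sidama text).
docs = {
    "d1": "coffee farming in the highlands",
    "d2": "coffee export market prices",
    "d3": "bean harvest season in the highlands",
    "d4": "football league results",
}
n_docs = len(docs)
index = build_inverted_index(docs)
norms = doc_norms(index, n_docs)

initial = search(tokenize("coffee"), index, norms, n_docs)
expanded = search(expand_query("coffee", ["bean", "harvest"]),
                  index, norms, n_docs)
```

In this toy setting the expanded query also reaches the document that never mentions the original query term, which is the vocabulary-mismatch effect that query expansion is meant to address.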
In the first experiment, we recorded the results for each of the 20 initial queries and obtained an average precision of 67.86%, an average recall of 66.53%, and an average F-measure of 65.66%. The second experiment, performed using manual query expansion, obtained 75.73% average precision, 96.08% average recall, and 83.68% average F-measure. Comparing the average results of the two experiments, a significant improvement was recorded in the second experiment (manual query expansion-based searching): average precision, recall, and F-measure increased by 7.87, 29.55, and 18.02 percentage points, respectively, over the first experiment. The greatest improvement was seen in recall, which indicates that almost all relevant documents in the corpus were successfully retrieved when searching with query expansion. We conclude that the proposed manual query expansion-based searching yields a greater improvement than searching without query expansion. However, the lack of a rule-based stemming algorithm was the main issue that diminished performance, and further study is needed in the future.
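The evaluation measures named above can be illustrated with a short sketch. It assumes set-based precision and recall per query, the balanced F-measure (F1), and per-query averaging. The abstract does not state which F-measure variant or averaging scheme was used, so these are assumptions, as are the function names and the example document IDs. The final lines only recompute the gains from the averages reported in the abstract.

```python
def precision_recall_f1(retrieved, relevant):
    # Set-based measures for a single query.
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_average(per_query):
    # Mean of each measure over all queries (assumed averaging scheme).
    n = len(per_query)
    return tuple(sum(values) / n for values in zip(*per_query))


# Hypothetical judgements for one query: 3 retrieved, 3 relevant, 2 in common.
p, r, f = precision_recall_f1(["d1", "d2", "d4"], ["d1", "d2", "d3"])

# Averages reported in the abstract: (without expansion, with expansion), in %.
reported = {
    "precision": (67.86, 75.73),
    "recall": (66.53, 96.08),
    "f_measure": (65.66, 83.68),
}
# Gains in percentage points, rounded to two decimals.
gains = {name: round(after - before, 2)
         for name, (before, after) in reported.items()}
```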

Keywords

Information Retrieval (IR), Vector Space Model (VSM), Indexing, Query expansion, Searching

URI

https://etd.hu.edu.et/handle/123456789/566

Collections

Computer Science
