QUERY EXPANSION FOR SIDAMA LANGUAGE INFORMATION RETRIEVAL SYSTEM
Date
2024-10-14
Publisher
Hawassa University
Abstract
Information retrieval (IR) has become a vital research topic in the current computing era. It is
the process of searching for and retrieving knowledge-based information from a database. A wide
range of users rely on some form of IR in their everyday activities. However, IR still faces
several challenges, such as short user queries, the ambiguity of natural language, and the
vocabulary mismatch between query terms and relevant documents, all of which reduce the
effectiveness of IR systems. The goal of this study was to design and develop manual query expansion
for Sidama language IR using the Vector Space Model, in order to improve the Sidama language IR
system by reducing the short-query and query-document mismatch problems. To that end, we reviewed
the IR literature to gain a better understanding of IR, search models, indexing, query expansion
techniques, and the fundamentals of Sidama morphology and language structure. We designed the
manual query expansion for Sidama IR in two subsections. The first subsection performs text
preprocessing and indexing using the inverted file indexing technique. The second subsection
performs comparison, query expansion, searching, and ranking by cosine similarity. The system was
implemented in the Python programming language. To evaluate the implemented prototype, we
collected 500 documents from different sources as the document corpus, along with 20 initial
queries. Query-document relevance judgements were made manually by domain experts, and the
documents were categorized by query. Two experiments were run for each of the twenty queries. The
first searched with the initial user query (without query expansion). The second searched with
the expanded query (with query expansion). Performance was measured using common evaluation measures
such as precision, recall, and F-measure. In the first experiment, we recorded the results for
each of the 20 initial queries and obtained an average precision of 67.86%, an average recall of
66.53%, and an average F-measure of 65.66%. The second experiment, performed with manual query
expansion, yielded 75.73% average precision, 96.08% average recall, and 83.68% average F-measure.
Comparing the averages of the two experiments, the second (manual query expansion-based
searching) shows a marked improvement: average precision, recall, and F-measure increased by
7.87, 29.55, and 18.02 percentage points, respectively, over the first. The greatest improvement has
been seen in recall, which indicates that almost all relevant documents in the corpus were
retrieved when searching with query expansion. We conclude that the proposed manual query
expansion-based searching performs better than searching without query expansion. However, the
lack of a rule-based stemming algorithm was the main factor limiting performance, and it requires
further study in future work.
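
The pipeline described in the abstract (inverted file indexing, cosine-similarity ranking in the Vector Space Model, manual query expansion, and precision/recall/F-measure evaluation) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the smoothed TF-IDF weighting, the whitespace tokenizer, and the English toy corpus are all assumptions made for illustration, and the Sidama-specific preprocessing is not modeled.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Placeholder preprocessing (assumption): lowercase and split on whitespace.
    return text.lower().split()

def build_index(docs):
    # Inverted file: term -> {doc_id: term frequency}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def idf(index, term, n_docs):
    # Smoothed inverse document frequency (one common variant, assumed here).
    return math.log(1 + n_docs / len(index[term])) if term in index else 0.0

def doc_norms(index, n_docs):
    # Euclidean length of each document's TF-IDF vector.
    sums = defaultdict(float)
    for term, postings in index.items():
        w = idf(index, term, n_docs)
        for doc_id, tf in postings.items():
            sums[doc_id] += (tf * w) ** 2
    return {d: math.sqrt(s) for d, s in sums.items()}

def search(query_terms, index, n_docs):
    # Rank documents by cosine similarity between query and document vectors.
    norms = doc_norms(index, n_docs)
    q = {t: tf * idf(index, t, n_docs)
         for t, tf in Counter(query_terms).items() if t in index}
    q_norm = math.sqrt(sum(w * w for w in q.values()))
    dots = defaultdict(float)
    for term, q_w in q.items():
        for doc_id, tf in index[term].items():
            dots[doc_id] += q_w * tf * idf(index, term, n_docs)
    scores = {d: s / (q_norm * norms[d]) for d, s in dots.items()}
    return sorted(scores, key=scores.get, reverse=True)

def expand(query_terms, chosen_terms):
    # Manual expansion: append related terms picked by the user.
    return query_terms + [t for t in chosen_terms if t not in query_terms]

def evaluate(retrieved, relevant):
    # Set-based precision, recall, and F-measure (harmonic mean of the two).
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical toy corpus and relevance judgements, for illustration only.
docs = {
    "d1": "coffee farming in the highlands",
    "d2": "coffee export market prices",
    "d3": "bean harvest season for growers",
    "d4": "football league results",
}
relevant = {"d1", "d2", "d3"}
index = build_index(docs)

initial_query = ["coffee"]
expanded_query = expand(initial_query, ["bean"])

initial_scores = evaluate(search(initial_query, index, len(docs)), relevant)
expanded_scores = evaluate(search(expanded_query, index, len(docs)), relevant)
print("initial :", initial_scores)
print("expanded:", expanded_scores)
```

In this toy setup the expanded query also reaches the document that shares no term with the initial query, which is the vocabulary-mismatch effect the abstract describes; the size of the gain on real data depends on the corpus and on the expansion terms chosen.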
Keywords
Information Retrieval (IR), Vector Space Model (VSM), Indexing, Query Expansion, Searching
