Computer Science

Permanent URI for this collection: https://etd.hu.edu.et/handle/123456789/76


Search Results

Now showing 1 - 10 of 35
  • Item
    DETECTION AND CLASSIFICATION OF INDIGENOUS ROCK MINERALS USING DEEP LEARNING TECHNIQUES
    (Hawassa University, 2023-03-08) HADIA BERKA WAKO
    Ethiopia is undoubtedly a place of riches, with a vast and diverse landmass that is rich in resources. However, little attention has been given to applying computing disciplines such as Artificial Intelligence to current problems in mineral mining in Ethiopia. Guji Zone is one of Oromia's 20 administrative zones and is blessed with different mineral resources. Although minerals make a lion's share contribution to Ethiopia's economy, little work has been done to modernize the country's mining industry, especially in empowering the small-scale artisanal community. Guji is one of the zones still following outmoded techniques to identify minerals in the mining industry. Rock mineral detection and classification with conventional methods involves testing physical and chemical properties at both the micro- and macro-scale in the laboratory, which is expensive and time-consuming. Identifying tiny rock minerals and verifying their authenticity with traditional procedures and techniques takes too much time, and identification merely through visual observation is often erroneous. To address these problems, a deep learning approach for the classification and detection of rock minerals is proposed. The design-science research methodology is followed to achieve the objectives of the research. To conduct this study, 2,000 images were collected from the Guji zone and the Mindat.org website. After collecting the images, image pre-processing techniques such as image resizing, image segmentation using Roboflow, and image annotation were performed. Moreover, data augmentation was applied to balance the dataset and increase the number of images. This research work focuses on classifying and detecting fifteen types of rock minerals. Based on the YOLOv7 deep learning model, 70% of the dataset was used to train the model and 30% to test its performance. Finally, the developed model was evaluated against other models using accuracy, precision, recall, and mAP. Experimental results show that YOLOv7 obtains 76% mAP for large objects compared to the other models, and the pretrained YOLOv7 weights achieved 97.3% accuracy in classifying and detecting minerals in other images.
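    The 70/30 split described above can be reproduced in a few lines of Python; this is a minimal sketch, assuming the annotated images sit in a hypothetical dataset/images folder (YOLOv7 also needs matching label files, which are omitted here):
        import random
        import shutil
        from pathlib import Path

        random.seed(42)
        images = sorted(Path("dataset/images").glob("*.jpg"))   # hypothetical folder layout
        random.shuffle(images)
        split = int(0.7 * len(images))                           # 70% train / 30% test, as described above
        for subset, files in (("train", images[:split]), ("test", images[split:])):
            out_dir = Path("dataset") / subset / "images"
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy(f, out_dir / f.name)                 # matching YOLO label files would be copied the same way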
  • Item
    EXPLORING A BETTER FEATURE EXTRACTION METHOD FOR AMHARIC HATE SPEECH DETECTION
    (Hawassa University, 2021-10-08) YESUF MOHAMED YIMAM
    Hate speech is speech that causes people to be attacked, discriminated against, and hated because of their personal and collective identities. When hate speech grows, it causes the death of people and their displacement from their homes and properties. Social media has the ability to spread hate speech widely. To solve this problem, various researchers have studied many ways to detect social media hate speech spreading in international and local languages. Because the problem is so serious, it needs to be carefully studied and addressed with a variety of solutions. Previous studies detect a speech as hate speech based on the frequency (occurrence) of a word in a given dataset, which means they do not consider the role of each word in a given sentence. The main purpose of this study is to design a method that can generate hate speech features from a given text by identifying the role of a word in a given sentence, so that hate speech can more easily be distinguished from other forms of speech. To do this, various works related to this study have been reviewed. This study created a new feature extraction method for Amharic hate speech detection. The created model needs training and testing data, so posts and comments published on 25 popular Facebook pages were collected to build the dataset. Whether a speech is hateful or not should be determined by the law that prohibits hate speech; therefore, using different filtration methods, texts containing religious, ethnic, and hate words were collected and given to law experts to annotate manually. The law experts labeled 2,590 instances into three classes: Religion-hate, Ethnic-hate, and Non-hate. After dataset preparation, a new feature extraction method that can distinguish hate speech from other speech was developed. The new feature extraction method and the feature extraction methods used in other related studies were implemented and compared with three machine learning classification algorithms: SVM, NB, and RF. Results across different evaluation metrics show that the new feature extraction method performed better in all combinations of classification algorithms. By using 80% of the 2,590 labeled instances as a training set and the rest as a test set, 96.2% average accuracy is achieved using the combination of SVM with the new feature extraction method.
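    As a point of reference for the frequency-based baselines mentioned above, a TF-IDF bag-of-words pipeline with a linear SVM can be sketched as follows; the thesis's own feature extraction method is not reproduced here, and load_amharic_posts is a hypothetical loader for the 2,590 labeled posts:
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics import accuracy_score
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        texts, labels = load_amharic_posts()            # hypothetical loader for the labeled posts
        X_train, X_test, y_train, y_test = train_test_split(
            texts, labels, test_size=0.2, stratify=labels, random_state=0)

        # Frequency-based baseline: TF-IDF term weights fed into a linear SVM.
        baseline = make_pipeline(TfidfVectorizer(), LinearSVC())
        baseline.fit(X_train, y_train)
        print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))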
  • Item
    A MODEL TOWARDS PRICE PREDICTION FOR COMMODITIES USING DEEP LEARNING: CASE OF ETHIOPIAN COMMODITY EXCHANGE
    (Hawassa University, 2022-10-03) SOLEN GOBENA
    The development of information technology makes it possible to collect and store large amounts of data every second. Market enterprises are generating large amounts of data, and it is difficult to use traditional data analysis methods to analyze and predict their future market prices. Price predictions are an integral component of trade and policy analysis. The prices of agricultural commodities directly influence the real income of farmers and also affect national foreign currency earnings. Haricot bean is produced in many areas of Ethiopia; it is rich in starch, protein, and dietary fiber, and is an excellent source of minerals and vitamins. Haricot bean has also been a main agricultural commodity traded on the Ethiopian Commodity Exchange (ECX) market for the past 10 years. Although there are price prediction works for various crops in Ethiopia and abroad using machine learning and deep learning approaches, to the best of our knowledge price prediction for haricot bean has not been studied using machine learning. The main objective of this study is to develop a price prediction model that can predict future prices of haricot bean traded on the ECX market based on time series data. Ten years of data were obtained from the ECX, with a sample dataset size of 12,272. Simple linear regression (SLR), multiple linear regression (MLR), and long short-term memory (LSTM) were evaluated as predictive models. The results showed that LSTM outperformed the other predictive models on all measures of model performance for predicting haricot bean prices, achieving a coefficient of determination (R²) of 0.97, a mean absolute percentage error (MAPE) of 0.015, and a mean absolute error (MAE) of 0.032.
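    A minimal sketch of an LSTM set up for this kind of univariate price series is shown below; the file name, window length, and layer sizes are illustrative assumptions, not the thesis's actual configuration:
        import numpy as np
        import tensorflow as tf

        def make_windows(prices, lookback=30):
            # Turn the daily price series into (samples, lookback, 1) windows with next-day targets.
            X = np.stack([prices[i:i + lookback] for i in range(len(prices) - lookback)])
            y = prices[lookback:]
            return X[..., np.newaxis], y

        prices = np.loadtxt("ecx_haricot_prices.csv")    # hypothetical file, one closing price per line
        X, y = make_windows(prices)
        split = int(0.8 * len(X))

        model = tf.keras.Sequential([
            tf.keras.layers.LSTM(64, input_shape=(X.shape[1], 1)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mae",
                      metrics=[tf.keras.metrics.MeanAbsolutePercentageError()])
        model.fit(X[:split], y[:split], epochs=20, validation_data=(X[split:], y[split:]))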
  • Item
    PARTS OF SPEECH TAGGER FOR SIDAMA LANGUAGE USING THE HIDDEN MARKOV MODEL WITH VITERBI ALGORITHM
    (Hawassa University, 2022-04-07) BELACHEW KEBEDE ESHETU
    A Parts of Speech (POS) tagger is an essential low-level tool in many natural language processing (NLP) applications. POS tagging is the process of assigning to each word a part-of-speech tag that describes how it is used in a sentence. There are different approaches to POS tagging; the most common are rule-based, stochastic, and hybrid POS tagging. In this paper, the stochastic approach, particularly the Hidden Markov Model (HMM) approach with the Viterbi algorithm, was applied to develop a part-of-speech tagger for Sidaama. The HMM POS tagger tags the words of a sentence based on the most probable sequence of tags. For training and testing the model, 9,660 Sidaama sentences containing 130,847 tokens (words, punctuation, and symbols) were collected, and 4 experts in the language undertook the POS annotation. Thirty-one (31) POS tags were used in the annotation. The sources of the corpus are fables, news, reading passages, and some scripts from the Bible. 90% of the corpus is used for training and the remaining 10% for testing. The POS tagger was implemented using the Python programming language (Python 3.7.0) and the Natural Language Toolkit (NLTK 3.0.0). The performance of the Sidaama POS tagger was tested and validated using a ten-fold cross-validation technique. In the performance analysis experiment, the HMM model achieved an accuracy of 91.25%, and 98.46% with the Viterbi algorithm.
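    Training a supervised HMM tagger of this kind takes only a few lines with NLTK; a minimal sketch, assuming a hypothetical loader that returns the hand-annotated Sidaama sentences as lists of (word, tag) pairs:
        from nltk.tag import hmm

        # Hypothetical loader: each sentence is a list of (word, tag) pairs
        # from the hand-annotated Sidaama corpus.
        tagged_sents = load_sidaama_tagged_corpus()
        cut = int(0.9 * len(tagged_sents))                          # 90% train / 10% test split

        trainer = hmm.HiddenMarkovModelTrainer()
        tagger = trainer.train_supervised(tagged_sents[:cut])       # Viterbi decoding is used at tagging time
        print("accuracy:", tagger.accuracy(tagged_sents[cut:]))     # evaluate() in older NLTK releases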
  • Item
    QUERY EXPANSION FOR AFAAN OROMO INFORMATION RETRIEVAL USING AUTOMATIC THESAURUS
    (Hawassa University, 2021-03-05) SAMUEL MESFIN BAYU
    Recently, the amount of textual information written in the Afaan Oromo language has been increasing dynamically, and the need to access that information increases likewise. However, it is difficult to retrieve and satisfy one's own information need because of the inability of users to formulate a good query and the terminological variation, or term mismatch, between readers and authors. Query expansion is an effective mechanism to reduce term mismatch problems and to improve the retrieval performance of IR systems. The idea behind query expansion is to reformulate the user's original query by adding related terms. In this study, an automatic Afaan Oromo thesaurus is constructed from manually collected documents. After the text preprocessing tasks are performed on the document corpus, the preprocessed words are vectorized in a multidimensional space using Word2Vec's skip-gram model, in which words that share similar contexts have similar vector representations. A cosine similarity measure was then applied to construct the thesaurus. A one-to-many association approach was employed to select expansion terms: the top five terms with the highest similarity score to the entire query were selected from the thesaurus and added to the user's original query. The reformulated query was then used to retrieve more relevant documents. Experiments were performed to observe the quality of the constructed thesaurus and the effect of integrating query expansion into the Afaan Oromo IR system. The results show that the constructed thesaurus generates related terms with an average relatedness accuracy of 62.1%. On the other hand, the integration of query expansion registered a performance improvement of 14.3% in recall and 2.9% in F-measure, with a 5.5% decrease in precision.
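    The skip-gram training and the top-five expansion step can be sketched with gensim as follows; the corpus loader, the model dimensions, and the sample query term are assumptions for illustration:
        from gensim.models import Word2Vec

        sentences = load_preprocessed_corpus()      # hypothetical loader: tokenised Afaan Oromo documents
        model = Word2Vec(sentences, vector_size=100, window=5, sg=1, min_count=2)   # sg=1 selects skip-gram

        def expand(query_terms, k=5):
            # Add the k thesaurus terms with the highest cosine similarity to the whole query.
            related = model.wv.most_similar(positive=query_terms, topn=k)
            return query_terms + [term for term, _ in related]

        print(expand(["barnoota"]))                 # illustrative query term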
  • Item
    Predictions of the Status of Undernutrition for Children below Five Using Ensemble Methods
    (Hawassa University, 2023-08-02) Natnael Abate Choreno
    Undernutrition is one of the main causes of morbidity and mortality in children under five in most developing countries, including Ethiopia. It increases the risk of infectious diseases, impairs cognitive and physical development, reduces school performance and productivity, and perpetuates intergenerational cycles of poverty and malnutrition. The primary goal of this thesis is to build an ensemble model that predicts the undernutrition status of children under five using data from the 2019 EMDHS. The experiments covered 15,082 instances and 20 attributes. Ensemble methods combine several models to deliver better results; typically, results from an ensemble approach are more accurate than those from a single model. The selected method consists of preprocessing, feature selection, k-fold cross-validation, model building, an ensemble classifier, and final prediction steps. In this work, machine learning classification models such as the Decision Tree, Support Vector Machine, K-Nearest Neighbors, and Naive Bayes classifiers were used as base algorithms, with accuracy rates of 92%, 94%, 92%, and 75%, respectively. The final result was combined by the stacking ensemble method with logistic regression. The most accurate predictive model, with a 96% accuracy rate, was created using the stacking ensemble method. HAZ, WAZ, WHZ, age in 5-year groups, region, source of drinking water, education level, type of toilet facility, wealth index, total children born, number of antenatal visits, vaccination, breastfeeding duration, whether nutritious food was ever given, and whether plain water was given are the major features that contribute to undernutrition in children under five. The findings of this study provide encouraging evidence that the ensemble method could support the development of a predictive model for the nutritional status of children under five in Ethiopia. Future research could produce better results by combining large datasets from clinical and hospital sources, and may also include children over the age of five and children with obesity as a malnutrition status.
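    The stacking arrangement described above maps directly onto scikit-learn; a minimal sketch, with load_emdhs_features standing in as a hypothetical loader for the preprocessed EMDHS records:
        from sklearn.ensemble import StackingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        X, y = load_emdhs_features()        # hypothetical loader for the preprocessed EMDHS records

        stack = StackingClassifier(
            estimators=[("dt", DecisionTreeClassifier()),
                        ("svm", SVC()),
                        ("knn", KNeighborsClassifier()),
                        ("nb", GaussianNB())],
            final_estimator=LogisticRegression(max_iter=1000),   # meta-learner combining base predictions
        )
        print("cv accuracy:", cross_val_score(stack, X, y, cv=10).mean())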
  • Item
    Improving delay tolerant network buffer management approach for rural areas’ health professionals’ information exchange system
    (Hawassa University, 2022-08-06) Mulusew Abebe
    Delay-tolerant networks (DTNs) are mobile wireless networks designed to provide end-to-end connectivity in areas where networks are unreliable and often susceptible to interference. Despite the rapid advancement of communication technology, there are still rural places that are not connected to the Internet. Health information exchange between rural and urban areas is still hampered by inadequate telecommunication infrastructure coverage, intermittent connectivity, and the absence of end-to-end connectivity. The Delay Tolerant Network (DTN) concept was introduced to bridge communication gaps in areas that have not been connected to the Internet. With current TCP/IP technology, communication is possible only when an end-to-end path is available; as a result, the usual Internet and TCP/IP networking cannot work in hard environments characterized by a lack of direct paths between nodes, frequent power outages, and intermittent connectivity. In this work, the researcher investigated the performance of various delay-tolerant network routing protocols and selected MaxProp as suitable for the proposed framework. Most DTN routing algorithms assume that node buffer space is unlimited, but this is not the case in reality. As flooding-based routing relies on buffering a copy of every message at every node, buffer space has a substantial impact on delivery probability. Existing buffer management policies compute priorities in a biased way, directed by a single parameter while other relevant parameters are completely neglected, resulting in an inability to make a reasonable selection. Therefore, the researcher proposed a buffer management approach for situations with short contact durations, limited bandwidth, and limited buffer space. The proposed approach improves buffer availability by implementing three buffer management strategies (scheduling, dropping, and clearing the buffer entirely for computing purposes), using three parameters: message type, hop count, and time to live. The performance of the proposed approach is validated through simulation using the Opportunistic Network Environment (ONE) simulator, and analyzed on three metrics, namely delivery probability, average latency, and overhead ratio. The simulation results collected in this thesis show that when node buffers become constrained, the proposed method, MaxProp Routing based on Message Type Priority (MPRMTP), performs better than the existing buffer management policy by increasing message delivery and decreasing the overhead ratio. However, when there is sufficient buffer space, MaxProp and MPRMTP show comparable performance.
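    The ONE simulator itself is written in Java, but the three-parameter drop policy can be illustrated with a small Python sketch; the priority encoding below (message type first, then hop count, then remaining time to live) is an assumed ordering for illustration, not the exact MPRMTP rule:
        from dataclasses import dataclass

        @dataclass
        class Message:
            size: int
            msg_type: int      # assumed encoding: 0 = routine traffic, 1 = urgent health report
            hop_count: int
            ttl: float         # remaining time to live, in seconds

        def drop_order(msg):
            # Messages earlier in this ordering are dropped first: routine before urgent,
            # then those that have already travelled many hops, then those closest to expiry.
            return (msg.msg_type, -msg.hop_count, msg.ttl)

        def make_room(buffer, incoming, capacity):
            # Drop lowest-priority messages until the incoming message fits, then store it.
            buffer.sort(key=drop_order)
            while buffer and sum(m.size for m in buffer) + incoming.size > capacity:
                buffer.pop(0)
            buffer.append(incoming)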
  • Item
    CONTEXT-BASED SPELL CHECKER FOR SIDAAMU AFOO
    (Hawassa University, 2022-03-04) MEKONEN MOKE WAGARA
    A spell checker is one of the applications of natural language processing used to detect and correct spelling errors in written text. Spelling errors in written text can be non-word errors or real-word errors. A non-word error is a misspelled word that is not found in the language and has no meaning, whereas a real-word error is a valid word in the language that does not fit contextually in the sentence. We designed and implemented a spell checker for Sidaamu Afoo that can detect and correct both non-word and real-word errors. Sidaamu Afoo is one of the languages spoken in the Sidaama region in the south-central part of Ethiopia; it is an official working language and is used as a medium of instruction in primary schools of the Sidaama national regional state. To address the issue of spelling errors in Sidaamu Afoo text, a spell checker is required. In this study, a dictionary look-up approach with a hashing algorithm is used to detect non-word errors, and a character-based encoder-decoder model is used to correct them. An LSTM model with an attention mechanism and edit distance is used to detect and correct context-based (real-word) spelling errors. To conduct the experiment, 55,440 sentences were used, of which 90% (49,896) were for training and 10% (5,544) for testing. According to the experimental results, for the isolated-word spell checker, dictionary lookup with hashing achieved an accuracy of 93.05%, a recall of correct words of 91.51%, and a precision of incorrect words of 72.37% for detection, and the encoder-decoder model achieved a recall of 91.76% for correction. For the context-sensitive spell checker, the LSTM model with attention and edit distance achieved an accuracy of 88.8%, a recall of correct words of 86.18%, and a precision of incorrect words of 62.84% for detection, and a recall of 74.28% for correction. The results of the experiment show that the model used to detect and correct both non-word and real-word spelling errors in written Sidaamu Afoo text performed well. Finally, to improve the performance of the model, we recommend using an additional dataset and a state-of-the-art transformer model.
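    The non-word detection step (dictionary look-up backed by hashing) can be sketched in a few lines of Python, where the built-in set type provides the hashed lookup; the word-list file name and the sample sentence are placeholders:
        def load_dictionary(path="sidaamu_words.txt"):
            # Hypothetical word list, one valid Sidaamu Afoo word per line.
            with open(path, encoding="utf-8") as f:
                return {line.strip().lower() for line in f}    # a Python set gives hashed O(1) lookup

        def non_word_errors(sentence, dictionary):
            # Any token missing from the dictionary is flagged as a candidate non-word error.
            return [w for w in sentence.lower().split() if w not in dictionary]

        dictionary = load_dictionary()
        print(non_word_errors("replace this with a Sidaamu Afoo sentence", dictionary))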
  • Item
    MORPHOLOGICAL ANALYZER AND GENERATOR FOR KAMBAATISSA USING FINITE STATE TRANSDUCER
    (Hawassa University, 2023-08-03) LIDIYA TADESSE GETISO
    Kambaatissa is a Highland East Cushitic language spoken in the Kambaata Xambaaro zone of the Southern Nations, Nationalities, and Peoples' Regional State, Ethiopia. It is a strictly suffixing and morphologically rich language, and it is one of the under-resourced languages in Ethiopia. For languages with complex morphology, nearly all computational work depends on the presence of tools for morphological processing. Morphological analysis has been studied extensively for many languages, while this is the first work on natural language processing applications for Kambaatissa. This study focuses on a morphological analyzer and generator, a lower-level natural language processing application that serves as a base for many higher-level NLP applications. A finite state transducer is the framework used for modeling the morphology. In this study, Foma is used as the implementation toolkit, with the lexc formalism for designing the lexicon. The experiment is done using 860 root verbs; there are seventeen continuation classes in the lexicon and forty different rules in the foma file. Results from the experiment show that 92,020 words are generated, of which 95.2% are correct Kambaatissa verbs.
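    Once a transducer has been compiled with Foma, it can be queried from Python through foma's flookup utility; a minimal sketch, assuming the compiled transducer is saved under the hypothetical name kambaatissa.fst:
        import subprocess

        def analyze(word, fst="kambaatissa.fst"):
            # Pass the word to foma's flookup against the compiled transducer (hypothetical file name);
            # flookup prints "input<TAB>analysis" lines, with "+?" as the analysis when the word is unknown.
            result = subprocess.run(["flookup", fst], input=word + "\n",
                                    capture_output=True, text=True, check=True)
            return [line.split("\t")[1] for line in result.stdout.splitlines() if "\t" in line]

        print(analyze("replace-with-a-kambaatissa-verb-form"))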
  • Item
    Bi-Directional Sidaamu Afoo - Amharic Statistical Machine Translation
    (Hawassa University, 2023-04-06) Kebebush Kamiso
    Machine translation (MT) is the area of Natural Language Processing (NLP) that focuses on obtaining a target language text from a source language text using automatic techniques. It is a multidisciplinary field and the challenge has been approached from various points of view including linguistics and statistics. MT usually involves one or more approaches. Our preference for this study is to develop the bi directional Sidaamu Afoo - Amharic machine translation system, make use of a statistical machine translation (SMT) approach. To conduct the experiment, a parallel corpus was collected from all possible available sources. These include mostly the Old and New Testaments of the Holy Bible for both languages. We used the monolingual Contemporary Amharic Corpus and the Sidama Afoo corpus compiled by a research team in the Informatics Faculty of Hawassa University. Different preprocessing tasks such as tokenization, cleaning, and normalization have been done to make the corpus suitable for the system. To accomplish the objective of this thesis work, we conducted four experiments using word and morpheme-based translation units with SMT for Sidaamu Afoo - Amharic language pairs. The first two experiments focus on word-based SMT and the next two on morpheme-based translation using unsupervised morphological segmentation tool; Morfessor. For each experiment, we used 30,100 parallel sentences. Out of the total parallel sentences, we used 80% (24,100) of randomly selected parallel sentences for training, 10% (3,000) for tuning and another 10% (3,000) for testing. The basic tools used for accomplishing the machine translation are Moses for the translation process which is MGIZA ++ for word and morpheme alignment and KenLM for language modeling; Morfessor for morphological segmentation. For evaluation SacreBLEU package which are BLEU, ChrF and TER metrics. According to the experimental findings, the differences between Amharic to Sidaamu Afoo and Sidaamu Afoo to Amharic in the Word-based alignment translation were 6.2, 16, and 1.9 for BLUE, ChrF2, and TER, respectively. In the Morpheme-based alignment, the differences between Amharic to Sidaamu Afoo and Sidaamu Afoo to Amharic translation were 7.5, 20.4, and 5.1, for BLUE, ChrF2, and TER respectively. In conclusion, the results show that morpheme-based alignment performance is better than word based alignment, for Amharic to Sidaamu Afoo than Sidaamu Afoo to Amharic