Computer Science

Permanent URI for this collection: https://etd.hu.edu.et/handle/123456789/76

Search Results

Now showing 1 - 10 of 44
  • Item
    ENSET DISEASE DETECTION AND CLASSIFICATION USING DEEP LEARNING TECHNIQUES
    (Hawassa University, 2024-12-10) ENDASHAW NIGUSE ASTATEKE
    Ethiopians, especially those in Sidama and Central Ethiopia, are the main consumers of enset, which is thought to be a staple food source for 20 million people in Ethiopia. For many Ethiopians, the plant's root and stems are a major source of energy due to their high fiber and carbohydrate content. Typically, stems are harvested, cleaned, and fermented to produce kocho and bulla, bread-like food items. This research investigates the application of deep learning techniques, specifically Convolutional Neural Networks (CNNs), to the detection and classification of diseases affecting enset leaves and stems. By employing sophisticated image processing tools and methodologies, the study aims to improve the accuracy and efficiency of disease identification in enset plants. Experimental findings underscore the effectiveness of the proposed CNN models, which achieve notable accuracy in disease detection and showcase the potential of deep learning to transform agricultural practices. The study emphasizes the importance of advanced image processing in agricultural contexts and underscores the need for further research in crop disease detection to enhance agricultural sustainability and productivity. We collected a 5,000-image dataset from the Central Ethiopia Region (Wonago and Dilla) and the Sidama Region (Hawassa Zuria, Boricha, Yirgalem, and Aletawondo). Each class contains 1,000 images, from which 700 training, 200 validation, and 100 testing images were chosen. We used the pre-trained models MobileNetV3Small and EfficientNetB7 to compare results with the newly developed model. The disease classes are Bacterial wilt, Mosaic Virus, Bacterial leaf spot, Insect pest, and Healthy Leaves. The selected hyperparameters are the Nadam optimizer, a batch size of 32, 65 epochs, and a learning rate of 0.001.
The model stabilized at epoch 65 with an accuracy of 99.30%. The pre-trained models, EfficientNetB7 and MobileNetV3Small, achieved accuracies of 95.32% and 93.08%, respectively. The developed model identifies and classifies diseases in enset leaves with a high degree of accuracy.
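The per-class split described above (1,000 images per class, divided 700/200/100 into training, validation, and test sets) can be sketched as follows; the file names are hypothetical, not part of the thesis's dataset.

```python
import random

def split_class_images(image_paths, n_train=700, n_val=200, n_test=100, seed=42):
    """Shuffle one class's images and split them into train/val/test subsets."""
    assert len(image_paths) == n_train + n_val + n_test
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # fixed seed for a reproducible split
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]
    return train, val, test

# Hypothetical file names standing in for one disease class (e.g. Bacterial wilt).
images = [f"bacterial_wilt_{i:04d}.jpg" for i in range(1000)]
train, val, test = split_class_images(images)
print(len(train), len(val), len(test))  # 700 200 100
```

Repeating this per class yields the 3,500/1,000/500 overall split implied by the abstract.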
  • Item
    QUERY EXPANSION FOR SIDAMA LANGUAGE INFORMATION RETRIEVAL SYSTEM
    (Hawassa University, 2024-10-14) AEMRO NOKOLA MASALE
    Information retrieval (IR) has become a vital research topic in this computing era. Information retrieval is the process of searching for and retrieving knowledge-based information from a database. A wide range of users rely on IR systems in their everyday activities. However, IR still faces challenges such as short user query expressions, the ambiguous nature of natural language, and the vocabulary mismatch between query terms and relevant documents, all of which reduce the efficiency of IR systems. The goal of this study was to design and develop manual query expansion for a Sidama language IR system using the Vector Space Model, improving Sidama language IR by minimizing short-query and query-document mismatch problems. To attain this goal, we studied several IR-related works to gain a better understanding of IR, search models, indexing, query expansion techniques, and fundamental Sidama language morphology and structure. We designed the manual query expansion for Sidama IR in two subsections. The first performs text preprocessing and indexing using the inverted file indexing technique. The second performs comparison, query expansion, searching, and ranking according to cosine similarity. The implementation was done in the Python programming language. To evaluate the implemented prototype system, we collected 500 documents from different sources as a document corpus, together with 20 initial queries. Query-document relevance judgement was done manually by domain experts, and documents were categorized by query. Two experiments were conducted for each of the twenty queries. The first searched with the initial user query (without query expansion); the second searched with the expanded query (with query expansion). Performance was measured using the common efficiency metrics precision, recall, and F-measure.
In the first experiment, we recorded the results for each of the 20 initial queries and obtained an average precision of 67.86%, average recall of 66.53%, and average F-measure of 65.66%. The second experiment, performed using manual query expansion, obtained 75.73% average precision, 96.08% average recall, and 83.68% average F-measure. Comparing the average results of the two experiments, the second (manual query expansion-based searching) shows a significant improvement: average precision, recall, and F-measure increased by 7.87, 29.55, and 18.02 percentage points, respectively, over the first experiment. The greatest improvement was in recall, indicating that almost all relevant documents in the corpus were successfully retrieved during query expansion searching. We conclude that the proposed manual query expansion-based searching yields greater improvement than searching without query expansion. However, the lack of a rule-based stemming algorithm was the main issue diminishing performance and warrants further study.
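The Vector Space Model ranking described above, TF-IDF weighting with cosine similarity, can be sketched minimally as below. The toy corpus and tokens are placeholders, not the Sidama test collection.

```python
import math
from collections import Counter

def build_index(docs):
    """TF-IDF weight vectors and an IDF table for a tokenized document corpus."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))          # document frequency
    idf = {t: math.log(n / df[t]) for t in df}             # inverse document frequency
    vecs = [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]
    return vecs, idf

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus with placeholder tokens; real documents would be Sidama text.
docs = [["mine", "gold", "ore"], ["coffee", "farm"], ["gold", "price"]]
vecs, idf = build_index(docs)
query = ["gold", "price"]  # an (expanded) query would add related terms here
qv = {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}
ranked = sorted(range(len(docs)), key=lambda i: cosine(qv, vecs[i]), reverse=True)
print(ranked[0])  # 2
```

Query expansion simply appends related terms to `query` before weighting, which raises the similarity of documents that use different vocabulary than the original query.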
  • Item
    AMHARIC MULTI-HOP QUESTION ANSWERING IN HISTORICAL TEXTS: A DEEP LEARNING APPROACH
    (Hawassa University, 2024-07-03) BEREKET ENDALE
    In our daily lives, questioning is the most effective way to gain knowledge. However, manual extraction of answers is time-consuming and requires expertise in the field. As a result, implementing fully automatic question answering could accelerate extraction and reduce the need for human labour. Numerous studies have addressed question answering in high-resource languages like English using various recent techniques. Unlike previous research, which concentrated exclusively on single-hop question answering, this thesis proposes multi-hop question answering in Amharic. To date, no studies have investigated multi-hop question answering in the context of the Amharic language, which involves reasoning over multiple pieces of evidence or documents to generate an answer. Furthermore, there is no existing question answering dataset to address these issues; therefore, this study applied deep learning, a neural network method, to the Amharic multi-hop question answering problem. To do this, we preprocess our dataset using tokenization, normalization, stop word removal, and padding before feeding it to deep learning models, CNN, LSTM, and Bi-LSTM, to classify question types from the given input. Because there is no multi-hop question answering training dataset in Amharic, training data had to be created manually, which is time-consuming and tedious. The dataset comprises around 1,500 questions and contexts associated with five classes, labeled (0) factoid_date, (1) factoid_person, (2) factoid_location, and (3) factoid_organization. Accuracy, precision, the F-measure, and the confusion matrix are the performance metrics used to evaluate the models' overall efficiency on the dataset. The maximum accuracies achieved for this study's LSTM, CNN, and Bi-LSTM were 96%, 96.38%, and 97.04%, respectively.
The findings indicated that the proposed Bi-LSTM outperformed the other two models in Amharic multi-hop question type classification.
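The accuracy and confusion-matrix evaluation mentioned above can be computed as in this minimal sketch; the example labels and predictions are illustrative, not the thesis's data.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows = true class, columns = predicted class."""
    idx = {l: i for i, l in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

labels = ["factoid_date", "factoid_person", "factoid_location", "factoid_organization"]
# Illustrative gold labels and model predictions for four questions.
y_true = ["factoid_date", "factoid_person", "factoid_person", "factoid_location"]
y_pred = ["factoid_date", "factoid_person", "factoid_location", "factoid_location"]
cm = confusion_matrix(y_true, y_pred, labels)
print(accuracy(y_true, y_pred))  # 0.75
```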
  • Item
    AMHARIC EXTRACTIVE TEXT SUMMARIZATION USING AmRoBERTa–BiLSTM MODEL
    (Hawassa University, 2024-04-14) EDEN AHMED
    Extractive text summarization is a crucial task in natural language processing, allowing users to quickly grasp the main ideas of lengthy documents. The manual summarization process is often labor-intensive and time-consuming. As the volume of information in the Amharic language continues to grow, the need for effective summarization systems has become essential. While various summarization techniques have been developed for multiple languages, research specifically focused on Amharic remains limited. Most existing studies rely on traditional methods that often lack contextual embeddings, which are crucial for understanding the meaning within the text. Additionally, current approaches often struggle to capture long-range dependencies among sentences, and none of the existing studies have utilized hybrid deep models, which have demonstrated state-of-the-art performance in summarization tasks for other languages. This study addresses the challenge of extractive text summarization for Amharic news articles by proposing a hybrid deep learning model that combines the contextual understanding of AmRoBERTa with the sequential processing capabilities of Bidirectional Long Short-Term Memory. A dataset of 1,200 Amharic news articles, covering a variety of topics, was collected. Each article was segmented into sentences, which were labeled by experts to indicate their relevance for summarization. Preprocessing, including normalization and tokenization with AmRoBERTa, was conducted to prepare the data for modeling. The proposed model was trained using various hyperparameter configurations and optimization techniques, and its effectiveness was evaluated using ROUGE metrics. The results demonstrate that our model achieved significant performance, with a ROUGE-1 score of 44.48, a ROUGE-2 score of 34.73, and a ROUGE-L score of 44.47.
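The ROUGE metrics used for evaluation compare n-gram overlap between a candidate summary and a reference. A simplified ROUGE-1 (unigram) sketch, with illustrative English tokens rather than Amharic text:

```python
from collections import Counter

def rouge1(candidate, reference):
    """Unigram-overlap ROUGE-1 precision, recall, and F1 (a simplified sketch)."""
    c, r = Counter(candidate), Counter(reference)
    overlap = sum((c & r).values())  # clipped count of shared unigrams
    p = overlap / sum(c.values()) if candidate else 0.0
    rec = overlap / sum(r.values()) if reference else 0.0
    f1 = 2 * p * rec / (p + rec) if p + rec else 0.0
    return p, rec, f1

cand = "the cat sat".split()
ref = "the cat sat down".split()
p, rec, f1 = rouge1(cand, ref)
print(round(p, 2), round(rec, 2))  # 1.0 0.75
```

ROUGE-2 and ROUGE-L follow the same pattern over bigrams and longest common subsequences, respectively.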
  • Item
    CLASSIFYING EFFECT OF E-BANKING SERVICE ON DEPOSIT MOBILIZATION USING MACHINE-LEARNING TECHNIQUES
    (Hawassa University, 2024-10-03) BALCHA BEKELE
    Identifying services with strong potential for E-banking product offerings is an important issue. Cooperative Bank of Oromia S.C., one of the earlier private banks in Ethiopia, offers E-Banking products. The main objective of this study is to apply machine learning algorithms to develop a deposit mobilization performance prediction model that forecasts the potential of E-banking channel services at Cooperative Bank of Oromia. This study follows an experimental research design. For modelling purposes, data was gathered from the institution's head office. Since irrelevant features result in poor model performance, data pre-processing was performed to determine the inputs to the model. This thesis investigates the creation and assessment of six machine learning algorithms to forecast customer deposit behavior: CART, SVM, KNN, Naïve Bayes, Logistic Regression, and Random Forest. Cross tables were used to show the results of precision calculations, and confusion matrices were used to evaluate the performance of these models. With an emphasis on the relevance of various attributes in predicting customer deposits, the suitability of various classification algorithms, the relative effectiveness of ensemble versus base learning models, and forecasting based on influential attributes, the study tackled three main research questions. Experimental results show that the ensemble learning model achieved 98.496% accuracy in categorizing deposits, outperforming individual algorithms like KNN (98.491%) and SVM (98.401%), emphasizing the superiority of ensemble methods for deposit mobilization prediction. The Random Forest classifier identified "other_debit," "gender," and "mobile banking" as the most significant predictors of deposit mobilization, with relevance scores of 20%, 18%, and 13%, respectively. Moderately important features included "mobile_credit", "mobile_debit", "card_debit", and "marital_status", while "atm_card" and "other_credit" were negligible.
Finally, this thesis demonstrates the effectiveness of machine learning in financial prediction by offering a thorough comparison of six popular classification methods. The results offer valuable insights for enhancing customer deposit strategies at CBO and potentially other banking institutions.
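One common way to combine base classifiers like those above into an ensemble is hard (majority) voting; this is a minimal sketch with hypothetical predictions, since the abstract does not specify the exact ensemble scheme used.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model prediction lists by hard voting on each example."""
    combined = []
    for votes in zip(*predictions):  # one tuple of votes per example
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions from three base models for five customers
# (1 = likely to deposit via the E-banking channel, 0 = not).
cart = [1, 0, 1, 1, 0]
svm  = [1, 0, 0, 1, 0]
knn  = [1, 1, 1, 0, 0]
print(majority_vote([cart, svm, knn]))  # [1, 0, 1, 1, 0]
```

With an odd number of voters there are no ties; soft voting would instead average predicted probabilities.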
  • Item
    ASPECT BASED SENTIMENT ANALYSIS FOR AFAAN OROMOO TEXT USING BERT
    (Hawassa University, 2024-08-14) FETIYA FURI
    Aspect-based sentiment analysis (ABSA) is an important and advanced sentiment analysis task that determines both the sentiments and the aspects within a text. It is an essential research field within natural language processing, especially for languages that lack extensive resources. This study focuses on developing an ABSA model for Afaan Oromoo, one of the widely spoken languages in Ethiopia. Despite the rich linguistic diversity of Afaan Oromoo, there is a scarcity of computational tools and datasets for sentiment analysis in this language. Our research addresses this gap by creating a comprehensive dataset annotated with the BIO annotation scheme for aspect terms, integrating CNN and BiLSTM for aspect extraction and BERT for aspect sentiment classification. We fine-tuned a pre-trained BERT model on our annotated Afaan Oromoo dataset to perform aspect-based sentiment analysis. A total of 2,550 review texts collected from the FBC Afaan Oromoo Facebook page, BBC Afaan Oromoo, and other relevant social media were used for this study. After data collection, two annotators manually annotated the data into three classes (positive, negative, and neutral). The aspect terms used for the study were extracted from three domains: coffee, gold, and flower. Ten aspect terms (qulqullinna bunaa, oomisha bunaa, foolii, dandhama, worqee baasuu, galii, gatii, diinagdee, agarsiisa worqee, and al-ergii) are used. CNN-BiLSTM achieved 92.8% accuracy for aspect extraction, and the BERT model achieved 87% accuracy for aspect sentiment classification. This work not only contributes to the development of sentiment analysis for Afaan Oromoo but also provides a framework for applying advanced NLP techniques to other low-resource languages.
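The BIO annotation scheme mentioned above marks where an aspect term begins (B), continues (I), or is absent (O). A minimal sketch using one of the listed aspect terms in an illustrative (not thesis-sourced) sentence:

```python
def bio_tags(tokens, aspect_terms):
    """Tag tokens with the BIO scheme: B- begins an aspect term, I- continues it, O otherwise."""
    tags = ["O"] * len(tokens)
    for term in aspect_terms:
        words = term.split()
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                tags[i] = "B-ASPECT"
                for j in range(i + 1, i + len(words)):
                    tags[j] = "I-ASPECT"
    return tags

# "qulqullinna bunaa" (coffee quality) is one of the aspect terms listed above;
# the rest of the sentence is an illustrative placeholder.
tokens = "qulqullinna bunaa baayyee gaariidha".split()
print(bio_tags(tokens, ["qulqullinna bunaa"]))  # ['B-ASPECT', 'I-ASPECT', 'O', 'O']
```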
  • Item
    AUTOMATIC FISH SPECIES IDENTIFICATION USING DEEP LEARNING TECHNIQUE
    (Hawassa University, 2023-03-17) HABTAMUA ZERIHUN
    In recent years, the growing global population has led to increased demand for animal protein, including fish and other aquatic products, and aquaculture has emerged as a primary method for meeting this demand. Reliable and accurate methods are needed to identify fish species, yet accurate identification remains a challenge because various species are endemic to different regions. This research addresses this challenge by developing a system for automatic fish species identification using deep learning, with a specific emphasis on convolutional neural networks (CNNs). To accomplish the objective of the research, fish species images were collected from Lake Hawassa, and the dataset was certified by domain experts from the Centre for Aquaculture Research and Education (CARE) at Hawassa University. A custom dataset was prepared, consisting of a total of 6,000 images of six fish species: Oreochromis niloticus, Clarias garipienus, LabeoBarbus intermedius, Barbus paludinosis, Garra quadrimaculata, and Aplocheilichthys. The proposed system implements a preprocessing module involving image resizing and pixel value normalization to ensure uniformity and enhance training performance, and data augmentation techniques were utilized to generate diverse training examples. For classification, a CNN is employed, either trained from scratch or built on pre-trained models such as InceptionV3, VGG16, and ResNet50. Evaluation was performed with two dataset split ratios, 70/30 and 80/20, and three pre-trained models were used for comparison. The results demonstrate that our proposed model with the 70/30 ratio outperforms the pre-trained models in training and testing accuracy as well as loss.
Our model achieved a training accuracy of 100%, a validation accuracy of 99.7%, and a testing accuracy of 99.5%, indicating better learning and classification capabilities. Additionally, the model achieved recall, precision, and F1-score of 100%. This research contributes to the field of fish species identification: by leveraging deep learning techniques, particularly CNN, our model achieves better accuracy in automatic fish species identification. It reduces reliance on expert skills, addresses unresolved problems, and contributes to the progress of accurate fish species identification.
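The pixel value normalization step in the preprocessing module typically scales 8-bit intensities into [0, 1]; a minimal sketch with a toy image (the pixel values are made up):

```python
def normalize_pixels(image_rows):
    """Scale 8-bit pixel values into [0, 1], as is standard before CNN training."""
    return [[px / 255.0 for px in row] for row in image_rows]

# A tiny 2x3 grayscale "image" standing in for a resized fish photograph.
img = [[0, 128, 255], [64, 32, 16]]
norm = normalize_pixels(img)
print(norm[0][2], round(norm[0][1], 3))  # 1.0 0.502
```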
  • Item
    ENSEMBLE LEARNING-BASED PREDICTION OF STROKE RISK
    (Hawassa University, 2024-07-12) SELAMAWIT TADESSE CHACHAMO
    A stroke is a potentially fatal illness that results from insufficient blood flow to a portion of the brain or from a ruptured artery. It is the leading cause of disability and ranks second globally among causes of death. Stroke is currently one of the most common reasons for hospital admission in many healthcare facilities and has become a serious public health concern in Ethiopia. Early prediction is necessary to reduce death and disability. Additionally, since risk factors for stroke include where you live, your lifestyle, your diet, the temperature, the environment, and socioeconomic issues, it is important to investigate stroke risk in different geographic places. The study aims to predict stroke risk using three ensemble learning models: Random Forest, XGBoost, and LightGBM are used to predict stroke risk across the study area. The collected data was integrated, cleaned, and normalized; missing data was handled; and the Synthetic Minority Oversampling Technique (SMOTE) was used to address class imbalance before evaluation began. The grid search technique was also used to find the best-performing model configurations. The models were evaluated with accuracy, precision, recall, F1-score, and the confusion matrix, and a correlation graph was used to capture the relationships among the attributes. Random Forest had the highest accuracy at 97.6%, followed by XGBoost at 96.1% and LightGBM at 92.9%. The study found that discontinuation of anti-hypertensive drugs is the major risk factor for stroke in the study area.
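SMOTE addresses class imbalance by interpolating synthetic minority-class samples between real ones and their nearest neighbors. A minimal sketch of the interpolation step (the feature values are hypothetical, not the stroke dataset):

```python
import random

def smote_synthetic(x, neighbor, rng=random.Random(0)):
    """Create one SMOTE-style synthetic sample by interpolating between a
    minority-class point and one of its nearest neighbors."""
    alpha = rng.random()  # random position along the segment between the points
    return [xi + alpha * (ni - xi) for xi, ni in zip(x, neighbor)]

# Two hypothetical minority-class feature vectors (e.g. age, blood pressure, scaled).
x = [1.0, 2.0]
neighbor = [3.0, 6.0]
s = smote_synthetic(x, neighbor)
# The synthetic point lies on the segment between x and its neighbor.
print(all(min(a, b) <= v <= max(a, b) for a, b, v in zip(x, neighbor, s)))  # True
```

A full SMOTE implementation also finds the k nearest minority neighbors and repeats this until the classes are balanced.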
  • Item
    PSYCHIATRIC MENTAL DISORDER PREDICTION USING ARTIFICIAL NEURAL NETWORK
    (Hawassa University, 2024-10-15) HABTAMU DEBASA
    Psychiatric mental illnesses represent a serious public health risk, and successful management of these conditions depends on early detection and intervention. This research examines the use of artificial neural networks (ANNs) as a predictive tool to identify people who may develop psychiatric mental disorders. By utilizing vast databases of clinical and demographic data, ANNs can efficiently learn patterns and associations that human clinicians might not be able to see. In Ethiopia, mental disorders are the leading non-communicable disorders [8]. A World Health Organization (WHO) report shows that an estimated 4,480,113 (4.7%) and 3,139,003 (3.3%) people in Ethiopia suffer from depression and anxiety, respectively; the total years lived with disability were about 837,683 (10.1%) led by depressive disorder and 292,650 (3.6%) by anxiety disorder. An experimental research design was used to predict 8 classes of target variables from 30 independent variables. This research used ANN models with three different architectures: MLP_1L, MLP_2L, and MLP_3L were trained with different hyperparameter values and achieved 98.9%, 99.5%, and 99.5% accuracy, respectively. We used MLP_3L to predict each disorder type and achieved accuracies of Bipolar = 99%, ADHD = 97.5%, PTSD = 99.4%, Anxiety = 99.5%, Major depressive disorder = 99.5%, OCD = 99.9%, Schizophrenia = 99.9%, and PD = 99.4%. Our experimental results show that the MLP_2L and MLP_3L models can significantly support accurate prediction of psychiatric mental disorders, with higher accuracy than MLP_1L.
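The MLP architectures above are stacks of fully connected layers; a toy forward pass with made-up weights (not the thesis's trained parameters) can be sketched as:

```python
import math

def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W·x + b)."""
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return activation(z)

def relu(z):
    return [max(0.0, v) for v in z]

def softmax(z):
    m = max(z)                       # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Toy MLP-style forward pass: 3 inputs -> 2 hidden units -> 2 classes.
x = [0.5, -1.0, 2.0]
h = dense(x, [[0.1, 0.2, 0.3], [-0.4, 0.5, 0.6]], [0.0, 0.1], relu)
out = dense(h, [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], softmax)
print(round(sum(out), 6))  # 1.0
```

MLP_2L and MLP_3L would simply insert one or two more hidden `dense` layers between the input and output.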
  • Item
    Maize Crop Yield Prediction Using Machine Learning Techniques
    (Hawassa University, 2023-12-29) Kedija Abdurhman
    Maize is one of the main crops cultivated all throughout the world, including in Ethiopia. However, the production of maize varies widely based on many factors, such as weather, soil quality, and fertilizer usage. Predicting maize yields is crucial for farmers because it allows them to make informed crop management decisions. Machine learning approaches have shown promise in predicting crop productivity in recent years. The goal of this thesis is to explore the utilization of ensemble methods, namely Artificial Neural Networks (ANN), Support Vector Machines (SVM), and Decision Trees (DT), in the context of maize yield prediction. Ensemble methods combine predictions from multiple models to enhance accuracy and fortify the reliability of the forecasting process. The dataset used in this study was compiled from data collected between 2003 and 2022 on several aspects such as weather, soil quality, and maize production. Before developing the ensemble techniques, the dataset was preprocessed and the features were normalized. The results of this research show that ensemble techniques can potentially be employed for predicting maize yields with great performance. The MAE was 0.0025, the MSE was 0.0032, the RMSE was 0.0057, and the R2 was 0.9928. The results show that meteorological factors like rainfall and temperature have a considerable impact on maize yields. Soil quality was also recognized by the model as an important factor influencing maize crop production. The research demonstrates that ensemble techniques could potentially be used to accurately predict maize yields. Farmers can use the study's findings to make informed decisions about their agricultural practices. The research also emphasizes the significance of meteorological conditions and soil quality in predicting maize yields.
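The MAE, MSE, RMSE, and R2 metrics reported above can be computed as in this sketch; the yield values are hypothetical:

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, and R^2 as used to evaluate yield predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total variance of targets
    r2 = 1 - sum(e * e for e in errors) / ss_tot
    return mae, mse, rmse, r2

# Hypothetical normalized yields and model predictions.
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.21, 0.39, 0.62, 0.79]
mae, mse, rmse, r2 = regression_metrics(y_true, y_pred)
print(round(mae, 4), round(r2, 4))
```

Note that RMSE is by definition the square root of MSE, so the two reported values are directly linked.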