Computer Science

Permanent URI for this collectionhttps://etd.hu.edu.et/handle/123456789/76

Browse

Search Results

Now showing 1 - 10 of 65

APPLICATION OF HYBRID APPROACH FOR WOLAITA LANGUAGE PART OF SPEECH TAGGING
(Hawassa University, 2020-03-24) BIRHANESH FIKRE SHIRKO
The aim of this research is to develop part-of-speech tagger for Wolaita Language using hybrid approach. Part of speech tagger is one of the subtasks in natural language processing (NLP) applications which is vital for other NLP tasks, like parser, machine translator, speech recognizer and search engines. It is a process of labeling a corresponding part of speech (PoS) tag for a word that defines how the word is used in a sentence. The PoS tagging for Wolaita language is not sufficient yet to be used as one important component in other natural language processing (NLP) applications. In this thesis, the development of part of speech tagger using hybrid approach that combines HMM and transformation based learning approaches is conducted for Wolaita language. In general, HMM model needs large data to increase the performance and the transformation based learning model learn rule based on the language features. The HMM model tags the words based on the optimal path for a given sequence of words and transformation based learning is a rule based model that tag the words based on rules; it learns rule directly from the training corpus without expert knowledge. The developed hybrid approach of Wolaita language PoS tagger uses HMM tagger as initial annotators and the rule based tagger as a corrector based on fixed threshold values. For implementation and experiment, the author used python programming and NLTK. For training and testing the model, 14,358 untagged Wolaita language words are collected from three different categories (Bible, Social media in Wolaita language (Wogetta FM 96.6) and Wolaita language department). The annotation of corpus performed manually by two language experts. For tagging purpose 26 PoS tag are identified based on the work of Berhanu H., work of wakasa (2008) and with help of language experts. From the entire corpus, 90% is used for training and the remaining 10% is used for testing purpose. The performance of the three taggers is tested by using different experiments. After experiment the researcher found that the performance of HMM, rule based and hybrid taggers shows 88.14%, 92.96% and 94.82% respectively. Generally, hybrid approach showed the better performance to assigning part of speech tag for Wolaita language sentences
DEVELOPING IMAGE-BASED ENSET PLANT DISEASE IDENTIFICATION USING CONVOLUTIONAL NEURAL NETWORK
(2020-11-12) UMER NURI MOHAMMED
Nowadays, decline in food plant productivity is a major problem causing food insecurity to which plant disease is one of the factors. Early identification and accurate diagnosis of the health status of food plants is hence critical to limit the spread of plant diseases and it should be in a technological manner rather than by the labor force. Traditional observation methods by farmers or domain experts is perhaps time-consuming, expensive and sometimes inaccurate. Based on the literature, the literature suggests that deep learning approaches are the most accurate models for the detection of plant disease. Convolutional Neural network (CNN) is one of the popular approaches that allows computational models that are composed of multiple processing layers to learn representations of image data with multiple levels of abstraction. These models have dramatically improved the state-of-the-art in visual object recognition and image classification that makes it a good way for enset plant disease classification problems. For this purpose, we used an appropriate CNN based model for identifying and classifying the three most critical diseases of enset plants: - enset bacterial wilt, enset Leaf spot, and Root mealybug diseases. Enset is one of a major source of food in the South, Central and Southwestern parts of Ethiopia. A total of 14,992 images are used for conducting experiments including augmented images with four different categories; three diseased and a healthy class obtained from the different agricultural sectors stationed at Hawassa and Worabe Ethiopia, these images are provided as input to the proposed model. Under the 10-fold cross-validation strategy, the experimental results show that the proposed model can effectively detect and classify four classes of enset plant diseases with the best classification accuracy of 99.53%, which is higher than compared to other classical deep learning models such as MobileNet and Inception v3 deep learning models.
ENSET DISEASE DETECTION AND CLASSIFICATION USING DEEP LEARNING TECHNIQUES
(Hawassa University, 2024-12-10) : ENDASHAW NIGUSE ASTATEKE
Ethiopians, especially those in Sidama and Central Ethiopia, are the main consumers of enset. Enset is thought to be a staple food source for 20 million people in Ethiopia. Ethiopians mostly employ enset plants as a staple food crop. For many Ethiopians, the plant's root and stems are a major source of energy due to their high fiber and carbohydrate content. Typically, stems are picked, cleaned, and fermented to produce kocho and bulla, which are food items that are similar to bread. This research investigates the application of deep learning techniques, specifically Convolutional Neural Networks (CNNs), for the detection and classification of diseases affecting Enset leaves and stems. By employing sophisticated image processing tools and methodologies, the study aims to improve the accuracy and efficiency of disease identification in Enset plants. Experimental findings underscore the effectiveness of the proposed CNN models in achieving notable accuracy rates in disease detection, showcasing the potential of deep learning in revolutionizing agricultural practices. The study not only emphasizes the importance of advanced image processing in agricultural contexts but also underscores the necessity for further research in crop disease detection to enhance agricultural sustainability and productivity. We collected from the Central Ethiopia Region (Wonago and Dilla) and the Sidama Region (Hawassa Zuria, Boricha, Yirgalem, and Aletawondo) to use a 5,000 image dataset. There are 1000 images in each class. A total of 700 training, 200 validation, and 100 testing images were chosen. We used pre-trained models, MobileNetV3Small and EfficientNetB7, to compare the results with the newly developed model. Bacterial wilt, Mosaic Virus, Bacterial leaf spot, Insect pest, and Healthy Leaves are the disease classes. The Nadam optimizer, 32 batch sizes, 65 epochs, and 0.001 learning rate are the selected hyperparameters. The model was stable at epoch 65 and has an accuracy rate of 99.30%. EfficientNetB7 and MobileNetV3Small, the pre trained models, have accuracy rates of 95.32% and 93.08%, respectively. The developed model's output has a high degree of accuracy in identifying and classifying diseases in Enset leaves
QUERY EXPANSION FOR SIDAMA LANGUAGE INFORMATION RETRIEVAL SYSTEM
(Hawassa University, 2024-10-14) AEMRO NOKOLA MASALE
Information retrieval has become vital research topic in this computing era. Information retrieval is the process of searching and retrieving knowledge-based information from database. Wide range of the users are using some IR on an everyday activity. However, IR still have different challenges such as, short user query expression, ambiguity nature of the natural language and the vocabulary mismatch between query terms and relevant documents which are resulting decreased IR systems efficiency. The goal of this study was to design and develop manual-query expansion for Sidama language IR using Vector Space Model to improve Sidama language IR system by minimizing short query and query-document mismatch problems. To attain the abovementioned goal, we have studied several IR based literatures, get better understanding about IR, searching models, indexing, query expansion techniques and fundamental Sidama language morphology and language structure. We have designed the manual query expansion for Sidama IR in two subsections. The first subsection performs text preprocessing and indexing using inverted file indexing technique. The second subsection performs the comparison, query expansion, searching and ranking according to cosine similarity measurement. The implementation was done using Python programming language. In order to measure an implemented prototype system, we have collected 500 documents from different sources as document corpus and 20 initial queries. Query document relevance judgement was done manually by domain experts and documents are categorized according to query. Two experimentations have done for each of twenty queries. First experimentation was done searching with initial user query (without query expansion). The second experiment has been done searching with expanded query (with query expansion). The performance measurements have been calculated using common efficiency measurement units such as precision, recall, and F-measure. During the first experimentation, we recorded the results for each of 20 initial queries and obtained the average precision values of 67.86%, average recall value of 66.53%, and average F-measure value of 65.66%. The second experimentation has been performed using manual query expansion and obtained results were 75.73% average precision, 96.08% average recall and 83.68% average F-measure. As we can see the average results of the two experimentations, a significant improvement has been recorded in the second experimentation (manual query expansion-based searching). An average precision, recall and f measure values are increased by 7.87%, 29.55%, 18.02% respectively in the second experimentation than the searching results in the first experimentation. Greater improvement has been seen in recall value, which indicates that almost all relevant documents in the corpus have been successfully retrieved during query expansion searching. Finally, we can conclude that, the proposed manual query expansion-based searching results in greater improvement than searching without query expansion. But the lack of rule based stemming algorithm was the main issue that diminish the performance, in future need further studies
AMHARIC MULTI-HOP QUESTION ANSWERING IN HISTORICAL TEXTS: A DEEP LEARNING APPROACH
(Hawassa University, 2024-07-03) BEREKET ENDALE
In our daily lives, questioning is the most effective way to gain knowledge. However, manual extraction of answers is time-consuming and requires expertise in the field. As a result, implementing fully question answering could accelerate extraction times and reduce the requirement for human labour. Numerous studies have been done on question answering in full resource languages like English, and others using various recent techniques. However, unlike previous research, which concentrated exclusively on single hop question answering, this thesis proposes the concept of multi-hop question answering in Amharic. Until yet, no studies have investigated multi-hop question answering in the context of the Amharic language, which includes reasoning over numerous pieces of evidence or documents to generate an answer. Furthermore, there is no existing question answering data set to address these issues; therefore, this study used deep learning for the Amharic multi-hop question answering problem, a neural network method. To do this, we preprocess our dataset using tokenization, normalization, stop word removal, and, padding before feeding it to a deep learning model such as CNN, LSTM, and Bi-LSTM to create question type classification based on the given input. Because there is no multi-hop Question answering training dataset in Amharic, training data must be created manually, which is time-consuming and tedious. It is around 1500 questions and contexts associated with five classes. The class depicts as ((0) for factoid_date, (1) for factoid_person, (2) for factoid_location, and (3) for factoid_organization. Accuracy, precision, the F-measure, and the confusion matrix are performance metrics used to evaluate the model's overall efficiency when applied to the provided dataset. According to performance measurements, the maximum achievable accuracy rates for this study's LSTM, CNN, and Bi-LSTM were 96%, 96.38%, and 97.04%, respectively. The findings indicated that the suggested Bi LSTM outperformed the other two models in terms of Amharic multi-hop questions type classification.
Academic Performance Prediction Model for Teacher's Training Colleges Using Machine learning Approach
(Hawassa University, 2020-08-19) Firehiwot Getachew
Data mining is the process of extracting novel or previously unknown information from a large amount of data. The purpose of this study is to develop an academic performance prediction model and identifying the factors that affect academic performance of college student using data mining techniques. The data used for this study are 1023 active students from HCTE in 2018/19 academic year. For the consumption of this research, both primary and secondary data was used. Primary data such as age, gender, previous high school, department, library usage, study hours, sport interest, mother education, father education, time spent in social media, family support and economic status of family is collected by means of questionnaire. Secondary data was obtained from the HCTE registrar office. The prediction model was developed using multilayer perceptron (MLP) classification algorithm, Naive Bayes and J48 and correlation based feature selection (CFS) is applied to identify the predictive attributes of academic performance. Finally, Multilayer perceptron, Naive Bayes and J48 is compared using the same dataset. According to the result of the experiments, Multilayer perceptron using all attributes with test method of 10-fold cross validation and accuracy 60.6% gives better result compared to Naive Bayes, J48 and MLP after applying attribute selection. The study findings also showed that sex of the student, total courses credit hours taken by the students, study hours, assignment performance and library usage of the students are identified as a significant factor affecting academic performance. WEKA 3.8.1 tool was used for data mining process.
AMHARIC EXTRACTIVE TEXT SUMMARIZATION USING AmRoBERTa –BiLSTM MODEL
(Hawassa University, 2024-04-14) EDEN AHMED
Extractive text summarization is a crucial task in natural language processing, allowing users to quickly grasp the main ideas of lengthy documents. The manual summarization process is often labor-intensive and time-consuming. As the volume of information in the Amharic language continues to grow, the need for effective summarization systems has become essential. While various summarization techniques have been developed for multiple languages, research specifically focused on Amharic remains limited. Most existing studies rely on traditional methods that often lack of contextual embeddings, which are crucial for understanding the meaning within the text. Additionally, current approaches often struggle to capture long-range dependencies among sentences and none of the existing studies have utilized hybrid deep models, which have demonstrated state of-the-art performance in summarization tasks for other languages. This study addresses the challenge of extractive text summarization for Amharic news articles by proposing a hybrid deep learning model that combines the contextual understanding of AmRoBERTa with the sequential processing capabilities of Bidirectional Long Short-Term Memory. A dataset of 1,200 Amharic news articles, covering a variety of topics, was collected. Each article was segmented into sentences and labeled by experts to indicate their relevance for summarization. Preprocessing was conducted, including normalization and tokenization using AmRoBERTa, to prepare the data for modeling. The proposed model was trained using various hyperparameter configurations and optimization techniques. Its effectiveness was evaluated using ROUGE metrics. The results demonstrate that our model achieved significant performance, with a ROUGE-1 score of 44.48, a ROUGE-2 score of 34.73, and a ROUGE-L score of 44.47
CLASSIFYING EFFECT OF E-BANKING SERVICE ON DEPOSIT MOBILIZATION USING MACHINE-LEARNING TECHNIQUES
(Hawassa University, 2024-10-03) BALCHA BEKELE
Identifying services that are more likely potential to E-banking product offering is an important issue. Cooperative Bank of Oromia S.C., being one of the former private banks in Ethiopia is offering E-Banking products. The main objective of this study is to apply machine learning algorithms for developing Deposit mobilization Performance prediction Model that forecast potential of E-banking channel service in Cooperative Bank of Oromia. This research follows experimental research. For modelling purpose, data was gathered from the institution head office. Since irrelevant features result in bad model performance, data pre processing was performed in order to determine the inputs to the model. This thesis investigates the creation and assessment of six machine learning algorithms to forecast deposit behavior from customers: CART, SVM, KNN, Naïve Bayes, Logistic Regression and Random Forest. Cross tables were used to show the results of precision calculations and confusion matrices used to evaluate the performance of these models. With an emphasis on the relevance of various attributes in predicting customer deposits, the suitability of various classification algorithms, the relative effectiveness of ensemble versus base learning models, and forecasting based on influential attributes, the study tackled three main research questions. Experimental results exhibit that, the ensemble learning model achieved 98.496% accuracy in categorizing deposits, outperforming individual algorithms like KNN (98.491%) and SVM (98.401%), emphasizing the superiority of ensemble methods for deposit mobilization prediction. Random Forest Classifier identified "other_debit," "gender," and "mobile banking" as the most significant predictors of deposit mobilization, with relevance scores of 20%, 18%, and 13% respectively. Moderately important features included "mobile_credit", "mobile_debit", "card_debit", and "marital_status", while "atm_card" and "other_credit" were negligible. Finally, this thesis shows the effectiveness of machine learning in financial prediction by offering a thorough comparison of six popular categorization methods. The result offer valuable insights for enhancing customer deposit strategies at CBO and potentially other banking institutions
ASPECT BASED SENTIMENT ANALYSIS FOR AFAAN OROMOO TEXT USING BERT
(Hawassa University, 2024-08-14) FETIYA FURI
Aspect-based sentiment analysis (ABSA) is a more important and advanced task of sentiment analysis which determine both the sentiments and the aspects within the text. It is an essential research field within natural language processing, especially for languages that lack extensive resources. This study focuses on developing an ABSA model for Afaan Oromoo language, one of the widely spoken languages in Ethiopia. Despite the rich linguistic diversity of Afaan Oromoo, there is a scarcity of computational tools and datasets for sentiment analysis in this language. Our research addresses this gap by creating a comprehensive dataset annotated with BIO annotation scheme for aspect terms and integrates CNN and BiLSTM for aspect extraction, and BERT for aspect sentiment classification. We fine-tuned pre-trained BERT model on our annotated Afaan Oromoo dataset to perform aspect based sentiment analysis. The total of 2550 review text collected from FBC Afaan Oromoo Facebook page, BBC Afaan Oromoo and other relevant social media are used for this study. After data collection, two annotators’ annotated data manually into three classes (i.e., positive, negative and neutral). The aspect terms used for study are extracted from three domain, coffee, gold and flower. Basically ten aspect terms namely (qulqullinna bunaa, oomisha bunaa, foolii, dandhama, worqee baasuu, galii, gatii, diinagdee, agarsiisa worqee and al-ergii) are used for the study. CNN-BiLSTM is used for aspect extraction and performed 92.8% of accuracy. BERT model performed accuracy of 87% for aspect sentiment classification. This work not only contributes to the development of sentiment analysis for Afaan Oromoo but also provides a framework for applying advanced NLP techniques to other low-resource languages
AUTOMATIC FISH SPECIES IDENTIFICATION USING DEEP LEARNING TECHNIQUE
(Hawassa University, 2023-03-17) HABTAMUA ZERIHUN
In recent years, the growing global population has led to an increased demand for animal protein, including fish and other aquatic products. Aquaculture has emerged as a primary method for meeting this demand. There is a need for reliable and accurate methods to identify fish species. However, the accurate identification of fish species remains a challenge as there are various fish species endemic to different regions. This research focuses on addressing this challenge by developing a system for automatic fish species identification using deep learning technique, with a specific emphasis on convolutional neural network (CNN). To accomplish the objective of the research, fish species images were collected from Lake Hawassa. The collected dataset was certified by domain experts from the Centre for Aquaculture Research and Education (CARE) at Hawassa University. A custom dataset was prepared, consisting of a total of 6000 images of six fish species: Oreochromis niloticus, Clarias garipienus, LabeoBarbus intermedius, Barbus paludinosis, Garra quadrimaculata, and Aplocheilichthys. The proposed system for fish species identification implements a preprocessing module that involves image resizing and pixel value normalization to ensure uniformity and enhance training performance. Data augmentation techniques were utilized to generate diverse training examples. For classification, convolutional neural network (CNN) is employed, either trained using Convolutional neural network (CNN) architectures or utilizing pre-trained models such as Inceptionv3, VGG16, and ResNet50. Evaluation metrics were employed with two different dataset ratios: 70/30 and 80/20 and also three pre-trained models were used for comparison. The results demonstrate that our proposed model 70/30 ratio outperforms the pre-trained models in terms of training, testing accuracy, as well as loss. Our model achieved a training accuracy of 100%, validation accuracy of 99.7% and a testing accuracy of 99.5%, indicating better learning and classification capabilities. Additionally, the model achieved a recall, precision and f1 score of 100%. This research contributes to the field of fish species identification. By leveraging deep learning techniques, Particularly CNN, our model achieves better accuracy in automatic fish species identification. It reduces reliance on expert skills, addresses unresolved problems, and contributes to the progress of accurate fish species identification

Computer Science

Browse

Filters

Settings

Sort By

Results per page

Search Results