Institute of Technology

Permanent URI for this community: https://etd.hu.edu.et/handle/123456789/66

The Institute of Technology focuses on education, research, and innovation in engineering, technology, and applied sciences to support sustainable development.

Search Results

  • Item
    BI-DIRECTIONAL NEURAL MACHINE TRANSLATION FOR ENGLISH-SIDAAMU AFOO
    (Hawassa University, 2023-07-12) ANDU ALEMAYEHU
    Machine translation is a Natural Language Processing (NLP) application that enables the translation of text from one natural language to another. This study aimed to design and develop a bidirectional English-Sidaamu Afoo neural machine translation system. Such a system has become increasingly important due to the growing number of language users and the need to increase the language's presence on the web; effective communication and information sharing require the translation of official documents, news articles, and other written texts in both languages, and a translation system would also support integration with other high-level NLP tools, yet no prior solution existed in this area. Recently, Neural Machine Translation (NMT) has emerged as a promising approach to machine translation, delivering state-of-the-art translation quality. Unlike traditional machine translation methods, NMT uses a single neural network that can be continuously fine-tuned to improve translation performance. The system was developed using deep learning techniques, specifically LSTM and Transformer models. Because no parallel data was available for machine translation, we collected parallel data from a religious domain, specifically the Bible, and from Sidaamu Afoo conversation. After gathering the data, experiments were conducted using 15,000 parallel sentences from different domains. To determine the optimal model, efficiency was evaluated in terms of training time, memory usage, and BLEU score. The results showed that the Transformer model performed best, with a BLEU score of 0.413 for Sidaamu Afoo-to-English translation and 0.465 for English-to-Sidaamu Afoo translation. Future work to enhance the performance of the system could include further research, cleaner data, and larger corpus sizes.
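    As a hedged illustration (not the thesis code), a corpus-level BLEU evaluation like the one reported above could be computed with NLTK, which returns scores on the same 0-1 scale as the reported 0.413 and 0.465; the reference and hypothesis sentences below are hypothetical placeholders.

```python
# Minimal sketch of corpus-level BLEU scoring, assuming NLTK is installed.
# The sentences are hypothetical placeholders, not thesis data.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Each hypothesis is one tokenized system output; each entry in `references`
# is a list of acceptable tokenized reference translations for it.
references = [[["the", "house", "is", "big"]]]
hypotheses = [["the", "house", "is", "large"]]

# Smoothing avoids zero scores when a higher-order n-gram has no match,
# which is common with small evaluation sets.
smooth = SmoothingFunction().method1
score = corpus_bleu(references, hypotheses, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")  # 0-1 scale
```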
  • Item
    Developing Koorete Part of Speech (POS) Tagger: an Empirical Evaluation of Neural Word Embedding and N-Gram Based Statistical Approaches
    (Hawassa University, 2021-12-12) Agegnehu Ashenafi
    The Koorete language is spoken by the Koore people in Amaro Kele Special Woreda and in four Kebeles of Burji Special Woreda, in the Southern regional state. Koorete is written in the Latin alphabet (called 'Diizo Beyta' in Koorete). That is, the Latin alphabet has been adapted to the language by adding combinations of letters for its peculiar sounds, totaling 31 consonants ('Artaxita' in Koorete), 5 vowels ('Arxaxita' in Koorete), and one additional symbol. Koorete sentences follow the structure "Subject (Zeere utaade) + Object (efaxe) + Verb (Hanta beyiisaxe)". This study develops a Koorete POS tagger through an empirical evaluation of neural word embedding and N-gram based statistical POS tagging approaches. Part-of-speech (POS) tagging is the process of assigning a part-of-speech label/tag from the Koorete POS tagset to each word. Neural word embeddings are distributed representations of words as vectors, applied here in a Bi-LSTM RNN model; the N-gram based statistical approach uses probabilities of word-label sequences estimated from the KPT corpus. Words with similar meanings receive similar representations, which enables deep learning methods and reduces the impact of out-of-vocabulary words by reducing the dimensionality of the binary |V|-dimensional vectors. In simple terms, word embedding is a language modeling technique that maps words to vectors, here using the Word2Vec package, whose output is consumed by the RNN. Word2Vec converts words to arrays of real numbers, and the original corpus word categories are concatenated to the generated vectors; it can capture the context of a word (semantic and syntactic similarity) in a document in relation to other words. For sequence labeling with distributed representations, this study uses a Bi-LSTM RNN, which achieves state-of-the-art POS tagging accuracy, alongside N-gram based statistical approaches, in contrast to more classic approaches. The Bi-LSTM also handles letter-case features to preserve the original case information of each word. The study applies the skip-gram algorithm to encode words into a limited vector space, because the skip-gram model is an efficient method for learning high-quality vector representations of words from large amounts of unstructured text. Experiments were conducted on the Bi-LSTM RNN model and the N-gram statistical tagger, using the KPT corpus of about 1,718 sentences (33,220 words), divided into 90% training data and 10% testing data. The Bi-LSTM RNN word embedding POS tagging approach outperformed the N-gram statistical POS tagging approach, with an accuracy of 98.53%. Hence, this study addresses (1) the lack of rich NLP resources for the language, (2) Koorete's lack of its own KPT corpus and tagsets for NLP applications, and (3) how state-of-the-art tagging algorithms perform relative to POS tagging models for related languages.
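    As a hedged sketch (not the thesis code), the N-gram statistical baseline described above can be approximated with NLTK's backoff tagger chain and a 90/10 split; the tiny tagged corpus below is a hypothetical stand-in for the KPT corpus, which is not publicly available.

```python
# Sketch of an N-gram backoff POS tagger with a 90/10 split, assuming NLTK;
# `tagged_sents` stands in for the KPT corpus (~1,718 sentences, 33,220 words).
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger

tagged_sents = [  # hypothetical (word, tag) pairs, not real Koorete data
    [("zeere", "N"), ("efaxe", "N"), ("beyiisaxe", "V")],
    [("diizo", "N"), ("beyta", "N")],
    [("hanta", "N"), ("utaade", "V")],
]

split = int(len(tagged_sents) * 0.9)
train_sents, test_sents = tagged_sents[:split], tagged_sents[split:]

# Higher-order taggers back off to lower-order ones for unseen contexts.
uni = UnigramTagger(train_sents)
bi = BigramTagger(train_sents, backoff=uni)
tri = TrigramTagger(train_sents, backoff=bi)

# Manual token-level accuracy over the held-out 10%.
correct = total = 0
for sent in test_sents:
    words = [w for w, _ in sent]
    for (_, gold), (_, pred) in zip(sent, tri.tag(words)):
        correct += gold == pred
        total += 1
print(f"accuracy: {correct / max(total, 1):.4f}")
```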
  • Item
    INVESTIGATION OF MACHINE LEARNING MODELS FOR FOODBORNE DISEASE CLASSIFICATION
    (Hawassa University, 2024-11) WULETAWU IYASU FARACHO
    Foodborne disease has a high prevalence in low- and middle-income countries around the world. In Ethiopia, many people are affected by foodborne disease due to various causes. Despite the high burden of infection, the control of most foodborne diseases in Ethiopia is in its infancy, owing to a lack of technology that can classify foodborne diseases easily in order to support healthcare professionals in making better diagnoses, and few studies have been conducted to classify the foodborne diseases common in Ethiopia. In view of these facts, this study investigates the topic and fills the observed research gap using machine learning models, which can learn from past data, identify patterns, and make decisions with minimal human intervention. Such applications are popular in the healthcare and biomedical domain for the early detection of diseases and for supporting better diagnosis. The study focuses on some of the prevalent foodborne illnesses in Ethiopia, selected in consultation with medical experts. To achieve the objective of the study, the researcher used an experimental research design and a mixed research approach (both quantitative and qualitative). Secondary data on foodborne diseases were collected from hospitals; most research activities, such as data preprocessing, analysis, model training, and testing, were performed in Python, and Edraw Max was used to design the conceptual model because of its good features. After preprocessing the collected data, the researcher trained models using state-of-the-art machine learning algorithms: Decision Tree, Random Forest, XGBoost, and a stacking ensemble learning method. Based on the experiments conducted, the stacking ensemble model outperforms the others with an accuracy of 98.1%, followed by Random Forest, XGBoost, and Decision Tree with accuracies of 97.5%, 96.9%, and 96.5%, respectively. The results indicate that the stacking ensemble learning model is suitable for disease classification.
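    As a hedged sketch of the stacking setup described above, the base learners can be combined with scikit-learn's StackingClassifier; the synthetic data stands in for the hospital records, and the logistic-regression meta-learner is an assumption, since the abstract does not name one.

```python
# Sketch of a stacking ensemble over Decision Tree, Random Forest, and
# XGBoost, using scikit-learn and xgboost; `X`/`y` are synthetic placeholders.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=12, n_classes=3,
                           n_informative=6, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("xgb", XGBClassifier(eval_metric="mlogloss", random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
)
stack.fit(X_tr, y_tr)
print(f"test accuracy: {stack.score(X_te, y_te):.3f}")
```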
  • Item
    KEYSTROKE DYNAMICS BASED MULTI-FACTOR AUTHENTICATION USING MACHINE LEARNING
    (Hawassa University, 2024-11) MESERET DEGEFI
    User authentication is a vital part of securing digital services and preventing unauthorized users from gaining access to a system. Nowadays, organizations use Multi-Factor Authentication (MFA), which provides robust protection by combining two or more identity procedures, instead of Single-Factor Authentication (SFA), which has become less secure. Keystroke dynamics is a behavioural biometric that examines a user's typing rhythm to determine whether the subject is a legitimate user of the system. Keystroke dynamics has a minimal implementation cost and needs no special hardware in the authentication process, since gathering typing data is reasonably straightforward and involves no additional effort from the user. In this research we used the CMU fixed-text benchmark dataset of 20,400 samples for keystroke dynamics. The dataset contains keystroke information from 51 users, each of whom typed the same password, .tie5Roanl, 400 times over 8 sessions, with 50 repetitions per session. We tested four machine learning algorithms, Random Forest, Support Vector Machines, Multi-Layer Perceptron, and Extra Trees, to determine which algorithm is most accurate, and compared their performance using Accuracy, Precision, Recall, and F1-score evaluation metrics. The Random Forest classifier scored extremely high accuracy (99.19%), and with these final results we can determine which machine learning method is most effective at accurately authenticating users.
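    As a hedged sketch of the best-performing experiment above, a Random Forest can be trained on the CMU benchmark's per-keystroke timing features; the file name and column layout below are assumptions based on the published dataset, not details taken from the thesis.

```python
# Sketch of user identification on the CMU keystroke benchmark with a
# Random Forest; file name and columns (subject id, session/rep indices,
# timing features) are assumed from the public dataset description.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

df = pd.read_csv("DSL-StrongPasswordData.csv")           # assumed file name
y = df["subject"]                                        # 51 users
X = df.drop(columns=["subject", "sessionIndex", "rep"])  # timing features

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)
clf = RandomForestClassifier(n_estimators=300, random_state=42)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Macro averaging weights all 51 users equally in the multi-class scores.
print(f"accuracy:  {accuracy_score(y_te, pred):.4f}")
print(f"precision: {precision_score(y_te, pred, average='macro'):.4f}")
print(f"recall:    {recall_score(y_te, pred, average='macro'):.4f}")
print(f"f1:        {f1_score(y_te, pred, average='macro'):.4f}")
```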
  • Item
    AMHARIC EXTRACTIVE TEXT SUMMARIZATION USING AmRoBERTa–BiLSTM MODEL
    (Hawassa University, 2024-05) EDEN AHMED
    Extractive text summarization is a crucial task in natural language processing, allowing users to quickly grasp the main ideas of lengthy documents. The manual summarization process is often labor-intensive and time-consuming, and as the volume of information in the Amharic language continues to grow, the need for effective summarization systems has become essential. While various summarization techniques have been developed for many languages, research specifically focused on Amharic remains limited. Most existing studies rely on traditional methods that often lack contextual embeddings, which are crucial for understanding the meaning within the text. Additionally, current approaches often struggle to capture long-range dependencies among sentences, and none of the existing studies have utilized hybrid deep models, which have demonstrated state-of-the-art performance in summarization tasks for other languages. This study addresses the challenge of extractive text summarization for Amharic news articles by proposing a hybrid deep learning model that combines the contextual understanding of AmRoBERTa with the sequential processing capabilities of Bidirectional Long Short-Term Memory (BiLSTM). A dataset of 1,200 Amharic news articles covering a variety of topics was collected; each article was segmented into sentences, which experts labeled for relevance to the summary. Preprocessing, including normalization and tokenization with AmRoBERTa, prepared the data for modeling. The proposed model was trained using various hyperparameter configurations and optimization techniques, and its effectiveness was evaluated using ROUGE metrics. The results demonstrate that our model achieved significant performance, with a ROUGE-1 score of 44.48, a ROUGE-2 score of 34.73, and a ROUGE-L score of 44.47.
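    As a hedged sketch of the hybrid architecture described above, per-sentence AmRoBERTa embeddings (assumed here to be precomputed, 768-dimensional) can be scored by a BiLSTM that outputs one include/exclude probability per sentence; all shapes and hyperparameters are illustrative assumptions, not the thesis configuration.

```python
# Sketch of a BiLSTM scoring precomputed transformer sentence embeddings
# for extractive summarization; data below is random placeholder input.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

MAX_SENTS, EMB_DIM = 30, 768  # sentences per article, assumed hidden size

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_SENTS, EMB_DIM)),
    # Bidirectional LSTM captures long-range dependencies among sentences.
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    # One relevance score per sentence: 1 = include in the summary.
    layers.TimeDistributed(layers.Dense(1, activation="sigmoid")),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Hypothetical stand-ins for embedded articles and expert sentence labels.
X = np.random.rand(8, MAX_SENTS, EMB_DIM).astype("float32")
y = np.random.randint(0, 2, size=(8, MAX_SENTS, 1)).astype("float32")
model.fit(X, y, epochs=1, batch_size=4)
```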
  • Item
    COLLABORATIVE APPROACH OF AGILE AND DEVOPS FOR CONTINUOUS DELIVERY OF QUALITY SOFTWARE
    (Hawassa University, 2023-08) DESSALEGN MENGESHA
    We are in an era of high demand for quality software, which many organizations rely on to achieve their organizational goals. Organizations around the globe have shown great interest in automating their business processes, which in turn has driven the emergence and improvement of software development methodologies and dramatically changed the way services are provided. Among these methodologies, Agile software development and the DevOps culture/toolset have become popular for their support of rapid software development, continuous integration, and continuous delivery. Although the two approaches are complementary and each plays a significant role in the software development lifecycle, using them independently will not improve the development process to the optimum level, whereas contextualizing the development process enables practitioners to improve it and achieve better productivity. The objective of this thesis is to integrate the two approaches, with minor modifications to the DevOps team structure that extend the DevOps team's role into the development environment. The research was conducted as experimental research, and the evaluation used two working projects: one following classical Agile as the control group and the other following the integrated Agile-DevOps approach as the experimental group. The number of changes accepted and developed, and the number of deliveries within a specific period, served as measurement parameters. The experiment was carried out with students who joined the Hawassa University Application Development Team for their practical attachments. The findings demonstrate that the experimental group project, which utilized Agile methodologies in conjunction with DevOps practices, achieved superior outcomes compared to the control group project, which relied on the department's standard Agile/Scrum approach; this improvement was evident in metrics such as accepted changes and committed deliveries. Furthermore, the guideline applied to the experimental group project was refined and is included in this paper to serve as a valuable resource for future researchers and developers.
  • Item
    MORPHOLOGICAL ANALYSIS FOR AFAAN OROMOO USING DEEP LEARNING APPROACHES
    (Hawassa University, 2024-08) BOKICHELKEBACHALI
    Afaan Oromoo, a widely spoken language in Ethiopia and neighbouring countries, presents unique challenges due to its complex morphological structure. Morphological analysis, which decomposes words into morphemes and assigns grammatical information, is a crucial natural language processing (NLP) task for this language. Previous researchers conducted Afaan Oromoo morphological analysis using rule-based and traditional machine learning techniques. Rule-based methods are labour-intensive and time-consuming, especially with large datasets, while traditional machine learning approaches struggle with feature extraction and high-dimensional vector spaces, leading to information loss. This study addresses these challenges by employing deep learning architectures, including Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), and Bidirectional LSTMs (BiLSTMs), which had not previously been applied to Afaan Oromoo morphological analysis. Comprehensive evaluations were conducted on a dataset consisting of 30,636 training words, 10,213 validation words, and 4,539 testing words, with accuracy, precision, recall, and F1-score as performance metrics. The accuracies were: normal CNN-LSTM 70.94%, Word2Vec CNN-LSTM 94.74%, FastText CNN-LSTM 95.25%, normal LSTM 95.06%, Word2Vec LSTM 93.89%, FastText LSTM 90.02%, normal GRU 92.96%, Word2Vec GRU 91.98%, FastText GRU 91.32%, normal BiLSTM 95.24%, Word2Vec BiLSTM 96.21%, and FastText BiLSTM 96.43%. The BiLSTM models, particularly those using Word2Vec and FastText embeddings, achieved the highest accuracies, highlighting the effectiveness of deep learning approaches and neural word embedding techniques in Afaan Oromoo morphological analysis. This research not only advances the state of the art in this domain but also provides a robust methodology for handling the morphological complexity of Afaan Oromoo using deep learning.
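    As a hedged sketch of the best-performing combination above (FastText embeddings feeding a BiLSTM), skip-gram FastText vectors can be trained with gensim and used to initialize a Keras tagging model; the tiny corpus, tag inventory, and hyperparameters are illustrative assumptions, not the thesis values.

```python
# Sketch of FastText (skip-gram) embeddings initializing a BiLSTM tagger;
# `sentences` is a hypothetical placeholder corpus.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from gensim.models import FastText

sentences = [["barata", "barate"], ["deemne", "deemte"]]  # hypothetical
ft = FastText(sentences, vector_size=100, sg=1, min_count=1)  # sg=1: skip-gram

vocab = {w: i + 1 for i, w in enumerate(ft.wv.index_to_key)}  # 0 = padding
emb = np.zeros((len(vocab) + 1, 100), dtype="float32")
for w, i in vocab.items():
    emb[i] = ft.wv[w]

NUM_TAGS, MAX_LEN = 12, 20  # assumed morphological tag inventory / length
model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    layers.Embedding(emb.shape[0], 100, mask_zero=True,
                     embeddings_initializer=tf.keras.initializers.Constant(emb)),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    # One morphological label per input token.
    layers.TimeDistributed(layers.Dense(NUM_TAGS, activation="softmax")),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```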
  • Item
    AMHARIC MULTI-HOP QUESTION ANSWERING IN HISTORICAL TEXTS: A DEEP LEARNING APPROACH
    (Hawassa University, 2024-11) BEREKET ENDALE
    In our daily lives, questioning is the most effective way to gain knowledge. However, manual extraction of answers is time-consuming and requires expertise in the field; a fully automatic question answering system could therefore accelerate extraction and reduce the need for human labour. Numerous studies have addressed question answering in high-resource languages like English using various recent techniques. Unlike previous research, which concentrated exclusively on single-hop question answering, this thesis proposes multi-hop question answering in Amharic, which involves reasoning over multiple pieces of evidence or documents to generate an answer; to date, no studies have investigated it in the context of the Amharic language. Furthermore, no existing question answering dataset addresses these issues, so this study applied deep learning, a neural network method, to the Amharic multi-hop question answering problem. To do this, we preprocessed our dataset using tokenization, normalization, stop-word removal, and padding before feeding it to deep learning models, namely CNN, LSTM, and Bi-LSTM, to classify the question type of the given input. Because there is no Amharic multi-hop question answering training dataset, the training data had to be created manually, which is time-consuming and tedious; it comprises around 1,500 questions and contexts associated with five classes, including (0) factoid_date, (1) factoid_person, (2) factoid_location, and (3) factoid_organization. Accuracy, precision, the F-measure, and the confusion matrix were used to evaluate each model's overall efficiency on this dataset. According to the performance measurements, the maximum accuracies achieved by LSTM, CNN, and Bi-LSTM in this study were 96%, 96.38%, and 97.04%, respectively. The findings indicate that the proposed Bi-LSTM outperformed the other two models in Amharic multi-hop question type classification.
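    As a hedged sketch of the classification pipeline described above (tokenize, pad, Bi-LSTM, softmax over question types), the steps can be expressed with Keras utilities; the Amharic questions, labels, and hyperparameters below are hypothetical placeholders.

```python
# Sketch of question-type classification: tokenization, padding, and a
# Bi-LSTM with a softmax head; the two sample questions are hypothetical.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

questions = ["ኢትዮጵያ መቼ ተመሰረተች", "አዲስ አበባ የት ትገኛለች"]  # hypothetical
labels = np.array([0, 2])  # e.g. factoid_date, factoid_location
NUM_CLASSES, MAX_LEN = 5, 32

tok = Tokenizer(oov_token="<unk>")
tok.fit_on_texts(questions)
X = pad_sequences(tok.texts_to_sequences(questions), maxlen=MAX_LEN)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    layers.Embedding(len(tok.word_index) + 2, 128, mask_zero=True),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, labels, epochs=1)
```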
  • Item
    WATER CONSUMPTION PREDICTION USING MACHINE LEARNING: THE CASE OF HAWASSA CITY WATER SUPPLY AND SEWAGE SERVICE ENTERPRISE
    (Hawassa University, 2024-11) MUSE KEBEDE MULATU
    Proper management of water consumption helps ensure a clean and healthy community, and predicting water consumption gives time to prepare and protect the community from unforeseen natural or unknown disasters. Previous studies have implemented many prediction models for specific areas that showed promise but were not applicable in developing countries. This study developed a water consumption prediction model for the Hawassa City Water Supply and Sewerage Service Enterprise (HCWSSSE) in the Sidama region, Ethiopia. The enterprise experienced water shortages because its predictions were based solely on the previous month's consumption rate and did not consider seasonal changes. The models developed in this study apply machine learning techniques to monthly consumption data covering the Ethiopian budget years 2009-2015 E.C., with around 16,012 data points, split into 80% training, 10% validation, and 10% testing. The study explores several machine learning algorithms for prediction, including Random Forest (RF), Support Vector Regressor (SVR), Linear Regression (LR), and XGBoost. Model performance was evaluated using key error metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R²). The models' R² values for training, validation, and testing were: Random Forest (RF) 97.23%, 97.24%, and 97.22%; Linear Regression (LR) 78.18%, 78.38%, and 77.98%; Support Vector Regressor (SVR) 79.37%, 79.92%, and 78.81%; and XGBoost 97.08%, 97.07%, and 97.08%, respectively. Random Forest and XGBoost showed promise in prediction and demonstrated effectiveness in handling complex datasets; in particular, Random Forest offered better predictions with a reduced risk of overfitting. The successful application of RF and XGBoost highlights the importance of leveraging machine learning for sustainable water management in an era of growing demand and climate variability.
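    As a hedged sketch of the evaluation protocol above (an 80/10/10 split scored with MSE, MAE, RMSE, and R²), the Random Forest case can be written with scikit-learn; the synthetic features and targets are placeholders for the enterprise's monthly consumption records.

```python
# Sketch of Random Forest regression with an 80/10/10 split and the four
# error metrics named above; data is a synthetic stand-in.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

rng = np.random.default_rng(42)
X = rng.random((16012, 6))  # hypothetical features, e.g. month, zone, history
y = 50 * X[:, 0] + 20 * X[:, 1] + rng.normal(0, 2, 16012)

# 80% train, then split the remaining 20% evenly into validation and test.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.2,
                                            random_state=42)
X_va, X_te, y_va, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5,
                                          random_state=42)

rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X_tr, y_tr)

for name, Xs, ys in [("val", X_va, y_va), ("test", X_te, y_te)]:
    pred = rf.predict(Xs)
    mse = mean_squared_error(ys, pred)
    print(f"{name}: MSE={mse:.3f} MAE={mean_absolute_error(ys, pred):.3f} "
          f"RMSE={np.sqrt(mse):.3f} R2={r2_score(ys, pred):.3f}")
```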