Institute of Technology

Permanent URI for this community: https://etd.hu.edu.et/handle/123456789/66

Search Results

Now showing 1 - 10 of 22
  • Item
    INVESTIGATION OF MACHINE LEARNING MODELS FOR FOODBORNE DISEASE CLASSIFICATION
    (Hawassa University, 2024-11) WULETAWU IYASU FARACHO
    Foodborne disease is highly prevalent in low- and middle-income countries. In Ethiopia, many people are affected by foodborne diseases from various causes, and the burden of infection is high. Control of most foodborne diseases in the country is still in its infancy, owing to a lack of technology that can classify these diseases easily and so support healthcare professionals in making better diagnoses. There is also a lack of studies that classify the foodborne diseases common in Ethiopia. In view of these facts, this study investigates the topic and addresses the observed research gap using machine learning models, which can learn from past data, identify patterns, and make decisions with minimal human intervention. Such applications are popular in the healthcare and biomedical domain for the early detection of diseases and for supporting better diagnosis. The study focuses on some of the prevalent foodborne illnesses in Ethiopia, selected in consultation with medical experts. To achieve its objective, the researcher used an experimental research design and a mixed research approach (both quantitative and qualitative). Secondary data on foodborne diseases were collected from hospitals. Python was used for most of the research activities, such as data preprocessing, analysis, model training, and testing, and Edraw Max, chosen for its features, was used to design the conceptual model. After preprocessing the collected data, the researcher trained models using state-of-the-art machine learning algorithms: Decision Tree, Random Forest, XGBoost, and a Stacking ensemble learning method. In the experiments, the Stacking ensemble model outperformed the others with an accuracy of 98.1%, followed by Random Forest, XGBoost, and Decision Tree with accuracies of 97.5%, 96.9%, and 96.5%, respectively. The results indicate that the Stacking ensemble learning model is suitable for disease classification.
  • Item
    KEYSTROKE DYNAMICS BASED MULTI-FACTOR AUTHENTICATION USING MACHINE LEARNING
    (Hawassa University, 2024-11) MESERET DEGEFI
    User authentication is a vital part of securing digital services and preventing unauthorized users from gaining access to a system. Organizations now use Multi-Factor Authentication (MFA), which provides robust protection by combining two or more identity checks, instead of Single-Factor Authentication (SFA), which has become less secure. Keystroke dynamics is a behavioural biometric that examines a user’s typing rhythm to determine whether the person using the system is legitimate. It has a minimal implementation cost and needs no special hardware, since gathering typing data is reasonably straightforward and requires no additional effort from the user. In this research we used the CMU fixed benchmark keystroke dynamics data set, which contains 20,400 samples. The data set records the keystroke dynamics of 51 users, each of whom typed the same password, .tie5Roanl, 400 times over 8 sessions, with 50 repetitions in each session. We tested four machine learning algorithms (Random Forest, Support Vector Machines, Multi-Layer Perceptron, and Extra Trees) to determine which algorithm is most accurate. We compared their performance using the Accuracy, Precision, Recall, and F1-score evaluation metrics. The Random Forest classifier achieved the highest accuracy (99.19%), and these results indicate which machine learning method is most effective at accurately authenticating users.
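The figures in the abstract above can be checked with a short sketch. It is illustrative only: the thesis code is not shown in the listing, the function names are hypothetical, and the metrics are given in their standard binary form (a multi-user classifier would average them over classes in some way the abstract does not specify).

```python
# Illustrative sketch only; names are hypothetical and not taken from the thesis.

USERS = 51             # subjects in the benchmark, as stated in the abstract
SESSIONS = 8           # sessions per subject
REPS_PER_SESSION = 50  # password repetitions per session


def dataset_size(users=USERS, sessions=SESSIONS, reps=REPS_PER_SESSION):
    """Total number of typing samples: users x sessions x repetitions."""
    return users * sessions * reps


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(dataset_size())                 # 51 * 8 * 50 samples
    print(binary_metrics(8, 2, 2, 8))     # made-up counts, for illustration
```

The sample count follows directly from the collection protocol described in the abstract; the confusion-matrix counts in the usage lines are invented purely to exercise the function.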
  • Item
    AMHARIC EXTRACTIVE TEXT SUMMARIZATION USING AmRoBERTa-BiLSTM MODEL
    (Hawassa University, 2024-05) EDEN AHMED
    Extractive text summarization is a crucial task in natural language processing, allowing users to quickly grasp the main ideas of lengthy documents. Manual summarization is often labor-intensive and time-consuming. As the volume of information in the Amharic language continues to grow, effective summarization systems have become essential. While various summarization techniques have been developed for many languages, research focused specifically on Amharic remains limited. Most existing studies rely on traditional methods that lack contextual embeddings, which are crucial for understanding meaning within the text. In addition, current approaches often struggle to capture long-range dependencies among sentences, and none of the existing studies has used hybrid deep models, which have demonstrated state-of-the-art performance in summarization tasks for other languages. This study addresses extractive text summarization for Amharic news articles by proposing a hybrid deep learning model that combines the contextual understanding of AmRoBERTa with the sequential processing capabilities of Bidirectional Long Short-Term Memory (BiLSTM). A dataset of 1,200 Amharic news articles covering a variety of topics was collected. Each article was segmented into sentences, and experts labeled each sentence to indicate its relevance for summarization. Preprocessing, including normalization and tokenization using AmRoBERTa, prepared the data for modeling. The proposed model was trained using various hyperparameter configurations and optimization techniques, and its effectiveness was evaluated using ROUGE metrics. The model achieved a ROUGE-1 score of 44.48, a ROUGE-2 score of 34.73, and a ROUGE-L score of 44.47.
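As a rough illustration of the ROUGE-N metrics named in the abstract above, the sketch below computes clipped n-gram overlap between a candidate summary and a reference. It is a simplified version under stated assumptions: whitespace tokenization (the thesis tokenizes with AmRoBERTa), a single reference, and scores on a 0 to 1 scale rather than the 0 to 100 scale reported. The function names are hypothetical, and ROUGE-L (longest common subsequence) is not covered.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list (empty if the list is too short)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) for ROUGE-N with clipped n-gram overlap.

    Assumption: plain whitespace tokenization, one reference summary.
    """
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # English toy sentences, used only to exercise the function.
    print(rouge_n("the cat sat", "the cat sat on the mat", n=1))
    print(rouge_n("the cat sat", "the cat sat on the mat", n=2))
```

An established ROUGE implementation would normally be used for reported results, since stemming and tokenization choices change the scores.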
  • Item
    COLLABORATIVE APPROACH OF AGILE AND DEVOPS FOR CONTINUOUS DELIVERY OF QUALITY SOFTWARE
    (Hawassa University, 2023-08) DESSALEGN MENGESHA
    Many organizations today demand quality software in order to achieve their goals, and organizations around the globe have shown great interest in automating their business processes. This in turn has driven the emergence and improvement of software development methodologies and has dramatically changed the way services are provided. Among those methodologies, Agile software development methodologies and the DevOps culture and toolset have become popular because of their ability to support rapid software development, continuous integration, and continuous delivery. Although the two are complementary and each has a significant role in the software development lifecycle, using them independently will not bring the development process to its optimum level. Contextualizing the software development process enables practitioners to improve it and achieve better productivity. The objective of this thesis is to integrate the two approaches, with minor modifications to the DevOps team structure, by extending the role of the DevOps team into the development environment. The research was conducted as an experiment and evaluated on two working projects: one using classical Agile as the control group, and the other using the integrated Agile and DevOps approach as the experimental group. The number of changes accepted and developed and the number of deliveries in a specific period were used as measurement parameters. The experiment was carried out with students who joined the Hawassa University Application Development Team for practical attachments. The findings show that the experimental group project, which used Agile methodologies together with DevOps practices, achieved better outcomes than the control group project, which relied on the department's standard Agile/Scrum approach. This improvement was evident in metrics such as accepted changes and committed deliveries. Furthermore, the guideline applied to the experimental group project was refined and is included in this paper as a resource for future researchers and developers.
  • Item
    MORPHOLOGICAL ANALYSIS FOR AFAAN OROMOO USING DEEP LEARNING APPROACHES
    (Hawassa University, 2024-08) BOKI CHELKEBA CHALI
    Afaan Oromoo, a widely spoken language in Ethiopia and neighbouring countries, presents unique challenges due to its complex morphological structure. Morphological analysis, which decomposes words into morphemes and assigns grammatical information, is a crucial natural language processing (NLP) task for this language. Earlier research on Afaan Oromoo morphological analysis used rule-based and traditional machine learning techniques. Rule-based methods are labour-intensive and time-consuming, especially with large datasets, while traditional machine learning approaches struggle with feature extraction and high-dimensional vector spaces, leading to information loss. This study addresses these challenges by employing deep learning architectures, including Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), and Bidirectional LSTMs (BiLSTMs), which have not previously been applied to Afaan Oromoo morphological analysis. Comprehensive evaluations were conducted on a dataset consisting of 30,636 training words, 10,213 validation words, and 4,539 testing words. Performance metrics such as accuracy, precision, recall, and F1-score were used to evaluate the models. The evaluation results for each model are as follows: Normal CNN-LSTM with 70.94% accuracy, Word2Vec CNN-LSTM with 94.74% accuracy, FastText CNN-LSTM with 95.25% accuracy, Normal LSTM with 95.06% accuracy, Word2Vec LSTM with 93.89% accuracy, FastText LSTM with 90.02% accuracy, Normal GRU with 92.96% accuracy, Word2Vec GRU with 91.98% accuracy, FastText GRU with 91.32% accuracy, Normal BiLSTM with 95.24% accuracy, Word2Vec BiLSTM with 96.21% accuracy, and FastText BiLSTM with 96.43% accuracy. The BiLSTM models, particularly those using Word2Vec and FastText embeddings, achieved the highest accuracies, highlighting the effectiveness of deep learning approaches and neural word embedding techniques in Afaan Oromoo morphological analysis. This research not only advances the state of the art in this domain but also provides a robust methodology for handling the morphological complexity of Afaan Oromoo using deep learning.
  • Item
    AMHARIC MULTI-HOP QUESTION ANSWERING IN HISTORICAL TEXTS: A DEEP LEARNING APPROACH
    (Hawassa University, 2024-11) BEREKET ENDALE
    In our daily lives, questioning is the most effective way to gain knowledge. However, manual extraction of answers is time-consuming and requires expertise in the field. Implementing automated question answering could therefore speed up extraction and reduce the need for human labour. Numerous studies have addressed question answering in high-resource languages such as English using various recent techniques. Unlike previous research, which concentrated exclusively on single-hop question answering, this thesis proposes multi-hop question answering in Amharic. To date, no studies have investigated multi-hop question answering for the Amharic language, which involves reasoning over multiple pieces of evidence or documents to generate an answer. Furthermore, there is no existing question answering dataset that addresses these issues; this study therefore applies deep learning, a neural network method, to the Amharic multi-hop question answering problem. To do this, we preprocess our dataset using tokenization, normalization, stop word removal, and padding before feeding it to deep learning models (CNN, LSTM, and Bi-LSTM) that classify the question type based on the given input. Because there is no multi-hop question answering training dataset in Amharic, the training data had to be created manually, which is time-consuming and tedious. The resulting dataset contains around 1,500 questions and contexts associated with five classes. The classes are encoded as (0) for factoid_date, (1) for factoid_person, (2) for factoid_location, and (3) for factoid_organization. Accuracy, precision, the F-measure, and the confusion matrix are the performance metrics used to evaluate the models on this dataset. The highest accuracies achieved by the LSTM, CNN, and Bi-LSTM models were 96%, 96.38%, and 97.04%, respectively. The findings indicate that the proposed Bi-LSTM outperformed the other two models in Amharic multi-hop question type classification.
  • Item
    WATER CONSUMPTION PREDICTION USING MACHINE LEARNING: THE CASE OF HAWASSA CITY WATER SUPPLY AND SEWAGE SERVICE ENTERPRISE
    (Hawassa University, 2024-11) MUSE KEBEDE MULATU
    Proper management of water consumption helps ensure a clean and healthy community. Predicting water consumption therefore gives time to prepare and to protect the community from unforeseen natural or other disasters. Previous studies have implemented many prediction models in specific areas that showed promise but were not applicable in developing countries. This study was conducted to develop a water consumption prediction model for the Hawassa City Water Supply and Sewerage Service Enterprise (HCWSSSE) in Hawassa, a city in the Sidama region, Ethiopia. The enterprise experienced water shortages because its predictions were based solely on the previous month's consumption rate and did not account for seasonal changes. The models developed in the study apply machine learning techniques to five-year monthly consumption data from 2009-2015 E.C. of the Ethiopian budget year, with around 16012 data points, split into 80% for training, 10% for validation, and 10% for testing. The study explores several machine learning algorithms for prediction, including Random Forest (RF), Support Vector Regressor (SVR), Linear Regression (LR), and XGBoost. Model performance was evaluated using the error metrics Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE), together with R-squared (R²). The R² values for training, validation, and testing were, respectively: Random Forest (RF) 97.23%, 97.24%, and 97.22%; Linear Regression (LR) 78.18%, 78.38%, and 77.98%; Support Vector Regressor (SVR) 79.37%, 79.92%, and 78.81%; and XGBoost 97.08%, 97.07%, and 97.08%. Random Forest and XGBoost showed promise in prediction and handled the complex dataset effectively. Specifically, Random Forest offered better predictions with reduced risk of overfitting. The successful application of RF and XGBoost highlights the importance of leveraging machine learning for sustainable water management in an era of growing demand and climate variability.
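The 80/10/10 split and the error metrics named in the abstract above can be sketched in plain Python. This is a minimal illustration, not the thesis code: the function names are hypothetical, and the split here is a simple sequential slice, whereas the abstract does not say whether the data were shuffled or split chronologically.

```python
import math


def split_80_10_10(rows):
    """Slice a sequence into 80% train, 10% validation, 10% test.

    Assumption: sequential slicing with no shuffling.
    """
    n = len(rows)
    train_end = n * 8 // 10
    val_end = n * 9 // 10
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    train, val, test = split_80_10_10(list(range(100)))
    print(len(train), len(val), len(test))
    # Made-up values, used only to exercise the metric functions.
    print(r2([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

An R² reported as a percentage, as in the abstract, corresponds to the value returned here multiplied by 100.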
  • Item
    IOT-BOTNET ASSISTED DDOS ATTACK DETECTION AND CLASSIFICATION USING GRAPH MACHINE LEARNING APPROACH
    (Hawassa University, 2024-11) MELAKU ABRIHAM
    Distributed denial-of-service (DDoS) attacks are a major threat on the Internet, especially with the increasing use of the Internet of Things (IoT). The rise of IoT-based botnets has made DDoS attacks even more common and dangerous. In response, researchers have developed various DDoS attack detection models for IoT networks, but new techniques are still needed to combat these evolving threats. In this study, we propose a model that uses Graph Neural Networks (GNNs) to analyze network flow data and to detect and classify attack traffic in IoT networks. We conducted experiments using the CIC-BoT-IoT and CICIoT2023 datasets, which contain both normal and attack network traffic. We preprocessed the data, applied the SMOTE technique to address class imbalance, and constructed a graph structure from the training and test datasets. Our model leverages the natural structure of network information to classify network traffic, focusing in particular on IoT botnet DDoS attacks. The proposed classifier achieved high accuracy, with a score of 99.14% on the CIC-BoT-IoT dataset and 99.39% on the CICIoT2023 dataset. The F1 score, recall, and AUC-ROC also showed good performance, indicating the effectiveness of the model in detecting IoT botnet DDoS attacks. These results suggest that our algorithm surpasses existing methods and holds promise for enhancing IoT security in real-world applications.
  • Item
    ANALYSIS AND MODELING OF 5G NETWORK PERFORMANCE BASED ON RESPONSE TIME REDUCTION
    (Hawassa University, 2023-11) MEKASHA MEKURIA
    Earlier generations of mobile networks already provided adequate bandwidth for daily usage, and response time was not then a major concern. For today's applications such as VANET and online gaming, and for vertical-industry technologies such as SDN (software-defined networking), NFV (network function virtualization), URLLC (ultra-reliable low-latency communication), backhaul connections, and control or location-update information, response time is more crucial than throughput. Reduced response time is therefore the dominant requirement in 5G, and understanding how to attain it is central to meeting the key requirements of a 5G system. To address these challenges and gaps, the study analyzed the numerologies of 5G NR (New Radio), identifying KPIs for cellular system analysis based on human demands and technological efforts, and used the 5G Toolbox to simulate the 5G numerologies. The simulation results show that the proposed approach outperforms state-of-the-art techniques because it yields the highest probability of meeting the access-network requirements for response time reduction. As a practical implication, the researcher found that the adaptable subframe structure leads to a very short symbol duration, which enables low response time as time-critical applications increase, and that wider subcarrier spacing could be used to give users a very short symbol duration and hence a low response time. Future work is planned in several directions: channel modeling of the mmWave band, which is relatively complex and has no perfect channel model; high-capacity backhaul connectivity, which is challenging given the exponentially growing data demands of 5G and requires deeper exploration; spectrum and interference management, since the scarcity of spectrum resources and interference issues call for efficient management of the 5G spectrum; and a comparative performance analysis.
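The link between wider subcarrier spacing and shorter symbol duration mentioned in the abstract above follows from the standard 5G NR numerology relationships, sketched below. This is not the study's simulation (which the abstract says used the 5G Toolbox); the function names are hypothetical, and the formulas are the usual ones: subcarrier spacing of 15 kHz times 2^μ, 2^μ slots per 1 ms subframe, and a useful OFDM symbol duration (cyclic prefix excluded) equal to the reciprocal of the subcarrier spacing.

```python
# Illustrative sketch of standard 5G NR numerology relationships.
# Function names are hypothetical; numerology index mu ranges over 0..4.


def subcarrier_spacing_khz(mu):
    """Subcarrier spacing in kHz: 15 * 2**mu."""
    if mu not in range(5):
        raise ValueError("numerology index mu must be in 0..4")
    return 15 * 2 ** mu


def slot_duration_ms(mu):
    """Slot duration in ms: a 1 ms subframe holds 2**mu slots."""
    subcarrier_spacing_khz(mu)  # reuse the range check
    return 1.0 / 2 ** mu


def useful_symbol_duration_us(mu):
    """Useful OFDM symbol duration in microseconds (cyclic prefix excluded)."""
    return 1000.0 / subcarrier_spacing_khz(mu)


if __name__ == "__main__":
    for mu in range(5):
        print(mu, subcarrier_spacing_khz(mu), slot_duration_ms(mu),
              round(useful_symbol_duration_us(mu), 2))
```

Each step up in μ doubles the subcarrier spacing and halves both the slot and the symbol duration, which is the mechanism behind the lower response time described in the abstract.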
  • Item
    Maize Crop Yield Prediction Using Machine Learning Techniques
    (Hawassa University, 2023-11) Kedija Abdurhman
    Maize is one of the main crops cultivated throughout the world, including in Ethiopia. However, maize production varies widely with many factors, such as weather, soil quality, and fertilizer usage. Predicting maize yields is crucial for farmers because it allows them to make informed crop management decisions. Machine learning approaches have shown promise in predicting crop productivity in recent years. The goal of this thesis is to explore the use of ensemble methods, namely Artificial Neural Networks (ANN), Support Vector Machines (SVM), and Decision Trees (DT), for maize yield prediction. Ensemble methods combine predictions from multiple models to improve accuracy and strengthen the reliability of the forecast. The dataset used in this study was compiled from data collected between 2003 and 2022 on several aspects such as weather, soil quality, and maize production. Before the ensemble techniques were developed, the dataset was preprocessed and the features were normalized. The results show that ensemble techniques can predict maize yields with strong performance: the MAE was 0.0025, the MSE was 0.0032, the RMSE was 0.0057, and the R² was 0.9928. The results also show that meteorological factors such as rainfall and temperature have a considerable impact on maize yields, and the model identified soil quality as another important factor influencing maize production. Farmers can use the study's findings to make informed decisions about their agricultural practices.