Computer Science
Permanent URI for this collection: https://etd.hu.edu.et/handle/123456789/76
Item: EXPLORING A BETTER FEATURE EXTRACTION METHOD FOR AMHARIC HATE SPEECH DETECTION (Hawassa University, 2021-10-08) YESUF MOHAMED YIMAM

Hate speech is speech that causes people to be attacked, discriminated against, and hated because of their personal and collective identities. As hate speech grows, it can lead to death and to the displacement of people from their homes and properties. Social media can spread hate speech widely. To address this problem, researchers have studied many ways to detect hate speech on social media in both international and local languages. Because the problem is so serious, it needs to be studied carefully and addressed with a variety of solutions. Previous studies classify a text as hate speech based on the frequency (occurrence) of words in a given dataset; they do not consider the role that each word plays in a sentence. The main purpose of this study is to design a method that generates hate speech features from a text by identifying the role of each word in its sentence, so that hate speech can be distinguished more effectively from other forms of speech. To this end, research related to this study was reviewed, and a new feature extraction method for Amharic hate speech detection was created. Training and testing the model requires a dataset, so posts and comments from 25 popular Facebook pages were collected. Whether speech is hateful should be determined by the law that prohibits hate speech. Therefore, using different filtering methods, posts and comments containing religious, ethnic, and hate words were selected and given to law experts for manual annotation. The law experts labeled 2590 items into three classes: Religion-hate, Ethnic-hate, and Non-hate. After dataset preparation, a new feature extraction method that can distinguish hate speech from other speech was developed.
The new feature extraction method and the feature extraction methods used in related studies were implemented and evaluated with three machine learning classification algorithms: SVM, NB, and RF. Across the evaluation metrics, the new feature extraction method performed better with every classification algorithm. Using 80% of the 2590 labeled items as the training set and the rest as the test set, the combination of SVM with the new feature extraction method achieved 96.2% average accuracy.
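The limitation of frequency-based features that the abstract describes, and the size of the 80/20 split, can be illustrated with a minimal Python sketch. This is not the thesis's method or code: the function names `bag_of_words` and `split_dataset` are hypothetical, the two toy sentences are English placeholders rather than items from the Amharic dataset, and the split is a plain shuffled split chosen for illustration (the abstract does not say how the split was drawn).

```python
import random
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return a count vector: how often each vocabulary word occurs in text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def split_dataset(items, train_percent=80, seed=0):
    """Shuffle a copy of items and split it into train and test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100  # integer arithmetic, no rounding
    return shuffled[:cut], shuffled[cut:]


# Two toy sentences (hypothetical) that use the same words in different roles.
sentence_a = "group x attacked group y"
sentence_b = "group y attacked group x"
vocabulary = sorted(set(sentence_a.split()) | set(sentence_b.split()))

vec_a = bag_of_words(sentence_a, vocabulary)
vec_b = bag_of_words(sentence_b, vocabulary)

# Frequency features cannot tell the two sentences apart, even though
# the roles of "x" and "y" are swapped.
same_features = vec_a == vec_b

# Split sizes for a dataset of 2590 labeled items at 80% / 20%.
train, test = split_dataset(range(2590))
print(vocabulary, vec_a, vec_b, same_features)
print(len(train), len(test))
```

Both sentences map to the same count vector, so a classifier that sees only word frequencies receives identical input for them. With 2590 labeled items, an 80/20 split gives 2072 training items and 518 test items.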
