Authors
Zewdie Mossie and Jenq-Haur Wang, National Taipei University of Technology, Taiwan
Abstract
The anonymity of social networks makes them attractive to those who spread hate speech, allowing them to mask criminal activities online and posing a challenge to the world and to Ethiopia in particular. With the ever-increasing volume of social media data, identifying hate speech, which aggravates conflict between citizens of nations, becomes a challenge. Because such data is produced at a high rate, it has become difficult to collect, store, and analyze it using traditional detection methods. This paper proposes the application of Apache Spark to hate speech detection to reduce these challenges. The authors developed an Apache Spark based model to classify Amharic Facebook posts and comments as hate or not hate. The authors employed Random Forest and Naïve Bayes for learning, and Word2Vec and TF-IDF for feature selection. Tested by 10-fold cross-validation, the model based on Word2Vec embedding performed best, with 79.83% accuracy. The proposed method achieves a promising result while drawing on the unique features of Spark for big data.
Keywords
Amharic hate speech detection, Social networks and Spark, Amharic posts and comments
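
Illustrative sketch
The evaluation protocol named in the abstract, 10-fold cross-validation of a Naïve Bayes text classifier, can be sketched in plain Python as follows. This is not the authors' Spark implementation: the function names, the add-one smoothing, the contiguous fold split, and the toy English tokens standing in for Amharic posts are all assumptions made for illustration, and the sketch uses raw word counts rather than TF-IDF or Word2Vec features.

```python
import math
from collections import Counter, defaultdict


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_nb(docs, labels):
    """Collect the counts needed by a multinomial Naive Bayes classifier."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(docs, labels, k=10):
    """Average accuracy over k folds, each held out once for testing."""
    correct = 0
    for held_out in k_fold_indices(len(docs), k):
        held = set(held_out)
        train_idx = [i for i in range(len(docs)) if i not in held]
        model = train_nb([docs[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_nb(model, docs[i]) == labels[i]
                       for i in held_out)
    return correct / len(docs)


# Toy, made-up data (hypothetical placeholder tokens, not from the paper).
docs = [["bad", "awful"], ["good", "nice"]] * 10
labels = ["hate", "not_hate"] * 10
accuracy = cross_validate(docs, labels, k=10)
print(f"10-fold accuracy on toy data: {accuracy:.2%}")
```

In a Spark setting, the same steps would more likely be expressed with MLlib components such as `HashingTF`, `IDF`, `Word2Vec`, `NaiveBayes`, `RandomForestClassifier`, and `CrossValidator` with `numFolds=10`; whether the authors used exactly these classes is not stated in the abstract.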