Unsupervised Detection of Violent Content in Arabic Social Media

Authors

Kareem E Abdelfatah (1, 3), Gabriel Terejanu (1) and Ayman A Alhelbawy (2, 3)

(1) University of South Carolina, USA; (2) University of Essex, United Kingdom; (3) Fayoum University, Egypt

Abstract

A monitoring system is proposed to detect violent content in Arabic social media. This is a new and challenging task because of the many Arabic dialects used on social media and the non-violent contexts in which violent words may appear. We propose using a probabilistic nonlinear dimensionality reduction technique, the sparse Gaussian process latent variable model (SGPLVM), followed by k-means clustering to separate violent from non-violent content. The framework does not require any labelled corpora for training. We show that violent and non-violent Arabic tweets are not separable by k-means in the original high-dimensional space; however, better results are achieved by clustering in the low-dimensional latent space of the SGPLVM.
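The second stage of the pipeline, k-means clustering on low-dimensional latent coordinates, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each tweet has already been mapped to a latent vector (in the paper, by the SGPLVM, which is not reproduced here), uses plain Lloyd's k-means with a deterministic farthest-point initialisation chosen only for reproducibility, and feeds it hypothetical toy coordinates rather than real data.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points: List[Point], k: int = 2, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means; returns a cluster label per point and the final centroids."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic init (an assumption for this sketch): start from the first point,
    # then repeatedly add the point farthest from all centroids chosen so far.
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _dist2(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            new_centroids.append(_mean(members) if members else centroids[j])
        if new_labels == labels and new_centroids == centroids:
            break  # converged: nothing changed in this iteration
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical 2-D latent coordinates for six tweets (toy data, not from the paper).
latent = [(0.1, 0.2), (0.0, -0.1), (-0.2, 0.1), (3.0, 3.1), (2.9, 2.8), (3.2, 3.0)]
labels, centroids = kmeans(latent, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Cluster indices are arbitrary: deciding which cluster corresponds to violent content would still require inspecting its members, since the method uses no labelled data.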

Keywords

Violence, Social Media, Arabic, SGPLVM, Dimensionality Reduction, Unsupervised learning

Volume 7, Number 4