Authors
Nicole Ma1, Yu Sun2, 1Sage Hill School, USA, 2California State Polytechnic University, USA
Abstract
Instances of hate speech on popular social media platforms such as Twitter are becoming increasingly common and intense. However, there still exists a lack of comprehensive deeplearning models to combat Twitter hate speech. In this project, a comprehensive detection and reporting platform, entitled “TweetWatch,” was created to solve this issue. A binary classification CNN (Convolutional Neural Network) and a multi-class CNN were created to detect hate speech from real-time Twitter data and classify tweets with hate speech into five categories. The binary classification model has an AUC score of 98.95% and an F1 score of 97.88%. The multi-class classification model has an AUC score of 89.46%. All metrics reached over a targeted 5% increase from previous models in multiple papers, validating the proposed solution. Additionally, the only real-time choropleth map for hate speech in the United States was successfully created.
Keywords
Web scraping, Natural language processing, Deep learning, Neural networks.