Authors
Ervin Vladić, Benjamin Mehanović, Mirza Novalić, Dino Kečo, and Dželila Mehanović, International Burch University, Bosnia and Herzegovina
Abstract
Over the past decade, social networks have emerged as primary platforms for sharing opinions and holding discussions. At the same time, advances in machine learning (ML) and natural language processing (NLP) have enabled new approaches to analyzing the large volumes of data that users create. This research applies data loading, class imbalance handling, text preprocessing and tokenization, sentiment analysis, and model assessment techniques to classify the sentiment of tweets. Using accuracy, precision, recall, and F1 score as metrics, the study finds that Support Vector Machine (SVM) and Logistic Regression are the most suitable machine learning models for this task. SVM attained an accuracy of 90% on the training set and 77% on the test set, while Logistic Regression attained 83% on the training set and 78% on the test set.
Keywords
Social Media Analysis, Classification Models, Data Preprocessing
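The abstract evaluates models by accuracy, precision, recall, and F1 score. As a reminder of how these four metrics relate, the following is a minimal sketch that computes them from true and predicted labels. It assumes a binary setting with one positive class; the function name `binary_metrics` and the label lists are hypothetical and are not taken from the study's code or data.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: correct (tp) or a false alarm (fp).
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted negative: a miss (fn) or correct (tn).
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for illustration only (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
```

Because F1 is the harmonic mean of precision and recall, it drops sharply when either one is low, which makes it a useful complement to accuracy when classes are imbalanced.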