Authors
Sanaz Rasti, Sarah Anne Dunne and Eugenia Siapera, University College Dublin, Ireland
Abstract
The rapid growth of Alt-tech platforms, together with concerns over their less stringent content moderation policies, makes them a good case for opinion mining. This study investigates the topics present in a specific Alt-tech channel on Telegram, using data collected at two time points, in 2021 and 2023. Three topic models, LDA, NMF and Contextualized NTM, were explored, and a model selection procedure was proposed to choose the best-performing model among them. To validate the model selection algorithm quantitatively and qualitatively, the approach was tested on publicly available labelled datasets. For all experiments, data was pre-processed using an effective NLP pre-processing procedure together with a stop-word list customised for Alt-tech content. Using the validated topic model selection algorithm, LDA topics with n-gram range = (4, 4) were extracted from the targeted Alt-tech dataset. The findings from the topic models were qualitatively evaluated by a social scientist and are further discussed. The work concludes that the proposed model selection procedure is effective for corpora of corresponding length and context. Avenues for future work are suggested to improve the outcome of Alt-tech topic modeling.
Keywords
Topic Modeling, Topic Model Selection, LDA, NMF, Contextualized NTM, Alt-tech