Volume 17
Abstract: Increasing digitalization has made it possible to track and measure every click, every payment, every message, and almost every thought that people have each day. Companies are extremely interested in the robustness of these data, specifically for understanding consumer sentiment. Yet the amount of information being produced, processed, and communicated is staggering and causes information overload. As a result, companies tend to fall into analysis paralysis, which can result in missing important insights that could help their business. The goal of this study is to analyze and categorize the top posts on multiple hacking subreddits to determine the most discussed topics and to examine the sentiment expressed by users in those top posts. We began by scraping data, specifically the title, ID, score, comments, and URL of each top post, from multiple hacking subreddit communities, and then used the Natural Language Toolkit (NLTK) to perform data preprocessing in support of an effective analytic process and bias-free results. The analysis filtered the posts and determined whether their sentiment was positive, negative, or neutral. In the case of the hacking subreddits, many of the posts expressed a neutral opinion. This study aims to contribute by using Natural Language Processing methods, namely topic modeling techniques such as Term Frequency-Inverse Document Frequency and the Latent Semantic Analysis (LSA) algorithm, together with Sentiment Analysis, to gather and synthesize cybersecurity data.

Download this article: JISAR - V17 N1 Page 64.pdf

Recommended Citation: Omakwu, S., Wimmer, H., Rebman, C. (2024). Using Textual Analytics to Process Information Overload of Cyber Security Subreddits. Journal of Information Systems Applied Research 17(1), pp. 64-74. https://doi.org/10.62273/AJJR5232
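
As an illustration of the Term Frequency-Inverse Document Frequency weighting named in the abstract, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the post titles, the small stop-word list, and the helper names `tokenize`, `tf_idf`, and `top_terms` are assumptions made up for this example, and the study itself reports using NLTK for preprocessing. The LSA and sentiment steps are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical post titles, invented for illustration only.
POSTS = [
    "New ransomware campaign targets hospitals",
    "How to learn password cracking legally",
    "Ransomware group leaks hospital data",
]

# Tiny illustrative stop-word list (a stand-in for NLTK's stop words).
STOP_WORDS = {"a", "an", "and", "how", "the", "to"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / document frequency).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Number of documents in which each term appears at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(score_map, k=3):
    """Highest-weighted terms, with ties broken alphabetically."""
    return sorted(score_map, key=lambda t: (-score_map[t], t))[:k]


weights = tf_idf(POSTS)
for title, score_map in zip(POSTS, weights):
    print(title, "->", top_terms(score_map))
```

Terms that occur in many posts, such as "ransomware" here, receive lower weights than terms unique to a single post, which is what lets the weighting surface the distinguishing vocabulary of each post.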