Active learning on Indonesian Twitter sentiment analysis using uncertainty sampling
Abstract
Sentiment analysis of social media is a rapidly developing research area. It is typically framed as supervised learning, which requires annotated data, and annotating text for sentiment is notoriously time-consuming. Active learning is an effective strategy for reducing this cost: a model is trained on a small labeled subset of the dataset, and a sampling strategy selects which of the remaining unlabeled examples should be annotated next. This study compares two active learning strategies, random sampling and boundary sampling, applied to logistic regression and random forest models. We evaluate both model performance and the data savings these strategies achieve for traditional machine-learning sentiment analysis on Twitter, using a dataset with two labels: positive and negative. Our results show that active learning can substantially reduce the amount of training data required, saving up to 65% of the training data needed to reach peak model accuracy. The best-performing model, a random forest with margin sampling, achieves an accuracy of 81.12% and an F1 score of 88.60%. These findings demonstrate the effectiveness of active learning for sentiment analysis and its potential to improve both model performance and resource efficiency. In particular, they support combining random forest models with margin sampling for more efficient sentiment analysis of social media.
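The abstract's best configuration uses margin sampling, an uncertainty-sampling strategy, and the study also evaluates random sampling. As a minimal sketch (not the authors' code), the query step of each strategy can be written in plain Python, assuming the classifier (for example logistic regression or random forest) already outputs a class-probability vector for every unlabeled tweet. All function names here are hypothetical, and `data_saving` assumes savings are measured as the fraction of the training pool left unlabeled when peak accuracy is reached.

```python
import random
from typing import List, Sequence

# One class-probability vector per unlabeled example, e.g. [p_negative, p_positive].
Probs = Sequence[float]


def margin(probs: Probs) -> float:
    """Gap between the two most probable classes; a small gap means the model is uncertain."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def margin_sampling(pool_probs: List[Probs], k: int) -> List[int]:
    """Hypothetical helper: indices of the k pool examples with the smallest margin."""
    order = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return order[:k]


def random_sampling(pool_size: int, k: int, seed: int = 0) -> List[int]:
    """Hypothetical baseline: k pool indices drawn uniformly without replacement."""
    return random.Random(seed).sample(range(pool_size), k)


def data_saving(n_labeled_at_peak: int, n_total: int) -> float:
    """Assumed definition: fraction of the training pool that never had to be labeled."""
    return 1.0 - n_labeled_at_peak / n_total


if __name__ == "__main__":
    # Toy probabilities for four unlabeled tweets (made up for illustration).
    pool = [[0.9, 0.1], [0.55, 0.45], [0.3, 0.7], [0.51, 0.49]]
    print(margin_sampling(pool, 2))   # [3, 1]: the two least confident tweets
    print(random_sampling(len(pool), 2))
```

In a full active learning loop, the selected indices would be sent to annotators, moved from the unlabeled pool into the training set, and the model retrained before the next query round.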
Keywords
active learning; uncertainty sampling; logistic regression; random forest; sentiment analysis
Journal of Applied Data Sciences
ISSN: 2723-6471 (Online)
Organized by: Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia.
Website: http://bright-journal.org/JADS
Email: taqwa@amikompurwokerto.ac.id (principal contact); support@bright-journal.org (technical issues)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.