Social Media Sentiment Analysis: Taking Microblogs as an Example
An empirical study of positive and negative sentiment classification of 1.6 million tweets using a hybrid CNN-RNN deep learning model.
Published: 23/12/2025
Key Chapter Titles
- Introduction
- Related Research
- Sentiment Analysis
- Word Embedding
- Deep Learning
- Types of Deep Learning Algorithms and Techniques
- Recurrent Neural Networks
- Evaluation Metrics and Methods
- Research Methodology
- Experimental Results
- Conclusion
Document Introduction
As social media grows more influential, public sentiment analysis has become an important tool in areas such as analyzing corporate market response, predicting election outcomes, and forecasting macroeconomic trends. Twitter, a globally known microblogging and social networking platform, has over 200 million registered users and 100 million active users, who generate approximately 250 million tweets per day. This massive volume of unstructured data is a rich source for sentiment mining, but it also poses technical challenges. This study addresses sentiment analysis on Twitter, with the core objective of building an efficient model that accurately classifies tweets as positive or negative.
The study first outlines the core concepts and applications of sentiment analysis and clarifies the levels at which Subjectivity and Sentiment Analysis (SSA) is performed: document-level, sentence-level, and aspect-level sentiment classification. It then reviews related work in the field, including machine learning algorithms, semantic analysis techniques, and the integration of sentiment lexicons, which provides the theoretical foundation for the model that follows.
Methodologically, the study takes a deep learning approach and builds a hybrid model that combines Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). It uses Long Short-Term Memory (LSTM) networks to address the long-term dependency problem of standard RNNs, which struggle to retain information across long sequences. Data processing follows a standardized workflow: acquiring and preprocessing the dataset (removing stop words, punctuation, special characters, etc.), representing text as vectors using word embeddings, and splitting the data into an 80% training set and a 20% test set. These steps are intended to ensure data quality and effective model training.
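The preprocessing, text-representation, and splitting steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's code: the stop-word list, the regular expressions, the vocabulary size, the sequence length, and the helper names (`clean_tweet`, `build_vocab`, `encode`, `train_test_split`) are all hypothetical, and `train_test_split` here is a local helper, not scikit-learn's function of the same name. The output is padded integer-id sequences of the kind a word embedding layer would then map to dense vectors; the embedding layer and the CNN-LSTM model themselves are not shown.

```python
import random
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; the source does not say which
# list was used. Negations such as "not" are kept because they carry sentiment.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "i", "to", "and", "of", "it", "my", "this", "so"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # anything that is not a-z or whitespace


def clean_tweet(text: str) -> list[str]:
    """Lowercase a tweet, strip URLs, @mentions and non-letters, then drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)  # punctuation, digits, emoji, '#', ...
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_vocab(token_lists: list[list[str]], max_words: int = 10_000) -> dict[str, int]:
    """Map the most frequent words to integer ids; 0 = padding, 1 = out-of-vocabulary."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<oov>": 1}
    for word, _ in counts.most_common(max_words - 2):
        vocab[word] = len(vocab)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int], max_len: int = 20) -> list[int]:
    """Turn tokens into a fixed-length id sequence (truncate, then right-pad with 0)."""
    ids = [vocab.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


def train_test_split(samples: list, test_fraction: float = 0.2, seed: int = 42):
    """Shuffle reproducibly and hold out `test_fraction` of the samples (20% by default)."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_test = int(len(samples) * test_fraction)
    return samples[n_test:], samples[:n_test]


if __name__ == "__main__":
    raw = [
        ("I love this phone!! http://t.co/abc", 1),
        ("@bob worst service ever... not happy", 0),
        ("What a great day :)", 1),
        ("so tired of the rain #ugh", 0),
        ("Best concert of my life", 1),
    ]
    tokens = [clean_tweet(text) for text, _ in raw]
    vocab = build_vocab(tokens)
    data = [(encode(toks, vocab, max_len=8), label) for toks, (_, label) in zip(tokens, raw)]
    train, test = train_test_split(data)
    print(tokens[1])              # ['worst', 'service', 'ever', 'not', 'happy']
    print(len(train), len(test))  # 4 1
```

Keeping negations such as "not" out of the stop-word list is a deliberate choice in this sketch, since removing them can invert the sentiment of a tweet; the source does not state how the study handled this.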
Model training and validation use a dataset of 1.6 million tweets (800,000 positive and 800,000 negative) obtained from kaggle.com. Performance is evaluated with accuracy, precision, recall, and F1-score. In the experiments, the model achieved a 93.91% success rate in recognizing tweet sentiment. The word embedding layer, the tuning of LSTM unit parameters, and the use of the Adam optimization algorithm were key contributors to classification accuracy.
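For reference, the four evaluation metrics can be computed from a binary confusion matrix as in the following sketch. It assumes labels are encoded as 1 (positive) and 0 (negative) and treats positive sentiment as the positive class; the source does not say whether its precision, recall, and F1 figures are per-class or averaged, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary classifier.

    `positive` selects which label counts as the positive class (assumed 1 here).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Tally the confusion-matrix cells.
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1

    accuracy = (tp + tn) / len(y_true)
    # Return 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
    metrics = classification_metrics(y_true, y_pred)
    print({name: round(value, 3) for name, value in metrics.items()})
    # {'accuracy': 0.7, 'precision': 0.6, 'recall': 0.75, 'f1': 0.667}
```

F1 is the harmonic mean of precision and recall, so a classifier cannot score well on it by favoring one of the two at the expense of the other.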
This research demonstrates the effectiveness of deep learning models for sentiment analysis of unstructured social media data. The resulting high-accuracy model can support decision-making in practical settings such as political marketing, and it provides a technical basis for later developing an integrated real-time sentiment analysis system. The dataset processing workflow, model architecture design, and hyperparameter choices developed in the study also offer reusable practical guidance for similar social media sentiment analysis studies.