Reactions to the Ukraine-Russia War 2022 with Tweet Sentiment Analysis

Hoesu Chun, Yuna Lee, Cameron Taylor, Jinlin Ye, Yishin Gan, Shanshan Huan


Introduction/Motivation

People’s reactions on Twitter sometimes grow into social movements that affect world events. Accordingly, we performed sentiment and emotion analysis of tweets to measure and understand the opinions of large populations about the Russia-Ukraine conflict. We believe our interactive visualizations will help readers better understand the current situation and its trends, and can serve as an important reference for further analysis or related decision-making.

Problem Definition

The Russia-Ukraine conflict has been one of the most significant events on the world stage since last year (2021). Coming on the heels of the Covid-19 pandemic, Russia’s invasion of Ukraine has had far-reaching consequences that directly affect the economy, supply chains, and, most importantly, hundreds of thousands of Ukrainian lives. To explore and understand this serious issue in depth, our team decided to use a dataset of tweets. Twitter has been a valuable resource for measuring the opinions of large populations because it collects citizens’ unfiltered views. Twitter allows users to “tweet” short messages (up to 280 characters) sharing whatever is currently on their minds. Tweets can also include hashtags that signal the message the user hopes to convey, and hashtags are sometimes used to spark a social movement. Therefore, we believe that analyzing the sentiment and emotion of tweet text over time can reveal important messages about the conflict in countries worldwide.

Beyond exploring the tweets, we measured their sentiment and emotion with a pre-trained BERT model. Although sentiment analysis is popular in many fields of study, the Ukraine war is a relatively new topic, and our study aims to contribute new insight into this event. We investigated trends in opinion toward the war, measured opinion shifts in specific regions (such as the countries that suffered most from the economic or energy crises caused by the war), and studied the correlation between public opinion and conflict events.

However, because the original tweets dataset has no sentiment or emotion labels, we could not evaluate the model’s performance directly. To mitigate this problem, we chose a BERT model pre-trained specifically on tweets. Since tweets about the conflict come from the same domain as the model’s training data, this may reduce the gap between the model’s reported performance and its performance on our data.

We worked on this project for two months. We used Python for machine learning modeling and pre-processing and D3 for visualization.

Literature Survey

We referred to previous studies that also covered political topics on Twitter, including elections, propaganda, and military conflicts. (Kušen and Strembeck, 2018)[10] summarizes sentiment patterns of tweets posted by politicians during the Austrian presidential election, mainly describing the differences in reactions between winners and losers. (Caldarelli et al., 2020)[3] reveals that leading propaganda accounts on Twitter have more bot followers retweeting their content, and discusses user polarization and the manipulation of Twitter content. (Gabel et al., 2020)[8] analyzes Twitter usage and the categories of tweet content during the Jammu and Kashmir conflict. Beyond political topics, (Bashar et al., 2020)[2] provides volume analysis, sentiment detection, and semantic brand scores for Covid-19 comments in Australia. These four studies convinced us of the usefulness of analyzing tweet datasets for political or globally impactful events.

(Chen and Ferrara, 2022)[4] addresses the Ukraine-Russia conflict, which is consistent with our topic. It tracks word frequencies and hashtags in tweets mentioning the war in early 2022. However, we wanted to go beyond basic text analysis using an updated dataset, classifying the tweets by sentiment and emotion for our research and visualization.

Studies on text classification, such as (Da Silva et al., 2014)[7], (Hassan et al., 2017)[9], (Ramadhani and Goo, 2017)[14], and (Yoshida et al., 2014)[16], use approaches such as Naive Bayes, SVM, DNN, GMM, and ensembles. (Coletta et al., 2014)[6] combines an SVM classifier with a cluster ensemble as a refined algorithm. (Li et al., 2016)[11] proposes a method based on sentiment-specific word embeddings and weighted text features. (Phan et al., 2020)[12] handles tweets containing fuzzy sentiment with a feature ensemble model. However, these models and methods are now dated: some are too basic, and some handle imbalanced datasets poorly, which can lead to low performance. Many recent studies instead use LSTM or BERT for text analysis, since these models handle diverse kinds of text well thanks to richer embeddings and model architectures. We likewise adopted BERT for accurate analysis.

(Pota et al., 2021)[13] presents several data pre-processing techniques and strategies for BERT. Moreover, (Chiorrini et al., 2021)[5] and (Singh et al., 2021)[15] propose pre-trained BERT models for analyzing social media text: the former addresses emotion recognition, and the latter focuses on people’s opinions during the pandemic. We also decided to use a pre-trained BERT model. However, unlike these studies, we needed to put considerable effort into visualization in addition to classification.

(Almjawel et al., 2019)[1] describes interactive visualization schemes for book reviews, which gave us ideas for visualizing text datasets.

Proposed Method

Sentiment analysis is the process of identifying and analyzing users’ opinions or emotions regarding a specific topic. The large volume of text data on social media platforms has encouraged researchers to apply advanced machine learning techniques to capture the insights behind the text. Recent research has shown that sentiment analysis of social media data correlates strongly with public opinion, which can further help model international relations or predict social movements. Therefore, we believe sentiment analysis of a current, hotly contested issue will help us understand the public’s stance as well as the issue itself, which might affect the issue’s future direction.

We use a Kaggle dataset of tweets related to the Ukraine conflict, collected since April 1, 2022, which to our knowledge has not yet been explored in state-of-the-art papers. Out of 53M tweets, we keep the 6.1M English tweets that also include country information. We also keep only the columns relevant to our analysis: 'userid', 'username', 'acctdesc', 'location', 'following', 'followers', 'tweetcreatedts', 'retweet count', 'text', and 'hashtags'. The hashtags column is reformatted for easier information extraction.
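The filtering step can be sketched in Python as below. This is a minimal illustration, not our exact pipeline, and several details are assumptions: each raw row is a dict (as produced by `csv.DictReader`) with a `language` field; the CSV headers match the spellings in `KEEP_COLUMNS` (for example `retweetcount`); country is inferred from the free-text `location` through a hypothetical keyword table `COUNTRY_LOOKUP`; and the hashtags column stores a Python-literal list of dicts with a `text` key.

```python
import ast
import csv
import re

# Columns kept for analysis; header spellings are assumed to match the CSV.
KEEP_COLUMNS = [
    "userid", "username", "acctdesc", "location", "following",
    "followers", "tweetcreatedts", "retweetcount", "text", "hashtags",
]

# Hypothetical keyword table: one lowercase location token -> country name.
COUNTRY_LOOKUP = {
    "kyiv": "Ukraine", "ukraine": "Ukraine",
    "london": "United Kingdom", "uk": "United Kingdom",
    "usa": "United States",
}


def infer_country(location):
    """Return the country for the first known token in a location string, else None."""
    for token in re.findall(r"[a-z]+", (location or "").lower()):
        if token in COUNTRY_LOOKUP:
            return COUNTRY_LOOKUP[token]
    return None


def parse_hashtags(raw):
    """Turn "[{'text': 'Ukraine', 'indices': [0, 8]}]" into ['ukraine'].

    Empty or malformed values yield an empty list.
    """
    try:
        entries = ast.literal_eval(raw) if raw else []
    except (ValueError, SyntaxError):
        return []
    if not isinstance(entries, list):
        return []
    return [
        e["text"].lower()
        for e in entries
        if isinstance(e, dict) and isinstance(e.get("text"), str)
    ]


def filter_rows(rows):
    """Yield English tweets with a resolvable country, trimmed to KEEP_COLUMNS."""
    for row in rows:
        if row.get("language") != "en":
            continue
        country = infer_country(row.get("location"))
        if country is None:
            continue
        kept = {col: row.get(col, "") for col in KEEP_COLUMNS}
        kept["hashtags"] = parse_hashtags(row.get("hashtags", ""))
        kept["country"] = country
        yield kept


def load_filtered(path):
    """Stream a CSV file and return the filtered rows as a list."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(filter_rows(csv.DictReader(f)))
```

Matching whole tokens rather than substrings avoids false hits such as "usa" inside "Jerusalem"; at the scale of 53M tweets, a geocoder or a full gazetteer would likely be a better choice than a hand-written table.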

Next, we apply a RoBERTa model pre-trained for Twitter sentiment/emotion analysis, classifying each tweet in our dataset by sentiment and emotion. The plot below shows the distribution of the 6.1M tweets by predicted sentiment and emotion. The predicted sentiments are neutral (48%), negative (40%), and positive (12%); the predicted emotions are optimism (52%), anger (20%), joy (19%), and sadness (9%). We then use Python for data cleansing and pre-processing and create a small dataset for each visualization.
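The step from model outputs to labels and percentages can be sketched as follows. We assume the label orders of the Cardiff NLP TweetEval RoBERTa checkpoints (negative/neutral/positive and anger/joy/optimism/sadness), which should be checked against the label mapping published with whichever model is actually used. The forward pass itself (for example through Hugging Face `AutoModelForSequenceClassification`) is omitted so the sketch runs without downloading weights; the `preprocess` helper mirrors the mention and link masking suggested on those model cards.

```python
import math
from collections import Counter

# Assumed label orders (Cardiff NLP TweetEval checkpoints); verify for the model used.
SENTIMENT_LABELS = ["negative", "neutral", "positive"]
EMOTION_LABELS = ["anger", "joy", "optimism", "sadness"]


def preprocess(text):
    """Mask user mentions and links before tokenization."""
    tokens = []
    for t in text.split(" "):
        t = "@user" if t.startswith("@") and len(t) > 1 else t
        t = "http" if t.startswith("http") else t
        tokens.append(t)
    return " ".join(tokens)


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label(logits, labels):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


def distribution(predicted):
    """Share of each predicted label in whole percent, most frequent first."""
    counts = Counter(predicted)
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.most_common()}
```

Applying `to_label` to every tweet's logits and passing the resulting labels to `distribution` yields the kind of percentage breakdown shown in the plot.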

Figure: Distribution of predicted labels (panels: sentiment, emotion)

For interactive visualization, we use D3 to build a graph, bar charts, word clouds, a world map, and more. Visualizing the underlying analysis data helps readers grasp the key findings without reading through complicated analysis processes and reports. In this manner, our visualizations also explain the current situation by showing frequently mentioned words, the relationships between words, and the worldwide distribution of the top 30 hashtags.
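The counts behind the word cloud, the word-relationship graph, and the hashtag chart could be computed along these lines. This sketch rests on simple assumptions: a tiny illustrative stopword list, lowercase alphabetic tokens, co-occurrence defined as two words appearing in the same tweet, and rows given as dicts with a `country` and a list of lowercase `hashtags`. The "distribution of the top 30 hashtags" is interpreted here as the global top 30, broken down by country.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "for", "rt"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def word_frequencies(texts, top_n=100):
    """Most frequent words across all tweets (input for a word cloud)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(top_n)


def cooccurrence_edges(texts, min_count=2):
    """Word pairs that appear in the same tweet, as (word_a, word_b, count) edges."""
    pairs = Counter()
    for text in texts:
        words = sorted(set(tokenize(text)))
        pairs.update(combinations(words, 2))
    return [(a, b, n) for (a, b), n in pairs.most_common() if n >= min_count]


def top_hashtags_by_country(rows, top_n=30):
    """Global top-N hashtags, each broken down into per-country counts."""
    overall = Counter(tag for row in rows for tag in row["hashtags"])
    top = [tag for tag, _ in overall.most_common(top_n)]
    per_tag = {tag: Counter() for tag in top}
    for row in rows:
        for tag in row["hashtags"]:
            if tag in per_tag:
                per_tag[tag][row["country"]] += 1
    return {tag: dict(per_tag[tag]) for tag in top}
```

Sorting each tweet's unique words before pairing keeps every edge in one canonical order, so ("russia", "sanctions") and ("sanctions", "russia") are counted together as a single undirected edge.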

Once we understand the dataset and its general trends through exploratory data analysis, we link the news headlines of a specific date to the sentiment/emotion analysis for the same date to investigate how sentiments and emotions are affected by the news. When users select a date, the news headlines, the emotion chart by country, the world map of average sentiment scores, and the word cloud by country update accordingly. Together, they give insight into how certain exogenous events (e.g., Visa and Mastercard withdrawing from Russia) may have changed the sentiment/emotion about the conflict within impacted countries.
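Linking each date's headlines to that date's per-country emotion shares can be sketched as a join on the day. The timestamp format (ISO-like, such as `2022-04-01 12:30:00`), the `emotion` field holding the predicted label, and the `(day, headline)` input list are assumptions for illustration; the result is one record per day that a date selector could read.

```python
from collections import Counter, defaultdict
from datetime import datetime


def day_of(timestamp):
    """Reduce an ISO-like timestamp, e.g. '2022-04-01 12:30:00', to '2022-04-01'."""
    return datetime.fromisoformat(timestamp).date().isoformat()


def emotions_by_day_and_country(rows):
    """Map (day, country) to the share of each predicted emotion."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[(day_of(row["tweetcreatedts"]), row["country"])][row["emotion"]] += 1
    return {
        key: {emotion: n / sum(c.values()) for emotion, n in c.items()}
        for key, c in counts.items()
    }


def link_news(headlines, emotion_table):
    """Group headlines and per-country emotion shares under the same day.

    headlines: hypothetical list of (day, headline) pairs with ISO day strings.
    """
    linked = {}
    for (day, country), shares in emotion_table.items():
        linked.setdefault(day, {"news": [], "emotions": {}})["emotions"][country] = shares
    for day, headline in headlines:
        linked.setdefault(day, {"news": [], "emotions": {}})["news"].append(headline)
    return linked
```

Days with tweets but no headline keep an empty `news` list, so the front end can still show that day's emotion chart.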

In addition, we build a sentiment world map highlighting the average sentiment score of each country’s tweets in the dataset. Users can click a button to animate the changes in sentiment on the map over time, which clearly displays worldwide trends.
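The text does not say how predicted labels become an average sentiment score, so the sketch below assumes a simple mapping (negative = -1, neutral = 0, positive = +1) and averages it per country per day. Sorting the days yields the frames that an animated map could step through, serialized as JSON for the D3 front end; the field names `day`, `country`, `sentiment`, and `scores` are hypothetical.

```python
import json
from collections import defaultdict

# Assumed label-to-score mapping; the text does not specify one.
SCORE = {"negative": -1.0, "neutral": 0.0, "positive": 1.0}


def average_sentiment(rows):
    """Return {day: {country: mean score}} from rows with day, country, sentiment."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = (row["day"], row["country"])
        sums[key] += SCORE[row["sentiment"]]
        counts[key] += 1
    by_day = defaultdict(dict)
    for (day, country), total in sums.items():
        by_day[day][country] = total / counts[(day, country)]
    return dict(by_day)


def animation_frames(by_day):
    """One frame per day in chronological order (ISO dates sort correctly as strings)."""
    return [{"day": day, "scores": by_day[day]} for day in sorted(by_day)]


def to_json(frames):
    """Serialize frames for the front end to load."""
    return json.dumps(frames, indent=2)
```

A scheme based on class probabilities (for example, P(positive) minus P(negative)) would be a reasonable alternative when the model's probabilities are kept.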

Users can explore all of our visualizations by specifying the time frame or country of interest with the buttons provided. This supports more nuanced observations than the aggregate results alone: by filtering on these conditions, users can easily check whether a decision made on a specific date coincided with a change in a particular country’s opinion.
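Filtering by time frame and country, and comparing sentiment before and after a date of interest, might look like the following. It scores labels with an assumed mapping (negative = -1, neutral = 0, positive = +1), and `window_days` as well as the convention of counting the event day in the "after" window are illustrative choices, not settings from our system. A difference in means shows that a shift coincided with an event, not that the event caused it.

```python
from datetime import date, timedelta

# Assumed label-to-score mapping; the text does not specify one.
SCORE = {"negative": -1.0, "neutral": 0.0, "positive": 1.0}


def filter_view(rows, start=None, end=None, countries=None):
    """Rows with start <= day <= end and country in `countries`; None means no limit."""
    selected = set(countries) if countries is not None else None
    kept = []
    for row in rows:
        day = date.fromisoformat(row["day"])
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        if selected is not None and row["country"] not in selected:
            continue
        kept.append(row)
    return kept


def mean_score(rows):
    """Average assumed score, or None when there are no rows."""
    if not rows:
        return None
    return sum(SCORE[r["sentiment"]] for r in rows) / len(rows)


def sentiment_shift(rows, country, event_day, window_days=3):
    """Mean score in the window starting on event_day minus the window before it."""
    before = filter_view(rows, event_day - timedelta(days=window_days),
                         event_day - timedelta(days=1), [country])
    after = filter_view(rows, event_day,
                        event_day + timedelta(days=window_days - 1), [country])
    b, a = mean_score(before), mean_score(after)
    if a is None or b is None:
        return None
    return a - b
```

Returning `None` when either window is empty keeps countries with too few tweets from producing misleading shifts.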

Experiments/Evaluation

Exploratory Data Analysis

Sentiment/Emotion Analysis

References

[1] Aljoharah Almjawel et al. “Sentiment analysis and visualization of Amazon books’ reviews”. In: 2019 2nd International Conference on Computer Applications & Information Security (ICCAIS). IEEE. 2019, pp. 1–6.

[2] Md Abul Bashar, Richi Nayak, and Thirunavukarasu Balasubramaniam. “Topic, sentiment and impact analysis: Covid19 information seeking on social media”. In: arXiv preprint arXiv:2008.12435 (2020).

[3] Guido Caldarelli et al. “The role of bot squads in the political propaganda on Twitter”. In: Communications Physics 3.1 (2020), pp. 1–15.

[4] Emily Chen and Emilio Ferrara. “Tweets in time of conflict: A public dataset tracking the Twitter discourse on the war between Ukraine and Russia”. In: arXiv preprint arXiv:2203.07488 (2022).

[5] Andrea Chiorrini et al. “Emotion and sentiment analysis of tweets using BERT”. In: EDBT/ICDT Workshops. 2021.

[6] Luiz FS Coletta et al. “Combining classification and clustering for tweet sentiment analysis”. In: 2014 Brazilian conference on intelligent systems. IEEE. 2014, pp. 210–215.

[7] Nadia FF Da Silva, Eduardo R Hruschka, and Estevam R Hruschka Jr. “Tweet sentiment analysis with classifier ensembles”. In: Decision support systems 66 (2014), pp. 170–179.

[8] Sabrina Gabel, Lilian Reichert, and Christian Reuter. “Discussing conflict in social media: The use of Twitter in the Jammu and Kashmir conflict”. In: Media, War & Conflict (2020), p. 1750635220970997.

[9] Anees Ul Hassan et al. “Sentiment analysis of social networking sites (SNS) data using machine learning approach for the measurement of depression”. In: 2017 international conference on information and communication technology convergence (ICTC). IEEE. 2017, pp. 138–140.

[10] Ema Kušen and Mark Strembeck. “Politics, sentiments, and misinformation: An analysis of the Twitter discussion on the 2016 Austrian Presidential Elections”. In: Online Social Networks and Media 5 (2018), pp. 37–50.

[11] Quanzhi Li et al. “Tweet sentiment analysis by incorporating sentiment-specific word embedding and weighted text features”. In: 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI). IEEE. 2016, pp. 568–571.

[12] Huyen Trang Phan et al. “Improving the performance of sentiment analysis of tweets containing fuzzy sentiment using the feature ensemble model”. In: IEEE Access 8 (2020), pp. 14630–14641.

[13] Marco Pota et al. “Multilingual evaluation of pre-processing for BERT-based sentiment analysis of tweets”. In: Expert Systems with Applications 181 (2021), p. 115119.

[14] Adyan Marendra Ramadhani and Hong Soon Goo. “Twitter sentiment analysis using deep learning methods”. In: 2017 7th International annual engineering seminar (InAES). IEEE. 2017, pp. 1–4.

[15] Mrityunjay Singh, Amit Kumar Jakhar, and Shivam Pandey. “Sentiment analysis on the impact of coronavirus in social life using the BERT model”. In: Social Network Analysis and Mining 11.1 (2021), pp. 1–11.

[16] Shun Yoshida et al. “Sentiment analysis for various SNS media using Naïve Bayes classifier and its application to flaming detection”. In: 2014 IEEE Symposium on Computational Intelligence in Big Data (CIBD). IEEE. 2014, pp. 1–6.