Lexicon-based sentiment analysis using Twitter data - a case of COVID-19 outbreak in India and abroad


COVID-19, caused by a virus of the Corona family that originated in Wuhan, China, has spread to more than 215 countries; to date, more than 2.3 lakh people have died and more than 32 lakh have been affected globally, and the numbers continue to rise. Because of this global pandemic, citizens of the affected countries are in a state of panic. Sentiment Analysis (SA) is a prominent technique for analyzing data available on social media. This research work explores SA using the Lexicon-based approach to analyze sentiment in six countries: India, the USA, Spain, Italy, France, and the UK. Data from March 15 to April 15, 2020 were extracted from Twitter and used to classify sentiment as Negative, Neutral, or Positive using a simple Lexicon-based approach and a Valence Aware Dictionary and sEntiment Reasoner (VADER)-based approach. Empirical results show that negativity exists in almost all the countries because of COVID-19. Using the simple Lexicon-based approach, the UK has the highest negativity among the six countries at 23.03%, followed by France with 22.71% and the USA with 22.01%, while India has a negativity of 18.39%. Using the VADER-based approach, negativity is 35.92% in France, 35.68% in the UK, and 35.38% in the USA, while India has the lowest negativity at 31.03%. Both approaches rank the countries by negativity in nearly the same order, with slight variations. Furthermore, a detailed comparative analysis of India was carried out on Twitter data collected before and after lockdown using the simple Lexicon-based approach; negativity increased after lockdown and decreased slightly during lockdown 2.0. The overall implication of this research work is that, although negativity exists, people remain more positive toward the panic situation caused by COVID-19 and are fighting it under restrictions such as lockdown, home isolation, quarantine, and limited access to resources.
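
To make the simple Lexicon-based labeling concrete, the following is a minimal sketch of how tweets might be classified as Negative, Neutral, or Positive and how a per-country negativity percentage could then be computed. The abstract does not specify the lexicon, tokenization, or scoring rule, so the word lists, the `classify` and `sentiment_shares` functions, and the sign-of-score rule below are illustrative assumptions, not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, for illustration only; the study's actual
# lexicon is not given in the abstract.
POSITIVE_WORDS = {"safe", "hope", "recover", "recovered", "thank", "support", "good", "strong"}
NEGATIVE_WORDS = {"death", "died", "panic", "fear", "crisis", "sick", "bad", "worst"}

LABELS = ("Negative", "Neutral", "Positive")


def classify(text: str) -> str:
    """Label a text by the sign of (positive word count - negative word count)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum((t in POSITIVE_WORDS) - (t in NEGATIVE_WORDS) for t in tokens)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


def sentiment_shares(tweets: list[str]) -> dict[str, float]:
    """Return the percentage of tweets carrying each label."""
    if not tweets:
        return {label: 0.0 for label in LABELS}
    counts = Counter(classify(t) for t in tweets)
    return {label: 100.0 * counts[label] / len(tweets) for label in LABELS}


if __name__ == "__main__":
    sample = [
        "Stay safe and keep hope",
        "Panic and fear everywhere",
        "Lockdown starts on Monday",
        "Thank you doctors",
    ]
    print(sentiment_shares(sample))
```

A VADER-based variant would replace the word-count score with VADER's valence-weighted compound score and threshold it into the same three labels; the thresholds used in the study are not stated in the abstract.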

In Data Science for COVID-19