NLP | Sentiment analysis of tweets: TextBlob, VADER and Flair
Last updated
In this project I evaluate three libraries for sentiment analysis on text data: TextBlob, VADER and Flair. Tweets are tricky to work with, and they are quite different from ordinary text.
Tweet authors often skip punctuation, pay little attention to grammar, and use emojis to express emotions as readily as they use words.
This makes sentiment analysis harder on tweets than on regular text documents.
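To make these quirks concrete, here is a minimal sketch of tweet-specific cleaning using only Python's standard `re` module. It drops URLs and @mentions, keeps the word behind a hashtag, shortens elongated words such as "sooooo", and maps a few emojis to words so that word-based tools can see them. The `EMOJI_WORDS` table and the `normalize_tweet` function are hypothetical illustrations, not code from the notebook. Case and punctuation are deliberately left alone, because VADER uses capitalization and exclamation marks as intensity cues.

```python
import re

# Hypothetical, deliberately tiny emoji-to-word map; a real project would need
# a much fuller mapping.
EMOJI_WORDS = {
    "\U0001F600": " grinning ",   # grinning face
    "\U0001F621": " angry ",      # pouting face
    "\U0001F44D": " thumbs_up ",  # thumbs up
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Three or more repeats of the same ASCII letter; digits are excluded so that
# numbers like "1000" survive.
ELONGATION_RE = re.compile(r"([A-Za-z])\1{2,}")


def normalize_tweet(text: str) -> str:
    """Reduce tweet-specific noise while keeping sentiment-bearing content."""
    text = URL_RE.sub(" ", text)             # links carry no sentiment
    text = MENTION_RE.sub(" ", text)         # @handles are names, not opinions
    text = HASHTAG_RE.sub(r"\1", text)       # keep the hashtag word, drop '#'
    for emoji, word in EMOJI_WORDS.items():  # make emojis visible to word-based tools
        text = text.replace(emoji, word)
    text = ELONGATION_RE.sub(r"\1\1", text)  # "sooooo" -> "soo"
    return " ".join(text.split())            # collapse runs of whitespace
```

For example, `normalize_tweet("Sooooo happy \U0001F600 #Vote2020 @someone https://t.co/abc")` yields `"Soo happy grinning Vote2020"`.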
See the project repository on GitHub.
I'll be using a dataset of tweets collected over a three-week period around the 2020 US presidential election.
I downloaded the dataset directly from the link above. If you want to tackle another topic, you can collect your own tweets, either through a Twitter developer account and the Tweet lookup endpoint, or with snscrape.
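A first EDA pass answers "how many tweets do we have?" and "what date range do they cover?". The sketch below uses only the standard `csv` and `datetime` modules so that it has no dependencies; in a notebook, pandas' `read_csv` followed by `min()`/`max()` on the date column would be the more usual route. The column name `created_at` and the timestamp format are assumptions, so check them against the actual file.

```python
import csv
from datetime import datetime


def summarize_tweets(csv_file, date_col="created_at", fmt="%Y-%m-%d %H:%M:%S"):
    """Return (number of tweets, earliest timestamp, latest timestamp).

    `date_col` and `fmt` are assumed defaults; adjust them to the real dataset.
    Returns (0, None, None) for a file with a header but no rows.
    """
    count, first, last = 0, None, None
    for row in csv.DictReader(csv_file):
        ts = datetime.strptime(row[date_col], fmt)
        count += 1
        if first is None or ts < first:
            first = ts
        if last is None or ts > last:
            last = ts
    return count, first, last
```

Usage would look like `with open(path, newline="", encoding="utf-8") as f: summarize_tweets(f)`, where `path` points at the downloaded CSV.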
I start with some classic exploratory data analysis, answering questions such as how many tweets we have and what date range they cover. I then pre-process the text: text data contains a lot of extra material, such as stop words, and many words are not in their most useful form (for example, plurals are converted to singular). Next, I run sentiment analysis with three libraries (TextBlob, VADER and Flair) and compare the results to see which one is best suited to our dataset.
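The two pre-processing steps named above, stop-word removal and converting plurals to singular, can be sketched as follows. This is a simplified stand-in: the `STOP_WORDS` set is a small hypothetical list, and `singularize` is a crude suffix rule rather than real lemmatization (NLTK's `WordNetLemmatizer` or spaCy would normally do that job). Negations such as "not" are intentionally kept out of the stop-word list, since removing them can flip the sentiment of a tweet.

```python
import re

# Hypothetical minimal stop-word list. Negations ("not", "no", "never") are
# deliberately absent because they carry sentiment.
STOP_WORDS = {
    "a", "an", "the", "is", "are", "was", "were", "to", "of", "and", "or",
    "in", "on", "for", "with", "this", "that", "it", "i",
}

TOKEN_RE = re.compile(r"[a-z']+")


def singularize(word: str) -> str:
    """Crude plural-to-singular rule; a stand-in for a real lemmatizer."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"   # "parties" -> "party"
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]         # "votes" -> "vote"
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, then singularize each token."""
    tokens = TOKEN_RE.findall(text.lower())
    return [singularize(t) for t in tokens if t not in STOP_WORDS]
```

For instance, `preprocess("The voters are not happy with the parties")` returns `["voter", "not", "happy", "party"]`. A suffix rule like this will also mangle words such as "news"; that is the main reason to prefer a dictionary-based lemmatizer in practice.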
And because I am using data related to a political campaign, I want to see what actionable insights we can draw from the sentiment analysis results. After all, the purpose of any data science analysis is to find out how to solve a problem. In this case, I am part of the campaign management team for one of the candidates, and we use Twitter sentiment analysis to see how to increase our voter base. (This is a hypothetical problem; I am not actually involved in politics and hope never to be.)
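One simple way to turn per-tweet labels into something a campaign team could act on is to compare the share of positive, negative and neutral tweets for each candidate. The sketch below is an assumption about how this could be done, not the notebook's method: it attributes a tweet to a candidate through hypothetical keyword lists (`CANDIDATE_KEYWORDS`) using plain substring matching, so a tweet that mentions both candidates counts for both.

```python
from collections import Counter, defaultdict

# Hypothetical keyword lists used to decide which candidate a tweet is about.
CANDIDATE_KEYWORDS = {
    "trump": ("trump",),
    "biden": ("biden",),
}


def sentiment_share_by_candidate(tweets):
    """tweets: iterable of (text, label) pairs.

    Returns {candidate: {label: share}}, where the shares for each candidate
    sum to 1. Candidates never mentioned are left out of the result.
    """
    counts = defaultdict(Counter)
    for text, label in tweets:
        lowered = text.lower()
        for candidate, keywords in CANDIDATE_KEYWORDS.items():
            if any(k in lowered for k in keywords):  # crude substring match
                counts[candidate][label] += 1
    return {
        candidate: {label: n / sum(c.values()) for label, n in c.items()}
        for candidate, c in counts.items()
    }
```

Breaking the same shares down further (by day, or by topic words) is where patterns a campaign could respond to would start to show up.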
How to use it: Open the Jupyter Notebook in this folder. You can clone it, download it or just read it here. There is also a link at the top of the Notebook which takes you to the same Notebook on Kaggle.
Contents of this Notebook:
! Spoiler alert !
I am mostly interested in the best way to preprocess tweets before applying sentiment analysis algorithms, and in which algorithm works best on this type of data. I included my observations and insights in markdown cells, so this notebook is rich not only in code but also in explanations.
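Without hand-labelled tweets there is no ground truth to measure accuracy against, but one possible proxy for comparing the three libraries is how often they agree. The sketch below converts each library's raw output into a common positive/negative/neutral label and computes pairwise agreement. The TextBlob cut-off (sign of `polarity`) is a common convention rather than a library rule; the ±0.05 threshold on VADER's `compound` score is the one suggested in the VADER README; Flair's `en-sentiment` model is binary and never returns neutral. The helper names are hypothetical, and `score_tweet` only runs if `textblob`, `vaderSentiment` and `flair` are installed.

```python
from itertools import combinations


def label_textblob(polarity: float) -> str:
    # Assumed convention: TextBlob polarity lies in [-1, 1]; use its sign.
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def label_vader(compound: float, threshold: float = 0.05) -> str:
    # VADER README suggests >= 0.05 positive, <= -0.05 negative, else neutral.
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def label_flair(value: str) -> str:
    # Flair's "en-sentiment" classifier outputs "POSITIVE" or "NEGATIVE".
    return value.lower()


def score_tweet(text, vader_analyzer, flair_classifier):
    """Label one tweet with all three libraries (imports are deferred)."""
    from textblob import TextBlob
    from flair.data import Sentence

    sentence = Sentence(text)
    flair_classifier.predict(sentence)
    return {
        "textblob": label_textblob(TextBlob(text).sentiment.polarity),
        "vader": label_vader(vader_analyzer.polarity_scores(text)["compound"]),
        "flair": label_flair(sentence.labels[0].value),
    }


def pairwise_agreement(labels_per_tweet):
    """Fraction of tweets on which each pair of libraries gives the same label."""
    tools = sorted(labels_per_tweet[0])
    return {
        (a, b): sum(row[a] == row[b] for row in labels_per_tweet) / len(labels_per_tweet)
        for a, b in combinations(tools, 2)
    }
```

The analyzer and the classifier are created once and passed in, for example `SentimentIntensityAnalyzer()` from `vaderSentiment.vaderSentiment` and `TextClassifier.load("en-sentiment")` from `flair.models`, because loading the Flair model for every tweet would be very slow. Because Flair has no neutral class, its agreement with the other two is capped by how many tweets they call neutral, which is worth keeping in mind when reading the numbers.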
I will keep it as a Jupyter Notebook rather than plain Python scripts, because it is also intended as instructional material.