Mihaela Grigore

NLP | Sentiment analysis of tweets: TextBlob, VADER and Flair


Last updated 3 years ago

In this project I evaluate three libraries for sentiment analysis of text data: TextBlob, VADER and Flair, because working with tweets is tricky: they are a totally different story from usual text.

Tweet authors tend not to use punctuation, pay little attention to grammar, and use emojis to express emotions just as often as they use words.

This makes sentiment analysis trickier than on regular text documents.
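Here is a minimal sketch of the kind of tweet-specific normalization this calls for, using only Python's standard library. The regular expressions, the `normalize_tweet` name and the tiny emoji-to-word map are illustrative assumptions, not the notebook's code. The sketch deliberately keeps capitalization and exclamation marks, because VADER, for example, uses them as intensity cues, so aggressive cleaning can remove signal a library relies on.

```python
import re

# Hypothetical emoji-to-word map for illustration only; a real project
# would use a much larger resource.
EMOJI_WORDS = {"😀": "happy", "😡": "angry", "👍": "good", "😢": "sad"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Three or more repeats of the same letter, e.g. "sooooo" (digits are left alone).
ELONGATED_RE = re.compile(r"([A-Za-z])\1{2,}")


def normalize_tweet(text: str) -> str:
    """Make a raw tweet look a little more like regular text."""
    text = URL_RE.sub(" ", text)        # links carry no sentiment
    text = MENTION_RE.sub(" ", text)    # drop @user handles
    text = text.replace("#", "")        # keep hashtag words, drop the symbol
    for emoji, word in EMOJI_WORDS.items():
        text = text.replace(emoji, f" {word} ")  # turn known emojis into words
    text = ELONGATED_RE.sub(r"\1\1", text)  # "sooooo" -> "soo", keeps some emphasis
    return " ".join(text.split())       # collapse runs of whitespace


print(normalize_tweet("@bob sooooo happy with the #debate 😀 https://t.co/xyz"))
```

Collapsing elongated words to two letters rather than one is a common compromise: it merges spelling variants while still marking that the author was being emphatic.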

See the notebook in the project repository on GitHub.

I'll be using a dataset of tweets collected over a period of three weeks around the 2020 US presidential election.

I downloaded the dataset directly, but if you want to tackle another topic, you can open a Twitter developer account and start collecting your own tweets using Tweet lookup or snscrape.

I start with some classic exploratory data analysis, answering questions such as how many tweets we have and what date range they cover. I then move on to text pre-processing: text data contains a lot of extra material, such as stop words, and many words are not in their most useful form (e.g. plurals are converted to singular). Finally, I run sentiment analysis with three libraries (TextBlob, VADER and Flair) and compare the results to see which is best suited to our dataset.
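The EDA questions and pre-processing steps above can be sketched with the standard library alone. This assumes each tweet is a record with hypothetical `created_at` and `text` fields; the sample records are made up, the stop-word list is a tiny illustrative subset, and `singularize` is a crude suffix heuristic standing in for a proper lemmatizer such as NLTK's `WordNetLemmatizer`.

```python
from datetime import datetime

# Hypothetical sample records; the real dataset has its own columns and format.
tweets = [
    {"created_at": "2020-10-27 09:12:00", "text": "The debates were great"},
    {"created_at": "2020-11-08 21:40:00", "text": "Counting all the votes"},
    {"created_at": "2020-11-03 14:05:00", "text": "I voted for the candidates I like"},
]

# EDA: how many tweets do we have, and what date range do they cover?
dates = [datetime.strptime(t["created_at"], "%Y-%m-%d %H:%M:%S") for t in tweets]
print(len(tweets), "tweets from", min(dates).date(), "to", max(dates).date())

# Tiny illustrative stop-word subset; real lists (e.g. NLTK's) are much longer.
STOP_WORDS = {"the", "a", "an", "i", "for", "all", "were", "was", "is"}


def singularize(word: str) -> str:
    """Crude plural-to-singular heuristic standing in for a real lemmatizer."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"      # "parties" -> "party"
    if word.endswith("s") and not word.endswith(("ss", "us", "is")) and len(word) > 3:
        return word[:-1]            # "votes" -> "vote"
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, strip edge punctuation, drop stop words, singularize."""
    tokens = [w.strip(".,!?").lower() for w in text.split()]
    return [singularize(w) for w in tokens if w and w not in STOP_WORDS]


for t in tweets:
    print(preprocess(t["text"]))
```

A suffix rule like this is easy to follow but gets irregular plurals wrong, which is why a dictionary-based lemmatizer is usually the better choice on real data.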

And because I am using data related to a political campaign, I want to see what actionable insights we can draw from the sentiment analysis results. After all, the purpose of any data science analysis is to find out how to solve a problem. In this case, I'm part of the campaign management team for one of the candidates, and we use Twitter sentiment analysis to see how to increase our voter base. (This is a hypothetical problem; I am not actually involved in politics and hope never to be.)
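As a hypothetical illustration of turning sentiment labels into something a campaign team could act on, the sketch below computes, per candidate, the share of tweets carrying a given label. The candidate names and labels are made-up placeholders, not results from the dataset, and this simple aggregation is only one possible starting point.

```python
from collections import Counter, defaultdict

# Hypothetical (candidate, label) pairs, as produced by one of the libraries.
results = [
    ("candidate_a", "positive"), ("candidate_a", "negative"),
    ("candidate_a", "positive"), ("candidate_b", "negative"),
    ("candidate_b", "neutral"), ("candidate_b", "negative"),
]

# Count labels separately for each candidate.
counts: defaultdict[str, Counter] = defaultdict(Counter)
for candidate, label in results:
    counts[candidate][label] += 1


def share(candidate: str, label: str) -> float:
    """Fraction of a candidate's tweets that carry the given label."""
    total = sum(counts[candidate].values())
    return counts[candidate][label] / total if total else 0.0


for candidate in sorted(counts):
    print(candidate, f"negative share: {share(candidate, 'negative'):.2f}")
```

Breaking the same shares down further, for example by day or by location, is what would turn them into questions a campaign could act on.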

How to use it: Open the Jupyter Notebook in this folder. You can clone it, download it or just read it here. There is also a link at the top of the Notebook which takes you to the same Notebook on Kaggle.

Contents of this Notebook:

Exploratory Data Analysis
Text pre-processing
Intro to sentiment analysis
Sentiment analysis with TextBlob
Sentiment analysis with VADER
Sentiment analysis with Flair
Which is the best sentiment analysis library?
Actionable insights from sentiment analysis of tweets
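Comparing the three libraries first requires putting their outputs on a common scale. TextBlob's `sentiment.polarity` is a float in [-1, 1]; VADER's `polarity_scores()` returns a `compound` score, also in [-1, 1], for which the VADER README recommends ±0.05 as the positive/negative thresholds; and Flair's `en-sentiment` classifier predicts a `POSITIVE` or `NEGATIVE` label. The standard-library sketch below maps made-up outputs from each library to common labels and measures pairwise agreement. The zero cutoff for TextBlob and the agreement metric are assumptions made for this sketch, not necessarily how the notebook compares the libraries; agreement also says nothing about accuracy without hand-labelled tweets.

```python
from itertools import combinations


def label_vader(compound: float) -> str:
    # Thresholds recommended in the VADER README.
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def label_textblob(polarity: float) -> str:
    # Assumed convention: sign of the polarity, with exactly 0.0 as neutral.
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def label_flair(value: str) -> str:
    # Flair's English sentiment model predicts only POSITIVE or NEGATIVE.
    return value.lower()


def agreement(a: list[str], b: list[str]) -> float:
    """Fraction of tweets on which two libraries assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical raw outputs for the same four tweets.
labels = {
    "TextBlob": [label_textblob(p) for p in [0.35, 0.0, -0.2, 0.1]],
    "VADER": [label_vader(c) for c in [0.62, 0.0, -0.48, -0.03]],
    "Flair": [label_flair(v) for v in ["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE"]],
}

for (name_a, a), (name_b, b) in combinations(labels.items(), 2):
    print(f"{name_a} vs {name_b}: {agreement(a, b):.2f}")
```

Note that Flair can never agree with a "neutral" label from the other two, so low agreement with Flair partly reflects its binary output rather than a worse judgement.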

! Spoiler alert !

I am leaving the project as a Jupyter Notebook rather than plain Python scripts because it is also intended as instructional material.

I am mostly interested in the best way to pre-process tweets before applying sentiment analysis algorithms, and in which algorithm works best on this type of data. I included my observations and insights in Markdown cells, so the notebook is rich not only in code but also in explanations.
