Mihaela Grigore

Data Scraping | Collecting historical tweets without Twitter API


Last updated 3 years ago

See the project repository on GitHub.

Collecting historical tweets using snscrape.

What you need:

What you don't need:

a Twitter Developer Account

For a research project on public discourse about the results of international large-scale assessments, I needed to scrape historical tweets going all the way back to the beginning of Twitter. This is how I discovered snscrape, a wonderful tool that is easy to set up and use.

I didn't find snscrape right away. Initially, I was reading through the intricate details of the Twitter Developer Account: the application procedure, the different levels of access, the limits, and so on. Luckily, a friend recommended snscrape, and suddenly the task of collecting tweets became extremely easy.

Snscrape is a popular tool for tweet collection among social scientists, at least as of 2021. Apparently, it bypasses several limitations of the Twitter API; for example, it can search tweets going back to the beginning of Twitter. Best of all, you don't need Twitter developer account credentials (as you do with Tweepy, for example).
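As a rough illustration of what collecting historical tweets with snscrape's Python module can look like, here is a minimal sketch (not the notebook's exact code). It assumes snscrape is installed (`pip install snscrape`) and can still reach Twitter; the helper names `build_query` and `collect_tweets`, the search terms and the date range are hypothetical. The `since:`/`until:` parts are Twitter's own search operators for restricting results to a date range.

```python
"""Sketch: collecting historical tweets with snscrape's Python module.

Assumptions: the helper names, search terms and dates are hypothetical;
snscrape must be installed and able to reach Twitter.
"""
from datetime import date
from itertools import islice


def build_query(terms: str, since: date, until: date) -> str:
    """Combine search terms with Twitter's since:/until: date operators."""
    return f"{terms} since:{since.isoformat()} until:{until.isoformat()}"


def collect_tweets(query: str, max_tweets: int = 100) -> list[dict]:
    """Scrape up to max_tweets tweets matching the query (no credentials needed)."""
    # Imported lazily so build_query works even without snscrape installed.
    import snscrape.modules.twitter as sntwitter

    scraper = sntwitter.TwitterSearchScraper(query)
    rows = []
    # get_items() is a generator; islice stops after max_tweets results.
    for tweet in islice(scraper.get_items(), max_tweets):
        rows.append({
            "id": tweet.id,
            "date": tweet.date.isoformat(),
            "username": tweet.user.username,
            # Older snscrape releases expose the text as `content`,
            # newer ones as `rawContent`.
            "text": getattr(tweet, "rawContent", None) or tweet.content,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical search terms and date range, for illustration only.
    print(build_query("education", date(2010, 1, 1), date(2011, 1, 1)))
```

Calling `collect_tweets(build_query(...), max_tweets=500)` would then return a list of dictionaries, which `pandas.DataFrame(rows)` can turn into a table for exploration.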

How to use this repo: Open the Jupyter Notebook in this folder. You can clone it, download it or just read it here. There is also a link at the top of the Notebook which takes you to the same Notebook on Kaggle.

Contents of the notebook:

Installing snscrape
How to use snscrape
Calling snscrape CLI commands from a Jupyter Notebook
Using snscrape Python wrapper
Tweets meta-information gathered with snscrape
Dataset manipulation: JSON, CSV and Pandas DataFrame
Basic exploration of our collected dataset of tweets
Bonus: Publishing your Jupyter Notebook on Medium
What next? Sentiment analysis
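To give a feel for the CLI and dataset-manipulation steps listed above, here is a minimal sketch using only the standard library: one way (via `subprocess`, not necessarily the notebook's approach) to call the snscrape CLI with its `--jsonl` and `--max-results` options, then flatten the resulting JSON Lines into CSV. The helper names, the chosen columns and the JSON keys (`id`, `date`, `rawContent`/`content`, `user` → `username`) are assumptions; check them against the snscrape version you have installed.

```python
"""Sketch: run the snscrape CLI, then flatten its JSONL output to CSV.

Assumptions: helper names, columns and JSON keys are illustrative and
may differ between snscrape versions.
"""
import csv
import json
import subprocess
from pathlib import Path

COLUMNS = ["id", "date", "username", "text"]


def scrape_to_jsonl(query: str, out_path: Path, max_results: int = 100) -> None:
    """Call the snscrape CLI; it prints one JSON object per tweet per line."""
    cmd = ["snscrape", "--jsonl", "--max-results", str(max_results),
           "twitter-search", query]
    with out_path.open("w", encoding="utf-8") as fh:
        subprocess.run(cmd, stdout=fh, check=True)


def jsonl_to_rows(lines) -> list[dict]:
    """Keep a few flat columns from each JSON line, skipping blank lines."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        tweet = json.loads(line)
        rows.append({
            "id": tweet.get("id"),
            "date": tweet.get("date"),
            # "user" is a nested object; it may be missing or null.
            "username": (tweet.get("user") or {}).get("username"),
            "text": tweet.get("rawContent") or tweet.get("content"),
        })
    return rows


def write_csv(rows: list[dict], out_path: Path) -> None:
    """Write flattened rows to CSV; missing values become empty cells."""
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # A made-up JSON line, standing in for real snscrape output.
    sample = ['{"id": 1, "date": "2010-05-01T12:00:00+00:00", '
              '"rawContent": "hello", "user": {"username": "alice"}}']
    print(jsonl_to_rows(sample))
```

The CSV written by `write_csv` can later be loaded with `pandas.read_csv` for exploration; flattening first keeps the nested `user` object out of the table.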