Language Models


Papers

  • Universal Language Model Fine-tuning for Text Classification, by Jeremy Howard, Sebastian Ruder. Yes, this is Jeremy Howard from fast.ai. Since he has none of the incentives to publish that people working in academia do, we know that when he does publish, it's something worthwhile.

  • FlauBERT: Unsupervised Language Model Pre-training for French, by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab. They introduce and share FlauBERT, a model learned on a very large and heterogeneous French corpus. Models of different sizes were trained using the new CNRS (French National Centre for Scientific Research) Jean Zay supercomputer. They evaluate on a series of NLP tasks (text classification, paraphrasing, natural language inference, parsing, word sense disambiguation) and show that most of the time they outperform other pre-training approaches.

  • CamemBERT: a Tasty French Language Model, by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot. This is the other French BERT-based model, similar in performance to FlauBERT.

  • DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, by Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. A paper co-authored by Victor Sanh, who already has a reputation in NLP. Knowledge distillation is a very interesting procedure for obtaining lighter models while not compromising much on performance (see the distillation sketch after this list).

  • Attention is all you need, by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. This is the paper that introduced the Transformer. A must-read (a sketch of its scaled dot-product attention follows this list).

  • RoBERTa: A Robustly Optimized BERT Pretraining Approach, by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov, 2019.

  • Recurrent continuous translation models, by Nal Kalchbrenner, P. Blunsom, 2013 - the encoder-decoder architecture.

  • Task-Aware Representation of Sentences for Generic Text Classification, by Kishaloy Halder, A. Akbik, Roland Vollgraf, 2020.

  • Generating Sequences With Recurrent Neural Networks, by Alex Graves, 2013. A must-read paper for NLP.

  • Neural Machine Translation by Jointly Learning to Align and Translate, by Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015. The paper that introduced additive attention for neural machine translation (see the additive-attention sketch after this list).
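
Since knowledge distillation comes up in the DistilBERT entry above, here is a minimal sketch of the core idea: the student is trained against the teacher's temperature-softened output distribution in addition to the hard labels. The temperature T and the weight alpha below are illustrative values, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of a soft-target loss (teacher) and hard-label cross-entropy.

    T > 1 softens both distributions so the student also learns the
    teacher's relative probabilities over wrong classes. The T*T factor
    keeps gradient magnitudes comparable across temperatures.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: batch of 4 examples, 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)  # frozen teacher's outputs
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```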
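And since "Attention is all you need" is the must-read here, a minimal sketch of its central operation, scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V. This is just the equation transcribed into PyTorch, without masking or multiple heads.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    q, k, v: (batch, seq_len, d_k). Scaling by sqrt(d_k) keeps the dot
    products from growing with dimension, which would saturate the softmax.
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v                                 # (batch, seq, d_k)

# Toy usage: batch of 2, sequence length 5, dimension 8.
q = torch.randn(2, 5, 8)
k = torch.randn(2, 5, 8)
v = torch.randn(2, 5, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 5, 8])
```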
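Finally, the Bahdanau, Cho and Bengio paper scores encoder states with a small MLP (additive attention) at every decoder step, rather than with dot products. A minimal sketch of that scoring function, with layer sizes chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: score(s, h) = v^T tanh(W_s s + W_h h)."""

    def __init__(self, dec_dim, enc_dim, attn_dim):
        super().__init__()
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(
            self.w_dec(dec_state).unsqueeze(1) + self.w_enc(enc_states)
        )).squeeze(-1)                           # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)  # soft alignment weights
        context = (weights.unsqueeze(-1) * enc_states).sum(dim=1)
        return context, weights                  # context: (batch, enc_dim)

# Toy usage: batch of 2, source length 7.
attn = AdditiveAttention(dec_dim=16, enc_dim=32, attn_dim=24)
context, weights = attn(torch.randn(2, 16), torch.randn(2, 7, 32))
print(context.shape, weights.shape)  # torch.Size([2, 32]) torch.Size([2, 7])
```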
