Language Models


Papers

  • Universal Language Model Fine-tuning for Text Classification, by Jeremy Howard, Sebastian Ruder. Yes, this is Jeremy Howard from fast.ai. Since he has none of the incentives to publish that people working in academia do, we know that when he does publish, it's something worthwhile.

  • FlauBERT: Unsupervised Language Model Pre-training for French, by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab. They introduce and share FlauBERT, a model learned on a very large and heterogeneous French corpus. Models of different sizes were trained using the new CNRS (French National Centre for Scientific Research) Jean Zay supercomputer. They evaluate on a series of NLP tasks (text classification, paraphrasing, natural language inference, parsing, word sense disambiguation) and show that most of the time they outperform other pre-training approaches.

  • CamemBERT: a Tasty French Language Model, by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot. This is the other French BERT-based model, similar in performance to FlauBERT.

  • DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, by Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. A paper co-authored by Victor Sanh, who already has a reputation in NLP. Knowledge distillation is a very interesting procedure for obtaining lighter models while not compromising much on performance (see the distillation sketch after this list).

  • Attention is all you need, by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. This is the paper that introduced the Transformer. A must-read (a sketch of its scaled dot-product attention follows this list).

  • RoBERTa: A Robustly Optimized BERT Pretraining Approach, by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov, 2019.

  • Recurrent continuous translation models, by Nal Kalchbrenner, P. Blunsom, 2013 - the encoder-decoder architecture.

  • Task-Aware Representation of Sentences for Generic Text Classification, by Kishaloy Halder, A. Akbik, Roland Vollgraf, 2020.

  • Generating Sequences With Recurrent Neural Networks, by Alex Graves, 2013. A must-read paper for NLP.

  • Neural Machine Translation by Jointly Learning to Align and Translate, by Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015. The paper that introduced additive attention for neural machine translation (see the additive-attention sketch after this list).
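
Since knowledge distillation comes up in the DistilBERT entry above, here is a minimal sketch of the core idea: the student is trained against the teacher's temperature-softened output distribution in addition to the hard labels. The temperature T and the weight alpha below are illustrative values, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of a soft-target loss (teacher) and hard-label cross-entropy.

    T > 1 softens both distributions so the student also learns the
    teacher's relative probabilities over wrong classes. The T*T factor
    keeps gradient magnitudes comparable across temperatures.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: batch of 4 examples, 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)  # frozen teacher's outputs
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```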
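And since "Attention is all you need" is the must-read here, a minimal sketch of its central operation, scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V. This is just the equation transcribed into PyTorch, without masking or multiple heads.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    q, k, v: (batch, seq_len, d_k). Scaling by sqrt(d_k) keeps the dot
    products from growing with dimension, which would saturate the softmax.
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v                                 # (batch, seq, d_k)

# Toy usage: batch of 2, sequence length 5, dimension 8.
q = torch.randn(2, 5, 8)
k = torch.randn(2, 5, 8)
v = torch.randn(2, 5, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 5, 8])
```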
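Finally, the Bahdanau, Cho and Bengio paper scores encoder states with a small MLP (additive attention) at every decoder step, rather than with dot products. A minimal sketch of that scoring function, with layer sizes chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: score(s, h) = v^T tanh(W_s s + W_h h)."""

    def __init__(self, dec_dim, enc_dim, attn_dim):
        super().__init__()
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(
            self.w_dec(dec_state).unsqueeze(1) + self.w_enc(enc_states)
        )).squeeze(-1)                           # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)  # soft alignment weights
        context = (weights.unsqueeze(-1) * enc_states).sum(dim=1)
        return context, weights                  # context: (batch, enc_dim)

# Toy usage: batch of 2, source length 7.
attn = AdditiveAttention(dec_dim=16, enc_dim=32, attn_dim=24)
context, weights = attn(torch.randn(2, 16), torch.randn(2, 7, 32))
print(context.shape, weights.shape)  # torch.Size([2, 32]) torch.Size([2, 7])
```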
