Language Models
Last updated
, by Jeremy Howard of fast.ai. Since he has no incentive to publish papers the way people in academia do, we know that when he does publish, it is something worthwhile.
- they introduce and share FlauBERT, a model pre-trained on a very large and heterogeneous French corpus. - Models of several sizes are trained on the new CNRS (French National Centre for Scientific Research) Jean Zay supercomputer. - They evaluate on a series of NLP tasks (text classification, paraphrasing, natural language inference, parsing, word sense disambiguation) and show that it outperforms other pre-training approaches most of the time.
- this is the other French BERT-based model, similar in performance to FlauBERT.
-> a paper co-authored by Victor Sanh, who already has a reputation in NLP -> knowledge distillation is a very interesting procedure for obtaining lighter models without compromising much on performance.
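The distillation idea mentioned above can be sketched briefly. This is a minimal NumPy illustration of a Hinton-style distillation loss, not the paper's exact training objective: the student is trained against the teacher's temperature-softened output distribution, blended with the usual hard-label cross-entropy (the function name, `T`, and `alpha` are illustrative choices, not taken from the paper).

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Sketch of a distillation objective: KL between teacher and student
    soft targets (at temperature T), blended with hard-label cross-entropy.
    `alpha` weights the soft term; T**2 rescales its gradient magnitude."""
    p_teacher = softmax(teacher_logits / T)
    log_p_student = np.log(softmax(student_logits / T))
    soft = np.mean(
        np.sum(p_teacher * (np.log(p_teacher) - log_p_student), axis=-1)
    ) * T**2
    hard = np.mean(
        -np.log(softmax(student_logits)[np.arange(len(labels)), labels])
    )
    return alpha * soft + (1 - alpha) * hard
```

A student whose logits match the teacher's incurs only the hard-label term; a student that disagrees with the teacher pays an additional KL penalty, which is what transfers the larger model's "dark knowledge" to the smaller one.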
, 2017. This is the paper that introduced the Transformer. A must-read.
, 2019.
, 2013 -> the encoder-decoder architecture.
, 2020.
, 2013. A must-read paper for NLP.
, 2015.