Mono- / multilingual
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation, by Nils Reimers, Iryna Gurevych [ paper ] An easy and efficient method to extend existing sentence embedding models to new languages. This makes it possible to create multilingual versions of previously monolingual models.
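To make the distillation idea concrete, here is a minimal sketch of the training objective, assuming a frozen monolingual teacher encoder and a trainable multilingual student: the student is pushed to reproduce the teacher's embedding of a source sentence for both that sentence and its translation. The names `distillation_loss`, `toy_teacher` and `toy_student` are hypothetical, and the toy encoders are stand-ins for real models, not the authors' code.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[str], Vector]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    teacher: Encoder,
    student: Encoder,
    pairs: Sequence[Tuple[str, str]],
) -> float:
    """Average loss over (source sentence, translation) pairs.

    For each pair, the student's embeddings of the source sentence and of
    its translation are both compared with the teacher's embedding of the
    source sentence.
    """
    if not pairs:
        raise ValueError("need at least one sentence pair")
    total = 0.0
    for source, translation in pairs:
        target = teacher(source)  # teacher only ever sees the source language
        total += mse(target, student(source))
        total += mse(target, student(translation))
    return total / len(pairs)


# Toy encoders (assumption: stand-ins for real models, for illustration only).
def toy_teacher(sentence: str) -> Vector:
    return [float(len(sentence)), 1.0]


def toy_student(sentence: str) -> Vector:
    return [float(len(sentence)), 0.0]


pairs = [("hello", "hallo"), ("cat", "Katze")]
loss = distillation_loss(toy_teacher, toy_student, pairs)
print(loss)
```

In a real setup the loss would be minimized with gradient descent over the student's parameters while the teacher stays fixed; the sketch only shows what is being minimized.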
Multilingual Universal Sentence Encoder for Semantic Retrieval, by Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil [ paper ] When the author team is from Google and comprises 12 people, it's a sign of something worth reading.
- They propose two pre-trained multilingual (16 languages) retrieval models based on Transformer and CNN model architectures.
- They reach SOTA performance on the following tasks: semantic retrieval (SR), translation pair bitext retrieval (BR) and retrieval question answering (ReQA).
- They make the models available on TensorFlow Hub.
- They also wrote a short post about their findings on the Google Blog.
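As a rough illustration of what semantic retrieval with sentence embeddings involves, the sketch below ranks candidate sentences by their similarity to a query embedding. It assumes embeddings are already computed; the vectors are hand-made placeholders rather than output of the published models, cosine similarity is used here as one common scoring choice, and `retrieve` is a hypothetical helper name.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Sequence[float],
    candidates: Sequence[Tuple[str, Sequence[float]]],
    top_k: int = 1,
) -> List[str]:
    """Return the texts of the top_k candidates most similar to the query."""
    ranked = sorted(
        candidates,
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Placeholder embeddings (assumption: a multilingual encoder would place
# sentences with similar meaning close together regardless of language).
candidates = [
    ("Wie ist das Wetter?", [0.9, 0.1]),
    ("Where is the station?", [0.1, 0.9]),
]
query_vec = [1.0, 0.2]  # stands in for the embedding of "What is the weather like?"

top = retrieve(query_vec, candidates, top_k=1)
print(top)
```

With a shared multilingual embedding space, the same ranking step works whether the query and the candidates are in the same language or not.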