Mihaela Grigore

Mono- / multilingual


Last updated 3 years ago

  • Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation, by Nils Reimers, Iryna Gurevych [ paper ] An easy and efficient method to extend existing sentence embedding models to new languages. It makes it possible to create multilingual versions of previously monolingual models.

  • Multilingual Universal Sentence Encoder for Semantic Retrieval, by Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil [ paper ] When the author team is from Google and comprises 12 people, it's a sign of something worth reading.
    - They propose two pre-trained multilingual (16 languages) retrieval models based on Transformer and CNN architectures.
    - They reach SOTA performance on the following tasks: semantic retrieval (SR), translation pair bitext retrieval (BR) and retrieval question answering (ReQA).
    - They make the models available on TensorFlow Hub.
    - They also wrote a short blog post about their findings on the Google Blog.
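To make the distillation idea from the first paper concrete, here is a minimal pure-Python sketch of its training objective. A frozen teacher embeds the source sentence, and a student is trained so that both student(source) and student(translation) land close to teacher(source) under a mean-squared-error loss. Everything in the setup is a toy assumption, not the authors' code: sentences are random feature vectors, teacher and student are linear maps, and the "translation" is just the reversed English feature list placed in separate input slots (like a separate vocabulary). The last part checks the student on a toy version of translation-pair (bitext) retrieval, the BR task mentioned for the second paper.

```python
import math
import random

# Toy setup (all hypothetical): each "sentence" is a random feature vector, and
# English and German use disjoint feature slots, like separate vocabularies.
DIM = 5        # features per language
EMB = 4        # embedding size shared by teacher and student
N_PAIRS = 40   # parallel (English, German) training pairs
LR = 0.1
EPOCHS = 300

rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Frozen "monolingual" teacher: a fixed linear map over English features only.
T = [rand_vec(DIM) for _ in range(EMB)]


def teacher(en_feats):
    return matvec(T, en_feats)


# Student inputs: English features fill the first DIM slots; the toy "translation"
# is the reversed feature list placed in the last DIM slots.
def as_en(feats):
    return feats + [0.0] * DIM


def as_de(feats):
    return [0.0] * DIM + feats[::-1]


english = [rand_vec(DIM) for _ in range(N_PAIRS)]
W = [[0.0] * (2 * DIM) for _ in range(EMB)]  # student weights, zero-initialised


def distillation_loss():
    # Mean over pairs of MSE(teacher(s), student(s)) + MSE(teacher(s), student(t)).
    total = 0.0
    for f in english:
        target = teacher(f)
        total += mse(target, matvec(W, as_en(f))) + mse(target, matvec(W, as_de(f)))
    return total / N_PAIRS


def sgd_step(x, target):
    # d/dW[k][j] of mse(target, W x) is (2 / EMB) * (W x - target)[k] * x[j].
    err = [p - t for p, t in zip(matvec(W, x), target)]
    for k in range(EMB):
        for j, xj in enumerate(x):
            W[k][j] -= LR * (2.0 / EMB) * err[k] * xj


initial_loss = distillation_loss()
for _ in range(EPOCHS):
    for f in english:
        target = teacher(f)          # the teacher only ever sees the English side
        sgd_step(as_en(f), target)   # pull student(source) towards teacher(source)
        sgd_step(as_de(f), target)   # pull student(translation) towards teacher(source)
final_loss = distillation_loss()

# Toy translation-pair retrieval: for each German sentence, is its own English
# source the most cosine-similar English sentence under the student?
en_emb = [matvec(W, as_en(f)) for f in english]
hits = 0
for i, f in enumerate(english):
    query = matvec(W, as_de(f))
    best = max(range(N_PAIRS), key=lambda j: cosine(query, en_emb[j]))
    if best == i:
        hits += 1
accuracy = hits / N_PAIRS

print(f"loss {initial_loss:.4f} -> {final_loss:.2e}, retrieval accuracy {accuracy:.2f}")
```

Because the toy data are noise-free and one linear student can serve both sets of input slots, the loss can be driven close to zero here. Real setups replace the linear maps with neural sentence encoders and train on mini-batches of parallel sentences, but the objective has the same shape: the teacher's embedding of the source sentence is the target for both the source and its translation.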