Mihaela Grigore

Information Retrieval


Last updated 3 years ago

Books

  • Semantic Search on Text and Knowledge Bases, by Hannah Bast, Björn Buchhold & Elmar Haussmann, 2016. A comprehensive overview of the broad area of semantic search on text and knowledge bases. It was written in 2016, so the analysis stops just before Transformers were widely adopted in IR. This is the book to read in order to understand the IR field prior to the introduction of Transformers.

  • Pretrained Transformers for Text Ranking: BERT and Beyond, by Jimmy Lin, Rodrigo Nogueira, and Andrew Yates, 2021. Where the previous book covers the IR field prior to 2016, this one covers the recent years, in particular the use of Transformers in text ranking. This is the book to read in order to understand the present of IR (at least at the time I'm writing this).
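To make the second book's subject concrete: modern text ranking is usually a two-stage pipeline, where a cheap first-stage retriever (e.g. BM25) pulls candidates from the corpus and an expensive Transformer cross-encoder re-scores them. A minimal sketch of that shape, with stand-in scoring functions (the term-overlap and character-overlap scorers below are placeholders, not real BM25 or BERT):

```python
# Sketch of a two-stage retrieve-then-rerank pipeline. In a real system the
# first stage would be BM25 or dense retrieval, and the reranker a BERT
# cross-encoder scoring "[CLS] query [SEP] document".

def first_stage(query, corpus, k=100):
    # cheap, high-recall candidate generation (stand-in for BM25)
    def overlap(doc):
        return len(set(query.split()) & set(doc.split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def rerank(query, candidates, scorer):
    # expensive pointwise scoring of each (query, doc) pair
    return sorted(candidates, key=lambda d: scorer(query, d), reverse=True)

corpus = ["bert for ranking", "cooking pasta", "transformers in ir"]
top = first_stage("bert ranking", corpus, k=2)
best = rerank("bert ranking", top, scorer=lambda q, d: len(set(q) & set(d)))
```

The point of the split is cost: the first stage touches the whole corpus, so it must be cheap; the reranker only sees the top-k candidates, so it can afford a full Transformer forward pass per pair.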

Papers

  • Multi-Stage Document Ranking with BERT, by Rodrigo Nogueira, Wei Yang, Kyunghyun Cho, and Jimmy Lin, 2019. Rodrigo Nogueira is a co-author of the second book mentioned at the start of this page. This is a must-read article, as he has made important contributions to this field.

  • Passage Re-ranking with BERT, by Rodrigo Nogueira and Kyunghyun Cho, 2019.

  • Rethink Training of BERT Rerankers in Multi-Stage Retrieval Pipeline, by Luyu Gao, Zhuyun Dai, and Jamie Callan. They propose Localized Contrastive Estimation (LCE) for training rerankers and demonstrate that it significantly improves deep two-stage models.

  • What Happens to BERT Embeddings During Fine-tuning?, by Amil Merchant, Elahe Rahimtoroghi, Ellie Pavlick, and Ian Tenney. They investigate how fine-tuning affects the representations of the BERT model. Findings: fine-tuning does not lead to catastrophic forgetting of linguistic phenomena; it primarily affects the top layers of BERT, though with noteworthy variation across tasks; and it has a weaker effect on representations of out-of-domain sentences.

  • Understanding the Behaviors of BERT in Ranking, by Yifan Qiao, Chenyan Xiong, Zhenghao Liu, and Zhiyuan Liu.

  • Language Models as Knowledge Bases?, by Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, and Sebastian Riedel.

  • Understanding BERT Rankers Under Distillation, by Luyu Gao, Zhuyun Dai, and Jamie Callan.

  • Deep Learning for Matching in Search and Recommendation, by Jun Xu, Hang Li, and Xiangnan He, 2019.

  • Project PIAF: Building a Native French Question-Answering Dataset, by Rachel Keraron, Guillaume Lancrenon, Mathilde Bras, Frédéric Allary, Gilles Moyse, Thomas Scialom, Edmundo-Pavel Soriano-Morales, and Jacopo Staiano, 2020.
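The Localized Contrastive Estimation idea from the Gao, Dai & Callan paper above boils down to a softmax cross-entropy in which the positive passage competes against hard negatives sampled from the same query's first-stage retrieval results. A minimal sketch of the per-query loss in plain Python (the paper's implementation batches this over queries in PyTorch, and the localized negative-sampling step is not shown here):

```python
import math

def lce_loss(pos_score, neg_scores):
    """Per-query LCE: softmax cross-entropy with the positive passage as
    the target class among localized hard negatives."""
    logits = [pos_score] + list(neg_scores)
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - pos_score  # = -log softmax(positive)
```

With a single negative scored equally to the positive, the loss is log 2; pushing the positive's score up (or the negatives' down) drives it toward zero, which is exactly what trains the reranker to separate them.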

Other

| Resource Name | Type | Tags | Description |
| --- | --- | --- | --- |
| Personalized Query Suggestions | short-paper | query suggestion, personalization | A sequence-to-sequence-model-based query suggestion framework capable of naturally modeling structured, personalized features and unstructured query texts. (Did not find the full paper.) |
| Personalized Query Auto-Completion Through a Lightweight Representation of the User Context | article | query suggestion, personalization | Using the fastText library for personalized autosuggest. |
| Attention-based Hierarchical Neural Query Suggestion | article | query suggestion, personalization | Using RNNs. |
| Using BERT and BART for Query Suggestion | article | | |
| Deep Search Query Intent Understanding | article | query intent | What they actually do is use these models to predict only the type of document targeted by the user's search (that is, whether the user is looking for a user profile, a job posting, a feed post, etc.). |
| Overview of the TREC 2020 Deep Learning Track | article | passage and document ranking | "This year we have further evidence that rankers with BERT-style pretraining outperform other rankers in the large data regime." |
| How to Fine-Tune BERT for Text Classification? | article | BERT, fine-tuning, classification | Experiments investigating different fine-tuning methods of BERT on a text classification task, providing a general solution for BERT fine-tuning. |
| The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) | blog article | BERT | |
| Project PIAF: Building a Native French Question-Answering Dataset | paper | French Q&A dataset | |
| Estimating GPU Memory Consumption of Deep Learning Models | paper | GPU memory usage | Most ML projects that fail while running/training actually run out of memory; they propose a way to estimate memory needs so an experiment doesn't fail halfway through. (Only read the abstract.) |
| Some Common Mistakes In IR Evaluation, And How They Can Be Avoided | paper | mistakes in IR evaluation metrics | Most research in information retrieval evaluates results using MRR and MAP. |
| Semantic Search with S-BERT | blog post | SBERT, FAISS | SBERT and FAISS on MS MARCO for document retrieval, using FAISS to store the embeddings and search; presumably easier than setting up Elasticsearch. |
| BM25 vs TF-IDF | blog post | | A nice overview of what TF-IDF does and in which cases BM25 is better. |
| Enhanced News Retrieval: Passages Lead the Way! | paper | passage retrieval | Adapting BM25 into BM25P to identify relevant candidate documents for the passage retrieval task. |

  • Zero-shot, One Kill: BERT for Neural Information Retrieval, by Stergios Efes. An interesting work done as a master's thesis.
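On the "BM25 vs TF-IDF" entry above: BM25 keeps TF-IDF's idea of weighting matched terms by their rarity in the corpus, but it saturates term frequency (controlled by k1) and normalizes by document length (controlled by b), so a term repeated twenty times doesn't score twenty times higher. A minimal sketch of Okapi BM25 scoring; the defaults k1=1.5 and b=0.75 are common choices, not values taken from the post:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_dl,
               k1=1.5, b=0.75):
    """Okapi BM25 score of one document for one query.

    doc_freq maps a term to the number of documents containing it;
    avg_dl is the average document length in the corpus.
    """
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        # rarity weight, as in TF-IDF
        idf = math.log(1 + (n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5))
        # saturating, length-normalized term-frequency component
        norm = (tf[term] * (k1 + 1)
                / (tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_dl)))
        score += idf * norm
    return score
```

Unlike raw TF-IDF, the `norm` factor approaches an upper bound of k1 + 1 as the term frequency grows, which is why BM25 tends to behave better on long documents with repeated terms.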
