Information Retrieval
Books
Semantic Search on Text and Knowledge Bases, by Hannah Bast, Björn Buchhold, and Elmar Haussmann, 2016. A comprehensive overview of the broad area of semantic search on text and knowledge bases. It was written in 2016, which means the analysis stops before Transformers were widely adopted in IR. This is the book to read in order to understand the IR field prior to the introduction of Transformers.
Pretrained Transformers for Text Ranking: BERT and Beyond, by Jimmy Lin, Rodrigo Nogueira, and Andrew Yates, 2021. The previous book covers the IR field up to 2016. This one covers the years since, in particular the use of Transformers in text ranking. This is the book to read in order to understand the present of IR (at least at the time when I'm writing this).
Papers
Multi-Stage Document Ranking with BERT, by Rodrigo Nogueira, Wei Yang, Kyunghyun Cho, and Jimmy Lin, 2019. Rodrigo Nogueira is a co-author of the second book listed at the start of this page. This is a must-read article, as Nogueira has made important contributions to this field.
Rethink Training of BERT Rerankers in Multi-Stage Retrieval Pipeline, by Luyu Gao, Zhuyun Dai, and Jamie Callan. They propose Localized Contrastive Estimation (LCE) for training rerankers and demonstrate that it significantly improves deep two-stage models.
What happens to BERT embeddings during fine-tuning, by Amil Merchant, Elahe Rahimtoroghi, Ellie Pavlick, and Ian Tenney. They investigate how fine-tuning affects the representations of the BERT model. Findings:
- Fine-tuning does not lead to catastrophic forgetting of linguistic phenomena.
- Fine-tuning primarily affects the top layers of BERT, but with noteworthy variation across tasks.
- Fine-tuning has a weaker effect on representations of out-of-domain sentences.
Deep Learning for Matching in Search and Recommendation, by Jun Xu, Hang Li, Xiangnan He, 2019
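As a rough illustration of the Localized Contrastive Estimation idea from the Gao, Dai, and Callan paper above: as I understand it, the loss is a softmax cross-entropy over a small group of candidates for one query, made of one relevant passage plus negatives sampled from the first-stage retriever's top results. The sketch below only shows that loss on already-computed reranker scores; the function names and the plain-Python form are my own assumptions, not the authors' code.

```python
import math

def lce_loss(positive_score, negative_scores):
    """Contrastive loss for one query group (hypothetical helper).

    positive_score: reranker score of the relevant passage.
    negative_scores: reranker scores of negatives, assumed to be sampled
    from the first-stage retriever's top results ("localized" negatives).
    """
    scores = [positive_score] + list(negative_scores)
    # log-sum-exp with max subtraction for numerical stability
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    # -log softmax(positive) = log Z - positive_score
    return log_z - positive_score

def batch_lce_loss(groups):
    """Average the per-query loss over (positive_score, negative_scores) pairs."""
    return sum(lce_loss(pos, negs) for pos, negs in groups) / len(groups)
```

The loss shrinks as the relevant passage is scored further above the retriever's hard negatives, which is the behaviour a reranker in a two-stage pipeline is trained for.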
Other
Zero-shot, One Kill: BERT for Neural Information Retrieval, by Stergios Efes. This is an interesting work done as a master's thesis.
Resource Name
Type
Tags
Description
short-paper
query suggestion, personalization
a query suggestion framework based on a sequence-to-sequence model that can naturally model structured, personalized features and unstructured query texts.
article
query suggestion, personalization
using the fastText library for personalized autosuggest
article
query suggestion, personalization
using RNNs
article
article
query intent
they use these models to predict only the type of document the user is targeting with the search (that is, whether the user is looking for a user profile, a job posting, a feed post, etc.)
article
passage and document ranking
This year we have further evidence that rankers with BERT-style pretraining outperform other rankers in the large data regime.
article
BERT, fine-tuning, classification
experiments investigating different methods of fine-tuning BERT for text classification, leading to a general solution for BERT fine-tuning.
blog article
BERT
paper
French Q&A dataset
paper
GPU memory usage
most ML jobs that fail while running or training do so because they run out of memory. The authors propose a way to estimate memory needs so your experiment doesn't fail halfway through. (I only read the abstract.)
paper
mistakes metrics in information retrieval
most research in information retrieval evaluates results using MRR and MAP
blog post
SBERT and FAISS on MS MARCO for document retrieval: FAISS stores the embeddings and runs the search; presumably easier than setting up Elasticsearch
blog post
it's a nice overview of what TF-IDF does and in which cases BM25 is better
paper
passage retrieval
Adapting BM25 into BM25P to identify relevant candidate documents for the passage retrieval task
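Since the last two entries both revolve around BM25, here is a minimal sketch of plain BM25 scoring over pre-tokenized documents. It is not BM25P and not taken from either resource. The function name, the default `k1` and `b` values, and the non-negative (Lucene-style) IDF variant are my own choices for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`.

    docs: list of token lists. k1 controls term-frequency saturation,
    b controls document-length normalization (both assumed defaults).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # document frequency: number of documents containing each term
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            # non-negative IDF variant (assumption: log(1 + ...) form)
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # term frequency saturates instead of growing linearly as in TF-IDF
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    ["bm25", "ranking", "bm25"],
    ["tf", "idf", "ranking"],
    ["cooking", "recipes"],
]
scores = bm25_scores(["bm25"], docs)
```

The main practical difference from raw TF-IDF is visible in the `norm` term: repeating a query term many times yields diminishing returns, and long documents are penalized relative to the average length.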