Information Retrieval

Books

  • Semantic Search on Text and Knowledge Bases, by Hannah Bast, Björn Buchhold, and Elmar Haussmann, 2016. A comprehensive overview of the broad area of semantic search on text and knowledge bases. It was written in 2016, so its analysis stops before Transformers were widely adopted in IR. This is the book to read to understand the IR field before Transformers were introduced.

  • Pretrained Transformers for Text Ranking: BERT and Beyond, by Jimmy Lin, Rodrigo Nogueira, and Andrew Yates, 2021. The previous book covers the IR field up to 2016; this one covers the years since, in particular the use of Transformers in text ranking. This is the book to read to understand the present state of IR (at least at the time I'm writing this).

Papers

Other

Resource Name

Type

Tags

Description

short-paper

query suggestion, personalization

a sequence-to-sequence-model-based query suggestion framework that can model structured, personalized features and unstructured query texts naturally.

article

query suggestion, personalization

using the fastText library for personalized autosuggest

article

query suggestion, personalization

using RNNs

article

article

query intent

what they actually do is use these models to predict only the type of document the user is targeting with the search (that is, to predict whether the user is looking for a user profile, a job posting, a feed post, etc.)

article

passage and document ranking

This year we have further evidence that rankers with BERT-style pretraining outperform other rankers in the large data regime.

article

BERT, fine-tuning, classification

experiments investigating different fine-tuning methods for BERT on text classification tasks, providing a general solution for BERT fine-tuning.

blog article

BERT

paper

French Q&A dataset

paper

GPU memory usage

most ML projects that fail while running / training actually run out of memory. The authors propose a way to estimate memory needs so your experiment doesn't fail halfway through. (I only read the abstract)

paper

mistakes, metrics in information retrieval

most research in information retrieval evaluates results using MRR, MAP

blog post

SBERT and FAISS on MS MARCO for document retrieval, using FAISS to store and search the embeddings; presumably easier than setting up Elasticsearch

blog post

it's a nice overview of what TF-IDF does and in which cases BM25 is better

paper

passage retrieval

Adapting BM25 into BM25P to identify relevant candidate documents for the passage retrieval task
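
To make the TF-IDF versus BM25 comparison from the entries above more concrete, here is a minimal sketch of both scoring functions. It is not taken from any of the linked resources. The smoothed IDF formula, the raw-term-frequency TF-IDF variant, the parameter values k1=1.2 and b=0.75, and the toy documents are all assumptions chosen for illustration; real systems differ in these details.

```python
import math
from collections import Counter


def idf(term, docs):
    """Smoothed inverse document frequency (one common BM25-style variant)."""
    n = sum(1 for d in docs if term in d)  # number of documents containing the term
    return math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))


def tfidf_score(query, doc, docs):
    """Raw term frequency times IDF: the score grows linearly with repetitions."""
    tf = Counter(doc)
    return sum(tf[t] * idf(t, docs) for t in query)


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """BM25: term frequency saturates and is normalized by document length."""
    tf = Counter(doc)
    avgdl = sum(len(d) for d in docs) / len(docs)  # average document length
    norm = k1 * (1 - b + b * len(doc) / avgdl)     # length normalization term
    return sum(idf(t, docs) * tf[t] * (k1 + 1) / (tf[t] + norm) for t in query)


# Hypothetical toy corpus: the second document repeats the query term six times.
docs = [
    "bm25 ranking".split(),
    "bm25 bm25 bm25 bm25 bm25 bm25 spam spam spam spam".split(),
    "tf idf weighting".split(),
]
query = ["bm25"]

for d in docs:
    print(len(d), round(tfidf_score(query, d, docs), 3), round(bm25_score(query, d, docs), 3))
```

In this toy corpus the repetitive document scores six times higher than the short one under raw TF-IDF, but only modestly higher under BM25. That is the kind of case where BM25 tends to be the better choice: long or keyword-stuffed documents do not dominate the ranking.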
