Information Retrieval & Search

Finding the right information in huge collections: the foundation of search engines and of retrieval-augmented generation.

Imagine a giant library with millions of books but no catalogue. If a visitor asks a question, you would have to open every book to find the answer, which is hopeless. Information retrieval is the craft of building a smart catalogue so you can find the few pages that actually matter in seconds. Search engines do this every time you type a query. Two big tricks help. The first matches the exact words you asked for. The second is cleverer: it matches meaning, so a search for "car" can also surface pages about "automobiles". Modern AI assistants lean on this heavily. Before an assistant replies, it often retrieves the most relevant snippets from a trusted collection and reads them first, a pattern called retrieval-augmented generation (RAG). Good retrieval is what lets a computer answer from real, specific information instead of guessing.
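The first trick, matching exact words, can be sketched with a tiny inverted index. This is a minimal illustration, not how any particular search engine is built: the function names `build_inverted_index` and `search` and the three sample documents are made up for this example, and the tokenizer simply splits on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the documents matching the first word, then intersect.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample collection, used only for illustration.
docs = {
    1: "the car needs new tyres",
    2: "automobiles need regular service",
    3: "the library opens at nine",
}
index = build_inverted_index(docs)

print(search(index, "car"))   # finds document 1 only
print(search(index, "the"))   # finds documents 1 and 3
```

Note that the query "car" does not find document 2, even though it is about automobiles. That gap is exactly what the second trick, matching on meaning, is meant to close.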

The main ideas

  • Keyword search: inverted indexes and ranking functions like BM25, which are still strong baselines.
  • Semantic / vector search: embedding queries and documents to match on meaning, not just words.
  • Vector databases & ANN: approximate nearest-neighbor indexes (HNSW, IVF) that make embedding search scale.
  • Hybrid & re-ranking: combining lexical and semantic signals, then re-ranking with cross-encoders.
  • Chunking & indexing: turning documents into retrievable units, the unglamorous key to good RAG.
  • Evaluation: Recall@k, nDCG, and MRR, the standard measures of retrieval quality.
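
The evaluation measures named in the last item can be written in a few lines each. The sketch below uses the common textbook definitions, with binary relevance for Recall@k and MRR and graded relevance with a log2 discount for nDCG. The function names and the sample ranking are assumptions made for this example, and other nDCG variants (for instance with exponential gain) exist.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant result, or 0 if none is found."""
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over a list of (ranked, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked, gains, k):
    """nDCG@k, where gains maps a document id to its graded relevance."""
    dcg = sum(
        gains.get(doc, 0) / math.log2(i + 2)
        for i, doc in enumerate(ranked[:k])
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ranking returned for one query, and its relevant documents.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print(recall_at_k(ranked, relevant, k=3))   # one of two relevant docs in top 3
print(reciprocal_rank(ranked, relevant))    # first relevant doc is at rank 2
```

Recall@k asks whether the relevant documents were found at all, while MRR and nDCG also reward placing them near the top, which matters when only the first few results are read or passed to a model.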

NLP & Large Language Models · Recommender Systems · Building with AI

