Information Retrieval and Learning



GENERAL

OUTLINE

  • The course is offered in the winter term with 2h/week and counts for 2 ECTS.
  • It is based on the book Introduction to Information Retrieval by C. Manning, P. Raghavan, and H. Schuetze.
  • This lecture has been widely used at other universities, for example:
    • Cornell University, Computer and Information Science. Course on "Information Retrieval".
    • University of Tuebingen. Seminar fuer Sprachwissenschaft. Course on Introduction to Information Retrieval.
    • Stanford University, Dept. of Computer Science. Course on Information Retrieval and Web Search.

COURSE CONTENTS

  • C01: Boolean Retrieval - Introduction and Course Overview - Boolean (Information) Retrieval - Classical Search Model - Incidence Vectors - Inverted Index Construction - Merging postings - Example: Westlaw - Query Optimisation. Boolean Retrieval
  • C02: Dictionaries - Tokens and Terms - Lemmatisation and Stemming - Porter's algorithm - Skip pointers and lists - Phrase queries and positional indexes - Selection of a Data Structure: Records, Hash Tables, Binary Trees, B-Trees. Dictionaries
  • C03: Tolerant Retrieval - Wildcard Queries: Problem Statement, Application to B-Trees, Bi-grams, Processing of Wildcard Queries. - Spelling Correction: Isolated Word Scenario: Query Misspellings, Levenshtein Distance, n-gram Overlap, Jaccard coefficient, Tanimoto coefficient, matching the tri-grams. - Context-Sensitive Scenario: hit-based spelling correction, conjunction of bi-words. - Soundex - phonetic equivalents: algorithmic idea. Tolerant Retrieval
  • C04: Term Frequency, Inverse Document Frequency (TF/IDF). Ranked Retrieval - Scoring Documents & Weighting Schemes: Jaccard, Binary, Term Frequencies (TF), Document and Inverse Document Frequencies (IDF), Log Frequencies, Collection frequency - Vector Space Model. TF/IDF
  • C05: Measures to evaluate retrieved results. Precision and Recall for Information Retrieval and Classification. - Harmonic combinations of Precision and Recall: F-Measure, Matthews correlation coefficient, van Rijsbergen effectiveness measure, kappa measure. - Precision/Recall curves and the interpolation of discrete values. - Example Queries. Precision & Recall
  • C06: Relevance Feedback - Query Expansion. - Centroid method, Rocchio Algorithm, Relevance Feedback in Vector Spaces. - Positive and Negative Feedback, Assumptions. - Thesaurus-based Query Expansion. Relevance Feedback
  • C07: Exercise: Information Retrieval and Learning. The following text (a PDF document, "Chapter 7 - General Summary", see below) briefly summarises the main topics. I wrote it in one go, but I very much hope that it helps you to better understand the topics of the course. Please note that, from an "industrial" point of view, a fast, intelligent, adaptive, and user-friendly information retrieval system translates directly into higher revenue. Executive Summary
  • C08: Text Classification - Motivation of Text Classification. - k nearest neighbour. - Centroid method. Applications - Projects
  • C09: Text Classification - 3 Projects. The usage of "Microfeatures" with respect to Natural Language Understanding - Disambiguation of Word meanings and Context Classification. - Analysis of Spam Emails: Content Zoning and Digital Image Sorting towards a predictive usage. - Linguistic Description of textual documents regarding the fingerprinting of authors.
  • C10: Link Analysis: Understanding the World Wide Web as a Graph. - What is citation? What are popular pages? - What is a Markov Chain? What is an Ergodic Markov Chain? - How does the PageRank algorithm work? Link Analysis
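
As an illustration of the C01 topics listed above (inverted index construction, merging postings, and query optimisation), here is a minimal Python sketch. It is not taken from the course material: the function names `build_index`, `intersect`, and `boolean_and`, the whitespace tokenisation, and the three toy documents are all assumptions made for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: term -> sorted list of doc IDs (postings)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: lower-casing and whitespace splitting stand in for
        # real tokenisation and normalisation.
        for token in text.lower().split():
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping doc IDs found in both."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def boolean_and(index, *terms):
    """Answer a conjunctive query, processing the shortest postings first."""
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


# Hypothetical toy collection, for illustration only.
docs = {
    1: "brutus killed caesar",
    2: "caesar praised brutus and calpurnia",
    3: "calpurnia warned caesar",
}
index = build_index(docs)
print(boolean_and(index, "brutus", "calpurnia"))
```

Sorting the postings lists by length before merging is one simple form of query optimisation: intermediate results can never be longer than the shortest list, so starting there keeps the later merges cheap.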
The course organisation can be found here: Organisation, as well as an Examination Check List.



Modified: 2012-04-11 18:22:11
Exported: 2013-05-23 01:31:37