GENERAL OUTLINE
-
The course is offered in the winter term with 2h/week and counts for 2 ECTS.
-
It is based on the book Introduction to Information Retrieval by C. Manning, P. Raghavan, and H. Schuetze.
-
The book has been widely used for courses at other universities, for example:
-
Cornell University, Computer and Information Science. Course on "Information Retrieval".
-
University of Tuebingen, Seminar fuer Sprachwissenschaft. Course on "Introduction to Information Retrieval".
-
Stanford University, Dept. of Computer Science. Course on "Information Retrieval and Web Search".
COURSE CONTENTS
-
C01: Boolean Retrieval - Introduction and Course Overview - Boolean (Information) Retrieval - Classical Search Model - Incidence Vectors - Inverted Index Construction - Inter-merging postings - Example: Westlaw - Query Optimisation. Boolean Retrieval
-
C02: Dictionaries - Tokens and Terms - Lemmatisation and Stemming - Porter's algorithm - Skip pointers and lists - Phrase queries and positional indexes - Selection of a Data Structure: Records, Hash Tables, Binary Trees, B-Trees. Dictionaries
-
C03: Tolerant Retrieval - Wildcard Queries: Problem Statement, Application to B-Trees, Bi-grams, Processing of Wildcard Queries. - Spelling Correction: Isolated Word Scenario: Query Misspellings, Levenshtein Distance, n-gram Overlap, Jaccard coefficient, Tanimoto coefficient, matching the tri-grams. - Context-Sensitive Scenario: hit-based spelling correction, conjunction of bi-words. - Soundex - phonetic equivalents: algorithmic idea. Tolerant Retrieval
-
C04: Term Frequency, Inverse Document Frequency (TF/IDF). Ranked Retrieval - Scoring Documents & Weighting Schemes: Jaccard, Binary, Term Frequencies (TF), Document and Inverse Document Frequencies (IDF), Log Frequencies, Collection frequency - Vector Space Model. TF/IDF
-
C05: Measures to evaluate retrieved results. Precision and Recall for Information Retrieval and Classification. - Harmonic combinations of Precision and Recall: F-Measure, Matthews correlation coefficient, van Rijsbergen effectiveness measure, kappa measure. - Precision/Recall curves and the interpolation of discrete values. - Example Queries. Precision & Recall
-
C06: Relevance Feedback - Query Expansion. - Centroid method, Rocchio Algorithm, Relevance Feedback in Vector Spaces. - Positive and Negative Feedback, Assumptions. - Thesaurus-based Query Expansion. Relevance Feedback
-
C07: Exercise: Information Retrieval and Learning. The following text (= PDF document "Chapter 7 - General Summary", see below) briefly summarises the main topics. I wrote it in one go, but I very much hope that it helps you to better understand what the course is about. Please note that, from an "industrial" point of view, a fast, intelligent, adaptive, and user-friendly information retrieval system directly corresponds to higher revenue. Executive Summary
-
C08: Text Classification - Motivation of Text Classification. - k-nearest neighbour. - Centroid method. Applications - Projects
-
C09: Text Classification - 3 Projects. The usage of "Microfeatures" with respect to Natural Language Understanding - Disambiguation of Word meanings and Context Classification. - Analysis of Spam Emails: Content Zoning and Digital Image Sorting towards a predictive usage. - Linguistic Description of textual documents regarding the fingerprinting of authors.
-
C10: Link Analysis: Understanding the World Wide Web as a Graph. - What is citation? What are popular pages? - What is a Markov Chain? What is an Ergodic Markov Chain? - How does the PageRank algorithm work? Link Analysis
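As a rough illustration of the PageRank question raised in C10, the following is a minimal sketch of power iteration on a tiny link graph. It is not taken from the course material: the function name `pagerank`, the damping factor of 0.85, the uniform treatment of pages without outgoing links, and the four-page `toy_web` graph are all assumptions chosen for illustration. The teleportation step is what makes the underlying Markov chain ergodic, so the iteration settles on a single rank distribution.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration over a dict mapping each page to the pages it links to.

    Hypothetical helper for illustration; parameter values are assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution

    for _ in range(max_iter):
        # Teleportation: every page receives an equal share of (1 - damping).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Pass the damped rank of p equally to the pages it links to.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Page without outgoing links: spread its rank over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # stop once the distribution no longer moves
            break
    return rank


# Assumed toy graph: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph, C collects links from three pages and ends up with the highest rank, while D, which nobody links to, keeps only its teleportation share.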
The course organisation can be found here: Organisation, as well as an Examination Check List.