Lecture "Information Retrieval and Web Search Engines"

Information
Classification: 
Master
Credits: 
4 or 5 (depending on examination rules)
Exam: 
Available Oral Exams Dates: 27.07., 07.-11.08
Regular Dates: 
Thursdays, 15:00-17:30 hrs.
Lecture will take place at the Informatikzentrum, Mühlenpfordtstraße 23, room IZ 161
Contents
Contents: 
Summary

This lecture provides an introduction to the fields of information retrieval and web search. We will discuss how relevant information can be found in very large and mostly unstructured data collections; this is particularly interesting in cases where users cannot provide a clear formulation of their current information need. Web search engines like Google are a typical application of the techniques covered by this course.

This lecture will be in English

Other:
The homework solutions have to be written in English

Materials
  Date Topic References Materials
1 06.04.2017 Introduction and fundamental notions (Bush, 1945)

Slides
Homework
Video is broken, added one from a previous year

2 13.04.2017 Fuzzy retrieval model
Coordination level matching
Vector space model
(Zadeh, 1965)
(Zadeh, 1978)
(Ogawa et al., 1991)
(Salton et al., 1975)
(Luhn, 1961)
(Spärck Jones, 1972)
(Robertson and Spärck Jones, 1976)
(Salton et al., 1983)

Slides
Video

3 20.04.2017 Probabilistic retrieval models (Robertson, 1977)
(Maron and Kuhns, 1960)
(van Rijsbergen, 1977)
(Croft and Harper, 1979)
(Greiff, 1998)

Slides
Video

4 27.04.2017 Indexing (Porter, 1980)
(Zobel and Moffat, 2006)

Slides
Video

5 04.05.2017 Latent Semantic
Indexing
(Salton and Buckley, 1988)
(Deerwester et al., 1990)
(Berry et al., 1995)
(Landauer and Dumais, 1997)

Slides
Homework
Video

6 11.05.2017 Language models
Retrieval Evaluation
(Saracevic, 2007)

Slides
Homework
Video

7 18.05.2017 Document clustering  

Slides
Video

  25.05.2017 Ascension Day/ NO Lecture    
8 01.06.2017 Relevance feedback
Classification
(Rocchio, 1971)

Slides
Homework
Video

  08.06.2017 Excursion Week/ NO Lecture    
9 15.06.2017 Support vector machines (Burges, 1998)
(Ivanciuc, 2007)
(Joachims, 1998)
(Joachims, 2002)

Slides
Video (from previous year)

10 22.06.2017 Introduction to Web Retrieval (Fetterly et al., 2004)

Slides
Video

11 29.06.2017 Web crawling  

Slides
Homework
Video

12 06.07.2017 Link analysis (Boyack et al., 2005)
(Page et al., 1999)
(Kleinberg, 1999)

Slides
Homework
Video

13 13.07.2017 Miscellaneous  

Slides
Video

Books

(Manning et al., 2008)
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008. [ http ]
(Belew, 2000)
Richard K. Belew. Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW. Cambridge University Press, 2000. [ http ]
(Baeza-Yates and Ribeiro-Neto, 1999)
Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999. [ http ]
(van Rijsbergen, 1979)
Cornelis Joost van Rijsbergen. Information Retrieval. Butterworths, second edition, 1979. [ .html ]

Further Readings

(Bush, 1945)
Vannevar Bush. As we may think. The Atlantic Monthly, 176(1):101-108, 1945.
(Zadeh, 1965)
Lotfi A. Zadeh. Fuzzy sets. Information and Control, 8(3):338-353, 1965. [ DOI ]
(Zadeh, 1978)
Lotfi A. Zadeh. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1(1):3-28, 1978. [ DOI ]
(Ogawa et al., 1991)
Yasushi Ogawa, Tetsuya Morita, and Kiyohiko Kobayashi. A fuzzy document retrieval system using the keyword connection matrix and a learning method. Fuzzy Sets and Systems, 39(2):163-179, 1991. [ DOI ]
(Salton et al., 1975)
Gerard Salton, Chung-Shu Yang, and Anita Wong. A vector space model for automatic indexing. Communications of the ACM, 18(11):613-620, 1975. [ DOI ]
(Luhn, 1961)
Hans Peter Luhn. The automatic derivation of information retrieval encodements from machine-readable texts. Volume 3, Part 2 of Information Retrieval and Machine Translation, pages 1021-1028. Interscience Publishers, 1961.
(Spärck Jones, 1972)
Karen Spärck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11-21, 1972. [ DOI
(Robertson and Spärck Jones, 1976)
Stephen E. Robertson and Karen Spärck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3):129-146, 1976.
(Salton et al., 1983)
Gerard Salton, Edward A. Fox, and Harry Wu. Extended boolean information retrieval. Communications of the ACM, 26(11):1022-1036, 1983. [ DOI ]
(Robertson, 1977)
Stephen E. Robertson. The probabilistic ranking principle in IR. The Journal of Documentation, 33(4):294-304, 1977.
(Maron and Kuhns, 1960)
Melvin E. Maron and John L. Kuhns. On relevance, probabilistic indexing and information retrieval. Journal of the ACM, 7(3):216-244, 1960. [ DOI ]
(van Rijsbergen, 1977)
Cornelis Joost van Rijsbergen. A theoretical basis for the use of co-occurrence data in information retrieval. The Journal of Documentation, 33(2):106-119, 1977. [ http ]
(Croft and Harper, 1979)
W. Bruce Croft und David J. Harper. Using probabilistic models of document retrieval without relevance information. Journal of Documentation, 35(4):285-295, 1979.
(Greiff, 1998)
Warren R. Greiff. A theory of term weighting based on exploratory data analysis. In W. Bruce Croft, Alistair Moffat, Cornelis Joost van Rijsbergen, Ross Wilkinson, and Justin Zobel, editors, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1998), pages 11-19. ACM Press, 1998. [ DOI ]
(Salton and Buckley, 1988)
Gerard Salton and Christopher Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513-523, 1988. [ DOI ]
(Deerwester et al., 1990)
Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990. [ DOI ]
(Berry et al., 1995)
Michael W. Berry, Susan T. Dumais, and Gavin W. O'Brien. Using linear algebra for intelligent information retrieval. SIAM Review, 37(4):573-595, 1995. [ DOI ]
(Landauer and Dumais, 1997)
Thomas K. Landauer and Susan T. Dumais. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211-240, 1997. [ DOI ]
(Saracevic, 2007)
Tefko Saracevic. Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: Nature and manifestations of relevance. Journal of the American Society for Information Science and Technology, 58(13):1915-1933, 2007. [ DOI ]
(Rocchio, 1971)
Joseph J. Rocchio, Jr. Relevance feedback in information retrieval. In Gerard Salton, editor, The SMART Retrieval System: Experiments in Automatic Document Processing, pages 313-323. Prentice Hall, 1971.
(Burges, 1998)
Christopher J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998. [ DOI ]
(Ivanciuc, 2007)
Ovidiu Ivanciuc. Applications of support vector machines in chemistry. In Kenneth B. Lipkowitz, Thomas R. Cundari, and Donald B. Boyd, editors, Reviews in Computational Chemistry, volume 23, pages 291-400. Wiley, 2007. [ .pdf ]
(Joachims, 1998)
Thorsten Joachims. Text categorization with support vector machines: Learning with many relevant features. In Claire Nédellec and Céline Rouveirol, editors, Proceedings of the 10th European Conference on Machine Learning (ECML 1998), volume 1398 of Lecture Notes in Artificial Intelligence, pages 137-142. Springer, 1998. [ DOI ]
(Joachims, 2002)
Thorsten Joachims. Optimizing search engines using clickthrough data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2002), pages 133-142. ACM Press, 2002. [ DOI ]
(Porter, 1980)
Martin F. Porter. An algorithm for suffix stripping. Program, 14(3):130-137, 1980.
(Zobel and Moffat, 2006)
Justin Zobel and Alistair Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2), 2006. [ DOI ]
(Fetterly et al., 2004)
Dennis Fetterly, Mark Manasse, Marc Najork, and Janet L. Wiener. A large-scale study of Web pages. Software: Practice and Experience, 34(2):231-237, 2004. [ DOI ]
(Boyack et al., 2005)
Kevin W. Boyack, Richard Klavans, and Katy Börner. Mapping the backbone of science Scientometrics, 64(3):351-374, 2005. [ DOI ]
(Page et al., 1999)
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the Web. Technical Report 1999-66, Stanford InfoLab, 1999.
(Kleinberg, 1999)
Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604-632, 1999. [ DOI ]