Information Retrieval and Web Search Engines

Information
Classification: 
Master
Credits: 
4 or 5 (depending on examination rules)
Exam: 
Oral
Regular Dates: 
Videos will be delivered online Thursdays
Live stream Thursdays (03:00 pm)
Contents
Contents: 

Oral Exams in Summer Semester
July 29, 30, 31
August 19, 20, 21
September 3 and 4
September 8 and 9
Please arrange an appointment via informationssystemeattu-bs [dot] de (email) or phone.

 

Registration for this lecture via Studi.IP is necessary: Link

We will have our next live-steam on Thursday (09.07, 03:00 pm) via WebEX. The WebEX (room) link can be found on Studi.IP 

What will this year's lecture look like?

We will not be able to offer a “classroom” lecture this semester. Instead, we will upload a lecture video, lecture slides every Thursday as well as exercise sheets.

Please note: The exercises will be neither collected, nor corrected, or graded.

In addition, we will offer a live stream every Thursday at 03:00 pm.
In these live stream it is possible to ask questions and discuss the lecture content of the current week with other students.

 
Summary

This lecture provides an introduction to the fields of information retrieval and web search. We will discuss how relevant information can be found in very large and mostly unstructured data collections; this is particularly interesting in cases where users cannot provide a clear formulation of their current information need. Web search engines like Google are a typical application of the techniques covered by this course.

This lecture will be in English

Materials
  Date Topic References Materials
1 23.04.2020 Introduction and fundamental notions (Bush, 1945)

Slides
Homework
Video

2 30.04.2020 Fuzzy retrieval model
Coordination level matching
Vector space model
(Zadeh, 1965)
(Zadeh, 1978)
(Ogawa et al., 1991)
(Salton et al., 1975)
(Luhn, 1961)
(Spärck Jones, 1972)
(Robertson and Spärck Jones, 1976)
(Salton et al., 1983)

Slides
Homework
Video

StudIP

3 07.05.2020 Probabilistic retrieval models (Robertson, 1977)
(Maron and Kuhns, 1960)
(van Rijsbergen, 1977)
(Croft and Harper, 1979)
(Greiff, 1998)

Slides
Homework
Video

4 14.05.2020 Indexing (Porter, 1980)
(Zobel and Moffat, 2006)

Slides
Homework
Video

5 28.05.2020 Latent Semantic
Indexing
(Salton and Buckley, 1988)
(Deerwester et al., 1990)
(Berry et al., 1995)
(Landauer and Dumais, 1997)

Slides
Homework
Video

6 04.06.2020 Language models
Retrieval Evaluation
(Saracevic, 2007)

Slides
Homework
Video

7 11.06.2020 Document clustering  

Slides
Homework
Video

8 18.06.2020 Relevance feedback
Classification
(Rocchio, 1971)

Slides
Homework
Video

9 25.06.2020 Support vector machines (Burges, 1998)
(Ivanciuc, 2007)
(Joachims, 1998)
(Joachims, 2002)

Slides
No Homework
Video

10 02.06.2020 Introduction to Web Retrieval (Fetterly et al., 2004)

Slides
Homework
Video

11 09.07.2020 Web crawling  

Slides
Homework
Video

12 16.07.2020 Link analysis (Boyack et al., 2005)
(Page et al., 1999)
(Kleinberg, 1999)

Slides
Homework
Video

13 23.07.2020 Miscellaneous  

Slides

No Homework
Video

Books

(Manning et al., 2008)
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008. [ http ]
(Belew, 2000)
Richard K. Belew. Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW. Cambridge University Press, 2000. [ http ]
(Baeza-Yates and Ribeiro-Neto, 1999)
Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999. [ http ]
(van Rijsbergen, 1979)
Cornelis Joost van Rijsbergen. Information Retrieval. Butterworths, second edition, 1979. [ .html ]

Further Readings

(Bush, 1945)
Vannevar Bush. As we may think. The Atlantic Monthly, 176(1):101-108, 1945.
(Zadeh, 1965)
Lotfi A. Zadeh. Fuzzy sets. Information and Control, 8(3):338-353, 1965. [ DOI ]
(Zadeh, 1978)
Lotfi A. Zadeh. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1(1):3-28, 1978. [ DOI ]
(Ogawa et al., 1991)
Yasushi Ogawa, Tetsuya Morita, and Kiyohiko Kobayashi. A fuzzy document retrieval system using the keyword connection matrix and a learning method. Fuzzy Sets and Systems, 39(2):163-179, 1991. [ DOI ]
(Salton et al., 1975)
Gerard Salton, Chung-Shu Yang, and Anita Wong. A vector space model for automatic indexing. Communications of the ACM, 18(11):613-620, 1975. [ DOI ]
(Luhn, 1961)
Hans Peter Luhn. The automatic derivation of information retrieval encodements from machine-readable texts. Volume 3, Part 2 of Information Retrieval and Machine Translation, pages 1021-1028. Interscience Publishers, 1961.
(Spärck Jones, 1972)
Karen Spärck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11-21, 1972. [ DOI
(Robertson and Spärck Jones, 1976)
Stephen E. Robertson and Karen Spärck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3):129-146, 1976.
(Salton et al., 1983)
Gerard Salton, Edward A. Fox, and Harry Wu. Extended boolean information retrieval. Communications of the ACM, 26(11):1022-1036, 1983. [ DOI ]
(Robertson, 1977)
Stephen E. Robertson. The probabilistic ranking principle in IR. The Journal of Documentation, 33(4):294-304, 1977.
(Maron and Kuhns, 1960)
Melvin E. Maron and John L. Kuhns. On relevance, probabilistic indexing and information retrieval. Journal of the ACM, 7(3):216-244, 1960. [ DOI ]
(van Rijsbergen, 1977)
Cornelis Joost van Rijsbergen. A theoretical basis for the use of co-occurrence data in information retrieval. The Journal of Documentation, 33(2):106-119, 1977. [ http ]
(Croft and Harper, 1979)
W. Bruce Croft und David J. Harper. Using probabilistic models of document retrieval without relevance information. Journal of Documentation, 35(4):285-295, 1979.
(Greiff, 1998)
Warren R. Greiff. A theory of term weighting based on exploratory data analysis. In W. Bruce Croft, Alistair Moffat, Cornelis Joost van Rijsbergen, Ross Wilkinson, and Justin Zobel, editors, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1998), pages 11-19. ACM Press, 1998. [ DOI ]
(Salton and Buckley, 1988)
Gerard Salton and Christopher Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513-523, 1988. [ DOI ]
(Deerwester et al., 1990)
Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990. [ DOI ]
(Berry et al., 1995)
Michael W. Berry, Susan T. Dumais, and Gavin W. O'Brien. Using linear algebra for intelligent information retrieval. SIAM Review, 37(4):573-595, 1995. [ DOI ]
(Landauer and Dumais, 1997)
Thomas K. Landauer and Susan T. Dumais. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211-240, 1997. [ DOI ]
(Saracevic, 2007)
Tefko Saracevic. Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: Nature and manifestations of relevance. Journal of the American Society for Information Science and Technology, 58(13):1915-1933, 2007. [ DOI ]
(Rocchio, 1971)
Joseph J. Rocchio, Jr. Relevance feedback in information retrieval. In Gerard Salton, editor, The SMART Retrieval System: Experiments in Automatic Document Processing, pages 313-323. Prentice Hall, 1971.
(Burges, 1998)
Christopher J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998. [ DOI ]
(Ivanciuc, 2007)
Ovidiu Ivanciuc. Applications of support vector machines in chemistry. In Kenneth B. Lipkowitz, Thomas R. Cundari, and Donald B. Boyd, editors, Reviews in Computational Chemistry, volume 23, pages 291-400. Wiley, 2007. [ .pdf ]
(Joachims, 1998)
Thorsten Joachims. Text categorization with support vector machines: Learning with many relevant features. In Claire Nédellec and Céline Rouveirol, editors, Proceedings of the 10th European Conference on Machine Learning (ECML 1998), volume 1398 of Lecture Notes in Artificial Intelligence, pages 137-142. Springer, 1998. [ DOI ]
(Joachims, 2002)
Thorsten Joachims. Optimizing search engines using clickthrough data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2002), pages 133-142. ACM Press, 2002. [ DOI ]
(Porter, 1980)
Martin F. Porter. An algorithm for suffix stripping. Program, 14(3):130-137, 1980.
(Zobel and Moffat, 2006)
Justin Zobel and Alistair Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2), 2006. [ DOI ]
(Fetterly et al., 2004)
Dennis Fetterly, Mark Manasse, Marc Najork, and Janet L. Wiener. A large-scale study of Web pages. Software: Practice and Experience, 34(2):231-237, 2004. [ DOI ]
(Boyack et al., 2005)
Kevin W. Boyack, Richard Klavans, and Katy Börner. Mapping the backbone of science Scientometrics, 64(3):351-374, 2005. [ DOI ]
(Page et al., 1999)
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the Web. Technical Report 1999-66, Stanford InfoLab, 1999.
(Kleinberg, 1999)
Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604-632, 1999. [ DOI ]
AttachmentSize
Exercise 01 - Introduction.pdf65.87 KB
irws-ws2020-v1.pdf2.75 MB
Exercise 02 - MRM.pdf179.4 KB
irws-ws2017-v2.pdf1.97 MB
irws-ws2020-v2.pdf1.97 MB
StudIP.pdf800.07 KB
Exercise 03.pdf186 KB
irws-ws2020-v3.pdf2.12 MB
Exercise 04.pdf73.82 KB
irws-ws2020-v4.pdf2.11 MB
irws-ws2020-v5.pdf2.31 MB
Exercise 05.pdf116.74 KB
irws-ws2020-v6.pdf1.43 MB
Exercise 06.pdf122.57 KB
irws-ws2020-v7.pdf1.6 MB
Exercise 07.pdf130.39 KB
irws-ws2020-v8.pdf2.53 MB
Exercise 08.pdf70.23 KB
irws-ws2020-v9.pdf1.55 MB
irws-ws2020-v10.pdf2.14 MB
Exercise 10.pdf107.03 KB
irws-ws2020-v11.pdf2.49 MB
irws-ws2020-v11.pdf2.49 MB
Exercise 11.pdf308.86 KB
Exercise 11.pdf308.86 KB
Exercise 11.pdf310.82 KB
irws-ws2020-v12.pdf2.3 MB
Exercise 12.pdf222.98 KB
irws-ws2020-v13.pdf1.42 MB