Seminar “Information Extraction: How to Read the Web”

Master Informatik
Regular Dates: 
Tuesdays 15:00-16:30
Room IZ 251

In this seminar, we will take a close look at the most important text analysis methods. Each participant will present one method, starting from a paper provided by us, which serves as a seed for further literature research.

Important notice:

In order to better understand the subject of the seminar, please attend the seminar presentation on Tuesday 04.02.2010, 15:00 in PK2.2.
Seminar presentation slides, here.

The topics will be assigned at our first meeting on Tue 06.04.2010 15:00 in IZ 251.


Registration of participants will be handled using the QIS portal.

Every week exactly one participant will present her/his seminar topic by giving a scientific talk. Each talk should take between 30 and 45 minutes. Afterwards, the speaker gets (constructive) feedback from the other participants regarding her/his presentation skills. The main goal of this seminar is to improve these skills. At the first seminar meeting, we will work out together what a good talk should look like and how to avoid typical mistakes.

To pass this seminar, we expect the following from each participant:

  • Carefully prepare your talk; an outline of the talk has to be explained to and discussed with the teaching staff at least two weeks before giving your talk
  • Give an excellent talk
  • Provide detailed feedback on the other participants' talks within discussions
  • Attend every seminar meeting


Subject                              Date   Paper           Student
Topic assignment                     06.04  -               -
How to hold a good speech            13.04  -               -
3 Weeks Pause
Text Mining Overview                 04.05  [HO05], [ST07]  Philipp Wille
1 Week Pause
Part of Speech Tagging with HMM      18.05  [BR00]          Mahmoud Alfarra
Named Entity Recognition             01.06  [NA07]          Patrick Werner
Word Sense Disambiguation            15.06  [YA95]          Alexander Raue
Ontologies in Text Mining            22.06  [SP05]          Anna-Lena Berndt
Implicit Product Feature Extraction  29.06  [GH06]          Simon Barthel
Review Mining                        06.07  [HU04]          Axel Schön
Grades                               13.07  -               -

Seed Papers

[HO05] Hotho, A., A. Nürnberger, and G. Paaß (2005). A Brief Survey of Text Mining. Journal for Language Technology and Computational Linguistics (JLCL) 20(1), 19–62. [URL]

[ST07] A. Stavrianou, P. Andritsos, N. Nicoloyannis. “Overview and semantic issues of text mining”, SIGMOD Record, Vol. 36, No. 3, 2007. [DOI]

[BR00] Brants, T. (2000b). TnT – A Statistical Part-of-Speech Tagger. In Proceedings of the Sixth Conference on Applied Natural Language Processing ANLP-2000. Seattle, WA. [DOI]

[TO00] Kristina Toutanova and Christopher Manning. 2000. Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In EMNLP/VLC 1999, pages 63–71. [DOI]

[NA07] David Nadeau and Satoshi Sekine. A survey of named entity recognition and classification. Linguisticae Investigationes, 30(1):3–26, 2007. [URL]

[YA95] D. Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pages 189–196, 1995. [DOI]

[PA02] Pang, B., Lee, L., and Vaithyanathan, S., 2002. Thumbs up? Sentiment Classification Using Machine Learning Techniques. In Proc. of EMNLP 2002 [DOI]

[SP05] Spasic, I. et al. (2005) Text mining and ontologies in biomedicine: making sense of raw text. Brief Bioinform. 6, 239–251 [DOI]

[GH06] Ghani, R., Probst, K., Liu, Y., Krema, M., and Fano, A. Text mining for product attribute extraction. SIGKDD Explorations 8, 1 (June 2006), 41–48. [DOI]

[HU04] Hu, M., Liu, B.: Mining and Summarizing Customer Reviews. In: Procs. of KDD, Seattle, WA (2004) 168–177 [DOI]