Lecture "Multimedia Databases"

4 or 5 (depending on examination rules)
Oral. Examination dates: 16.08 - 19.08 (4 days) and 26.09 - 30.09 (5 days).
Regular Dates: 
Thursdays, 9:45 - 12:15; IZ 161; first lecture on April 7, 2011

In this course, we examine the issues involved in building multimedia database systems and give an insight into the techniques used. The course deals with content-specific retrieval of multimedia data. The basic issue is the efficient storage and subsequent retrieval of multimedia documents.

The general structure of the course is:

  • Basic characteristics of multimedia databases
  • Evaluation of retrieval effectiveness, Precision-Recall Analysis
  • Semantic content of image-content search
  • Image representation, low-level and high-level features
  • Texture features, random-field models 
  • Audio formats, sampling, metadata
  • Thematic search within music tracks
  • Query formulation in music databases
  • Media representation for video
  • Frame / Shot Detection, Event Detection
  • Video segmentation and video summarization
  • Video Indexing, MPEG-7
  • Extraction of low- and high-level features
  • Integration of features and efficient similarity comparison
  • Indexing with inverted files, GEMINI indexing, R*-trees
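
As a small illustration of the precision-recall analysis named in the outline above, the following sketch computes both measures for a single query. The function name `precision_recall` and the sample document IDs are assumptions made up for this example; they are not taken from the course material.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs returned by the retrieval system (hypothetical data)
    relevant:  IDs judged relevant for the query (hypothetical data)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    # Precision: fraction of the returned documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: fraction of the relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 images returned, 3 images are relevant overall.
p, r = precision_recall(["img1", "img2", "img3", "img4"],
                        ["img2", "img4", "img7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Two of the four returned images are relevant, so precision is 2/4, and two of the three relevant images were found, so recall is 2/3. Returning 0.0 for an empty result or an empty relevance set is a convention chosen for this sketch.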


Date Topic Slides Exercises Video Literature

Basic concepts

Evaluation procedures

Slides - Print Slides None Video1  

BR99 (P. 1–18), Sch05 (P. 1–15), Chr85

14.04.11 Features introduction Slides - Print Slides Exercise1 Video2

CB02 (P. 261–284), Sch05 (P. 67–91), Sch05 (P. 91–96)

Color features and color histograms CB02 (P. 285–311)
Matching of color histograms:

CB02 (P. 285–311), Sch05 (P. 170–174), Sch05 (P. 229–231), Sch05 (P. 175–179), Smi97, SB91, HCP95, SD96


Texture Features

Slides - Print Slides   Video3

CB02 (P. 313–344), RL93

Low-Level Texture Features

CB02 (P. 313–344), Jul62, JGSF73, Jul75, Jul81

Tamura Measure

CB02 (P. 313–344), TMY78, RT71, RTL72, EN94

Random Field Models CB02 (P. 313–344), Sch05 (P. 111–146)
Transform Domain Features

CB02 (P. 313–344), Woo72, Bes74, MJ92

28.04.11 Multiresolution Analysis Slides - Print Slides Exercise2 Video4

Sch05 (P. 134–137), Mal89

Form based Features CB02 (P. 345–372)

RC78, ZRL77, MM97

Edge Detection

BL79, KWT88

Morphological Operators


05.05.11 Chain Codes Slides - Print Slides Exercise3


Fre61a, Fre61b, BG78, CMVZ94

Area based Retrieval  

Bar81, Blu73, SK05

Moment Invariants

Woo96, Hu62

Query by Visual example

HK92, Ege97

12.05.11 Introduction to Audio Retrieval Slides - Print Slides   Video6_1, Video6_2


19.05.11 Audio Low level Features Slides - Print Slides Exercise4 - starter - audio Video7

LH98, WBKW96

Difference Limen JWG77
Pitch Recognition

Fle34, Gre90, Gol73, Sch68, Nol69, KS00, GR69

26.05.11  Query by Humming Slides - Print Slides   Video8  GLCS95
Melody Representation

Par75, MS90, KNSYK00, BC94, ZS03

Hidden Markov Model Rab89
09.06.11 Hidden Markov Model  Slides - Print Slides   Video9  

Vit67, BPSW70

 Video Retrieval  
23.06.11  Shot Detection Slides - Print Slides Exercise5 Video10  

ZKS93, Ton91, IP96, TD98, MJC95, VL00

30.06.11 Video Similarity Slides - Print Slides   Video11  


07.07.11 Video Abstraction Slides - Print Slides   Video12  

SC02, PLE01, RBK98

14.07.11 Indexes Slides - Print Slides   Video13  

CB02 (P. 373–434), Sch05 (P. 261–302), Gut84, SRF87, BKSS90, BKK96, CPZ97






 [CB02] Vittorio Castelli and Lawrence D. Bergman, editors. Image Databases. Search and Retrieval of Digital Imagery. Wiley, 2002. [ .html ]

[BR99] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999. [ http ]

[Par75] Denys Parsons. The Directory of Tunes and Musical Themes. Spencer Brown, 1975.

[Sch05] Ingo Schmitt. Ähnlichkeitssuche in Multimedia-Datenbanken. Retrieval, Suchalgorithmen und Anfragebehandlung. Oldenbourg, 2005. [ http ]

[vR79] Cornelis Joost van Rijsbergen. Information Retrieval. Butterworths, second edition, 1979. [ .html ]



Note: Many of the following documents are available for download free of charge through the university network. Those that are not free of charge can be obtained as printed versions from the university library.

[Bar81] Alan H. Barr. Superquadrics and angle-preserving transformations. IEEE Computer Graphics and Applications, 1(1):11-23, 1981. [ http ]

[Bes74] Julian Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B (Methodological), 36(2):192-236, 1974. [ http ]

[BC94] Donald J. Berndt and James Clifford. Using dynamic time warping to find patterns in time series. In Usama M. Fayyad and Ramasamy Uthurusamy, editors, Knowledge Discovery in Databases: Papers from the 1994 AAAI Workshop, pages 359-370. AAAI Press, 1994.

[BDF+97] Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis Ioannidis, Hosagrahar V. Jagadish, Theodore Johnson, Raymond Ng, Viswanath Poosala, Kenneth A. Ross, and Kenneth C. Sevcik. The New Jersey data reduction report. Bulletin of the Technical Committee on Data Engineering, 20(4):3-42, 1997. [ .pdf ]

[BG78] Ernesto Bribiesca and Adolfo Guzmán. Shape detection and shape similarity measurement for two-dimensional regions. In Proceedings of the 4th International Joint Conference on Pattern Recognition, pages 608-612, 1978.





[BGRS98] Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. When is “nearest neighbor” meaningful? In Catriel Beeri and Peter Buneman, editors, Proceedings of the 7th International Conference on Database Theory (ICDT 1999), volume 1540 of Lecture Notes in Computer Science, pages 217-235. Springer, 1999. [ http ]









[BKSS90] Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The R*-tree: An efficient and robust access method for points and rectangles. In Hector Garcia-Molina and Hosagrahar V. Jagadish, editors, Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD 1990), pages 322-331. ACM Press, 1990. [ DOI ]
[BKK96] Stefan Berchtold, Daniel A. Keim, and Hans-Peter Kriegel. The X-tree: An index structure for high-dimensional data. In T. M. Vijayaraman, Alejandro P. Buchmann, Chandrasekaran Mohan, and Nandlal L. Sarda, editors, Proceedings of 22nd International Conference on Very Large Data Bases (VLDB 1996), pages 28-39. Morgan Kaufmann, 1996. [ .html ]
[BL79] Serge Beucher and Christian Lantuejoul. Use of watersheds in contour detection. In Proceedings of the International Workshop on Image Processing, Real-Time Edge and Motion Detection/Estimation, 1979. [ .pdf ]
[Blu73] Harry Blum. Biological shape and visual science (part I). Journal of Theoretical Biology, 38(2):205-287, 1973. [ DOI ]
[BPSW70] Leonard E. Baum, Ted Petrie, George Soules, and Norman Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1):164-171, 1970. [ http ]
[Chr85] Stavros Christodoulakis. Multimedia data base management: Applications and problems. A position paper. In Shamkant B. Navathe, editor, Proceedings of the 1985 ACM SIGMOD International Conference on Management of Data (SIGMOD 1985), pages 304-305, 1985. [ DOI ]
[CMVZ94] Guido Cortelazzo, Gian A. Mian, G. Vezzi, and Piero Zamperoni. Trademark shapes description by string-matching techniques. Pattern Recognition, 27(8):1005-1018, 1994. [ DOI ]
[CPZ97] Paolo Ciaccia, Marco Patella, and Pavel Zezula. M-tree: An efficient access method for similarity search in metric spaces. In Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, and Manfred A. Jeusfeld, editors, Proceedings of 23rd International Conference on Very Large Data Bases (VLDB 1997), pages 426-435. Morgan Kaufmann, 1997. [ .html ]
[CZ03] Sen-ching Samson Cheung and Avideh Zakhor. Efficient video similarity measurement with video signature. IEEE Transactions on Circuits and Systems for Video Technology, 13(1):59-74, 2003. [ DOI ]
[DDFLH90] Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990. [ DOI ]
[Ege97] Max J. Egenhofer. Query processing in spatial-query-by-sketch. Journal of Visual Languages and Computing, 8(4):403-424, 1997. [ DOI ]
[EN94] William Equitz and Wayne Niblack. Retrieving images from a database using texture. Algorithms from the QBIC system. Technical Report RJ-9805, IBM Almaden Research Center, 1994.
[Fal95] Christos Faloutsos. Fast searching by content in multimedia databases. Bulletin of the Technical Committee on Data Engineering, 18(4):31-40, 1995. [ .pdf ]
[FL95] Christos Faloutsos and King-Ip Lin. FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Michael J. Carey and Donovan A. Schneider, editors, Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD 1995), pages 163-174. ACM Press, 1995. [ DOI ]
[Fle34] Harvey Fletcher. Loudness, pitch and the timbre of musical tones and their relation to the intensity, the frequency and the overtone structure. Journal of the Acoustical Society of America, 6(2):59-69, 1934. [ DOI ]
[Fre61a] Herbert Freeman. On the encoding of arbitrary geometric configurations. IRE Transactions on Electronic Computers, 10(2):260-268, 1961.
[Fre61b] Herbert Freeman. A technique for the classification and recognition of geometric patterns. In Actes du 3e Congrès International de Cybernétique, 1961.
[GLCS95] Asif Ghias, Jonathan Logan, David Chamberlin, and Brian C. Smith. Query by humming: Musical information retrieval in an audio database. In Proceedings of the 3rd ACM International Conference on Multimedia (ACM MM 1995), pages 231-236. ACM Press, 1995. [ DOI ]
[Gol73] Julius L. Goldstein. An optimum processor theory for the central formation of the pitch of complex tones. Journal of the Acoustical Society of America, 54(6):1496-1516, 1973. [ DOI ]
[GR69] Bernard Gold and Lawrence R. Rabiner. Parallel processing techniques for estimating pitch periods of speech in the time domain. Journal of the Acoustical Society of America, 46(2). [ DOI ]
[Gre90] Donald D. Greenwood. A cochlear frequency-position function for several species—29 years later. Journal of the Acoustical Society of America, 87(6):2592-2605, 1990. [ DOI ]
[Gut84] Antonin Guttman. R-trees: A dynamic index structure for spatial searching. In Beatrice Yormark, editor, Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data (SIGMOD 1984), pages 47-57. ACM Press, 1984. [ DOI ]
[HCP95] Wynne Hsu, Tat-Seng Chua, and Hung Keng Pung. An integrated color-spatial approach to content-based image retrieval. In Proceedings of the 3rd ACM International Conference on Multimedia (ACM Multimedia 1995), pages 305-313. ACM Press, 1995. [ DOI ]
[HK92] Kyoji Hirata and Toshikazu Kato. Query by visual example—content based image retrieval. In Alain Pirotte, Claude Delobel, and Georg Gottlob, editors, Advances in Database Technology. Proceedings of the 3rd International Conference on Extending Database Technology (EDBT 1992), volume 580 of Lecture Notes in Computer Science, pages 56-71. Springer, 1992. [ DOI ]
[HSZ87] Robert M. Haralick, Stanley R. Sternberg, and Xinhua Zhuang. Image analysis using mathematical morphology. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(4):532-550, 1987.
[Hu62] Ming-Kuei Hu. Visual pattern recognition by moment invariants. IEEE Transactions on Information Theory, 8(2):179-187, 1962. [ http ]
[IP96] Fayez M. Idris and Sethuraman Panchanathan. Indexing of compressed video sequences. In Ishwar K. Sethi and Ramesh C. Jain, editors, Storage and Retrieval for Still Image and Video Databases IV, volume 2670 of Proceedings of SPIE, pages 247-253. SPIE, 1996. [ DOI ]
[JGSF73] Bela Julesz, Edgar N. Gilbert, Larry A. Shepp, and Harry L. Frisch. Inability of humans to discriminate between visual textures that agree in second-order statistics—revisited. Perception, 2(4):391-405, 1973. [ DOI ]
[Jul62] Bela Julesz. Visual pattern discrimination. IRE Transactions on Information Theory, 8(2):84-92, 1962. [ http ]
[Jul75] Bela Julesz. Experiments in the visual perception of texture. Scientific American, 232(4):34-43, 1975.
[Jul81] Bela Julesz. Textons, the elements of texture perception, and their interactions. Nature, 290(12):91-97, 1981. [ DOI ]
[JWG77] Walt Jesteadt, Craig C. Wier, and David M. Green. Intensity discrimination as a function of frequency and sensation level. Journal of the Acoustical Society of America, 61(1):169-177, 1977. [ DOI ]
[KS00] Hajime Kobayashi and Tetsuya Shimamura. A weighted autocorrelation method for pitch extraction of noisy speech. In Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2000), volume 3, pages 1307-1310. IEEE, 2000. [ DOI ]
[KNSYK00] Naoko Kosugi, Yuichi Nishihara, Tetsuo Sakata, Masashi Yamamuro, and Kazuhiko Kushima. A practical query-by-humming system for a large music database. In Proceedings of the 8th ACM International Conference on Multimedia (ACM MM 2000), pages 333-342. ACM Press, 2000. [ DOI ]
[KWT88] Michael Kass, Andrew Witkin, and Demetri Terzopoulos. Snakes: Active contour models. International Journal of Computer Vision, 1(4):321-331, 1988. [ DOI ]
[LH98] Guojun Lu and Templar Hankinson. A technique towards automatic audio classification and retrieval. In Proceedings of the 4th International Conference on Signal Processing (ICSP 1998), volume 2, pages 1142-1145. IEEE, 1998. [ DOI ]
[Mal89] Stéphane G. Mallat. Multifrequency channel decompositions of images and wavelet models. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(12):2091-2110, 1989. [ DOI ]
[MJ92] Jianchang Mao and Anil K. Jain. Texture classification and segmentation using multiresolution simultaneous autoregressive models. Pattern Recognition, 25(2):173-188, 1992. [ DOI ]
[MJC95] Jianhao Meng, Yujen Juan, and Shih-Fu Chang. Scene change detection in an MPEG compressed video sequence. In Arturo A. Rodriguez, Robert J. Safranek, and Edward J. Delp, editors, Digital Video Compression: Algorithms and Technologies 1995, volume 2419 of Proceedings of SPIE, pages 14-25. SPIE, 1995. [ DOI ]
[MM97] Wei-Ying Ma and B. S. Manjunath. Edge Flow: A framework of boundary detection and image segmentation. In Proceedings of the 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 744-749. IEEE Computer Society, 1997. [ DOI ]
[MS90] Marcel Mongeau and David Sankoff. Comparison of musical sequences. Computers and the Humanities, 24(3):161-175, 1990. [ DOI ]
[Nol69] A. Michael Noll. Pitch determination of human speech by the harmonic product spectrum, the harmonic sum spectrum, and a maximum likelihood estimate. In Jerome Fox, editor, Proceedings of the Symposium on Computer Processing in Communications, volume 19 of Microwave Research Institute Symposia Series, pages 779-797. Polytechnic Press of the Polytechnic Institute of Brooklyn, 1969.
[Nyq28] Harry Nyquist. Certain topics in telegraph transmission theory. Transactions of the American Institute of Electrical Engineers, 47:617-644, 1928. Reprint. [ DOI ]
[PLE01] Silvia Pfeiffer, Rainer Lienhart, and Wolfgang Effelsberg. Scene determination based on video and audio features. Multimedia Tools and Applications, 15(1):59-81, 2001. [ DOI ]
[Rab89] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, 1989. [ DOI ]
[RBK98] Henry A. Rowley, Shumeet Baluja, and Takeo Kanade. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23-38, 1998. [ DOI ]
[RC78] T. W. Ridler and S. Calvard. Picture thresholding using an iterative selection method. IEEE Transactions on Systems, Man, and Cybernetics, 8(6):630-632, 1978.
[RL93] A. Ravishankar Rao and Gerald L. Lohse. Identifying high level features of texture perception. CVGIP: Graphical Models and Image Processing, 55(3):218-233, 1993. [ DOI ]
[RT71] Azriel Rosenfeld and Mark Thurston. Edge and curve detection for visual scene analysis. IEEE Transactions on Computers, 20(5):562-569, 1971. [ http ]
[RTL72] Azriel Rosenfeld, Mark Thurston, and Yung-Han Lee. Edge and curve detection: Further experiments. IEEE Transactions on Computers, 21(7):677-715, 1972. [ http ]
[SB91] Michael J. Swain and Dana H. Ballard. Color indexing. International Journal of Computer Vision, 7(1):11-32, 1991. [ DOI ]
[SC02] Hari Sundaram and Shih-Fu Chang. Computable scenes and structures in films. IEEE Transactions on Multimedia, 4(4):482-491, 2002. [ DOI ]
[Sch68] Manfred R. Schroeder. Period histogram and product spectrum: New methods for fundamental-frequency measurement. Journal of the Acoustical Society of America, 43(4):829-834, 1968. [ DOI ]
[SD96] Markus Stricker and Alexander Dimai. Color indexing with weak spatial constraints. In Proceedings of Storage and Retrieval for Image and Video Databases IV, 1996. [ .html ]
[SK05] Thomas B. Sebastian and Benjamin B. Kimia. Curves vs. skeletons in object recognition. Signal Processing, 85(2):247-263, 2005. [ DOI ]
[Smi97] John R. Smith. Integrated Spatial and Feature Image Systems: Retrieval, Analysis and Compression. PhD thesis, Columbia University, 1997. [ .html ]
[SRF87] Timos K. Sellis, Nick Roussopoulos, and Christos Faloutsos. The R+-tree: A dynamic index for multi-dimensional objects. In Peter M. Stocker, William Kent, and Peter Hammersley, editors, Proceedings of 13th International Conference on Very Large Data Bases (VLDB 1987), pages 507-518. Morgan Kaufmann, 1987. [ .html ]
[TD98] Cüneyt M. Taskiran and Edward J. Delp. Video scene change detection using the generalized sequence trace. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 1998), volume 5, pages 2961-2964. IEEE, 1998. [ DOI ]
[TMY78] Hideyuki Tamura, Shunji Mori, and Takashi Yamawaki. Textural features corresponding to visual perception. IEEE Transactions on Systems, Man, and Cybernetics, 8(6):460-473, 1978. [ DOI ]
[Ton91] Yoshinobu Tonomura. Video handling based on structured information for hypermedia systems. In International Conference on Multimedia Information Systems 1991, pages 333-344. McGraw-Hill, 1991.
[Vit67] Andrew Viterbi. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2):260-269, 1967. [ http ]
[VL00] Nuno Vasconcelos and Andrew Lippman. Statistical models of video structure for content analysis and characterization. IEEE Transactions on Image Processing, 9(1):3-19, 2000. [ DOI ]
[WBKW96] Erling Wold, Thom Blum, Douglas Keislar, and James Wheaton. Content-based classification, search, and retrieval of audio. IEEE Multimedia, 3(3):27-36, 1996. [ DOI ]
[Woo72] John W. Woods. Two-dimensional discrete Markovian fields. IEEE Transactions on Information Theory, 18(2):232-240, 1972. [ http ]
[Woo96] Jeffrey Wood. Invariant pattern recognition: A review. Pattern Recognition, 29(1):1-17, 1996. [ DOI ]
[WSB98] Roger Weber, Hans-Jörg Schek, and Stephen Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Ashish Gupta, Oded Shmueli, and Jennifer Widom, editors, Proceedings of 24th International Conference on Very Large Data Bases (VLDB 1998), pages 194-205. Morgan Kaufmann, 1998. [ .html ]
[ZKS93] HongJiang Zhang, Atreyi Kankanhalli, and Stephen W. Smoliar. Automatic partitioning of full-motion video. Multimedia Systems, 1(1):10-28, 1993. [ DOI ]
[ZRL77] G. W. Zack, W. E. Rogers, and S. A. Latt. Automatic measurement of sister chromatid exchange frequency. The Journal of Histochemistry and Cytochemistry, 25(7):741-753, 1977. [ http ]
[ZS03] Yunyue Zhu and Dennis Shasha. Warping indexes with envelope transforms for query by humming. In Alon Y. Halevy, Zachary G. Ives, and AnHai Doan, editors, Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD 2003), pages 181-192. ACM Press, 2003. [ DOI ]





File C5.pdf, 06/05/11 1:51 pm, 2.07 MB
File Print_C5.pdf, 06/05/11 1:51 pm, 1.31 MB
File mmdb_ss11-u3.pdf, 06/05/11 1:52 pm, 199.16 KB
File C6.pdf, 16/05/11 12:14 pm, 1.75 MB
File Print_C6.pdf, 16/05/11 12:14 pm, 1.3 MB
File C7.pdf, 20/05/11 10:58 am, 2.6 MB
File Print_C7.pdf, 20/05/11 10:58 am, 1.75 MB
File mmdb_ss11-u4.pdf, 20/05/11 10:58 am, 243.87 KB
File starter.zip, 20/05/11 10:59 am, 795 bytes
File audio.zip, 20/05/11 10:59 am, 11.19 MB
File C8.pdf, 30/05/11 9:04 am, 2.53 MB
File Print_C8.pdf, 30/05/11 9:04 am, 1.66 MB
File C11.pdf, 01/07/11 2:50 pm, 3.98 MB
File Print_C11.pdf, 01/07/11 2:50 pm, 1.66 MB
File C13.pdf, 18/07/11 9:39 am, 2.17 MB
File Print_C13.pdf, 18/07/11 9:40 am, 1.37 MB