Recent advances in multimedia hardware, networking infrastructure, computer vision processing, speech processing and programming language technologies increasingly make it possible to construct large-scale distributed multimedia applications, such as tele-immersive collaborative environments, surveillance systems with multiple arrays of cameras, multi-array surround sound systems and streaming applications with hundreds of flows, distributed over LANs or WANs. However, all these systems are still being built and programmed as monolithic, ‘proof of concept’ systems. Less attention is being paid to building these large-scale distributed multimedia systems in a scalable, easily programmable and flexible service-oriented manner, with maximum reuse of common components. This has to change; otherwise the multimedia community will either require very large teams to build large-scale systems (see e.g. the Coliseum system from HP) or will have to validate novel approaches on insignificantly small systems with two to three 640 by 480 pixel video streams.
To advance the state of the art in the construction of sophisticated large-scale multimedia systems, and to avoid the situation described above, we want to create synergies between two relevant communities: the multimedia community, which has an excellent understanding of the various media types and their description, processing and networking behaviors, and the Web community, which has an excellent understanding of services and their description, semantics and behaviors in large-scale systems. The two communities meet in the special session on “multimedia service composition” during the ACM Multimedia 2004 conference to discuss service composition concepts, well known in the Web community, as one of the main approaches to advancing the construction of large-scale distributed multimedia services and applications in a scalable, easily programmable and efficient manner.