Dr. Silviu Homoceanu

General

Technische Universität Braunschweig
Institut für Informationssysteme
Mühlenpfordtstraße 23, 2.OG
D-38106 Braunschweig



Detail

Current

Research Interests

Conceptual Search: I carried out most of my research in this field while working on the CONCEPT project.

Technologies of interest: Data mining and machine learning on Big Data. My areas of expertise include natural language processing techniques, opinion mining, flat or hierarchical multidimensional clustering, and classification methods such as Adaptive Boosting, Bayesian classification, Support Vector Machines and Decision Trees. Typical data sets I work with are the well-known ClueWeb09 and ClueWeb12 corpora, comprising about 60 TB of uncompressed documents harvested from the Web, and a locally hosted Virtuoso cache with over 5 billion triples from the DBpedia, Freebase, YAGO, LinkedMDB, New York Times, DrugBase, MusicBrainz, GeoNames, DBTropes, CiteSeer and ACM data stores (about 10% of all linked data available on the Web). To manage this volume of data I use graph databases such as Neo4j, cloud-based Lucene-style inverted-index technology such as Elasticsearch, and the MapReduce paradigm (Hadoop).

Projects: CONCEPT
Conceptual Queries in Entity-Centric Search:
According to reports published by search engines like Yahoo!, about 50% of Web queries today involve searching for entities. While simple keyword-based search is handled very well by state-of-the-art Boolean search, searching for entities by means of concepts such as city car, gaming laptop or business cellphone is not well supported by such techniques. Given a concept like city car, a person would immediately think of a small vehicle that is easy to park and has low fuel consumption, something like the Volkswagen Polo or the Mercedes Smart. For a machine, however, such concepts are nothing more than keywords.

The artificial intelligence (AI) community has invested a lot of work in building systems capable of reasoning much like a human. Cyc, for instance, is a well-known AI project attempting to assemble a comprehensive global ontology and knowledge base of common sense knowledge. This would empower machines to understand concepts and make human-like reasoning possible. Unfortunately, 30 years later and after 350 man-years of effort invested in teaching Cyc common sense knowledge, no real advances have been achieved. In contrast to such approaches, we believe that flexible, context-based knowledge (rather than one global ontology) is better suited to this task. Fostered by the massive amount of information available today, such knowledge could be learned directly from the Web.

The outcome of this project will provide essential insights into how the meaning of concepts can be learned from a large volume of noisy information, as is the case with data on the Web. This raises multiple research questions: Which definition of a concept is most suitable for this task? Is an intensional representation of a concept (through typical properties) helpful for pinning down its meaning? How can property typicality be quantified? What about extensional concept representation? How can such representations be learned efficiently from huge volumes of heterogeneous data? Which learning methods are suitable for these tasks?

Summary to date: 9 publications at international conferences, 11 Bachelor/Master theses, and 3 software development projects (8 students per team and project) for building prototypes.
 

Student theses I coordinated for the CONCEPT project:

Thesis Type | Student Name | Title
Master thesis | Kalo, Jan-Christoph | Analyse des Transitivitätsproblems von Instance Matching Verfahren auf Linked Data
Master thesis | Meine, Matthias | Product Search by Means of Natural Language
Master thesis | Su, Rongfeng | Extracting Ontologies for Supporting Implicit Feature Resolution from Product Reviews
Master thesis | Turmo, Juan Jose | Mining Semantic Related Terms for Product Features from Structured and Unstructured Data
Diploma thesis | Loster, Michael-Reinhard | Opinion Mining & Sentiment Analysis in Reviews
Diploma thesis | Zimmermann, Dirk | Einfluß von Typischen Entitäten auf die Festlegung von geeigneten Kategorien für den Entitätstyp
Bachelor thesis | Dechand, Sergej | Analyzing User's Point of View in Feature based Opinion Mining
Bachelor thesis | Dermitzel, Philipp | Auswirkungen von Datenqualität in Business Warehouse Umgebungen
Bachelor thesis | Geilert, Felix | Analyse der Akzeptanz und Breitenverwendung der auf schema.org zur Verfügung stehenden Schemata im Web
Bachelor thesis | Gröber, Christoph | Analyse von Paraphrasen für OpenIE Triple
Bachelor thesis | Wille, Philipp | Establishing Proximity Boundaries for Concept Extraction in Product Reviews
 
Software development projects:
  • Movie Genie is a system that can "read" queries about movies written in natural language, as they would be addressed to a human salesperson in a video rental store. The system interprets the query, extracts hard facts like the movie genre and soft features like a "good story", and generates a ranked list. It takes user feedback of the form 'I have seen this movie and liked it'/'I have seen this movie and disliked it' into account: movies marked in this way are removed from the result list, and the ranking is recomputed considering what the user liked and did not like. This project won first prize at TDSE 2012.
  • Movie Miner is a Web service for navigating through movie data. It extracts typical movie features that users talk about in movie reviews on IMDb, such as 'acting performance', 'special effects', 'suspense', 'character depth', 'plot' and 'story', analyses user opinion with respect to these features, and displays the results in an intuitive polarity profile. This project won first prize at TDSE 2011.
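
The feedback step described for Movie Genie can be illustrated with a minimal Python sketch. This is not the project's actual implementation: the `Movie` class, the feature scores, the `base_score` field and the dot-product similarity are all assumptions made for illustration. Movies the user has already seen are dropped, and the remaining candidates are re-scored by rewarding similarity to liked movies and penalizing similarity to disliked ones.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Movie:
    title: str
    features: Dict[str, float]  # hypothetical soft-feature scores in [0, 1]
    base_score: float = 0.0     # hypothetical relevance to the original query


def similarity(a: Movie, b: Movie) -> float:
    """Dot product over the features of b (an assumed similarity measure)."""
    return sum(a.features.get(name, 0.0) * value
               for name, value in b.features.items())


def rerank(candidates: List[Movie],
           liked: List[Movie],
           disliked: List[Movie]) -> List[Movie]:
    """Remove movies the user has already seen and re-score the rest."""
    seen = {m.title for m in liked} | {m.title for m in disliked}

    def score(m: Movie) -> float:
        bonus = sum(similarity(m, l) for l in liked)
        penalty = sum(similarity(m, d) for d in disliked)
        return m.base_score + bonus - penalty

    remaining = [m for m in candidates if m.title not in seen]
    return sorted(remaining, key=score, reverse=True)


# Illustrative data only; titles and scores are made up.
a = Movie("A", {"good story": 1.0}, 0.9)
b = Movie("B", {"special effects": 1.0}, 0.8)
c = Movie("C", {"good story": 0.8}, 0.5)
d = Movie("D", {"special effects": 0.9}, 0.6)

ranked = rerank([a, b, c, d], liked=[a], disliked=[b])
print([m.title for m in ranked])
```

In this toy example, movie D initially outranks C, but after the user likes A and dislikes B, C moves ahead of D because it shares the liked "good story" feature.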


Experimental data: Instance Matching Data


Teaching

Publications
2015
Homoceanu, S., and W.-T. Balke, "A Chip Off the Old Block – Extracting Typical Attributes for Entities based on Family Resemblance", 20th International Conference on Database Systems for Advanced Applications (DASFAA), Hanoi, Vietnam, 04/2015.
Homoceanu, S., "What Search Engines Can’t Do. Holistic Entity Search on Web Data", Carl-Friedrich-Gauß-Fakultät: Technische Universität Braunschweig, 2015.
2014
Homoceanu, S., J.-C. Kalo, and W.-T. Balke, "Putting Instance Matching to the Test. Is Instance Matching Ready for Reliable Data Linking?", 21st International Symposium on Methodologies for Intelligent Systems (ISMIS), Roskilde, Denmark, 2014.
Homoceanu, S., F. Geilert, C. Pek, and W.-T. Balke, "Any Suggestions? Active Schema Support for Structuring Web Information", 19th International Conference on Database Systems for Advanced Applications (DASFAA), Bali, Indonesia, 04/2014.
Homoceanu, S., and W.-T. Balke, "Querying concepts in product data by means of query expansion", Web Intelligence and Agent Systems: An International Journal, vol. 12: IOS Press, 02/2014.
2013
Homoceanu, S., S. Tönnies, P. Wille, and W.-T. Balke, "Time-Based Exploratory Search in Scientific Literature", Research and Advanced Technologies for Digital Libraries: 17th International Conference on Theory and Practice of Digital Libraries (TPDL), Valletta, Malta, 09/2013.
Homoceanu, S., P. Wille, and W.-T. Balke, "ProSWIP: Property-based Data Access for Semantic Web Interactive Programming", 12th International Semantic Web Conference (ISWC), Sydney, Australia, 2013.
2012
Selke, J., S. Homoceanu, and W.-T. Balke, "Conceptual Views for Entity-Centric Search: Turning Data into Meaningful Concepts (extended)", Computer Science: Research and Development, vol. 27, no. 1: Springer, pp. 65-79, 2012.
2011
Homoceanu, S., S. Dechand, and W.-T. Balke, "Review Driven Customer Segmentation for Improved E-Shopping Experience", 3rd International Conference on Web Science, Koblenz, Germany, 2011.
Homoceanu, S., and W.-T. Balke, "What Makes a Phone a Business Phone - Querying Concepts in Product Data", The 2011 IEEE/WIC/ACM International Conference on Web Intelligence, Lyon, France, 2011.
Selke, J., S. Homoceanu, and W.-T. Balke, "Conceptual Views for Entity-Centric Search: Turning Data into Meaningful Concepts", 14. GI-Fachtagung Datenbanksysteme für Business, Technologie und Web (BTW), Kaiserslautern, Germany, 2011.
Homoceanu, S., M. Loster, C. Lofi, and W.-T. Balke, "Will I like it? – Providing Product Overviews based on Opinion Excerpts", IEEE Conference on Commerce and Enterprise Computing (CEC), Luxembourg, Luxembourg, 2011.