November 24, 2010
Title: Semantic Link Discovery over Web Data
Abstract: Discovering links between different data items in a single data source or across different data sources is a challenging problem faced in many data management systems today. In particular, recent efforts in transforming web documents into high-quality Linked Data sources have highlighted the paramount importance of establishing semantic links among web data sources. The Linking Open Data community project at W3C, Metaweb's Freebase, and the EU-funded OKKAM project are examples of such efforts. Despite their success, the quality and quantity of the links between the published data sources are still limited. As a result, from a user's perspective, many of these sources resemble islands of data (or data silos), where each island may contain only part of the data necessary to satisfy his or her information needs. Penetrating these silos to understand both their contents and the potential semantic connections among them is a daunting task that involves many challenges, some of which will be explored in this talk. Specifically, I will present our work in progress on a declarative framework for discovery of semantic links over relational data. This work is motivated by the success of widely used Linked Data publication tools that operate over standard relational database systems. The proposed framework is based on a declarative specification of the linkage requirements by the user, which allows matching data items in many real-world scenarios. I will show how the framework can be applied in an interesting health care application where the goal is discovery of links between a data source of clinical trials and several related data sources. I will also outline our solutions to a few other challenges involved in discovering semantic links and publishing and maintaining Linked Data on the Web.
These challenges include grouping linked objects in the presence of conflicting evidence, discovery of linkage points for multi-source entity resolution, and online annotation of text streams with structured entities. The talk will end with an outline of our research agenda on building a lightweight online data analytics system.
Biography: Oktie Hassanzadeh is a PhD candidate and an IBM PhD Fellow in the Department of Computer Science at the University of Toronto. His research interests are in the areas of data cleaning and integration, text data management, and web information retrieval. He has received the 2010 Yahoo! Key Scientific Challenges award and is a two-time recipient of the first prize at the Linked Data Triplification Challenge, an annual contest that awards prizes to the most promising projects in the area of Linked Data. He holds an MSc in Computer Science from the University of Toronto and a dual bachelor's degree in Software Engineering and Hardware Engineering from Sharif University of Technology.