
Data Streams 2007 - Program
The Web makes large databases ubiquitous. As a side effect, more people are becoming aware of the problem of data quality. In recent years the database research community has become more interested in this problem. ACM is now starting the "ACM Journal of Data and Information Quality" (JDIQ - http://www.acm.org/pubs/periodicals/jdiq/ ), and the IEEE Data Engineering Bulletin and the German Datenbank Spektrum recently published special issues on data quality. The 'classical' literature, such as the textbooks by Redman and Olson, focuses on organizational issues in the workflow. Now tools and algorithms are being developed to help improve the quality of large collections. DBLP is a computer science bibliography maintained at the University of Trier. The structure of the data is very primitive; the bibliographic records are very similar to BibTeX. Compared to commercial applications the amount of data is tiny. Nevertheless, data quality is the main problem - we focus on the problem of unifying the spelling of person names. The talk starts with the 'history' of DBLP. The simple internal architecture is explained. Several graphs may be derived from the DBLP data. For the person name problem the coauthor graph is very useful ...
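To illustrate why a coauthor graph can help with the person name problem, here is a minimal sketch. It is not DBLP's actual method or data model: the records, the function names, and the `min_shared` threshold are all hypothetical. The idea is that two name spellings that never appear on the same paper but share several coauthors are candidates for being the same person.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records in a BibTeX-like shape; not real DBLP data.
records = [
    {"title": "Paper A", "authors": ["Jane Smith", "Bob Lee", "Carol Diaz"]},
    {"title": "Paper B", "authors": ["J. Smith", "Bob Lee", "Carol Diaz"]},
    {"title": "Paper C", "authors": ["Dan Wu", "Eve Park"]},
]


def build_coauthor_graph(records):
    """Map each author name to the set of names it has co-published with."""
    graph = defaultdict(set)
    for record in records:
        for a, b in combinations(record["authors"], 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shared_coauthor_pairs(graph, min_shared=2):
    """Return name pairs that are not coauthors but share many coauthors."""
    pairs = []
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # assumption: direct coauthors are different people
        shared = graph[a] & graph[b]
        if len(shared) >= min_shared:
            pairs.append((a, b, sorted(shared)))
    return pairs


graph = build_coauthor_graph(records)
candidates = shared_coauthor_pairs(graph)
print(candidates)
```

On this toy input the only candidate pair is "J. Smith" and "Jane Smith", which share the coauthors Bob Lee and Carol Diaz. A real system would need further evidence before merging two names.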
Ambient Intelligence (AmI) envisions the ‘invisible’ incorporation into our surrounding environment and everyday objects of billions of loosely-coupled data stream components such as cameras, motion, temperature, and localisation sensors, as well as RFID readers. The aim is to offer humans an unprecedented level of convenience and flexibility for living and working, by ubiquitously and adaptively supporting their daily activities thanks to these data streams. Wireless Sensor Networks (WSNs) are a key enabling technology for AmI. These networks of tiny, resource-limited sensors that compute and communicate wirelessly can be easily deployed and used to capture and cooperatively process different kinds of field data streams in real time. By integrating WSNs, existing applications may be fundamentally improved, and applications in completely new domains become possible. A major drawback of WSNs in general is the cost of programming them, which remains extremely difficult even for expert programmers. In this talk I present a meta-level architecture for WSN programming systems that significantly facilitates programming WSNs, and thereby handling field data streams. By combining sophisticated techniques from Artificial Intelligence, Service-Oriented Computing, and Model-Driven Engineering, this architecture allows high-level specification of data streams’ filtering and semantic interpretation rules at run-time, as well as the adaptive deployment and concurrent execution of those specifications on an open WSN. This architecture is defined and prototyped in the course of the Ambiance project, funded by the University of Luxembourg, and conducted in close collaboration with the Open Systems Laboratory at the University of Illinois at Urbana-Champaign, USA.
Due to the vast amount of data being received as high-speed data streams from active data sources like sensors, new methods are required for processing data streams efficiently and directly, without storing them persistently in a database system. Such methods are important in a large variety of applications, such as the monitoring of production processes and network traffic. In this talk, we present our data stream infrastructure PIPES, which provides a powerful algebra of operators for query processing on streams. PIPES differs from competing approaches in that it comes with a comprehensive cost model and a collection of statistical estimators that satisfy the restrictive requirements of data stream processing. Both of these unique features are essential for our simulation-based approach to query optimization. We present the details of our optimization approach and report experimental results that underpin its benefits.
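As a rough illustration of what an algebra of composable stream operators can look like, the following sketch chains a selection, a projection, and a sliding-window average over an in-memory stream. It does not use or reproduce the PIPES API: the operator names, the sensor readings, and the window size are all assumptions chosen for the example. Each operator consumes one element at a time, so nothing is stored beyond the current window.

```python
from collections import deque


def select(stream, predicate):
    """Pass through only the elements that satisfy the predicate."""
    for item in stream:
        if predicate(item):
            yield item


def project(stream, fn):
    """Apply fn to every element of the stream."""
    for item in stream:
        yield fn(item)


def sliding_average(stream, size):
    """Emit the mean of the last `size` values once the window is full."""
    window = deque(maxlen=size)  # old values drop out automatically
    for value in stream:
        window.append(value)
        if len(window) == size:
            yield sum(window) / size


# Hypothetical sensor readings as (sensor_id, temperature) pairs.
readings = [("s1", 20.0), ("s2", 99.0), ("s1", 22.0), ("s1", 24.0), ("s1", 26.0)]

# Compose the operators into one query: keep sensor s1, extract the
# temperature, and average over a window of three readings.
query = sliding_average(
    project(select(iter(readings), lambda r: r[0] == "s1"), lambda r: r[1]),
    size=3,
)
result = list(query)
print(result)  # [22.0, 24.0]
```

Because every operator takes a stream and returns a stream, operators can be reordered or swapped without changing the surrounding query, which is the property a query optimizer relies on.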
Grids are an enabling technology that permits the transparent coupling of geographically dispersed resources (machines, networks, data storage, ...) for large-scale distributed applications. Grids offer users and applications important benefits by letting them share computing and data storage, knowledge, instruments, etc. In this talk we will introduce Grid computing paradigms and then present some applications in combinatorial optimization and data mining. Parallel combinatorial optimization and data mining algorithms are attracting considerable interest in science and technology since they are able to solve complex problems in different domains (telecommunications, genomics, logistics and transportation, environment, engineering design, etc.).