
AI Lecture Series, Vol. III - Data Mining Applications - Abstracts
Text mining and e-Forensics
Prof. Dr. Marie-Francine Moens, KU Leuven (Belgium): In this seminar we report on the research of the EU-FP6 AntiPhish project (2006-2009). In the first part of the project we built technology for detecting and resolving hidden content in spam messages. In the second part we investigated several feature selection and extraction techniques for building spam and phishing filters. Our best email classification results were obtained with the dimensionality reduction technique Biased Discriminant Analysis, which yields highly discriminative features for distinguishing spam from ham, and phishing from spam.

Optimizing your IT Security Investments
Dr. Klaus Julisch, IBM Zurich (Switzerland): Data mining conventionally starts with large data repositories and "mines" them to extract actionable insights. This approach has also been taken to derive IT security metrics, but the resulting metrics generally lacked validity and repeatability. It is therefore necessary to ask what security data one should collect, how this data should be evaluated, and what conclusions one can really draw from these evaluations about the security of a system. The presentation will explore these issues in the context of IT security and risk; more broadly, it will illustrate the problems associated with mining the data that is available rather than seeking out the data that would be suitable to answer a given research question.

Data Integration and Mining in Context of Gene Regulation
Dr. Merja Heinäniemi & Anke Wienecke-Baldacchino, University of Luxembourg (Luxembourg): Hypothesis-driven data mining in biology faces several problems: a large number of public databases must be evaluated and quality-controlled before information can be extracted in a computationally processable format, and that extraction has to be automated via interfaces. Additionally, for new types of high-throughput data, e.g.
ChIP-seq, no public databases have been established yet that would provide a standardized data format, in particular for annotation and analysis. Therefore, we decided to set up a database system integrating published, in silico generated and experimental datasets.

The discovery of genetic disease-related traits is often done by performing a high-throughput experiment, such as genotyping microarrays, resulting in a list of statistically significant phenotype-associated point mutations. For example, 38 susceptibility loci for type 2 diabetes have been published to date. However, for the majority no underlying biological mechanism has been established yet. In this talk we will address the challenges for data mining in the context of different biological data types. We will first exemplify the challenges using microarray data, which has been collected in abundance over the last couple of decades (Merja Heinäniemi), and continue with a description of a pilot study that addresses DNA sequence variant data and epidemiological data of type 2 diabetes patient cohorts (Anke Wienecke-Baldacchino). We will present a data resource aimed at supporting a priori biological hypothesis generation by integrating published, in silico generated and experimental datasets.

Bioinformatics challenges in personalised cancer treatment
Prof. Dr. Alfredo Valencia, Spanish Cancer Research Institute (Spain): Fast progress in genomics is making it increasingly possible to use genomic information in clinical cancer treatment practice, to design clinical trial stratification strategies and ultimately to adjust cancer treatments to genomic composition. The complexity of the information makes bioinformatics an essential actor in all the basic steps of these new personalised cancer treatment scenarios, including:

Structural and functional analysis of the patient genome: Analysis of NGS data. Finding mutations, assigning confidence values and validating them.
Analysis of mutations in coding and non-coding regions and assignment of disease risk to them. Treatment of structural variations and copy number variations. Functional analysis (gene expression, proteomics, miRNA and other non-coding RNAs, epigenetic marks and others).

Systems-level analysis of the information, including: Network/pathway-based interpretation of patient genomic data in the context of disease knowledge. Network/pathway-based interpretation of patient genomic data in the context of drug knowledge (prior compilation of drug-target and mutation-disease relations). Comparative analysis of networks/pathways altered among different diseases.

Analysis of the information at the clinical level, including: Analysis of genome-specific toxicity/drug responses and the medical history of the patient. Analysis of the results in animal models (e.g. mouse xenografts). Handling of clinical data through disease ontologies, information extraction from text and related approaches.

Treatment of the information at the quantitative level, including drug administration and physiological environments.

Furthermore, the specific information on cancer patients will have to be considered in relation to other clinical information, including linking the patient's clinical record with individual genomic data. The information will have to be offered to the physician in the appropriate setting for decision making, along with the rest of the medical information and complementary tests, with tools for prognosis and suggestions of potential treatments and dosages. Last but not least, the information will have to be provided to patients in an adequate consultation environment with the involvement of specialists and professionals in the various areas, including of course bioinformaticians. During this talk I will review the current status and the main scientific challenges that personalised cancer treatment poses for bioinformatics.

Data mining in Geographical Contexts and Texts
Prof. Dr.
Geoffrey Caruso, University of Luxembourg (Luxembourg): An increasing number of institutions, acting at different scales and within different sectors, create in-house geographical information systems, e.g. for regional statistics, for land and transport management, or for local urban planning. In addition, with the advent of new technologies such as GPS devices or web-mapping facilities, the use of such geographical data is becoming more and more popular and data is made more easily accessible (sometimes even contributed by the end users). Geographers today find themselves in rather data-rich environments (irrespective of homogeneity and quality). Geographical objects also require specific visualization and statistical methods. The application and adaptation of data mining approaches in geographical contexts is therefore an increasingly important research topic.

In this lecture we will start from theoretical considerations on data mining in geography, particularly emphasizing what is special about exploratory spatial data analysis. We will then refer to ongoing research related to geographical data mining undertaken at the University of Luxembourg in collaboration with colleagues from other institutions. A first example will refer to a large and homogeneous dataset of all dwellings within a Belgian province. Using graph theory and local spatial statistics, the data is used to identify and categorize urbanisation patterns across scales in an iterative way. A second example will depict an application of 'self-organizing maps' to understand patterns of 'territorial cohesion' in Europe using a rather small and incomplete dataset. The third example will be dedicated to a text-mining application on a rather large corpus of documents related to spatial development in Europe. This work, funded under ESPON (European Spatial Observatory Network), aims at producing a relevant thematic structure for the online regional statistics database of the ESPON network.

Image and Video Data Mining
Dr.
Ivan Laptev, INRIA Paris (France): People can seamlessly interpret visual scenes and events; however, searching through millions of images or hundreds of hours of video becomes nearly impossible if done manually. The recent explosion of visual data, sparked by efficient solutions for image capture, sharing and storage, now sets challenges and provides exciting opportunities for new image and video mining technology and applications.

This talk will give an overview of recent advances in computer vision targeting the access and mining of large image and video data collections. We will focus in particular on two topics: (i) instance-level recognition and (ii) category-level recognition. We will review the recent success story of local image descriptors and will demonstrate their application to the efficient search for particular object and scene instances in a web-scale database of ten million images. Next, we will introduce the challenge of category-level recognition and will illustrate learning-based methods relying on large amounts of annotated training data. To address the generally prohibitive cost of manual data annotation, we will investigate weakly supervised methods that leverage readily available but noisy annotation. We will demonstrate systems able to learn character names and event classes automatically from video and the associated video scripts.

Forensic Linguistics
Dr. Sabine Erhardt, BKA (Germany): Within the context of criminal offences, written texts - just like DNA and fingerprints - are traces that are well worth analysing because each of us uses language in a distinctive and idiosyncratic way. The scientific discipline of Forensic Linguistics aims at drawing conclusions about the authors of texts by linguistically analysing the texts they have written.
With the help of methods such as the analysis of linguistic deviations, a forensic linguist is able to categorise authors with respect to their mother tongue, origin, age, profession, etc. Additionally, linguistic analysis can be used to compare texts to conclude whether or not they were written by the same person.

As in many other professions, computer programs are employed to simplify or speed up the work that needs to be done. In Forensic Linguistics, incriminating texts are gathered to build an appropriate text corpus, which can then be used for evaluation purposes. At the German BKA, the corpus software KISTE has been developed to combine the three fundamental tasks of administrating, analysing and evaluating all the linguistic data that is both generated during casework and used to do casework.