
Mining Scientific Communities

Mining scientific communities may help to:
  • define various existing scientific groups;
  • achieve a fine-grained hierarchical partitioning of communities into sub-communities;
  • perform trend analysis.
Bibliographic databases such as Citeseer, Google Scholar, and DBLP could serve as data sources for further community analysis, provided that they offer accurate, consistent, and up-to-date information. Achieving this is challenging because of the large amount of data to be processed. A modern bibliographic database may contain hundreds of thousands of records, each composed of various fields: author name, publication title, the name of the journal or conference where the publication appeared, etc. Person names are of core importance for community analysis. For this reason, we address the problem of maintaining person names in bibliographic databases as a first step towards gathering data for our future research.

A particular difficulty in managing person names comes from the fact that the number of distinct names is huge, and the sources they come from are varied and sometimes contradictory. This leads to name inconsistency in the database: more than one record refers to the same person, with each record representing a variation of the name's spelling. On the other hand, several different persons may have exactly the same name or share name components. This yields records that mix information actually belonging to different persons.

Our current goal is to develop a method for detecting bibliographic records affected by person name misspellings.
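
To make the spelling-variation problem concrete, the following is a minimal sketch of how candidate misspellings might be flagged by comparing distinct author names pairwise. It is not the method developed in this project: the use of Python's difflib similarity ratio, the helper names (normalize, similarity, candidate_variants), the 0.85 threshold, and the sample names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase the name, drop periods and commas, collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_variants(names, threshold=0.85):
    """Return (name_a, name_b, score) for distinct names at or above the threshold."""
    distinct = sorted(set(names))
    pairs = []
    for a, b in combinations(distinct, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical author names, invented for illustration only.
names = ["Maria Schneider", "Maria Schnieder", "John Smith", "Jon Smith", "Wei Zhang"]
for a, b, score in candidate_variants(names):
    print(a, "~", b, score)
```

A pairwise comparison like this is quadratic in the number of distinct names, so for hundreds of thousands of records it would need some form of pre-grouping before it became practical. It also only addresses spelling variants of one person's name; it cannot separate different persons who share exactly the same name.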
Modified: 2007-08-06 14:51:12