About
Mrs. Naipeng Dong is currently working on her Master's thesis (Specialization: Security and Trust) within a cooperation project between UL and the University of Shandong, China.
Project
FINE - A Fingerprint Engine for Author Profiling
Abstract
Texts are generally written by human authors, each of whom has an individual style. This style may be characterized by many attributes, for example the author’s vocabulary, the usage of linguistic features such as hapax legomena (words that occur only once in a text), or more generally the domain to which the text belongs. Additionally, statistical parameters such as the average sentence length or the number of words that occur more than k times are taken into account as well. In the following, we call these attributes objective attributes.
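To make the objective attributes more concrete, the following is a minimal sketch of how a few of them could be computed. The function name `objective_attributes`, the regex-based sentence splitting and tokenization, and the default value of `k` are assumptions made for illustration only; they are not taken from the FINE project.

```python
import re
from collections import Counter


def objective_attributes(text: str, k: int = 2) -> dict:
    """Compute a few illustrative objective attributes of a text (hypothetical helper)."""
    # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumption: a word is a run of lowercase letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {
        # Number of distinct words used by the author.
        "vocabulary_size": len(counts),
        # Hapax legomena: words that occur exactly once.
        "hapax_legomena": sum(1 for c in counts.values() if c == 1),
        # Average sentence length, measured in words.
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        # Number of distinct words that occur more than k times.
        "words_more_than_k": sum(1 for c in counts.values() if c > k),
    }


sample = "The cat sat. The cat ran. A dog barked at the cat."
attrs = objective_attributes(sample, k=2)
print(attrs)
```

A real engine would likely rely on a proper tokenizer and sentence splitter; the regular expressions here only keep the sketch self-contained.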
Objective attributes may be sufficient up to a certain level, and they show acceptable results if the text is written in an objective style, stating the facts of a story or describing the structure of an object. However, if the text contains subjective components, for example the author’s opinion, then the objective attributes fail. In this respect, the Master's thesis is concerned with building a fingerprint engine that takes into account both objective and subjective attributes. A subjective attribute might be, for example, an attribute like
Text-Contains-A-Belief or
Text-Contains-A-Desire. Assume that we have a text like
Meeting today with Palestinian leaders in the West Bank, I believe that President Bush
will say that a Middle East peace treaty would be signed.
then we can extract subjective attributes like
Text-Contains-A-Belief = yes and
Text-Contains-A-Desire = no.
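As an illustration of the example above, the following sketch derives the two subjective attributes with simple keyword matching. This is only a stand-in: the abstract does not say how subjective attributes are extracted, and the cue lists `BELIEF_CUES` and `DESIRE_CUES` as well as the function names are hypothetical.

```python
import re

# Hypothetical cue lists; a real engine would need far richer linguistic analysis.
BELIEF_CUES = ("believe", "think", "suppose")
DESIRE_CUES = ("want", "wish", "hope", "desire")


def contains_cue(text: str, cues: tuple) -> bool:
    """Return True if any cue occurs at the start of a word (so 'believes' also counts)."""
    pattern = r"\b(?:" + "|".join(cues) + r")\w*"
    return re.search(pattern, text.lower()) is not None


def subjective_attributes(text: str) -> dict:
    """Map a text to yes/no values for the two example attributes."""
    return {
        "Text-Contains-A-Belief": "yes" if contains_cue(text, BELIEF_CUES) else "no",
        "Text-Contains-A-Desire": "yes" if contains_cue(text, DESIRE_CUES) else "no",
    }


text = (
    "Meeting today with Palestinian leaders in the West Bank, I believe that "
    "President Bush will say that a Middle East peace treaty would be signed."
)
result = subjective_attributes(text)
print(result)
```

Keyword matching of this kind cannot tell whose belief is expressed or whether it is negated, which is one reason the extraction step would need a more careful design in practice.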
The aim of FINE is to serve as a prototype of how such a subjective-objective engine works. The input must be texts from the selected domain; the output is a vector representing both the objective and subjective attributes (the vector representation, the similarity measure, and the comparison itself all have to be defined). Ideally, the vectors can be compared and a distance or similarity calculated, so that FINE may then suggest another text that is close to a given one.
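The comparison step could look roughly like the sketch below. Cosine similarity is used purely as an assumed example, since the similarity measure is explicitly left open above. The vector layout, the feature values, and the names `cosine_similarity` and `closest_text` are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_text(query, corpus):
    """Return the name of the corpus entry whose vector is most similar to the query."""
    return max(corpus, key=lambda name: cosine_similarity(query, corpus[name]))


# Hypothetical layout: [scaled avg sentence length, hapax ratio,
#                       Text-Contains-A-Belief (1/0), Text-Contains-A-Desire (1/0)]
corpus = {
    "text_a": [0.30, 0.40, 1.0, 0.0],
    "text_b": [0.80, 0.10, 0.0, 1.0],
}
query = [0.35, 0.35, 1.0, 0.0]

best = closest_text(query, corpus)
print(best)
```

The features are assumed to be scaled to comparable ranges beforehand; otherwise a single large-valued attribute such as raw sentence length would dominate the similarity.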