
SEREBIF
Web search engines use increasingly complex ranking functions to deliver the proper results for a given query, ordered by relevance. Often enough, though, results that are not relevant to the user show up in the result list, too. SEREBIF (Search Engine Result Enhancement By Implicit Feedback) is an approach that tries to incorporate information taken from the users into the results to increase their quality.

The overall goal is to analyze the preferences, and more generally the behavior, of users of a search engine in order to learn about the real relevance of the results. Observing the users' actions can yield valuable information about how important a result seems to be to the user. For example, if users tend to click on the second result for a certain query and usually skip the first one, the ranking does not seem to be appropriate and could be changed accordingly [1]. It should therefore be possible to increase the quality of a search engine's results over time just by taking the implicitly collected information about the clicked results into account.

One advantage of this approach is that users do not need to invest any extra time in giving explicit feedback about what they find relevant. They just do what they need to do in order to find the information they are looking for. We keep track of entered queries and clicked links, and we try to estimate the time that users stay on the clicked result pages. So that users do not have to fear being tracked continuously and possibly identified through their queries (as happened with the large set of search queries released by AOL in 2006), we do not link queries from different sessions to each other.
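
The click example above (users clicking the second result and skipping the first) can be turned into explicit preference statements. The following is a minimal sketch, not the SEREBIF implementation: it assumes the "a clicked result is preferred over unclicked results ranked above it" reading of clickthrough data described in [1], and the names `Session`, `preference_pairs` and `aggregate`, as well as the example URLs, are hypothetical. Each `Session` stands for one unlinked session, so nothing connects a user's queries across sessions.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Session:
    """Clicks observed for one query inside a single, unlinked session."""
    query: str
    results: list[str]  # result URLs in the order they were shown
    clicked: set[str] = field(default_factory=set)


def preference_pairs(session: Session) -> list[tuple[str, str]]:
    """Return (preferred, skipped) pairs for one session.

    Assumption: a clicked result is preferred over every unclicked
    result that was shown above it.
    """
    pairs = []
    for rank, url in enumerate(session.results):
        if url in session.clicked:
            pairs.extend(
                (url, above)
                for above in session.results[:rank]
                if above not in session.clicked
            )
    return pairs


def aggregate(sessions: list[Session]) -> dict[str, Counter]:
    """Count preference pairs per query over many anonymous sessions."""
    counts: dict[str, Counter] = {}
    for s in sessions:
        counts.setdefault(s.query, Counter()).update(preference_pairs(s))
    return counts


# Hypothetical log: two sessions skip the first result, one clicks it.
sessions = [
    Session("jaguar", ["a.example", "b.example", "c.example"], {"b.example"}),
    Session("jaguar", ["a.example", "b.example", "c.example"], {"b.example"}),
    Session("jaguar", ["a.example", "b.example", "c.example"], {"a.example"}),
]
votes = aggregate(sessions)
print(votes["jaguar"][("b.example", "a.example")])
```

A pair that accumulates many votes for a query, such as `("b.example", "a.example")` here, is a hint that the two results might be swapped for that query. How many votes are enough, and how the estimated viewing time should weigh in, is left open in this sketch.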
Because sessions are never linked, we collect information from all users with a low chance of re-identification, so that privacy concerns are respected (of course, someone entering their own name as a query could still be identified, but we expect this not to happen). We do, however, check whether the same user entered multiple queries shortly one after the other (in the same session, i.e. within a short time period and without closing the browser). This is important because such a sequence might indicate a connection between two different queries. See e.g. [2] for a more detailed description of so-called "Query Chains".

The whole approach requires an existing search engine to build upon. We chose to implement a kind of proxy between the users and an existing search engine (for our first tests, we chose Google, using the SOAP API they provide, which allows a limited number of queries per day). Our system therefore looks like a search engine to the users, but in fact just forwards queries to the underlying search engine. Likewise, the results returned by that engine are then shown to the user.

The gathered information is preprocessed and then merged into a storage system based on the ANIMA system [3], which has also been developed in our MINE research group. The following figure shows what such a storage can basically look like.

We use three different types of nodes for this network, denoted by different colors in the picture:
Once we have enough test data, we can use the information in the network in different ways. Examples are:
[1] - T. Joachims - Optimizing Search Engines using Clickthrough Data, SIGKDD 2002
[2] - F. Radlinski, T. Joachims - Query Chains: Learning to Rank from Implicit Feedback, KDD 2005
[3] - C. Schommer, B. Schroeder - ANIMA: Associate Memories for Categorical Data Streams, ICCSA 2005
[4] - D. Kelly, N.J. Belkin - Display Time as Implicit Feedback: Understanding Task Effects, SIGIR 2004