Mustru: Evaluation


Mustru Version 0.1 was tested on the TREC-8 Question & Answer dataset published in 1999. The dataset consists of about 524K articles from various sources including the Financial Times, LA Times, and FBIS. The article text was segmented into passages, with passage size limited to a minimum of 50 bytes and a maximum of 250 bytes. About 3.5 million passages were created, an average of 6.7 passages per article. The text content alone, excluding all tags, was about 1.5 GB.

A small set of development questions was provided, and Q&A systems were tested on a set of 198 questions. An answer was judged correct if it matched a regular expression generated for the particular question AND the answer sentence originated in a document considered relevant for the question.
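The two-part correctness test described above can be sketched as follows. This is a minimal illustration, not the TREC judging software: the function name `is_correct`, the example pattern, and the document IDs are all hypothetical, and the case-sensitive match is an assumption.

```python
import re


def is_correct(answer_sentence, source_doc_id, answer_pattern, relevant_doc_ids):
    """Return True only if BOTH conditions from the evaluation hold.

    1. The answer sentence matches the question's regular expression.
    2. The sentence comes from a document judged relevant for the question.
    """
    matches_pattern = re.search(answer_pattern, answer_sentence) is not None
    from_relevant_doc = source_doc_id in relevant_doc_ids
    return matches_pattern and from_relevant_doc


# Hypothetical question data, invented purely for illustration.
pattern = r"\bParis\b"
relevant = {"DOC-001", "DOC-017"}

print(is_correct("The capital of France is Paris.", "DOC-001", pattern, relevant))
print(is_correct("The capital of France is Paris.", "DOC-999", pattern, relevant))
print(is_correct("The capital of France is Lyon.", "DOC-001", pattern, relevant))
```

Only the first call satisfies both conditions; the second fails because the sentence comes from a document outside the relevant set, and the third because the pattern does not match.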

A correct answer was awarded points based on its position in the hit list returned by the search engine. For each question, only the highest-ranked correct answer in the hit list scored points.

  • 1 point for an answer in the first hit
  • 1/2 point for an answer in the second hit
  • 1/3 point for an answer in the third hit
  • 1/4 point for an answer in the fourth hit
  • 1/5 point for an answer in the fifth hit
  • 0 points for an answer found after the fifth hit


Hit Position    No. Answered    Points
1               97              97
2               25              12.5
3               5               1.66
4               9               2.25
5               7               1.4
Total           143             114.81

The final precision count (also known as Mean Reciprocal Rank) for Mustru was 0.58 (114.81 / 198). Each question was converted to a search engine query with five components. Each component contributed a different amount to the overall precision; the table below shows the results when individual components were excluded from the query.
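The scoring scheme and the hit-position counts above can be reproduced with a short calculation. This is a sketch for checking the arithmetic, not code from Mustru; the names `reciprocal_rank` and `mean_reciprocal_rank` are illustrative.

```python
# Number of questions whose first correct answer appeared at each hit position,
# copied from the hit-position table.
hits_by_position = {1: 97, 2: 25, 3: 5, 4: 9, 5: 7}
TOTAL_QUESTIONS = 198


def reciprocal_rank(position, cutoff=5):
    """1/position for hits within the cutoff, 0 points otherwise."""
    return 1.0 / position if 1 <= position <= cutoff else 0.0


def mean_reciprocal_rank(counts, total_questions, cutoff=5):
    """Sum the points over all answered questions and divide by all questions."""
    points = sum(n * reciprocal_rank(pos, cutoff) for pos, n in counts.items())
    return points / total_questions


answered = sum(hits_by_position.values())
mrr = mean_reciprocal_rank(hits_by_position, TOTAL_QUESTIONS)
print(answered, round(mrr, 2))  # prints: 143 0.58
```

Questions with no correct answer in the top five hits contribute zero points but still count in the denominator, which is why the total is divided by 198 rather than 143.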

Component Excluded    No. Answered    Precision
General hypernyms     137             0.56
Question hypernyms    139             0.56
All hypernyms         136             0.56
Bigrams               111             0.43
Unigrams              133             0.49
Transformations       143             0.58
None                  143             0.58

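Not surprisingly, bigrams appear to be the largest contributor to the overall precision, followed by unigrams: excluding them lowers precision from 0.58 to 0.43 and 0.49 respectively. When used in queries, entities (hypernyms) provide only a marginal improvement of about 0.02, and excluding transformations leaves both the number of answered questions and the precision unchanged at two decimal places.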
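The contribution of each query component can be read off the table by subtracting the precision with that component excluded from the precision of the full query. The following sketch ranks the components this way; the variable and function names are illustrative, and the figures are the two-decimal values from the table, so small differences are lost to rounding.

```python
# Precision of the full query (the "None" row) and with each component excluded.
BASELINE_PRECISION = 0.58
precision_when_excluded = {
    "General hypernyms": 0.56,
    "Question hypernyms": 0.56,
    "All hypernyms": 0.56,
    "Bigrams": 0.43,
    "Unigrams": 0.49,
    "Transformations": 0.58,
}


def precision_drop(baseline, excluded):
    """Precision lost when a component is removed from the query."""
    return {name: round(baseline - value, 2) for name, value in excluded.items()}


drops = precision_drop(BASELINE_PRECISION, precision_when_excluded)

# Largest drop first; ties keep the order of the table.
ranked = sorted(drops.items(), key=lambda item: item[1], reverse=True)
for name, drop in ranked:
    print(f"{name:<20} {drop:.2f}")
```

A larger drop indicates that the excluded component contributed more to the overall precision.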

Version 0.2

In version 0.2, the entity extractor was replaced by a simple table lookup to speed up indexing and reduce memory requirements. Instead of indexing sentences and documents separately as in version 0.1, a document is indexed just once. The document most likely to answer a question is retrieved first, followed by a search for the top two passages that may answer the question. In version 0.1, a search query was generated for the best passage and the associated document was not fetched.

The results in version 0.2 have lower precision, but are reasonable. As before, the top 5 hits were used to judge whether the search engine found an answer. Even when the document containing the answer was fetched, passage retrieval extracted the sentence containing the answer for only 122 of those 151 questions (80%).

  • Mustru found the document that contains the answer in 151 out of 198 questions (76%) with a precision count of 0.65.
  • After passage retrieval, Mustru found the answers for 122 out of 198 questions (61%) with a precision count of 0.49.

Copyright © 2007 Mustru Search Services. All rights reserved.