Evaluation and Error analysis

victorzhao1990 edited this page Nov 4, 2014 · 6 revisions

Use different scoring mechanisms to analyze the results produced by the system.

  • Gold Answer Scoring
  • Token Overlap Scoring
  • N-Gram Overlap Scoring
  • ...
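
The overlap-based scorers above can be sketched as simple set comparisons between a system answer and a gold answer. This is a minimal illustration; the function names and whitespace tokenization are assumptions, not taken from the project code:

```python
def token_overlap(answer, gold):
    """Fraction of gold-answer tokens that also appear in the system answer."""
    a, g = set(answer.lower().split()), set(gold.lower().split())
    return len(a & g) / len(g) if g else 0.0

def ngram_overlap(answer, gold, n=2):
    """Fraction of gold-answer n-grams that also appear in the system answer."""
    def ngrams(text):
        toks = text.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    a, g = ngrams(answer), ngrams(gold)
    return len(a & g) / len(g) if g else 0.0

print(token_overlap("the cat sat", "the cat sat down"))  # 0.75
```

Gold-answer scoring would instead be an exact (or normalized) string match against the gold answer.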

Based on our scoring mechanism, it is also important to try different optimization strategies.

  • Better similarity calculation methods
  • Pre-processing the dataset.
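
One common candidate for a better similarity calculation is cosine similarity over term-frequency vectors. The sketch below assumes a simple bag-of-words representation; the helper name is hypothetical and not the project's actual method:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts using raw term-frequency vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Pre-processing (lowercasing, stop-word removal, stemming) would be applied before building the vectors.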

At the end of the analysis, also divide the errors into different categories.

Detailed methods for evaluation, selected from project1.pdf:

| Retrieved item | Unordered retrieval measures      | Ordered retrieval measures |
|----------------|-----------------------------------|----------------------------|
| concepts       | mean precision, recall, F-measure | MAP, GMAP                  |
| articles       | mean precision, recall, F-measure | MAP, GMAP                  |
| triples        | mean precision, recall, F-measure | MAP, GMAP                  |
  • Precision and Recall

Precision takes all retrieved documents into account, but it can also be evaluated at a given cut-off rank, considering only the topmost results returned by the system. This measure is called precision at n or P@n.

For example, for a text search on a set of documents, precision is the number of correct results divided by the number of all returned results.

Precision is also used with recall, the percentage of all relevant documents that are returned by the search. The two measures are sometimes combined in the F1 score (or F-measure) to provide a single measurement for a system.
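
The definitions above can be sketched directly; the document identifiers below are made up for illustration:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    hits = [d for d in retrieved if d in relevant]
    return len(hits) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    hits = [d for d in retrieved if d in relevant]
    return len(hits) / len(relevant) if relevant else 0.0

def precision_at_n(retrieved, relevant, n):
    """Precision considering only the topmost n results (P@n)."""
    return precision(retrieved[:n], relevant)

retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
print(precision(retrieved, relevant))          # 0.5
print(recall(retrieved, relevant))             # roughly 0.667
print(precision_at_n(retrieved, relevant, 2))  # 0.5
```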

  • F-measure

In statistical analysis of binary classification, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the precision p and the recall r of the test to compute the score: p is the number of correct results divided by the number of all returned results, and r is the number of correct results divided by the number of results that should have been returned. The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0.
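
Given precision p and recall r as defined above, the computation is a one-liner:

```python
def f1_score(p, r):
    """Harmonic mean of precision p and recall r; 0.0 when both are zero."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

print(f1_score(0.5, 1.0))  # roughly 0.667
```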

  • Average Precision (AP):

Precision and recall are single-value metrics based on the whole list of documents returned by the system. For systems that return a ranked sequence of documents, it is desirable to also consider the order in which the returned documents are presented. By computing a precision and recall at every position in the ranked sequence of documents, one can plot a precision-recall curve, plotting precision p(r) as a function of recall r. Average precision computes the average value of p(r) over the interval from r=0 to r=1.
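
In practice, average precision is usually computed in its discrete form, averaging the precision at each rank where a relevant document appears. This is a standard formulation, assumed here rather than taken from project1.pdf:

```python
def average_precision(ranked, relevant):
    """Mean of precision@k over each rank k at which a relevant doc is retrieved."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k       # precision at this rank
    return total / len(relevant) if relevant else 0.0

print(average_precision(["d1", "d2", "d3"], {"d1", "d3"}))  # roughly 0.833
```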

  • Mean Average Precision (MAP): the arithmetic mean of the average-precision scores computed over a set of queries.

  • Geometric Mean Average Precision (GMAP): the geometric mean of the average-precision scores over a set of queries, which gives more weight to improvements on poorly performing queries.
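
Given per-query AP scores, MAP and GMAP reduce to an arithmetic and a geometric mean respectively. The small epsilon below is an assumption of this sketch, used to keep a single zero AP score from collapsing the geometric mean to 0:

```python
import math

def mean_average_precision(ap_scores):
    """Arithmetic mean of per-query average-precision scores (MAP)."""
    return sum(ap_scores) / len(ap_scores)

def geometric_map(ap_scores, eps=1e-6):
    """Geometric mean of per-query AP scores (GMAP), smoothed by eps."""
    return math.exp(sum(math.log(ap + eps) for ap in ap_scores) / len(ap_scores))
```

Because the geometric mean is dragged down by small values, a system improves GMAP most by fixing its worst queries.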

References:

Precision and recall. (2014, October 29). In Wikipedia, The Free Encyclopedia. Retrieved 23:14, November 4, 2014, from http://en.wikipedia.org/w/index.php?title=Precision_and_recall&oldid=631631496

Information retrieval. (2014, September 16). In Wikipedia, The Free Encyclopedia. Retrieved 23:29, November 4, 2014, from http://en.wikipedia.org/w/index.php?title=Information_retrieval&oldid=625801930