Evaluation and Error Analysis
Use different scoring mechanisms to analyze the results produced by the system:
- Gold Answer Scoring
- Token Overlap Scoring
- N-Gram Overlap Scoring
- ...
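The token-overlap and n-gram-overlap mechanisms above can be sketched as follows. This is a minimal illustration, not the project's actual implementation; it assumes simple whitespace tokenization and lowercasing, and the function names are hypothetical.

```python
from collections import Counter

def token_overlap_score(prediction, gold):
    """F1-style token overlap between a predicted answer and a gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def ngram_overlap_score(prediction, gold, n=2):
    """Fraction of gold n-grams that also appear in the prediction."""
    def ngrams(tokens, n):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    pred_ngrams = Counter(ngrams(prediction.lower().split(), n))
    gold_ngrams = Counter(ngrams(gold.lower().split(), n))
    if not gold_ngrams:
        return 0.0
    common = pred_ngrams & gold_ngrams
    return sum(common.values()) / sum(gold_ngrams.values())
```

Gold-answer scoring is the degenerate case: exact (possibly normalized) string match against the gold answer.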
Based on our scoring mechanism, it is also important to try different optimization strategies:
- Better similarity calculation methods
- Pre-processing the dataset.
At the end of the analysis, also classify the errors into different categories.
Retrieved Item | Unordered retrieval measures | Ordered retrieval measures |
---|---|---|
concepts | mean precision, recall, F-measure | MAP, GMAP |
articles | mean precision, recall, F-measure | MAP, GMAP |
triples | mean precision, recall, F-measure | MAP, GMAP |
- Precision and Recall
Precision takes all retrieved documents into account, but it can also be evaluated at a given cut-off rank, considering only the topmost results returned by the system. This measure is called precision at n or P@n.
For example, for a text search on a set of documents, precision is the number of correct results divided by the number of all returned results.
Precision is also used with recall, the percentage of all relevant documents that are returned by the search. The two measures are sometimes combined in the F1 score (or F-measure) to provide a single measurement for a system.
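Precision, recall, and P@n can be computed directly from a ranked result list and the set of relevant documents. A minimal sketch (hypothetical function names; document IDs are arbitrary strings):

```python
def precision_recall(retrieved, relevant):
    """Precision and recall over the full retrieved list."""
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved), hits / len(relevant)

def precision_at_n(retrieved, relevant, n):
    """P@n: precision computed over only the top-n retrieved documents."""
    top = retrieved[:n]
    return sum(1 for doc in top if doc in relevant) / n
```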
- F-measure
In statistical analysis of binary classification, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the precision p and the recall r of the test to compute the score: p is the number of correct results divided by the number of all returned results and r is the number of correct results divided by the number of results that should have been returned. The F1 score can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst score at 0.
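The F1 score described above is the harmonic mean of precision and recall, which can be computed as:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```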
- Average Precision (AP):
Precision and recall are single-value metrics based on the whole list of documents returned by the system. For systems that return a ranked sequence of documents, it is desirable to also consider the order in which the returned documents are presented. By computing a precision and recall at every position in the ranked sequence of documents, one can plot a precision-recall curve, plotting precision p(r) as a function of recall r. Average precision computes the average value of p(r) over the interval from r=0 to r=1.
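In the discrete case, AP is commonly computed as the mean of the precision values at each rank where a relevant document is retrieved, divided over all relevant documents. A minimal sketch (hypothetical function name):

```python
def average_precision(retrieved, relevant):
    """AP: average of precision@k over the ranks k of retrieved relevant docs."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    if not relevant:
        return 0.0
    return precision_sum / len(relevant)
```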
- Mean Average Precision (MAP):
MAP for a set of queries is the arithmetic mean of the average precision scores for each query.
- Geometric Mean Average Precision (GMAP):
GMAP is the geometric mean of the per-query average precision scores; compared to MAP, it emphasizes improvements on poorly performing queries.
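Both aggregates over per-query AP scores can be sketched as follows; the small epsilon is a common convention (an assumption here, not mandated by the definition) to keep GMAP defined when a query has AP of zero:

```python
import math

def mean_average_precision(ap_scores):
    """MAP: arithmetic mean of per-query average precision."""
    return sum(ap_scores) / len(ap_scores)

def geometric_mean_average_precision(ap_scores, eps=1e-6):
    """GMAP: geometric mean of per-query AP; eps avoids log(0)."""
    log_sum = sum(math.log(ap + eps) for ap in ap_scores)
    return math.exp(log_sum / len(ap_scores))
```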
References:
Precision and recall. (2014, October 29). In Wikipedia, The Free Encyclopedia. Retrieved 23:14, November 4, 2014, from http://en.wikipedia.org/w/index.php?title=Precision_and_recall&oldid=631631496
Information retrieval. (2014, September 16). In Wikipedia, The Free Encyclopedia. Retrieved 23:29, November 4, 2014, from http://en.wikipedia.org/w/index.php?title=Information_retrieval&oldid=625801930