Precision and recall


Precision measures the quality of our predictions based only on what the predictor claims to be positive, regardless of everything it might miss. It is computed as everything we predicted correctly divided by everything we predicted, correctly or wrongly: the proportion of correctly retrieved results (TP) among all results actually retrieved (TP + FP).

(precision) P = \frac{TP}{TP + FP}

Recall, in contrast, measures that quality with respect to the mistakes we made. It is computed as everything we predicted correctly divided by everything we should have predicted: the proportion of correctly retrieved results (TP) among all results that should have been retrieved (TP + FN).

(recall) R = \frac{TP}{TP + FN}
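To make the two formulas concrete, here is a minimal Python sketch; the labels y_true and y_pred are invented example data, not from the original post:

# A minimal sketch computing precision and recall from raw counts.
# The labels y_true / y_pred below are made-up example data.

def precision(tp: int, fp: int) -> float:
    """P = TP / (TP + FP): of everything predicted positive, how much was right."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """R = TP / (TP + FN): of everything actually positive, how much we found."""
    return tp / (tp + fn)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 3
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # 1
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # 1

print(precision(tp, fp))  # 3 / (3 + 1) = 0.75
print(recall(tp, fn))     # 3 / (3 + 1) = 0.75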

Precision measures: of all the samples we classified as true, how many are actually true?

Recall measures: of all the samples that are actually true, how many did we classify as true?

That is, recall gives us information about a classifier's performance with respect to false negatives (how many we missed), while precision gives us information about its performance with respect to false positives.

Naturally, a system wants the retrieved results to have both P and R as high as possible, but the two are often a trade-off (you cannot have both). An extreme example illustrates this: if only a single result is retrieved and it is accurate, then P = 100% but R is very low; conversely, if the classifier returns everything, then R = 100% but P may be extremely low.

It is therefore necessary to decide, according to the needs of the application, whether a higher P or a higher R is preferred. The Precision-Recall curve can be used to present and analyze the relationship between the two, and the F measure and the ROC (Receiver Operating Characteristics) plot also help with this judgment. Each is introduced below.
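As a hedged sketch of the Precision-Recall curve just mentioned, the following assumes scikit-learn and matplotlib are available; y_true and y_score are synthetic data made up for illustration:

# A sketch of plotting the Precision-Recall curve with scikit-learn
# (assumes scikit-learn and matplotlib are installed; scores are synthetic).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)           # synthetic binary labels
y_score = y_true * 0.5 + rng.random(200) * 0.7  # noisy classifier scores

# Sweep the decision threshold to get one (P, R) point per threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall curve")
plt.show()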

F-Measure

P and R often stand in a contradictory (trade-off) relationship. The F-Measure (or F-Score) is a metric that takes both P and R into account.

The F-Measure is the weighted harmonic mean of Precision and Recall:

F_a = \frac{(a^2 + 1) \cdot P \cdot R}{a^2 \cdot P + R}

When the parameter a = 1, this becomes the familiar F1:

F_1 = \frac{2 \cdot P \cdot R}{P + R}

A higher F1 lends more support to the classifier being a good one.
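A minimal sketch of the weighted F-measure defined above; beta plays the role of the parameter a in the text (beta = 1 gives F1), and the example values of P and R are invented:

# A minimal sketch of the weighted F-measure; beta plays the role of
# the parameter a in the text (beta = 1 gives F1).

def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    return (1 + beta**2) * p * r / (beta**2 * p + r)

print(f_measure(0.75, 0.75))          # F1 = 0.75 when P == R
print(f_measure(0.9, 0.3))            # F1 = 0.45, punishes the imbalance
print(f_measure(0.9, 0.3, beta=2.0))  # beta > 1 weights recall more heavily

Choosing beta > 1 weights recall more heavily, which suits applications where misses (false negatives) are more costly than false alarms.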

ROC (Receiver Operating Characteristics) plot (source)

The ROC plot is a popular measure for evaluating classifier performance. It has been used in a wide range of fields, and the characteristics of the plot are also well studied.

ROC shows the trade-off between:

- Sensitivity (true positive rate, the same as Recall; an indicator of how well false negatives are avoided), a performance measure over the positive part of a dataset.
- Specificity (true negative rate = TN / (TN + FP); an indicator of how well false positives are avoided), a performance measure over the negative part of a dataset.

A dataset has two labels (P and N), and a classifier separates the dataset into four outcomes – TP, TN, FP, and FN. The ROC plot is based on two basic measures – specificity and sensitivity – that are calculated from these four outcomes.

The ROC plot uses 1 – specificity on the x-axis and sensitivity on the y-axis. The false positive rate (FPR) is identical to 1 – specificity, and the true positive rate (TPR) is identical to sensitivity.
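A sketch of computing FPR/TPR and plotting the ROC curve with scikit-learn, under the same synthetic-score assumption as the Precision-Recall sketch above:

# A sketch of the ROC curve with scikit-learn; roc_curve sweeps the
# decision threshold and returns one (FPR, TPR) point per threshold.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)           # synthetic binary labels
y_score = y_true * 0.5 + rng.random(200) * 0.7  # noisy classifier scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # fpr = 1 - specificity

plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle="--")           # chance diagonal
plt.xlabel("1 - Specificity (FPR)")
plt.ylabel("Sensitivity (TPR)")
plt.title(f"ROC curve, AUC = {roc_auc_score(y_true, y_score):.2f}")
plt.show()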

An example of the one-to-one relationship between the ROC and Precision-Recall plots is given in the cited source (figure omitted), where:

TP: the number of true positives, i.e., articles that did have the given term and that we correctly predicted as positive news.

TN: the number of true negatives, i.e., articles that did not have the given term and that we correctly predicted as not positive news.

FP: the number of false positives, i.e., articles that did not have the given term but that we incorrectly predicted as positive news.

FN: the number of false negatives, i.e., articles that did have the given term but that we incorrectly predicted as not positive news.
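Tallying the four outcomes for the news-article example might look like the following sketch; the article labels here are invented for illustration:

# A minimal sketch tallying the four outcomes (TP, TN, FP, FN) for the
# news-article example; the data below is invented for illustration.
from collections import Counter

y_true = ["pos", "pos", "neg", "neg", "pos", "neg"]  # did the article have the term?
y_pred = ["pos", "neg", "neg", "pos", "pos", "neg"]  # what the classifier said

outcomes = Counter()
for t, p in zip(y_true, y_pred):
    if t == "pos" and p == "pos":
        outcomes["TP"] += 1
    elif t == "neg" and p == "neg":
        outcomes["TN"] += 1
    elif t == "neg" and p == "pos":
        outcomes["FP"] += 1
    else:
        outcomes["FN"] += 1

print(outcomes)  # Counter({'TP': 2, 'TN': 2, 'FN': 1, 'FP': 1})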

[References]

https://classeval.wordpress.com/introduction/introduction-to-the-roc-receiver-operating-characteristics-plot/ (cited source, recommended)

[Note] (source)

Trade-off

This is pretty intuitive. If you have to recall everything, you will have to keep generating results that are not accurate, hence lowering your precision.

To exemplify this, imagine the digital world (again, amazon.com?), where there is limited space on each webpage and an extremely limited attention span on the customer's side. If the customer is shown many irrelevant results and very few relevant ones (in order to achieve a high recall), the customer will not keep browsing every product forever to finally find the one he or she intends to buy, and will probably switch to Facebook, Twitter, or maybe Airbnb to plan his or her next vacation. That is a huge loss, and hence the underlying model or algorithm would need a fix to balance recall and precision.
