Performance Metrics for Classification and Regression
It is important to understand the performance of our model, and below are the methods to measure it.
1. Accuracy —
This measures the number of correctly classified points in the dataset; in other words, it tells you how close your predictions are to the true values.
Accuracy = Number of correctly classified points / Total number of points in the dataset
True Positive (TP) - total positive points that have been predicted correctly.
True Negative (TN) - total negative points that have been predicted correctly.
False Positive (FP) - total points that have been incorrectly predicted as positive but are actually negative.
False Negative (FN) - total points that have been incorrectly predicted as negative but are actually positive.
The major advantage of this metric is that it is easy to understand and measure.
However, if your dataset is imbalanced, i.e. the data favours a particular class label, then accuracy is not good for determining the performance of the model.
Eg - Let's say you have 90 positive points and 10 negative points in your dataset. Without using any model, if you simply classify every point as positive, your accuracy will still be 90%, which tells you nothing about the model. Hence it is also important to know which metric to use when.
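This pitfall can be sketched in a few lines of plain Python (the labels below are made up for illustration; no real model is involved):

```python
# Sketch (assumed toy labels): a "model" that predicts positive for everything
# on a dataset with 90 positive and 10 negative points.
y_true = [1] * 90 + [0] * 10   # 90 positives, 10 negatives
y_pred = [1] * 100             # every point predicted positive

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
print(accuracy)  # 0.9, even though the "model" learned nothing
```

The 90% accuracy here comes entirely from the class imbalance, which is exactly why the metrics below exist.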
2. Sensitivity, True Positive Rate (TPR), or Recall
This tells us how many points are correctly classified as positive out of the total positive points available in the dataset.
Hence it is termed the true positive rate, as it only measures the predictions on the truly positive points.
In a two-class classification problem, recall is calculated as the number of true positives divided by the total number of true positives and false negatives: Recall = TP / (TP + FN).
3. Specificity, Selectivity, or True Negative Rate (TNR)
This tells us how many points are correctly classified as negative out of the total negative points available in the dataset.
Hence it is termed the true negative rate, as it only measures the predictions on the truly negative points: TNR = TN / (TN + FP).
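Both rates can be computed directly from confusion-matrix counts. A minimal sketch, using illustrative counts rather than a real model:

```python
# Sketch with illustrative confusion-matrix counts (not from a real model).
tp, fn = 40, 10   # actual positives: 40 found, 10 missed
tn, fp = 45, 5    # actual negatives: 45 found, 5 missed

recall = tp / (tp + fn)        # TPR = TP / (TP + FN)
specificity = tn / (tn + fp)   # TNR = TN / (TN + FP)
print(recall, specificity)     # 0.8 0.9
```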
4. Precision
This tells us, out of the total points predicted as positive, how many are correctly classified as positive: Precision = TP / (TP + FP).
For an imbalanced dataset it is often better to use precision and recall than accuracy, depending on the problem statement.
Precision matters most when you care primarily about the positive predictions made.
Precision quantifies the number of positive class predictions that actually belong to the positive class.
5. F1-Score
A measure that combines precision and recall is their harmonic mean, the traditional F-measure or balanced F-score: F1 = 2 × (Precision × Recall) / (Precision + Recall).
As we know, we choose between precision and recall based on our use case, but when we want to give equal weight to both, we should go with the F1-score.
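A short sketch of precision and F1, again with illustrative counts (tp = 40, fp = 5, fn = 10; not from a real model):

```python
# Sketch: precision and F1-score from illustrative confusion-matrix counts.
tp, fp, fn = 40, 5, 10

precision = tp / (tp + fp)   # of the points predicted positive, how many really are
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
print(round(f1, 4))
```

The harmonic mean pulls F1 toward the smaller of the two values, so a model cannot score well by maximising only one of them.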
6. ROC — AUC Curve
ROC stands for Receiver Operating Characteristic and AUC stands for Area Under the Curve.
ROC - This is a probability curve that tells us how good our model is at classifying the classes, plotting the True Positive Rate against the False Positive Rate at every classification threshold.
ROC curves are appropriate when the observations are balanced between the classes, whereas precision-recall curves are appropriate for imbalanced datasets. The curve is built from the predicted probability of each point belonging to a particular class, swept across all thresholds.
AUC - This ranges from 0 to 1 and measures how good the model is at separating the classes, which means classifying positives as positives and negatives as negatives.
A perfect model has an AUC close to 1, which means it is able to separate the two classes. An AUC of 0.5 means the model has no ability to separate the classes at all (no better than random guessing), and an AUC close to 0 means the model is inverting the classes, incorrectly classifying almost all the points.
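AUC can also be read as the probability that a randomly chosen positive point receives a higher score than a randomly chosen negative one. A minimal sketch of that rank-based view, with made-up scores and labels:

```python
# Sketch: AUC from predicted scores via the rank (Mann-Whitney) formulation.
# Labels and scores are made up for illustration.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

pos = [s for s, t in zip(scores, y_true) if t == 1]
neg = [s for s, t in zip(scores, y_true) if t == 0]

# AUC = probability that a random positive outscores a random negative
# (ties count as half a win)
wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))
print(auc)  # 0.75
```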
7. Root Mean Squared Error (RMSE) —
This is one of the most frequently used evaluation metrics for regression problem statements.
The root mean squared error (RMSE) measures the differences between the values predicted by a model or an estimator and the values observed. It is the square root of the second sample moment of the differences between predicted and observed values, i.e. the quadratic mean of these differences.
In other words, it is the square root of the average of the squared differences between the predicted and the actual points: RMSE = sqrt( (1/N) × Σ (yᵢ − ŷᵢ)² ).
Hence, the lower the RMSE, the better the model performance.
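A minimal sketch of the formula, on made-up actual and predicted values:

```python
import math

# Sketch: RMSE on made-up actual vs. predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(mse)   # square root of the mean squared difference
print(round(rmse, 3))   # ≈ 0.935
```

Because the errors are squared before averaging, RMSE penalises a few large mistakes more heavily than many small ones.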
8. R-Squared / Adjusted R-Squared
R-Squared measures the goodness of fit of the model, so in the best case its value is 1, which means our model is able to predict all the points correctly.
The formula for R-Squared is as follows: R² = 1 − (Residual sum of squares / Total sum of squares).
Total sum of squares - the sum of the squared differences between each actual value and the expected value.
Residual sum of squares - the sum of the squared differences between each actual value and the predicted value.
Here the expected value is the mean of the actual values.
Adjusted R-Squared
For R-Squared the best case is a value of 1, but R² has a drawback: as the number of independent variables increases, R² stays the same or increases even when the new variables add nothing, which makes the model look like it is performing very well when that is not the actual scenario.
Hence we have adjusted R² for the scenario where there are many independent variables.
Let’s say you are comparing a model with five independent variables to a model with one variable and the five variable model has a higher R-squared. Is the model with five variables actually a better model, or does it just have more variables? To determine this, just compare the adjusted R-squared values!
The adjusted R-squared adjusts for the number of terms in the model. Importantly, its value increases only when the new term improves the model fit more than expected by chance alone. The adjusted R-squared value actually decreases when the term doesn’t improve the model fit by a sufficient amount.
Adjusted R² = 1 − (1 − R²) × (N − 1) / (N − p − 1), where:
p: number of features (independent variables)
N: number of samples
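Both quantities can be sketched together; the data below is made up, and a single-feature model (p = 1) is assumed for the adjustment:

```python
# Sketch: R-squared and adjusted R-squared on made-up data,
# assuming a model with a single feature (p = 1).
y_true = [3.0, 5.0, 2.5, 7.0, 4.5]
y_pred = [2.8, 5.3, 2.9, 6.6, 4.1]

mean_y = sum(y_true) / len(y_true)
tss = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares

r2 = 1 - rss / tss
N, p = len(y_true), 1
adj_r2 = 1 - (1 - r2) * (N - 1) / (N - p - 1)
print(round(r2, 4), round(adj_r2, 4))
```

Note that adjusted R² is always at most R², and the gap widens as p grows relative to N, which is exactly the penalty for adding variables.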
Thanks for Reading.
If you found this post useful, some claps 👏 would be an extra motivation. Feel free to ask questions and share your suggestions.
Reach me at :
LinkedIn : https://www.linkedin.com/in/saurabh-mishra-553ab188/
Github : https://github.com/SaurabhMishra779