
Validating a model using statistics

It is better practice to look at the AIC and the prediction accuracy on a validation sample when deciding on the efficacy of a model. This is because all the variables in the original model are also present in the larger (super-set) model, so their contribution to explaining the dependent variable carries over; whatever new variable we add can therefore only add (even if not significantly) to the variation that was already explained. In other words, R-Sq never decreases as predictors are added, which makes it a poor criterion for comparing models of different sizes.
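For instance, using R's built-in mtcars data as an illustrative stand-in (the variables and data here are not from this tutorial), the AIC of a base model and its super-set can be compared directly:

# Fit a base model and a super-set model that adds one more predictor
base_model  <- lm(mpg ~ wt, data = mtcars)
super_model <- lm(mpg ~ wt + hp, data = mtcars)

# R-Sq can only go up in the super-set, but AIC penalizes the extra term
summary(base_model)$r.squared    # always <= summary(super_model)$r.squared
AIC(base_model)                  # the lower AIC indicates the preferable model
AIC(super_model)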

By calculating accuracy measures (like min_max accuracy) and error rates (MAPE or MSE), we can find out the prediction accuracy of the model. From the model summary, the model p-Value and the predictors' p-Values are less than the significance level, so we know we have a statistically significant model.
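A sketch of these measures, assuming vectors actuals and predicteds from a held-out sample (the names and numbers are illustrative):

# Hypothetical actual and predicted values from a held-out sample
actuals    <- c(10, 12, 9, 15, 11)
predicteds <- c(11, 11.5, 8.5, 14, 12)

# min_max accuracy: row-wise min/max ratio, averaged; 1 means perfect agreement
min_max_accuracy <- mean(pmin(actuals, predicteds) / pmax(actuals, predicteds))

# MAPE: mean absolute percentage error; lower is better
mape <- mean(abs(predicteds - actuals) / actuals)

# MSE: mean squared error
mse <- mean((predicteds - actuals)^2)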

Also, the R-Sq and Adj R-Sq are comparable to those of the original model built on the full data.
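Both statistics can be read straight off the fitted object; continuing the illustrative mtcars model from above:

# R-Sq and Adj R-Sq of the model built on full data, for comparison
# against the same statistics from a model built on a training split
summary(base_model)$r.squared
summary(base_model)$adj.r.squared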

A simple correlation between the actuals and predicted values can be used as a form of accuracy measure: a high correlation implies that the actuals and predicteds have similar directional movement.
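Using the same illustrative vectors as above:

# Correlation accuracy: directional agreement between actuals and predictions
correlation_accuracy <- cor(actuals, predicteds)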

Suppose the model predicts satisfactorily on the 20% split (test data). Is that enough to believe that your model will perform equally well all the time?
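A minimal sketch of such an 80/20 split, again on the illustrative mtcars data; the accuracy measured this way depends on which rows happen to land in the 20%, which is why a single split is not conclusive.

# 80/20 train-test split; the held-out 20% serves as test data
set.seed(100)  # seed chosen arbitrarily, for reproducibility
train_index <- sample(nrow(mtcars), size = floor(0.8 * nrow(mtcars)))
train_data  <- mtcars[train_index, ]
test_data   <- mtcars[-train_index, ]

model <- lm(mpg ~ wt, data = train_data)
preds <- predict(model, newdata = test_data)

# min_max accuracy on the 20% split; a different seed can give a different verdict
mean(pmin(test_data$mpg, preds) / pmax(test_data$mpg, preds))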

Typically, for each of the independent variables (predictors), plots are drawn to visualize its behavior against the response: scatter plots can help reveal any linear relationship between the dependent (response) variable and the independent (predictor) variables.

Ideally, if you have multiple predictor variables, a scatter plot is drawn for each of them against the response, along with the line of best fit, as sketched below.
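One way to draw these in base R, assuming the illustrative mtcars example with wt and hp as predictors of mpg:

# One scatter plot per predictor, each with its line of best fit
par(mfrow = c(1, 2))  # two panels side by side
for (predictor in c("wt", "hp")) {
  plot(mtcars[[predictor]], mtcars$mpg,
       xlab = predictor, ylab = "mpg",
       main = paste("mpg vs", predictor))
  abline(lm(mtcars$mpg ~ mtcars[[predictor]]), col = "blue")  # line of best fit
}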

The model summary tells us a number of things. One of them is the model p-Value (the bottom-most line of the output) and the p-Values of the individual predictor variables (the extreme right column under ‘Coefficients’).
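On the illustrative model from the split above, summary() prints both, and they can also be extracted programmatically:

summary(model)  # model p-Value on the last line; predictor p-Values under ‘Coefficients’

# Extract them programmatically
coef_p_values <- summary(model)$coefficients[, "Pr(>|t|)"]  # per-predictor p-Values
f_stat <- summary(model)$fstatistic
model_p_value <- pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
                    lower.tail = FALSE)  # overall model p-Value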