On the coefficient of determination
I have often been asked by my students why the \(R^{2}\) measure cannot be used to compare two models.
In general, as will be shown below, the \(R^{2}\) value simply measures how well the model you have chosen fits the data you have.
Let's start off with a basic linear regression model:
\[Y = \mathbf{X}\beta + \varepsilon,\]
and assume the usual assumptions of the classical linear regression model hold:
- \(\mathbf{X}\) causes \(Y\),
- \(\varepsilon\) causes \(Y\),
- \(\varepsilon\) does not cause \(\mathbf{X}\),
- \(Y\) does not cause \(\mathbf{X}\),
- nothing which causes \(\varepsilon\) also causes \(\mathbf{X}\).
When the above holds, we have that the least squares estimator for \(\beta\), denoted \(b\), is consistent.
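To see what consistency looks like in practice, here is a minimal simulation sketch (my own illustration, not part of the derivation; the coefficients and sample sizes are arbitrary). With \(\varepsilon\) drawn independently of \(\mathbf{X}\), the least squares estimate settles around the true \(\beta\) as \(n\) grows:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([2.0, -1.0, 0.5])  # arbitrary "true" coefficients

for n in (100, 10_000, 1_000_000):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    eps = rng.normal(size=n)               # drawn independently of X
    Y = X @ beta + eps
    b = np.linalg.solve(X.T @ X, X.T @ Y)  # least squares estimate
    print(n, np.round(b, 3))               # b approaches beta as n grows
```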
Denote the residual, the estimator of the disturbance \(\varepsilon\), as:
\[e = Y - \mathbf{x}b,\]
where \(\mathbf{x}\) represents the \(n \times k\) matrix of realizations of the random matrix \(\mathbf{X}\).
We can show that if \(\mathbf{X}^{T}\mathbf{X}\) is invertible, then \(b\) is given by the least squares normal equations as follows,
\[b = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}Y.\]
Hence, the estimator \(e\) is:
\[e = Y - \mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}Y = (\mathbf{I} - \mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T})Y = \mathbf{M}Y.\]
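The identity \(e = \mathbf{M}Y\) is easy to check numerically. Below is a small sketch (assuming numpy; the simulated data are arbitrary) verifying both that the residuals equal \(\mathbf{M}Y\) and that \(\mathbf{M}\) annihilates the columns of \(\mathbf{X}\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)             # normal equations
e = Y - X @ b                                     # residual vector
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T  # residual maker
assert np.allclose(e, M @ Y)                      # e = M Y
assert np.allclose(M @ X, 0.0, atol=1e-8)         # M annihilates X
```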
Now, suppose we model the above with the addition of another variable:
\[Y = \mathbf{X}d + Zc + a,\]
where \(c\) is a scalar, \(a\) is the new residual vector, and the usual assumptions for linear regression hold. An application of the Frisch-Waugh-Lovell theorem gives:
\[d = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}(Y - Zc) = b - (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}Zc,\]
and hence we have:
\[a = Y - \mathbf{X}d - Zc = e - \mathbf{M}Zc,\]
and furthermore, since the augmented regression minimizes the sum of squared residuals over a larger parameter space (setting \(c = 0\) recovers the original fit), we have:
\[a^{T}a \leq e^{T}e.\]
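Both claims can be verified numerically. The sketch below (again my own, with arbitrary simulated data) checks that the Frisch-Waugh-Lovell expression for \(d\) matches the coefficients from the augmented regression, and that \(a^{T}a \leq e^{T}e\):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Z = rng.normal(size=(n, 1))
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Short regression: Y on X alone.
b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b

# Long regression: Y on [X, Z].
W = np.hstack([X, Z])
coef = np.linalg.solve(W.T @ W, W.T @ Y)
d, c = coef[:-1], coef[-1]
a = Y - W @ coef

# FWL: the X-coefficients of the long regression equal the
# coefficients from regressing (Y - Zc) on X.
d_fwl = np.linalg.solve(X.T @ X, X.T @ (Y - Z[:, 0] * c))
assert np.allclose(d, d_fwl)
assert a @ a <= e @ e  # the longer regression never fits worse
```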
Finally, we arrive at \(R^{2}\), also known as the goodness-of-fit measure.
For \(Y = \mathbf{X}b + e\), denote by \(M^{0}\) the \(n \times n\) idempotent matrix that transforms a realization \(\mathbf{y}\) of \(Y\) into deviations from its sample mean.
Then, provided the regression contains a constant term, \(Y^{T}M^{0}Y = b^{T}\mathbf{X}^{T} M^{0} \mathbf{X} b + e^{T}e\), and the coefficient of determination (the proportion of the variation in the dependent variable that is predictable from the independent variables) is:
\[R^{2} = \frac{b^{T}\mathbf{X}^T M^{0} \mathbf{X} b}{Y^{T} M^{0}Y} = 1 - \frac{e^{T} e}{Y^{T}M^{0}Y}.\]
From the above, we have that \(R'^{2} = 1 - \frac{a^{T} a}{Y^{T}M^{0}Y} \geq R^{2}\). That is, one can inflate the model's goodness-of-fit measure simply by including additional regressors, whether or not they are relevant. This is why \(R^{2}\) tells you how well your model fits your data, not whether your model is better than another.
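To make the point concrete, a final sketch (my own illustration with simulated data): adding a regressor that is pure noise, completely unrelated to \(Y\), still weakly increases \(R^{2}\):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def r_squared(design, y):
    coef = np.linalg.solve(design.T @ design, design.T @ y)
    resid = y - design @ coef
    dev = y - y.mean()              # M0 y: deviations from the mean
    return 1.0 - (resid @ resid) / (dev @ dev)

r2 = r_squared(X, Y)
Z = rng.normal(size=(n, 1))         # pure noise, unrelated to Y
r2_prime = r_squared(np.hstack([X, Z]), Y)
print(r2, r2_prime)                 # r2_prime >= r2, despite Z being noise
```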