On the coefficient of determination
I have often been asked by my students why the \(R^{2}\) measure cannot be used to compare two models.
In general, as will be shown below, the \(R^{2}\) value simply measures how well the model you have chosen fits the data you have.
Let's start off with a basic linear regression model:
\[Y = \mathbf{X}\beta + \varepsilon,\]
and assume the usual assumptions of the classical linear regression model hold:
- \(\mathbf{X}\) causes \(Y\),
- \(\varepsilon\) causes \(Y\),
- \(\varepsilon\) does not cause \(\mathbf{X}\),
- \(Y\) does not cause \(\mathbf{X}\),
- nothing which causes \(\varepsilon\) also causes \(\mathbf{X}\).
When the above holds, we have that the least squares estimator for \(\beta\), denoted \(b\), is consistent.
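To see what consistency looks like in practice, here is a minimal simulation sketch (my own illustration, not part of the derivation; the coefficients and sample sizes are arbitrary). With \(\varepsilon\) drawn independently of \(\mathbf{X}\), the least squares estimate settles around the true \(\beta\) as \(n\) grows:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([2.0, -1.0, 0.5])  # arbitrary "true" coefficients

for n in (100, 10_000, 1_000_000):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    eps = rng.normal(size=n)               # drawn independently of X
    Y = X @ beta + eps
    b = np.linalg.solve(X.T @ X, X.T @ Y)  # least squares estimate
    print(n, np.round(b, 3))               # b approaches beta as n grows
```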
Denote the residual, the estimator of the disturbance \(\varepsilon\), as:
\[e = Y - \mathbf{x}b,\]
where \(\mathbf{x}\) represents the \(n \times k\) matrix of realizations of the random matrix \(\mathbf{X}\).
We can show that if \(\mathbf{X}^{T}\mathbf{X}\) is invertible, then \(b\) is given by the least squares normal equations as follows,
\[b = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}Y.\]
Hence, the estimator \(e\) is:
\[e = Y - \mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}Y = (\mathbf{I} - \mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T})Y = \mathbf{M}Y.\]
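The identity \(e = \mathbf{M}Y\) is easy to check numerically. Below is a small sketch (assuming numpy; the simulated data are arbitrary) verifying both that the residuals equal \(\mathbf{M}Y\) and that \(\mathbf{M}\) annihilates the columns of \(\mathbf{X}\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)             # normal equations
e = Y - X @ b                                     # residual vector
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T  # residual maker
assert np.allclose(e, M @ Y)                      # e = M Y
assert np.allclose(M @ X, 0.0, atol=1e-8)         # M annihilates X
```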
Now, suppose we model the above with the addition of another variable:
\[Y = \mathbf{X}d + Zc + a,\]
where \(c\) is a scalar, \(a\) is the new residual vector, and the usual assumptions for linear regression hold. An application of the Frisch-Waugh-Lovell theorem gives:
\[d = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}(Y - Zc) = b - (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}Zc,\]
and hence we have:
\[a = Y - \mathbf{X}d - Zc = e - \mathbf{M}Zc,\]
and furthermore, since the augmented regression minimizes the sum of squared residuals over a larger parameter space (setting \(c = 0\) recovers the original fit), we have:
\[a^{T}a \leq e^{T}e.\]
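Both claims can be verified numerically. The sketch below (again my own, with arbitrary simulated data) checks that the Frisch-Waugh-Lovell expression for \(d\) matches the coefficients from the augmented regression, and that \(a^{T}a \leq e^{T}e\):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Z = rng.normal(size=(n, 1))
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Short regression: Y on X alone.
b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b

# Long regression: Y on [X, Z].
W = np.hstack([X, Z])
coef = np.linalg.solve(W.T @ W, W.T @ Y)
d, c = coef[:-1], coef[-1]
a = Y - W @ coef

# FWL: the X-coefficients of the long regression equal the
# coefficients from regressing (Y - Zc) on X.
d_fwl = np.linalg.solve(X.T @ X, X.T @ (Y - Z[:, 0] * c))
assert np.allclose(d, d_fwl)
assert a @ a <= e @ e  # the longer regression never fits worse
```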
Finally, we arrive at \(R^{2}\), also known as the goodness-of-fit measure.
For \(Y = \mathbf{X}b + e\), denote by \(M^{0}\) the \(n \times n\) idempotent matrix that transforms a realization \(\mathbf{y}\) of \(Y\) into deviations from its sample mean.
Then, provided the regression contains a constant term, \(Y^{T}M^{0}Y = b^{T}\mathbf{X}^{T} M^{0} \mathbf{X} b + e^{T}e\), and the coefficient of determination (the proportion of the variation in the dependent variable that is predictable from the independent variables) is:
\[R^{2} = \frac{b^{T}\mathbf{X}^T M^{0} \mathbf{X} b}{Y^{T} M^{0}Y} = 1 - \frac{e^{T} e}{Y^{T}M^{0}Y}.\]
From the above, we have that \(R'^{2} = 1 - \frac{a^{T} a}{Y^{T}M^{0}Y} \geq R^{2}\). That is, one can inflate the model's goodness-of-fit measure simply by including additional regressors, whether or not they are relevant. This is why \(R^{2}\) tells you how well your model fits your data, not whether your model is better than another.
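To make the point concrete, a final sketch (my own illustration with simulated data): adding a regressor that is pure noise, completely unrelated to \(Y\), still weakly increases \(R^{2}\):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def r_squared(design, y):
    coef = np.linalg.solve(design.T @ design, design.T @ y)
    resid = y - design @ coef
    dev = y - y.mean()              # M0 y: deviations from the mean
    return 1.0 - (resid @ resid) / (dev @ dev)

r2 = r_squared(X, Y)
Z = rng.normal(size=(n, 1))         # pure noise, unrelated to Y
r2_prime = r_squared(np.hstack([X, Z]), Y)
print(r2, r2_prime)                 # r2_prime >= r2, despite Z being noise
```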