DARHUBER: A Computer Program for Effect Size Estimation in Linear Regression and for Calculating the Significance of Difference between Observed and Expected R^{2} Values

James B. Hittner^{1}* and N. Clayton Silver^{2}

In linear multiple regression it is common practice to test whether the squared multiple correlation coefficient, *R*^{2}, differs significantly from zero. Although frequently used, this test is misleading because the expected value of *R*^{2} is not zero under the null hypothesis that ρ, the population value of the multiple correlation coefficient, equals zero. The non-zero expected value of *R*^{2} has implications both for significance testing and effect size estimation involving the squared multiple correlation coefficient. In this paper we discuss and offer a freely available computer program that calculates the expected value of *R*^{2}, an adjusted *R*^{2} value and effect size measure that both incorporate the expected value of *R*^{2}, and an *F* statistic that tests the significance of difference between the obtained *R*^{2} and the expected value of *R*^{2} under the null hypothesis that ρ = 0. The interactive, stand-alone program is written in FORTRAN 77 for a Windows environment. The user simply enters the value of a multiple correlation coefficient from a linear regression, the number of predictors, and the sample size. No knowledge of FORTRAN or any other statistical programming language is required.

**Keywords:** Multiple correlation; Regression; Effect size; Hypothesis testing; Computer program; FORTRAN

Imagine that a clinical psychologist wishes to predict frequency of depressive symptoms in a community sample of adults from levels of trait anxiety, pessimism, loneliness, and perceived social support. Such data are often modeled using linear multiple regression analysis, and it is common practice to examine whether the squared multiple correlation coefficient, *R*^{2}, differs significantly from zero. Despite being widely used, this test is misleading because the expected value of *R*^{2} is *not* zero when the population parameter, ρ, equals 0 (where ρ is the population value of the multiple correlation coefficient). Instead, the expected value of *R*^{2} is equal to *p* / (*n* – 1), where *p* is the number of predictor variables and *n* is the sample size [1]. The non-zero expected value of *R*^{2} has implications both for significance testing and for effect size estimation involving the squared multiple correlation coefficient. One implication is that, in the context of statistical significance testing, the observed value of *R*^{2} should be evaluated against the expected value of *R*^{2}, and not against zero. A second implication concerns effect size estimation. In particular, as pointed out by Huberty [2], the squared multiple correlation coefficient should be adjusted, or corrected, by explicitly incorporating the expected value of *R*^{2}. In addition, Huberty [2] suggested interpreting an effect size measure created by subtracting the expected value of *R*^{2} from the adjusted *R*^{2} value. With regard to hypothesis testing, Darlington [3] presented an *F* statistic for testing the null hypothesis that the observed *R*^{2} equals the expected value of *R*^{2}.
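The claim that *R*^{2} has a non-zero expectation under the null hypothesis can be illustrated with a small simulation. The following Python sketch is not part of the DARHUBER program; it is an illustration restricted to the single-predictor case (*p* = 1), where *R*^{2} is simply the squared Pearson correlation, and it assumes independent standard-normal variables. The function names, the number of replications, and the seed are arbitrary choices made for this example.

```python
import random


def sample_r_squared(n, rng):
    """Squared Pearson correlation of two independent normal samples of size n."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def mean_null_r_squared(n, reps=20000, seed=1):
    """Average R^2 over many samples drawn with a true correlation of zero."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    return sum(sample_r_squared(n, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    n = 10  # illustrative sample size
    print("simulated mean R^2:", round(mean_null_r_squared(n), 4))
    print("p / (n - 1) with p = 1:", round(1 / (n - 1), 4))
```

With *n* = 10 the simulated mean should fall close to 1/9 ≈ 0.111 rather than zero, and the gap narrows as *n* grows.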

Unfortunately, the aforementioned statistical quantities are not routinely calculated by widely used statistical software packages such as IBM SPSS, Minitab, and SAS. To address this gap, Hittner [4] wrote a SAS data step computer program that calculates these quantities. However, to implement the SAS data step program users must have access to SAS, which is an expensive commercial software package. To accommodate researchers who do not have SAS, we have written an interactive, stand-alone FORTRAN 77 computer program for the Windows environment. No knowledge of FORTRAN or any other statistical programming language is required to use the program.

**Program description**

The user is queried interactively for the multiple correlation coefficient, the number of predictor variables, and the sample size. The program responds with a restatement of the input values, the expected value of *R*^{2}, Darlington’s [3] *F* statistic for testing the null hypothesis that the observed *R*^{2} equals the expected value of *R*^{2}, the observed probability value for Darlington’s *F*, Huberty’s [2] adjusted *R*^{2} index, and Huberty’s [2] effect size measure. The program, named DARHUBER, is written in FORTRAN 77 using the GNU FORTRAN compiler and runs on a Windows PC or compatible. The output is written to darhuber.out.

**Worked example**

Let us revisit the example mentioned at the beginning of the Introduction, in which a clinical psychologist wishes to predict frequency of depressive symptoms from four putative risk factors, so that the number of predictors, *p*, equals four. Suppose the sample size, *n*, is 60 and the multiple correlation coefficient, *R*, equals 0.56. The contents of darhuber.out are shown in Table 1. As these results indicate, the expected value of *R*^{2} is 0.0678, which is the value of *R*^{2} expected by chance alone. Darlington’s *F* statistic was 3.0308 with a *p*-value of 0.0140, indicating that, at the nominal alpha level of 0.05, the observed *R*^{2} (0.3136) was significantly greater than the expected value of *R*^{2} (0.0678). Thus, in our example the four predictors accounted for 31.36% of the variance in frequency of depressive symptoms. This proportion of variance (31.36%) is significantly greater (*p* < 0.05) than the proportion of predicted outcome variance (6.78%) that would be expected based on chance alone. Huberty’s adjusted *R*^{2} value was 0.2637. This value “corrects” the obtained *R*^{2} by explicitly incorporating the expected value of *R*^{2}. One way to conceptualize Huberty’s adjusted *R*^{2} is that it accounts for sample-to-population shrinkage by directly modeling the expected value of *R*^{2}. Finally, Huberty’s *R*^{2}-based effect size estimate was 0.1959, which, according to Cohen’s [5] criteria, is a medium-sized effect.

**Table 1:** Sample Output from the DARHUBER FORTRAN Program

__ __

SAMPLE R = 0.5600

SAMPLE R-SQUARED = 0.3136

SAMPLE SIZE = 60.0000

NUMBER OF PREDICTORS = 4.0000

EXPECTED VALUE OF R-SQUARED = 0.0678

DARLINGTON F = 3.0308 WITH A PROBABILITY OF 0.0140

NUMERATOR AND DENOMINATOR DF OF F = 5.2155, 55.0000

HUBERTY ADJUSTED R-SQUARED = 0.2637

HUBERTY EFFECT SIZE = 0.1959

__ __
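The entries of Table 1 other than Darlington’s *F* can be checked with a few lines of arithmetic. The Python sketch below is not the DARHUBER source code. The expected value, *p* / (*n* – 1), and the effect size (adjusted *R*^{2} minus expected *R*^{2}) follow the definitions given in the text. The adjusted *R*^{2} formula, 1 – (1 – *R*^{2})(*n* – 1) / (*n* – *p* – 1), is an assumption on our part: it is the familiar shrinkage adjustment, and it reproduces the tabled value. Darlington’s *F* is omitted because its formula is not stated here. All function names are hypothetical.

```python
def expected_r2(p, n):
    """Expected value of R^2 when the population multiple correlation is zero."""
    return p / (n - 1)


def adjusted_r2(r, p, n):
    """Adjusted R^2. ASSUMPTION: the standard shrinkage formula
    1 - (1 - R^2)(n - 1)/(n - p - 1), chosen because it matches Table 1."""
    r2 = r * r
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def effect_size(r, p, n):
    """Effect size: adjusted R^2 minus the expected value of R^2."""
    return adjusted_r2(r, p, n) - expected_r2(p, n)


if __name__ == "__main__":
    r, p, n = 0.56, 4, 60  # inputs from the worked example
    print(f"EXPECTED VALUE OF R-SQUARED = {expected_r2(p, n):.4f}")     # 0.0678
    print(f"HUBERTY ADJUSTED R-SQUARED  = {adjusted_r2(r, p, n):.4f}")  # 0.2637
    print(f"HUBERTY EFFECT SIZE         = {effect_size(r, p, n):.4f}")  # 0.1959
```

Agreement with Table 1 to four decimal places supports, but does not confirm, that the program uses the same adjustment formula.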

**Availability**

DARHUBER.FOR and the executable version (DARHUBER.EXE) may be obtained at no charge by sending an e-mail request to N. Clayton Silver, Department of Psychology, University of Nevada, Las Vegas, Las Vegas, NV 89154-5030, at fdnsilvr@unlv.nevada.edu.