Enhancing Validity in Observational Settings When Replication Is Not Possible
19 Pages. Posted: 31 Dec 2014. Last revised: 13 Apr 2016.
Date Written: April 12, 2016
Abstract
We argue that political scientists can provide additional evidence for the predictive validity of observational and quasi-experimental research designs by minimizing the expected prediction error, or generalization error, of their empirical models. For observational and quasi-experimental data not generated by a stochastic mechanism under the researcher's control, the reproduction of statistical analyses is possible but replication of the data generating procedures is not. Estimating the generalization error of a model for this type of data and then adjusting the model to minimize this estimate (regularization) provides evidence for the predictive validity of the study by decreasing the risk of overfitting. Estimating generalization error also allows for model comparisons that highlight underfitting, which occurs when a model generalizes poorly because it omits systematic features of the data generating process. Thus, minimizing generalization error provides a principled method for modeling relationships between variables that are measured but whose relationships with the outcome(s) are left unspecified by a deductively valid theory. Overall, the minimization of generalization error is important because it quantifies the expected reliability of predictions in a way that is similar to external validity, consequently increasing the validity of the study's conclusions.
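The core procedure described above, estimating generalization error and then tuning the model to minimize that estimate, can be illustrated with a minimal sketch. The abstract does not say which estimator or model class the authors use, so everything here is an assumption made for illustration: k-fold cross-validation as the error estimate, a one-predictor ridge regression as the model, simulated data, and the hypothetical helper names `fit_ridge`, `cv_error`, and `select_penalty`.

```python
import random


def fit_ridge(xs, ys, penalty):
    """Closed-form ridge slope for a one-predictor, no-intercept model."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


def cv_error(xs, ys, penalty, k=5):
    """Estimate generalization error as the mean squared error on held-out folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        # Observation i is held out in fold (i mod k), so each point is tested once.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        slope = fit_ridge(train_x, train_y, penalty)
        total += sum((ys[i] - slope * xs[i]) ** 2 for i in test_idx)
    return total / n


def select_penalty(xs, ys, grid, k=5):
    """Pick the penalty whose cross-validated error estimate is smallest."""
    errors = {p: cv_error(xs, ys, p, k) for p in grid}
    best = min(errors, key=errors.get)
    return best, errors


# Simulated data (an assumption for illustration, not data from the paper).
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(40)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]

grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_penalty, errors = select_penalty(xs, ys, grid)
for p in grid:
    print(f"penalty={p:>6}: estimated generalization error={errors[p]:.3f}")
print("selected penalty:", best_penalty)
```

A very large penalty shrinks the slope toward zero and would tend to show up as underfitting in the error estimates, while the comparison across the grid is what guards against overfitting. Other estimators of generalization error (for example, a single held-out test set or the bootstrap) could be substituted for `cv_error` without changing the overall logic.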