From the Quarterly Journal of Business and Economics, Volume 23, Number 1

Econometric Replication: Lessons from the Experimental Sciences

Robert A. Mittelstaedt, Nathan Gold Professor of Marketing, University of Nebraska-Lincoln
Thomas S. Zorn, Assistant Professor of Finance, University of Nebraska-Lincoln

I. Introduction

The case for replication has been made by many. The editor of the Journal of Irreproducible Results puts it this way:

    The glorious endeavor that we know today as science has grown out of the murk of sorcery, religious ritual, and cooking. But while witches, priests, and chefs were developing taller and taller hats, scientists worked out a method for determining the validity of their experimental results: they learned to ask "are they reproducible?" [7].

The experimental disciplines have developed a tradition of replication that Kane [4] calls "improvisational." It is the purpose of this paper to examine more closely the nature of replication in the experimental tradition; this study argues that a similar view is important in econometric research and suggests that a restructuring of incentives for researchers is in order.

II. Replication in the Experimental Tradition

By their nature, experiments are reproducible events. That fact in itself might lead to the expectation of replication in the experimental disciplines. The experimenter's emphasis on replication, however, grows less from the ease of doing so than from those disciplines' attitude toward the relationship between data and knowledge. The French physiologist Claude Bernard noted over 100 years ago:

    The fact is nothing in itself, it has value only through the proof it supplies.
    We have said elsewhere that, when one calls a new fact a discovery, the fact itself is not the discovery, but rather the new idea derived from it; in the same way, when a fact proves anything, the fact does not itself give the proof, but only the rational relation which it establishes between the phenomenon and its cause [1].

Shifting the focus of attention from the phenomenon to the relationship among variables raises questions about the robustness and generalizability of those relationships. This suggests that replication is vital, and that it may be done for several purposes.

The term "replication" has yet to be defined. Beyond the general statement that it is an attempt to test the consistency of a relationship among two or more variables under similar or predictably different conditions, the term nearly defies precise definition. It is useful to describe several types of replication, as shown in Exhibit 1.

Exhibit 1: Types of Replication, Classified by Activity of Replicator Compared to Original Study's Methods and Data Sources

                                                Observations of      Observations of
                                                Same Phenomena       Different Phenomena
    Same measures and methods of
      establishing relationships                Type I               Type III
    Different measures and methods of
      establishing relationships                Type II              Type IV

A Type I replication is described by Kane [4] as an "econometric audit." The replicating researcher (RR) uses the same data sources, models, proxy variables, and statistical methods as the original researcher (OR). In a Type II replication, RR uses the same data sources, but employs different models, proxy variables, and/or statistical methods. In a Type III replication, RR uses the same models, proxy variables, and statistical methods, but applies them to different data than those used by OR. In a Type IV replication, different models, proxy variables, and statistical methods are applied to different data.^1 Each type contributes to the growth of a discipline's knowledge base.
It is the attempt to replicate that makes the contribution; something is learned whether the original findings are confirmed or not. To illustrate, imagine OR has hypothesized Y = f(x1, x2, x3) and shown that this relationship holds when tested by a multiple regression model on U.S. data for the years 1953-1978. RR attempts to replicate using OR's exact methodological recipe on the same data, a Type I replication. If RR's results match OR's, it would be beneficial to the research community to know of the confirmation; OR's finding has been "audited." Unfortunately, RR's confirmatory results will likely go unreported. On the other hand, if RR's results do not confirm OR's findings, they may be reported in the literature. While this information is very important to the community of researchers, it adds almost nothing to the base of knowledge.

^1 Kane [4] refers to Type II and Type III replications as "improvisational replication." Lykken [6] calls Type I "literal replication," Type II "operational replication," and Type III and Type IV "constructive replication."

Type I confirmatory findings should be reported because they eliminate doubts of gross errors in OR's study. They provide important incentives to both OR and RR. Economists are aware of the importance of incentives. If confirmatory findings are typically not reported, few individuals are likely to undertake such efforts on the small chance that an error of sufficient magnitude to warrant publication will be uncovered. Moreover, if OR knows that his steps are likely to be retraced, his incentives to avoid errors are affected. Type I confirmatory replication need not be reported at any length (although the material should be available on request). Nor is it suggested that RR's work should be esteemed by the profession as highly as OR's. Without Type I replication, however, gross errors may persist unnoticed.
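A Type I "audit" of the hypothetical study above can be sketched in a few lines. This is an illustration only: the data are simulated (the article specifies no actual data set), and ordinary least squares stands in for OR's unspecified regression method.

```python
# Sketch of a Type I replication ("econometric audit"): RR re-estimates
# OR's model Y = f(x1, x2, x3) with the same data and the same method,
# and checks that the estimates match. Data are fabricated for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept, x1..x3
true_beta = np.array([1.0, 0.5, -0.3, 2.0])                 # assumed "true" model
y = X @ true_beta + rng.normal(scale=0.1, size=n)

def ols(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_or = ols(X, y)                 # original researcher's run
beta_rr = ols(X.copy(), y.copy())   # replicator retraces the same recipe

# The audit passes when the retraced estimates agree (up to floating point);
# any discrepancy would point to a gross error somewhere in the recipe.
assert np.allclose(beta_or, beta_rr)
print(np.round(beta_or, 2))
```

The point of the exercise is not the numbers themselves but the confirmation that OR's recipe, followed exactly, reproduces OR's reported estimates.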
RR might attempt a Type II replication by following OR's recipe on the same data, but altering the specification of the model to improve its explanatory power. If an improvement results, the research community would benefit. Alternatively, RR might show that OR's findings were sensitive to the method used and lacked robustness. In either case, it seems likely that RR will have considerable difficulty finding a journal to publish these results; they lack the "glamour" of discovering a major error by OR. RR's finding that OR's work was insensitive to changes in method would be even less likely to be reported. This is unfortunate, as this form of confirmation would probably advance knowledge further, for it shows that OR's results possess robustness.

There is a temptation to suppose that OR should do this type of sensitivity analysis. Excessive "data mining" by OR, however, may be viewed as a fishing expedition for "significant" results, a technique scorned by statisticians. Ideally, OR postulates a relationship or hypothesis, formulates an econometric model, and does one computer run. OR should not be encouraged to try every conceivable specification that current technology permits. It is difficult to specify the rules for OR and RR with respect to Type II manipulation and replication. Only shrewd judgments by qualified researchers can distinguish mindless variation from Type II replication. Interesting examples of the latter were stimulated by Milton Friedman's research on the demand for money. Replication of his findings led to the discovery of the sensitivity of money demand to the interest rate. A number of studies that directly or indirectly addressed this issue were published because important policy issues were at stake.^2 Clearly, the importance of the original study affects the value of Type II replications.
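A Type II replication of the same hypothetical study might look as follows. Again the data are simulated and the specifications are invented for illustration: RR keeps OR's data but swaps the model, then compares goodness of fit to judge robustness.

```python
# Sketch of a Type II replication: same data, different specification.
# RR drops x3 and adds a squared term, then compares in-sample R^2.
# Data and specifications are fabricated for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.5 * x1 - 0.3 * x2 + 2.0 * x3 + rng.normal(scale=0.5, size=n)

def r_squared(X, y):
    """Fit OLS (with intercept included in X) and return R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

ones = np.ones(n)
r2_or = r_squared(np.column_stack([ones, x1, x2, x3]), y)       # OR's model
r2_rr = r_squared(np.column_stack([ones, x1, x2, x1 ** 2]), y)  # RR's variant

# If the fit collapses under a plausible respecification, OR's finding
# depended on the particular model; if it survives, the result is robust.
print(round(r2_or, 3), round(r2_rr, 3))
```

Here the respecified model fits far worse, which in a real study would tell RR that the original relationship hinged on the omitted variable rather than on the functional form.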
This episode illustrates that Type II replications cannot be dismissed out of hand; rather, they form an important basis for the tentative acceptance or rejection of any hypothesized empirical relationship by the profession.

Valuable as confirmatory Type I and Type II replications may be, they demonstrate little about the generalizability of the relationships. Major impediments to generalizability can remain: (1) OR's confirmed findings may be an artifact of data which contain systematic error; (2) the data may be error free, but the observed levels of x1, x2, and x3 are peculiar to the time period and environment observed; and (3) Y is driven by an unobserved variable (X?) which, had it been measured, would have explained a greater proportion of the variance than one or more of the independent variables observed by OR and used by RR.

^2 Laidler [5] cites and discusses a number of these studies.

The extension of knowledge involves the search for generalizability. In a Type III replication, RR applies OR's methods to different data. Confirmatory results help to rule out the possibility that OR's findings resulted from artifacts of error-ridden data, observations made in unique circumstances, or simple data "massaging." A Type III replication, however, cannot rule out the possibility that some unobserved variable, common to both data sets, drives Y. Obviously, a failure to replicate in the Type III sense suggests a limit to the generality of the relationship. It should be noted that a Type III replication in econometrics comes closest to replication in the experimental sciences. Both Type I and Type II replications represent RR peering over the shoulder of OR and critically evaluating the experiment. Type III replication is a crucial step toward the generalizability of the results; it needs to be institutionalized as part of the research process.
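Continuing the same hypothetical example, a Type III replication applies OR's unchanged specification to a second, independent data set. In this sketch both samples are simulated under the same assumed structural relationship, so the estimates should agree; a large gap would instead suggest the original finding was sample-specific.

```python
# Sketch of a Type III replication: OR's exact specification re-estimated
# on a different data set, with the two coefficient vectors compared for
# stability. All data are fabricated for illustration.
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def simulate(seed, n=500):
    """One sample generated under the same structural relationship."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
    y = X @ np.array([1.0, 0.5, -0.3, 2.0]) + rng.normal(scale=0.5, size=n)
    return X, y

beta_us = fit_ols(*simulate(seed=42))    # OR's sample (e.g., "U.S. data")
beta_new = fit_ols(*simulate(seed=99))   # RR's different sample

# Small gap: the relationship generalizes across samples.
# Large gap: the original finding may have been peculiar to OR's data.
gap = np.max(np.abs(beta_us - beta_new))
print(round(float(gap), 3))
```

Note what this check cannot do, echoing the text: even close agreement cannot exclude an unobserved variable common to both samples driving Y.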
The caution exhibited in the experimental sciences with respect to results that have not been confirmed by independent researchers should be noted.

It is arguable whether Type IV replications should be considered replications at all. This depends on the claims made by OR (i.e., how general his results should be considered) and, in part, on the intentions of RR. If RR attempts to replicate the study with different data (a Type III replication) but is forced to use a different proxy for one of the variables, the study should be considered a Type IV replication. If RR uses what he considers an improved econometric model, it could still fall within the category of replication. A Type IV replication should be a skillful blend of Types II and III. At some point, the changes become so significant that the study would not be considered a replication. The current emphasis on originality, particularly by journal editors and referees, probably encourages needless and wasteful product differentiation by researchers who are understandably anxious to be published. More might be learned if fewer such studies were attempted.

Many Type IV replications are not consciously undertaken. These Type IV replications are not the result of RR's empirical work, but the result of a scholarly review of the work of others. A good literature review examines the generalizability of a particular relationship; the developing field of meta-analysis formalizes the process [3]. Testing the limits of generalizability can reach far beyond the boundaries of one discipline. For example, confidence in the assertion that people seek to maximize the return from their own efforts increases when it is learned that Heinrich [2] demonstrated that pollen-gathering bumblebees exhibit the same optimizing behavior.

III. The Gains from Replication

The number of known relationships that exhibit sufficient invariance and universality to merit the label "law" is small.
This does not relieve researchers of the responsibility to test the limits of those observed relationships. "Original research" tends to be rewarded, and "mere replication" tends to be downgraded. In the end, however, if all that is known is that "X is related to Y by a particular set of observations using a particular statistical technique," little is known about the world around us. That which isn't worth replicating isn't worth knowing. Those who produce "original research" thus have the most to gain from replication. Because both confirmations and disconfirmations contribute to the base of knowledge, the research community should have the opportunity to know of all attempts to replicate. Confirmations extend the generalizability of knowledge; disconfirmations suggest new approaches must be found. It is important that econometric replications be taken as seriously as replications in the experimental sciences.

References

1. Claude Bernard, An Introduction to the Study of Experimental Medicine, translated by Henry C. Greene (New York: Collier Books, 1961).
2. Bernd Heinrich, Bumblebee Economics (Cambridge: Harvard University Press, 1979).
3. John E. Hunter, Frank L. Schmidt, and Gregg B. Jackson, Meta-analysis: Cumulating Research Findings Across Studies (Beverly Hills: Sage Publications, 1982).
4. Edward J. Kane, "Why Journal Editors Should Encourage the Replication of Applied Econometric Research," Quarterly Journal of Business and Economics, 23, No. 1 (Winter 1984), pp. 3-8.
5. David E. Laidler, The Demand for Money: Theories and Evidence (Scranton: International Textbook Company, 1969).
6. David T. Lykken, "Statistical Significance in Psychological Research," Psychological Bulletin, 70 (February 1968), pp. 151-159.
7. George H. Scherr, "Irreproducible Science: Editor's Introduction," The Best of the Journal of Irreproducible Results (New York: Workman Publishing, 1983).