Open science and trustworthy data
In our letter (November 2015), we urged the Society’s boards and senior committees to respond to the very serious problems of replicating psychological research revealed by the meagre 36 per cent success rate in the Reproducibility Project’s report of 100 attempted replications. In reply, Professor Andy Tolmie commented that ‘low n research may be a more endemic part of the problem than any deliberate attempts at massaging data’. However, low ns were not the problem for the Reproducibility Project: a priori power analyses indicated that, based on the originally reported effect sizes, 92 per cent of the replications should have succeeded.
The Project’s report (Open Science Collaboration, 2015) noted that the best predictor of replication success was the effect size observed in the replication, which is independent of sample size. Sadly, the average effect size for the replications was less than half of that for the original studies; the report described the original studies as having ‘upwardly biased effect sizes’. It seems likely that the psychology literature reflects questionable research practices that can inflate effect sizes, such as p-hacking, unreported removal of troublesome data, and capitalising on chance, whether by selectively publishing after adjusting a paradigm until it produces significant results or by reporting a ‘successful’ dependent variable while omitting those showing smaller effects.
One issue that we raised in our letter was the temptation faced by junior researchers to further their careers by removing, adjusting or inventing data. The rewards for such data manipulation can be considerable, while the dangers of discovery under present systems are very small. A recent case provides an illustration: manipulation of data by a junior author has led to the withdrawal of papers from recent issues of three leading psychology journals (Journal of Experimental Psychology: Human Perception and Performance; Attention, Perception & Psychophysics; and Psychological Science). Details of this case can be found at tinyurl.com/jo4n8my. Of course, senior as well as junior researchers have provided false data: Diederik Stapel’s well-known case (tinyurl.com/5tlc4vp) is a powerful example. We expect that most researchers provide complete and accurate data, but it is clear that psychological researchers are subject to temptations and that rewards can sometimes overcome integrity.
Today’s technology facilitates sharing data between members of a research team as the data are collected, including details of the date, time and conditions of collection. If researchers were to update these records in real time, including (where possible) contact details for the participants, other members of the research team could routinely contact a sample of the participants to check that they had been tested as claimed. A similar policy could be adopted by research students and their supervisors. Just as papers report inter-rater reliability, they would report the percentage of participants who had been verified and provide clear explanations of any discrepancies. Of course, this procedure would not prevent determined fraud or data doctoring, but it could be one step towards redressing the balance between the benefits and costs of inventing data. The present arrangements, where the participants whose data form the basis of important psychological research claims cannot be traced, might surprise anyone from outside psychology who decided to evaluate the reliability of the discipline.
Tackling the multiple sources of the problem of reproducibility will require wide, serious and determined efforts, and these are likely to change the way that research is conducted. However, with a government seeking to make deep cuts in public spending, psychology must be able to defend itself against the accusation that much research funding is wasted because its findings cannot be trusted. As a discipline, we must take steps to guard against falsification of data, selective reporting and overreliance on p-values rather than effect sizes.
Professor Peter E. Morris
Dr Catherine O. Fritz
University of Northampton
Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251). doi: 10.1126/science.aac4716