The reproducibility project: Disaster or triumph for psychology?

…or both? Our editor Jon Sutton reports on today's announcement, and how psychologists felt to be part of the initiative.

‘Ever since Brian Nosek and his colleagues first set up the Reproducibility Project in 2011 many psychologists have been twitchy.’ So begins psychologist Professor Dorothy Bishop, writing in The Guardian. How are psychologists feeling today, now that the Project has announced its finding that only 36 per cent of results in psychology appear to stand up to a replication attempt? [See coverage and links on our Research Digest blog]

The reaction has so far been decidedly mixed, with some sections of the media reporting that the ‘study reveals that a lot of psychology research really is just “psycho-babble”’. Science journalist Ed Yong presents a more reasoned account, pointing out that nobody really knows whether 36 per cent is good or bad. And scientists and psychologists themselves appear to be looking on the bright side: John Ioannidis, who in 2012 published his anticipation of a 53 per cent replication failure rate for psychology, writes that ‘Hopefully this successful, highly informative paradigm will help improve research practices in this field. Many other scientific fields without strong replication cultures may also be prompted now to embrace replications and reproducible research practices.’ Writing on Mind Hacks, psychologist Dr Vaughan Bell says that the project is ‘either a disaster, a triumph or both for psychology’.

We recommend you read all the coverage, starting with our Research Digest and moving on via Ed Yong. Here we take a slightly different tack: how did those who took part in the project feel about it? Two of them spoke at the press conference accompanying the Science announcement.

Joshua Correll, a psychologist at the University of Colorado, Boulder: ‘This is how science works. How else will we converge on the truth? Really, the surprising thing is that this kind of systematic attempt at replication is not more common. [The failure to replicate my results, by Etienne LeBel’s lab at the University of Western Ontario in Canada] does not convince me that my original effects were [a] fluke. I know that other researchers have found similar patterns ... and my lab has additional data that supports our basic claim.’

Dr E.J. Masicampo: ‘I led a research team that conducted one of the 100 replications and was, at the same time, an original author of one of the studies targeted for replication by another team.

So I'll share my experience in both of these roles and talk about how the project was conducted. The project was crowdsourced within the research community: researchers volunteered after hearing about it through various outlets such as conference presentations or psychology listservs.

In the end, enough research teams volunteered that 100 replications were conducted, with the vast majority of replication teams led by researchers with PhDs.

As research teams joined the effort, they were presented with a pool of studies drawn from three prominent psychology journals and the replication teams selected the studies they would replicate based on their interests, expertise, and available resources.

The next steps involved designing the study and then conducting it, and at both stages numerous measures were taken to maximize the quality of the replications.

When designing their studies, replication teams consulted with original authors. This ensured that the replication studies were as faithful as possible to the originals. Each study was also highly powered, meaning that teams planned to run enough participants to ensure high odds of detecting the original effect.

And before data were collected, the study designs were reviewed both by the original authors and by third-party reviewers. In addition, to increase transparency, replication teams registered and uploaded their methodology to a central repository.

So this openness served to increase each replication team's accountability and, ideally, the quality of their work. Replication teams then collected and analyzed their data according to the preregistered plan.

Teams completed a write-up of the results and interpretations, which was, again, reviewed by the original authors and a third party, and once again, to increase transparency, final write-ups were uploaded to a central repository.

So to reiterate some of the project's features, this was a highly collaborative process not just in terms of many teams contributing replication studies, but also in terms of the individual replications involving collaboration between replication teams and original authors.

There were third-party reviews of each stage of the process and the entire process was transparent, with all materials registered and made publicly available. So personally, I found the project to be precisely what I hoped it would be.

As the leader of a replication team, I was excited by the scope of the project. I felt I was taking part in an important groundbreaking effort and this motivated me to invest heavily in the replication study that I conducted.

I also saw the project as a fun opportunity to replicate a well-respected study that I had long been fascinated by. In fact, the study I replicated is one that I teach to my undergraduates every semester.

So the experience was also made even better by the fact that I was able to work closely with the original authors in executing the study. As an original author whose work was being replicated, I felt like my research was being treated in the best way possible.

The replication team was competent and motivated. I was consulted at every stage of the process. Everything was transparent, and I felt confident that a best attempt at replicating my work was being made.

So in short, I think these were well-planned replications conducted by highly qualified and motivated scientists; thus, one major benefit of this project is that it really served as a model for how to conduct high-quality replications and how to coordinate a very large number of them.’
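For readers unfamiliar with what ‘highly powered’ means in practice, the sketch below shows one common way a team might calculate a required sample size before collecting data, using Python's statsmodels library. The specific numbers (a medium effect size of d = 0.5, 90 per cent power, an alpha of .05) are illustrative assumptions only; the Reproducibility Project teams worked from the effect sizes reported in the original studies, which this sketch does not attempt to reproduce.

    # A minimal a priori power analysis for a two-group comparison.
    # The values below (d = 0.5, power = 0.9, alpha = .05) are assumed for
    # illustration; they are not the project's actual parameters.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    n_per_group = analysis.solve_power(
        effect_size=0.5,          # Cohen's d expected from the original study
        power=0.9,                # desired probability of detecting that effect
        alpha=0.05,               # two-sided significance threshold
        alternative='two-sided',
    )
    print(f'Participants needed per group: {n_per_group:.0f}')  # roughly 85

Re-running the calculation with smaller effect sizes shows why modest original effects demand much larger replication samples.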

Are you a psychologist who took part in the project? We would love to hear your views on the process. Or perhaps you are a psychologist with a view on whether that 36 per cent figure is good, bad or both? Comment below (if you are a British Psychological Society member), e-mail [email protected] or tweet @psychmag.

- See also our opinion special on replication from May 2012, and our announcement of the initiative.

Other links: 
Here's why science can't make up its mind
Why so many psych studies may be false
We found only one-third of published psychology studies is reliable - now what?
No, science’s reproducibility problem is not limited to psychology
Coverage in the Times Higher Education
Three popular psychology studies that did not hold up
Now what?
How to rethink psychology
Psychology is not in crisis
Psychology's reproducibility problem is journalism's problem too

Comments

We (Catherine Fritz and I) are writing following the publication of the Reproducibility Project (see The Psychologist, Vol. 28, No. 10, p. 794) to encourage the Society’s Boards and its Editorial Advisory Group to take steps to identify effective ways to respond to the implications of the Project and implement them. We are concerned that there has been an element of complacency, and even self-satisfaction, in the reporting of the Project. It is claimed, for example, by the Project’s corresponding author, that the Project shows the essential quality of self-correction. However, the Project has attracted attention in part because it is unique within psychology, and it is unlikely to be repeated regularly because it depends upon many researchers giving up their time and resources voluntarily for little personal reward. Few institutions would be happy with researchers doing so regularly at the expense of their main research objectives.

The collective results make very embarrassing reading for psychology. The bottom line is that for any recently published significant result in a leading psychology journal, there is only a one in three chance that the research, if repeated, would produce a statistically significant replication. This lack of reliability must be a deterrent to the application or extension of new research. Furthermore, the effect size of the repeated study is likely to be less than half of that originally reported. Any potential users or students of psychology who encounter these findings are likely to question the legitimacy of the discipline unless the situation is improved.

Some of the reasons for the very poor replicability of published research have been widely discussed. Selective publishing, p-hacking, and other ways of massaging results exist, and strategies of registering all planned research can help to address them, but this needs to be formally incorporated into research procedures. However, we believe that there is a further possibility that has not been mentioned in the reports but that will have contributed to some of the misleading original findings. The pressures upon those who collect data that will further their careers must surely lead to some spurious data being reported. Data are often collected by research assistants and postgraduate students, and the temptation to report the results desired by their employers or supervisors must sometimes lead to data that have been adjusted or possibly invented. There have been a few published examples of identified data fixing, but much more will have been going on. The rewards for falsification are big and, at present, the risks of being caught are small. More should be done to redress this situation. It will take imaginative procedures established from the top of the profession to reduce, with a goal of eliminating, the temptations and opportunities to cheat and, by so doing, to restore the respectability of psychological research.

At present, attempting replications is a low-status activity and publishing the results is difficult. The use of databanks to keep attempted replications publicly available is a step in the right direction, but such databanks need to be permanently well funded, and the Society may be able to help here. Even then, the balance in status between replication and original research needs to be shifted where possible. There is a place for the Society’s journals to encourage the publication of attempted replications, and an investigation into how this could be achieved in practice, without an excessive increase in costs or reader boredom, needs to be undertaken.

One step that might be considered by teachers of psychology at all levels, as well as textbook authors, is to cite only research that has been replicated. This means forgoing the introduction of some novel findings that might entertain students but are more likely to fail to replicate, thus degrading the perception of the field. Such a strategy could help to support the publication of replications, if their publication were necessary for the advancement of the knowledge of students and other users of psychology. Ofqual and the various exam boards currently select the studies addressed in AS and A-level exams; the Society could and probably should encourage them to take similar steps.

We hope to hear that the Society, in response to the reports of the Replication Project, is taking a leading role in developing a secure knowledge base in psychology so that the science of psychology will be respected and imitated.