Science or alchemy?

Letters on the Reproducibility Project.

19 October 2015

We are writing following the publication of the Reproducibility Project (reported in The Psychologist, October 2015, p.794) to encourage the Society’s Boards and its Editorial Advisory Group to take steps to identify effective ways to respond to the implications of the Project and implement them. We are concerned that there has been an element of complacency, and even self-satisfaction, in the reporting of the Project. It is claimed, for example, by the Project’s corresponding author, that the Project shows the essential quality of self-correction. However, the Project has attracted attention in part because it is unique within psychology, and it is unlikely to be repeated regularly because it depends upon many researchers giving up their time and resources voluntarily for little personal reward. Few institutions would be happy with researchers doing so regularly at the expense of their main research objectives.

The collective results make very embarrassing reading for psychology. The bottom line is that for any recently published significant result in a leading psychology journal, there is only a one in three chance that the research, if repeated, would produce a statistically significant replication. This lack of reliability must be a deterrent to the application or extension of new research. Furthermore, the effect size of the repeated study is likely to be less than half of that originally reported. Any potential users or students of psychology who encounter these findings are likely to question the legitimacy of the discipline.

Some of the reasons for the very poor replicability of published research have been widely discussed. Selective publishing, p-hacking, and other ways of massaging results exist, and strategies of registering all planned research can help to address them, but this needs to be formally incorporated into research procedures. However, we believe that there is a further possibility that has not been mentioned in the reports but that will have contributed to some of the misleading original findings. Data are often collected by research assistants and postgraduate students, and the temptation to report the results desired by their employers or supervisors must sometimes lead to data that have been adjusted or possibly invented. There have been a few published examples of identified data fixing, but much more will have been going on. The rewards for falsification are big and, at present, the risks of being caught are small. It will take imaginative procedures established from the top of the profession to reduce, with a goal of eliminating, the temptations and opportunities to cheat.

At present, attempting replications is a low-status activity and publishing the results is difficult. The use of databanks to keep attempted replications publicly available is a step in the right direction, but such databanks need to be permanently well funded, and the Society may be able to help here. Even then, the balance in status between replication and original research needs to be shifted where possible. There is a place for the Society’s journals to encourage the publication of attempted replications, and an investigation into how this could be achieved in practice without excessive increase in costs and reader boredom needs to be undertaken.

One step that might be considered by teachers of psychology at all levels, as well as textbook authors, is to cite only research that has been replicated. This means forgoing some novel findings that might entertain students but which are more likely to fail to replicate. Such a strategy could help to support the publication of replications, if their publication were necessary for the advancement of the knowledge of students and other users of psychology. Ofqual and the various exam boards currently select the studies addressed in AS- and A-level exams; the Society could and probably should encourage them to take similar steps.

We hope to hear that the Society, in response to the reports of the Reproducibility Project, is taking a leading role in developing a secure knowledge base in psychology so that the science of psychology will be respected and imitated.

Professor Peter E. Morris
Dr Catherine O. Fritz
University of Northampton

Professor Andy Tolmie, Chair of the BPS Editorial Advisory Group, comments: The Reproducibility Project is without question an important piece of work, and the EAG discussed the implications of its findings and the related Open Science Framework at its last meeting. However, we believe there is a clear need for a coordinated, discipline-wide response that goes substantially beyond the publishing practices of journals. This includes a searching analysis of the reasons for poor replicability – low-n research may be a more endemic part of the problem than any deliberate attempts at massaging data – and the messages for research funders, who commonly hold different expectations about the scale and costs of work in psychology compared to other scientific disciplines, and who regard the funding of replications as a low priority. Journals may need to adopt a different stance to publication of such work, but in fact we receive relatively few submissions of this kind – to a large extent, the source of the problem lies further back in the research pipeline.

Professor Daryl O’Connor, Chair of the BPS Research Board, comments: This Project represents an important step forward for psychological science specifically, and science more generally. Other areas of science, for example clinical medicine and genetics, have encountered problems with reproducibility in the past, so psychology is not alone. However, publication of this report is likely to propel psychological researchers forward, improve scientific practice and trigger new ways of working.

A great deal of publicity has been given to the recent findings of the Reproducibility Project, with the Guardian summarising: ‘Findings vanished when experiments repeated.’ The lead author, Brian Nosek, asserted that the study should not be used as a stick with which to beat psychology; if anything, this is an example of science at its best. While I agree, we should not ignore the implications of the problem of lack of replication. Outcome studies on the effects of psychotherapy are rarely, if ever, truly replicated. There are many reasons for this as I discussed in a recent paper (Marzillier, 2014). This is true of systematic reviews and meta-analyses as well as individual research trials.

To take one example, in 2007 on the basis of a meta-analytic review it was confidently asserted that the trauma-exposure therapies and eye movement desensitisation and reprocessing (EMDR) were clearly superior to other psychological therapies in the treatment of people with problems such as PTSD (Bisson et al., 2007). This scientific conclusion informed the guidelines to practitioners published by the National Institute for Health and Clinical Excellence. Trauma-exposure therapies and EMDR are the therapies of choice. But are they? A year later, another meta-analytic review concluded that there was no good scientific evidence to conclude that any one form of trauma therapy was more effective than another (Benish et al., 2008). Predictably, since then there has been a dispute between the authors of the different reviews (Ehlers et al., 2010; Wampold et al., 2010).

The truth is that we cannot rely on conclusions drawn from reviews of research studies into the effects of psychotherapy for several reasons that I discuss in my article, one of which is that few, if any, studies are actual replications. In this field there is a lot of noise in the system. What was confirmed at one point is almost always later questioned. As psychologists, we should understand this and not pretend that the scientific evidence is better than it is. In particular, we should question the way in which a limited and flawed database is transformed, like base metal into gold, to produce rigid guidelines about which therapies can or cannot be used. This is not science but something else, more to do with vested interests, power and prestige. It should and must be resisted.

John Marzillier
Oxford

References
Benish, S.G., Imel, Z.E. & Wampold, B.E. (2008). The relative efficacy of bona fide psychotherapies for treating post-traumatic stress disorder. Clinical Psychology Review, 28, 746–758.
Bisson, J., Ehlers, A., Matthews, R. et al. (2007). Systematic review and meta-analysis of psychological treatments for post-traumatic stress disorder. British Journal of Psychiatry, 190, 97–104.
Ehlers, A., Bisson, J., Clark, D.M. et al. (2010). Do all psychological treatments really work the same in post-traumatic stress disorder? Clinical Psychology Review, 30, 269–276.
Marzillier, J. (2014). The flawed nature of evidence-based psychotherapy. Psychotherapy Section Review, 51, 25–33.
Wampold, B.E., Imel, Z.E., Laska, K.M. et al. (2010). Determining what works in the treatment of PTSD. Clinical Psychology Review, 30, 923–933.