‘There’s this conspiracy of silence around how science really works’
Our editor Jon Sutton met Marcus Munafò at the Annual Conference of the British Psychological Society's Psychobiology Section, where he was an invited speaker. Winner of the British Psychological Society’s Presidents’ Award, Munafò is Professor of Biological Psychology at the University of Bristol, and a key player in the debate over replication and open science.
Has the ‘replication crisis’ surprised you?
No, in the sense that I started thinking about these issues, although I didn't describe them in those terms, when I was a postdoc, back in 2004/2005. My entry into this metascience literature, this way of using scientific methods to look at how science itself works, was just an accident of the kind of research I was doing at the time and the need to be selective about where we focused our effort. I was doing pharmacogenetic research, looking at which genetic variants influence response to smoking cessation treatments. That was in the days when genotyping was slow, done by hand, and expensive, so you couldn't run hundreds of tests the way you can now very easily on a genome-wide chip; you had to pick the genes you wanted to target. So I started to do meta-analyses to narrow down our options. The first thing I noticed was that the evidence was actually pretty flaky; very little was robust.
And that was research done by epidemiologists?
Yes, genetic epidemiology, but I was a psychologist working with them… I’ve always been a slightly unusual psychologist in that I’ve hopped around between different departments. Even now that I do work in a psychology department it is in an epidemiology unit and with geneticists.
‘Intellectual magpie’, it says on your Twitter profile.
That’s it. There’s an advantage to that – you can pick up different strengths and weaknesses of different fields and try to build together the best way of working. That’s very relevant to the current debate around reproducibility. Anyway, those meta-analyses suggested what’s now well established – candidate gene studies are bunk, basically. Very few replicated. But more interestingly it showed that you could use those databases that you created to look at other factors, to look at whether there was a correlation between the effect size that a study was reporting and which year it was published, or which country the study was done in. That really piqued my interest, that there was almost a sociology of science behind this. Different countries, because they placed slightly different incentive structures around their scientists, were creating different pressures, which led to different biases that you could see in the published literature. I thought that was fascinating.
That interest in trying to understand what was driving the behaviour of scientists became a hobby essentially, a sideline… But it has been the stuff that people noticed and cited – more so than my main research, which no one reads!
Is that because you’re more outspoken in that area, the passion of it comes across?
Maybe a bit of that. More mundanely I was lucky enough to become interested in the topic before it became fashionable, I was an early adopter of that way of thinking – I already knew a certain amount about the literature, for example John Ioannidis’ article in 2005.
I think an important way to reframe the whole debate is that we shouldn’t be talking about it as a crisis. I don’t think that’s helpful language. It tends to polarise people and put some on the defensive if they feel like their research is being criticised. I see it much more as an opportunity, to simply reflect on the way that we do science, and the way that incentive structures around us are shaped, and whether both of those things can be done better. We’ve gone past the stage of just thinking ‘Is there a problem?’. Not everyone agrees with the magnitude of the problem, but most people agree that science isn’t operating optimally. On the other hand, we haven’t really got a clear sense of what ‘optimally’ means. That’s never going to be 100 per cent of our studies being replicable, because then we would be investing huge amounts of resources into single studies where we are sure we’ve got the right answer, which would probably be wasteful, because we overpower our studies, or we would be studying a really trivial, mundane, known effect. There has to be an element of risk in the research that we do. The question is more ‘Have we got the balance right, of risk and reward?’.
You’re talking about the ecosystem science is operating in, and that takes issues of blame out of it. You’re saying it’s inevitable, that psychologists and scientists more broadly are going to work to ‘maximise their fitness’ in the system they are in.
There have been papers which have used exactly that approach. The Smaldino and McElreath one on ‘The natural selection of bad science’… we’re incentivised not to get the right answer, but to publish and get grants, and those are proxies for being good scientists. The kinds of behaviours that will lead to success in the current ecosystem – running quick and dirty studies to get lots of publications out of the door, for example – may be good for individual scientists’ careers, but won’t be good for science. Those labs that work in that way will have more progeny, will ‘succeed’ in an evolutionary sense in the current system.
But I think the reason psychologists are interested in replicability is that this isn't an issue of people wanting to game the system consciously. It's about how humans respond unconsciously to subtle pressures. Everyone goes into science because they're interested in the subject matter, they're excited about finding something out. Then they gradually feel themselves getting bent out of shape, because they know they have to publish towards the end of their PhD if they want to get a postdoc, and they need to get a grant at the end of their postdoc if they want to go on to a lectureship, and they start to feel the squeeze as they go through that early career stage.
Ultimately these are questions of human behaviour, and that’s what psychologists do. So I don’t think psychology is particularly good or particularly bad as a discipline, it’s the right discipline – along with some others such as economics and potentially evolutionary biology – to think about how the current situation has arisen and what the solutions might be.
The way you approach science is as a very active, social, collaborative process; so, rather than individual research practices, human behaviour will be the key to finding ways out of this?
There's a bit about individual research practices, but also a lot about what the funders, the institutions and the journals can do to promote best practice. We're seeing some examples of that in Psychological Science awarding badges for certain kinds of practices, a classic 'nudge' intervention if you like. There's a paper showing a rapid change in data-sharing practices in that journal that wasn't mirrored across other journals.
But we’ve got to be alive to the possibility that what seems like a good idea actually ends up making things worse. And it’s not a one-off process you go through, coming out on the sunlit uplands of reproducible science and never having to worry about it again. The nature of any changes we introduce is that people will gradually adapt to that new ecosystem and new problems will arise, because people are always going to be consciously or unconsciously wanting to do well, to get promoted, receive awards, whatever it might be.
What is ‘doing well’ in research terms?
We are incentivised not only to find something as opposed to nothing, but also for the findings to be novel, eye-catching, groundbreaking. At Cambridge, Ottoline Leyser has this great quote: 'We're encouraged to do groundbreaking research, but what's the groundbreaking for? You break ground in order to build something, and if all you do is break ground you end up with lots of holes in the ground.' We're incentivised to do the high-risk research, but not the foundational, corrective, replication research that builds the more robust edifice. It's a bit like the situation we were in 10 years ago with the banking crisis… bankers were incentivised to go after the high-risk, high-return investments and not the mundane stuff, and the house of cards eventually came crashing down.
Of course there are outright fraud cases, such as Diederik Stapel… a key quote of his was ‘I wanted to make the world just a little more beautiful’.
I think there he became so seduced by the narrative. But we teach that… students take lab classes, and half of the time it doesn't work, and we sort of gloss over the messy reality of what data collection really looks like. We say that everything should look clean and beautiful. If you've been taught there need to be these clean narratives, these manuscripts that seamlessly move on from one experiment that worked to another, where you don't report all of those missteps and messy data, then you're fuelling that belief.
The whole concept of a narrative is a bit tricky in science. We’re not storytellers. We’re trying to identify fundamental truths of nature, and sometimes nature doesn’t want to play ball. I’ve had reviewers ask me to take data, tables, figures, out of manuscripts because it messes with the narrative. They want me to present a curated version of reality rather than an honest one. You push back and you say, ‘It’s important that’s in there’, to pretend that those data don’t exist would be to make the interpretation false… it makes it more difficult, but more likely to be true.
It takes some confidence to push back though, and sometimes that comes from having a secure position. Somebody who does talk about his failures openly and candidly is James Pennebaker. He’ll say ‘try it, if it doesn’t work, doesn’t matter, move on, find something else’. That’s easy for him to say… if Pennebaker was starting out in academia today and went to his Dean of Research and said ‘I’ve just tried 13 things and none of them worked’…
I’m very conscious of that, and how fortunate I am to have got to the position where I am. I look back at some of my research and think I probably interpreted some of my results too enthusiastically when I was a callow youth… I’ve learned as I’ve gone along. By having done so, I have a responsibility to try to make things better while at the same time protecting the careers of early researchers.
Do you think we also need a shift in how we think about failing to find something?
Early-career researchers get a very distorted impression of what it takes to succeed as a mid-career or senior researcher, because there’s this conspiracy of silence around how science really works. Failure is an almost daily part of being a scientist. You get papers rejected, you get grants rejected, your experiments don’t work out the way you had hoped… and yet if you look at the psychology literature, 90 per cent of publications claim that they found what they were looking for in the first place, which is just miles away from reality. And that’s probably a combination of publication bias (only the stuff that ‘works’ gets through) and p-hacking (that is, retro-fitting results to fit the narrative). Early-career researchers think that there is something wrong with them, that they’re not succeeding at the same rate as everyone around them seems to be. It’s really healthy I think, particularly as senior researchers, to openly talk about the fact that we all go through this same process and it doesn’t go away, however successful you are. We’ve had a lot of meetings in our group where we talk about failure, and a Slack channel called Triumph and Disaster where people talk about their successes and failures. It demystifies the whole thing and strips a lot of the fear away from failure.
We need to be teaching early-career researchers resilience, because if you can’t just dust yourself off and get on with it, you’re going to have a really rough time in science. People care about their work, that’s in the nature of science, but if it really bowls you over every time you get a rejection, you’re going to have a really miserable career. We need to teach people ‘everyone experiences this, you have to dust yourself off’.
To move on to your research, how does your ‘beer goggles’ study fit into that thing of interpreting your own results?
That was an interesting one, because we did an experimental study where we administered alcohol, and we got the result we hypothesised, which was that people then rate faces as more attractive. But then we did a naturalistic, correlational study in a pub, and didn't find an association between how drunk people were and attractiveness ratings. So we've got discrepant results there, which we published, and it's not clear what the best interpretation is. I wouldn't hang my hat on either set of results, but we tried not to hide the messy results.
We publish our protocols now before we start data collection, so people can check whether we have published the results of our studies. That puts a subtle pressure on us to ensure we do publish the results of our studies, because it would be embarrassing if someone said 'hang on, you did this study eight years ago and I can't see any results from it'; they would call us out on that, and rightly so. Those open science practices can create an environment that just subtly shapes our behaviour in a positive way… embarrassment is a powerful social force, and it nudges us towards publishing all our results, irrespective of how they turn out.
Actually, publishing non-results is not as hard as people think it is… you won’t necessarily get it into Nature, Science and so on, but you will get it into a journal somewhere if you make the effort to write it up. And the other great thing about pre-registering a protocol is that most of your work is done, the introduction and method are already there. There’s been progress in pre-registration in that the uptake has really accelerated, and people are starting to see the benefits. But also there has been wider discussion of open science and reproducibility more generally.
One of the challenges is getting across to the public this idea of science as a progression, that we’re allowed to say ‘oh, that turned out to be wrong’.
Yes, and that goes back to the candidate gene studies… Retraction is not an appropriate corrective for studies which were published in good faith. They’re not wrong in the sense that you’ve fiddled the data or anything, it’s just that you did an underpowered study in a very noisy system, and got false positives that you over-interpreted.
In my smoking research, we've done a lot of work on plain packaging, which we now have in the UK. The process of introducing that legislation was relatively slow. You can't do a randomised controlled trial to see if it works; you have to build a case around other kinds of evidence. We did a lot of work on the impact of plain packaging on the visual salience of health warnings, and that informed the government consultation. It's a nice example of how research which uses quite basic cognitive neuroscience tools, like eye tracking and fMRI and EEG, can feed forward into quite applied policy settings.
You still deal with the genetics side too, what kind of insights has that produced in smoking?
That's still very much a work in progress. We've got a nice paper under review at the moment showing causal effects of educational attainment on various smoking outcomes: higher levels of attainment, more years in education, lead to a reduced likelihood of starting smoking, lighter smoking if you are a smoker, and an increased likelihood of quitting. We can be more confident that's a causal effect because we used genetic variants in an almost quasi-experimental design. There's a YouTube clip which explains Mendelian randomisation, a very elegant concept [watch it at tinyurl.com/yba55mag]. That gives policy makers information about the likely impact of increasing years in education on health outcomes in the population.
Do you think people will still be smoking at all in 20 or 30 years?
Probably some, but it's really declining rapidly in this country, and the thing that's changing it is e-cigarettes. I'm personally in favour of them as a harm reduction approach, because the harms associated with them are orders of magnitude less… there's almost nothing you can do that is as harmful as smoking, so anything is going to be better than that. And it seems to work as an acceptable alternative for people who don't want to medicalise their smoking, don't want to go to a GP for help, don't see that as a good fit for them, who perhaps still want to use nicotine but don't want to smoke.
I think we do need to do a better job of ourselves embracing uncertainty and then communicating to the public that most of our results are preliminary, that we gradually build evidence. One of the things we can do is get rid of the word ‘significant’. My lab has not used it in any of its papers in the last five or six years. It creates this false dichotomy, that once you get past a certain p value suddenly you can be confident that there’s something there. Really a p value is a sliding scale of strength of evidence. There’s nothing magical about .05. Everyone knows that, but the word creates a mindset that’s really unhelpful.
You get all kinds of variations, like ‘approaching significance’…
Have you seen Matthew Hankins' blog on this? He's text-mined all these tortured phrases people use. The irony is that psychologists are always saying that everything is on a continuum, that there are no hard dichotomies, yet when it comes to p values they love placing a dichotomy on this continuously distributed measure of evidence.
People have been talking about that for as long as I’ve been involved in psychology. It just shows how ingrained these habits and incentive structures are.
But it also shows that a simple intervention can work: saying 'we're not going to use the word significance, you need to think more carefully about your evidence and the direction of the effect or association you're observing'. Just removing that word forces you to think a little more deeply about how you're going to interpret your results. That's something a journal could do.
Do you think the journals are more key than the REF and funding bodies?
In some ways the journals are easy wins, because academics have much more control over them… I edit a journal, and one of the first things I wrote was an editorial on not using the word significant. I’m partnering our journal with a funder in a registered reports model. Chris Chambers introduced registered reports when he got on to the editorial board of Cortex.
The key group that has done very little is institutions. They are still hiring on the basis of ‘Did you publish in Nature?’ That’s not the fault of the journal. Journals such as Nature might say quite openly that they publish high-risk high-return research studies, most of which will probably turn out to be wrong, but that what’s right will be transformative. That’s fine, there’s a place for that, but we shouldn’t gear our whole hiring and promotion structure around such publications. We need to be a bit more nuanced in how we judge the quality of people’s work.
All of the stakeholders are interconnected though, all have a part to play, what one does will influence the other. It’s an ecosystem – a big whack-a-mole problem. You fix one incentive structure and something else will pop up to compensate, unless you treat it as a whole.
Our editor also met Kavita Vedhara at the Psychobiology Section Conference.