The Delphi method

Susanne Iqbal and Laura Pipon-Young with a step-by-step guide
The Delphi survey method is popular in many disciplines. Originally developed in the US as a means of forecasting future scenarios, this method has been used to determine the range of opinions on particular matters, to test questions of policy or clinical relevance, and to explore (or achieve) consensus on disputed topics.

Although there is considerable variation in how the method is applied, the Delphi method has its own distinct characteristics:

• It uses a group of participants (known as ‘panellists’) specially selected for their particular expertise on a topic.

• It is often conducted across a series of two or more sequential questionnaires known as ‘rounds’.

• It employs an initial ‘idea generation’ stage, in which panellists are asked to identify the range of salient issues.

• It collates ideas from Round 1 to construct the survey instrument distributed in subsequent rounds.

• It has an evaluation phase (third or further rounds) where panellists are provided with the panel’s responses and asked to re-evaluate their original responses.

• It is interested in the formation or exploration of consensus, often defined as the number of panellists agreeing with each other on questionnaire items.

The Delphi method is particularly useful in areas of limited research, since survey instruments and ideas are generated from a knowledgeable participant pool (Hasson et al., 2000), and it is suited to exploring areas where controversy, debate or a lack of clarity exists.
However, the many applications and descriptions available in the literature can be confusing, and the Delphi method remains relatively unexploited in psychological research, with a few exceptions (e.g. Graham & Milne, 2003; Haggard & Haste, 1986; Haste et al., 2001; Jeffery et al., 2000; Petry et al., 2007). To make the Delphi method more accessible to psychologists unfamiliar with it, this article provides a practical step-by-step guide based on our experiences of conducting Delphi studies in clinical psychology. It is not an exhaustive account, and further guidance is available elsewhere (e.g. Hackett et al., 2006; Graham & Milne, 2003; Keeney et al., 2006; Schneider & Dutton, 2002).

What is the aim of your study?
The first step is to determine whether the study aims to measure the diversity of opinions on a topic or to steer a group towards consensus. This is an important distinction in terms of the execution of the Delphi. In general, if your study aims to generate consensus, three or more rounds are preferable. Ideally, the same panel should be retained throughout and high response rates are particularly important in order to determine the impact of group feedback on panellists.

On the other hand, if the Delphi process is a means of measuring opinions, fewer rounds are generally acceptable. Having a complete dataset is less vital, and the panel can be expanded across rounds by inviting more panellists in Round 2.

Decide the structure
The next step is to decide the number of rounds, to draw up a timeframe, to construct study materials (e.g. letters to participants, consent forms) and to complete ethics procedures.

A two-round Delphi (e.g. Petry et al., 2007) is most suitable when there is a clear literature base from which to establish the survey instrument and if the main aim is to take the temperature of opinion on a topic. Although quantitative questionnaires have been used in the first round, a qualitative first round is optimal, because the primary function of the Delphi method is to explore an area of future thinking that goes beyond the currently known or believed. Also, the reliability and validity of the study may be improved if an initial group of experts produces the items.

When exploring consensus, rounds may continue until consensus is reached. However, this approach can quickly compromise panellists’ response rates and enthusiasm. Three rounds, which would typically take four months, often suffice (Stone Fish & Busby, 2005).

Selecting panellists
Panellists form the lynchpin of the Delphi, and clear inclusion criteria should be applied and outlined as a means of evaluating the results and establishing the study’s potential relevance to other settings and populations. The number of panellists depends very much on the topic area as well as the time and resources at the researchers’ disposal. Although Delphi surveys have been conducted with as few as seven and as many as 1000 panellists, Turoff (2002) recommends panels between 10 and 50. These numbers seem more appropriate, given the amount of data and subsequent analyses each panellist generates.

Researchers must also decide how to conceptualise and define ‘expertise’. The method may be undermined if the panellists recruited lack specialist knowledge, qualifications and proven track records in the field (Keeney et al., 2001), although of course expertise comes in many guises and may include those who are ‘experts by experience’ (Hardy et al., 2004). In general, a varied panel is considered best in producing a credible questionnaire, and individuals who might provide a minority or differing perspective should be actively recruited to the panel (Linstone & Turoff, 2002). With regard to the recruitment process itself, panellists are often recruited via letter or e-mail. Recruitment can be broadened through ‘snowballing’ (asking panellists to pass on invitations to other relevant individuals).

The questionnaires
The more open-ended the Round 1 questionnaire (Q1) the better, ideally involving a series of open-ended questions inviting panellists to brainstorm. A quantitative ‘tick-box’ style format may also be used, but since the Delphi method sets out to generate new ideas, a quantitative Q1 seems to defeat this purpose.

The Q1 is usually created following a detailed literature review, consultation with relevant individuals and consideration of the aims of your Delphi study. Generally speaking, asking panellists to spend 30 minutes completing the questionnaire is considered reasonable, and pilot testing is essential to determine timeframes as well as readability and relevance of the questions.

Online surveys can be an efficient alternative to posting questionnaires and often appeal to panellists. Web services (e.g. surveymonkey.com) can be a simple way of constructing online questionnaires. Once the questionnaire has been distributed, following up non-responders is recommended, as high response rates can improve the credibility of a study (Beretta, 1996). Ideally, a 70 per cent response rate should be maintained (Sumsion, 1998). We found that regular contact, flexibility around deadlines and individual ‘thank you’ messages increased response rates.
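
As a rough illustration of monitoring the 70 per cent benchmark, the Python sketch below computes a response rate for each round. The figures are invented, the function name is hypothetical, and the choice to divide by the number of panellists invited to that round (rather than the original panel size) is an assumption that should be stated in any report.

```python
TARGET = 70.0  # benchmark response rate suggested by Sumsion (1998)

def response_rate(returned: int, invited: int) -> float:
    """Percentage of invited panellists who returned the questionnaire."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return 100.0 * returned / invited

# Hypothetical figures: (invited, returned) for each round.
rounds = {"Q1": (30, 24), "Q2": (24, 18), "Q3": (18, 12)}

below_target = []
for name, (invited, returned) in rounds.items():
    rate = response_rate(returned, invited)
    if rate < TARGET:
        below_target.append(name)  # candidates for follow-up reminders
    print(f"{name}: {rate:.1f}% returned")
```

With these made-up numbers, only the third round falls below the benchmark, which would be the cue for following up non-responders.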

The Round 2 questionnaire (Q2) is constructed from the data gathered from the Q1. Commonly, a quantitative, ‘tick-box’ style survey using Likert (1932) type agreement scales or ranking scales is used. The construction of the Q2 is often time-consuming. The use of methodological tools such as qualitative content analysis (e.g. Graneheim & Lundman, 2004) or thematic analysis (e.g. Braun & Clarke, 2006) is necessary to make the study methodologically more robust. Furthermore, careful attention to principles of questionnaire design is vital, and extended piloting may be necessary to iron out ambiguous, repetitive or inaccurate items.

On return of the Q2, descriptive data analyses of the panel’s responses can begin so that the Round 3 questionnaire (Q3) can be constructed. The purpose of the Q3 is to invite panellists to consider their scores in the light of the group response and decide whether they want to change any of their responses. We suggest feeding back percentages and providing individual round scores for every item (see Figure 1). This provides a visual means for the panellists of assessing the diversity of responses. It also allows them to check that researchers have recorded correct responses.
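
One way such feedback could be assembled is sketched below in Python. It assumes a five-point agreement scale and invented Round 2 scores. The function names and the layout of the feedback line are hypothetical, not a reproduction of the figure referred to above.

```python
from collections import Counter

SCALE = [1, 2, 3, 4, 5]  # assumed 5-point agreement scale

def percentage_distribution(scores):
    """Percentage of panellists choosing each scale point, rounded to whole numbers."""
    if not scores:
        raise ValueError("no scores supplied")
    counts = Counter(scores)
    return {point: round(100 * counts[point] / len(scores)) for point in SCALE}

def feedback_line(item, scores, own_score):
    """One line of Q3 feedback: the panel's spread plus the panellist's own score."""
    dist = percentage_distribution(scores)
    cells = "  ".join(f"{point}: {pct}%" for point, pct in dist.items())
    return f"{item} | panel: {cells} | your Round 2 score: {own_score}"

# Hypothetical Round 2 responses from ten panellists to a single item.
q2_scores = [5, 4, 4, 5, 3, 4, 2, 5, 4, 4]
print(feedback_line("Item 1", q2_scores, own_score=3))
```

Because each percentage is rounded separately, the figures for an item may not always sum to exactly 100.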

Analyses and dissemination
Upon receipt of the completed Q3, you need to check whether any changes have been made, in which case the data need to be re-analysed. Percentages, medians, interquartile ranges, means and standard deviations are commonly calculated.
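
The statistics listed above can be obtained with Python's standard library, as in the minimal sketch below. The scores are invented, and treating a score of 4 or 5 on a five-point scale as ‘agreement’ is an assumption made purely for illustration.

```python
import statistics

def describe(scores, agree_from=4):
    """Descriptive statistics for one item's Likert-type scores (needs at least two scores)."""
    # statistics.quantiles uses the 'exclusive' method by default.
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return {
        "n": len(scores),
        "percent_agree": 100.0 * sum(s >= agree_from for s in scores) / len(scores),
        "median": median,
        "iqr": q3 - q1,
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),  # sample standard deviation
    }

# Hypothetical final-round scores from ten panellists for one item.
item_scores = [5, 4, 4, 5, 3, 4, 2, 5, 4, 4]
summary = describe(item_scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Statistical packages differ in how they interpolate quartiles, so the interquartile range for a small panel may vary slightly from one tool to another. It is worth reporting which convention was used.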

Results can be presented in various ways. This includes reporting only those items that have reached a pre-agreed level of consensus (e.g. Petry et al., 2007), listing all items in order of consensus magnitude (Hardy et al., 2004), or also reporting those areas in which there is debate amongst the panel.
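
These reporting options can be illustrated with a short Python sketch. The 75 per cent threshold, the item labels and the scores are all hypothetical. The appropriate consensus level is something each study needs to agree in advance.

```python
CONSENSUS_THRESHOLD = 75.0  # hypothetical pre-agreed level of consensus

def percent_agreement(scores, agree_from=4):
    """Percentage of panellists scoring at or above the assumed 'agree' point."""
    return 100.0 * sum(s >= agree_from for s in scores) / len(scores)

# Hypothetical final-round scores from eight panellists on three items.
responses = {
    "Item A": [5, 5, 4, 4, 5, 4, 4, 5],
    "Item B": [4, 2, 5, 3, 4, 1, 4, 2],
    "Item C": [4, 4, 5, 3, 4, 5, 4, 2],
}

# List all items in order of consensus magnitude.
ranked = sorted(
    ((item, percent_agreement(scores)) for item, scores in responses.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

# Separate items that reached the pre-agreed level from those still debated.
consensus = [item for item, pct in ranked if pct >= CONSENSUS_THRESHOLD]
debated = [item for item, pct in ranked if pct < CONSENSUS_THRESHOLD]

for item, pct in ranked:
    print(f"{item}: {pct:.0f}% agreement")
```

Keeping the debated items in a separate list makes it straightforward to report areas of disagreement alongside those where consensus was reached.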

Finally, disseminate your findings (e.g. by writing a consensus report or article, or by presenting findings to services) amongst concerned parties, including your participants.

Strengths and weaknesses
Like any other survey method, the Delphi method has strengths and weaknesses. These are summarised in Table 1, and further critiques can be found in Goodman (1987) and Sackman (1975). In our experience, its benefits outweigh its drawbacks, and this method seems particularly relevant for psychology. Traditionally, there has been a divide between quantitative and qualitative methods. The Delphi method can straddle this divide. By virtue of its procedural structure, which incorporates both qualitative and quantitative methods, it provides the opportunity to achieve a more complete picture of the phenomenon under study.

 

The Delphi in use
Iqbal et al. (in press) used the Delphi to explore and approach consensus in a study of sexually inappropriate behaviours in children under the age of 10. A review of the pertinent literature had revealed that children’s sexual behaviours were judged differently by different professionals. The Delphi seemed appropriate for exploring this sensitive topic, as it is an ideal tool to expose all the different positions, including arguments for and against these positions, to generate consensus and to communicate this.

The Delphi is ‘a method for structuring a group communication process so that the process is effective in allowing a group of individuals, as a whole, to deal with a complex problem’ (Linstone & Turoff, 2002, p.3), and is based on the idea that it is possible and valuable to reach a consensus (Stone Fish & Busby, 2005). By feeding back percentages of all views to each participant and inviting them to reflect on their responses in the light of these scores, consensus was achieved that children who display sexually inappropriate behaviours should not be called ‘sex offenders’. No consensus was achieved with regard to many other sexual behaviours, particularly those considered ‘normal’.

In line with the Delphi methodology, results were disseminated via participants themselves, reports, and journal publications. ‘Divergence’ (when no consensus was achieved) was also fed back to highlight how little agreement existed amongst very experienced professionals with regard to what counts as normal sexual behaviours.

This should improve practice and allow for further research to be carried out.

(Tables and figures can be seen in the PDF version of this article)

Susanne Iqbal is a Chartered Clinical Psychologist at George MacKenzie House, Fulbourn Hospital

Laura Pipon-Young is a Chartered Clinical Psychologist at the Women’s Service, Secure & Forensic Services, Hellingly, East Sussex
