To RCT or not to RCT. On the issue of “evidence” and public health.

This is a blog on “public health” and evidence. It’s a bit murky, and it’s a thorny issue.
The blog is in 2 sections – interpreting evidence and generating evidence.

Context….complexity is difficult to reduce in evidential terms

Imagine the complexity of something like “obesity” and sorting “the evidence” for this…..across multiple domains

Quoting Rutter (2013)

“The intellectual key to understanding obesity is to pursue that shift of paradigm, in the true sense of the phrase (Kuhn 1996), from conceptualising it as a complicated system, with many parts that interact in predictable linear ways, to understanding it as a complex one, with all that entails. It is very hard to reject a paradigm, but as the incommensurable gradually becomes irreconcilable, something will have to change. Perhaps, just perhaps, if I’m approached again in a few years time for advice on tackling obesity I will be asked a different kind of question… ”
And further back, Rutter in 2011

“Tackling obesity demands an approach that does not merely coordinate the discrete actions of a huge number of individuals, organisations, and sectors. Those actions need to be integrated, their unintended consequences understood, corrective actions undertaken, ineffective interventions stopped, and effective ones continuously tweaked and improved. We need to move from small steps and single solutions to “big thinking, many changes”, taking a broad ecological approach.”
Thus it’s the same for the evidence we either generate or use and interpret.

Now apply the same to something even more fuzzy like the intersection of “poverty and health” or “health inequalities”….

Tricky isn’t it…..

1) Interpreting evidence and “public health”
Evidence in the space of social policy is full of holes.

There are many issues around applying the clinical evidence paradigm to many aspects of public policy – it doesn’t work well.

And as we know the evidence base for clinically focused interventions is always going to be better developed than for policy focused interventions….for a whole bunch of reasons.

Applying a biomedical mindset to social concepts is tricky. There’s an issue around translating evidence from a biomedical model paradigm into social model types of intervention.

Compounding factor 1 = there will always be “better” evidence for biomedical model / clinical type interventions (eg SSS) cf policy and population level interventions – issue around neat / clean / controllable environments in which research happens vs what the world really looks like.

Compounding factor 2 = Cochrane and other SRs will ALWAYS conclude “more research needed” for population level interventions (on account of the above, and on account of there being fewer good quality RCTs, as there is less commercial imperative to do them…thus they don’t get done). The net result is a bias in the evidence base towards individual level interventions. This leads to a (heavy) bias in intervention implementation towards individual level interventions.

Compounding factor 3 = medical model more apolitical than social model. Less controversial to support, so subconscious bias ?(HT Duncan Cooper)
I think we need to alter the shape of that path a little bit……

Petticrew and others have just published an excellent methodological overview of the shortcomings of reliance on the RCT paradigm and its peak, the Cochrane review, in the area of complexity.

From the paper – ‘assessing the evidence on alcohol advertising and advertising restrictions needs to extend beyond such evaluations of population-based advertising restrictions. Instead, we need a more complete, reliable and actionable representation of the current evidence for the (wider) effects of alcohol advertising, placed firmly in context of the causal system. This needs to take a broader ‘systems perspective’, and to draw on diverse bodies of evidence from wider research literatures than those covered in previously published systematic reviews.’
Rugg-Gunn and others argue that narrow reliance on RCT and Cochrane type methods basically excludes a vast body of evidence that is often highly pertinent in complex systems: “many important public health questions could not be answered by the Review because of the restrictive criteria used to judge adequacy of study design and risk of bias. The potential benefits of using wider criteria in order to achieve a fuller understanding of the effectiveness of water fluoridation are discussed.”
I think this is an area we desperately need to get into across public health. In another area – we’ve seen massive investment in individual level weight management programmes (despite very poor evidence of effectiveness) and little to no focus on environmental and policy stuff (secondary vs primary prevention).
As Prof Trish Greenhalgh recently said, “sort of like using ineffective drugs to treat cholera without sorting out clean water supply”…

I mostly take a nuanced approach to interpreting evidence in these complex social contexts.

This ISN’T the same as “cherry picking evidence to suit my ideology”, as is often levelled at me…. It’s about using evidence appropriately and responsibly.

2) Generating evidence – new research and evaluation

There’s always a view that RCT is best and nothing else matters. I love RCTs. They are ace. But they aren’t always fit for the job. There are many issues; for some of those see “Why all randomised controlled trials produce biased results”.

RCT is no doubt the most robust evaluation design. But it’s far from perfect for every question. The RCT was originally developed to answer (clinical) questions about effectiveness.
I suspect that the enthusiasm for RCTs in government circles around social policy has come from this simply fabulous paper done for the Govt BI team – Test Learn Adapt (see refs)

I’m told this is popular in govt circles. It’s an amazing paper.

I can find little to disagree with. However, it’s worth noting that all the examples reported in the paper, and the other studies I’m aware of conducted under this banner (I think some of the early evaluative work on the plastic bag tax was one), are simple “policy” (term used advisedly) interventions…..outpatient letters, simple strategies to reduce DNAs for hospital appointments etc

FNP was the most ambitious of these. It wasn’t so simple, but I’d still argue it was quite simple compared to what we want to do here. And it’s worth reflecting on the FNP study in context: a well conducted RCT that failed to meet its endpoint or prove its primary hypothesis for a bunch of reasons, some methodological, some contextual.

Especially if the effect size is likely to be small to moderate, the RCT is (arguably – it’s not universally agreed – get into Bayesian analytic methods, it makes my brain hurt) the best method for controlling for biases.
Responses to current complex challenges require a different set of approaches.
My stock short answer to whether RCT is best or not in the circumstances of difficult, complex, fiddly social interventions where the context is messy….. is as follows……
We try to evaluate a nebulous intervention that we haven’t yet properly nailed down in a system with multiple interlinked behavioural, social, economic, environmental and other factors.

Any evaluation of a complex policy area needs to be able to represent and preferably quantify the dynamic interactions between these factors in ways that linear models cannot.

A complex system won’t follow predictable, linear cause and effect pathways and thus we can set out with the best intentions of producing a nice, neat clean simple RCT….

But for a whole host of reasons our plans would be completely knocked off course by context, or events – thus invalidating the RCT. Once the RCT is set up it’s exceptionally hard to change it. That’s not the case with other methodologies.

Of course the purpose of the evaluation is to give us a robust understanding of causal chains/webs and quantified outcome measures.

But I don’t think RCTs are a good option for these contexts, for a number of reasons.

Tilley and Pawson’s attacks on RCTs are interesting and constructive – the key text there is Pawson & Tilley, (1996) “Realistic Evaluation”. See ref

RCTs are great for dealing with linear models of causation, simple chains of cause and effect. They can’t make sense of dynamic, interconnected, economic, social and physical environments.
RCTs are arguably too reductionist and don’t take adequate account of real-world circumstances and complex contexts – in other words they often have great internal validity but are quite hard to generalise from

I thus think it’s a dubious application of RCT, but as I understand it that’s been imposed by the cross government Work & Health Unit – I’m assuming from the BI Team. It was a condition of the grant and the whole submission was built around an RCT design. I get that.

But WHY…. what’s the driving thinking behind it? Just because the BI team say RCTs are great, I don’t buy that RCT is automatically the tool for this job

In this sort of context – you really can’t discuss “outcomes” without close attention to context and system dynamics. RCTs don’t do “system dynamics”…. They are really good at straightforward linear chain type of questions.

Specific technical issues to consider.

I fear if we want to do a trial there are a number of requirements that won’t be met in many of these messy systems…. Off the top of my head:

  • Messy system and context – many moving pieces, all of which interact with each other in ways we don’t understand and can’t control
  • RCTs are hard to set up, e.g. funding, statistical team, etc etc.
  • Issues re size of trial, number of recruits and power. We have no way of knowing how many people we’d need to recruit into a trial for it to be appropriately powered. Power calcs (which inform size) are best done on the basis of decent observational studies that can inform best guesses as to the likely effect size, and thus how many folk need to be recruited…. Do we need 400 people, 4,000 or 10,000? I don’t know
  • PICO is often not clearly articulated in these complex systems – we can’t precisely pin down the PICO questions. The I…… isn’t a single I – it’s a mix of interventions. Ditto the P, the C or the O….
  • We are unable to control all of the above well enough to design and execute a trial that will give meaningful results.
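
To make the power point above a bit more concrete, here is a minimal sketch (mine, not taken from any of the papers cited) of the textbook normal-approximation sample size formula for a two-arm trial comparing means. The function name `n_per_arm` and the candidate effect sizes are purely illustrative assumptions; a real calculation would need a statistician and would also have to allow for clustering, attrition and so on.

```python
import math
from statistics import NormalDist


def n_per_arm(effect_size, alpha=0.05, power=0.8):
    """Approximate participants needed per arm in a two-arm trial.

    effect_size is the assumed standardised mean difference (Cohen's d).
    Uses the normal approximation: n = 2 * (z_alpha + z_beta)^2 / d^2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical best guesses at the effect size, from large to tiny.
for d in (0.5, 0.2, 0.1, 0.05):
    print(f"assumed effect size {d}: about {n_per_arm(d)} people per arm")
```

The point is the shape of the relationship rather than any specific number: halving the assumed effect size roughly quadruples the recruits needed, which is why a decent best guess at effect size matters so much before committing to a trial.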

What methods TO use in these contexts.

I’d use either interrupted time series, which can produce similar results to trials (with some qualifications; see the Fretheim paper), or, simpler still, a simple cohort with matched controls. There’s a load of stuff published on this. Most accessible is the stuff the Nuffield Trust have been producing recently.

Most of their stuff is written in the context of evaluating new models of delivery of care – messy, complex, difficult context, ever changing environment, no clear intervention, no clear outcome…… sound familiar???
The Nuffield paper is great (see ref)
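
For anyone who hasn’t met interrupted time series before, the core idea can be sketched as a segmented regression: fit a trend before the intervention, then estimate the step change in level and the change in slope afterwards. The sketch below is a bare-bones illustration under my own assumptions (made-up monthly data, an ordinary least squares fit, hypothetical function names `solve` and `fit_its`); it ignores autocorrelation and seasonality, which a real analysis would need to handle.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: use the largest remaining entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_its(y, start):
    """Fit y = b0 + b1*t + b2*post + b3*(t - start)*post by least squares.

    b2 is the step change in level at the intervention, b3 the change in slope.
    """
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= start else 0.0
        rows.append([1.0, float(t), post, (t - start) * post])
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data: 24 months before and 24 months after a hypothetical intervention
# that drops the level by 8 and bends the trend down by 0.3 per month.
rng = random.Random(1)
series = [
    100 + 0.5 * t + (-8 - 0.3 * (t - 24) if t >= 24 else 0) + rng.gauss(0, 2)
    for t in range(48)
]
b0, b1, level_change, slope_change = fit_its(series, start=24)
print(f"estimated level change: {level_change:.2f}")
print(f"estimated slope change: {slope_change:.2f}")
```

Nothing here needs randomisation or a fixed protocol, just a reasonably long run of routinely collected data either side of the change, which is often what we actually have.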

Complex System Theory is as good a method as any to try to understand what’s going on and evaluate – but it’s rarely operationalised in a way that can generate good evidence.

Ditto OR type methods

The 2008 MRC Guidance on Evaluation of Complex Interventions is excellent.

Some other good refs

Lamont. New approaches to evaluating complex health and care systems. BMJ 2016;352:i154. doi: 10.1136/bmj.i154 (Published 1 February 2016)

A reanalysis of cluster randomized trials showed interrupted time-series studies were valuable in health system evaluation. JCE 2014

Ray Pawson and Nick Tilley. Realist Evaluation. 2004

Evidence-Based Public Health: Moving Beyond Randomized Trials. American Journal of Public Health, March 2004, Vol 94, No. 3

Harry Rutter. The single most important intervention to tackle obesity. Int J Public Health 2012. DOI 10.1007/s00038-012-0385-6

Rutter. Where next for obesity. The Lancet, Vol 378, August 27, 2011

Plsek PE, Greenhalgh T (2001) Complexity science: the challenge of complexity in health care. BMJ 323(7313):625–628

Test, Learn, Adapt: Developing Public Policy with Randomised Controlled Trials

MRC Guidance on Evaluation of Complex Interventions. There are many similar frameworks.

Evaluation of complex health and care interventions using retrospective matched control methods. The Nuffield Trust

Petticrew et al. Alcohol advertising and public health: systems perspectives versus narrow perspectives

Rugg-Gunn et al. Critique of the review of ‘Water fluoridation for the prevention of dental caries’ published by the Cochrane Collaboration in 2015


I shared the blog with an eminent prof (unnamed here to protect the innocent) who is a keen advocate of RCTs.

He disagrees with me (to be fair he is much cleverer than me also)

He’s going to write a piece on – ‘when it is right’ to do trials.

He will definitely be “right”

I thought about it a bit more …..

RCT remains king

But if I can have 12 bishops and 4 knights and 3 castles and 1 queen’s worth of evaluative power to shed SOME light on lots of problems…. I’d be prepared to trade this off against one king
I accept it’s a flawed analogy as without a king in chess there’s no game. But….

Maybe it’s an issue of context?

I’m prepared to accept a lower methodological quality to get more evaluative insight into more issues…. And get the job done etc….I don’t know whether we’ve got the wherewithal (or resources) to scale up mass RCT evaluation of everything in sight…. Much as I’d like to…….

Lord knows there’s enough stuff that’s burning for SOME evaluation that if we tried to do nothing but RCTs we’d never get anywhere

Maybe we should view it in the same light as phase 1, 2, 3 trials

The IDEAL framework that the RCS is pursuing sets this out amazingly

Most bright ideas never get past P1 or 2

Admittedly we spend mega $£ on things we shouldn’t based on poor evidence….. and if we had more P3 then ……..

Postscript 2 – another prof’s view…..(again I will protect the innocent but can gladly put you in touch)

He nailed it in simple language
1. RCTs evolved to test simple interventions in essentially closed systems (one sailor’s scurvy is not affected by his shipmate’s lemon)

2. Simple interventions in similar closed systems typically have an approximately normal distribution of effects

3. Complex Interventions are interventions in complex systems

4. Complex systems usually buffer attempts to change them, but occasionally transform

5. The distribution of effects of Interventions in complex systems is non-normal, typically heavy tailed.

6. The statistical rationale for testing simple interventions does not hold when testing complex ones.

7. So it’s not so much a problem of trial design (which is why if you get into an argument with Robbie there are no winners) but of the fundamental assumptions about how you measure effect.

8. That’s because RCTs of complex interventions will usually show a minimal effect, except for the lucky ones which “work” due to the fortunate interaction of contexts, mechanisms and chance.

9. So until we have a different statistical approach which can deal with this (and I’m not even sure it’s Bayesian) then we should stop doing monolithic RCTs of complex interventions.
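
To get a feel for points 2, 5 and 8 above, here is a toy simulation. It is entirely my own sketch rather than the prof’s: the choice of a Pareto distribution to stand in for “heavy tailed”, the parameters, and the function names `simulate_effects` and `summarise` are all assumptions made for illustration. Both scenarios are set up to have the same true average effect (2 units).

```python
import random
import statistics


def simulate_effects(n, heavy_tailed, rng):
    """Draw n hypothetical intervention effects.

    Closed-system case: normal, mean 2, sd 1.
    Complex-system case (an assumption): Pareto with shape 1.5, shifted to
    start at 0, so the true mean is also 2 but the tail is very heavy.
    """
    if heavy_tailed:
        return [rng.paretovariate(1.5) - 1.0 for _ in range(n)]
    return [rng.gauss(2.0, 1.0) for _ in range(n)]


def summarise(effects):
    """Summarise how evenly the total effect is spread across draws."""
    ordered = sorted(effects)
    mean = statistics.fmean(ordered)
    top = ordered[int(0.95 * len(ordered)):]  # the largest 5% of effects
    return {
        "mean": mean,
        "median": statistics.median(ordered),
        "share_below_mean": sum(e < mean for e in ordered) / len(ordered),
        "top_5pct_share_of_total": sum(top) / sum(ordered),
    }


rng = random.Random(42)
for label, heavy in (("normal", False), ("heavy tailed", True)):
    summary = summarise(simulate_effects(20000, heavy, rng))
    print(label, {name: round(value, 2) for name, value in summary.items()})
```

In the normal case the median sits close to the mean and about half the draws fall below it. In the heavy tailed case most draws fall well below the average, and a small number of big ones account for a large share of the total, which is one way of picturing why a single trial would usually show a minimal effect with the occasional lucky one.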


One comment

  1. A critical follow-through to Pawson & Tilley’s Realist Evaluation is Pawson’s work on Realist Synthesis (Evidence-based Policy 2006). We have used this approach to challenge the inappropriate RCT approach to quasi-experimental studies where the causal chain is complicated and the context vital. For example we used it in our evaluation of the WHO European Network of Healthy Cities. The Venice office of WHO-EURO has recently produced an excellent discussion paper on Social Return on Investment which essentially utilises the Realist Synthesis approach.

