The limitations of administrative data for describing quality and outcomes resulting from a process or system
I am still really concerned about the use of ‘value packs’ and the like as the starting point for “opportunities”. I guess my concern is that if the general line of attack is that
1) the data mostly says “it’s all not very good”
2) the packs give some indication of “where to look”, the size of the prize, and some specific targets…
…then there are a bunch of issues with the credibility of the data in the value packs
I think we need to be exceptionally cautious of this line.
Having had to do this job many times has made me crystallise some of my concerns.
They fall into three groups.
1) Primary care indicators
For obvious reasons, QOF is the most commonly used source of data in describing quality and outcomes in primary care.
a) Often an assumption is made that QOF data gives a clinically and epidemiologically accurate picture of what’s really going on.
This is a questionable assumption. Highly questionable.
There is an established body of literature that says whilst QOF is a pretty good mechanism for paying doctors, it is pretty shabby for real QI, epidemiology or other similarly detailed analysis.
A letter in the BMJ highlights the problems associated with using routine QOF data to interpret HbA1c across different portals – in this case Healthier Lives (where use of this source would rank the CCG 162nd) versus GP Fingertips (where use of this source would rank the same CCG 12th).
Discrepancies can be attributed to different handling of exception coding.
This is fine, but the detail matters! And makes a difference to decisions that may be made, or priorities that may be set.
Most of us don’t often engage with detail!
This letter in JPH highlights the other tricky issue of floor and ceiling thresholds in epidemiological interpretation of QOF data for quality purposes.
There is also the issue of whether exception-coded patients are counted in denominators, and of payment thresholds (and variability in the way GPs interpret them – “if I’ve reached the threshold I won’t try much harder”).
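To make the exception-coding point concrete, here’s a minimal sketch – all numbers are made up for illustration, not real QOF figures – of how the very same practice yields two quite different “achievement” rates depending on whether exception-coded patients stay in the denominator:

```python
# Illustrative only: a made-up register for one hypothetical practice.
register = 400     # patients on the diabetes register
controlled = 280   # patients with HbA1c at target
excepted = 60      # exception-coded patients (e.g. informed dissent)

# Style 1: exception-coded patients removed from the denominator.
rate_excluding = controlled / (register - excepted)

# Style 2: exception-coded patients kept in the denominator.
rate_including = controlled / register

print(f"Excluding exceptions: {rate_excluding:.1%}")  # 82.4%
print(f"Including exceptions: {rate_including:.1%}")  # 70.0%
```

A 12-point gap from the same underlying records – exactly the kind of handling difference that can move a CCG from 12th to 162nd in a ranking.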
b) Assumptions made re Observed / Expected ratios –
the epidemiological analysis on which the “expected” prevalence is calculated is often questionable.
I can provide you with very detailed commentary on this if you want. I have questioned this repeatedly.
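A quick sketch of why the “expected” side matters so much – the population, case count and both model prevalences below are hypothetical numbers chosen for illustration:

```python
# Illustrative only: observed vs "expected" prevalence for one hypothetical CCG.
population = 250_000
observed_cases = 4_500

# Two plausible "expected" prevalence estimates from different epi models.
expected_prevalence = {
    "model A": 0.020,  # 2.0% expected prevalence
    "model B": 0.024,  # 2.4% expected prevalence
}

for label, prev in expected_prevalence.items():
    o_over_e = observed_cases / (population * prev)
    print(f"O/E under {label}: {o_over_e:.2f}")
```

Under model A the area looks close to complete case-finding (O/E 0.90); under model B it looks like substantial under-diagnosis (O/E 0.75). Same observed data, different conclusion – driven entirely by the expected-prevalence model.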
c) Diagnostic criteria
For example – the issue of “AF resolved” coding is not well handled in QOF. But it’s a very real issue in the real world – and has a bearing on anticoagulation decisions.
d) Discrepancies between QOF and clinical audit
We have found some quite significant discrepancies between what QOF tells us and what we found when we actually rebuilt the GRASP-AF audit in SystmOne.
One of the two sources is a mechanism for paying doctors; the other is careful and methodical clinical audit intended to represent a system. One of the two is wrong.
Bottom line 1 – QOF is a mechanism for paying GPs, not for epidemiology. It can be used for epi analysis but be mindful of the weaknesses.
Bottom line 2 – detail matters.
2) Secondary care indicators
a) Many of the secondary care indicators are actually beyond the control of secondary care… or certainly beyond the control of specific clinicians within the hospital, as it’s the GPs, A&E and MAU clinicians that are making admission decisions.
Coding mostly gives reasons and procedures at discharge not at admission.
b) Assumptions made that high xxxxxxxx is “bad”… not necessarily so
c) The indicators, as currently constructed, are simply not sensitive enough to take into account the (often quite significant) variability in local pathways – especially the configuration of tertiary / secondary services. Who gets repatriated, where and when. This obviously has a bearing on what gets counted and coded, and how.
It isn’t possible to accurately reflect this in the way in which the macro indicators are collected.
Quite specifically I have seen experienced consultants (quite correctly) destroy data on angiography, angioplasty and PPCI presented at a regional meeting in the past.
And the mess that is the use of administrative data for HSMR and similar is well documented. This wastes time, vast resources, and potentially causes harm.
And thus we will spend hours arguing about the data when it’s badly presented with no context.
d) Discrepancies between different sources of data.
I’ve seen RightCare and similar CVD profiles that cover renal disease (obviously as a CVD risk factor) – presumably using HES data – that carry significant discrepancies when checked against Renal Registry data.
I know which of the two sources I’d rather trust to give a rich and accurate picture of what’s really going on in a system.
3) Statistical points
a) All of the normal points, cautions and caveats when interpreting variability in data in a system.
That’s another paper.
But it isn’t at all uncommon to look at a single set of “variation” data and draw completely the wrong conclusions.
Common cause variation vs special cause variation.
See for example this paper on value opportunities in cancer care
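The common-cause/special-cause distinction is exactly what an SPC control chart tests. A minimal sketch – the monthly counts are invented, and the limits use the simple c-chart approximation (3-sigma, with sigma ≈ √mean for count data):

```python
import math

# Illustrative only: made-up monthly admission counts.
counts = [42, 38, 45, 40, 44, 39, 41, 43, 37, 46, 40, 65]

mean = sum(counts) / len(counts)
# c-chart control limits: mean +/- 3 * sqrt(mean) for count data.
ucl = mean + 3 * math.sqrt(mean)
lcl = max(0.0, mean - 3 * math.sqrt(mean))

for month, c in enumerate(counts, start=1):
    flag = "  <-- special cause?" if (c > ucl or c < lcl) else ""
    print(f"month {month:2d}: {c:3d}{flag}")
```

Only the final month breaches the limits; the bounce between 37 and 46 in the other eleven months is common-cause noise, and “investigating” any of those points individually would be chasing shadows.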
b) Sometimes the metrics used in various packs ARE age standardised, sometimes they are not.
At best this creates confusion; more likely it will lead to misinterpretation.
I have seen many flagrant misinterpretations of the stats – e.g. an assumption that a DSR (directly standardised rate) can simply be disaggregated to a crude rate, and then to a numerator, to get a count of the metric of interest.
This is simply wrong, and misleading.
Such misinterpretations have been put forward by people who may not know the details of how the data and metrics are put together – but they are put across confidently and authoritatively.
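A toy worked example of why the DSR-to-count move is wrong – two age bands only, with invented event counts, populations and standard-population weights:

```python
# Illustrative only: one area, two age bands (all numbers made up).
# events and pop are local; std_weight is the standard population share.
bands = {
    "under 65": {"events": 20,  "pop": 80_000, "std_weight": 0.85},
    "65+":      {"events": 180, "pop": 20_000, "std_weight": 0.15},
}

total_events = sum(b["events"] for b in bands.values())
total_pop = sum(b["pop"] for b in bands.values())

crude = total_events / total_pop
# DSR: weight each band's local rate by the STANDARD population structure.
dsr = sum(b["std_weight"] * (b["events"] / b["pop"]) for b in bands.values())

print(f"crude rate: {crude * 100_000:.0f} per 100,000")   # 200
print(f"DSR:        {dsr * 100_000:.0f} per 100,000")     # 156
# Multiplying the DSR back by the local population does NOT recover the count:
print(f"true events: {total_events}, DSR x pop: {dsr * total_pop:.0f}")
```

Because this area is much older than the standard population, the DSR (156 per 100,000) sits well below the crude rate (200 per 100,000), and “DSR × population” understates the true event count by more than a fifth.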
c) It is not at all unusual for metrics to be quoted with no confidence intervals, making interpretation of statistical significance difficult at best.
It is very rare for the info in many packs to have been adjusted for any confounding factors (other than age) – which makes interpretation of clinical and epi significance of variation questionable.
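To show what the missing confidence intervals hide, here’s a sketch with two hypothetical CCGs whose rates look different at first glance – event counts and populations are invented, and the interval uses the simple normal approximation for a proportion:

```python
import math

# Illustrative only: (events, population) for two hypothetical CCGs.
ccgs = {"CCG A": (48, 10_000), "CCG B": (62, 10_000)}

for name, (events, pop) in ccgs.items():
    p = events / pop
    se = math.sqrt(p * (1 - p) / pop)   # normal approximation to the SE
    lo, hi = p - 1.96 * se, p + 1.96 * se
    print(f"{name}: {p * 1000:.1f} per 1,000 "
          f"(95% CI {lo * 1000:.1f} to {hi * 1000:.1f})")
```

CCG B’s rate looks ~30% higher than CCG A’s, but the two intervals overlap substantially – a pack quoting only the point estimates would invite exactly the wrong conversation.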
d) Point estimate of a metric AND trend / temporal analysis is rare
Thus misleading conclusions can be drawn if the year in which a point estimate is made is a “high year” when the natural rhythm is cyclical.
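The “high year” trap in miniature – a hypothetical metric with a four-year cycle around a flat underlying level (the series and the snapshot year are entirely made up):

```python
import math

# Illustrative only: a cyclical metric around a flat level of ~100.
years = list(range(2010, 2020))
values = [100 + 15 * math.sin(2 * math.pi * (y - 2010) / 4) for y in years]

mean = sum(values) / len(values)
snapshot_year = 2011                      # hypothetical year the pack was built
snapshot = values[years.index(snapshot_year)]

print(f"10-year mean: {mean:.0f}")
print(f"{snapshot_year} snapshot: {snapshot:.0f}")
```

A pack built on the 2011 point estimate would report the metric ~13% above its long-run level, with nothing in the data to warn the reader – only the trend view reveals the cycle.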
I am NOT arguing against the use of this routinely collected administrative data.
It is what it is. And helpful to a point.
But, frankly, it is pretty limited.
And to state boldly that “this is the truth” is simply 1) a red rag to a bull and 2) an invitation to argue with it…
And if our starting point is “based on this data, the current system is rubbish and unacceptable”, then everyone will be in defensive mode.
It is much better to use the collective intelligence in the room to identify improvement targets, then build the data and analysis around them. THEN we can work out the best data and best metrics with which to measure improvement.
This is a fundamentally different approach to starting with a “value pack”… and assuming it is “the truth”.
And in case you needed any further persuading:
Evid Based Med 2017;22:35 doi:10.1136/ebmed-2016-110557
It’s a Commentary on: Li L, Rothwell PM. Biases in detection of apparent “weekend effect” on outcome with administrative coding data: population based study of stroke. BMJ 2016;353:i2648
“Confirmed acute stroke admissions were administratively coded as stroke 75% of the time. Admissions administratively coded as stroke were accurate 62% of the time. False positives occurred more on weekdays than on weekends (41% vs 26.5%, p<0.001), as stroke history was often classified as new stroke. False positives, often admitted for less morbid conditions, showed far lower mortality than true stroke admissions (3.8% vs 22.1%, p<0.001), this erroneously implicated increased mortality among weekend stroke admissions.”
I can recall similar for MI, just can’t find the reference off hand
BMJ, 2013? From memory
See also this thread on coding and use / weaknesses of routine data – https://twitter.com/jeannelenzer1/status/932586442156642306
Bottom line – administrative data is good for some things, but take care you don’t over-egg it as ‘the truth’ – it often misrepresents what’s actually going on.