
Population health management, revisiting segmentation

The use and abuse of risk stratification.

Population health management is all the rage. I’ve blogged on this a lot recently.

This is a long, techy blog, largely based on something I wrote about seven years ago. The basics haven't changed since then.

Thanks especially to Steve Laitner and Martin Vernon, some of whose ideas I have nicked.

This blog focuses on segmentation, especially on predictive algorithmic models for it.

1 Why segment and use predictive risk – population health management

• Using predictive risk models constitutes the first step in any strategy to improve care and services for susceptible patients.

• Lewis offers a pretty good primer on population health management

• It is a necessary prerequisite for moving from provider-oriented reactive care to a population-focused, proactive and reactive model.

• The whole population is too big an elephant to chew all at once

• Different segments have different characteristics and thus different needs and issues; a tailored response is required.

• Clinicians mainly use risk algorithms to discriminate the level of risk for individual patients, to enable planning of treatment or preventive care.

• Those with responsibility for populations – i.e. all of us – may use them to target resources at those most likely to benefit.

• A targeted approach enables greater equity in service delivery based on need. Currently a 'one size fits all' approach is frequently used, which creates an essential bias towards those with higher self-efficacy, producing a wants-based model.

2 How to segment

A number of options exist. Which option you take depends on what you want to achieve, what information you have available, and how much analytic capacity you have.

What is the organising principle for delivery and segmentation? Is it frailty, morbidity, impairment, or something considerably wider than this? These are different issues, with a great deal of overlap but also some distinguishing features. The choice of organising principle may be determined by our worldview, lens and starting point. Either way, age per se should NOT be the organising feature.

The segmentation may be based on some aspect of risk – both medical and social – and/or assets. Segmentation may be on the basis of frailty (the electronic Frailty Index), number of long-term conditions, some measure of disease control or complexity, or something else.

Obvious tools to stratify include:

• Electronic Frailty Index (eFI)

• Combined Predictive Model (CPM)

• Patients at Risk of Readmission (PARR)

• simpler methods such as counting long-term conditions – 1 LTC, 2–3 LTCs, 4+ LTCs, well people, etc. (a minimal sketch of this follows below)
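To make the simplest option concrete, here is a minimal sketch of LTC-count segmentation. The table, column name and cut points are illustrative assumptions, not a reference implementation of any named tool.

```python
# Segment a patient list by count of long-term conditions (LTCs).
import pandas as pd

patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "ltc_count":  [0, 1, 2, 3, 5],   # illustrative data
})

def ltc_segment(n: int) -> str:
    """Map an LTC count to a coarse segment label."""
    if n == 0:
        return "well"
    if n == 1:
        return "1 LTC"
    if n <= 3:
        return "2-3 LTC"
    return "4+ LTC"

patients["segment"] = patients["ltc_count"].apply(ltc_segment)
print(patients["segment"].value_counts())
```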

Two-dimensional stratification

You could think of stratification as vertical slicing along the predictive risk scale, and segmentation as horizontal slicing along the demographic scale. It's an interesting thought to combine the two.

So you could run risk stratification for particular groups of interest – for example, risk-stratifying people with mental illness compared with the general population, and by deprivation.

And then provide care providers and planners with a palette of different measures of predictive risk for different purposes, or in combination (CPM + eFI + risk of care home admission + risk of social care costs escalation). These MUST be used in conjunction with / in the context of what else we know about that individual or population group to construct a judgement.

Risk stratification plus segmentation (vertical + horizontal slicing) …

Once stratification is done, it may be beneficial to use patient activation measures or some other measurement of capability as a means of best engaging people within segments. Specific interventions would be based on activation state – coaching, peer support, care planning – different interventions for different groups.
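As a rough illustration of combining the vertical and horizontal slicing, here is a hypothetical cross-tab of predictive risk tier against an activation-style level. The column names, levels and the example offers in the comments are assumptions, not a validated mapping.

```python
# Two-dimensional stratification: risk tier x activation level.
import pandas as pd

cohort = pd.DataFrame({
    "risk_tier":  ["high", "high", "medium", "low", "low", "medium"],
    "activation": [1, 3, 2, 4, 1, 4],  # PAM-style level, 1 = least activated (assumption)
})

# Each cell is a cohort that could get a tailored offer, e.g. high risk +
# low activation -> intensive coaching; low risk + high activation ->
# light-touch self-management support.
print(pd.crosstab(cohort["risk_tier"], cohort["activation"]))
```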

Management strategies can then be cohort-specific.

2b Side note on the accuracy of predictive models (short answer)

If you're using predictive modelling to segment, or as a clinical tool, think about accuracy. Get a good understanding of the statistics used to describe test performance – sensitivity, specificity, predictive value. This matters. A lot.

The Nuffield Trust briefing on choosing predictive models (2011) is background reading to all this.

I've often seen people suggesting that these models have predictive accuracy (PPV) of 90%. I don't necessarily buy this.

Billings 2006 is a critical paper. See Table 1 especially on predictive power. Depending on what your risk threshold is, predictive value is towards 50% (one might characterise this as a bit better than coin tossing?).

King's Fund 2006 – especially on test performance & predictive ACCURACY.

They are basically regression models: patient-level data on a range of parameters is linked to an "event" (admission). The regression can then be used to predict the likelihood of admission for a cohort of patients displaying the same parameters/characteristics as in the original analysis.
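To make "basically regression models" concrete, here is a minimal sketch of that workflow using logistic regression on synthetic data. The features, coefficients and data are invented for illustration; real tools such as the CPM use far richer inputs.

```python
# Fit a regression linking patient-level parameters to an "event"
# (emergency admission), then score a cohort. Synthetic data throughout.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000
X = np.column_stack([
    rng.integers(18, 95, n),   # age
    rng.poisson(1.0, n),       # prior emergency admissions
    rng.integers(0, 6, n),     # number of long-term conditions
])
# Invented outcome: admission risk rises with each parameter.
logit = -5 + 0.02 * X[:, 0] + 0.8 * X[:, 1] + 0.4 * X[:, 2]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression(max_iter=1000).fit(X, y)
risk_scores = model.predict_proba(X)[:, 1]  # predicted probability of admission
print("top 1% risk-score threshold:", np.quantile(risk_scores, 0.99))
```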

Even in the frequent-flyer group (apart perhaps from the very, very frequent flyers) there will be high variability (over time and between people within the cohort) in admission event counts (and thus rates).

There are some obvious and well-documented weaknesses. See Kansagara et al: "most current readmission risk prediction models that were designed for either comparative or clinical purposes perform poorly. Although in certain settings such models may prove useful, efforts to improve their performance are needed as use becomes more widespread."

Thus even when you have done the risk analysis and implemented highly effective interventions in a carefully targeted population, you might not see the desired result – and that might be attributable to random variability or to a lack of effect of the intervention.

See appendix for the glorious technical detail on accuracy.

2c Which tool should I pick to stratify or segment?

There is no "best" approach to predicting which patients (at individual or small-cohort level) are most likely to be admitted or readmitted, or to have some other bad thing happen.

Various tools exist. PARR++ (Patients at Risk of Readmission) and the CPM (Combined Predictive Model) seem most used. The predictive performance of most of the commonly used models is pretty much the same. They are data-hungry, made up of both data and software to use the data and splurge out results.

2d Comparing tools

Be mindful that the eFI, PARR, CPM or whatever are predicting different things. You have to think this through quite hard.

You can't do a robust comparison unless you are going to test the sensitivity and specificity of the models against each other, being mindful of the above statement!

Nobody will have done this research. You are essentially asking how the risk profile predicted by model X compares with what actually happens in reality, and then how that comparison stacks up against the same for model Y and model Z. That is complex research, and nobody has done it.

The science we have – see especially the King's Fund 2006 paper and the Billings et al paper – clearly tells us that predictive power is a function of the threshold used (i.e. the risk score / what % of the top x% are "in") and that there isn't a huge amount to choose between different models.
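A quick synthetic illustration of that threshold dependence: as you widen the "high risk" cut from the top 0.5% to the top 20% of scores, the PPV falls. The score-to-outcome relationship below is invented purely to show the shape of the effect.

```python
# PPV as a function of the risk-score threshold. Illustrative data only.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
risk_scores = rng.random(n)
admitted = rng.random(n) < risk_scores ** 4  # invented score-outcome link

for top_pct in (0.5, 1, 5, 20):
    cut = np.quantile(risk_scores, 1 - top_pct / 100)
    flagged = risk_scores >= cut
    ppv = admitted[flagged].mean()  # TP / (TP + FP)
    print(f"top {top_pct}% of scores: PPV = {ppv:.2f}")
```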

The science also tells us that clinical judgement adds a little, and that there aren't huge differences between clinical judgement and "test" performance in terms of correct predictions.

You may get into the business of sensitivity/specificity trade-offs, practical data considerations, predictive power and a host of other issues.

Thus consider:

• what is cheapest

• what are you trying to predict or segment on the basis of?

• what is easiest to run and use

• what is filled with the most reliable and valid data (on the hypothesis that if a model is filled with rubbish data, then rubbish will be splurged out the other end)

2e Other means of framing, segmenting and targeting care toward cohorts definitely warrant serious consideration. Don’t just focus on “the data” in a predictive model to segment. Psychological and social issues also matter.

• Use a biopsychosocial segmentation – use a combination of data on e.g. number of LTCs, disease control/complexity (utilisation or some proxy for complexity of disease), some measure of deteriorating outcomes despite optimal care, and professional judgement. All of the above is quite tricky, but necessary. It would certainly be beneficial to add in deprivation/socioeconomic status, social isolation (distinguish isolation from loneliness and living alone – distinct concepts that need further thought), resilience (again, tricky), and functional ability. A minimal sketch of this idea follows after this list.

• You could stratify further using a systematic, replicable model of assessing patient capability, engagement, etc., such as activation levels (i.e. the four-quadrant model), to identify who to more assertively "go after" to get better outcomes (i.e. the lower quartile) – do we have a role in suggesting the best models to use? This type of thinking can also highlight areas for a lighter touch, reduction in intervention, etc.
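Here is a minimal sketch of what a biopsychosocial composite might look like in code. Every column name, weight and cut point is a hypothetical assumption to illustrate the idea, not a validated index.

```python
# Hypothetical biopsychosocial segmentation: combine clinical, social and
# functional markers into a crude composite score, then band it.
import pandas as pd

people = pd.DataFrame({
    "ltc_count":            [0, 2, 4, 1],
    "deprivation_quintile": [1, 3, 5, 2],          # 5 = most deprived (assumption)
    "socially_isolated":    [False, True, True, False],
    "functional_limit":     [False, False, True, False],
})

# Invented weights; a real index would need validation.
score = (
    people["ltc_count"]
    + people["deprivation_quintile"]
    + 2 * people["socially_isolated"]
    + 2 * people["functional_limit"]
)
people["segment"] = pd.cut(score, bins=[-1, 3, 7, 100], labels=["low", "medium", "high"])
print(people)
```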

Other considerations might include:

• Predicting "patient activation" or "co-operability" – the aim being to concentrate resources on those people most likely to participate in and respond to upstream care.

• Patient characteristics – giving less priority to, or excluding, patients with attributes suggestive of likely noncompliance. Which characteristics? Mental health diagnoses (schizophrenia, depression, dementia, or learning difficulties), addictions, and social factors (language barrier, housing problems, or being a single parent). There are difficult equality issues here.

• Previous noncompliance – giving less priority to patients whose administrative data indicate that they previously had not complied with a particular treatment. Patients who attended a weight-loss clinic but whose subsequent data showed they remained overweight, or who had not filled all their prescriptions or attended all their follow-ups, might be excluded from upstream care.

• "Receptivity" – the aim here is to forecast which approach to preventive care is most likely to work best for each patient.

• Channels – brochure versus email versus telephone call; the best messenger (male versus female nurse, older versus younger health coach); the timing and frequency of the message.

• Interventions matched to stages of change, etc. See Lewis.

3 Other important concepts inherent in using data based tools to segment, stratify and manage risk in populations

Here are some tricky considerations to attend to:

3a Regression to the mean (taken directly from the Nuffield 2011 briefing)

• The phenomenon of ‘regression to the mean’ occurs whenever something is measured once and then measured again later. Observations made at the extreme the first time round will tend to come back to the population average the second time round. For example, the warmest place in the UK today is more likely to be relatively cooler tomorrow than warmer.

• So, when we look at which people are having frequent hospital admissions at the moment, on average these individuals will have lower rates of unplanned hospital admission in the future even without intervention. This point is very important. If you ask a community matron to work with patients who are currently having frequent hospital admissions, the community matron may notice how the patient has fewer admissions over time. However, this reduction might well have occurred anyway due to regression to the mean, and it cannot necessarily be attributed to the input of the community matron.

• Why does regression to the mean occur? Simply because after one extreme event, the next event is statistically likely to be less extreme. The simulation sketch below illustrates this.
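A small simulation makes the point concrete: select people on a high first-year admission count, and the same people's second-year counts fall with no intervention at all. The rates and the "frequent admitter" threshold below are illustrative assumptions.

```python
# Regression to the mean: no intervention, yet the selected group's
# admission counts fall in year 2. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
true_rate = rng.gamma(shape=1.0, scale=1.0, size=n)  # stable per-person admission rate
year1 = rng.poisson(true_rate)                       # observed admissions, year 1
year2 = rng.poisson(true_rate)                       # observed admissions, year 2

frequent = year1 >= 3  # "frequent admitters" selected on year-1 data alone
print("year 1 mean admissions (selected group):", year1[frequent].mean())
print("year 2 mean admissions (same people):  ", year2[frequent].mean())  # noticeably lower
```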

3b Doesn’t take away clinical judgement

• Segmentation and stratification, especially using predictive algorithms, don't take away clinical judgement.

• Mathematical models won't replace clinician judgement about "the patients they worry most about".

• They can, however, shortcut a large chunk of work and be a useful tool for population-level management of risk.

• There may be recent events that modify the risk of an event that the model might not pick up.

• There may be social issues that cannot be factored into a risk score, or drug interactions, etc. These will not be taken into account: the algorithm and regression, and thus the risk score, are built on diagnoses and utilisation.

• It is better to understand the weaknesses of the model (with all its known imperfections) and then add in a judgement based on what a clinician knows about the circumstances of the patient, plus other clinical characteristics that the model might not have taken into account, to ensure more sophisticated management of the patient.

• So maths can give a decent amount of guidance for managing risk at population level. It isn't perfect, and can be made considerably better by input from clinicians who know the patients. The point is that, at individual level, readmissions or primary non-elective admissions are not that common and are therefore hard to reliably predict. Simple statistics.

3c Ecological fallacy – look it up!

The risk score is about population-based phenomena. The score applies to a population of patients with similar risk.

It cannot tell you that the event WILL or WILL NOT happen for an individual patient! So don't think it can predict the future for an individual patient.

3d Prevention paradox. DON’T only focus on “high risk”

• The relative risk of admission is highest in the high-risk group (obviously!) of frequently admitted patients, but they are fewer in number.

• The absolute number of admissions is by far the higher in the low-risk group – a lower risk of an event, but a far greater population size.

Martin Roland's uber-classic is ABSOLUTELY ESSENTIAL – a focus on medium and high risk is warranted, but don't take your eye off the ball and neglect the low risk, as that is where the volume is. If you want to reduce the overall event rate by, say, 30% and you focus only on those at highest risk, you would need to reduce the event rate in that population by 160% – clearly mathematically and clinically impossible. Thus you also need to aim to shift the event rate in the whole population by a small amount. Reducing risk in a small number of patients, even if they are at high individual relative and absolute risk of an event, will have limited population benefit. See Roland, and the worked sketch at the end of this section.

• If you want the theory underpinning this, Google "Rose prevention paradox". However, it's not quite that simple: Rose assumes risk is normally distributed, and it is complicated by the diffusion of risk across a population. Population risk is diffused when individuals in a population share a similar baseline risk; correspondingly, population risk is concentrated when risk varies considerably among individuals. When risk is concentrated, a small proportion of the population – those at the highest risk – bears a large proportion of the overall population risk. See Manuel (2012): "Too often, advocates for a particular population health strategy quote Rose's principle that 'shifting the curve is the best approach' without his required caveat, 'when risk is diffused in the population'. Too often, we assume that risk is widely distributed without actually assessing it, let alone using an appropriately discriminating risk assessment method such as multivariate risk algorithms."
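A worked sketch of the arithmetic behind the Roland point, using invented population sizes and admission rates: the high-risk stratum has by far the highest rate per person, yet contributes the smallest share of total admissions.

```python
# Illustrative only: invented strata sizes and admission rates.
strata = {
    #          (population, admissions per person per year)
    "high":   (  5_000, 1.20),
    "medium": ( 45_000, 0.30),
    "low":    (450_000, 0.05),
}

total = sum(pop * rate for pop, rate in strata.values())
for name, (pop, rate) in strata.items():
    admits = pop * rate
    print(f"{name:>6}: {admits:8.0f} admissions ({admits / total:.0%} of total)")
# high contributes ~14% of admissions; low contributes ~54%.
```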

3e Spend some time understanding the data and planning population care models.

• Impactability and interventions: what can the team you have actually do, intervention-wise, for different segments to manage risk in that population segment?

• You need to spend a bit of time understanding the data and the epidemiology split by risk-score strata. How to define those strata remains a moot point.

• What are the common diagnoses, age profiles, gender split? What proportion of the total number of events sits within which stratum?

• How do you define "high risk", "medium risk", etc., or however else you segment?

• Impactability. See the Milbank Quarterly piece by Geraint Lewis: "Impactibility models may refine the output of predictive models by (1) giving priority to patients with diseases that are particularly amenable to preventive care; (2) excluding patients who are least likely to respond to preventive care; or (3) identifying the form of preventive care best matched to each patient's characteristics.

• Conclusions: Impactibility models could improve the efficiency of hospital-avoidance programs, but they have important implications for equity and access." See also Steventon & Billings.

We must attend to the harm done by overdiagnosis and overtreatment. There are many drivers of this, but it certainly exists and must be part of the mix of solutions. One obvious example is addressing polypharmacy.

4 Care models

I don’t dwell on this here. Not the purpose of the blog.

The Commonwealth Fund consistently publishes excellent material on models of care for high-need cohorts.

Don’t over focus on structure.

• There is deeply entrenched thinking that the solution = structural integration

• The solution is NOT the current model

• With regard to health and care, we need the right mix of generalist & specialist, clinical & social

• A suitable means of segmenting a population

• An approach that focuses both on optimising the care of the "high risk" (however that's defined) and on a population-wide approach focused on individuals and social context. That's not an either/or; both are needed

• Attention is needed to supply-side considerations, while ensuring we don't neglect the demand side (i.e. primary care and social care)

• An approach focused on the supply side and managing illness will likely end in bigger hospitals.

• Illness and wellness approaches are needed. We know we usually neglect one in favour of the other, for all sorts of reasons.

Some considerations

• Do you give priority to ambulatory care-sensitive conditions?

• Or acute conditions that can be managed in primary care… but can you predict these in advance?

• Or gaps in care… patients not in receipt of high-value interventions?

• Excluding patients who are unlikely to respond to preventive care – difficult!

Key questions around frailty as an organising concept.

• In the local population, who has overall responsibility for:

• Promoting frailty as a condition for which targeted interventions must be planned and delivered?

• Identifying individuals living with frailty?

• Planning care models to address key stages of frailty (pre/early, moderate or severe)?

• Identifying and reporting on measurable positive and negative frailty associated outcomes?

• Quality assurance and value for money of frailty care?

• Getting best value for money from the investment by caring agencies re frailty?

• How do we do the right thing for the patient and at the same time recognise that costs shift from health to social care?

Appendix

Technical stuff on accuracy of predictive models

Geeky detail

1) Overview

They are basically regression models: patient-level data on a range of parameters is linked to an "event" (admission). The regression can then be used to predict the likelihood of admission for a cohort of patients displaying the same parameters/characteristics as in the original analysis.

Even in the frequent-flyer group (apart perhaps from the very, very frequent flyers) there will be high variability (over time and between people within the cohort) in admission event counts (and thus rates).

It is an event that is difficult to predict in advance based on the routinely available data – back to the point about tools not supplanting a clinician's judgement. There are some obvious and well-documented weaknesses. See Kansagara et al: "most current readmission risk prediction models that were designed for either comparative or clinical purposes perform poorly. Although in certain settings such models may prove useful, efforts to improve their performance are needed as use becomes more widespread."

Thus even when you have done the risk analysis and implemented highly effective interventions in a carefully targeted population, you might not see the desired result – and that might be attributable to random variability or to a lack of effect of the intervention.

2) The Nuffield Trust briefing on choosing predictive models (2011) is background reading to all this.

It especially explains some of the key concepts involved in this area.

Metrics used to determine predictive accuracy (lifted directly):

“R-squared – a commonly used statistical term that measures the explanatory power of a model. Values range from 0 to 1, generally the higher the better. As such, it provides an overall measure of how well the model predicts future outcomes.

Positive predictive value (PPV) – for any given predictive risk score threshold, this is the proportion of patients who are identified by the model as being ‘high risk’ that will truly experience the outcome being predicted. As such, it is a particularly useful metric when determining a business case for a preventive intervention. A high PPV means that a high proportion of the patients being offered the intervention would, without intervention, have experienced the costly adverse outcome that the intervention seeks to prevent. In contrast, with a lower PPV, many of the patients identified by the model would not have experienced the outcome in any case, and so in this sense the intervention is ‘wasted’ on these individuals. Of course, there may still be good reasons for offering a preventive intervention to people where the PPV is low, but it is important that the cost of the intervention should also be relatively low otherwise it will be impossible for the intervention to break even.

Sensitivity – for any given risk score threshold, this is the proportion of the population who will experience the outcome of interest that the model successfully identifies. For example, a model might have a sensitivity of 40 per cent for a risk score threshold of 35. This means that if an intervention is offered to every person with a risk score of 35 or above, then 40 per cent of the people in the population who would be having an unplanned hospital admission next year will now be offered the intervention.

Specificity and negative predictive value (NPV) – these two metrics relate to the ability of the model to predict which patients will not have a future unplanned admission. Specificity and NPV are analogous to the sensitivity and PPV, respectively. However, because the vast majority of the population will not have an unplanned hospital admission in the next 12 months, the specificity and NPV are not particularly useful metrics in this context.

C-statistic – also known as the ‘area under the curve’ or AUC, this is the area under the receiver operating characteristics (ROC) curve, which displays the trade-off between sensitivity and specificity for a predictive model. The area under the ROC curve is an aggregate number that reflects the distribution of sensitivities and specificities across all risk scores. Like the r-squared, the c-statistic is useful because it allows comparisons of different models based on a single number. However, it is not very intuitive and, in reality, commissioners may only be interested in a certain portion of the ROC curve, rather than the average which the c-statistic reflects.”
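To ground those definitions, here is a minimal sketch computing PPV, sensitivity, specificity, NPV and the c-statistic for a set of risk scores at a chosen threshold. The scores and outcomes are synthetic, invented for illustration.

```python
# Compute the Nuffield-quoted test-performance metrics on synthetic data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 10_000
scores = rng.random(n)
admitted = rng.random(n) < scores ** 3         # invented ground truth

flagged = scores >= np.quantile(scores, 0.95)  # treat top 5% as "high risk"
tp = (flagged & admitted).sum()
fp = (flagged & ~admitted).sum()
fn = (~flagged & admitted).sum()
tn = (~flagged & ~admitted).sum()

print("PPV        :", tp / (tp + fp))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("NPV        :", tn / (tn + fn))
print("c-statistic:", roc_auc_score(admitted, scores))  # area under the ROC curve
```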

3) Test performance

I've often seen people suggesting that these models have predictive accuracy (PPV) of 90%. I don't necessarily buy this. The science we have – see especially the King's Fund 2006 paper and the Billings et al paper – clearly tells us that predictive power is a function of the threshold used (i.e. the risk score / what % of the top x% are "in") and that there isn't a huge amount to choose between different models. The science also tells us that clinical judgement adds a little, and that there aren't huge differences between clinical judgement and "test" performance in terms of correct predictions.

So depending on what the threshold is, it's more towards 50–70% (one might characterise this as a bit better than coin tossing?).

King's Fund 2006 – especially on test performance & predictive ACCURACY.

The Combined Model is also effective in identifying patients in the very high and high-risk segments of the population.

This can give a guide to opportunities for intervention.

In the highest-risk segments, where the most intensive outreach such as case-management interventions will be targeted, the Combined Model improves predictive performance over the PARR (i.e. PARR2) model for the same populations.

Figure 2 below shows the positive predictive value (PPV) for different cuts of population size identified by either the Combined or the PARR model.

[Figure 2: PPV by size of population cut, Combined Model versus PARR – image not reproduced here]

Interpretation of this:

This figure indicates, within each segment, how many patients predicted to have an emergency admission actually had one in the year following prediction. For example, 586 of the top 1,000 patients ranked by the Combined Model actually had an emergency admission in the year following prediction (a PPV of 58.6%), compared with 505 of the top 1,000 PARR patients (a PPV of 50.5%).

Figure 3 – GP data adds a little to this

Billings 2006 is a critical paper

Choosing a model to predict hospital admission: an observational study of new variants of predictive models for case finding. There were 1,836,099 people aged 18 and over who were registered with a GP practice on 31 July 2009. Table 1 shows the combined results of individual site regressions, including the number of patients correctly identified, PPV and sensitivity for four models:

1. IP, based on hospital inpatient data only (including day cases and regular attendances);

2. IPAE, using inpatient and A&E data;

3. IPAEOP, using inpatient, A&E and outpatient data;

4. IPAEOPGP, using inpatient, A&E, outpatient data and the GP electronic medical record.

Live, local accuracy test

An old colleague once did a detailed analysis and verification of the Combined Predictive Model as it was in use at the time (2013 and 2014) in the Bradford cohort (c. 535k patients).

Results here. (Definitions below; a worked sketch of the formulas follows the list.)

• True positive (TP) – patients given a high risk score (as defined by % or threshold) who have had an emergency IP admission

• False positive (FP) – patients given a high risk score (as above) who have not had an emergency IP admission

• False negative (FN) – patients not given a high risk score who have had an emergency IP admission

• Positive Predictive Value (PPV) = TP/(TP+FP)

• Sensitivity = TP/(TP+FN)
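Applying those formulas, a minimal worked sketch (the counts are invented placeholders, not the Bradford results):

```python
# Worked example of the PPV and sensitivity formulas above.
tp = 400    # high risk score AND emergency IP admission
fp = 600    # high risk score, no admission
fn = 1_600  # admission, but not flagged as high risk

ppv = tp / (tp + fp)          # 400 / 1000 = 0.40
sensitivity = tp / (tp + fn)  # 400 / 2000 = 0.20
print(f"PPV = {ppv:.2f}, sensitivity = {sensitivity:.2f}")
```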

From the figures above, it appears that the PPV and sensitivity for the top 1% of patients on the model are higher than expected for the CPM in the Nuffield report (0.405 and 0.060 in Table 3 of that report) and closer to the results of the other models discussed in the report.
