Best practice

Clinical decision rules: how to use them

B Phillips

Correspondence to Dr Bob Phillips, Centre for Reviews and Dissemination, and Hull-York Medical School, University of York YO10 5DD, UK; [email protected]

Accepted 16 February 2010

ABSTRACT
The first of this pair of papers outlined what a clinical decision rule is and how one should be created. This second paper examines how to use a rule: checking that it is likely to work (examining how it has been validated), understanding what the various numbers that tell us about "accuracy" mean, and considering some practical aspects of how clinicians think and work, in order to make that information usable on the front lines.

The first key steps in managing a patient's problem are accurately assessing the situation, making a clear diagnosis, understanding the psychological and social impact of the illness and negotiating an effective treatment plan. The first of these, arriving at an accurate diagnosis, can be more difficult than we would like to admit if we are to do it using high-quality evidence.

The first of this pair of papers [whatever ref] outlined what a clinical decision rule (CDR) is and how one should be created. This second paper examines how to use a rule: checking that it is likely to work (examining how it has been validated), understanding what the various numbers that tell us about "accuracy" mean, and considering some practical aspects of how clinicians think and work, in order to make that information usable on the front lines.

VALIDATING A CDR
Before even thinking about using a CDR, we should be fairly sure that it will work as we want it to. We would not use a drug that had just been developed on cell lines in our little patients, and we should not do the equivalent with diagnostic or prognostic information (before you start to argue about the relative dangers of toxic chemicals and information, reflect on how a false positive1 or false negative test result2 has influenced your own life or medical practice). We need the CDR to have been validated, and by "validated" I mean shown to discriminate accurately between the afflicted and the untouched, and to give accurate estimates of the proportion of patients with the affliction in each category. If we are using a "limping child" rule in the emergency department to guide our referrals to orthopaedics and early discharges, we want those with none, one, two, three or four predictors to have a meaningfully different risk of septic arthritis (ie, the rule discriminates). We also want the no-predictor group to have a really low proportion with septic arthritis, ideally zero, and the four-predictor children to have a high proportion, touching 95% or more (ie, the rule retains accuracy).

It is almost always the case that the rule does not work quite as well in the validation set of patients. This is not unexpected: the rule is derived from one particular group of patients' data, and another dataset will be similar but not identical. There are a variety of modifications or recalibrations that can be undertaken to fine-tune a rule, and the modified rule can then be used more successfully in ongoing care.3 What frequently happens is that the group that has developed a CDR goes on to test that it really works.
This "temporal validation" provides some evidence that the rule really does work, and often allows the rule to be optimised, adjusting for the overenthusiastic, data-driven estimates it initially produced. Various approaches, from the complex to the straightforward, have been used to do this successfully.4 5 The problem that remains is that the CDR might work in SuperHospital-W, where Dr X can train her residents to know exactly what point #2 of the CDR really means and what a positive or negative looks like, but can it work in SuperHospital-Y, where the physicians are restricted to using the data from the journal article?4 For some CDRs, where the basis of the rule is quality-assured laboratory results and proven-to-be-reproducible clinical signs or symptoms, this may be easy. For other rules, it may not be so effective. (As a quick aside, it is often helpful to assess just how reproducible measurements are. There is a host of issues around this; a very readable series of BMJ "Statistics Notes" by Profs Bland and Altman published in 1996,5-7 and their textbooks of practical medical statistics,8 9 are accessible reads.)

This next level of validation, seeing whether the limping-child rule is rooted in only one hospital, looks at CDR use in different physical locations but similar clinical settings, and has been called geographical validation. Potentially, you might also want to see whether the rule works across different clinical settings (domain validation), for example in the emergency department of secondary-care hospitals and in walk-in primary-care clinics,10 but this may be irrelevant if you are working in the same setting in which the rule was developed.

The final step should be to demonstrate efficacy in routine practice with multi-site randomised controlled trials.11 Such a trial does not seek to show that the rule differentiates people or that it can be applied; instead, it randomises between patients who have the rule used and those who do not, and captures key patient- or health-system-important outcomes: length of hospital stay, invasive testing or quality of life. This is hardly ever undertaken but can be a very powerful way of showing improved care with a diagnostic intervention of any sort.12

UNDERSTANDING ASSESSMENTS OF ACCURACY
Assuming you have found a rule that has been validated in some way (accepting that it is unlikely to have gone through all these steps), the next thing you will want to understand is "how good is the rule anyway?" The information that a rule presents us with is often couched as a "result": the rule is positive or negative (eg, the Ottawa ankle rules), or the result is a value (eg, the neonatal outcome measure of the CRIB score). These results are then described in the papers that examine them in terms of their diagnostic accuracy, using phrases that induce fear and occasionally nausea: sensitivity, specificity, likelihood ratios (LR) and predictive values.

Sensitivity and specificity
In its simplest form, a rule produces a positive or negative result. The group under study (take an emergency room setting for children presenting with fever) can be thought of as coming from two populations: those with the "disease", for example pneumonia, and those without it.
The proposed way to distinguish between these groups may be an invented blood test result, the "lung injury protein" (LIP), which goes up if you have a chest infection. The test is "positive" if the value is greater than a defined "threshold" (see fig 1).

Figure 1 "Lung injury protein" diagnostic test example 1.

The threshold has part of each population on either side of it: the proportion of people with pneumonia who have a positive test is the sensitivity, and the proportion of people without pneumonia who have a negative test is the specificity. Shifting where the threshold is drawn alters these proportions. As fig 2 shows, pushing the line for "positivity" upwards makes the test more specific (it captures fewer people without the disease) but less sensitive (it fails to diagnose a greater number with the disease). The reverse is true when the level for a positive result is reduced. The situation has an added level of complexity with three levels of test result (eg, low, medium, high), but the core concept remains the same.

Figure 2 "Lung injury protein" diagnostic test example 2: different cut-offs.

For clinical application, the important values are not the proportion of people with pneumonia who have a positive test (sensitivity) and the proportion of people without pneumonia who have a negative test (specificity) but, instead, the proportion of people with a negative test result who truly did not have pneumonia (the negative predictive value (NPV)) and, conversely, the proportion of people with a positive test who did have pneumonia (the positive predictive value (PPV)).

So why do researchers continue to emphasise sensitivity and specificity? You might be tempted to think, as with medieval priests chanting Latin, that this is to keep the information cloaked from the common doctor. You would not be entirely incorrect. However, there is a better reason. The clinically useful NPV and PPV are highly dependent on the prevalence of the condition in the population (also called the "baseline risk"), unlike the sensitivity and specificity, which, although they are shop-floor useless, do not depend on that value (generally; if you take a completely different type of population, for example using an adult thrombosis-detection CDR in the NICU, you are never going to get the same results).

Take the example of our invented "lung injury protein" test. Figure 3 demonstrates how the positive and negative predictive values vary under different prevalences of disease (in the central panel, at 10% prevalence, the positive test results are made up of seven children with pneumonia, the red squares, and 23 without, the blue squares; the negative test results are the three pink "missed" cases of pneumonia and 67 who really do not have pneumonia). The chance that a positive test is "correct" (the PPV) ranges from 13% to 55% depending on how prevalent pneumonia is in the population under study.

Figure 3 Effect of varying prevalence on the PPV and NPV of "lung injury protein" with a sensitivity of 72% and a specificity of 74.8%. NPV, negative predictive value; PPV, positive predictive value.

The upshot of this is that the direct translation of PPV and NPV from studies into clinical practice can only be undertaken if the prevalence of the condition is the same in your population as in the study. If it is not, you need a little bit of calculation to work out what the values will be in your practice and whether this would still be a viable rule.
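If you want to do that little bit of calculation, the minimal sketch below (in Python, which the article itself does not use) works through it. It assumes the LIP figures from fig 3 (sensitivity 72%, specificity 74.8%); only the 10% prevalence of the central panel is stated above, so the 5% and 30% values are assumptions chosen to illustrate the roughly 13% to 55% spread in PPV described in the text.

def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test with the given sensitivity,
    specificity and disease prevalence (all as proportions, 0-1)."""
    true_pos = sensitivity * prevalence                # diseased, test positive
    false_pos = (1 - specificity) * (1 - prevalence)   # healthy, test positive
    false_neg = (1 - sensitivity) * prevalence         # diseased, test negative
    true_neg = specificity * (1 - prevalence)          # healthy, test negative
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# "Lung injury protein" at the 93 cut-off: sensitivity 72%, specificity 74.8%.
# The 5%, 10% and 30% prevalences are illustrative assumptions.
for prevalence in (0.05, 0.10, 0.30):
    ppv, npv = predictive_values(0.72, 0.748, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")

Plugging your own clinic's prevalence into the same function shows whether the rule would remain viable where you work.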
ANOTHER APPROACH: LIKELIHOOD RATIOS
An alternative approach, which captures the idea of "what a positive/negative test means" but does not rely on population prevalence, is the use of LRs. These values compare the proportion of patients with the disease and the proportion without the disease for a given test result. In the LIP example where the cut-off is 93 (fig 1; sensitivity 72.0%, specificity 74.8%), these values would be LR+ (the LR for a positive test) = sensitivity/(100 - specificity) = 72.0/25.2, which is roughly 2.9, and LR- (the LR for a negative test) = (100 - sensitivity)/specificity = 28.0/74.8, roughly 0.38. Such values can be calculated for each level of a test result and so are useful for multi-level as well as dichotomous results.

As these are ratios, it is the numbers further away from one that are more useful, either in making (above one) or ruling out (below one) a diagnosis. It can be easier to think of LR- as a fraction rather than a decimal (eg, not 0.01 but 1/100th). The interpretation can be done arithmetically, with a nomogram, or with a rule of thumb:
▶ LR of >10 or <0.1 (1/10th): a big push towards or away from the diagnosis.
▶ LR of >5 or <0.2 (1/5th): a smaller push towards or away from the diagnosis.
▶ LR of >2 or <0.5 (1/2): a nudge towards or away from the diagnosis.
▶ LR closer to 1 than 2 or 0.5: very little diagnostic usefulness.

Table 1 Accuracy measures (sensitivity, specificity and likelihood ratios) for "LIP" values to detect imaginary pneumonia
Test cut-off   Sensitivity (%)   Specificity (%)   LR+    LR-
73             98.8              15.9              1.17   0.11
93             72.0              74.8              2.99   0.38
109            22.7              97.7              9.87   0.79

To compare the usefulness of these measures, have a look at tables 1 and 2. The tables show that with increasingly high cut-offs, the value of a positive test in diagnosing pneumonia improves, but a negative test becomes less useful in ruling out the problem. It is up to you which you find easiest to interpret.
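If you do opt for the arithmetic rather than the nomogram, the short sketch below (Python again, purely illustrative) turns sensitivity and specificity into LRs and then applies an LR to a pre-test probability via odds. The 10% pre-test probability is an assumed figure matching the central panel of fig 3; reassuringly, applying LR+ to it gives back essentially the same probability of pneumonia (about 24%) as the PPV route above.

def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity (proportions, 0-1)."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)

# LIP at the 93 cut-off: sensitivity 72%, specificity 74.8% (table 1).
lr_pos, lr_neg = likelihood_ratios(0.72, 0.748)
print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")

# Assumed pre-test probability (prevalence) of 10%, as in the central panel of fig 3.
print(f"after a positive test: {post_test_probability(0.10, lr_pos):.0%}")
print(f"after a negative test: {post_test_probability(0.10, lr_neg):.0%}")

The detour through odds is the point of the exercise: LRs multiply odds, not probabilities, which is why the conversion in and out of odds is needed.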
THE PSYCHOLOGY OF DIAGNOSING
Understanding how we physicians use a CDR, or any diagnostic information, requires a bit of knowledge about how doctors think. A number of researchers from a range of backgrounds have examined the diagnostic practices of medics. There is a wealth of research showing that we do not generally "take a history and do a physical examination" before diagnosing something13 14 and that we use a massive array of mental shortcuts (heuristics)14 15 which speed up our working practice yet can lead us into dangerously wrong conclusions.

In fitting CDRs into the practice of managing a clinical problem, it can be useful to take one straightforward model of how doctors make a diagnosis. The diagnostic process can be thought of in three stages: initiation, refinement and conclusion.16 The initiation stage is where a diagnosis, or differential, begins to be considered. Refinement is a working through of the differentials, and the conclusion is the point where a decision to act has been made. For most situations, initiation is the germ of a diagnostic idea: the acknowledgement that the feverish, crying baby presenting in the emergency department might have a number of causes for their illness, including bacterial meningitis. This first inkling requires further work: the refinement process. It is in this process that CDRs can help guide clinicians and lead to a diagnostic conclusion. Other approaches include a formal Bayesian analysis, pattern fitting or a stepwise rule-out of significant serious diagnoses. The conclusion may be an actual diagnosis ("the child has pneumonia"), a rule-out ("the baby doesn't have bacterial meningitis") or a decision not to come to a conclusion ("we're still not sure why your little one is poorly, and need to undertake a few more tests").

An alternative model of diagnosis, seen in situations such as meeting a toddler with trisomy 21, has a single, eye-blink-fast step in it. You see the child, and you have made the diagnosis of Down's syndrome before a thought process can be recognised. This is a pretty universal paediatric example, but there is work to suggest that the more expert a physician is in a particular area, the more rapidly they come to a diagnosis, and that this is in part because of pattern recognition or fitting new cases to a mental "categorisation".15 In these settings, CDRs can struggle to get a (mental) look-in to the process. Yet another set of theories ("dual process theory") suggests that the stepwise, rational approach and the rapid, heuristic approach can happen at the same time. Some clinicians favour the "rationalising" approach, others the "clinical intuition" approach, but both may be occurring. If there is a conflict, for example between what the CDR suggests and what "intuition" delivers, the clinician's habitual tendency may tip the balance of the thinking.17

CDRs should help our thinking by providing good-quality guidance to avoid diagnostic errors and minimise unnecessary tests. Commonly, diagnostic errors occur because of systems failures and cognitive errors.18 19 Such errors include a failure to synthesise diagnostic information correctly, coming to a premature diagnostic conclusion, a lack of appreciation of the value of a sign or symptom in making a diagnosis, or exaggeration of the accuracy of a test finding. Other reasons for misdiagnosis, such as the true diagnosis being rare, or a failure of the individual doctor's technical skill in reading an x-ray or eliciting a physical sign, would not be helped by the use of CDRs.

USING CDRS IN PRACTICE
There remains, however, an almost emotional difficulty in turning to a CDR when, instead, we feel we should be like House,20 Holmes21 or Thorndyke22 and make our diagnoses from skill and knowledge. This is despite the knowledge that in many cases a CDR performs better than our in-built doctoring.23-26 It may be that we are unnerved by now knowing that there is a 2% chance of septic arthritis, rather than being safe in the ignorance that "children like that one" in "my clinical experience" do not have an infected joint. Chance alone could mean you send home ~150 children and not have one return to the orthopods, despite that group having a 2% chance of infection (it also raises questions about what a diagnosis really is: "the truth" or just a high enough probability of "the truth").
There may be other issues, such as an objection to reduced professional autonomy, a perception of impracticality or a belief that the rule does not fit the population we are dealing with.27 It might simply reflect that we are the sorts of people who prefer working with the heuristics of medical experience that we have built up over time.17 This question, of why we do not follow where the best evidence should steer us, is a matter of ongoing debate and research.28

On a positive note, the most effective uses of CDRs seem to have been where the rule has a clear clinical usefulness, it is championed by well-respected local clinicians and it makes life better for patients and clinicians.29 30 Many of these features can be picked up and put in place when you are trying to promote the use of a CDR. You will have decided, from an analysis based on the content of these two articles, whether the rule is technically good and likely to be true. You can assess how much of an impact it should make on your service, and you will be well placed to identify the key players who need to be "on side" for the rule to be used. (Who will it benefit? Let them know, and let them know how it will be good for them! Who will it disadvantage, with more work or nastier testing? Use case stories and hard statistics together to show how, overall, it makes things better for the patients, who are the key concern for us all.)

CONCLUSIONS
CDRs are potentially a way of making our diagnoses more accurate and/or quicker and less unpleasant to achieve. They have not always done this,31 and their creation and testing are a potential minefield, as these articles have examined. The essential principles that should guide us in our clinical work, when seeking to use a CDR, remain the same as in all other areas: we should seek to use evidence that is valid, important and makes a difference to patients. After all, that is why we became paediatricians, is it not?

Acknowledgements The author thanks Professor Lesley Stewart for her helpful comments on an earlier draft of this manuscript.

Funding This work was funded as part of an MRC Research Training Fellowship.

Competing interests None.

Provenance and peer review Commissioned; externally peer reviewed.

REFERENCES
1. Gigerenzer G. Reckoning with Risk: Learning to Live with Uncertainty. London (UK): Penguin Press, 2003.
2. Wardle FJ, Collins W, Pernet AL, et al. Psychological impact of screening for familial ovarian cancer. J Natl Cancer Inst 1993;85:653–7.
3. Janssen KJ, Moons KG, Kalkman CJ, et al. Updating methods improved the performance of a clinical prediction model in new patients. J Clin Epidemiol 2008;61:76–86.
4. Goodacre S, Sutton AJ, Sampson FC. Meta-analysis: the value of clinical assessment in the diagnosis of deep venous thrombosis. Ann Intern Med 2005;143:129–39.
5. Bland JM, Altman DG. Measurement error. BMJ 1996;313:744.
6. Bland JM, Altman DG. Measurement error and correlation coefficients. BMJ 1996;313:41–2.
7. Bland JM, Altman DG. Measurement error proportional to the mean. BMJ 1996;313:106.
8. Altman DG. Practical Statistics for Medical Research. London (UK): Chapman & Hall, 1990.
9. Bland JM. An Introduction to Medical Statistics. Oxford (UK): Oxford University Press, 2000.
10. Toll DB, Janssen KJ, Vergouwe Y, et al. Validation, updating and impact of clinical prediction rules: a review. J Clin Epidemiol 2008;61:1085–94.
11. Selby PJ, Powles RL, Easton D, et al. The prophylactic role of intravenous and long-term oral acyclovir after allogeneic bone marrow transplantation. Br J Cancer 1989;59:434–8.
12. Lee MT, Piomelli S, Granger S, et al.; STOP Study Investigators. Stroke Prevention Trial in Sickle Cell Anemia (STOP): extended follow-up and final results. Blood 2006;108:847–52.
13. Montgomery K. How Doctors Think: Clinical Judgment and the Practice of Medicine. New York (NY): Oxford University Press, 2005.
14. Norman G. Research in clinical reasoning: past history and current trends. Med Educ 2005;39:418–27.
15. Elstein AS, Schwartz A. Clinical problem solving and diagnostic decision making: selective review of the cognitive literature. BMJ 2002;324:729–32.
16. Heneghan C, Glasziou P, Thompson M, et al. Diagnostic strategies used in primary care. BMJ 2009;338:b946.
17. Sladek RM, Phillips PA, Bond MJ. Implementation science: a role for parallel dual processing models of reasoning? Implement Sci 2006;1:12.
18. Graber ML, Franklin N, Gordon R. Diagnostic error in internal medicine. Arch Intern Med 2005;165:1493–9.
19. Redelmeier DA. Improving patient care. The cognitive psychology of missed diagnoses. Ann Intern Med 2005;142:115–20.
20. Heel-and-Toe F. House M.D. http://www.fox.com/house//index/htm#home (accessed 4 May 2010).
21. Doyle AC. The Adventures of Sherlock Holmes. London: Penguin Classics, 2007.
22. Freeman RA. John Thorndyke's Cases, related by Christopher Jervis. E-published by Project Gutenberg, 2004. http://www.gutenberg.org/etext/13882.
23. Dawes RM, Faust D, Meehl PE. Clinical versus actuarial judgment. Science 1989;243:1668–74.
24. Grove WM, Zald DH, Lebow BS, et al. Clinical versus mechanical prediction: a meta-analysis. Psychol Assess 2000;12:19–30.
25. Quezada G, Sunderland T, Chan KW, et al. Medical and non-medical barriers to outpatient treatment of fever and neutropenia in children with cancer. Pediatr Blood Cancer 2007;48:273–7.
26. Lee TH, Pearson SD, Johnson PA, et al. Failure of information as an intervention to modify clinical management. A time-series trial in patients with acute chest pain. Ann Intern Med 1995;122:434–7.
27. Cabana MD, Rand CS, Powe NR, et al. Why don't physicians follow clinical practice guidelines? A framework for improvement. JAMA 1999;282:1458–65.
28. Richardson WS. We should overcome the barriers to evidence-based clinical diagnosis! J Clin Epidemiol 2007;60:217–27.
29. Bessen T, Clark R, Shakib S, et al. A multifaceted strategy for implementation of the Ottawa ankle rules in two emergency departments. BMJ 2009;339:b3056.
30. Gaddis GM, Greenwald P, Huckson S. Toward improved implementation of evidence-based clinical algorithms: clinical practice guidelines, clinical decision rules, and clinical pathways. Acad Emerg Med 2007;14:1015–22.
31. Roukema J, Steyerberg EW, van der Lei J, et al. Randomized trial of a clinical decision support system: impact on the management of children with fever without apparent source. J Am Med Inform Assoc 2008;15:107–13.