Health care data quality
Hospital discharge records: how to deal with data imperfections?

Contents
Part 1: Aspects of health care data quality
• Types of data: primary and secondary data
• Operational definition of data quality
• Quality criteria for patient data
• Error checks: credibility and consistency
• "Editing" and "cleaning"
• Pitfalls related to hospital discharge records
• Exercise: DRG creep: a new hospital-acquired disease
Part 2: Data quality in the MCD
• Quality of data in the MCD
• Data quality assurance MCD: two-step approach
• Quality of data versus quality of care: how to deal with data imperfections?

Health care data: a description
Scerbo M, Dickstein C, Wilson A. Health Care Data and the SAS System. Cary, NC: SAS Institute Inc.; 2001.

Questions
1. Do hospital discharge records, in this case our Minimal Clinical Data (MCD), nowadays renamed Minimal Hospital Data (MHD), meet the requirements of good-quality data?
2. If imperfections are found, how should one deal with them?

Types of data: primary and secondary data
A common classification is based on who collected the data.
• Primary data: data collected by the investigator himself/herself for a specific purpose.
  Examples: data collected by a student for his/her thesis or research project; data collected by a researcher for a randomised clinical trial.
• Secondary data: data collected by someone else for some other purpose, but used by the investigator for his/her own purpose.
  Examples: hospital discharge records, reimbursement claims (National Institute for Health and Disability Insurance, NIHDI).

Hospital discharge records
• Databases containing the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) coding of discharge diagnoses are used for a variety of purposes, including:
  ◦ reimbursement,
  ◦ budgetary planning,
  ◦ monitoring of clinical care activities,
  ◦ health services research,
  ◦ and development of clinical guidelines.
• As the scope of utilization of these data broadens, the importance of ICD-9-CM coding accuracy increases.
Petersen LA, Wright S, Normand SL, Daley J. Positive predictive value of the diagnosis of acute myocardial infarction in an administrative database. J Gen Intern Med 1999 Sep;14(9):555-8.

Some advantages of using primary data:
1. The investigator collects data specific to the problem under study.
2. There is no doubt about the quality of the data collected (for the investigator).
3. If required, it may be possible to obtain additional data during the study period.

Some disadvantages of using primary data (for reluctant/uninterested investigators):
1. The investigator has to contend with all the hassles of data collection:
  ◦ deciding why, what, how and when to collect
  ◦ getting the data collected (personally or through others)
  ◦ getting funding and dealing with funding agencies
  ◦ ethical considerations (consent, permissions, etc.)
2. Ensuring that the data collected are of a high standard:
  ◦ all desired data are obtained accurately, and in the format required
  ◦ unnecessary/useless data have not been included
3. The cost of obtaining the data is often the major expense in studies.

Some advantages of using secondary data:
• The data are already there: no hassles of data collection
• It is less expensive
• The investigator is not personally responsible for the quality of the data ("I didn't do it")

Some disadvantages of using secondary data:
• The investigator cannot decide what is collected (for instance, when specific data about something are required).
• One can only hope that the data are of good quality.
• Obtaining additional data (or even clarification) about something is most often not possible.

Operational definition of data quality
• Data quality can be defined as "fitness for use" (Tayi and Ballou): the precision required depends on the purpose.
• Examples:
  ◦ Main health problems of the country: an estimate of the order of magnitude suffices
  ◦ Evolution of these health problems: requires more precision
  ◦ Relative importance of subendocardial infarction within the group of ischaemic heart diseases: requires a high degree of precision
Tayi GK, Ballou DP. Examining Data Quality. Communications of the ACM 1998;41(2):54-7.
Wyatt JC, Sullivan F. What is health information? BMJ 2005 Sep 10;331(7516):566-8.

Error checks: credibility and consistency
• Credibility: e.g. a range test: does the observed value fall outside the prespecified range of plausible values?
• Consistency: e.g. comparison of age with the dates of birth and death, or of gender with certain pathologies.
• An error causes a conflict: type of cancer "unknown" versus type of differentiation "well differentiated".
• Nominal or ordinal variables: usually deterministic, e.g. an adenoacanthoma is always "well differentiated".
• Interval variables: usually probabilistic, e.g. in a "well differentiated" adenoacanthoma the shortest axis of the nucleus is between 2 and 12 µm. Smaller than 2 µm is impossible (deterministic) and larger than 12 µm is unlikely (probabilistic).
• Numerical and visual aids (frequency distributions, histograms, contingency tables, box plots and scatter plots) make it possible to detect outliers or systematic errors.

"Editing" and "cleaning"
• "Editing": the process whereby accuracy, consistency and completeness are verified.
• "Data cleaning" means either:
  1) the removal of incomplete records or records with inconsistent information, or
  2) the ongoing iterative process whereby, using logical and statistical checks, errors are detected, diagnosed, corrected or removed.
• "Cleaning", especially in the sense of removing erroneous data, has to be reported and should be founded on probabilities.

Pitfalls related to hospital discharge records such as our Minimal Clinical Data (MCD)
• There is a basic tension between using the same data for reimbursement and for measuring quality.
• When the use is reimbursement, there is a tendency to perform coding quickly and to maximize the coding of complications and co-morbidities.
• When the use is to assess quality, however, it is important for coders to have a complete record and to restrict diagnosis coding to conditions that affect patient care "in terms of requiring clinical evaluation; or therapeutic treatment; or diagnostic procedures; or extended length of hospital stay; or increased nursing care and/or monitoring."
• Diagnoses that "have no bearing on the current hospital stay" or represent "a routinely expected condition or occurrence" should not be coded.
AHRQ. AHRQuality Indicators e-Newsletter: AHRQ QI Tips: ICD-9-CM Coding Issues. 2005.

Exercise: DRG creep: a new hospital-acquired disease
• Case mix: the groups of patients requiring similar tests, procedures and resources (therapeutic, bed services, staff...) that are treated at a particular hospital.
1. Diagnosis Related Groups (DRGs) are an often used case-mix measure. What was the rationale for using DRGs for reimbursement?
2. Changing the "sequence" of diagnoses between a chronic renal illness and a disease such as systemic lupus erythematosus in 159 patients would result in a cost shift of more than $800,000. How did the author derive this figure?
3. Can you give some examples of ways of generating DRG creep?

Miscoding may generate considerable profits. Calculate the gain in % for the following conditions when the principal and secondary diagnoses switch places.

Principal diagnosis         Relative weight   Secondary diagnosis                     Relative weight   Gain in %
Pneumonia                   0.8961            Myocardial infarction                   1.7162
Transient ischemic attack   0.6293            Cerebrovascular event                   1.2429
Acute bronchitis            0.7151            Chronic obstructive pulmonary disease   1.1263

Exercise: DRG creep: answers
1. Length of stay (LOS).
2. (9322 - 4210) × 159 = 812,808.
3. Computer programs, continuing physician education, miscoding (e.g. coding a transient ischemic attack (relative weight 0.6293) as a cerebrovascular accident (relative weight 1.2429)).

Gain in % obtained by switching the principal and secondary diagnoses:

Principal diagnosis         Relative weight   Secondary diagnosis                     Relative weight   Gain in %
Pneumonia                   0.8961            Myocardial infarction                   1.7162            91.5
Transient ischemic attack   0.6293            Cerebrovascular event                   1.2429            97.5
Acute bronchitis            0.7151            Chronic obstructive pulmonary disease   1.1263            57.5

Worked example: 1.7162 / 0.8961 = 1.915 = 191.5%, i.e. a gain of 91.5%.

Hsia DC, Krushat WM, Fagan AB, Tebbutt JA, Kusserow RP.
Accuracy of diagnostic coding for Medicare patients under the prospective-payment system. N Engl J Med 1988;318(6):352-5.

Part 2: Data quality in the MCD

Quality of data in the MCD
• Quality is mixed: sometimes good, sometimes bad, sometimes in between
• Example of good quality: community-acquired pneumonia (CAP)
• Example of in-between quality: cesarean section (CS), fit for use if one studies cesarean section rates, not if one is interested in medical practices
• Example of bad quality: acute myocardial infarction (AMI)

Conclusion of CAP audit (N=4093)
[Pie chart. Legend: CAP confirmed, CAP likely, other. Data labels: 14%, 4%, 82%.]

Carenet versus MCD
[Three bar charts comparing BE and GE by age class (20-29 to 90+): age distribution (in %), incidence (in %), mortality proportion (in %).]

MCD 2002 (N=58,194) versus SPE 2002 (N=58,841): comparison (in %)

Variable                      MCD    SPE
Hypertension                  5.5    4.9
Diabetes                      1.6    1.2
Gestational age >= 37 weeks   92.9   92.7
Labor induction               19.4   30.1
Epidural anesthesia           48.3   63.2
Cesarean section              18.1   17.7
Previous CS                   4.6    7.6

Comparison ICD-9-CM vs STEMI

ICD-9-CM   STEMI: N   STEMI: Y   Total
41001      673        848        1521
41011      4414       5080       9494
41021      978        1160       2138
41031      1019       1409       2428
41041      3869       4743       8612
41051      561        599        1160
41061      209        229        438
41071      23554      1783       25337
41081      316        255        571
41091      1971       680        2651
Total      37564      16786      54350

Comparison Monica vs
MCD, 2002-4.

Sex       Place    Registry   Deaths   Cases   RR     95% CI
Females   Bruges   Monica     26       110     3.10   1.47;6.54
Females   Bruges   MCD        8        105
Females   Ghent    Monica     29       111     2.90   1.45;5.83
Females   Ghent    MCD        9        100
Males     Bruges   Monica     60       407     3.15   1.79;5.59
Males     Bruges   MCD        19       365
Males     Ghent    Monica     75       374     2.14   1.47;3.12
Males     Ghent    MCD        35       374
RR: rate ratio

Data quality assurance MCD: two-step approach
• Step 1 (MCD): global assessment based on LOS and SOI (severity of illness), with outcome deviant or non-deviant.
• Step 2 (audit):
  ◦ Deviant: targeted audit, N=2815: 1240 disagreements, -643 days
  ◦ Non-deviant: random audit, N=2158: 832 disagreements, -28.23 days

Random audit (S: stays; D: days)

Category                                                               S      %      D
Stays analyzed                                                         2158   100
Stays without disagreements                                            1326   61.5
Stays with disagreements                                               832    38.6   -28.23
Stays with coding errors without repercussions regarding APR-DRG/SOI   574    26.7   0
Stays with a change of APR-DRG and possibly of SOI                     142    6.6    8.75
Stays with increased SOI without change of APR-DRG                     51     2.4    148.28
Stays with decreased SOI without change of APR-DRG                     65     3.0    -185.26

Directed audit (S: stays; D: days)

Category                                                               S      %      D
Stays analyzed                                                         2815   100
Stays without disagreements                                            1575   56.0
Stays with disagreements                                               1240   44.0   -642.68
Stays with coding errors without repercussions regarding APR-DRG/SOI   906    32.2   0
Stays with a change of APR-DRG and possibly of SOI                     74     2.6    -224.61
Stays with increased SOI without change of APR-DRG                     28     1.0    0.59
Stays with decreased SOI without change of APR-DRG                     232    8.2    -418.65

Quality of data versus quality of care: how to cope with data imperfections?
• A possible approach to overcome some of the data imperfections consists in carrying out a sensitivity analysis.
• Example: community-acquired pneumonia (CAP)
• CAP remains one of the leading causes of hospital admission, social and economic costs, and death throughout the world.
• Inter-hospital comparisons of CAP standardised mortality ratios (CAP-SMRs) may lead to an improved understanding of contextual influences on CAP.
• However, this type of comparison requires sufficiently reliable data, which can be challenging if these data serve multiple purposes (e.g. both reimbursement and quality control).

Sensitivity analysis in CAP
• We attempted to minimise biases indirectly arising from differences in medical practices (e.g. IMV/NMV, discharging terminal patients) and in attitudes (e.g. withholding optimal care in the elderly, whether or not at the request of the patient or family).
• We did this through a sensitivity analysis consisting of five subanalyses: starting with (1) the basic model, from which, one by one, (2) patients discharged during the first week and (3) patients aged over 79 years were excluded, and (4) intensity of care and (5) comorbidities were left out of the adjustment.
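
The five subanalyses can be written down as explicit configurations, which makes it easy to see what changes from one to the next. The following is only a minimal sketch in Python, not the model actually used. The field names (`age`, `los_days`), the covariate list, the 7-day cutoff for "discharged during the first week", and the reading that each change is applied separately to the basic model rather than cumulatively are all assumptions made for illustration. Fitting the mortality model and computing the SMRs is not shown.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Stay = Dict[str, Any]  # one hospital stay; all field names below are hypothetical

# Assumed adjustment variables of the basic model (illustrative only).
BASIC_COVARIATES: Tuple[str, ...] = ("age", "sex", "intensity_of_care", "comorbidities")


@dataclass(frozen=True)
class SubAnalysis:
    label: str
    keep: Callable[[Stay], bool]   # True if the stay remains in the analysis
    covariates: Tuple[str, ...]    # variables used for risk adjustment


def without(covariate: str) -> Tuple[str, ...]:
    """Basic covariates minus the one left out of the adjustment."""
    return tuple(c for c in BASIC_COVARIATES if c != covariate)


def keep_all(stay: Stay) -> bool:
    return True


SUBANALYSES: List[SubAnalysis] = [
    SubAnalysis("(1) basic model", keep_all, BASIC_COVARIATES),
    # Assumption: "discharged during the first week" means a stay of 7 days or less.
    SubAnalysis("(2) first-week discharges excluded", lambda s: s["los_days"] > 7, BASIC_COVARIATES),
    SubAnalysis("(3) patients aged over 79 excluded", lambda s: s["age"] <= 79, BASIC_COVARIATES),
    SubAnalysis("(4) intensity of care not adjusted for", keep_all, without("intensity_of_care")),
    SubAnalysis("(5) comorbidities not adjusted for", keep_all, without("comorbidities")),
]


def select(stays: List[Stay], analysis: SubAnalysis) -> List[Stay]:
    """Apply the exclusion rule of one subanalysis."""
    return [s for s in stays if analysis.keep(s)]


# Toy stays, invented for illustration only.
toy_stays = [
    {"age": 85, "los_days": 3},
    {"age": 60, "los_days": 10},
    {"age": 82, "los_days": 12},
]
for analysis in SUBANALYSES:
    n = len(select(toy_stays, analysis))
    print(f"{analysis.label}: {n} stays, adjusted for {', '.join(analysis.covariates)}")
```

If hospital rankings or SMRs remain stable across the five configurations, the conclusions are less likely to be driven by the data imperfections that each subanalysis targets.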