
Health care data quality
Hospital discharge records -- How to deal with
data imperfections?
Contents
Part 1: aspects of health care data quality

Types of data: primary and secondary data

Operational definition of data quality

Quality criteria for patient data

Error Checks: credibility and consistency

“Editing” and “Cleaning”

Pitfalls related to hospital discharge records

Exercise: DRG creep: a new hospital-acquired disease
Part 2: data quality in the MCD

Quality of data in the MCD

Data quality assurance MCD: two-step approach

Quality of data versus quality of care: how to deal with data imperfections?
Health care data -- a description
Scerbo M, Dickstein C, Wilson A. Health Care Data and the SAS System. Cary NC:
SAS Institute Inc.; 2001.
Questions
1. Do hospital discharge records, in this case our Minimal Clinical Data (MCD), nowadays renamed Minimal Hospital Data (MHD), meet the requirements of good quality data?
2. Should imperfections be established, how should one deal with them?
Types of Data: Primary and Secondary data
A common classification is based upon
who collected the data.

Primary data: Data collected by the investigator himself/herself for a specific purpose.

Examples: Data collected by a student for his/her thesis or research project.
Data collected by a researcher for a randomised clinical trial.

Secondary data: Data collected by someone else for some other purpose (but being utilized by
the investigator for another purpose).

Examples: Hospital discharge records, reimbursement claims (National Institute for Health and
Disability Insurance (NIHDI))
Hospital discharge records

Databases containing the International Classification of Diseases, Clinical
Modification, Ninth Revision (ICD-9-CM) coding of discharge diagnoses are
used for a variety of purposes, including:
◦ reimbursement,
◦ budgetary planning,
◦ monitoring of clinical care activities,
◦ health services research,
◦ and development of clinical guidelines.

As the scope of utilization of these data broadens, the importance of ICD-9-CM coding accuracy increases.

Petersen LA, Wright S, Normand SL, Daley J. Positive predictive value of the diagnosis of acute myocardial infarction in an
administrative database. J Gen Intern Med 1999 Sep;14(9):555-8.

Some advantages of using primary data:
1. The investigator collects data specific to the problem under study.
2. There is no doubt about the quality of the data collected (for the investigator).
3. If required, it may be possible to obtain additional data during the study period.

Some disadvantages of using primary data (for reluctant/uninterested investigators):
1. The investigator has to contend with all the hassles of data collection:
– deciding why, what, how, when to collect
– getting the data collected (personally or through others)
– getting funding and dealing with funding agencies
– ethical considerations (consent, permissions, etc.)
2. Ensuring the data collected is of a high standard:
– all desired data is obtained accurately, and in the format it is required in
– unnecessary/useless data has not been included

Cost of obtaining the data is often the major expense in studies

Some Advantages of using Secondary data:

The data are already there: no hassles of data collection

It is less expensive

The investigator is not personally responsible for the
quality of data (“I didn’t do it”)

Some disadvantages of using Secondary data:

The investigator cannot decide what is collected (if
specific data about something is required, for instance).

One can only hope that the data is of good quality

Obtaining additional data (or even clarification) about
something is most often not possible
Operational definition of data quality

Examples:

Main health problems of the country: an estimate of a certain order of
magnitude

Evolution of these health problems requires more precision

Relative importance of the sub-endocardial infarct within the group of ischaemic heart diseases requires a high degree of precision

Tayi GK, Ballou DP. Examining Data Quality. Communications of the ACM 1998;41(2):54-7.
Wyatt JC, Sullivan F. What is health information? BMJ 2005 Sep 10;331(7516):566-8.
Error Checks: credibility and consistency

Credibility: e.g. range test: does the observed value fall outside the pre-specified range of plausible values?

Consistency: e.g. comparison of age with date of birth and date of death, or of gender with certain pathologies

An error causes a conflict: type of cancer “unknown” versus type of differentiation “well differentiated”

Nominal or ordinal variables: usually deterministic: an adenoacanthoma is always “well differentiated”

Interval variables: usually probabilistic: in a “well differentiated” adenoacanthoma the shortest axis of the nucleus is between 2 and 12 µm. Smaller than 2 µm is impossible (deterministic) and larger than 12 µm is unlikely (probabilistic).

Numerical/visual aids (frequency distributions, histograms, contingency tables, box plots and scatter plots) make it possible to detect outliers or systematic errors
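
The two kinds of check can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout and field names (birth_date, admission_date, death_date, age, cancer_type, differentiation) are hypothetical, and only the 2 and 12 µm limits come from the example above.

```python
from datetime import date

# Limits taken from the nucleus example above (shortest axis, in µm).
IMPOSSIBLE_BELOW_UM = 2.0   # deterministic rule: smaller is impossible
UNLIKELY_ABOVE_UM = 12.0    # probabilistic rule: larger is unlikely


def range_test(axis_um):
    """Credibility check: classify a measured shortest nuclear axis."""
    if axis_um < IMPOSSIBLE_BELOW_UM:
        return "error"      # cannot be true, must be corrected
    if axis_um > UNLIKELY_ABOVE_UM:
        return "warning"    # possible but suspicious, worth a second look
    return "ok"


def age_on(birth, reference):
    """Age in completed years on the reference date."""
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)


def consistency_test(record):
    """Consistency check: return the list of conflicts found in one record."""
    conflicts = []
    if record["death_date"] is not None and record["death_date"] < record["birth_date"]:
        conflicts.append("death before birth")
    if record["age"] != age_on(record["birth_date"], record["admission_date"]):
        conflicts.append("age does not match date of birth")
    if record["cancer_type"] == "unknown" and record["differentiation"] == "well differentiated":
        conflicts.append("differentiation recorded for unknown cancer type")
    return conflicts


# Hypothetical record with two deliberate conflicts.
example = {
    "birth_date": date(1950, 6, 15),
    "admission_date": date(2002, 3, 1),
    "death_date": None,
    "age": 60,  # the dates imply 51 on admission
    "cancer_type": "unknown",
    "differentiation": "well differentiated",
}
print(range_test(1.5), range_test(6.0), range_test(15.0))
print(consistency_test(example))
```

The range test returns a warning rather than an error above 12 µm, which mirrors the distinction between deterministic and probabilistic rules.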
“Editing” and “Cleaning”

Editing: “the process whereby the accuracy, consistency and completeness is verified”

“Data cleaning” means:
1) the process of removing incomplete records or records with inconsistent information, or
2) the ongoing iterative process whereby, using logical and statistical checks, errors are detected, debugged, corrected or removed

“Cleaning”, especially in the sense of removing erroneous data, has to be reported and should be founded on probabilities
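
One possible way to make removals reportable is to log the reason for every removed record. The sketch below assumes hypothetical records and two hypothetical deterministic checks; it does not cover removals founded on probabilities.

```python
from collections import Counter


def clean(records, checks):
    """Split records into kept and removed, recording why each one was removed.

    `checks` maps a reason to a function that returns True when a record fails.
    """
    kept, removed = [], []
    for record in records:
        reasons = [reason for reason, fails in checks.items() if fails(record)]
        if reasons:
            removed.append((record, reasons))
        else:
            kept.append(record)
    # The report counts removals per reason so that they can be published.
    report = Counter(reason for _, reasons in removed for reason in reasons)
    return kept, removed, report


# Hypothetical checks on hypothetical discharge records.
checks = {
    "missing principal diagnosis": lambda r: not r.get("principal_diagnosis"),
    "negative length of stay": lambda r: r["los_days"] < 0,
}
records = [
    {"id": 1, "principal_diagnosis": "486", "los_days": 7},
    {"id": 2, "principal_diagnosis": "", "los_days": 3},
    {"id": 3, "principal_diagnosis": "41071", "los_days": -2},
]
kept, removed, report = clean(records, checks)
print(len(kept), "kept;", len(removed), "removed;", dict(report))
```

Keeping the removed records together with their reasons, instead of discarding them, lets the cleaning step be audited and repeated.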
Pitfalls related to hospital discharge records such as our Minimal Clinical Data (MCD)

There is a basic tension between using the same data for reimbursement and for measuring
quality.

When the use is reimbursement, there is a tendency to perform coding quickly and to
maximize the coding of complications and co-morbidities.

When the use is to assess quality, however, it is important for coders to have a complete
record and to restrict diagnosis coding to conditions that affect patient care “in terms of
requiring clinical evaluation; or therapeutic treatment; or diagnostic procedures; or
extended length of hospital stay; or increased nursing care and/or monitoring.” 
Diagnoses that “have no bearing on the current hospital stay” or represent “a routinely
expected condition or occurrence” should not be coded.

AHRQ. AHRQuality Indicators e-Newsletter: AHRQ QI TIPS: ICD-9-CM Coding Issues. 2005.
Exercise: DRG creep: a new hospital-acquired disease

Case mix: the groups of patients requiring similar tests, procedures, and resources (therapeutic, bed services, staff…) that are treated at a particular hospital.
1. Diagnosis-Related Groups (DRGs) are an often used case-mix measure. What was the rationale to use DRGs for reimbursement?
2. Changing the “sequence” of diagnoses between a chronic renal illness and a disease such as systemic lupus erythematosus in 159 patients would result in a cost shift of > $800,000. How did the author derive this figure?
3. Can you give some examples of generating DRG creep?
Miscoding may generate considerable profits. Calculate the gain in % for the following conditions obtained by switching places between principal and secondary diagnosis
Principal diagnosis          Relative weight   Secondary diagnosis                      Relative weight   Gain in %
Pneumonia                    0.8961            Myocardial infarction                    1.7162
Transient ischemic attack    0.6293            Cerebrovascular event                    1.2429
Acute bronchitis             0.7151            Chronic obstructive pulmonary disease    1.1263
Exercise: DRG creep: a new hospital-acquired disease (answers)
1. Diagnosis-Related Groups (DRGs) are an often used case-mix measure. What was the rationale to use DRGs for reimbursement?
Answer: length of stay (LOS)
2. Changing the “sequence” of diagnoses between a chronic renal illness and a disease such as systemic lupus erythematosus in 159 patients would result in a cost shift of > $800,000. How did the author derive this figure?
Answer: (9322 - 4210) * 159 = 812,808
3. Can you give some examples of generating DRG creep?
Answer: computer programs, continuing physician education, miscoding (e.g. coding of a transient ischemic attack (relative weight 0.6293) as a cerebrovascular accident (relative weight 1.2429))
Miscoding may generate considerable profits. Calculate the gain in % for the following conditions

Principal diagnosis          Relative weight   Secondary diagnosis                      Relative weight   Gain in %
Pneumonia                    0.8961            Myocardial infarction                    1.7162            91.5
Transient ischemic attack    0.6293            Cerebrovascular event                    1.2429            97.5
Acute bronchitis             0.7151            Chronic obstructive pulmonary disease    1.1263            57.5

Example: 1.7162 / 0.8961 = 1.915 = 191.5% => gain of 91.5%
Hsia DC, Krushat WM, Fagan AB, Tebbutt JA, Kusserow RP. Accuracy of diagnostic
coding for Medicare patients under the prospective-payment system. N Engl J
Med 1988;318(6):352-5.
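
The arithmetic of the exercise can be checked with a short script. It is a minimal sketch: the function name gain_percent is made up for the illustration, and the two amounts in the cost shift are taken as given from the answer to question 2 (the slides do not say what they represent).

```python
def gain_percent(weight_principal, weight_secondary):
    """Relative gain (in %) from billing the secondary diagnosis as principal."""
    return (weight_secondary / weight_principal - 1) * 100


# Relative weights from the exercise table.
pairs = [
    ("Pneumonia", 0.8961, "Myocardial infarction", 1.7162),
    ("Transient ischemic attack", 0.6293, "Cerebrovascular event", 1.2429),
    ("Acute bronchitis", 0.7151, "Chronic obstructive pulmonary disease", 1.1263),
]
gains = [round(gain_percent(wp, ws), 1) for _, wp, _, ws in pairs]
for (principal, _, secondary, _), gain in zip(pairs, gains):
    print(f"{principal} -> {secondary}: gain of {gain}%")

# Cost shift from question 2: difference between the two amounts times 159 patients.
cost_shift = (9322 - 4210) * 159
print(cost_shift)
```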
Part 2: data quality in the MCD

Quality of data in the MCD

Quality is mixed: sometimes good, sometimes bad,
sometimes in between

Example of good quality: Community acquired pneumonia
(CAP)

Example of in-between quality: Cesarean section (CS), fit
for use if one studies cesarean section rates, not if one is
interested in medical practices

Example of bad quality : Acute Myocardial Infarction (AMI)
Conclusion of CAP-audit (N=4093)
[Pie chart. Legend: CAP confirmed, CAP likely, Other. Slice labels: 14%, 4%, 82%.]
Carenet versus MCD
[Three charts by age class (20-29 to 90+), each comparing the series BE and GE: age distribution (in %), incidence (in %), mortality proportion (in %).]
MCD 2002 (N=58,194) versus SPE 2002 (N=58,841): comparison (in %)

                           MCD (MKG)   SPE
Hypertension               5.5         4.9
Diabetes                   1.6         1.2
Gestational age >= 37 w    92.9        92.7
Labor induction            19.4        30.1
Epidural anesthesia        48.3        63.2
Cesarean section           18.1        17.7
Previous CS                4.6         7.6
Comparison ICD-9-CM vs STEMI

ICD-9-CM   STEMI: N   STEMI: Y   Total
41001      673        848        1521
41011      4414       5080       9494
41021      978        1160       2138
41031      1019       1409       2428
41041      3869       4743       8612
41051      561        599        1160
41061      209        229        438
41071      23554      1783       25337
41081      316        255        571
41091      1971       680        2651
Total      37564      16786      54350
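
One way to read this table is to compute, per ICD-9-CM code, the share of stays with STEMI = Y, and to check that the rows add up to the totals. The sketch below only re-uses the counts from the table; the helper name share_stemi is made up for the illustration.

```python
# Counts from the table: ICD-9-CM code -> (STEMI = N, STEMI = Y).
counts = {
    "41001": (673, 848),
    "41011": (4414, 5080),
    "41021": (978, 1160),
    "41031": (1019, 1409),
    "41041": (3869, 4743),
    "41051": (561, 599),
    "41061": (209, 229),
    "41071": (23554, 1783),
    "41081": (316, 255),
    "41091": (1971, 680),
}

# Column totals, to be compared with the Total row of the table.
total_n = sum(n for n, _ in counts.values())
total_y = sum(y for _, y in counts.values())


def share_stemi(code):
    """Percentage of stays with this code that have STEMI = Y."""
    n, y = counts[code]
    return 100 * y / (n + y)


for code, (n, y) in counts.items():
    print(code, n + y, f"{share_stemi(code):.1f}%")
print("Total", total_n, total_y, total_n + total_y)
```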
Comparison Monica vs MCD, 2002-4.

Sex       Place    Registry   Deaths   Cases   RR     95% CI
Females   Bruges   Monica     26       110     3.10   1.47;6.54
                   MCD        8        105
          Ghent    Monica     29       111     2.90   1.45;5.83
                   MCD        9        100
Males     Bruges   Monica     60       407     3.15   1.79;5.59
                   MCD        19       365
          Ghent    Monica     75       374     2.14   1.47;3.12
                   MCD        35       374

RR: rate ratio
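
The slides do not state how the RR and its interval were computed. As a sketch, assuming the RR is the ratio of the death proportions (deaths/cases, Monica over MCD) and the interval is the usual large-sample interval on the log scale, the Females/Bruges row can be reproduced as follows; the function name ratio_with_ci is made up for the illustration.

```python
import math


def ratio_with_ci(deaths_a, cases_a, deaths_b, cases_b, z=1.96):
    """Ratio of two proportions with a log-scale confidence interval (z=1.96 for 95%)."""
    ratio = (deaths_a / cases_a) / (deaths_b / cases_b)
    # Standard error of log(ratio) for two independent proportions.
    se_log = math.sqrt(1 / deaths_a - 1 / cases_a + 1 / deaths_b - 1 / cases_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Females, Bruges: Monica 26 deaths in 110 cases, MCD 8 deaths in 105 cases.
rr, lower, upper = ratio_with_ci(26, 110, 8, 105)
print(f"RR = {rr:.2f} (95% CI {lower:.2f};{upper:.2f})")
```

For this row the result agrees with the table (3.10, 1.47;6.54).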
Data quality assurance MCD: two-step approach
[Flow diagram of the MCD audit. Global assessment (LOS, SOI): deviant versus non-deviant. Targeted audit: N=2815, 1240 disagreements, -643 days. Random audit: N=2158, 832 disagreements, -28.23 days.]
Random                                                                  S      %      D
Stays analyzed                                                          2158   100
Stays without disagreements                                             1326   61.5
Stays with disagreements                                                832    38.6   -28.23
Stays with coding errors without repercussions regarding APR-DRG/SOI    574    26.7   0
Stays with a change of APR-DRG and possibly of SOI                      142    6.6    8.75
Stays with increased SOI without change of APR-DRG                      51     2.4    148.28
Stays with decreased SOI without change of APR-DRG                      65     3.0    -185.26
S: stays; D: days
Directed                                                                S      %      D
Stays analyzed                                                          2815   100
Stays without disagreements                                             1575   56.0
Stays with disagreements                                                1240   44.0   -642.68
Stays with coding errors without repercussions regarding APR-DRG/SOI    906    32.2   0
Stays with a change of APR-DRG and possibly of SOI                      74     2.6    -224.61
Stays with increased SOI without change of APR-DRG                      28     1.0    0.59
Stays with decreased SOI without change of APR-DRG                      232    8.2    -418.65
S: stays; D: days
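
The breakdown of the disagreements can be reconciled with the totals in a few lines. This is only an arithmetic check on the figures of the two tables; the dictionary keys and the helper name totals are made up for the illustration.

```python
# (stays, days) per category of disagreement, copied from the two tables.
random_audit = {
    "coding errors without repercussions": (574, 0),
    "change of APR-DRG and possibly of SOI": (142, 8.75),
    "increased SOI without change of APR-DRG": (51, 148.28),
    "decreased SOI without change of APR-DRG": (65, -185.26),
}
directed_audit = {
    "coding errors without repercussions": (906, 0),
    "change of APR-DRG and possibly of SOI": (74, -224.61),
    "increased SOI without change of APR-DRG": (28, 0.59),
    "decreased SOI without change of APR-DRG": (232, -418.65),
}


def totals(breakdown):
    """Sum the stays and the days over all categories of disagreement."""
    stays = sum(s for s, _ in breakdown.values())
    days = round(sum(d for _, d in breakdown.values()), 2)
    return stays, days


print("Random:", totals(random_audit), f"{100 * 832 / 2158:.1f}% of 2158 stays")
print("Directed:", totals(directed_audit), f"{100 * 1240 / 2815:.1f}% of 2815 stays")
```

The random audit sums to 832 stays and -28.23 days. The directed audit sums to 1240 stays and -642.67 days, against -642.68 in the table, presumably a rounding difference.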
Quality of data versus quality of care: how to cope with data imperfections?

A possible approach to overcome some of the data imperfections consists in carrying out a sensitivity analysis

Example: Community-Acquired Pneumonia (CAP)
• Community-Acquired Pneumonia (CAP) remains one of the leading causes of hospital admission, social and economic costs, and death throughout the world.
• Inter-hospital comparisons of CAP standardised mortality ratios (CAP-SMRs) may lead to an improved understanding of contextual influences on CAP.
• However, this type of comparison requires sufficiently reliable data, which can be challenging if these data serve multiple purposes (e.g., both reimbursement and quality control)
Sensitivity analysis in CAP
• We attempted to minimise biases indirectly arising from differences in medical practices (e.g., IMV/NMV, discharging terminal patients), and in attitudes (e.g., withholding optimal care in the elderly, whether or not at the request of patient or family).
• We did this through a sensitivity analysis consisting of five sub-analyses, starting with (1) the basic model, from which, one by one, (2) patients discharged during the first week and (3) patients aged over 79 years were excluded, and (4) intensity of care and (5) comorbidities were left out of the adjustment.
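
The five sub-analyses can be written down as explicit specifications before any model is fitted. The sketch below is an illustration under loud assumptions: it treats the steps as cumulative (each builds on the previous one), and the field names, the 7-day cut-off for "first week" and the base adjusters "age" and "sex" are hypothetical. It only lists who is excluded and which adjusters remain; fitting the mortality model is left out.

```python
# Hypothetical exclusion rules; the field names and the 7-day cut-off are assumptions.
EXCLUSION_RULES = {
    "discharged during first week": lambda p: p["los_days"] <= 7,
    "aged over 79": lambda p: p["age"] > 79,
}

# (label, kind of change, item) for sub-analyses 2 to 5, in the order given above.
STEPS = [
    ("2: without first-week discharges", "exclude", "discharged during first week"),
    ("3: without patients aged over 79", "exclude", "aged over 79"),
    ("4: without intensity of care", "drop", "intensity of care"),
    ("5: without comorbidities", "drop", "comorbidities"),
]


def build_sub_analyses(base_adjusters):
    """Return five specifications, each step building on the previous one."""
    specs = [{"label": "1: basic model", "exclusions": [], "adjusters": list(base_adjusters)}]
    for label, kind, item in STEPS:
        exclusions = list(specs[-1]["exclusions"])
        adjusters = list(specs[-1]["adjusters"])
        if kind == "exclude":
            exclusions.append(item)
        else:
            adjusters.remove(item)
        specs.append({"label": label, "exclusions": exclusions, "adjusters": adjusters})
    return specs


def select_patients(patients, exclusions):
    """Keep the patients that none of the named exclusion rules applies to."""
    return [p for p in patients if not any(EXCLUSION_RULES[e](p) for e in exclusions)]


specs = build_sub_analyses(["age", "sex", "intensity of care", "comorbidities"])
patients = [
    {"id": 1, "age": 65, "los_days": 12},
    {"id": 2, "age": 84, "los_days": 15},
    {"id": 3, "age": 70, "los_days": 4},
]
for spec in specs:
    n = len(select_patients(patients, spec["exclusions"]))
    print(spec["label"], "| patients:", n, "| adjusters:", spec["adjusters"])
```

If the sub-analyses were instead meant to be run separately against the basic model, each specification would start from the first one rather than from the previous one.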