4/27/2010
Odysseas Pentakalos, Ph.D.
SYSNET International, Inc.
[email protected]
What is an Enterprise Master Patient Index
◦ What problem does it attempt to solve
◦ How does it relate to Health IT
◦ Overall problem decomposed into components
OpenEMPI
◦ Brief history
◦ Current architecture and benefits
◦ Interfacing with OpenEMPI
◦ Future directions and availability
Patient visits multiple separate healthcare providers
[Figure: a patient visits a hospital, a laboratory, and a radiology provider and is recorded under three name variants: John Smith, John Smythe, and J. M. Smith. The Enterprise MPI merges the three entries into a single record (ID | John Smith | …).]
Merging two patient demographic databases
An essential preparatory step for reducing
error rates
◦ Replace spelling variations in commonly occurring
words
◦ Use key words found as hints for further parsing
◦ Use parsing to decompose names and addresses
into component fields
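[Figure: the name "Dr. John J. Smith MD" is decomposed into component fields labelled Pre, First, Middle, Last, and Title, with the values DR, John J, and Smith MD. A second input example is the address "16 W. Main ST APT 16".]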
Peter Christen et al., “Probabilistic Name and Address Cleaning and Standardization”, 2002
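The three steps above can be sketched with a small rule-based example. This is only an illustration: the cited paper describes a probabilistic approach, whereas the sketch below uses fixed lookup tables. The names `VARIANTS`, `PREFIXES`, `TITLES`, `clean`, and `parse_name` are hypothetical, and the output field names follow the labels on the slide.

```python
import re

# Hypothetical lookup table: spelling variations mapped to one standard form.
VARIANTS = {
    "dr": "DR", "doctor": "DR",
    "st": "ST", "street": "ST",
    "apt": "APT", "apartment": "APT",
    "w": "W", "west": "W",
    "md": "MD",
}
# Hypothetical key words that act as hints for parsing.
PREFIXES = {"DR", "MR", "MRS", "MS"}
TITLES = {"MD", "JR", "SR"}


def clean(text):
    """Split text into tokens and replace known spelling variations."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [VARIANTS.get(token.lower(), token) for token in tokens]


def parse_name(text):
    """Decompose a name string into component fields."""
    tokens = clean(text)
    fields = {"pre": "", "first": "", "middle": "", "last": "", "title": ""}
    # Key words at either end tell us which tokens are not name parts.
    if tokens and tokens[0].upper() in PREFIXES:
        fields["pre"] = tokens.pop(0).upper()
    if tokens and tokens[-1].upper() in TITLES:
        fields["title"] = tokens.pop().upper()
    if tokens:
        fields["first"] = tokens.pop(0)
    if tokens:
        fields["last"] = tokens.pop()
    fields["middle"] = " ".join(tokens)  # whatever remains in between
    return fields
```

With these assumed tables, `parse_name("Dr. John J. Smith MD")` yields pre `DR`, first `John`, middle `J`, last `Smith`, and title `MD`, and `clean("16 W. Main Street Apartment 16")` yields the tokens `16 W Main ST APT 16`.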
Comparison of records is quadratic in the
number of records
◦ Two files of 300,000 each generate 90 billion pairs
Blocking variables are used for partitioning
Multiple passes, each with different blocking variables, are used so that an error in one blocking field does not hide a true match
Selecting blocking variables
◦ High selectivity factor
◦ Preferably uniformly distributed
Wide variety of blocking algorithms available
◦ Sorted neighborhood
◦ Bigram Indexing
◦ Canopy Clustering
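As an illustration of how blocking reduces the number of comparisons, the sketch below groups records by a blocking key and only pairs records that share a key. It is a minimal example of basic key-based blocking within a single file, not one of the algorithms listed above and not OpenEMPI's implementation. The record fields (`last_name`, `birth_year`) and the function names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def key_initial_and_year(record):
    """Hypothetical blocking key: last-name initial plus birth year."""
    return (record["last_name"][:1].upper(), record["birth_year"])


def key_year(record):
    """Hypothetical second blocking key: birth year only."""
    return record["birth_year"]


def candidate_pairs(records, key):
    """Return index pairs of records that fall into the same block."""
    blocks = defaultdict(list)
    for index, record in enumerate(records):
        blocks[key(record)].append(index)
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(members, 2))
    return pairs


def multi_pass(records, keys):
    """Union of candidate pairs over several passes with different keys."""
    pairs = set()
    for key in keys:
        pairs |= candidate_pairs(records, key)
    return pairs


records = [
    {"last_name": "Smith", "birth_year": 1970},
    {"last_name": "Smythe", "birth_year": 1970},
    {"last_name": "Jones", "birth_year": 1970},
    {"last_name": "Smith", "birth_year": 1980},
]
single = candidate_pairs(records, key_initial_and_year)
combined = multi_pass(records, [key_initial_and_year, key_year])
```

Without blocking, these four records produce six pairs. The first key leaves a single candidate pair (Smith and Smythe, both born in 1970), and adding a second pass on birth year alone raises that to three.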
Compare a pair of records to see if they match
Subset of fields called matching variables are used
in the decision
◦ R1=(f1, f2,f3,…,fn) compared against R2=(f1, f2,f3,…,fn)
and f1, f2 are used as the matching variables
A number of issues increase the complexity of this
seemingly simple task. Let’s consider just Western
European names for example:
◦ Spelling variations: Meier/Meyer, Odysseas/Odysseus, etc.
◦ Phonetic variations: Sinclair/St. Clair
◦ Compound Names: Hans-Peter or Smith Miller
◦ Alternative Names: Bill/William, Jim/James, etc.
◦ Use of Initials Only
Various types of errors further conspire to
complicate the problem
◦ When OCR is used, typical errors are character
substitutions: q vs. g, m vs. rn, b vs. li
◦ Typographical errors during data entry. 80% of
errors are single character errors.
◦ Phonetic errors resulting from transcription
Phonetic Encoding Algorithms
◦ Soundex: oldest and most well known algorithm
◦ Phonex: aims to improve Soundex by pre-processing names
◦ Phonix: extension of Phonex with > 100 rules
◦ NYSIIS: New York State Identification Intelligence System
◦ Metaphone/Double Metaphone
Approximate String Matching
◦ Levenshtein or Edit Distance
◦ Longest Common Substring (LCS)
◦ Q-Grams
◦ Jaro/Jaro-Winkler
◦ Combinations of techniques
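To make one technique from each list concrete, the sketch below implements American Soundex and Levenshtein edit distance in plain Python. It is a minimal illustration rather than the code OpenEMPI uses, and the function names are chosen for this example.

```python
# Soundex digit for each consonant group; vowels, h, w, and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character American Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w do not separate letters with equal codes
        code = CODES.get(letter, "")
        if code and code != previous:
            encoded += code
        previous = code  # a vowel resets this, so repeated codes count again
    return (encoded + "000")[:4]


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    previous_row = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current_row = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current_row.append(min(previous_row[j] + 1,          # deletion
                                   current_row[j - 1] + 1,       # insertion
                                   previous_row[j - 1] + cost))  # substitution
        previous_row = current_row
    return previous_row[-1]
```

Both spellings in the Meier/Meyer example encode to the same Soundex code, `M600`, and their edit distance is 1. For Smith/Smythe the edit distance is 2 (one substitution and one insertion).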
Variety of algorithms available, both deterministic and
probabilistic
Fellegi-Sunter is the most popular probabilistic algorithm
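The core of the Fellegi-Sunter approach can be sketched as a sum of per-field log-likelihood weights compared against two thresholds. In the sketch below, m is the probability that a field agrees for a true match and u is the probability that it agrees for a non-match. All m and u values, the thresholds, and the function names are hypothetical and chosen only for illustration.

```python
import math


def match_weight(agreement, m_probs, u_probs):
    """Sum the agreement or disagreement weight of every matching variable.

    agreement maps a field name to True (the fields agree) or False.
    """
    total = 0.0
    for field, agrees in agreement.items():
        m, u = m_probs[field], u_probs[field]
        if agrees:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(weight, lower, upper):
    """Three-way decision: match, non-match, or possible match."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


# Hypothetical probabilities for two matching variables.
m_probs = {"last_name": 0.9, "birth_year": 0.9}
u_probs = {"last_name": 0.1, "birth_year": 0.1}

both_agree = match_weight({"last_name": True, "birth_year": True}, m_probs, u_probs)
one_agrees = match_weight({"last_name": True, "birth_year": False}, m_probs, u_probs)
```

With these assumed values, agreement on both fields gives a weight of about 6.34, while one agreement and one disagreement cancel out to roughly 0. Pairs that fall between the two thresholds are the ones usually left for manual review.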
Based on Service Oriented Architecture
principles
Because each function is exposed as a service, interchangeable
implementations of a service can be plugged into the
system transparently
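The idea of interchangeable service implementations can be illustrated with a small sketch. OpenEMPI itself is written in Java, and none of the names below come from its API: `StringComparator`, `ExactComparator`, `CaseInsensitiveComparator`, and `MatchingService` are hypothetical and exist only to show the pattern.

```python
from typing import Protocol


class StringComparator(Protocol):
    """Hypothetical service contract: any object with this method qualifies."""

    def similarity(self, a: str, b: str) -> float: ...


class ExactComparator:
    def similarity(self, a: str, b: str) -> float:
        return 1.0 if a == b else 0.0


class CaseInsensitiveComparator:
    def similarity(self, a: str, b: str) -> float:
        return 1.0 if a.lower() == b.lower() else 0.0


class MatchingService:
    """Depends only on the contract, so implementations can be swapped."""

    def __init__(self, comparator: StringComparator, threshold: float = 1.0):
        self.comparator = comparator
        self.threshold = threshold

    def is_match(self, a: str, b: str) -> bool:
        return self.comparator.similarity(a, b) >= self.threshold


strict = MatchingService(ExactComparator())
relaxed = MatchingService(CaseInsensitiveComparator())
```

Here `strict.is_match("Smith", "SMITH")` is false while `relaxed.is_match("Smith", "SMITH")` is true, and the calling code is identical in both cases.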
Native Java API (very comprehensive)
◦ Spring based interface
◦ Java EJB interface
IHE PIX/PDQ Interface
◦ Tested at US IHE Connectathon 2009/2010
◦ Participated in the HIMSS Showcase
New starting with the 2.0.4 release: a user interface that
will grow in functionality over time
OpenEMPI is open-source software released under the Apache
2.0 license
Originally available through Open Health Tools, now at
www.openempi.org (kenai)
https://openhie.projects.openhealthtools.org
http://www.openempi.org
Working on the implementation of new
blocking algorithms
Enhance the user interface to support full
functionality
Integrate OpenEMPI into the Connect Gateway
Provide REST and SOAP web-services
interfaces
Develop a more flexible data import facility