4/27/2010
Odysseas Pentakalos, Ph.D.
SYSNET International, Inc.
[email protected]

What is an Enterprise Master Patient Index
  ◦ What problem does it attempt to solve
  ◦ How does it relate to Health IT
  ◦ Overall problem decomposed into components
OpenEMPI
  ◦ Brief history
  ◦ Current architecture and benefits
  ◦ Interfacing with OpenEMPI
  ◦ Future directions and availability

Patient visits multiple separate healthcare providers
[Diagram: the same patient is known as John Smith, John Smythe, and J. M. Smith across the Hospital, Laboratory, and Radiology systems; the Enterprise MPI merges the three entries into a single record (ID | John Smith | …)]
Merging two patient demographic databases

An essential preparatory step for reducing error rates
  ◦ Replace spelling variations in commonly occurring words
  ◦ Use key words found as hints for further parsing
  ◦ Use parsing to decompose names and addresses into component fields
Name example: Dr. John J. Smith MD
  Pre   First   Middle   Last    Title
  DR    John    J        Smith   MD
Address example: 16 W. Main ST APT 16
Peter Christen, et al., “Probabilistic Name and Address Cleaning and Standardization”, 2002

Comparison of records is quadratic in the number of records
  ◦ Two files of 300,000 records each generate 90 billion pairs
Blocking variables are used to partition the records so that only records in the same block are compared
Multiple passes, each with a different blocking variable, are used to avoid missing matches because of an error in one blocking field
Selecting blocking variables
  ◦ High selectivity factor
  ◦ Preferably uniformly distributed
Wide variety of blocking algorithms available
  ◦ Sorted neighborhood
  ◦ Bigram indexing
  ◦ Canopy clustering

Compare a pair of records to see if they match
A subset of the fields, called matching variables, is used in the decision
  ◦ R1 = (f1, f2, f3, …, fn) is compared against R2 = (f1, f2, f3, …, fn), with f1 and f2 used as the matching variables
A number of issues increase the complexity of this seemingly simple task. Let’s consider just Western European names, for example:
  ◦ Spelling variations: Meier/Meyer, Odysseas/Odysseus, etc.
  ◦ Phonetic variations: Sinclair/St. Clair
  ◦ Compound names: Hans-Peter or Smith Miller
  ◦ Alternative names: Bill/William, Jim/James, etc.
  ◦ Use of initials only
Various types of errors further complicate the problem
  ◦ When OCR is used, typical errors are character substitutions: q vs. g, m vs. rn, b vs. li
  ◦ Typographical errors during data entry; 80% of errors are single-character errors
  ◦ Phonetic errors resulting from transcription

Phonetic Encoding Algorithms
  ◦ Soundex: the oldest and best-known algorithm
  ◦ Phonex: aims to improve on Soundex by pre-processing names
  ◦ Phonix: extension of Phonex with more than 100 rules
  ◦ NYSIIS: New York State Identification Intelligence System
  ◦ Metaphone/Double Metaphone
Approximate String Matching
  ◦ Levenshtein or Edit Distance
  ◦ Longest Common Substring (LCS)
  ◦ Q-Grams
  ◦ Jaro/Jaro-Winkler
  ◦ Combinations of techniques

Variety of algorithms available, both deterministic and probabilistic
Fellegi-Sunter is the most popular probabilistic algorithm

Based on Service Oriented Architecture principles
Because functionality is exposed as services, interchangeable implementations of a service can be plugged into the system transparently

Native Java API (very comprehensive)
  ◦ Spring-based interface
  ◦ Java EJB interface
IHE PIX/PDQ Interface
  ◦ Tested at US IHE Connectathon 2009/2010
  ◦ Participated in the HIMSS Showcase
New in the 2.0.4 release: a user interface whose functionality will grow over time

OpenEMPI is open-source software released under the Apache 2.0 license
Originally available through Open Health Tools, now at www.openempi.org (kenai)
  https://openhie.projects.openhealthtools.org
  http://www.openempi.org

Working on the implementation of new blocking algorithms
Enhance the user interface to support full functionality
Integrate OpenEMPI into the Connect Gateway
Provide REST and SOAP web-services interfaces
Develop a more flexible data import facility
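
The blocking and approximate string matching steps described earlier in this deck can be illustrated with a small Python sketch. It is not OpenEMPI code: the record layout, the blocking key (first letter of the last name plus birth year), and the 0.6 similarity threshold are all assumptions chosen for illustration. Records are compared only within a block, and each candidate pair is scored with a normalized Levenshtein (edit) distance on the matching variables.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: fewest single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical (case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def blocking_key(record: dict) -> str:
    # Hypothetical blocking variable: last-name initial plus birth year.
    return record["last"][0].upper() + record["birth_year"]


def candidate_pairs(records):
    """Yield only pairs that share a blocking key instead of all n*(n-1)/2 pairs."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)
    for block in blocks.values():
        yield from combinations(block, 2)


# Made-up sample records using name variations mentioned in the slides.
records = [
    {"id": 1, "first": "John", "last": "Smith", "birth_year": "1970"},
    {"id": 2, "first": "John", "last": "Smythe", "birth_year": "1970"},
    {"id": 3, "first": "Hans", "last": "Meier", "birth_year": "1982"},
    {"id": 4, "first": "Hans", "last": "Meyer", "birth_year": "1982"},
    {"id": 5, "first": "Mary", "last": "Sinclair", "birth_year": "1970"},
]

THRESHOLD = 0.6  # assumed cut-off; a real system would tune this value

pairs = list(candidate_pairs(records))
all_pairs = len(records) * (len(records) - 1) // 2
matches = [
    (a["id"], b["id"])
    for a, b in pairs
    if similarity(a["first"], b["first"]) >= THRESHOLD
    and similarity(a["last"], b["last"]) >= THRESHOLD
]

print(f"compared {len(pairs)} of {all_pairs} possible pairs")
print("matched record ids:", matches)
```

A single blocking pass like this one misses a true match whenever the blocking field itself contains an error (a mistyped birth year, for instance), which is why the slides recommend multiple passes with different blocking variables. The fixed threshold is a simple deterministic decision rule; a probabilistic approach such as Fellegi-Sunter would instead weight agreement on each field.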