Annotation Guidelines

THE ANNOTATION GUIDELINE MANUAL:
EXTRACTING ADVERSE DRUG EVENT INFORMATION FROM
DISCHARGE SUMMARIES AND PROGRESS NOTES IN ELECTRONIC MEDICAL RECORDS
Version 2.0 April 22, 2015
Steven Belknap, Elaine Freund, Nadya Frid, Zuofeng Li, Rashmi Prasad, Balaji Ramesh, Hong Yu
Contents
INTRODUCTION
  General Background
  Guidelines Background
NAMED ENTITY OR ANNOTATION FIELDS*
  PHI and PII Annotation
  Medication Annotation: Drug and Drug Attributes
  Medication Annotation: Medication Related Entities and Attributes
  Assertion Category
  MedDRA Annotation
ANNOTATION OF RELATIONS
ANNOTATION PRACTICE
  General Considerations
  Choosing a span
  Anaphoric pronouns
  Articles
  Titles
  Prepositions
  Frequency
  Drugs
  Test results
  Longitudinal Information
REFERENCES
APPENDIX 1: Additional Annotation Practice Examples and Rules for Interannotator Agreement
  Choosing a span
  S/S/LIF Examples
  Titles
  Assertion
  Anaphoric Pronouns vs. Co-referent Items
  Drug
  Adverse Event
  MedDRA
  Severity
  Test Results
APPENDIX 2: Entity and Attribute Tables
APPENDIX 3: Protected Health Information (PHI) and Personally Identifiable Information (PII)
APPENDIX 4: Routes of Drug Administration and Abbreviations
APPENDIX 5: Frequency of Drug Administration and Abbreviations
APPENDIX 6: Deviations from i2b2 Guidelines
APPENDIX 7: Annotation Tool Notes
  NLP Objectives
  MedDRA
  Export the Annotation to XML
  Semi-Automated Annotation with the BioNLP named entity tagger Lancet
  Summary of Annotation Processes and Tooling Changes for the ADE Pharmacovigilance Project
INTRODUCTION
General Background
An adverse event (AE) is an injury to a patient, and an adverse drug event (ADE) is "an injury resulting from a medical intervention related to a drug" (1). ADEs are common, occurring at a rate of 2.4–5.2 per 100 hospitalized adult patients (1–4). Each ADE is estimated to increase the length of hospital stay by 2.2 days and the hospital cost by $3,244 (3,5). Severe ADEs rank between the fourth and sixth leading causes of death in the United States (6).
Significant healthcare savings could be realized through prevention of ADEs and through their early detection and mitigation (5,7,8). When a clinician recognizes an ADE, a hospital system typically prompts an appropriate response, such as discontinuation of the drug, adjustment of dose, administration of an antidote (e.g., blood transfusion, antihistamines, antiarrhythmics, or intravenous fluid resuscitation), or other action. While particular instances of ADEs may be recognized and appropriately ameliorated, these events are often not coded in diagnostic or billing fields of the medical record and are therefore “lost” to pharmacoepidemiologists, regulatory agencies, and clinicians. One result of this loss is a paucity of high-quality information, which can lead to errors in the assessment of toxicity from cancer drugs (9). The lack of timely and accurate ADE information has led to confusion for patients and prescribers, especially when the FDA takes regulatory action (10) that appears to be inconsistent with the available data, as recently happened with clopidogrel (11).
Studies have shown that the occurrence of an ADE is often buried in the EMR narrative (e.g., 12). The ADE is not separately recorded as a diagnosis code or as other data accessible in the structured fields, and is therefore difficult to detect and assess. Moreover, manual abstraction of data from discharge notes and other unstructured text remains a significant impediment to progress in pharmacovigilance research. Rapid, accurate, and automated detection of ADEs in cancer patients (CADEs) would provide significant cost and logistical advantages over manual ADE detection (e.g., chart review or voluntary reporting) (13). Consequently, robust biomedical natural language processing (BioNLP) approaches that accurately detect ADEs in EMR narratives would be of great interest to other pharmacovigilance researchers and would also have potential application in clinical settings.
Guidelines Background
These guidelines are being used to annotate patient Electronic Medical Records (EMRs), which will be made publicly available as a corpus with high-quality annotation of ADEs. This corpus will also be used to train an innovative NLP system that is part of a pharmacovigilance toolkit. The toolkit will be integrated into the open source translational research platform i2b2 (14), so these annotation guidelines generally align with the i2b2 guidelines (14). Annotation objectives are the identification of relevant named entities (diseases, medications and ADEs); of discourse relations (e.g., causal, temporal and contrastive relations) between them; and of severity and the elements of the Naranjo method for assessing causality.
The annotation tools use Protégé with the Knowtator plugin (15) and incorporate HHS PHI and PII terms, the Naranjo scoring system (16) and MedDRA (17) terms in the user interface. The guidelines have been developed iteratively during use and with experts across many domains. The guidelines and tooling will continue to be developed and refined throughout the annotation process and as research progresses.
Short videos demonstrating use of the annotation tooling are available (you may want to use another browser if the links do not open in IE). Alternatively, you can go to the UMass BioNLP Annotation Resource Page:
1 Getting Started - Annotation
2 Annotation Tool Orientation
3 First Annotation PHI
4 Spans and Corrections
5 Relations Annotation
6 Adverse Events and MedDRA
7 More on Attributes
In brief, you will open a record in the annotation tool and it will look similar to the picture below. The first panel lists the classes [1], the second panel is the medical record window [2], and the third panel is an attribute annotation window [3]. To annotate most classes, click the class in the left panel or in the fast annotate bar [4] and highlight the text span in the middle panel. Some additional attributes and associations [5] are made from the class panel and the annotation window. A few, such as Period, are made from the annotation window only.
There is a website with the annotation guidelines, videos on how to use the tool, and other
resources. http://ummsres12.umassmed.edu/jt/index.php/annotation
NAMED ENTITY OR ANNOTATION FIELDS*
PHI and PII Annotation
To implement the Health Insurance Portability and Accountability Act (HIPAA) (18), the Department of Health and Human Services published a national standard for the electronic exchange, privacy and security of health information. The “Privacy Rule” protects all individually identifiable health information transmitted in any form and calls this information “Protected Health Information (PHI)” and “Personally Identifiable Information (PII).” There are 18 common identifiers associated with PHI and PII that must be removed to de-identify data for use or release. These include name, address, date, Social Security Number, etc.; the complete list of PHI is in Appendix 3. PHI is annotated both to build named entity recognition for NLP and to support removal during de-identification. The use of the PHI classes is described below.
Date: This class covers all aspects of date (except year) directly related to an individual,
including birth date, admission date, discharge date, date of death.
Age over 89: Another date identifier applies to all ages over 89 and all
elements of dates (including year) indicative of such age, except that
such ages and elements may be aggregated into a single category of
age 90 or older.
Medical Record Number: Use this class to include medical record
numbers, health beneficiary plan numbers and account numbers of any
type.
Social Security Number: self-explanatory
Location: It will be valuable for machine learning to annotate addresses with some granularity. Most of these location identifiers are self-explanatory, but Named Sites include things such as universities, organizations, named buildings, landmarks, etc.
Name: All aspects of any name are to be annotated: first name, last name, initials, names following titles and indicators, nicknames, logins and handles.
Identifiers: This class covers certificate/license numbers; vehicle identifiers and serial numbers, including license plates; device identifiers and serial numbers; and biometric identifiers (mostly images, which we are unlikely to see in this data).
Electronic Identifiers: e-mail addresses, websites, IP addresses, usernames and passwords.
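Because PHI is annotated partly to support removal during de-identification, a minimal Python sketch of that use may help. It assumes, hypothetically, that PHI annotations can be exported as non-overlapping character-offset spans tagged with one of the classes above; the `PHIClass` and `PHISpan` names and the bracketed placeholder format are illustrative and not part of the annotation tool. Ages over 89 are collapsed into a single "90 or older" category, as the Age over 89 rule allows.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class PHIClass(Enum):
    """PHI classes from these guidelines (enum names are illustrative)."""
    DATE = "Date"
    AGE_OVER_89 = "Age over 89"
    MEDICAL_RECORD_NUMBER = "Medical Record Number"
    SOCIAL_SECURITY_NUMBER = "Social Security Number"
    LOCATION = "Location"
    NAME = "Name"
    IDENTIFIERS = "Identifiers"
    ELECTRONIC_IDENTIFIERS = "Electronic Identifiers"


@dataclass(frozen=True)
class PHISpan:
    start: int  # offset of the first annotated character
    end: int    # offset just past the last annotated character
    phi_class: PHIClass


def deidentify(text: str, spans: List[PHISpan]) -> str:
    """Replace each annotated PHI span with a class placeholder.

    Assumes the spans do not overlap. Ages over 89 are aggregated into
    a single "90 or older" category instead of a generic placeholder.
    """
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        pieces.append(text[cursor:span.start])  # text before this span is kept
        if span.phi_class is PHIClass.AGE_OVER_89:
            pieces.append("[AGE 90 OR OLDER]")
        else:
            pieces.append(f"[{span.phi_class.name}]")
        cursor = span.end
    pieces.append(text[cursor:])  # text after the last span is kept
    return "".join(pieces)


if __name__ == "__main__":
    note = "Seen by John Smith, age 93."
    spans = [PHISpan(8, 18, PHIClass.NAME), PHISpan(24, 26, PHIClass.AGE_OVER_89)]
    print(deidentify(note, spans))  # Seen by [NAME], age [AGE 90 OR OLDER].
```

The sketch replaces spans rather than deleting them so that sentence structure, which the NLP components rely on, is preserved.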
* Classes are underlined in the color used as a highlighter in annotation.
Medication Annotation: Drug and Drug Attributes**
When an adverse event is recognized, a physician may discontinue the drug, adjust the dose, or administer an antidote. The drug and drug-specific attributes are therefore important elements to annotate; this information will be used to assess causal relations between an adverse event and drug administration.

Drug name [Entity]
  Definition: Substances to which the patient has been or will be exposed, including drug class names and medications referred to with pronouns. The drug name must appear either in the USP published drug list or in the FDA Orange Book.
  Examples:
    Eg 1: Lotensin 20 mg p.o. daily.
    Eg 2: He was started on azithromycin and ceftriaxone.

Dosage [Attribute]
  Definition: The amount of a single medication used in each administration, i.e., a quantified description of the drug administered each time.
    - Type (Discrete/Continuous)
    - Strength (Concentration/Amount)
    - Form (solid, tablet, liquid, injectable, cream)
  Examples:
    Eg 1: In the ER, the patient received heparin 4000 units bolus, then 1000 units per hour.
    Eg 2: Digoxin 0.125 mg every other day.

Route [Attribute]
  Definition: Method for administering the medication.
    - PO, IV, Topical, Epidural, Sublingual, Intramuscular, etc. (for a list with abbreviations, see Appendix 4)
  Examples:
    Eg 1: She continues to receive antibiotics intravenously.
    Eg 2: Glyburide 5 mg orally twice a day.

Frequency [Attribute]
  Definition: How often each dose of the medication should be taken, including both discrete and continuous values.
    - Times a day, etc.
    - Specified time of day or hours
    (for a table with abbreviations, see Appendix 5)
  Examples:
    Eg 1: A patient was prescribed Melphalan 5mg (1 tablet) daily.
    Eg 2: Labetalol 300 mg by mouth three times a day.

Duration [Attribute]
  Definition: How long the medication is to be administered.
    - Days, weeks, months, etc.
  Examples:
    Eg 1: The patient received Taxol for one month.
    Eg 2: Continue home medications and Flagyl 500 mg 1 tablet p.o. q.i.d. for 10 days.

** Yellow highlighted terms and spans in this document indicate that these are annotatable, but are not an indication of class type.
Attributes shared with other entities are described in subsequent sections. They are: Adverse
effect, Assertion, Outcome and Reason (Indication).
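A drug annotation and its drug-specific attributes can be pictured as one record. The Python sketch below is a hypothetical data model, not the Knowtator schema: the field names mirror the attribute names above, unannotated attributes stay `None`, and shared attributes (Reason, Adverse effect) are kept only as the text of the related spans.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DrugAnnotation:
    """Hypothetical record for a Drug entity and its drug-specific attributes."""
    name: str                        # Drug name span, e.g. "Lotensin"
    dosage: Optional[str] = None     # e.g. "20 mg"
    route: Optional[str] = None      # e.g. "p.o." (see Appendix 4)
    frequency: Optional[str] = None  # e.g. "daily" (see Appendix 5)
    duration: Optional[str] = None   # e.g. "one month"
    # Attributes shared with other entities are created through relations;
    # here they are stored only as the text of the related spans.
    reasons: List[str] = field(default_factory=list)         # Indication spans
    adverse_events: List[str] = field(default_factory=list)  # AE spans

    def annotated_attributes(self) -> Dict[str, str]:
        """Return only the single-valued attributes that were annotated."""
        values = {
            "Dosage": self.dosage,
            "Route": self.route,
            "Frequency": self.frequency,
            "Duration": self.duration,
        }
        return {key: value for key, value in values.items() if value is not None}


if __name__ == "__main__":
    # Eg 1 from the Drug name entry: "Lotensin 20 mg p.o. daily."
    lotensin = DrugAnnotation("Lotensin", dosage="20 mg", route="p.o.", frequency="daily")
    print(lotensin.annotated_attributes())
    # {'Dosage': '20 mg', 'Route': 'p.o.', 'Frequency': 'daily'}
```

Keeping every attribute optional reflects the fact that most drug mentions in a note carry only some of them.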
Medication Annotation: Medication Related Entities and Attributes
Elements to annotate beyond drug administration include why a drug is being given (the indication), the injury resulting from a medical intervention related to a drug (the adverse event), and other signs and symptoms, which must be differentiated from the ADE.
Indication
  Definition: Medical conditions for which the medication is given, in the past or the present. [Annotated in the class navigation bar; appears as the Drug attribute "Reason" when a relation is created.]
  Examples:
    Present: The patient was diagnosed with hypertension and was treated with Accupril.
    Past: He did have some hypokalemia which was treated with p.o. K-Dur.

Adverse Event (AE)
  Definition: Drug-related injury to a patient.
  Examples:
    Present: She experienced a hypersensitivity reaction while receiving intravenous Taxol (paclitaxel) therapy.
    Past: Patient had anaphylaxis after getting penicillin 10 years ago.

Signs, Symptoms, Abnormal Test Findings, and Diseases (S/S/LIF)
  Definition: Medical signs, symptoms and diseases that are neither adverse effects nor reasons for administering a medication.
  Example: The patient has a history of COPD.

Severity
  Definition: Intensity of an adverse effect. [An attribute of Indication, AE and S/S/LIF; annotated in the class navigation bar, but must be added in the right annotation window for Indication and S/S/LIF.]
  Examples:
    Eg 1: Severe headache, moderate chest pain.
    Eg 2: The PLB has 50% stenosis just proximal to a widely patent stent.
Outcome (default notMentioned)
Annotate the Outcome field for adverse events where possible. It has four values: recovered, not completely recovered, died, and not mentioned (the most common). The default value for this field is notMentioned.
If an adverse event's Assertion is Absent, the Outcome field is not annotated.
Period (default current)
Annotate temporal information for several entities: Adverse effect, Indication and S/S/LIF. This
is done using the Period attribute. Period values are: current and history. The default value for
this field is current.
Assertion Category
Assertion (modality) expresses a speaker’s degree of commitment to the expressed
proposition’s believability, obligatoriness, desirability, or reality. Ascribe assertion values to
medications and diseases, namely, to “drug”, “adverse effects”, “indication”, and “other signs,
symptoms and diseases” entities.
Present (default)
“Present” means that the problem is (or was) present in the patient. The drugs the patient receives are also annotated as Present.
Examples:
- A female patient died while receiving Taxol (Paclitaxel) therapy for the treatment of endometrial cancer
- The patient had a history of hypertension
- She is on oxycodone 10mg for pain
In our annotation, the positive value ‘present’ is the default value, i.e. if an entity does not have
any assertion value ascribed it means that the value is positive/present.
In the examples below, the entities receive the positive (present) assertion value even though some other value might be suspected:
- At this point in time, he does not require any more antibiotics.
- She has since been discontinued on digoxin
- His enalapril was changed to lisinopril
- His aspirin was held
Comment: Replaced, held or discontinued drugs are annotated as “positive” and not as “absent”, since the patient was taking them.
- The anaphylactic shock was possibly related to Taxol (relation between Taxol and anaphylactic shock is Adverse)
- The anaphylactic shock was most likely related to Taxol (relation between Taxol and anaphylactic shock is Adverse)
- The anaphylactic shock was not related to Taxol (no relation annotated between Taxol and anaphylactic shock)
Comment: Anaphylactic shock is ascribed the positive value, since it did occur. It is only its relation to Taxol that is questioned or negated, and we do not ascribe assertion to relations in the current schema.
- The second episode of malaise, loss of consciousness, undetectable pulse, and tension were identified as being part of shock. Since these were considered manifestations of the shock and anaphylactoid reaction, the previously reported separate events of dyspnea, malaise, abdominal pain, and erythema have been deleted from the file
- Supplemental information received from the reporter via BMS Japan on January 15, 2002 indicated that the events dyspnea, blood pressure decreased and facial hot flushes were changed to anaphylactic shock
Comment: Even if certain symptoms were identified as being part of another symptom and deleted from the file or renamed, they still did exist and need to be annotated as positive.
Absent
“Absent” asserts that the problem does not exist in the patient. Also annotate drugs the patient
did not receive as Absent.
Examples:
- no known drug allergy
- the patient denied any dizziness, shortness of breath…
- Without syncopal episodes
- The patient currently is pain free
- There were no clinical signs of congestive heart failure
- CVA has been ruled out (cf. a consult was placed to rule out CAD, where CAD receives the possible value)
- She is not a candidate for anticoagulation
- Rule out congestive heart failure but doubt (cf. CVA has been ruled out, where CVA receives the negative value)
- The patient had no fever
- No antibiotics were given
Comment 1: Do not annotate the outcome of “absent” adverse events.
Comment 2: Link “absent” adverse events to drugs the same way we link “present” ones.
Possible
“Possible” asserts that the patient may have a problem, but there is uncertainty expressed in the note.
Examples:
- Questionable DVT
- Question of DVT
- Their differential is gliomatosis versus radiation effect.
- Possible anterolateral ischemia
- a consult was placed to rule out CAD
- Rule out congestive heart failure but doubt
- The differential diagnosis for his fever included possible inadequately pneumonia versus bacteremia versus UTI versus CSF infection
Conditional
“Conditional” is used when the mention of the medical problem asserts that the patient experiences the problem only under certain conditions.
Hypothetical
“Hypothetical” is used for medical problems the patient may develop.
Examples:
- Should her symptoms return or headache develop, please discontinue to taper and notify Dr. **NAME[ZZZ]'s office.
- Call Dr. X if increased swelling or redness of the left lower extremity or starts to have difficulty breathing
Not associated with Patient
The mention of the medical problem is associated with someone who is not the patient.
Examples:
- Family history of prostate cancer
- Brother had asthma
If needed the classification can be further detailed. For example, a drug can be “absent”
because the doctor did not recommend it or because the patient refused it; or the probability of
a disease can vary from very low to very high.
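The default values described for Outcome, Period and Assertion, and the rule that Outcome is not annotated for absent adverse events, can be summarized in a short Python sketch. The enum and class names are hypothetical, and only notMentioned is a value string taken from the tool; "not annotated" is modeled, as an assumption, as leaving Outcome at its default.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Assertion(Enum):
    PRESENT = "present"  # default
    ABSENT = "absent"
    POSSIBLE = "possible"
    CONDITIONAL = "conditional"
    HYPOTHETICAL = "hypothetical"
    NOT_ASSOCIATED_WITH_PATIENT = "not associated with patient"


class Outcome(Enum):
    RECOVERED = "recovered"
    NOT_COMPLETELY_RECOVERED = "not completely recovered"
    DIED = "died"
    NOT_MENTIONED = "notMentioned"  # default


class Period(Enum):
    CURRENT = "current"  # default
    HISTORY = "history"


@dataclass
class AdverseEventAnnotation:
    """Hypothetical AE record; unannotated fields keep their defaults."""
    span: str
    assertion: Assertion = Assertion.PRESENT
    outcome: Outcome = Outcome.NOT_MENTIONED
    period: Period = Period.CURRENT


def validate_adverse_event(ae: AdverseEventAnnotation) -> List[str]:
    """Return rule violations; an empty list means the annotation is consistent."""
    problems = []
    # "If an adverse event's Assertion is Absent, the Outcome field is not annotated."
    if ae.assertion is Assertion.ABSENT and ae.outcome is not Outcome.NOT_MENTIONED:
        problems.append("Outcome must not be annotated for an absent adverse event")
    return problems


if __name__ == "__main__":
    shock = AdverseEventAnnotation("anaphylactic shock")
    print(shock.assertion, shock.outcome, shock.period)
    print(validate_adverse_event(
        AdverseEventAnnotation("rash", assertion=Assertion.ABSENT, outcome=Outcome.RECOVERED)))
```

Because every field has a default, a value only needs to be recorded when the note departs from the most common case.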
MedDRA Annotation
Adverse effects are mapped to MedDRA concepts. A window with possible matches allows you to select a MedDRA Preferred Term (PT). If there is no meaningful match, you can search for a synonym. Follow this link to browse the most current version of the MedDRA ontology from anywhere except the Annotation Server:
http://ummsres14.umassmed.edu/OntoSolr/browse
On the Annotation Server, please use this URL to search and browse MedDRA terms:
http://ummsqhslxweb01.umassmed.edu/OntoSolr/browse
When you find a selection, it can be entered manually. For example, “expire” does not produce
a result but searching on “death” will. Sometimes a fairly generic term is the best choice.
Comment: A drug allergy is annotated as a past adverse event if a specific drug is mentioned (e.g., ALLERGIES: IMDUR). The allergy is assigned the MedDRA term "Drug hypersensitivity" (MedDRA:10013700) and the value "History" in the "Period" field. Note that in spans like (no) drug allergies, drug allergy is annotated as Other signs, symptoms and diseases and not as Adverse event, since no specific drug is mentioned.
See the UMass BioNLP Annotation Resource Page for videos demonstrating MedDRA annotation.
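The drug allergy comment above amounts to a small decision rule. The Python sketch below is a hypothetical illustration, not part of the annotation tool: it assumes the allergy mention has already been split into the specific drug names it contains and a flag for negations such as "no drug allergies" or "ALLERGIES: none". Only the MedDRA term and code are taken from this manual.

```python
from dataclasses import dataclass, field
from typing import List

# MedDRA Preferred Term and code given in the drug allergy comment above.
DRUG_HYPERSENSITIVITY_PT = "Drug hypersensitivity"
DRUG_HYPERSENSITIVITY_CODE = "10013700"


@dataclass
class AllergyAnnotation:
    entity_class: str              # "Adverse Event" or "S/S/LIF"
    assertion: str = "present"     # "present" or "absent"
    period: str = "current"        # "current" or "history"
    meddra_pt: str = ""            # empty when no MedDRA term is assigned
    meddra_code: str = ""
    linked_drugs: List[str] = field(default_factory=list)


def annotate_allergy(drugs: List[str], negated: bool = False) -> AllergyAnnotation:
    """Apply the drug allergy rule to one allergy mention (hypothetical helper).

    drugs: specific drug names mentioned with the allergy (may be empty).
    negated: True for mentions such as "no drug allergies" or "ALLERGIES: none".
    """
    assertion = "absent" if negated else "present"
    if not drugs:
        # No specific drug named: Other signs, symptoms and diseases (S/S/LIF).
        return AllergyAnnotation("S/S/LIF", assertion=assertion)
    # Specific drug(s) named: a past adverse event, linked to each drug separately.
    return AllergyAnnotation(
        "Adverse Event",
        assertion=assertion,
        period="history",
        meddra_pt=DRUG_HYPERSENSITIVITY_PT,
        meddra_code=DRUG_HYPERSENSITIVITY_CODE,
        linked_drugs=list(drugs),
    )


if __name__ == "__main__":
    print(annotate_allergy(["IMDUR"]))
    print(annotate_allergy([], negated=True))
```

For "ALLERGIES: amoxicillin and vancomycin" the sketch yields one adverse event whose `linked_drugs` lists both drugs, mirroring the instruction to link the allergy to each drug separately.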
ANNOTATION OF RELATIONS
Annotate relations (connections) between entities and their attributes. See Appendix 2 for a full
table of possible relations.
Dosage, route, frequency, duration, indication and adverse event are drug attributes and are
related to their drugs. To annotate a relation between a drug and its attribute, left-click on the
drug span, then right-click on the attribute span. The attribute then gets highlighted in a dotted
box. Continue this process for each attribute associated with the drug.
Severity can be linked to Indication, Adverse Event or Other signs, symptoms, abnormal Test Findings and diseases. To annotate a relation between an entity and its severity marker, left-click the entity, then left-click the 'Add instance' icon (diamond+) in the Severity section of the right panel of the annotation tool. Choose from the pop-up menu (the choice is selected automatically if there is only one). If an entity has several attributes (for example, a drug has Dosage, Route and Frequency attributes), each attribute is linked individually. If an entity has several attributes from the same category (for example, a drug caused multiple adverse effects), annotate the drug and connect each adverse effect to it separately.
Examples:
Drug’s attributes
  Context: She receives Albuterol 2 puffs p.o. q4-6h.
  Relations: Dosage (Albuterol, 2 puffs); Route (Albuterol, p.o.); Frequency (Albuterol, q4-6h)
  Context: The patient was treated with ampicillin for two weeks.
  Relation: Duration (ampicillin, two weeks)
  Context: He later received chemotherapy for his lung cancer.
  Relation: Reason (lung cancer, chemotherapy)
  Context: Patient's death was due to anaphylactic shock caused by the intravenously administered penicillin.
  Relation: Adverse (penicillin, anaphylactic shock)
Disease’s attributes
  Context: He has severe diarrhea.
  Relation: Severity (diarrhea, severe)
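The relation rules above (drug attributes link to their drug; Severity links to Indication, Adverse Event or S/S/LIF) can be checked mechanically. The Python sketch below is a hypothetical validator; the class and relation names follow this section, but the mapping is an assumption and may be incomplete, since Appendix 2 holds the full relation table.

```python
from typing import Dict, FrozenSet, Tuple

DRUG = "Drug"

# Hypothetical mapping: relation type -> (allowed source classes, allowed target classes).
ALLOWED_RELATIONS: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "Dosage": (frozenset({DRUG}), frozenset({"Dosage"})),
    "Route": (frozenset({DRUG}), frozenset({"Route"})),
    "Frequency": (frozenset({DRUG}), frozenset({"Frequency"})),
    "Duration": (frozenset({DRUG}), frozenset({"Duration"})),
    "Reason": (frozenset({DRUG}), frozenset({"Indication"})),
    "Adverse": (frozenset({DRUG}), frozenset({"Adverse Event"})),
    "Severity": (frozenset({"Indication", "Adverse Event", "S/S/LIF"}), frozenset({"Severity"})),
}


def is_allowed(relation: str, source_class: str, target_class: str) -> bool:
    """Check that a relation connects classes permitted by the mapping above."""
    if relation not in ALLOWED_RELATIONS:
        return False
    sources, targets = ALLOWED_RELATIONS[relation]
    return source_class in sources and target_class in targets


def render(relation: str, source_text: str, target_text: str) -> str:
    """Format a relation in the style used in the examples, e.g. Dosage (Albuterol, 2 puffs)."""
    return f"{relation} ({source_text}, {target_text})"


if __name__ == "__main__":
    print(is_allowed("Severity", "S/S/LIF", "Severity"))  # True
    print(is_allowed("Severity", "Drug", "Severity"))     # False
    print(render("Dosage", "Albuterol", "2 puffs"))       # Dosage (Albuterol, 2 puffs)
```

Treating each relation as a typed (source, target) pair makes it easy to flag, for example, a Severity accidentally attached to a drug.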
ANNOTATION PRACTICE
General Considerations
- Do not make assumptions
- Do not consider longitudinal information in this workflow; annotate only the information in the current record
- Do not diagnose
- Do not annotate a patient’s mistaken beliefs when medical professional commentary contradicts them
- Do annotate general terms such as “problem” and “disease”
Choosing a span
We include most of a disease’s complements in its name. For example, when annotating adverse effects, annotate:
- decreased blood pressure (71/53 mmHg), not just decreased blood pressure
- shock to the liver and breast, not just shock
Anaphoric pronouns
Anaphoric pronouns are pronouns that refer back to another word or phrase. We do not annotate anaphoric pronouns such as it or this in the examples below, even though they refer to entities we do annotate:
The patient had diplopia but it was resolved completely.
The patient had anaphylactic shock. This was caused by antihistamines.
Articles
The indefinite article "a" is not included in annotated entities: in the noun phrase a malignant tumor of the breast, the annotated span is malignant tumor of the breast, not a malignant tumor of the breast.
The definite article "the" is not included in disease names either. In the example below, the adverse effect is “anaphylactic shock”, not “the anaphylactic shock”:
The anaphylactic shock was characterized by nausea.
Titles
Certain adverse effect reports include clinical trial title, for example:

Protocol title: (NON-BMS/RETRO TAXOL) RETROSPECTIVE DATA COLLECTION
TAXOL IN PATIENTS WITH SOLID TUMORS. Investigator causality assessment was
not provided.
Do not annotate the drug name (Taxol) or its related information in the title, since the name of a clinical trial may include drugs that an individual patient in that trial does not receive. For example, a clinical trial might have this name:
"A randomized, controlled, blinded clinical trial comparing miraclecillin to wondersporin."
In this trial, some of the patients got miraclecillin and other patients got wondersporin, but no
patients got both.
Entries like Suspect Drug/Causality, as in the example Suspect Drug/Causality: paclitaxel, are treated like titles and not annotated.
The AE in the example below is not annotated either:
- AE outcome: The patient experienced death on [words marked]
The EMR section title ALLERGIES can be annotated.
- “ALLERGIES: amoxicillin and vancomycin”: Allergy is an adverse event and is linked to each drug separately.
- If an allergic reaction occurred in the past, the allergy remains and is annotated as “present.”
- “ALLERGIES: none”: Where no drug is named, annotate the allergy as S/S/LIF with the assertion “absent”.
Prepositions
We do not include prepositions when annotating duration spans: in for three weeks or for an unknown period of time, we do not include for in the duration span.
In the noun phrase via intravenous drip, we annotate only the intravenous drip span.
Frequency
The adjective weekly is annotated as frequency, e.g., weekly Taxol.
We annotate x2 tabs a day as the span “2 tabs a day” (not “x 2 tabs a day”).
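The span rules in the Articles, Prepositions and Frequency sections can be applied mechanically when pre-annotating or checking spans. The Python sketch below is hypothetical: the `kind` labels are illustrative, stripping "an" is an assumed extension of the "a" rule, and only the leading word of a span is trimmed, so "for an unknown period of time" keeps its inner article.

```python
import re

# Entity kinds to which the article rule is assumed to apply.
_ENTITY_KINDS = {"drug", "adverse_event", "indication", "sslif"}
_LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)
_LEADING_PREPOSITION = {
    "duration": re.compile(r"^for\s+", re.IGNORECASE),  # "for three weeks"
    "route": re.compile(r"^via\s+", re.IGNORECASE),     # "via intravenous drip"
}
_FREQUENCY_X_PREFIX = re.compile(r"^x\s*(?=\d)", re.IGNORECASE)  # "x2 tabs a day"


def trim_span(text: str, kind: str) -> str:
    """Trim a candidate span according to the article, preposition and frequency rules."""
    span = text.strip()
    # Articles: "the anaphylactic shock" -> "anaphylactic shock".
    if kind in _ENTITY_KINDS:
        span = _LEADING_ARTICLE.sub("", span, count=1)
    # Prepositions: "for three weeks" -> "three weeks"; "via intravenous drip" -> "intravenous drip".
    preposition = _LEADING_PREPOSITION.get(kind)
    if preposition is not None:
        span = preposition.sub("", span, count=1)
    # Frequency: "x2 tabs a day" -> "2 tabs a day".
    if kind == "frequency":
        span = _FREQUENCY_X_PREFIX.sub("", span, count=1)
    return span


if __name__ == "__main__":
    print(trim_span("the anaphylactic shock", "adverse_event"))
    print(trim_span("for an unknown period of time", "duration"))
    print(trim_span("x2 tabs a day", "frequency"))
```

Restricting the article rule to entity spans avoids, for example, trimming a frequency span such as "a day" that the guidelines do not address.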
Drugs
Non-drug examples
Non-drug treatment options like blood transfusions, fluids, normal saline, oxygen and red packed cells in the examples below are not annotated as drugs:
- Given multiple blood transfusions
- Pressors continued with fluids.
- He was admitted to the hospital and hydrated with normal saline.
- The event was treated with steroids and oxygen.
- Pancytopenia, treated with G-CSF, erythropoetin and red packed cells
Not annotating drug
Do not annotate drug in the phrase drug relationship and similar contexts, since it does not refer to a specific drug:
- According to the pharmacovigilance center reporter and to French methodology of causality assessment, the drug relationship is unable to determine
- According to the pharmacovigilance center reporter and to the French methodology of causality assessment, drug relationship is probable
We do not annotate the term drug when it does not denote a specific drug; however, we do annotate more specific terms like chemotherapy or pain medication.
Do not annotate the relation between a drug and its indication if they are separated by more than one sentence.
Patient comes in for evaluation of psoriasis and … [4 sentences] … patient is using Motrin 200 mg tablets up to 3 tablets twice a day
Psoriasis is the indication for Motrin, but a relation this far apart is not useful for NLP. The indication and the drug should be in the same sentence, or at most one sentence apart in either direction, to be useful for NLP.
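The one-sentence window for drug–indication relations can be expressed as a distance check on sentence indices. The Python sketch below is a hypothetical helper built on a deliberately naive sentence splitter (a period, exclamation mark or question mark followed by whitespace); clinical abbreviations such as "p.o." would be miscounted, so a real pipeline would substitute a proper clinical sentence splitter.

```python
import re

# Naive boundary: ".", "!" or "?" followed by whitespace.
# Abbreviations such as "p.o." or "q.i.d." will be miscounted.
_SENTENCE_BOUNDARY = re.compile(r"[.!?]\s+")


def sentence_index(text: str, offset: int) -> int:
    """Number of sentence boundaries that end at or before a character offset."""
    return sum(1 for m in _SENTENCE_BOUNDARY.finditer(text) if m.end() <= offset)


def may_link_indication(text: str, drug_offset: int, indication_offset: int) -> bool:
    """True if the drug and the indication are in the same or an adjacent sentence."""
    distance = abs(sentence_index(text, drug_offset) - sentence_index(text, indication_offset))
    return distance <= 1


if __name__ == "__main__":
    note = "Patient has psoriasis. She uses Motrin 200 mg tablets."
    print(may_link_indication(note, note.find("Motrin"), note.find("psoriasis")))  # True
```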
Test results
Lists of test results, whether normal or abnormal, given as uncommented numbers are not annotated as diagnoses:
Laboratory data showed sodium of 143, potassium 4.1, chloride 105, CO2 of 26,
BUN 4, creatinine 0.7, glucose 90, calcium 9.4. White count 5.4, hemoglobin
12.7, hematocrit 27.7, platelet count 247.
We assume that if a certain test or measurement result is significantly abnormal, the diagnosis
is mentioned in the text separately. For example, if a patient’s blood pressure was 180/100, the
report will most likely mention “high blood pressure.”
Mentions such as culture and culture pending refer to lab tests and are not annotated. When the result is positive, annotate it as part of a diagnosis.
- ‘….took a surface culture and nothing…..’ and ‘Culture pending.’
Commented lab results are annotated.
- “3 ova and parasites being negative, Giardia being negative in a stool culture that as negative.” Annotate these items and assert that they are absent.
Longitudinal Information
Annotate what is in the specific record you are viewing. Do not make assumptions or consider longitudinal information you may know (temporal aspects of adverse events are annotated in a separate workflow with Naranjo scoring).
- “HISTORY OF PRESENT ILLNESS: ……..history of Burkett’s lymphoma….” The patient is within a few months of treatment, which is too soon to consider the lymphoma cured. The patient does in fact go on to have a recurrence of the lymphoma, but here it is annotated as S/S/LIF with the Period value history.
REFERENCES
1. Bates DW, Cullen DJ, Laird N, Petersen LA, Small SD, Servi D, et al. Incidence of adverse drug events and potential adverse drug events. Implications for prevention. ADE Prevention Study Group. JAMA. 1995;274(1):29.
2. Classen DC, Pestotnik SL, Scott Evans R, Lloyd JF, Burke JP. Adverse drug events in hospitalized patients: excess length of stay, extra costs, and attributable mortality. Obstetrical & Gynecological Survey. 1997;52(5):291.
3. Bates DW, Spell N, Cullen DJ, Burdick E, Laird N, Petersen LA, et al. The costs of adverse drug events in hospitalized patients. Adverse Drug Events Prevention Study Group. JAMA. 1997;277(4):307.
4. Nebeker JR, Hoffman JM, Weir CR, Bennett CL, Hurdle JF. High rates of adverse drug events in a highly computerized hospital. Arch Intern Med. 2005 May 23;165(10):1111–6.
5. Handler SM, Altman RL, Perera S, Hanlon JT, Studenski SA, Bost JE, et al. A systematic review of the performance characteristics of clinical event monitor signals used to detect adverse drug events in the hospital setting. J Am Med Inform Assoc. 2007 Aug;14(4):451–8.
6. Lazarou J, Pomeranz BH, Corey PN. Incidence of adverse drug reactions in hospitalized patients: a meta-analysis of prospective studies. JAMA. 1998 Apr 15;279(15):1200–5.
7. Classen DC, Pestotnik SL, Evans RS, Burke JP. Description of a computerized adverse drug event monitor using a hospital information system. Hosp Pharm. 1992 Sep;27(9):774, 776–9, 783.
8. Kaushal R, Jha AK, Franz C, Glaser J, Shetty KD, Jaggi T, et al. Return on investment for a computerized physician order entry system. J Am Med Inform Assoc. 2006 Jun;13(3):261–6.
9. Belknap S. Review: beta-blockers for hypertension increase risk for new-onset diabetes compared with nondiuretic antihypertensive agents. ACP J Club. 2008 Apr;148(2):38.
10. FDA Announces New Boxed Warning on Plavix: Alerts patients, health care professionals to potential for reduced effectiveness. http://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm204253.htm.
11. Paré G, MD SR., Yusuf S, Anand SS, Connolly SJ, Hirsh J, et al. Effects of CYP2C19 Genotype on Outcomes of Clopidogrel Treatment. New England Journal of Medicine. 2066–78.
12. Jha AK, Kuperman GJ, Teich JM, Leape L, Shea B, Rittenberg E, et al. Identifying adverse drug events: development of a computer-based monitor and comparison with chart review and stimulated voluntary report. Journal of the American Medical Informatics Association. 1998;5(3):305.
13. Classen DC, Pestotnik SL, Evans RS, Burke JP. Computerized surveillance of adverse
drug events in hospital patients. Quality and Safety in Health Care. 2005 Jun 1;14(3):221–
6.
14. Uzuner O. Second i2b2 Workshop on Natural Language Processing Challenges for Clinical
Records. In: AMIA... Annual Symposium proceedings/AMIA Symposium. AMIA
Symposium. 2008. p. 1252.
15. Ogren, P.V. Knowtator: A Protégé plug-in for annotated corpus construction. In:
Proceedings of the 2006 Conference of the North American Chapter of the Association for
Computational Linguistics on Human Language Technology: companion volume:
demonstrations.(2006) 273–275.
16. Naranjo, Cláudio A., et al. A method for estimating the probability of adverse drug reactions.
Clinical Pharmacology & Therapeutics 30.2 (1981): 239-245.
17. Mozzicato P[1]. MedDRA: An Overview of the Medical Dictionary for Regulatory Activities.
Pharmaceutical Medicine. 2009 Apr 2;23:65–75.
18. National Institutes of Health, HIPAA Privacy Rule: Information for Researchers. Web.
http://privacyruleandresearch.nih.gov/default.asp Accessed 2 April 2015.
19. U.S. Food and Drug Administration, FDA Data Standards Manual: Route of Administration.
Web.
http://www.fda.gov/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/
ElectronicSubmissions/DataStandardsManualmonographs/ucm071667.htm Accessed 2
April 2015
20. Trotti A, Colevas AD, Setser A, Rusch V, Jaques D, Budach V, et al. CTCAE v3.0:
development of a comprehensive grading system for the adverse effects of cancer
treatment. Semin Radiat Oncol. 2003 Jul;13(3):176–81.
APPENDIX 1: Additional Annotation Practice Examples and Rules for Inter-annotator Agreement
Choosing a span
Annotate the full, accurate span even if it is long, and include coordinated locations. For example:
 “fibromyalgia causing pain in the neck and paracervical region and down the arm” This
is a long span but the pain is in both the neck and arm.
 “adenopathy in the supraclavicular or axillary regions”
Include the location in the span, especially when it is in the same sentence. Do not annotate
location terms that are distant from the S/S.
 pain of the lesion on the right shoulder
 swelling on the right shoulder. It is in the anterior aspect of the shoulder….
 pustular collection underneath the end of the nail about 5 days ago. It is her right middle
finger.
S/S/LIF Examples

“Sit hunched over…” This is a sign of the back pain being described.


“Cannot really straighten out” This is a sign of the back pain being described.
“Hepatitis C, genotype 1b” Annotate genotypes, subtypes, variants, etc. when given.
They may inform treatment decisions.
"hepatitis B vaccination" Hepatitis B will be pre-annotated, but the context is the vaccine
rather than the disease, so remove the annotation.

Titles
If an indication appears as the title of a list of symptoms, you can use it to link to the drug. If
the sentence containing the drug also makes the association clear, link that mention as well.
This can result in one or two annotations.

Joint pain. He is willing to try glucosamine to help with joint pain but……
Assertion
Assertion values are not mutually exclusive; a span can be assigned two assertion values:
 no family history of diabetes: Absent+NotAssociatedWithThePatient
 highly probable: Present+Possible
 unlikely: Absent+Possible
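Because a span can carry more than one assertion value, a tool that stores these annotations needs a set-valued assertion field rather than a single label. The sketch below is a minimal illustration under that assumption; the class name `SpanAnnotation`, its fields, and the `ASSERTION_VALUES` list (collected from values mentioned in these guidelines) are hypothetical and are not part of the Knowtator data model.

```python
from dataclasses import dataclass, field

# Assertion values mentioned in these guidelines (hypothetical enumeration for illustration).
ASSERTION_VALUES = frozenset({
    "Present", "Absent", "Possible", "Conditional",
    "Hypothetical", "History", "NotAssociatedWithThePatient",
})


@dataclass(frozen=True)
class SpanAnnotation:
    text: str
    # A set rather than a single label: assertion values are not mutually exclusive.
    assertions: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        # Reject values outside the (hypothetical) enumeration above.
        unknown = set(self.assertions) - ASSERTION_VALUES
        if unknown:
            raise ValueError(f"unknown assertion value(s): {sorted(unknown)}")


# The three combinations listed above; cue phrases stand in for the annotated spans.
family_history = SpanAnnotation("no family history of diabetes",
                                frozenset({"Absent", "NotAssociatedWithThePatient"}))
highly_probable = SpanAnnotation("highly probable", frozenset({"Present", "Possible"}))
unlikely = SpanAnnotation("unlikely", frozenset({"Absent", "Possible"}))
```

Validating against a closed list catches typos such as "Possibel" at annotation time rather than during post-processing.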
Hypothetical vs Conditional
If/then wording is a cue to use the assertion ‘conditional’.
 “If normal, treat with oral anti-inflammatory medications and” Here the anti-inflammatory
medication is annotated as ‘conditional’.
Wording such as “should symptoms return” is a cue to use the assertion ‘hypothetical’.
Present and Absent
It is possible to have sentences containing both present and absent information.
 Patient with chronic Hepatitis C “denies any sequela of hepatitis” is annotated as 2
spans: hepatitis is present and sequela of hepatitis is absent
Current and History
Annotate what is in the record you are viewing. Do not make assumptions or consider
longitudinal information you may know (the purpose is machine learning).
 “he no longer has the abdominal pain that he originally presented” is annotated as
S/S/LIF for abdominal pain with the assertion Absent (not History plus Absent).

Anaphoric Pronouns vs. Co-referent Items
Coreference occurs when two or more expressions in a text refer to the same thing. We do not
annotate anaphoric pronouns, such as “it” below. However, we do annotate other coreferent items.


He has left-sided abdominal pain but it is not hurt with pressure.
“The patient was diagnosed with lung cancer in 2010. Now the disease progressed.”
In the following example, “it was down” is meaningless on its own, and “it” is an anaphoric
pronoun, so annotate only the first half:
 Blood pressure was high 180/110, rechecked it was down to 137/100.
Drug
Chemotherapy can be tricky because regimen names often contain information about drug,
dosage, and duration.
 He received CHOP chemotherapy
 CHOP chemotherapy for 6 cycles
CHOP chemotherapy is annotated as one span because it is a drug regimen: CHOP names the four
drugs used in combination. “6 cycles” also provides duration information; cycles are the number
of times the treatment is repeated at a specified time interval. See http://www.lymphomation.org/chemoCHOP.htm
Annotate drugs even when they are self-administered – legally or illegally. For example if a
patient has shoulder pain and takes over the counter Tylenol and Percocet bought on the street,
annotate shoulder pain, Tylenol and Percocet.
Difficult Example of Indication-to-Drug Links, Followed by the Reasoning Behind the Annotation:
/ID The patient developed febrile neutropenia 1/27 and blood cultures from that day revealed
pseudomonas and Strep viridans. Urine cultures showed 35000 colonies of E. faecalis.
Cefepime (1/27-2/27) was initiated (synercid was not used due to a history of adverse effect:
myalgias) in addition to the acyclovir and diflucan (stopped 2/5) which were started earlier for
prophylaxis. Daptomycin was started on 1/29 for VRE. Caspofungin was started 2/7, 2 days
after diflucan was stopped as fevers persisted. The patient remained afebrile for several weeks
until 2/27 at which point Cipro and meropenem were initiated, though no microbes were initiated
on culture. The patient remained on acyclovir, caspofungin, ciprofloxacin, daptomycin, and
meropenem (d/c'd 3/10), for the rest of this hospitalization, and remained afebrile since 3/1. He
will be discharged on p.o. ciprofloxacin (to d/c when ANC >1000), voriconazole, and acyclovir-the latter two medications to take indefinitely.
Suggested Indication to Drug Links:
Pseudomonas (Cefepime)
Strep viridans (Cefepime)
E. faecalis (Cefepime)
VRE (Daptomycin)
Logic:
The paragraph states that a blood culture revealed Pseudomonas and Strep viridans, and that
a urine culture showed E. faecalis. These are all bacteria. The paragraph continues by stating
that cefepime was initiated. Cefepime is an antibiotic (cephalosporin class), and the paragraph
presents it as the treatment for the infections just listed. That is why we suggest the first
three links.
Second, the paragraph explicitly states the following indication-to-drug relation:
"Daptomycin was started on 1/29 for VRE". That is why we suggest the fourth link.
The other drugs in the paragraph are mentioned, but the text gives no clear indication for
them.
Adverse Event
….renal azotemia, possibly caused by Vancomycin.
 The patient has renal azotemia, but it is also a possible adverse event. Annotate renal
azotemia as a sign/symptom. Annotate renal azotemia again as an adverse event, assert it as
possible, and create the relation to the drug Vancomycin.
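The key point of this example is that one character span yields two annotations of different entity types, and only the adverse-effect annotation is related to the drug. The sketch below shows that structure with character offsets, assuming a simple in-memory model; `Mention`, `Relation` and their fields are hypothetical names, not Knowtator classes, and the S/S/LIF assertion is left unspecified because the example does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start: int            # character offsets into the record text
    end: int
    entity: str           # e.g. "S/S/LIF", "Adverse effect", "Drug"
    assertion: str = ""   # left empty when the example does not specify one


@dataclass(frozen=True)
class Relation:
    adverse_effect: Mention
    drug: Mention


record = "renal azotemia, possibly caused by Vancomycin."
ae_start = record.index("renal azotemia")
span = (ae_start, ae_start + len("renal azotemia"))

# The same span is annotated twice: once as a sign/symptom, once as an adverse effect.
sign = Mention(*span, entity="S/S/LIF")
adverse = Mention(*span, entity="Adverse effect", assertion="Possible")

drug_start = record.index("Vancomycin")
drug = Mention(drug_start, drug_start + len("Vancomycin"), entity="Drug")

# Only the adverse-effect annotation is linked to the drug.
relations = [Relation(adverse_effect=adverse, drug=drug)]
```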
MedDRA
Annotate adverse events with the MedDRA term that best applies to the span.
 “pain in the left back and the left upper abdomen” is annotated as one span. If pain is an
adverse effect, there are actually two different MedDRA matches: back pain
(MedDRA:10003988) and abdominal pain (MedDRA:10000081). Use the more generic
MedDRA term for pain to relate to the full span (MedDRA:10033371).
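The rule above (one span, several matching PTs, so use the more generic PT) can be sketched as a small helper. This is an illustration only: the generic PT is supplied by the annotator rather than derived from the MedDRA hierarchy, since that would require the licensed MedDRA files; the function name `pt_for_span` is hypothetical, and the codes are the ones quoted in the example.

```python
# PT codes quoted in the example above.
PT_CODES = {
    "Back pain": 10003988,
    "Abdominal pain": 10000081,
    "Pain": 10033371,
}


def pt_for_span(candidate_pts, generic_pt):
    """Return (term, code) for one annotated span.

    A span matching exactly one PT keeps that PT; a span matching several
    PTs is coded with the more generic PT that the annotator supplies.
    """
    candidates = sorted(set(candidate_pts))
    if not candidates:
        raise ValueError("no candidate MedDRA terms for this span")
    term = candidates[0] if len(candidates) == 1 else generic_pt
    return term, PT_CODES[term]


# "pain in the left back and the left upper abdomen"
term, code = pt_for_span(["Back pain", "Abdominal pain"], generic_pt="Pain")
```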
Severity

“He is rather diffusely tender to palpation” Here ‘rather’ can be a severity meaning ‘to
some degree’
Test Results

 “LABORATORY DATA: Alpha-fetoprotein tumor marker is 4.3” Do not annotate this in this
workflow; it is an uncommented lab result. Do not diagnose or interpret here.
APPENDIX 2: Entity and Attribute Tables
This table indicates the attribute fields shown in the Annotation Window for each entity type.
Note that some fields may be present but not applicable. For example, Adverse is both an entity
and an attribute of Adverse Effect, so the attribute is duplicative.
Attribute | Adverse effect | Drug | Indication | S/S/LIF
Adverse | NA* | Yes | No | Yes?
Assertion | Yes | Yes | Yes | Yes
MedDRA | Yes | No | No | No
Outcome | Yes | No | No | No
Dose | No | Yes | No | No
Duration | No | Yes | No | No
Frequency | No | Yes | No | No
Route | No | Yes | No | No
Period | Yes | No | Yes | Yes
Reason (Indication) | NA | Yes | NA | No
Severity | Yes | No | Yes | Yes
*Field is present but not applicable
Notes:
 Indication is an S/S/LIF that is treated with a drug.
 Severity has a field for assertion.
Key:
 Green: the association can be made in both the record window and the annotation window
 Yellow: the association can be made only in the annotation window
 Red: the association cannot be made
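The table can also serve as a consistency check on finished annotations. The sketch below transcribes it into a lookup, reading the columns in the order Adverse effect, Drug, Indication, S/S/LIF, and flags any filled attribute the table does not mark Yes for that entity. The names `ATTRIBUTE_TABLE`, `applicability` and `misplaced_attributes` are hypothetical, and treating "Yes?" as allowed is an assumption.

```python
# Columns in the order used by the table above.
ENTITIES = ("Adverse effect", "Drug", "Indication", "S/S/LIF")

# attribute -> applicability per entity, transcribed from the table.
# "NA" = field present but not applicable; "Yes?" = unconfirmed in the table.
ATTRIBUTE_TABLE = {
    "Adverse":   ("NA", "Yes", "No", "Yes?"),
    "Assertion": ("Yes", "Yes", "Yes", "Yes"),
    "MedDRA":    ("Yes", "No", "No", "No"),
    "Outcome":   ("Yes", "No", "No", "No"),
    "Dose":      ("No", "Yes", "No", "No"),
    "Duration":  ("No", "Yes", "No", "No"),
    "Frequency": ("No", "Yes", "No", "No"),
    "Route":     ("No", "Yes", "No", "No"),
    "Period":    ("Yes", "No", "Yes", "Yes"),
    "Reason":    ("NA", "Yes", "NA", "No"),
    "Severity":  ("Yes", "No", "Yes", "Yes"),
}


def applicability(entity, attribute):
    """Look up one cell of the table."""
    return ATTRIBUTE_TABLE[attribute][ENTITIES.index(entity)]


def misplaced_attributes(entity, filled_attributes):
    """List filled attributes that the table does not mark Yes (or Yes?) for this entity."""
    return sorted(
        attr for attr in filled_attributes
        if applicability(entity, attr) not in ("Yes", "Yes?")
    )
```

For example, `misplaced_attributes("Drug", ["Dose", "Route", "MedDRA"])` returns `["MedDRA"]`, because MedDRA coding applies only to adverse effects.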
APPENDIX 3: Protected Health Information (PHI) and Personally Identifiable Information (PII)
The Privacy Rule’s Safe Harbor Method for De-identification (18).
“Under the safe harbor method, covered entities must remove all of a list of 18 enumerated
identifiers and have no actual knowledge that the information remaining could be used, alone or
in combination, to identify a subject of the information. The safe harbor is intended to provide
covered entities with a simple, definitive method that does not require much judgment by the
covered entity to determine if the information is adequately de-identified.”
1. Names; first name, last name, initials, names following titles and other indicators, login name,
screen name, nickname, or handle
2. All geographical subdivisions smaller than a State, including street address, city, county,
precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code,
if according to the current publicly available data from the Bureau of the Census: (1) The
geographic unit formed by combining all zip codes with the same three initial digits contains
more than 20,000 people; and (2) The initial three digits of a zip code for all such geographic
units containing 20,000 or fewer people is changed to 000.
3. All elements of dates (except year) for dates directly related to an individual, including birth
date, admission date, discharge date, date of death; and all ages over 89 and all elements of
dates (including year) indicative of such age, except that such ages and elements may be
aggregated into a single category of age 90 or older;
4. Phone numbers;
5. Fax numbers;
6. Electronic mail addresses;
7. Social Security numbers;
8. Medical record numbers;
9. Health plan beneficiary numbers;
10. Account numbers;
11. Certificate/license numbers;
12. Vehicle identifiers and serial numbers, including license plate numbers;
13. Device identifiers and serial numbers;
14. Web Universal Resource Locators (URLs);
15. Internet Protocol (IP) address numbers;
16. Biometric identifiers, including finger and voice prints;
17. Full face photographic images and any comparable images; and
18. Any other unique identifying number, characteristic, or code (note this does not mean the
unique code assigned by the investigator to code the data)
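Items 2 and 3 are the only rules in this list that transform a value instead of removing it outright. The sketch below illustrates them under stated assumptions: the function names are hypothetical, the population figures passed in would come from current Census data (the ones used in any example are made up), and unknown three-digit areas are treated as small, which is a conservative choice rather than part of the rule.

```python
from datetime import date


def generalize_zip(zip_code, population_by_zip3):
    """Item 2: keep the initial three digits only if that area holds more than 20,000 people."""
    zip3 = zip_code[:3]
    # Unknown areas are treated as small, i.e. replaced with 000 (conservative assumption).
    if population_by_zip3.get(zip3, 0) > 20_000:
        return zip3
    return "000"


def generalize_date(d):
    """Item 3: keep only the year of a date directly related to the individual.

    For individuals over 89 the year must also be removed; that case is not handled here.
    """
    return d.year


def generalize_age(age):
    """Item 3: aggregate all ages over 89 into a single category."""
    return "90 or older" if age > 89 else age


admission_year = generalize_date(date(2014, 3, 7))  # 2014
```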
APPENDIX 4: Routes of Drug Administration and Abbreviations
Routes of drug administration from the FDA Data Standards Manual. For the full list, which
includes FDA codes and NCI concept codes, see (19):
http://www.fda.gov/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/ElectronicSubmissions/DataStandardsManualmonographs/ucm071667.htm
NAME | DEFINITION | SHORT NAME
AURICULAR (OTIC) | Administration to or by way of the ear. | OTIC
BUCCAL | Administration directed toward the cheek, generally from within the mouth. | BUCCAL
CONJUNCTIVAL | Administration to the conjunctiva, the delicate membrane that lines the eyelids and covers the exposed surface of the eyeball. | CONJUNC
CUTANEOUS | Administration to the skin. | CUTAN
DENTAL | Administration to a tooth or teeth. | DENTAL
ELECTRO-OSMOSIS | Administration through the diffusion of a substance through a membrane in an electric field. | EL-OSMOS
ENDOCERVICAL | Administration within the canal of the cervix uteri. Synonymous with the term intracervical. | E-CERVIC
ENDOSINUSIAL | Administration within the nasal sinuses of the head. | E-SINUS
ENDOTRACHEAL | Administration directly into the trachea. | E-TRACHE
ENTERAL | Administration directly into the intestines. | ENTER
EPIDURAL | Administration upon or over the dura mater. | EPIDUR
EXTRA-AMNIOTIC | Administration to the outside of the membrane enveloping the fetus. | X-AMNI
EXTRACORPOREAL | Administration outside of the body. | X-CORPOR
HEMODIALYSIS | Administration through hemodialysate fluid. | HEMO
INFILTRATION | Administration that results in substances passing into tissue spaces or into cells. | INFIL
INTERSTITIAL | Administration to or in the interstices of a tissue. | INTERSTIT
INTRA-ABDOMINAL | Administration within the abdomen. | I-ABDOM
INTRA-AMNIOTIC | Administration within the amnion. | I-AMNI
INTRA-ARTERIAL | Administration within an artery or arteries. | I-ARTER
INTRA-ARTICULAR | Administration within a joint. | I-ARTIC
INTRABILIARY | Administration within the bile, bile ducts or gallbladder. | I-BILI
INTRABRONCHIAL | Administration within a bronchus. | I-BRONCHI
INTRABURSAL | Administration within a bursa. | I-BURSAL
INTRACARDIAC | Administration within the heart. | I-CARDI
INTRACARTILAGINOUS | Administration within a cartilage; endochondral. | I-CARTIL
INTRACAUDAL | Administration within the cauda equina. | I-CAUDAL
INTRACAVERNOUS | Administration within a pathologic cavity, such as occurs in the lung in tuberculosis. | I-CAVERN
INTRACAVITARY | Administration within a non-pathologic cavity, such as that of the cervix, uterus, or penis, or such as that which is formed as the result of a wound. | I-CAVIT
INTRACEREBRAL | Administration within the cerebrum. | I-CERE
INTRACISTERNAL | Administration within the cisterna magna cerebellomedularis. | I-CISTERN
INTRACORNEAL | Administration within the cornea (the transparent structure forming the anterior part of the fibrous tunic of the eye). | I-CORNE
INTRACORONAL, DENTAL | Administration of a drug within a portion of a tooth which is covered by enamel and which is separated from the roots by a slightly constricted region known as the neck. | I-CORONAL
INTRACORONARY | Administration within the coronary arteries. | I-CORONARY
INTRACORPORUS CAVERNOSUM | Administration within the dilatable spaces of the corpora cavernosa of the penis. | I-CORPOR
INTRADERMAL | Administration within the dermis. | I-DERMAL
INTRADISCAL | Administration within a disc. | I-DISCAL
INTRADUCTAL | Administration within the duct of a gland. | I-DUCTAL
INTRADUODENAL | Administration within the duodenum. | I-DUOD
INTRADURAL | Administration within or beneath the dura. | I-DURAL
INTRAEPIDERMAL | Administration within the epidermis. | I-EPIDERM
INTRAESOPHAGEAL | Administration within the esophagus. | I-ESO
INTRAGASTRIC | Administration within the stomach. | I-GASTRIC
INTRAGINGIVAL | Administration within the gingivae. | I-GINGIV
INTRAILEAL | Administration within the distal portion of the small intestine, from the jejunum to the cecum. | I-ILE
INTRALESIONAL | Administration within or introduced directly into a localized lesion. | I-LESION
INTRALUMINAL | Administration within the lumen of a tube. | I-LUMIN
INTRALYMPHATIC | Administration within the lymph. | I-LYMPHAT
INTRAMEDULLARY | Administration within the marrow cavity of a bone. | I-MEDUL
INTRAMENINGEAL | Administration within the meninges (the three membranes that envelope the brain and spinal cord). | I-MENIN
INTRAMUSCULAR | Administration within a muscle. | IM
INTRAOCULAR | Administration within the eye. | I-OCUL
INTRAOVARIAN | Administration within the ovary. | I-OVAR
INTRAPERICARDIAL | Administration within the pericardium. | I-PERICARD
INTRAPERITONEAL | Administration within the peritoneal cavity. | I-PERITON
INTRAPLEURAL | Administration within the pleura. | I-PLEURAL
INTRAPROSTATIC | Administration within the prostate gland. | I-PROSTAT
INTRAPULMONARY | Administration within the lungs or its bronchi. | I-PULMON
INTRASINAL | Administration within the nasal or periorbital sinuses. | I-SINAL
INTRASPINAL | Administration within the vertebral column. | I-SPINAL
INTRASYNOVIAL | Administration within the synovial cavity of a joint. | I-SYNOV
INTRATENDINOUS | Administration within a tendon. | I-TENDIN
INTRATESTICULAR | Administration within the testicle. | I-TESTIC
INTRATHECAL | Administration within the cerebrospinal fluid at any level of the cerebrospinal axis, including injection into the cerebral ventricles. | IT
INTRATHORACIC | Administration within the thorax (internal to the ribs); synonymous with the term endothoracic. | I-THORAC
INTRATUBULAR | Administration within the tubules of an organ. | I-TUBUL
INTRATUMOR | Administration within a tumor. | I-TUMOR
INTRATYMPANIC | Administration within the auris media. | I-TYMPAN
INTRAUTERINE | Administration within the uterus. | I-UTER
INTRAVASCULAR | Administration within a vessel or vessels. | I-VASC
INTRAVENOUS | Administration within or into a vein or veins. | IV
INTRAVENOUS BOLUS | Administration within or into a vein or veins all at once. | IV BOLUS
INTRAVENOUS DRIP | Administration within or into a vein or veins over a sustained period of time. | IV DRIP
INTRAVENTRICULAR | Administration within a ventricle. | I-VENTRIC
INTRAVESICAL | Administration within the bladder. | I-VESIC
INTRAVITREAL | Administration within the vitreous body of the eye. | I-VITRE
IONTOPHORESIS | Administration by means of an electric current where ions of soluble salts migrate into the tissues of the body. | ION
IRRIGATION | Administration to bathe or flush open wounds or body cavities. | IRRIG
LARYNGEAL | Administration directly upon the larynx. | LARYN
NASAL | Administration to the nose; administered by way of the nose. | NASAL
NASOGASTRIC | Administration through the nose and into the stomach, usually by means of a tube. | NG
NOT APPLICABLE | Routes of administration are not applicable. | NA
OCCLUSIVE DRESSING TECHNIQUE | Administration by the topical route which is then covered by a dressing which occludes the area. | OCCLUS
OPHTHALMIC | Administration to the external eye. | OPHTHALM
ORAL | Administration to or by way of the mouth. | ORAL
OROPHARYNGEAL | Administration directly to the mouth and pharynx. | ORO
OTHER | Administration is different from others on this list. | OTHER
PARENTERAL | Administration by injection, infusion, or implantation. | PAREN
PERCUTANEOUS | Administration through the skin. | PERCUT
PERIARTICULAR | Administration around a joint. | P-ARTIC
PERIDURAL | Administration to the outside of the dura mater of the spinal cord. | P-DURAL
PERINEURAL | Administration surrounding a nerve or nerves. | P-NEURAL
PERIODONTAL | Administration around a tooth. | P-ODONT
RECTAL | Administration to the rectum. | RECTAL
RESPIRATORY (INHALATION) | Administration within the respiratory tract by inhaling orally or nasally for local or systemic effect. | RESPIR
RETROBULBAR | Administration behind the pons or behind the eyeball. | RETRO
SOFT TISSUE | Administration into any soft tissue. | SOFT TIS
SUBARACHNOID | Administration beneath the arachnoid. | S-ARACH
SUBCONJUNCTIVAL | Administration beneath the conjunctiva. | S-CONJUNC
SUBCUTANEOUS | Administration beneath the skin; hypodermic. Synonymous with the term SUBDERMAL. | SC
SUBLINGUAL | Administration beneath the tongue. | SL
SUBMUCOSAL | Administration beneath the mucous membrane. | S-MUCOS
TOPICAL | Administration to a particular spot on the outer surface of the body. The E2B term TRANSMAMMARY is a subset of the term TOPICAL. | TOPIC
TRANSDERMAL | Administration through the dermal layer of the skin to the systemic circulation by diffusion. | T-DERMAL
TRANSMUCOSAL | Administration across the mucosa. | T-MUCOS
TRANSPLACENTAL | Administration through or across the placenta. | T-PLACENT
TRANSTRACHEAL | Administration through the wall of the trachea. | T-TRACHE
TRANSTYMPANIC | Administration across or through the tympanic cavity. | T-TYMPAN
UNASSIGNED | Route of administration has not yet been assigned. | UNAS
UNKNOWN | Route of administration is unknown. | UNKNOWN
URETERAL | Administration into the ureter. | URETER
URETHRAL | Administration into the urethra. | URETH
VAGINAL | Administration into the vagina. | VAGIN
APPENDIX 5: Frequency of Drug Administration and Abbreviations
Common abbreviations for frequency of drug administration
Abbreviation | Definition
q.d. | once a day
b.i.d. | twice a day
t.i.d. | three times a day
q.i.d. | four times a day
q.h.s. | before bed
5X a day | five times a day
q.4h | every four hours
q.6h | every six hours
q.o.d. | every other day
p.r.n. | as needed
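Clinical notes write these abbreviations inconsistently ("BID", "bid", "b.i.d."), so a lookup that ignores case, periods and spaces is a convenient way to normalize them. The sketch below encodes the table as administrations per day, assuming that q.o.d. means every other day (0.5 per day) and that "as needed" has no fixed count; the names `FREQUENCIES` and `doses_per_day` are hypothetical.

```python
import re

# Table above: abbreviation -> (definition, administrations per day).
# None marks "as needed", which has no fixed daily count.
FREQUENCIES = {
    "q.d.": ("once a day", 1),
    "b.i.d.": ("twice a day", 2),
    "t.i.d.": ("three times a day", 3),
    "q.i.d.": ("four times a day", 4),
    "q.h.s.": ("before bed", 1),
    "5X a day": ("five times a day", 5),
    "q.4h": ("every four hours", 6),
    "q.6h": ("every six hours", 4),
    "q.o.d.": ("every other day", 0.5),
    "p.r.n.": ("as needed", None),
}


def _key(abbreviation):
    """Lower-case and drop periods and whitespace, so 'BID', 'bid' and 'b.i.d.' match."""
    return re.sub(r"[.\s]", "", abbreviation.lower())


_LOOKUP = {_key(k): v for k, v in FREQUENCIES.items()}


def doses_per_day(abbreviation):
    """Return administrations per day for a frequency as written in a note (KeyError if unknown)."""
    return _LOOKUP[_key(abbreviation)][1]
```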
APPENDIX 6: Deviations from i2b2 Guidelines
Conditional:
“Conditional” is used when the mention of the medical problem asserts that the patient
experiences the problem only under certain conditions.
He got 1 day of voriconazole for possible presumed aspergillosis, but given that he was improving on the
other antibiotics and his CT was not consistent with aspergillosis and he was no longer on
immunosuppression, it seemed like a less likely diagnosis. His urine and blood cultures were all negative.
Given these findings that presumed diagnosis is community-acquired pneumonia, he will complete a 10-day course of azithromycin and Omnicef. The patient has been instructed to return if his fevers or cough
worsen and he gets worsening shortness of breath as this may indicate that the patient has a recurrent
aspergillosis.
We have not come across examples of the Conditional value in our corpus so far. We believe that
i2b2 examples of the Conditional value fall into the Present category. For example, “dyspnea on
exertion” is a medical term and should be annotated as “dyspnea on exertion” with the Present
assertion value (not as just “dyspnea” with the Conditional assertion value).
Prepositions:
We do not include prepositions in duration spans. For example, in “for three weeks” or “for an
unknown period of time,” the word “for” is excluded from the duration span. Here we differ from
i2b2, which includes the preposition “for” in duration spans.
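When spans are stored as character offsets, this rule amounts to moving the start offset past a leading preposition, for instance when converting i2b2-style spans. The sketch below assumes offsets into the record text; the function name is hypothetical, and only "for" comes from the guideline, with "over" added purely as an illustrative second preposition.

```python
import re

# Only "for" is named in the guideline; "over" is an illustrative, hypothetical addition.
_LEADING_PREPOSITION = re.compile(r"(?:for|over)\s+", re.IGNORECASE)


def trim_duration_span(text, start, end):
    """Move a duration span's start offset past a leading preposition and its whitespace."""
    # Pattern.match with pos/endpos anchors at `start`; match.end() is an offset into `text`.
    match = _LEADING_PREPOSITION.match(text, start, end)
    if match:
        start = match.end()
    return start, end


text = "He will take azithromycin for three weeks."
start = text.index("for three weeks")
s, e = trim_duration_span(text, start, start + len("for three weeks"))
# text[s:e] == "three weeks"
```

Requiring whitespace after the preposition keeps words such as "forty" from being trimmed.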
APPENDIX 7: Annotation Tool Notes
NLP Objectives:
Everything we are doing now informs these objectives (current classes, MedDRA, Naranjo, etc.)
for the ADE Pharmacovigilance project. When measured against objectives, the addition of
anything must be essential to this list to maintain annotation focus. “modifiers” will help with
discourse relations.
Disease
Drugs
Adverse Drug Events
Discourse relations
· Temporal relations
· Causal relations
· Contrastive relations
Severity
MedDRA
MedDRA is a five-level hierarchy of medical terms:
- System organ class (SOC): most general
- High level group term (HLGT)
- High level term (HLT)
- Preferred term (PT)
- Lowest level term (LLT): most specific
All adverse events should be assigned Preferred Terms (PTs) only. Lowest Level Terms are
more specific, but the LLT list contains synonyms, so there is a lot of redundancy.
If a search returns only an LLT result (e.g., Itching) and no PT, double-click the LLT and
choose its corresponding PT (Pruritus).
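The rule "assign PTs only; replace an LLT with its PT" can be sketched as a lookup. The mapping here is a hypothetical two-entry excerpt for illustration; the real LLT-to-PT links ship with the licensed MedDRA release, and the names `LLT_TO_PT` and `to_preferred_term` are not part of any MedDRA tool.

```python
# Hypothetical excerpt: real LLT-to-PT links come from the licensed MedDRA release files.
LLT_TO_PT = {"Itching": "Pruritus"}
PREFERRED_TERMS = {"Pruritus", "Pain", "Back pain", "Abdominal pain"}


def to_preferred_term(term):
    """Return the PT to assign: a PT is kept as-is, an LLT is replaced by its PT."""
    if term in PREFERRED_TERMS:
        return term
    if term in LLT_TO_PT:
        return LLT_TO_PT[term]
    raise KeyError(f"{term!r} is not in this excerpt of MedDRA")
```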
Export the Annotations to XML
[This is old text and refers to outdated versions of the Knowtator plug-in, MedDRA, and i2b2, but we wanted to retain this historical information.]
To get annotations out of Knowtator, use the XML export. Select the menu option Knowtator
-> Export annotations to XML and then follow the directions. This generates one XML file per
text source in your collection. The XML format directly parallels the data model that
Knowtator uses to store annotations in Protégé, so looking at the XML files may help you
understand how Knowtator represents annotations in Protégé (http://knowtator.sourceforge.net/faq.shtml).
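Once exported, the XML can be read with the Python standard library. The sketch below assumes an export shaped like the embedded sample (an `annotation` element with `mention`, `annotator`, `span` and `spannedText`, plus a `classMention` linking the mention id to a class); these element names are an assumption about the Knowtator format and should be checked against a file exported from your own project. Only the first `span` of each annotation is read, and the function name `read_annotations` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed shape of one exported file; verify against your own Knowtator export.
SAMPLE = """<annotations textSource="record_001.txt">
  <annotation>
    <mention id="inst_1"/>
    <annotator id="inst_9">Lancet UWM</annotator>
    <span start="10" end="24"/>
    <spannedText>renal azotemia</spannedText>
  </annotation>
  <classMention id="inst_1">
    <mentionClass id="Adverse effect">renal azotemia</mentionClass>
  </classMention>
</annotations>"""


def read_annotations(xml_text):
    """Flatten one exported file into a list of dicts (first span only)."""
    root = ET.fromstring(xml_text)
    # Map each mention id to its annotation class.
    classes = {
        cm.get("id"): cm.find("mentionClass").get("id")
        for cm in root.findall("classMention")
    }
    rows = []
    for ann in root.findall("annotation"):
        mention_id = ann.find("mention").get("id")
        span = ann.find("span")
        rows.append({
            "source": root.get("textSource"),
            "class": classes.get(mention_id),
            "start": int(span.get("start")),
            "end": int(span.get("end")),
            "text": ann.findtext("spannedText"),
            "annotator": ann.findtext("annotator"),
        })
    return rows
```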
Table: Annotation Tools and Other Resources
Software | URL | Document
Protégé | http://protege.cim3.net/download/oldreleases/3.3.1/basic/ |
Knowtator | http://knowtator.sourceforge.net/ | http://knowtator.sourceforge.net/install.shtml
MedDRA Browser | http://www.meddramsso.com/subscriber_download_tools_browser.asp | Need MedDRA131E; import the folder named MedAscii.
i2b2 Medication Annotation Guideline | http://lancet.googlecode.com/files/Preliminary.Annotation.Guidelines.7.9.pdf |
Semi-Automated Annotation with the BioNLP Named Entity Tagger Lancet
[This section was written some time ago and it is not clear how much of this is still applicable]
To increase annotation speed, we apply the BioNLP named entity tagger Lancet, trained on the
annotated data, to identify named entities automatically. An annotator then corrects the
automatically labeled corpus. The corrected corpus is fed back into the learner and used to train
a new model, and these steps are repeated until performance is satisfactory. This section guides
the annotator (the “oracle”) in correcting the automatically labeled corpus.
After the NLP tool annotations are imported, the annotator attribute of each annotation is set to
the NLP tool name, such as Lancet UWM.
First, assign a default annotator by selecting Knowtator -> Configure -> default Annotator. This
setting applies only to new annotations; it does not change the attributes of existing annotations.
If an annotation is partially correct, delete it first and then re-annotate. Otherwise, Knowtator
will not record the correction.
If an entity is annotated more than once, keep the correct annotation and delete the others. If
all of them are correct, delete all but one.
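For a batch check after export, exact duplicates (same span, same class) can be found mechanically. The sketch below keeps the first annotation for each (start, end, class) key, assuming annotations are represented as dicts with hypothetical keys `start`, `end` and `class`. It only covers duplicates that agree; choosing the correct one among conflicting annotations remains a manual step, and corrections must still be made in Knowtator itself so that they are recorded.

```python
def drop_duplicate_annotations(annotations):
    """Keep one annotation per (start, end, class); later exact duplicates are dropped."""
    seen = set()
    kept = []
    for ann in annotations:
        key = (ann["start"], ann["end"], ann["class"])
        if key not in seen:
            seen.add(key)
            kept.append(ann)
    return kept


annotations = [
    {"start": 0, "end": 5, "class": "Drug"},
    {"start": 0, "end": 5, "class": "Drug"},      # exact duplicate: dropped
    {"start": 0, "end": 5, "class": "S/S/LIF"},   # same span, different class: kept
]
deduplicated = drop_duplicate_annotations(annotations)
```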
Review the whole document and add any missing annotations.
NLP tools trade precision against recall. Here, we prefer high-precision annotation.
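One way to see what this preference means is to compare the tool output with the annotator-corrected version of the same record: precision is the share of tool annotations that survived correction, and recall is the share of corrected annotations the tool found. The sketch below uses exact matching on (start, end, entity) tuples, which is an assumption (partial-overlap scoring is also common), and the offsets are made up for illustration. A high-precision tool leaves the annotator mostly missing annotations to add rather than wrong ones to delete.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over (start, end, entity) tuples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical offsets: tool output vs. the annotator-corrected version of the same record.
tool_output = {(10, 24, "Adverse effect"), (40, 50, "Drug"), (60, 65, "Drug")}
corrected = {(10, 24, "Adverse effect"), (40, 50, "Drug"),
             (70, 80, "Indication"), (90, 95, "Drug")}
precision, recall = precision_recall(tool_output, corrected)
# precision = 2/3, recall = 2/4
```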
Summary of Annotation Processes and Tooling Changes for the ADE Pharmacovigilance Project
1. Reviewed PHI and PII requirements. Significantly streamlined PHI annotation and made
all PHI markings the same color for a simplified view.
2. We now have access to MedDRA releases and the terms used in annotation are now
updated with each release. MedDRA annotation has been brought into the annotation
tool; a major time saver. Upon selecting an adverse event, a MedDRA pop-up window
shows the top 10 matches to the selected term or span. Selection of a MedDRA term
from the pop-up window populates the MedDRA fields with term and concept codes.
Manual annotation of MedDRA terms is still possible and an updated browser on the
virtual machine makes searching an easier process.
3. Inter-annotator agreement is a priority as the team expands. Established a regular
meeting to compare and discuss annotations. Incorporation of conclusions, rules and
examples in the guidelines is now a routine process.
4. Updated the guidelines with more examples and filled in gaps. Added a section of examples
specifically to aid inter-annotator agreement, a history section, and a section for tracking
major changes to processes and tooling.
5. Created videos demonstrating use of the Annotation Tool.
6. Added a webpage for Annotation related resources.
7. Established a workflow process and file system on the virtual machine to enable
annotation, editing and other separated workflows.
8. Post-annotation processing will include assigning CTCAE (20) categories to severity
annotations and assigning default values for Assertion, Period and Outcome.