
Question answering
for general practitioners
An information presentation module for the IMIX demonstrator
Mieke van Langen
M.C.G. van Langen
Universiteit Twente, October 2005
Study:
Business Information Technology
Faculty:
Electrical Engineering, Mathematics and Computer Science (EEMCS)
Disciplines:
Language, Knowledge and Interaction (TKI)
Information Systems and Change Management (IS&CM)
Supervisors:
dr. M. Theune (TKI)
dr.ir. H.J.A. op den Akker (TKI)
dr.ir. A.A.M. Spil (IS&CM)
Preface
This paper is my master’s thesis for the study Business Information Technology at the
University of Twente. My assignment was part of the IMIX project. This is an NWO research
program aiming at the development of knowledge and technology needed to find specific
answers to specific questions in Dutch documents. I was allowed to choose a research subject
relevant for the IMIX project myself.
During my study I have focused on both language technology and healthcare. I thought it would
be nice to integrate these subjects in my master’s thesis. The IMIX project provided a very good
context for this combination, because the results of this project are integrated in a question
answering system (which incorporates a lot of language technology) for the medical domain.
In consultation with my supervisors I decided to investigate the suitability of such a question
answering system for professional use by general practitioners, and to develop part of the
language technology that would be needed to accommodate the requirements of this special
group. This assignment turned out to be a great combination of all subjects I was confronted
with during my study. I hope you enjoy reading it!
Ede, October 2005
Mieke van Langen
Acknowledgements
My graduation project went fairly smoothly. Although it was a large project and I worked alone
much of the time because I graduated internally, I managed to keep reasonably close to the
schedule, and the result is a report I am very pleased with. This would not have been possible,
however, without the help of others. I am therefore very grateful for the support and cooperation
so many people gave me during this assignment. First of all, of course, there are my supervisors,
who took ample time to help me choose a good subject, and afterwards also spent a lot of time
and effort critically reading and commenting on my reports.
Secondly, I owe many thanks to Anita Verhoeven and the general practitioners I was allowed to
interview. Anita Verhoeven welcomed me warmly in Groningen and clarified a great deal about
the possibilities for and the practice of information seeking by general practitioners. I am very
grateful to the five general practitioners I interviewed for my research (some of them even twice)
for the time they were able to free up for me, especially given the time pressure under which
they currently have to work.
Thirdly, in evaluating my design I received a lot of help from Wauter Bosma and all the test
subjects who filled in my questionnaire. Wauter Bosma had access to a working version of the
IMIX demonstrator and was always willing to generate answers to my example questions.
And last, but certainly not least, there are my boyfriend and my parents. They supported me in
every respect, not only during my graduation project but also throughout the rest of my studies.
Because of my health problems, my time as a student was not an easy period, but thanks to their
support I was able to complete my studies after all and can now, in good health, make a fresh
start as an engineer.
Thank you all very much!
Mieke
Executive summary
This research was part of the IMIX (Interactive Multimodal Information eXtraction) project. This
project is an NWO (Netherlands Organization for Scientific Research) research program aiming
at the development of knowledge and technology needed to find specific answers to specific
questions in Dutch documents. The results of the IMIX project are integrated in an interactive
multimodal question answering system for the medical domain. In their work, general
practitioners both need and are confronted with large quantities of information. Therefore, it was
investigated whether such a medical question answering system would be suitable for
professional use by general practitioners, and how answers should be presented for this user
group.
Based on literature research and interviews with general practitioners, it was concluded that a
medical question answering system could primarily be used by general practitioners for
answering questions for patient education. Such a question answering system should be
accessible via the Internet, it should search for answers only in information sources that were
marked as reliable by medical professionals, its response time should be short enough to
enable use during medical consultations, and it should recognize ICPC codes and other
medical jargon in the question. Furthermore, to be able to use this system, general practitioners
must have access to a computer with an Internet connection and a printer in their consulting
rooms. In this way, they can search for answers to their questions during the medical consultation
and, if desired, print the answer and give it to the patient. In addition, a web portal for general
practitioners has been designed that could help them keep an overview of all the different types
of information they can find on the Internet.
Answers retrieved by a question answering system for general practitioners should be
presented together with a link to their sources, and a checkbox to enable the general
practitioner to indicate whether he wants it to be printed. In this way different answers to
the same question retrieved from different sources can be integrated into one consistent view.
In addition, algorithms have been developed that integrate different answers retrieved from the
same source into one concise answer, possibly extended with sentences from their context.
Finally, it was found that general practitioners would also like to have information technology to
search for dermatological images, and for contact information of regional health professionals
and medical organizations. It is therefore recommended that the applicability of image retrieval
and information extraction technology for general practitioners is also investigated.
Management summary
This research is part of the IMIX (Interactive Multimodal Information eXtraction) project. This is
a research program of the NWO (Netherlands Organization for Scientific Research) that aims to
develop the knowledge and technology needed to find specific answers to specific questions in
Dutch-language documents. The results of the IMIX project are integrated in an interactive
multimodal question answering system for the medical domain. Because general practitioners
need, and can make use of, a large quantity and variety of information, it was investigated
whether such a question answering system would be suitable for professional use by general
practitioners, and how the system should present answers for this user group.
Based on literature research and interviews with general practitioners, it is concluded that a
medical question answering system for general practitioners would be suitable mainly for
answering questions for patient education. Such a question answering system should be available
via the Internet, it should search for answers only in information sources that medical
professionals have indicated to be reliable, the response time of the system should be short
enough for questions to be answered during the consultation, and the system must be able to
understand medical terminology and ICPC codes in the question. In addition, to be able to use
such a system, general practitioners must have a computer with an Internet connection and a
printer in their consulting room. In that way they can answer their questions during the
consultation and, if desired, print the answers to give them to the patient. Furthermore, a web
portal has been made to offer general practitioners an overview of all the different kinds of
information they can find on the Internet.
Answers found by a question answering system for general practitioners must be presented with
links to the sources in which they were found and with checkboxes, so that the general
practitioner can indicate per answer whether he wants to print it. In that way, answers to the
same question that were found in different sources are integrated into one overview. In addition,
algorithms have been developed with which different answers coming from the same source can
be integrated into one concise answer, and with which answers can optionally be extended with
extra sentences from the surroundings of the answer.
Finally, general practitioners also turn out to need information technology with which they can
search for dermatological pictures and for address details of local health care organizations. It is
therefore recommended to also investigate the suitability of image retrieval and information
extraction technology for use by general practitioners, and to develop systems with which this
information can be found.
Contents
1 Introduction
1.1 Context of the research
1.2 Research question
1.3 Research method
1.4 Structure of the paper
2 Literature on information use by general practitioners
2.1 The general practice
2.2 Information needs
2.3 Information sources
2.4 Computer use
2.5 Conclusions
3 Interviews with general practitioners
3.1 Method
3.2 Results
3.3 Conclusions
3.4 Discussion
4 The information presentation module (GIPS)
4.1 The IMIX demonstrator
4.2 Requirements
4.3 Design
4.4 An information portal for general practitioners
5 Response formulation
5.1 Related work
5.2 GIPS
5.3 Answer integration
5.4 Answer extension
5.5 Implementation
6 Evaluation
6.1 Evaluation of the entire design
6.2 Evaluation of the answer extension algorithm
7 Conclusions
8 Discussion
References
Appendix A: Questions
Appendix B: Interview general practitioners
Appendix C: Screenshots
Appendix D: Questions and answers
Appendix E: Evaluation interview general practitioners
Appendix F: Questionnaire response formulation
1 Introduction
During and after consultations, general practitioners use a lot of information. In addition to the
information they receive from the patient, they look up information on controversial or rare
topics, diagnosis, treatment and investigations, and information for patient education [MCW05].
This information is needed not only in consulting rooms, but also during patient visits.
Due to the rise of evidence based medicine, the use of medical knowledge by general
practitioners has become even more important, while the amount of medical information is also
increasing rapidly [VER99]. Intelligent information technology is needed to find the relevant
information quickly in these large document collections. This master’s thesis investigates how
question answering technology can help general practitioners meet their information needs.
1.1 Context of the research
This research is part of the IMIX project. This project concerns the development of question
answering technology for Dutch. In section 1.1.1 information is provided on question answering
and related work on question answering relevant for this research. Section 1.1.2 gives a general
description of the IMIX project.
1.1.1 Question answering
A question answering (QA) system “accepts questions in natural language form, searches for
answers over a collection of documents and extracts and formulates concise answers” [MS03].
A general QA system architecture consists of the following components (see Figure 1):
• question analysis;
• document retrieval;
• answer extraction;
• answer selection [JMR03]; and
• response formulation [MS03].
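[Figure: block diagram. A question enters through the (graphical) user interface and passes through question analysis, document retrieval (which draws on the documents via an indexer), answer extraction, answer selection, and response formulation, which returns the answer. The components can draw on (NLP) resources.]
Figure 1 General question answering system architecture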
The question analysis component transforms a natural language question into a retrieval query
and classifies the question with respect to its expected answer type (e.g. the name of a person,
a date, a location, etc.). The document retrieval component takes the retrieval query as input
and returns a set of documents relevant for the query. These documents are retrieved from a
document collection with the aid of an indexer. The answer extraction component extracts
possible answers from the retrieved documents. The answer selection component returns a
ranked list of the extracted answers. Finally, the response formulation component formulates a
natural language response to the natural language question. All five components possibly make
use of (natural language processing) resources. A (graphical) user interface may be used to
facilitate the interaction between the user and the QA system.
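The five components described above can be illustrated with a minimal Python sketch. It is not the IMIX implementation: the toy document collection, the stop word list, the keyword-overlap scoring, and all function names are assumptions made purely for illustration, and the "indexer" is reduced to a linear scan over two sentences.

```python
from dataclasses import dataclass


@dataclass
class Query:
    keywords: list[str]
    answer_type: str  # e.g. "person", "date", "location", or "other"


# Hypothetical stop word list and toy document collection (illustration only).
STOP_WORDS = {"what", "when", "who", "where", "does", "is", "a", "an", "the"}
DOCUMENTS = [
    "The general practitioner acts as a gatekeeper to secondary care. "
    "He manages a wide range of medical problems.",
    "A question answering system returns concise answers. "
    "An information retrieval system returns entire documents.",
]


def analyse_question(question: str) -> Query:
    """Question analysis: build a retrieval query and guess the answer type."""
    words = [w.strip("?.,").lower() for w in question.split()]
    keywords = [w for w in words if w and w not in STOP_WORDS]
    first = words[0] if words else ""
    types = {"who": "person", "when": "date", "where": "location"}
    return Query(keywords, types.get(first, "other"))


def retrieve_documents(query: Query, documents: list[str]) -> list[str]:
    """Document retrieval: keep documents that contain at least one keyword."""
    return [d for d in documents if any(k in d.lower() for k in query.keywords)]


def extract_answers(documents: list[str]) -> list[str]:
    """Answer extraction: treat every sentence as a candidate answer."""
    candidates = []
    for doc in documents:
        for sentence in doc.split(". "):
            sentence = sentence.strip().rstrip(".")
            if sentence:
                candidates.append(sentence)
    return candidates


def select_answers(query: Query, candidates: list[str]) -> list[str]:
    """Answer selection: rank candidates by keyword overlap, best first."""
    def score(candidate: str) -> int:
        return sum(k in candidate.lower() for k in query.keywords)

    ranked = sorted(candidates, key=score, reverse=True)
    return [c for c in ranked if score(c) > 0]


def formulate_response(ranked: list[str]) -> str:
    """Response formulation: turn the best answer into a sentence."""
    return ranked[0] + "." if ranked else "No answer found."


def answer_question(question: str, documents: list[str] = DOCUMENTS) -> str:
    """Run the five components in sequence."""
    query = analyse_question(question)
    retrieved = retrieve_documents(query, documents)
    candidates = extract_answers(retrieved)
    ranked = select_answers(query, candidates)
    return formulate_response(ranked)
```

Calling `answer_question("What does a question answering system return?")` runs all five stages in order. A real system would replace each stage with far richer language technology, but the data flow between the stages stays the same.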
Traditional information retrieval (IR) technology returns entire documents in which the user has
to find the answer himself. QA systems, in contrast, return only concise answers. In fact, IR
systems are used as the document retrieval component in QA systems. QA
thus extends IR.
Two attempts are being made at developing open domain QA systems for Dutch. ‘Open
domain’ refers to the document collection from which the answers are extracted. An open
domain QA system can search any type of document collection for answers to questions on any
domain. A ‘closed domain’ QA system is targeted at documents on a specific domain and only
answers questions on this domain, for example medical questions. One of the open domain
Dutch QA projects, named “Question answering for Dutch using dependency relations”, is
executed at the Rijksuniversiteit Groningen [BOU03]. This project makes use of existing QA
technology combined with dependency analysis based on full syntactic parsing of both the
question and the potential answer fragments. For this purpose the Alpino Dependency Parser
for Dutch [BNM01] is used. The other Dutch project, executed at the University of Amsterdam,
has resulted in a multi-stream architecture for question answering [JMR03]. In this architecture
each stream represents a different approach for QA, such as table lookup, pattern matching, an
existing QA system for English combined with automatic translation, and web answering. Each
stream has its own strengths and thus suits some question types more than others. The
system’s final answer is taken from the combined pool of answers generated by the suitable QA
streams. Neither project concentrates on response formulation. The result of these QA
systems is thus a ranked list of answers returned by the answer selection component.
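The idea of taking the final answer from a combined pool of stream outputs can be sketched as follows. This is not the method of [JMR03]: the three stream functions are hypothetical placeholders that return fixed candidates, and summing integer scores per normalised answer is just one simple way to combine a pool, assumed here for illustration.

```python
from collections import defaultdict


# Hypothetical streams: each returns (answer, score) pairs for a question.
# Real streams would implement table lookup, pattern matching, and so on.
def table_lookup_stream(question: str) -> list[tuple[str, int]]:
    return [("Amsterdam", 6)]


def pattern_matching_stream(question: str) -> list[tuple[str, int]]:
    return [("amsterdam", 3), ("Rotterdam", 5)]


def web_answering_stream(question: str) -> list[tuple[str, int]]:
    return [("Rotterdam", 2)]


def pool_answers(question: str, streams) -> list[tuple[str, int]]:
    """Merge the candidates of all streams and rank them by summed score."""
    totals: dict[str, int] = defaultdict(int)
    for stream in streams:
        for answer, score in stream(question):
            # Normalise so that "Amsterdam" and "amsterdam" count as one answer.
            totals[answer.strip().lower()] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


STREAMS = [table_lookup_stream, pattern_matching_stream, web_answering_stream]
ranked = pool_answers("What is the capital of the Netherlands?", STREAMS)
```

In this sketch an answer proposed by several streams accumulates score, so agreement between streams moves a candidate up the ranked list that the answer selection component returns.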
1.1.2 The IMIX project
The IMIX (Interactive Multimodal Information eXtraction) project is an NWO (Netherlands
Organization for Scientific Research) research program aiming at the development of
knowledge and technology needed to find specific answers to specific questions in Dutch
documents [NWOa]. The results of this research program are integrated in the IMIX
demonstrator. This demonstrator is an interactive multimodal QA system for the medical
domain. It consists of two parts: a text based part covering the complete medical domain and a
multimodal part focusing on the RSI domain only [OS04]. The demonstrator is targeted towards
naïve users who have no knowledge of the domain and little technical knowledge [VP05].
Compared to the other Dutch QA systems described in the previous section, the IMIX
demonstrator has a restricted domain. However, in contrast with those systems, it incorporates
modules for response formulation and speech input and output.
IMOGEN (Interactive Multimodal Output Generation) is a part of the IMIX project [NWOb]. It
aims at the development of multimodal information presentation modules for the output of QA
systems. In addition to response formulation, these modules include speech generation and the
use of graphics. The information presentation modules developed for IMOGEN constitute the
IMOGEN demonstrator, which is part of the IMIX demonstrator. The IMOGEN sub modules will
initially be developed for the RSI domain only.
To investigate whether a QA system like the IMIX demonstrator is also suitable for professional
use, in this master’s thesis an IMOGEN sub module is developed that is targeted towards
general practitioners. The input for this module consists of ranked lists of answers (produced by
the IMIX demonstrator’s question answering modules). The output is a presentation of the
answers that would be most convenient for Dutch general practitioners.
1.2 Research question
The research question to be answered in this master’s thesis is:
Which information needs of Dutch general practitioners can be satisfied by a question
answering system and how should the answers be presented?
To answer this research question, the following sub questions have to be answered:
1. What are the information needs of Dutch general practitioners?
2. How are these information needs satisfied now?
3. How do Dutch general practitioners use and appreciate computers in their work?
The integration of the IMOGEN sub module built for this research and currently used general
practitioner information systems falls outside the scope of this research. The sub module will not
be implemented in general practices, because the IMIX demonstrator only intends to show the
status, progress, and results of the research carried out in the IMIX program [NWOa]. The sub
module is thus only developed to show whether and under which conditions a system like the
IMIX demonstrator suits professional use.
1.3 Research method
To answer the research questions described above, information is needed on the information
needs and use of Dutch general practitioners and on the way they use and appreciate
computers in their work. Much research has been conducted on the information seeking
behavior of professionals from different disciplines. Leckie et al. [LPS96] developed a model of
the information seeking of professionals derived from research on engineers, health care
professionals, and lawyers. This model was found to be appropriate to explain the information
seeking behavior of Dutch general practitioners [VER99]. The model is presented in Figure 2.
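[Figure: work roles lead to tasks, which lead to characteristics of information needs. These start the step "information is sought", which is influenced by the sources of information and the awareness of information and which produces outcomes. Feedback arrows lead back from the outcomes.]
Figure 2 A model of the information seeking of professionals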
According to this model, work roles and the related tasks undertaken by professionals prompt
particular information needs, which in turn give rise to the information seeking process. The
information needs arising from a specific task are influenced by a number of variables, including
factors relating to the individual (age, profession, specialization, career stage, geographic
location, etc.), and general characteristics of the information needs. Examples of these
characteristics are: context (internally or externally prompted), frequency (recurring or new),
predictability (anticipated or unexpected), importance (urgency), and complexity.
The information seeking process elicited by an information need is influenced (as depicted in
the model) by the sources of information, the awareness of information, and the outcomes.
In addition, the outcomes may influence the sources of information and the awareness of
information. Sources of information can be formal or informal, internal or external, oral or
written, and personal (own knowledge and experience). Awareness of information refers to
direct or indirect knowledge of the various information sources and the perceptions about the
information seeking process or about the retrieved information. Variables of this factor include
familiarity and prior success with a certain search strategy or information source,
trustworthiness, packaging (medium or format), timeliness, cost (financial, psychological,
physical), quality, and accessibility (physical proximity, language). Outcomes are the results of
the information seeking process. The optimal outcome is that the information need is met and
the professional accomplishes his task. However, the outcome may also be that the information
need is not satisfied and further information seeking is required. In this case feedback is
provided (possibly altering the factors influencing the information seeking process), and a
second round of information seeking is undertaken. It is also possible that an outcome from one
task associated with a specific role unexpectedly benefits the professional in another role.
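The feedback loop in this model can be made concrete with a small Python sketch. It is only an illustration of the loop, not part of the model of Leckie et al.: the `Source` class, the single numeric `awareness` value (standing in for variables such as familiarity and prior success), and the size of the feedback adjustments are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    knowledge: dict[str, str]  # information need -> answer this source can give
    awareness: float = 0.5     # assumed stand-in for familiarity and prior success


def seek(need: str, sources: list[Source], max_rounds: int = 3):
    """Try sources in rounds; each outcome feeds back into awareness."""
    consulted = []
    for _ in range(max_rounds):
        # The professional turns to the source he currently rates highest.
        source = max(sources, key=lambda s: s.awareness)
        consulted.append(source.name)
        outcome = source.knowledge.get(need)
        if outcome is not None:
            # Need met: positive feedback on this source.
            source.awareness = min(1.0, source.awareness + 0.1)
            return outcome, consulted
        # Need not met: negative feedback, then another round of seeking.
        source.awareness = max(0.0, source.awareness - 0.3)
    return None, consulted


colleague = Source("colleague", {}, awareness=0.6)
handbook = Source("handbook", {"dosage": "see the guideline"}, awareness=0.5)
outcome, consulted = seek("dosage", [colleague, handbook])
```

Here the first round fails, the feedback lowers the awareness score of the first source, and the second round consults a different source. A QA system would appear in such a loop as one more `Source` among the others.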
A QA system would be one of the possible sources of information, thus influencing the
information seeking process. Its goal would be to make the information seeking process more
efficient and to improve the outcomes. Therefore, it should improve some of the variables
related to the awareness of information. The QA system is also an information seeker itself,
however. Its performance is thus in turn influenced by the information sources it uses itself. The
relation of the QA system to the information seeking model is depicted in Figure 3. In this figure
the QA system is positioned inside the sources of information ellipse, to emphasize that it would
be only one of the information sources a general practitioner could use. This source can be
accessed via a user interface, which is depicted separately. The information sources used by
the QA system are also positioned within the larger sources of information ellipse, because
these are sources that are probably already available for general practitioners. It is expected
that only a subset of the general practitioners’ information needs can be met by a QA system.
This is depicted by the questions box within the information needs box. The outcomes a QA
system returns are also a subset of all search results, depicted by the answers box within the
outcomes box.
To answer the research question, it must first be investigated for which work roles, tasks, and
information needs of Dutch general practitioners a QA system could support the information
seeking process, and which variables related to the awareness of information could be
improved by a QA system as opposed to other information sources. In addition, because a QA
system is an information seeker itself, it must be investigated which information sources it could
consult, and how it should search these sources for relevant information. Finally, because a QA
system runs on a computer, it must be investigated how Dutch general practitioners use and
appreciate computers in their work. To answer these questions, literature on general
practitioners and on medical informatics has been reviewed. In addition, interviews have been
conducted with a few Dutch general practitioners and an expert on information use by Dutch
general practitioners.
Secondly, based on the findings from literature and the interviews, requirements have been
specified for the IMIX demonstrator, and a design has been made for an IMOGEN sub module
for general practitioners consisting of a response formulation component and a graphical user
interface. Prototypes have been constructed for both components. These prototypes have been
evaluated by naïve users and a subset of the general practitioners who participated in the
previous interviews.
[Figure: the model of Figure 2, extended with a questions box inside the characteristics of information needs, a question answering system with a separate user interface inside the sources of information (which also contain the sources the QA system itself uses), and an answers box inside the outcomes.]
Figure 3 A QA system integrated with the model of information seeking
Finally, based on the results of the evaluation of the information presentation module,
conclusions were drawn and recommendations were made concerning the conditions under
which a system like the IMIX demonstrator would suit professional use.
1.4 Structure of the paper
This paper is organized as follows. In chapter 2 existing literature on general practitioners’ work
roles and tasks, information needs, information sources, and computer use is described. The
interviews conducted with Dutch general practitioners are described in chapter 3. In chapter 4
the results of the literature research and the interviews are related to the design of the IMIX
demonstrator, and the design for the information presentation module for general practitioners is
presented. Chapter 5 describes the response formulation technology developed for the
information presentation module. In chapter 6 the results of this research are evaluated.
Finally, in chapters 7 and 8 the conclusions of this research are presented and discussed.
2 Literature on information use by general practitioners
As in all areas, the volume of information in the medical sciences grows exponentially [VER99].
Detmer and Shortliffe [DS97] stated as early as 1997 that more than 360,000 articles are
published in medical journals every year, making knowledge diffusion to physicians rather slow.
They refer to a study which found that two years after wide publication, only 50% of the general
practitioners knew that laser surgery could save the sight of some of their diabetic patients.
Westberg and Miller [WM99] state that “because of the ever-increasing size of biomedical
literature and the complexity of modern health care practices, physicians could spend hours to
weeks reading texts and seeking expert opinions for each patient they encounter.” It is thus
increasingly difficult, but also increasingly important for physicians to find the information they
need.
In this chapter the information use by general practitioners is investigated. In section 2.1 the
work roles and tasks of Dutch general practitioners are described. The information needs of
general practitioners are described in section 2.2. Section 2.3 deals with the information
sources used to satisfy these needs. In section 2.4 the role and use of computers in the general
practice are described. Finally, in section 2.5 conclusions are drawn with respect to the
possibility of using a question answering (QA) system for information seeking by general
practitioners.
2.1 The general practice
This research especially deals with Dutch general practitioners. Dutch general practitioners
work in solo practices, in duo or group practices, or in primary health care centers. In contrast to
their American colleagues, they never work in hospitals [VER99]. The Dutch general practitioner
acts as a gatekeeper to secondary care. He is therefore expected to manage a wide range of
medical problems, giving rise to high information needs.
General practitioners not only see patients, but also have to learn, perform research, educate
and manage. Verhoeven [VER99] discerns five different roles of general practitioners, see
Table 1. Each role is associated with different tasks and thus with different information needs.
Work role               Tasks
Service provider        Patient care
Learner                 Professional reading, attending conferences and meetings
Researcher              Writing publications, speaking at conferences
Educator                Planning, curriculum development
Administrator/manager   Managing own practice
Table 1 Work roles and tasks of general practitioners
The role of service provider is common to all professionals [LPS96]. Physicians spend most of
their time in this role and the tasks associated with patient care create their greatest need for
information. Professionals also have a role of learner. They have to keep up with the
advancements in their field, and upgrade their education and skills by taking courses [LPS96].
Tasks associated with this role include professional reading, and attending conferences and
meetings. The third role, researcher, is not performed by all general practitioners. Most Dutch
general practitioners primarily provide patient care. Only some of them combine this with
research. Tasks associated with the role of researcher are writing publications and speaking at
conferences. As an educator, general practitioners teach medical students and general practice
trainees. Tasks associated with this role include planning and curriculum development. Finally,
as an administrator and manager, general practitioners have to manage their own practice.
In this research only the role of service provider is considered, because this is the role
in which medical questions may arise that should be answered quickly, thus making a
question answering system potentially useful. In the roles of learner and researcher
the general practitioner also wants medical questions to be answered, but in this case
generally complete articles are needed to (scientifically) answer these questions, not
just concise answers.
The work as service provider consists of medical consultations both in the general practitioner’s
consulting room and at patients’ homes. Medical consultations generally consist of the following
phases [DN96]:
- data gathering and recording;
- searching databases (medical records, suitable drugs, etc.);
- choosing a course of action;
- documentation;
- providing explanations; and
- arranging any future consultations.
The phase of data gathering and recording provides the input for the phase of searching
databases. The latter phase especially concerns searching information and might thus be
supported by a QA system. The outcome of this phase is used to support the phases of
choosing a course of action and providing explanations to the patient.
In principle, patient visits consist of the same phases as consultations in the consulting room.
However, during patient visits the general practitioner cannot make use of the same resources
he uses in his consulting room. Therefore, the phases of searching databases and
documentation might be a little harder, also complicating the phases of choosing a course of
action and providing explanations.
A mobile device containing a question answering system could really improve the
general practitioner’s possibilities of searching databases during patient visits and
therefore potentially also improve patient care.
2.2 Information needs
A lot of research has been done on the needs and use of medical knowledge by general
practitioners. Quantitative estimates of the information needs of physicians in their role of
service provider vary greatly, however. Different studies result in different estimates because
they differ on definition of terms, subjects, setting, and method of data collection. Gorman
[GOR95] tries to structure these studies by defining different types of information and different
types of information needs. The information types are described in section 2.2.1 and the
information needs in section 2.2.2. General practitioners encounter a lot of obstacles when they
try to address their information needs, however. The factors determining whether an information
need is pursued and satisfied or not are discussed in section 2.2.3.
The information needs described in the following sections are only those of the general
practitioner in his role of service provider. Therefore, only questions arising during medical
consultations are taken into account. Information needs that are met by regularly reading
medical journals or randomly “browsing” for information without a real question in mind are
considered to be the information needs of the general practitioner as a learner.
2.2.1 Types of information
Gorman [GOR95] identifies five types of information used by physicians, see Table 2. The first
type, patient data, refers to information about a specific person. It includes the patient’s medical
history, observations from physical examination, and results of diagnostic testing. This
information is usually obtained from the patient himself, his family and friends, and the medical
record. These data fall outside the scope of a potential QA system, for it is not convenient to
consult the patient by a QA system and it is assumed that electronic medical records are well
enough organized to make a QA system superfluous. Patient data might be included, however,
in the questions a general practitioner would submit to a QA system.
Type of information     Description
Patient data            Refer to a single person
Medical knowledge       Generalizable to many persons
Population statistics   Aggregate patient data
Logistic information    How to get the job done
Social influences       How others get the job done

Table 2 Information types
Medical knowledge is general information that is applicable to the care of all patients. It includes
scientific medical knowledge, but also the accumulated informal experience of the general
practitioner. Medical knowledge could be subclassified according to classic textbook categories
(etiology, pathophysiology, clinical manifestations, diagnosis and differential diagnosis,
treatment, and prevention) or according to organ system domain categories (dermatology,
rheumatology, neurology, etc.). Next to the classic textbook categories Magrabi et al. [MCW05]
discern a separate category, namely patient education. Questions about patient education deal
with the need for information to better inform patients about their conditions or to increase their
compliance with the treatment. Whereas information on for example etiology, diagnosis, or
treatment is used to support the medical consultation phase of choosing a course of action,
information for patient education is used for the phase of providing explanations.
Population statistics refer to aggregated data about groups or populations of patients. This
includes formal population statistics, but physicians also use their personal knowledge of recent
illness patterns in the community as a form of informal epidemiological information.
Logistic information refers to local knowledge about how to get the job done, often specific to a
practice setting or payment mechanism. As examples Gorman [GOR95] mentions information
about required forms, coverage by insurers, and referral lists of medical care organizations
(which is typical for the American situation). He doesn’t mention, however, information about
which physician a patient should be referred to from a medical instead of an economic point of
view. I assume general practitioners sometimes need to find out which physician performs a
particular treatment they consider appropriate for a particular patient. This information seems to be
on the boundary of medical knowledge and logistic information. Logistic information is usually
local and can best be obtained from human sources such as office and hospital staff or
colleagues. This type of information is therefore not suitable for a QA system.
Social influences refer to knowledge about the expectations and beliefs of others, especially
colleagues, but also patients, families, and others in the community. This type of information
can evidently not be provided by a QA system. It may be of influence, however, on the general
practitioner’s behavior concerning a QA system, but this falls outside the scope of this research.
Medical knowledge and population statistics can surely be dealt with by a question
answering system. Actually, the IMIX demonstrator used for this research [NWOa] is a
question answering system dealing with questions about medical knowledge and
population statistics. Its document collection includes the Spectrum Medical
Encyclopedia (aiming at the general public) and the Merck medical data (aiming at
medical professionals as well as the general public). Besides, for the RSI domain,
additional data are obtained from (among others) the RSI patient association, TNO
Arbeid, Arbobondgenoten, Ergo-Direct, and Stichting RSI Nederland.
2.2.2 Types of information needs
Next to types of information, Gorman [GOR95] also identifies different types of information
needs, see Table 3. First of all he discerns unrecognized needs. These can be inferred from
measurement of physician knowledge or observation of clinical practices. Information systems
that depend on the physician to seek information can’t succeed until the physician recognizes
that a need exists. To address unrecognized needs, information systems should be designed to
do so, for example by issuing automatic reminders or by automatically informing physicians of
additional diagnostic possibilities not initially considered by the physician. If a QA system
were used for this purpose, it would thus have to extract implicit questions from the data the
general practitioner enters into the general practitioner information system during the phase of
data gathering and recording. The extraction of implicit questions falls outside the scope of this
research, however.
Type of information need   Description
Unrecognized need          The physician is not aware of the information need or
                           knowledge deficit
Recognized need            The physician is aware that information is needed
Pursued need               Information seeking occurs
Satisfied need             Information seeking succeeds

Table 3 Types of information needs
Secondly, Gorman identifies recognized needs. These are needs articulated by the physician. A
question being articulated by a physician doesn’t guarantee, however, that the answer is
actually necessary to benefit the patient or the practitioner, in other words, that it is really a
‘need’. Recognized needs for which some information seeking behavior is executed are called
pursued needs. If the pursuit of a particular need is successful, this need is also called a
satisfied need.
A question answering system can only answer pursued needs, because it cannot read
the physician’s mind for recognized needs. The general practitioner should enter the
question into the system himself, making it a pursued need. The aim of the question
answering system would be to answer the entered question, in other words to turn the
pursued need into a satisfied need. Besides, it should be designed to tempt the
general practitioner to enter all his medical questions, turning as many recognized
needs as possible into pursued needs.
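The progression of need types described above can be sketched as a small state model. This is only an illustration of Gorman's classification [GOR95] as it is used here; the names NeedState and after_qa_interaction are hypothetical and do not belong to the IMIX demonstrator.

```python
from enum import Enum


class NeedState(Enum):
    """The four types of information needs from Table 3."""
    UNRECOGNIZED = 1
    RECOGNIZED = 2
    PURSUED = 3
    SATISFIED = 4


def after_qa_interaction(state, question_entered, answer_found):
    """Return the state of a need after one (hypothetical) QA interaction.

    A QA system cannot act on unrecognized needs, and it only sees a
    recognized need once the physician enters the question.
    """
    if state is NeedState.RECOGNIZED and question_entered:
        state = NeedState.PURSUED
    if state is NeedState.PURSUED and answer_found:
        state = NeedState.SATISFIED
    return state
```

In this sketch an unrecognized need stays unrecognized whatever the system does, which mirrors the remark that extracting implicit questions is out of scope.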
Apart from the definitions for information and information needs, the definition of a question is
also tricky, because medical questions tend to be multi-factorial. They can contain questions
within a question [GOR95]. Most of them are complex, and patient-, problem-, and practitioner-specific. Physicians therefore usually first need to tell the patient’s story, to explain the context
of the question. These stories often contain information from several of the five different
information types. See Appendix A for a sample of typical questions asked by general
practitioners.
Results from different studies on the information needs of physicians can apparently differ
greatly when different definitions for information and information needs are used. Estimates for
general practitioners range from 0.07 to 1.8 questions per patient encounter [GOR95]. In a
questionnaire conducted among 226 Dutch general practitioners in 1996 [VNB99], general
practitioners indicated that questions for which they needed answers arose 6.9 times a week
(the reported frequencies ranged from 0.04 to 50). These general practitioners were also asked to record
their most recent question, which is thus a recognized information need. Of these questions
50.5% dealt with therapeutic problems, and 24.8% with diagnostic problems. Circulatory,
musculoskeletal, and digestive were the top three system domain categories the questions dealt
with.
Most other studies were conducted in English-speaking countries. A few of their findings that
are relevant for a Dutch QA system are described below. Magrabi et al. [MCW05] analyzed the
queries Australian general practitioners submitted to an experimental online evidence system,
which were thus pursued information needs. This online evidence system was an information
retrieval system in which users not only had to enter keywords, but could also select a search
filter concerning the type of question (disease etiology, diagnosis, treatment, prescribing, or
patient education). In this study 43% of the questions dealt with therapeutic problems (35% with
treatment and 8% with prescribing), 40% with diagnostic problems, 10% with patient education,
and 7% with disease etiology. Gastrointestinal, dermatology, and musculoskeletal were the top
three system domain categories for which information was searched. A drawback of this study is
that it only deals with the questions pursued with this online evidence system. Questions that
were not pursued or that were answered with other means were not considered. However, the
questions pursued with an online evidence system probably resemble those that could be
answered by a QA system, because both systems can only make use of electronic resources.
Ely et al. [EOE99] collected 1101 questions (recognized information needs) from 103 American
general practitioners. They searched these data for generic questions. The most frequently
used question structures were “What is the cause of symptom X?”, “What is the dose of drug
X?”, “How should I manage disease or finding X?”, “How should I treat finding or disease X?”,
and “What is the cause of physical finding X?”. Besides, Ely et al. found that older patients and
female patients elicited more questions than younger and male patients respectively, and that
younger physicians asked more questions than their older colleagues.
Barrie and Ward [BW97] collected 85 medical questions (recognized information needs) from 27
Australian general practitioners. They found that physicians in solo or duo practices asked
significantly fewer questions per consultation than those in larger practices.
For a Dutch question answering system this means questions can be expected (in
descending order of frequency) on diagnostic problems, treatment, patient education,
prescribing, and disease etiology. Gastrointestinal, dermatology, musculoskeletal, and
circulatory will likely be the most frequent system domain categories. Knowledge of
frequently used generic questions can be used for the design of the question analysis
component of the system. Finally, it is expected that younger physicians and
physicians working in larger practices will be more likely to use a question answering
system, because they generally have more questions than older general practitioners
and general practitioners working in solo or duo practices.
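As an illustration of how the generic question structures found by Ely et al. [EOE99] might feed a question analysis component, the following minimal sketch matches a question against a few of those structures. The regular expressions and category labels are assumptions made for this example, the patterns are English (a Dutch system would need Dutch equivalents), and this is not the analysis method of the IMIX demonstrator.

```python
import re

# Generic question structures reported by Ely et al. [EOE99]. The regular
# expressions and the category labels are illustrative assumptions.
GENERIC_PATTERNS = [
    ("cause", re.compile(r"^what is the cause of (?P<x>.+?)\??$", re.I)),
    ("dose", re.compile(r"^what is the dose of (?P<x>.+?)\??$", re.I)),
    ("management", re.compile(r"^how should i manage (?P<x>.+?)\??$", re.I)),
    ("treatment", re.compile(r"^how should i treat (?P<x>.+?)\??$", re.I)),
]


def classify_question(question):
    """Return (category, topic) for the first matching pattern, else None."""
    text = question.strip()
    for category, pattern in GENERIC_PATTERNS:
        match = pattern.match(text)
        if match:
            return category, match.group("x")
    return None
```

For the made-up input "What is the dose of amoxicillin?" the function returns the category "dose" with topic "amoxicillin"; questions that fit none of the structures return None and would need a more general analysis.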
2.2.3 Pursuing information needs
In a questionnaire conducted among 226 Dutch general practitioners in 1996 [VNB99], general
practitioners were found to immediately pursue an answer to their questions in 76% of the
cases, and in 85% of these cases a (partial) answer was found. This means that about 65% of all
recognized information needs (0.76 × 0.85) were turned into satisfied needs, and 24% of the
recognized information needs were not even pursued. Gorman and Helfand [GH95] even found that 70% of
the questions arising in general practice are never pursued. This is quite a large difference.
According to Gorman [GOR95], this might be due to differences in the definition of terms,
subjects, setting, and method of data collection. Both studies examined general practitioners not
working in hospitals, and both studies concerned recognized information needs about medical
knowledge. However, Gorman and Helfand [GH95] observed American general practitioners
and recorded their questions during patient care, while Verhoeven et al. [VNB99] collected
Dutch general practitioners’ questions by sending them a questionnaire in which they were
asked to record their most recent question and whether or not they pursued this question. This
is a great difference in method of data collection. In Verhoeven’s research general practitioners
might have been tempted to record an information need that they pursued because that’s what
they remember or because of social influences, resulting in a much higher rate of pursued
questions.
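The shares of satisfied and unpursued needs derived from the questionnaire figures at the start of this section can be checked with a few lines of arithmetic. The variable names are chosen for this example only.

```python
# Shares reported by the 1996 questionnaire [VNB99].
pursued_rate = 0.76             # recognized needs that were pursued immediately
answered_given_pursued = 0.85   # pursued needs with a (partial) answer

# A need is satisfied only if it is pursued AND an answer is found.
satisfied_rate = pursued_rate * answered_given_pursued
not_pursued_rate = 1 - pursued_rate

print(f"satisfied:   {satisfied_rate:.1%}")    # 64.6%
print(f"not pursued: {not_pursued_rate:.1%}")  # 24.0%
```

The product is 64.6%, which rounds to the 65% stated in the text.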
Gorman and Helfand [GH95] found two factors that predicted the pursuit of information needs:
the physician's belief that a definitive answer existed, and the urgency of the patient's problem.
Ely et al. [EOE99] collected 1101 questions from 103 American general practitioners. They
found that only questions about drug dose were routinely pursued and that an answer was
found to 80% of the pursued questions. Both findings are consistent with those of Gorman and
Helfand.
In the literature lots of different barriers are identified that complicate the search for information
by general practitioners. Apart from lack of time and information overload, Ely et al. [EOE02]
have identified different obstacles for each of the following steps in asking and answering
questions:
- recognizing an information gap;
- question formulation;
- searching for relevant information;
- answer formulation; and
- using the answer to direct patient care.
In the research of Ely et al. the general practitioners only performed the first two steps and the
last step themselves. Information searching, and answer formulation were done by experts who
tried to answer questions generated by general practitioners. These are exactly the steps that
would be executed by a QA system. The obstacles found for these steps are therefore highly
important for this research.
The obstacles identified by Ely et al. [EOE02] are summarized in Table 4. Obstacles related to
recognizing an information gap deal with the transformation of an unrecognized need into a
recognized need. Sometimes physicians are unaware of a gap in knowledge when they make a
decision. In this case they have an information need, but they don’t recognize it. They might
also suppress a recognized information need because of time pressure, embarrassment,
personal characteristics, or characteristics of the clinical setting.
Question formulation refers to modifying the question in order to be able to find relevant
literature. For example, patient specific questions should be generalized, patient data could be
added to focus the search, potential supplementary questions could be anticipated, specific
words might be changed, etc. When a QA system is used for answering medical questions, a
dialogue might be needed to overcome the obstacles related to this step.
Six different sorts of obstacles related to the searching for relevant information step are
identified. The first is failure to initiate the search. Reasons for not pursuing information needs
include doubt about the existence of relevant information, insufficient justification (when the
question is not important enough to justify a search), lack of time, and the availability of
consultation (sometimes general practitioners just refer patients to specialists rather than learn
enough about the problem to manage it themselves). Detmer and Shortliffe [DS97] also mention
the ignorance of the availability of relevant information as a reason for not pursuing information
needs.
Steps in asking and answering questions   Obstacles
Recognizing an information gap            Lack of awareness of an information need
                                          Suppression of a recognized information need
Question formulation                      Inability to answer patient specific questions with
                                          general resources
                                          Missing patient data
                                          Uncertainty about the scope
                                          Difficulties modifying the question
Searching for relevant information        Failure to initiate the search
                                          Uncertainty about the searching strategy
                                          Inadequate (availability of) resources
                                          Inadequate information
                                          Inadequate evidence
                                          Inadequate use of evidence
Answer formulation                        Failure to directly or completely answer the question
                                          Too long or too short answer
                                          Answer directed at the wrong audience
                                          Difficulty addressing unrecognized information needs
                                          Discomfort with formulating an answer to be used
                                          in patient care
Using the answer to direct patient care   Answer not trusted
                                          Answer no longer needed
                                          Answer inadequate

Table 4 Obstacles to answering medical questions
Secondly, general practitioners may be uncertain about the right searching strategy. They may
have difficulties selecting the appropriate resources, be uncertain about how to know when
all the relevant evidence has been found so that the search can stop, not know the meaning of
null search results, etc. The meaning of null search results is also important when developing a
QA system. When no articles are found on a certain treatment, or when a relevant
article doesn’t mention the treatment, this doesn’t necessarily mean that there is no treatment.
Sometimes, however, a null search result is itself a clear answer.
Thirdly, resources might be inadequate. They might be hard to access, poorly indexed, poorly
organized, not clinically oriented, not trusted, not current, not allowing real time interaction with
the searcher, or a certain topic might not be included in a resource that should logically include
it. Other obstacles are inappropriate descriptors of resources [VBM95], the cost of resources,
difficulties learning or using many resources, and variable quality of the information [WM99].
The fourth obstacle, inadequate information, deals with information that is incorrect, not current,
vague, unnecessarily cautious, biased, or fails to anticipate supplementary information needs,
differentiate between different diagnoses, define terms, or adequately describe clinical
procedures. Verhoeven et al. [VBM95] also mention the overload of irrelevant information as an
obstacle to finding the right answer.
The fifth and sixth groups of obstacles related to information searching concern the evidence.
When studies don’t address the medical question, don’t compare the relevant treatments, or
don’t study the outcome or population of interest, they may deal with the right subject, but still
not be relevant for the question. Besides, relevant evidence might be badly synthesized or
hardly applicable.
Obstacles related to answer formulation include failure to directly or completely answer the
question, too long or too short answers, answers directed at the wrong audience, and difficulty
addressing unrecognized information needs that are evident in the question. Besides, non-physician searchers indicated they were not comfortable formulating an answer that would
direct patient care.
Finally, the step of using the answer to direct patient care was sometimes not executed, because
the answers were not trusted, came too late, or were inadequate.
Different suggestions to overcome these obstacles are made in the literature. Verhoeven
et al. [VBM95] suggest improved accessibility of information resources by computerization,
education in the use of information sources, and improved accessibility to library facilities.
Besides, they argue that journal articles should be tailored more to the general practitioner’s daily
work. Magrabi et al. [MCW05] suggest that search systems for electronic resources should be
preprogrammed with specialist bibliographic knowledge to save the physician’s time. For
example, the online evidence system investigated by Magrabi et al. used search filters (such as
‘diagnosis’ or ‘treatment’) that added specialist keywords to the query entered by a general
practitioner. Such keywords have been shown to significantly enhance the quality of search
results, but are unlikely to be known by the general practitioner.
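A search filter of this kind can be sketched as a simple query expansion step. The filter names follow the question types mentioned by Magrabi et al. [MCW05], but the keyword lists and the function expand_query are hypothetical placeholders; the actual terms used by that system are not given here.

```python
# Hypothetical filter vocabulary: the keyword lists are placeholders, not the
# terms used by the system studied by Magrabi et al. [MCW05].
FILTER_KEYWORDS = {
    "diagnosis": ["sensitivity", "specificity"],
    "treatment": ["therapy", "randomized controlled trial"],
}


def expand_query(user_query, search_filter):
    """Append the filter's specialist keywords to the user's own keywords."""
    extra = FILTER_KEYWORDS.get(search_filter, [])
    return " ".join([user_query.strip(), *extra])
```

The general practitioner only types the clinical keywords and picks a filter; the bibliographic terms are added behind the scenes. An unknown filter leaves the query unchanged.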
Some of these solutions depend on each other. For example, computerization has the
potential to offer general practitioners access to loads of information, but studies indicate that
general practitioners have difficulty finding the most relevant resources and selecting the
appropriate search terms [WM99]. Therefore, education in the use of resources and/or
search programs with specialist bibliographic knowledge are also needed to make computerization a
real solution. QA could also make computerization a solution, because it eliminates the need for
education in the use of resources (the only thing the physician has to do is enter a
question in natural language). Besides, a QA system could also incorporate bibliographic
knowledge for searching the right information.
Ely et al. [EOE02] think authors should anticipate the needs of busy physicians. For example,
when authors name a certain drug, they could include essential prescribing information,
because this may be an unrecognized or supplementary information need when a physician has
a question about this drug; resources could be written in a question and answer style; resources
should be kept current by the ongoing surveillance of physicians’ changing questions; and
research should be initiated and funded based on questions without adequate answers. These
are all issues that cannot be solved by a QA system. Ely et al. also indicate, however, that the
modification of questions from the way they were originally stated by the general practitioners
often proved very helpful for searching the right information. This is an issue that might be
addressed by a QA system.
A question answering system could deal with the following obstacles:
- obstacles related to question formulation could be overcome by modifying the
  query, possibly using a dialogue;
- because a question answering system is an information seeker itself, it could take
  away the uncertainty about the search strategy to be followed by the general
  practitioner;
- the system could improve the accessibility of other electronic resources, because
  it eliminates the need for general practitioners to directly interact with those
  resources;
- problems with poorly organized information could be accounted for, when the
  system adequately synthesizes information from different sources.
However, most of the obstacles regarding information searching remain when a
question answering system is used, because they are inherent to the resources used.
The obstacles related to answer formulation could only be overcome by a question
answering system when questions are correctly interpreted and the right resources
are used. Besides, the system should correctly interpret null search results.
2.3 Information sources
When general practitioners pursue their information needs, they can use a lot of different
information sources. In this section only the sources of medical knowledge and population
statistics are considered, because these are the information types that are relevant for a QA
system. In the first subsection different types of information sources are discussed. In section
2.3.2 the sources of evidence based medicine used by general practitioners are described.
Finally, the general practitioner’s information seeking behavior with respect to the different
information resources (influenced by the awareness of information) is discussed in section
2.3.3.
2.3.1 Types of information sources
For medical knowledge and population statistics three different types of information sources can
be discerned [VNB99], see Table 5. Printed sources include general practitioners’ own books
and journals (their private medical libraries), but Dutch general practitioners may also address
the libraries of the local hospitals they refer their patients to. Besides, the Dutch Institute for
Research of Health Care (NIVEL) in Utrecht provides medical information on demand for
general practitioners from their own library, and the Royal Dutch Academy of Arts and Sciences
(KNAW) in Amsterdam (owning the largest medical journal library in the Netherlands) provides
journal articles to general practitioners [VER99].
Printed sources for patient education are provided by the Dutch College of General Practitioners
(NHG), the Scientific Institute of Dutch Pharmacists (WINAp), patient organizations,
associations of specialists, hospitals, drug manufacturers, etc. The Dutch College of General
Practitioners publishes patient brochures and patient letters [NHGp]. Patient brochures provide
general information on frequently occurring disorders and the measures the patient can take to
prevent or cure them. Patient letters have been written especially for patients suffering a
specific disease. These letters provide detailed information on the disease and its treatments.
Printed sources of formal population statistics include published descriptions of disease
prevalence in the medical journal literature [GOR95].
Human resources include colleagues, specialists, and office staff. Specialists possess a lot of
domain-specific knowledge, but their time is limited. Westberg and Miller [WM99] state that
modern academic health care centers may be able to satisfy many of the general practitioners’
information needs by providing Internet-mediated access to their electronic and human
information sources. They propose a triage model for doing so. With this model information
requests are first mapped to electronic resources. Only when a request doesn’t seem to map
well is access provided to human resources. Human information on population statistics can be
provided by public health departments [GOR95].
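The triage idea of Westberg and Miller [WM99] can be sketched as follows. This is an assumption-laden illustration rather than their model: the function names, the word-overlap score, and the threshold of 0.5 are all invented for the example.

```python
def word_overlap(request, source):
    """Hypothetical match score: share of request words among the source topics."""
    words = set(request.lower().split())
    return len(words & source["topics"]) / len(words) if words else 0.0


def triage(request, electronic_sources, match_score=word_overlap, threshold=0.5):
    """Route a request to the best electronic source, else to a human expert."""
    best_source, best_score = None, 0.0
    for source in electronic_sources:
        score = match_score(request, source)
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        return ("electronic", best_source["name"])
    # The request does not map well to any electronic resource.
    return ("human", None)
```

Only requests that no electronic source covers well enough reach the human experts, which is how the model spares the limited time of specialists.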
Information sources   Examples
Printed sources       Drug reference books, private books and journals, library
                      books and journals, journal articles received from others
Human resources       Other general practitioners, specialists, office staff
Electronic sources    Cd-rom, online databases, Internet

Table 5 Information sources for medical knowledge
Electronic sources include information on cd-rom or information that is accessible via a modem
or Internet connection. The biggest electronic source is of course the World Wide Web. The
World Wide Web potentially provides the general practitioner with all the latest information on
medical issues. Physicians are often frustrated, however, by the difficulty in finding reliable and
relevant information on the Web quickly [DS97]. They want resources that are practitioner
oriented, produced by reputable sources, and cover a specific topic in medicine. Because of the
Web’s rapid growth and lack of controls, its organization is poor, and validity and reliability of
sources found on the Web are questionable [WM99]. These shortcomings render a substantial
amount of Web information unsuitable for direct clinical application. A strategy for use of the
Web to support clinical practice could be locating and using anchors of known high quality
[WM99]. The major Internet search systems are not discriminatory in what they index and their
index methods are word based. Detmer and Shortliffe [DS97] argue that a medical retrieval
system should make use of content based instead of word based index methods. Whereas
word based index methods index documents by the words occurring in the documents, with
content based index methods documents are indexed by a mostly fixed set of general terms
(not necessarily occurring in the documents) describing what the documents are about.
Documents could for example be indexed by the controlled-vocabulary terms from the Medical
Subject Headings (MeSH). Besides, representation methods that add contextual information to
portions of documents may help improve retrieval relevance by focusing retrieval in only
relevant semantic regions.
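The difference between word based and content based indexing can be illustrated with a toy example. The two documents are made up, and the subject terms are styled after MeSH headings but assigned by hand for this sketch; real indexing would rely on the controlled vocabulary and on indexers or indexing software.

```python
from collections import defaultdict

# Toy documents: the texts and the assigned subject terms are made up.
DOCUMENTS = {
    "doc1": {
        "text": "laser surgery can save sight in diabetic patients",
        "subject_terms": ["Diabetic Retinopathy", "Laser Therapy"],
    },
    "doc2": {
        "text": "retinopathy screening for patients with diabetes",
        "subject_terms": ["Diabetic Retinopathy", "Mass Screening"],
    },
}


def build_word_index(documents):
    """Word based: index each document by the words that occur in it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for word in doc["text"].lower().split():
            index[word].add(doc_id)
    return index


def build_content_index(documents):
    """Content based: index each document by terms describing its subject."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for term in doc["subject_terms"]:
            index[term].add(doc_id)
    return index


word_index = build_word_index(DOCUMENTS)
content_index = build_content_index(DOCUMENTS)
```

A word based lookup of "retinopathy" finds only doc2, because the word does not occur in doc1, whereas the content based lookup of "Diabetic Retinopathy" finds both documents.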
One of the most mentioned sources of electronic medical information is Medline. This is a
bibliographic database provided by the National Library of Medicine, containing citations to the
last 40 years of medical literature [DS97]. One of the programs for searching Medline is
PubMed [EPM], which is publicly available through the Internet. Next to citations PubMed
provides a summary for most articles. Sometimes also free full-text articles are available in
PubMed Central or elsewhere on the Internet (in which case a link is provided). Otherwise, one
needs to go to a library or subscribe to the publisher to get the full-text article. Medline can be
helpful for answering medical questions. However, finding specific answers to questions can be
time-consuming and expensive because of the effort required to search through a sometimes
large set of relevant publications [WM99].
Most of the printed sources for patient education are also provided electronically. For example,
patient brochures and patient letters are published on the website of the Dutch College of
General Practitioners [NHGp]. Besides, the Scientific Institute of Dutch Pharmacists
electronically publishes information on the most frequently used medicines [APO].
Electronic sources of population statistics include electronically available published descriptions
of disease prevalence in the medical journal literature, but also aggregate patient data from
electronic medical records [GOR95]. Published statistics are not always applicable to a given
local population because of differences in ethnic composition, local vectors of disease, or
lifestyle differences, but the increasing use of electronic medical records improves capabilities
for analyzing data of the local population. For example, the Dutch National Information Network
for Primary Care (LINH) is a network of 93 automated general practices with over 360,000
patients [LINH]. These general practitioners continually collect data on diseases, consultations,
drug prescriptions, and referrals. These data are used to generate representative, continuous,
quantitative, and qualitative information on the care provided by Dutch general practitioners, but
could also be used to generate population statistics.
Gorman et al. [GAW94] investigated to what extent the questions arising in general practice
could be answered using only online medical journal literature. They randomly collected a set of
60 questions from American general practitioners not working in hospitals. Medical librarians
tried to find answers to these questions in online resources. The general practitioners
themselves evaluated the information found. In 56% of the cases physicians judged the
information to be relevant for their question. In 46% of the cases the information provided a
clear answer to their question. These 46% might include cases where no relevant information
was found, because sometimes this is a clear answer to questions such as “Is there any
information on new therapies for disease X?”. In 40% of the cases physicians expected the
information would have had an impact on their patient, and in 51% of the cases they expected
the information would have had an impact on themselves or on their practice. These
percentages far exceed the current use of electronic resources by general practitioners
[VNB99], suggesting that a QA system might be able to answer a lot of questions currently
being answered with human or printed resources.
A question answering system could only make use of electronic sources. Luckily,
printed sources and sources for patient education are increasingly available in
electronic form, making them also accessible by a question answering system. Human
resources cannot be consulted by a question answering system, but the information
they present might also be available via printed and electronic resources.
2.3.2 Sources of evidence based medicine
Evidence based medicine is an approach to clinical practice in which physicians base their
decisions and actions on appropriate evidence from the patient’s history, examination,
laboratory data, and scientific medical knowledge [VER99]. Practicing evidence based medicine
is guided by the following principles:
formulating the question;
searching the literature for relevant information;
selecting the articles;
appraising the evidence for validity and usefulness; and
applying the evidence in everyday practice.
Literature research must be guided by scientific strategies and should satisfy the same criteria
as research in general. It should thus be valid, reproducible, and verifiable. Bias should be
limited. For example, studies yielding statistically significant differences between groups are far
more likely to be reported than those in which no differences were found. Therefore, to minimize
publication bias, both published and unpublished studies need to be included and criteria for
inclusion and exclusion must be accounted for. It is recognized however that critically evaluating
all literature is unrealistic for busy physicians. Instead, general practitioners rely on evidence
based resources, like guidelines, critically appraised articles, and systematic reviews.
Dutch guidelines for general practitioners are issued by the Dutch College of General
Practitioners (NHG). These guidelines include the NHG-Standaarden [NHGs], which prescribe
the actions a physician should undertake concerning diagnosis and treatment of certain
diseases, and the NHG-formularium [NHGf], which gives pharmacotherapeutic advice. Both are
evidence based, but they only provide guidelines. They don’t give any information about the
effectiveness of different therapeutic options. English guidelines are provided for example by
the National Guideline Clearinghouse [NGC].
Critically appraised articles are published by several journals [VER99]. For example, Evidence-based
Medicine and ACP Journal Club together publish the cd-rom Best Evidence [BE96], and
the Journal of Family Practice also publishes critically appraised articles.
Systematic reviews can be subdivided into qualitative systematic reviews and quantitative
systematic reviews. For a qualitative systematic review the medical literature is searched for all
relevant information on a specific disease, in order to formulate the best approach to diagnosis
or treatment. For example, Clinical Evidence (published by the British Medical Journal) [BMJ] is
a printed and electronic source providing information on the evidence of the effectiveness of
different therapies. Each subject starts with relevant medical questions, and then the best
available evidence is summarized to answer these questions. Besides, a list is provided of the
interventions covered, categorized according to whether they have been found to be effective or
not. Clinical Evidence doesn’t make any recommendations. It is updated every six months in
print and monthly online.
Quantitative systematic reviews (or meta-analyses) try to answer medical questions, using
rigorous statistical analysis of pooled research studies. For example, the Cochrane Library
(published by the Cochrane Collaboration) [CC] is an electronic journal that provides
quantitative systematic reviews. Whereas qualitative reviews consider all reported treatments
for a specific disease, quantitative reviews concentrate on the evidence for only one treatment
and summarize the statistical data of all relevant studies to get more significant information
about the effectiveness of this treatment.
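How pooling leads to more significant information can be illustrated with a small calculation. The sketch below uses inverse-variance (fixed-effect) pooling, which is one common way to combine study results; the sources cited here do not state which statistical method a given review uses, so this choice is an assumption, and the effect estimates and standard errors are invented.

```python
import math


def pool_fixed_effect(effects, standard_errors):
    """Inverse-variance (fixed-effect) pooling of per-study effect estimates."""
    # Each study is weighted by the inverse of its variance (1 / se^2),
    # so more precise studies count more heavily.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect estimates from three small studies of one treatment.
effects = [0.30, 0.10, 0.20]
standard_errors = [0.20, 0.20, 0.20]

pooled, pooled_se = pool_fixed_effect(effects, standard_errors)
print(round(pooled, 3), round(pooled_se, 3))
```

The pooled standard error is smaller than that of any single study, which is why a pooled estimate can reach statistical significance where the individual studies do not.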
Next to guidelines, critically appraised articles, and systematic reviews, there are also question
answering services that provide physicians with evidence based answers (generated by for
example a clinical librarian) to their medical questions. Usually these services require that
general practitioners submit their questions in PICO (Patient or Problem, Intervention,
Comparison intervention, Outcomes) format in order to direct the search to relevant and precise
answers [CEBM]. In this format firstly the patient’s problem and characteristics are described,
then the intervention the physician is considering, then (if relevant) the alternative intervention to
which the physician wants the intervention to be compared, and finally the outcomes the
physician wants to reach with this intervention. For example, the NLH Question-Answering
Service [NHS] is a pilot project that tries to answer medical questions. The answers provided by
this service consist of the original question, the interpretation of this question, the text of the
answer, and the references used. Verhoeven and Schuling [VS03] developed a question
answering service for Dutch general practitioners to investigate whether general practitioners
use this service and what the costs are. General practitioners used this service minimally, but it
was found that they used the service more often when they personally knew the person who
answered their question. It turned out to be possible to answer general practitioners’ questions
within the required timeframe, and the costs were on average 200 Euro per question. Coumou
and Meijman [CM03] suggest that these costs should be covered by the patient’s health insurer,
because literature research is quicker, cheaper, and sometimes more useful than for example
blood tests or MRI scans.
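The PICO format described above can be pictured as a simple record with four slots, of which the comparison is optional. The following is only a sketch: the class name, field names, and the example question are hypothetical and are not taken from any of the services mentioned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PicoQuestion:
    patient_or_problem: str            # P: the patient's problem and characteristics
    intervention: str                  # I: the intervention being considered
    outcomes: str                      # O: the outcomes the physician wants to reach
    comparison: Optional[str] = None   # C: alternative intervention, if relevant

    def as_text(self) -> str:
        """Render the four slots as one natural-language question."""
        parts = [f"In {self.patient_or_problem}, does {self.intervention}"]
        if self.comparison:
            parts.append(f"compared with {self.comparison}")
        parts.append(f"lead to {self.outcomes}?")
        return " ".join(parts)


# Invented example question, for illustration only.
question = PicoQuestion(
    patient_or_problem="adults with acute low back pain",
    intervention="exercise therapy",
    outcomes="faster recovery",
    comparison="bed rest",
)
print(question.as_text())
```

Keeping the comparison optional mirrors the description above, in which the alternative intervention is only given if relevant.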
Because general practitioners seem very reluctant to use question answering
services, and a single search executed by a medical librarian costs on average 200
Euro, a question answering system is probably not suitable to answer the type of
questions submitted to a question answering service. Besides, it is not realistic to
expect a question answering system to critically appraise all information sources. A
question answering system should therefore only consult evidence based resources
for answering medical questions. An electronic source like the World Wide Web or
even Medline is thus not suitable as document collection for a medical question
answering system when a physician wants to practice evidence based medicine
(which he should). However, most evidence based information sources are in English
and not freely available, except for the guidelines issued by the Dutch College of
General Practitioners (NHG). When other resources are used, the information found
by the question answering system should be presented in such a way that the general
practitioner is able to evaluate the answers himself.
2.3.3 Using information sources
The information sources most frequently used by general practitioners are human based
(colleagues), followed by private books (tertiary literature), journals (primary literature) [VBM95],
and continuing medical education (such as classes and conferences) [WM99]. Libraries and
printed or online bibliographies are much less used. In 1999 medical computer applications,
telemedicine and the World Wide Web were the least used information sources [WM99]. In a
survey among Dutch general practitioners conducted in 1996 [VNB99] only 3% of the general
practitioners indicated they sometimes used the Internet for answering patient-specific
questions.
Variances in the use of information sources exist among general practitioners. Factors
influencing this behavior are presented in Table 6 [VBM95]. The first factor is physical,
functional and intellectual accessibility of the resource. Physical accessibility concerns how
close the resource is to the general practitioner. Distances are diminishing however as
information increasingly becomes available via general practitioners’ desktop computers.
Functional accessibility concerns the time and energy needed to search the information. As for
electronic sources, a QA system could seriously decrease the time and energy needed to find
relevant information. Intellectual accessibility concerns the understandability of the information.
This depends on the intelligence of the practitioner, but also on the organization of the
information.
Factor                                                                  Description
Physical, functional, and intellectual accessibility of the resource   Availability, searchability, and understandability
Age
Participation in research or teaching
Social context                                                          Rural vs. urban physicians
Practice characteristics                                                Solo practice vs. health center
Stage of information gathering                                          Analysis, decision, etc.
Table 6 Factors influencing information seeking behavior
The second factor is the age of the general practitioner. Younger physicians tend to use
libraries and printed sources more frequently than older physicians. Thirdly, physicians who
engage in research or teaching use journals, conferences, libraries and online databases more
often than others. The fourth factor is social context. Rural physicians tend to perform fewer
online searches than urban physicians do. The fifth factor is practice characteristics. Physicians
in solo practices use journals most, whereas physicians in health centers usually consult
colleagues. Finally, the sixth factor is the stage of information gathering. In calling attention to
new information printed material is mostly used, during analysis personal contact is most
important, and in the decision stage refresher courses are the most important information
source.
Human based resources are preferred for several reasons [GOR95]. Firstly, because many
medical questions have a narrative character, they are more easily put to a colleague than to
printed or electronic sources. Secondly, information seeking behavior by general practitioners is
not only determined by the need for medical knowledge, but also by the need for
commiseration, affirmation of professional relationships, feedback about their own knowledge
and practices relative to those of others, etc. These needs can’t be easily fulfilled by printed or
electronic sources. Thirdly, there may be a need for higher-order information than descriptive
medical knowledge, such as confirmation, explanation, analysis, synthesis, and judgment that
takes into account the complexity of the patient’s case and combines it with an expert
understanding of the issues involved. Finally, general practitioners need an answer to a patient
care problem, not just information relevant to a query. In this respect, human sources
understand best what the general practitioner needs.
Electronic information sources are not frequently used, because they generally score negatively
on the variables related to the awareness of information. In the survey conducted in 1996 [VNB99] Dutch
general practitioners indicated that they wanted improvements in the World Wide Web on ease
of searching, financial costs, time to search, chance of success, and usefulness. No relation
was found between the use of electronic sources and the age or type of practice of the general
practitioners in this survey. Ideally, electronic information retrieval systems should automatically
display relevant summary information and provide links to supporting evidence and analysis. To
realize this vision a combination of content, information science methods and technology is
required [DS97]. Content already becomes increasingly available electronically. Information
science methods are needed to structure this content to achieve optimal retrieval, select
resources to best answer particular questions, integrate information from several sources into
one consistent view, and provide search interfaces that help users select the appropriate search
terms. Technology concerns high-speed data networks, standard protocols, open-systems
architectures, and cross-platform applications. In this area the Internet already provides most
functionality. Moreover, electronic content increasingly becomes available via the Internet and
integration of lots of electronic sources is thus possible. It is therefore most important to
concentrate on the information science methods.
A question answering system could implement most of the information science
methods mentioned above:
content can be structured by indexing the document collection, preferably content
based (as opposed to word based);
information from several sources is integrated into one consistent view when the
system composes a single answer of all retrieved information;
a question answering system naturally provides a search interface that helps
users select the appropriate search terms, because the user only has to enter a
question in natural language, which the system will then transform into the
appropriate query.
A challenging issue, however, is selecting the right resources. Physicians need
assistance with this, because they are not aware of all the resources available to
answer a particular question, nor do they have time to assess which resource is best
[DS97]. To automate the resource-selection process, the system must have
knowledge of what questions each resource can answer. It is therefore needed to
know the scope, depth, intended audience, currency of information, and reputability of
each source.
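As an illustration of what such knowledge about resources could look like, the sketch below describes each source by the five properties listed above and filters on them. All names, topics, and values are hypothetical; currency is approximated by the year of the last update and reputability by a simple flag, which is a strong simplification.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ResourceProfile:
    name: str
    scope: Set[str]          # topics the source covers
    depth: str               # e.g. "guideline", "summary", "systematic review"
    intended_audience: str   # e.g. "general practitioner", "patient"
    last_updated: int        # year of last update, a crude proxy for currency
    reputable: bool          # simplification of reputability


def select_resources(profiles: List[ResourceProfile], topic: str,
                     audience: str, min_year: int) -> List[ResourceProfile]:
    """Keep reputable, sufficiently current sources that cover the topic
    for the given audience, most recently updated first."""
    candidates = [
        p for p in profiles
        if topic in p.scope
        and p.intended_audience == audience
        and p.last_updated >= min_year
        and p.reputable
    ]
    return sorted(candidates, key=lambda p: p.last_updated, reverse=True)


# Invented profiles, for illustration only.
PROFILES = [
    ResourceProfile("Guideline collection A", {"dermatology", "cardiology"},
                    "guideline", "general practitioner", 2005, True),
    ResourceProfile("Patient leaflets B", {"dermatology"},
                    "summary", "patient", 2004, True),
    ResourceProfile("Unreviewed website C", {"dermatology"},
                    "summary", "general practitioner", 2005, False),
    ResourceProfile("Review journal D", {"dermatology"},
                    "systematic review", "general practitioner", 2003, True),
]

selected = select_resources(PROFILES, "dermatology", "general practitioner", 2002)
print([p.name for p in selected])
```

A real system would need richer descriptions than this, for instance a graded notion of depth, but the sketch shows that the selection step reduces to matching a question against explicit source metadata.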
As for sources of evidence based medicine, Barrie and Ward [BW97] state that general
practitioners may be reluctant to change towards these sources, because they highly value the
human judgment and accessibility of their current sources. If general practitioners
experienced an excess of unanswered questions or if they were dissatisfied with their current
sources, they might be motivated to change resources. However, they seem to find answers to
most of their medical questions and are satisfied with the information sources they currently
use.
Apart from the established preference of physicians for human information sources, the most
important determinants of whether a knowledge resource will be used are its availability and
clinical applicability [GAW94]. In a survey among Dutch general practitioners the most important
characteristics for choosing an information source were found to be its reliability, usefulness,
and ease of searching [VNB99]. Financial costs were considered the least important. The
general practitioners indicated they think the following improvements in obtaining information
are desirable: better-organized and more practical journal articles; more high quality, evidence
based reviews; computer-accessible abstracts of most important articles; computer-accessible
medical questions and answers; and more medical guidelines.
A medical question answering system could help in providing computer-accessible
abstracts of relevant articles, medical questions and answers and medical guidelines.
However, in order to be used by general practitioners, the information presented by a
question answering system should be reliable and easy to apply in clinical practice.
2.4 Computer use
Computer applications developed to support clinical decision making for general practitioners
can be divided in three categories [WM99], see Table 7. The first category consists of clinical
information systems. These systems manage electronic medical records. They support general
practitioners by reliably and efficiently storing and retrieving patient data. Therefore, structured
ways of data registration have been developed [DHP98]. An approach for the registration of
medical data is SOAP (Subjective findings, Objective findings, Assessment, and Plan). This is a
problem-oriented way of recording. Firstly the symptoms stated by the patient (subjective
findings) are recorded, then the signs found in medical examination (objective findings), thirdly
the diagnosis concluded by the general practitioner (assessment), and finally the prescribed
therapy (plan). Further, symptoms, diagnoses and therapies can be specified by specific codes.
A coding instrument that is accepted by the Dutch College of General Practitioners (NHG) and
the World Health Organization is the ICPC (International Classification of Primary Care). Criteria
for the ICPC codes are defined in the ICHPPC-II-Defined (International Classification of Health
Problems in Primary Care-II-Defined). There are also ICPC-like classifications for physical
examination results and test outcomes (OCO), and for drugs.
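The SOAP structure described above maps naturally onto a record with four ordered fields and an optional code. The sketch below is an assumption about how such an entry could be represented, not a description of any existing clinical information system; the example consultation is invented and the code value is a placeholder rather than a real ICPC code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SoapEntry:
    subjective: str                    # S: symptoms stated by the patient
    objective: str                     # O: signs found in medical examination
    assessment: str                    # A: diagnosis concluded by the physician
    plan: str                          # P: prescribed therapy
    icpc_code: Optional[str] = None    # optional coded diagnosis

    def as_lines(self) -> List[str]:
        """Return the entry in S, O, A, P order, one line per part."""
        lines = [
            f"S: {self.subjective}",
            f"O: {self.objective}",
            f"A: {self.assessment}",
            f"P: {self.plan}",
        ]
        if self.icpc_code is not None:
            lines[2] += f" [ICPC {self.icpc_code}]"
        return lines


# Invented consultation; the code below is a placeholder, not a real ICPC code.
entry = SoapEntry(
    subjective="Sore throat and cough for three days",
    objective="Temperature 37.8 C, red throat",
    assessment="Upper respiratory tract infection",
    plan="Rest and fluids, return if symptoms persist",
    icpc_code="CODE-PLACEHOLDER",
)
print("\n".join(entry.as_lines()))
```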
Type of general practitioner application                     Description
Clinical information systems                                 Electronic medical record systems
Clinical decision support systems                            Provide information about diagnosis, therapy, and prognosis, based on patient data
Bibliographic and full text information retrieval systems    Search for information relevant for answering medical questions
Table 7 Categories of applications developed for general practitioners
Secondly, clinical decision support systems provide general practitioners with information
regarding diagnosis, therapy, and prognosis, based on patient data. A Dutch clinical decision
support system that has been introduced countrywide is the Electronic Prescription System
(EPS) [LSS01]. This system suggests drug prescriptions and other therapies for a patient,
based on patient data, the (ICPC coded) diagnosis, and prescription guidelines issued by the
Dutch College of General Practitioners [NHGf, NHGs]. The system can also print a drug
prescription and send it directly to the pharmacist. Besides, patient letters [NHGp] can be
retrieved with the system. To be able to use this system, the general practitioner should also
use a clinical information system, the SOAP approach, and ICPC coding.
Finally, bibliographic and full text information retrieval systems support general practitioners in
finding information to answer their questions. An example of a bibliographic information retrieval
system is PubMed [EPM] (see the part on electronic resources in section 2.3.1 for more
information on this system).
A question answering system would belong to the category of bibliographic and full
text information retrieval systems. Theoretically, a question answering system could
also be used as a decision support system, but a lot of reasoning is needed for that.
Therefore, the IMIX demonstrator used for this research excludes advisory questions.
The question types dealt with by the IMIX demonstrator include:
questions about facts (e.g. What is RSI?);
verification questions (e.g. Is RSI chronic?);
multiple choice questions (e.g. Is exercise good or bad for RSI?); and
quantity questions (e.g. How many people are suffering from RSI?).
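The four question types listed above can be written down as an enumeration. The accompanying guessing function is purely a hypothetical keyword heuristic for English glosses of the example questions; it does not describe how the IMIX demonstrator, which works on Dutch questions, actually analyses its input.

```python
from enum import Enum


class QuestionType(Enum):
    FACT = "fact"
    VERIFICATION = "verification"
    MULTIPLE_CHOICE = "multiple choice"
    QUANTITY = "quantity"


# Hypothetical list of sentence openings that signal a yes/no style question.
VERIFICATION_STARTS = ("is ", "are ", "does ", "do ", "can ")


def guess_question_type(question: str) -> QuestionType:
    """Naive surface heuristic; an assumption for illustration only."""
    q = question.strip().lower()
    if q.startswith("how many") or q.startswith("how much"):
        return QuestionType.QUANTITY
    if q.startswith(VERIFICATION_STARTS):
        # A yes/no opening that offers alternatives is treated as multiple choice.
        if " or " in q:
            return QuestionType.MULTIPLE_CHOICE
        return QuestionType.VERIFICATION
    return QuestionType.FACT


for example in ("What is RSI?", "Is RSI chronic?",
                "Is exercise good or bad for RSI?",
                "How many people are suffering from RSI?"):
    print(example, "->", guess_question_type(example).value)
```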
During the evaluation of the implementation of the Electronic Prescription System (EPS)
different surveys were executed on the computer use by Dutch general practitioners. In 1999
95% of the general practitioners reported they owned a computer, and in 2001 this percentage
was already 97% [WHB02]. In 2001 100% of the computer owning general practitioners also
owned a clinical information system, 94% of them actually used the electronic medical record,
71% also owned the EPS, and 87% of them actually used it. The EPS was even found to be
used more often than any printed resource. The electronic medical record was used during the
consultation in 86% of the cases, and mostly also the SOAP approach was used. ICPC coding
is less commonly used. Only 25% of the electronic medical record using general practitioners
indicated they nearly always used ICPC coding, but this percentage had already significantly
increased since 1999.
Internet use by Dutch general practitioners was investigated at the request of the Dutch College
of General Practitioners in 2000 [COX00]. In this research 76% of the general practitioners said
they had access to the Internet, but only 37% could access the Internet from their general
practice. Almost half the general practitioners without an Internet connection indicated they
were considering implementing one before the end of the year. The general practitioners
primarily used their Internet connection for e-mail, mostly with colleagues. However, 43% of the
general practitioners also indicated they wanted to use websites with medical knowledge (at that
time the guidelines issued by the Dutch College of General Practitioners were not yet available
via the Internet). When medical knowledge is available electronically, the general practitioners
would rather access it through the Internet than on cd-rom. 34% of the general practitioners
indicated they would like to receive some education on searching medical knowledge on the
World Wide Web.
Several general practitioner information systems and their implementations and evaluations
have been described in literature. They range from simple administrative systems, to advanced
decision support systems. None of them really uses QA technology, though some information
retrieval systems get close. In the remainder of this section findings from literature are described
that could be useful for the design of the information presentation module for a QA system.
During the evaluation of several versions of a Dutch general practitioner information system by
Dupuits et al. [DHP98], general practitioners indicated they didn’t appreciate the use of mouse
input. They found the act of moving a mouse for data entry during a consultation too disturbing.
Besides, the mouse with the underlay took too much space on their desks. Interfaces based on
the use of windows and function keys were well appreciated. Speech input was also not feasible
for practical reasons. Due to the diversity of voices, only the voices of a limited number of
persons were recognized. Besides, disturbing sounds from the immediate surroundings
occasionally caused the voice controller to randomly generate words.
Findings from a user centered development process of a general practice medical workstation
[RHF92] included that there is no best way to perform the work process, because medical
practice is extremely complicated and users are highly variable. General practitioner information
systems must therefore be equally flexible. Further, it was found that simple clear presentations
are often more effective than sophisticated attempts to provide intelligent summaries, because
physicians are good at recognizing patterns if the information is clearly presented. This research
also indicated that some physicians find the use of a mouse difficult. It is to be questioned,
however, whether this is still the case, because more than ten years have passed since then.
Further, several overall constraints were abstracted from the results of this research. Firstly, a
general practitioner information system must always be interruptible. If the general practitioner
turns away from the system, it must be in precisely the same state when he returns to it.
Secondly, all functional options must be immediately visible at the user interface. The use of
multi-state buttons should thus be minimized. Thirdly, there should be no modes. All functions
should be possible at every phase of the consultation, since the course of the consultation is
highly variable and unpredictable.
Different evaluations of the EPS have been executed. The evaluation of the implementation of
the system by the Dutch Institute for Research of Health Care (NIVEL) [WHB02] concentrated
on the frequency of use and practical barriers for using the system (such as problems
concerning the ICPC coding system, or lack of training). Lagendijk et al. [LSS01] primarily
suggest different implementation methods to increase the general practitioners’ motivation to
work with the system. They state that the most important issues for general practitioners are
better communication (with hospitals, pharmacies and colleagues) and reduction of time
pressure. An application for general practitioners should therefore contribute to one or both of
these issues to increase general practitioners’ motivation. Boonstra [BOO03] investigated the
subjective reasons for the limited acceptance of the EPS. He concluded that to improve general
practitioners’ acceptance of the system it should, among others, be designed to fit the
consultation process (it should not disturb, but enhance the communication with the patient); it
should suggest alternative therapies rather than only one therapy (this would recognize and
strengthen the self esteem of general practitioners as medical professionals); it should be
designed so that users could add new therapies or local agreements on therapies; and patients
should be informed about the features and advantages of the system (to avoid patients having a
lower esteem of physicians using the system).
For a question answering presentation module these findings imply that the
module should be very flexible. It should preferably accommodate multiple input
modes, but at least input should not be restricted to mouse or speech input only.
Further, the module should provide simple clear presentations as much as possible,
and all functional options must be immediately visible at the user interface. Because
the module must always be interruptible, it should not incorporate any timeouts.
Finally, to achieve motivation to work with the system it should at least save the
general practitioner time (for it cannot improve communication), and the system
should not give the impression that the presented answer is the only best way,
because general practitioners want to have the feeling that as medical professionals
they are in control of the decision process, not the system.
2.5 Conclusions
Based on the literature reviewed in this chapter, the following conclusions can be drawn with
respect to the work roles, tasks, and information needs of Dutch general practitioners suitable to
be supported by a QA system, the variables related to the awareness of information that could
be improved by a QA system, the information sources that could be consulted by a QA system,
the way the system should search these sources for relevant information, and the user interface
of the system.
Work roles and tasks
A QA system could support the general practitioner primarily in his work role of service provider,
during the phase of searching databases, in order to help the general practitioner choose a
course of action and provide explanations to the patient. Especially when the system is
available on a mobile device, it could really improve the physician’s tasks associated with the
role of service provider, namely patient care.
Information needs
The types of information that can be provided by a QA system are medical knowledge and
population statistics. General practitioners’ questions regarding these information types
generally deal with diagnosis, treatment, patient education, prescribing, and disease etiology.
The most frequently needed medical knowledge categories are gastrointestinal, dermatological,
musculoskeletal, and circulatory knowledge. The document collection of the system could
therefore consist of scientific medical literature on these subjects, sources for patient education,
and formal population statistics. Besides, a Dutch QA system might provide some information
specific for the Dutch situation that is on the boundary of medical knowledge and logistic
information, like which hospital performs which treatments.
It is expected that younger physicians and physicians working in larger practices are more likely
to use a QA system, because they generally have more questions than their older colleagues
and general practitioners working in solo or duo practices.
Awareness of information
The aim of the system should be to transform most of the recognized information needs
concerning the information types provided by the system into satisfied needs. A lot of
recognized needs are never pursued, however. The system should be designed to encourage
general practitioners to pursue their information needs. Therefore, physicians should be
convinced that a definitive answer exists and that this can be found with the QA system.
Besides, the system should save the general practitioner time.
Further, there are some variables related to the awareness of information that could be
improved by a QA system as opposed to other information sources. A QA system could take
away the uncertainty about the search strategy to be followed by the general practitioner, it
could improve the functional accessibility of other electronic resources, and it could improve the
intellectual accessibility of poorly organized electronic resources, when it adequately
synthesizes information, possibly from different sources.
Sources of information
General practitioners seem to use very few electronic information sources, while a QA system
can only access electronic resources. This shouldn’t be a problem, however, because research
indicated that about half of the recognized information needs could be satisfactorily answered
with online resources (also meaning that the other half of the information needs might not be
met by a QA system). Only resources that are current, reliable, and suitable for direct clinical
application should be used. Further, to allow for evidence based medicine, the system should
primarily use evidence based sources such as those provided by the Dutch College of General
Practitioners (NHG).
Information seeking
The QA system should of course satisfy the pursued needs. Therefore, it is important that the
system correctly interprets and modifies the question in order to find the right information and
formulate a good answer, because general practitioners’ questions generally are very complex.
A dialogue might be needed to accomplish this.
To achieve optimal retrieval and select the right information sources to best answer particular
questions, the system should index the available information (preferably using a content based
indexing method). Further, the system needs to know the scope and depth of each source, it
must incorporate bibliographic knowledge, and it should correctly interpret null search results.
Concerning the answer formulation, the system should not give the impression that the
presented information is the single best way for clinical practice. To allow for evidence based
medicine, the system should enable critical evaluation of the answers by providing links to
supporting evidence and analysis.
User interface
The information presentation module of a QA system should be very flexible. Input should not
be restricted to mouse or speech input only, information should be presented as simply and
clearly as possible, and all functional options must be immediately visible at the user interface.
The module must not incorporate any timeouts.
3 Interviews with general practitioners
Most literature cited in the previous chapter is a few years old and based on non-Dutch
research. Coumou and Meijman [CM03] summarized the available literature published between
1992 and 2002 on the information seeking behavior of general practitioners. The only Dutch
research on the information needs and use of general practitioners was executed by Verhoeven
et al. [VBM95, VER99, VNB99]. Besides, the only articles dealing with answering general
practitioners’ questions concerned bibliographic information or document retrieval systems, or
human question answering services. Computerized question answering (QA) systems have not
yet been used in the medical field. To get more in-depth information about Dutch general
practitioners and their expectations of a QA system, interviews were conducted with a few
Dutch general practitioners and Anita Verhoeven. In section 3.1 the research method is
described, section 3.2 presents the results of the interviews, and in section 3.3 the conclusions
with respect to QA for general practitioners are discussed. The results of the interviews are not
always consistent with the findings from literature described in the previous chapter. In section
3.4 the differences and agreements between the literature and the interviews are analyzed.
3.1 Method
Because great variations exist among general practitioners and their use of information
systems, this research does not aim to develop a system that would be equally appreciated by
all Dutch general practitioners. Requirements for the system will therefore not be based on
quantitative research but on qualitative interviews with Anita Verhoeven and only a few Dutch
general practitioners. Verhoeven also was a general practitioner, but because of a shortage of
available general practices, she turned to another job. Nowadays, she is medical librarian and
information specialist at the University of Groningen. In the 1990s she carried out PhD research
on the information needs of general practitioners [VER99] (which is frequently cited in the
previous chapter). The information she gave in the interview is partly used to complete the
literature survey in the previous chapter, and partly to comment on the results of the interviews
with general practitioners described in the next subsection.
For the interviews, thirteen general practitioners, working in ten
different general practices (mostly in Ede, the Netherlands), were sent an introductory letter.
The next week they were asked by telephone if they were willing to contribute to this research.
Five of them, each working in a different general practice, agreed to participate: four men and one
woman. Two of them worked in a primary health care center; the others worked in duo
practices. The interviews were semi-structured and covered the topics of information needs,
information sources and computer use. A general outline of the interview is shown in Appendix
B. First, it was explained to the general practitioners that this research is about the information
needs of general practitioners and concentrates on questions about medical knowledge. Then
they were asked how often they are confronted with such questions and whether and when they
search for an answer to these questions. Subsequently, they were asked which resources they
generally use to pursue their information needs and whether they think their possibilities for
finding answers to their questions are sufficient. The third part of the interview dealt with
computer use. First, the general practitioners were asked whether they owned a computer at
work and which applications they use. Then they were shown a few paper examples of user
interfaces for information retrieval systems and a QA system (see Appendix B) to give them an
idea what these systems are about. They were asked whether they used any information
retrieval systems for medical purposes and whether they think a QA system might be useful for
their work. Finally, a few questions were asked about their preferences concerning user
interfaces.
3.2 Results
In the following subsections the results of the interviews are discussed with respect to the
general practitioners’ information needs, information sources, and computer use.
3.2.1 Information needs
Most general practitioners indicated they rarely encounter medical questions for which they have
to look up an answer. Especially the three older physicians said they have a good memory and
extensive experience and only sometimes have to look up information on uncommon cases.
Besides, the general practitioners are all supported by a general practitioner information system
(Medicom) that incorporates an electronic prescription system, which frequently eliminates the
need to look up any information on medication (though one of the general practitioners indicated
he rarely agreed with the information provided by the electronic prescription system).
All general practitioners said they normally pursue all their information needs. If possible,
they immediately search for an answer during the consultation. Otherwise, they do it afterwards
or in the evening hours. During patient visits they rarely search for answers. Some general
practitioners even avoid visiting patients as much as possible, because they lack access to
most of their information sources when they’re not in their consulting room. Besides, their
computers are connected to those of the local pharmacy (which is a feature specific for the
general practitioner information systems provided by Medicom). Therefore, prescriptions are
only executed in the consulting room from where they are directly sent to the pharmacy. When
general practitioners are confronted with a medical question during a patient visit that needs an
immediate answer, they call a specialist. Less urgent questions are answered when the general
practitioner is back in his consulting room. Most general practitioners indicated that the
questions they are confronted with during patient visits don’t differ from those met in the
consulting room. Only one of them thought they did differ, because the patients she visits differ
from those who come to her consulting room. The visited patients are generally elderly (having
fewer questions than other patients) or terminal patients (eliciting questions about palliative care).
She therefore always carries a manual on palliative care with her when she is visiting patients.
3.2.2 Information sources
Printed information sources are used by all general practitioners. The most frequently
mentioned resources are the Pharmacotherapeutic Directory (Farmacotherapeutisch Kompas),
the Diagnostic Directory (Diagnostisch Kompas), the guidelines issued by the Dutch College of
General Practitioners (NHG-Standaarden), and books on internal medicine (Interne
geneeskunde), dermatology (especially for the images), and microbiology. These books are all
in Dutch and organized specifically for direct clinical application by general practitioners. The
general practitioners indicated they rarely use books about the basics of medicine, like
physiology, because they are not suited for direct clinical application. Dutch journals, like
Nederlands Tijdschrift voor Geneeskunde (Dutch Medical Journal) and Huisarts en Wetenschap
(General Practitioner and Science), were primarily used by the general practitioners in their
role of learner (for ongoing medical education), not for searching for answers to medical
questions in their role of service provider. One of the general practitioners even indicated he
thinks journal articles merely present opinions, rather than objective knowledge. He thinks even
systematic reviews are essentially subjective, because scientific research is always executed
with a specific goal in mind.
Human resources are also used by all general practitioners. They consult both their colleagues
in the general practice and specialists. One of the general practitioners indicated he usually
sends an e-mail to a specialist when he has any questions. The others prefer to consult specialists
by phone. However, one of them said that when he has diagnostic problems concerning a
dermatological disorder, he takes a picture of it with his digital camera and sends the image to a
dermatologist by e-mail to ask for advice, which he called ‘tele-dermatology’.
Most general practitioners don’t use many electronic resources. Verhoeven indicated in the
interview that the primary reason for this is that general practitioners lack time and capabilities
to search for information in electronic resources, and also lack time to improve their searching
capabilities. The electronic prescription system is the only electronic resource used by all
general practitioners. The system was used for retrieving prescription information and patient
letters. Some general practitioners also electronically accessed the guidelines issued by the
Dutch College of General Practitioners (NHG). The Internet was used to search for answers to
medical questions by only two of the general practitioners. One of them primarily used PubMed,
the other Artsennet. PubMed provides access to a bibliographic database (Medline). This
general practitioner thus frequently ends up with only a summary of a relevant article. When he
wants to have the full text, he needs to go to the library, but he hasn’t got time for that.
Artsennet is a website published by the Royal Dutch Medical Association (KNMG) presenting
Dutch medical information for and written by Dutch physicians [KNMG]. Another general
practitioner indicated he did sometimes search the Internet for medical information, but primarily
in his role of learner, not as service provider, because he thought the information found on the
Internet is very general and not suitable for direct clinical application. When answers to medical
questions are searched for on the Internet, this is usually done after the consultation, because it
takes too much time to do this during the consultation.
Verhoeven indicated in the interview that Dutch general practitioners rarely use English
information sources. Their primary evidence based resource is the guidelines issued by the
Dutch College of General Practitioners (NHG-Standaarden). These are publicly available to all
Dutch general practitioners, whereas English evidence based resources generally are not.
When general practitioners are searching for medical knowledge that is not available in the
guidelines, Verhoeven would recommend first consulting Clinical Evidence [BMJ], then the
Cochrane Library [CC], and finally scientific literature via PubMed [EPM]. However, none of the
interviewed general practitioners used Clinical Evidence or the Cochrane Library. The only
English resources used are journal articles.
Most of the general practitioners indicated they think their possibilities of finding answers to
their medical questions are sufficient. Especially the older general practitioners generally have
few questions. However, one of the younger physicians indicated she sometimes finds it difficult
to decide where to search, but when she doesn’t know where to find the information, she asks
one of her colleagues.
3.2.3 Computer use
All general practitioners participating in this research had a computer with cd-rom player in their
consulting room. They used a general practitioner information system that incorporated an
electronic prescription system. Most of the general practitioners owned some disk with medical
knowledge, but they did not have time to use it or to learn to work with it. Especially the older
physicians indicated they are used to their working procedures without electronic resources.
They see that their younger colleagues or the general practice trainees they teach in their roles
of educator more often consult electronic resources, and they think it could be useful, but they
don’t think they really miss anything in their possibilities of finding answers to their medical
questions. Other applications used by the general practitioners are Microsoft Word (mostly for
writing letters) and an e-mail program.
The general practitioners all had access to the Internet, four of them also from their consulting
rooms, and the other one was planning it, but they vary greatly in the ways they use it. One of
the general practitioners indicated that “whenever he needs it, it is down”, another primarily
used it as a communication channel (for e-mailing medical specialists) instead of an information
resource, and the others really used it to look up medical knowledge. Apart from the guidelines
issued by the Dutch College of General Practitioners, they used Google, PubMed and
Artsennet. Google was thought to be easy for searching, because with all possible keywords
lots of results are returned, but difficulties were experienced judging the reliability and
applicability of these results. With PubMed and Artsennet it is harder to select the right
keywords, but the results are always scientific and aimed at medical professionals. However,
most articles cited by PubMed are not freely available.
Most general practitioners found it hard to imagine themselves using a QA system. They think
the printed sources they use now are already easily accessible and suitable for direct clinical
application. Besides, they prefer to look up information in these books, because of the context
they provide. The guidelines issued by the Dutch College of General Practitioners can already
be easily found electronically, eliminating the need for a QA system. Two of the general
practitioners indicated they would rather use a QA system for patient education, because they
have their own ways for searching the information they need for themselves, while they think
the patient letters provided by the electronic prescription system are not always sufficient. One
of them suggested he also might want to use a QA system for retrieving dermatological images.
He had a book for this, but he thought a QA system might provide access to more images.
Another general practitioner would like to use a QA system to retrieve summaries of recent
Dutch scientific publications from, for example, Huisarts en Wetenschap (General Practitioner &
Science). And yet another general practitioner would like to use it for retrieving ‘data’ like e-mail
addresses, telephone numbers, patient organizations, waiting lists, and logistic information.
Verhoeven thought a QA system could use the database of an evidence based question
answering service as a resource. However, this would restrict the system to answering
questions that already have been answered by humans. All general practitioners would prefer
concise answers or relevant paragraphs to complete documents. However, they all want some
link to the original document, to enable them to read more about the subject if they want to.
Besides, they all emphasized the information sources used by a QA system should be
up-to-date and reliable. They could thus best be retrieved from the Internet, but only from some
selected sites. Finally, one of the general practitioners indicated he would like to use medical
language, or even ICPC-codes, in the questions he would submit to a QA system.
The general practitioners participating in the interviews tend to prefer mouse input to using
function keys, because function keys are harder to learn, and mouse input is quicker than
switching between different menus with function keys. However, the general practitioner
information system they use now does not yet support mouse input, although the next version will.
Only one of the general practitioners indicated she would rather use keyboard input to avoid
RSI problems. The general practitioners reacted rather differently to the suggestion of speech
input. Some thought it would be excellent, because it would be much quicker than typing with
two fingers, others thought it would only be useful for dictating letters, and one of them indicated
using speech input during consultations would be rather disturbing in the contact with the
patient. In any case they only want to use speech input when it is working perfectly.
3.3 Conclusions
Based on the interviews, the following conclusions can be drawn with respect to the work roles,
tasks, information needs, awareness of information, and sources of information of Dutch
general practitioners relevant for QA, and the user interface of a QA system.
Work roles and tasks
The general practitioners participating in the interviews indicated they lacked their information
sources during patient visits. A QA system that is accessible during patient visits could thus be
very useful. However, the general practitioners do not yet have a personal digital assistant (PDA) or
laptop computer.
Information needs
Because general practitioners vary greatly in their information needs and use, and in their use
of computers, they can’t be expected to use a QA system equally often. Younger physicians
seem to have more questions and use the Internet more often to search for an answer. They are
therefore expected to appreciate a QA system more than their older colleagues.
Awareness of information
Most general practitioners prefer to search for an answer to their medical questions during the
consultation. However, they indicated that searching for an answer on the Internet takes too
much time for this. Therefore, a QA system should reduce the time needed to search for an
answer on the Internet sufficiently to make searching for answers during the consultation
possible.
Sources of information
The sources general practitioners currently use in printed form (textbooks) are already well
organized and easily accessible. There is thus no need to make them accessible via a QA
system. Only one of the general practitioners indicated he might want to search for
dermatological images (also found in his manual) with a QA system, but other techniques like
image retrieval would be more suitable for this. Human sources can’t be made accessible via a
QA system, but their telephone numbers and e-mail addresses can. One of the general
practitioners indicated he would very much like a system that supplied him with this type of
information. However, information extraction techniques might be more suitable for retrieving
contact information. Finally, electronic resources are especially suitable for use by a QA system,
because the system can improve their functional accessibility. Information retrieval systems
currently used by general practitioners either return too much irrelevant information or require
very specific keywords. A QA system should act as an intermediary selecting the right sources
and keywords for the general practitioner.
Not all electronic sources are suitable for a QA system, however. The guidelines issued by the
Dutch College of General Practitioners (NHG) are already easily accessible via an index.
Electronically available Dutch journal articles could be used, but only if they are suitable for
direct clinical application; otherwise they could better be accessed via an information retrieval
system to allow the general practitioner to read the context of the information. The general
practitioners participating in the interviews disagreed on the clinical applicability of journal
articles. It is therefore not clear whether they should be included or not. Information for patient
education is freely available on the Internet but widely dispersed. Electronic sources for patient education
are therefore especially suitable for a QA system. Moreover, some of the general practitioners
indicated they would primarily use a QA system for patient education. Thus, for the present,
only information for patient education should be included in a QA system for general
practitioners.
Most general practitioners participating in the interviews have access to the Internet from their
consulting rooms. Besides, they all indicated they would like a QA system to supply them with
the most recent information. Information sources used by a QA system can therefore best be
retrieved from the Internet. However, to avoid irrelevant or unreliable information, the QA
system should only consult some selected sites.
Information seeking
A QA system for general practitioners should be able to handle medical language and
preferably also ICPC-coding in the input question. The answer should be presented as a
concise answer, with direct links to the sites the information was retrieved from, so the general
practitioners can easily retrieve the complete documents if they want to.
User interface
Most general practitioners prefer mouse input, but keyboard input should be
accommodated as well, to help prevent RSI. Speech input should only be used when it
is working perfectly.
3.4 Discussion
Some of the conclusions drawn from the literature (described in chapter 2) are confirmed by the
results of the interviews described in this chapter. However, there also are some differences
between the conclusions of both chapters. In this section the results of the literature study are
related to those of the interviews.
Work roles and tasks
The literature study was restricted to the information needs of the general practitioner in his role
of service provider, because this is the role in which there is a need for concise answers. Some
of the information sources mentioned in literature seem to serve the general practitioner
primarily in his role of learner however. For example, journal articles, whether in printed or
electronic form, are consulted to answer medical questions by only some of the interviewed
general practitioners. The others primarily use them for ongoing medical education, because
they think the information presented in journal articles is not suitable for direct clinical
application. It is therefore not clear whether journal articles should be consulted by a QA system
or not.
In chapter 2 it was concluded that a QA system could support the general practitioner primarily
during the medical consultation phase of searching databases. It is therefore most convenient to
use the system during this phase of the consultation, instead of after the consultation. This
conclusion is supported by the interviews, in which the general practitioners indicated they
prefer to search for an answer to their medical questions during the consultation. Therefore, a
QA system should reduce the time needed to search for an answer on the Internet sufficiently to
make searching for answers during the consultation possible.
The assumption, based on literature, that the medical consultation phase of searching
databases might be harder during patient visits than in the consulting room is also supported by
the results of the interviews. There are even general practitioners who avoid visiting patients as
much as possible for this reason. It is therefore expected that in the end general practitioners
will turn to mobile computers, because they surely need mobile access to their information
sources. With a laptop, the general practitioner generally has the same possibilities as he has
with his desktop computer, thus a QA system wouldn’t need to be adjusted for use on a laptop.
However, when a general practitioner would like to address the QA system via a PDA, a
different user interface would be needed. Perhaps the presentation of the information would
also have to be different, because less space is available for information presentation on a
PDA.
Information needs
Both in literature and in the interviews general practitioners varied greatly in the number of
questions they are confronted with during patient care. In a questionnaire executed in 1996 this
number ranged from 0.04 to 50 times a week [VNB99]. In this research no relations were
investigated between the number of questions and the age of the general practitioner. However,
Ely et al. [EOE99] found that younger physicians generally have more questions than their older
colleagues. This finding seems consistent with the results of the interviews.
Based on the literature it was expected that general practitioners’ questions generally deal with
diagnosis, treatment, patient education, prescribing, and disease etiology, and that the most
frequently needed medical knowledge categories are gastrointestinal, dermatological,
musculoskeletal, and circulatory knowledge [VNB99, MCW05]. These findings seem consistent
with the information sources consulted by the general practitioners participating in the
interviews.
In the literature it was stated that a lot of recognized information needs are never pursued
[GH95]. However, the general practitioners participating in the interviews said they normally
pursue all their information needs. This great difference might be due to a difference in research
method. In the interviews the general practitioners were asked directly how often they pursued
their information needs, whereas in other studies general practitioners were observed [GH95] or
they were sent a questionnaire [VNB99]. Questions may be forgotten shortly after they arose. In
the case of an observation, such questions are recorded as recognized but not pursued.
However, when physicians are directly asked for the percentage of recognized questions they
pursue, they don’t remember their forgotten questions (which are then taken for unrecognized
questions), making the percentage of pursued questions much higher. Besides, the general
practitioners participating in the interviews all used an electronic prescription system, making it
much easier to answer questions concerning prescription. On the other hand, these questions
were already routinely pursued without such a system [EOE99]. It is therefore expected that the
actual fraction of pursued questions is somewhat lower than the interviewed general
practitioners suggest.
The assumption that questions may be easily forgotten is another reason why a QA system
should reduce the time needed to search for an answer on the Internet sufficiently to make
searching for answers during the consultation possible. To prevent general practitioners from
not pursuing their information needs, they must be able to search for an answer to their
questions immediately.
Awareness of information
Concerning electronic information sources, the literature finding that general practitioners have
difficulties in selecting the right sources and keywords [WM99] is confirmed by the interviews.
The general practitioners indicated that the information retrieval systems they used either return
too much irrelevant information or require very specific keywords.
Sources of information
Whereas in literature no relation was found between the use of electronic sources and the age
of the general practitioners [VBM95], the interviews suggest that younger physicians tend to use
the Internet for searching information more often than their older colleagues. This might be due
to the fact that they generally have more questions. Besides, younger physicians probably
learned to use the Internet during their medical education, whereas older physicians have mostly
had to learn to use the Internet by themselves (for which they generally lack time), because it is
only in the last ten years that the Internet has come into use among the general public.
Moreover, older physicians already have their ways of finding answers to their medical
questions and they are generally satisfied with these ways. They are thus not really motivated to
learn to work with new information sources. It is expected that the younger general practitioners
who have more questions and are already accustomed to using the Internet, will be most
motivated to use a QA system, because for them such a system could really save time when it
makes finding the relevant information on the Internet faster and easier. Besides, the time
needed to learn to work with the system will probably be the lowest for this group of general
practitioners.
A QA system is not suitable for finding all relevant information on the Internet, however. Some
types of information could better be found with other searching techniques, such as indexes,
information retrieval, image retrieval, or information extraction. To help general practitioners
choose the application most suitable for retrieving each information type, some sort of web
portal could be constructed that provides access to all different applications and thus to all
different information types.
In the previous chapter it was stated that to allow for evidence based medicine, a QA system
should primarily use evidence based resources such as those provided by the Dutch College of
General Practitioners (NHG) and enable critical evaluation of the answers by providing links to
supporting evidence and analysis. However, the general practitioners participating in the
interviews (especially the older three) seemed to rely more on ‘experience based medicine’ than
on evidence based medicine. Besides, the electronic sources provided by the Dutch College of
General Practitioners are thought to be already easily searchable and general practitioners
indicated they appreciate the context provided by these sources. Only the suggestion that links
should be provided to supporting evidence is confirmed by the interviews.
User interface
While in previous research general practitioners seemed to prefer keyboard input to mouse
input [DHP98, RHF92], the general practitioners participating in the interviews do not mind using
a mouse. They even tend to prefer mouse input to function keys. This is probably
because general practitioners have become accustomed to using a mouse for their
non-medical applications like Microsoft Word and Internet Explorer. They are all looking forward
to the next version of their current general practitioner information system, which will be
adapted for mouse input. Still, to help prevent RSI, applications for
general practitioners should not be restricted to a single input mode.
Summarizing from both chapters, the following conclusions can be drawn with respect
to the requirements and scope of a question answering system for general
practitioners:
Information needs
for the present, only questions for patient education will be answered by the
question answering system;
the system should be aimed at physicians who are already accustomed to using
the Internet;
Awareness of information
physicians should be convinced that answers exist to the questions they have
concerning patient education, and that they can be found with the question
answering system;
the system should reduce the time needed to search for an answer on the Internet
sufficiently to make searching for answers during the consultation possible;
Sources of information
only information sources that are current, reliable, and suitable for patient
education should be used;
information sources can best be retrieved from the Internet, but only from some
selected sites;
Information seeking
the system should be able to handle medical language and preferably also ICPC-coding
in the input question;
a dialogue might be needed to enable the system to correctly interpret and modify
the question;
the system needs to know the scope and depth of each information source;
bibliographic knowledge should be incorporated in the system;
null search results should be correctly interpreted;
a different information presentation module must be designed for use via a PDA;
the retrieved information should be presented as a concise answer, with direct
links to the sites the information was retrieved from;
information from several sources should be integrated into one consistent view;
the system should not give the impression that the presented information is the
single best way for clinical practice;
User interface
a web portal could be constructed that provides access to all different types of
information for general practitioners (including the question answering system);
the user interface of the question answering system should be very flexible;
input should not be restricted to a single input mode;
speech input should only be used when it is working perfectly;
a different user interface must be designed for use via a PDA;
information should be presented as simply and clearly as possible;
all functional options must be immediately visible at the user interface;
there shouldn’t be any timeouts.
4 The information presentation module (GIPS)
Based on literature research and interviews with Dutch general practitioners (described in
chapters 2 and 3 respectively) conclusions were drawn with respect to the requirements and
scope of a question answering (QA) system for general practitioners. Based on these
conclusions an information presentation module (GIPS: General practitioner Information
Presentation Submodule) has been designed for the IMIX demonstrator. Besides, requirements
have been specified for a web portal that provides access to all different information retrieval
systems relevant for general practitioners (including GIPS).
In section 4.1 the first version of the IMIX demonstrator and its components are described. The
conclusions presented in the previous chapter are related to the components of the IMIX
demonstrator in section 4.2. The design of GIPS is presented in section 4.3. In section 4.4 a
prototype of an information portal for general practitioners is described.
4.1 The IMIX demonstrator
The IMIX demonstrator is an interactive multimodal QA system. The architecture of the first
version of the demonstrator is presented in Figure 4 on the next page [IMIXa]. It comprises six
components (shown as rectangles in the figure): a speech recognizer (norisc.asr), two question
answering modules (rolaquad.qa and qadr.qa), an output generator (imogen.gen), a text to
speech module (imogen.tts), and a graphical user interface (imix.gui).
The components interact with each other by reading and writing structured (XML-based) data
on global data stores (pools, shown as ellipses in Figure 4). When a component subscribes to a
data pool, it automatically receives messages published to that pool. An advantage of using
pools is that the framework is very flexible [HER03]. Because the data producer and data
consumer do not have to know each other, components can be added or replaced without
causing changes to other components.
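The decoupling that pools provide can be illustrated with a minimal sketch. It is not the IMIX framework itself: the class `PoolBroker` and the function `toy_qa_module` are hypothetical names, and the sketch keeps everything in memory in a single process, whereas the real components exchange XML-based messages. It only shows that a producer publishes to a named pool without knowing which consumers are subscribed.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class PoolBroker:
    """In-memory stand-in for the data pools (illustrative, not the IMIX code)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, pool: str, callback: Callable[[str], None]) -> None:
        # A component registers interest in a pool by name only.
        self._subscribers[pool].append(callback)

    def publish(self, pool: str, message: str) -> None:
        # Every subscriber of the pool receives the message automatically.
        for callback in list(self._subscribers[pool]):
            callback(message)


broker = PoolBroker()
received: List[str] = []


def toy_qa_module(message: str) -> None:
    # Hypothetical module: reads from "questions" and writes to "answers".
    broker.publish("answers", f"<qa engine='toy'>{message}</qa>")


broker.subscribe("questions", toy_qa_module)
broker.subscribe("answers", received.append)
broker.publish("questions", "Wat is RSI ?")
```

Replacing `toy_qa_module` by another subscriber of "questions" requires no change to the publisher, which is the flexibility referred to above.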
The first component, the speech recognizer, is responsible for audio recording. It receives a
signal from the pool “control.asr.start” when it can start speech recording. The recognizer may
take some time to start up, and as soon as it is ready to receive speech input, it sends out a
signal to the pool “control.asr.ready”. The recognizer decides by itself when the speech
recording ends. When recording is finished and the analysis is complete, a message containing
the word graph is sent to the pool "questions".
The two question answering modules are triggered by a message arriving in "questions". This
message originates either from the speech recognizer (in the case of speech input) or from the
graphical user interface (text input). Each of the question answering modules processes the
question and sends a message containing its answer(s) to the pool "answers".
When two answer messages have arrived in "answers", the output generation module
transforms them into visual output (containing text and/or pictures), and speech output. Both
outputs are simultaneously sent to the "presentation" pool.
The text to speech module starts speaking the text when it arrives in the "presentation" pool. It
stops speaking as soon as a signal is received from the pool "control.tts.stop".
The graphical user interface controls the dialogue with the user. After a start screen, the user
chooses between text and speech using a button. In case of text, the user types text in a text
field, which is sent to the pool "questions". In case of speech, the speech recognizer is started
by sending a signal to "control.asr.start", while a "please wait" prompt is presented to the user.
When a signal is received in "control.asr.ready", a microphone symbol is displayed. Then, the
graphical user interface waits for information to arrive in the "presentation" pool, which it
displays. A "continue" button then takes the system back to the beginning, at which a signal is
sent to "control.tts.stop" to stop any speech output still going on.
Thus, the information presentation (collected in the pool “presentation”) is generated by the
output generation module and displayed by the graphical user interface. These are the
components that will constitute GIPS. The speech recognizer and text to speech module will be
ignored in this study, because they are not yet working perfectly, which is required for use by
general practitioners. The two question answering modules will be taken for granted, because
they are not part of the IMOGEN project (of which this study is a part). Thus, requirements will
be drafted for these modules, but they will not be implemented.
Figure 4 Architecture for the first IMIX demonstrator
In the next subsections the current specifications of the output generation module and the
graphical user interface are presented.
4.1.1 Output generation module
The output generator subscribes to the pool “answers”. The messages in this pool are XML-based
QA documents [THE05]. An example of this content is shown in Figure 5.
The root element of a QA document is “qa”. The attribute “engine” of this element specifies
which of the two question answering modules produced the message (“rolaquad” or “qadr”). A
QA document further contains a question element and a list of answers.
The question element contains the question. Its attribute “mode” specifies the input mode. Its
value could be “typed” or “spoken”. In GIPS, only typed questions will occur. The question
element in turn contains two elements: the question string and the question analysis. The
question string is the original string of the question as typed in by the user. The question
analysis contains the annotation of the question. The tags used for the annotation differ per
question answering module. The attribute “type” of the question analysis element indicates the
type of the question. The classifications of question types are also different for each of the two
question answering modules.
<?xml version="1.0" encoding="iso-8859-1"?>
<qa engine="qadr">
<question mode="typed">
<string> Wat is RSI ? </string>
<question-analysis type="definition(rsi)">
<node rel="top" cat="whq" begin="0" end="3">
....
</node>
</question-analysis>
</question>
<answerlist nr_answers="3">
<answer rank="1" conf="6.58">
<id source="www.rsi-vereniging.nl#overrsi#misverstanden"
doc="www.rsi-vereniging.nl#overrsi#misverstanden" par="1"/>
<context>
<core>
RSI is hetzelfde als een muisarm
</core>
</context>
</answer>
<answer rank="2" conf="6.57">
<id source="www.rsi-vereniging.nl#overrsi#misverstanden"
doc="www.rsi-vereniging.nl#overrsi#misverstanden" par="3"/>
<context>
<core>
Tegenwoordig is RSI een verzamelnaam ( ' paraplubegrip ' )
voor alle klachten aan armen , nek en schouders .
</core>
</context>
</answer>
<answer rank="3" conf="6.57">
<id source="www.arbeid.tno.nl#kennisgebieden#rsi#index"
doc="www.arbeid.tno.nl#kennisgebieden#rsi#index" par="2"/>
<context>
<core>
Ook RSI is eigenlijk een verkeerde term .
</core>
</context>
</answer>
</answerlist>
</qa>
Figure 5 Example QA document
The list of answers contains the answers produced by the question answering module. The
number of answers is specified by the attribute “nr_answers”. Each answer element contained
in the list has two attributes: “rank”, and “conf”. The “rank” indicates the answer’s rank within the
answer list. Higher ranked answers are more likely to be a good answer to the user’s question
than lower ranked ones. The “conf” attribute indicates how confident the question answering
module is that this answer is a good answer to the user’s question. Confidence scores issued
by “rolaquad.qa” range between 0.00 and 1.00, and those issued by “qadr.qa” range between
1.00 and 8.00. Next to these attributes, each answer in the list contains two elements: an id and
the annotated text of the answer.
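Because the two modules report confidence on different scales, the scores cannot be compared directly. One possible way to bring them onto a common scale is a linear rescaling to the interval from 0 to 1, sketched below. The ranges are those stated above; the rescaling itself, and the function name `normalized_conf`, are assumptions of this sketch and are not part of the IMIX specification. Even after rescaling, the scores of the two modules need not mean the same thing.

```python
# Score ranges as stated in the text; the linear rescaling is an assumption.
CONF_RANGES = {"rolaquad": (0.0, 1.0), "qadr": (1.0, 8.0)}


def normalized_conf(engine: str, conf: float) -> float:
    """Map a module-specific confidence score linearly onto [0, 1]."""
    low, high = CONF_RANGES[engine]
    clipped = min(max(conf, low), high)  # guard against out-of-range scores
    return (clipped - low) / (high - low)
```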
The id indicates the location of the answer within the document collection. For web documents
“qadr.qa” uses the form www.arbeid.tno.nl#kennisgebieden#rsi#index [BOU04]. Compared to
the original URL, slashes are replaced by hashes to be able to treat the ids as filenames. The
other question answering module, “rolaquad.qa”, indicates the answer document with an integer
index key assigned to the document in the private Rolaquad document database [CBD05].
The text of the answer is contained in a core element, which is in turn contained by a context
element that possibly gives the surrounding context of the answer. In principle, the core
comprises one sentence.
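As an illustration of how a consumer such as the output generator might read a QA document, the following sketch extracts the rank, confidence, source and core text of each answer with Python's standard XML library. The names `Answer`, `parse_qa_document` and `SAMPLE` are hypothetical, the sample is a shortened well-formed variant of the example QA document, and only the elements and attributes described above are assumed to be present.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Answer:
    engine: str
    rank: int
    conf: float
    source: str
    text: str


def parse_qa_document(xml_text: str) -> List[Answer]:
    """Return the answers of one QA document, best rank first."""
    root = ET.fromstring(xml_text)
    engine = root.get("engine", "")
    answers = []
    for node in root.iter("answer"):
        id_node = node.find("id")
        core = node.find("context/core")
        answers.append(Answer(
            engine=engine,
            rank=int(node.get("rank", "0")),
            conf=float(node.get("conf", "0")),
            source=id_node.get("source", "") if id_node is not None else "",
            # Collapse the whitespace that surrounds the core sentence.
            text=" ".join((core.text or "").split()) if core is not None else "",
        ))
    return sorted(answers, key=lambda answer: answer.rank)


SAMPLE = """<qa engine="qadr">
  <question mode="typed"><string> Wat is RSI ? </string></question>
  <answerlist nr_answers="2">
    <answer rank="2" conf="6.57">
      <id source="www.arbeid.tno.nl#kennisgebieden#rsi#index" par="2"/>
      <context><core> Ook RSI is eigenlijk een verkeerde term . </core></context>
    </answer>
    <answer rank="1" conf="6.58">
      <id source="www.rsi-vereniging.nl#overrsi#misverstanden" par="1"/>
      <context><core> RSI is hetzelfde als een muisarm </core></context>
    </answer>
  </answerlist>
</qa>"""
```

Calling `parse_qa_document(SAMPLE)` yields two `Answer` objects ordered by rank. Documents from “rolaquad.qa” identify their source differently, so the `source` field would need separate handling there.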
<?xml version="1.0" encoding="iso-8859-15"?>

<presentation xmlns="pml.xsd">


<content realization="speech">
This is a P-ml document with a picture.
</content>


<content name="picture.png" encoding="base64"
content-type="image/png">

iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAIAAACQd1PeAAAADElEQVQI12P
4//8/AAX+Av7c
</content>


<content realization="visual">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>IMIX</title></head>
<body>
<h1>P-ml document</h1>
This is a P-ml document with a

<img src="picture.png" alt="picture" />
</body>
</html>
</content>
</presentation>
Figure 6 Example of a P-ml document
The output generation module transforms the information received in the “answers” pool into an
appropriate presentation of the information, which is then published to the “presentation” pool.
Presentations are exchanged using P-ml (presentation mark-up language) [IMIXp]. This is a
mark-up language that was designed specifically for the IMIX project. An example of a P-ml
document is shown in Figure 6.
The root element of P-ml is “presentation”. Within the presentation element several content
elements are specified that contain a part of the presentation. A content element has four
optional attributes: “realization”, “name”, “encoding”, and “content-type”.
The value of the realization attribute specifies the method in which the content element is to be
realized. For instance, "speech" means that the content is to be realized by a speech
synthesizer, and "visual" means that the content is to be visualized. For GIPS only “visual”
content elements will be used.
A content element that has a name attribute can be referred to from elsewhere in the P-ml
document. Content elements without a “name” attribute can only affect the presentation if a
realization attribute is specified.
The encoding attribute specifies the encoding of non-XML compatible file formats. For instance,
PNG images can be base64 encoded.
Finally, the content-type attribute can be used as a hint to interpret the contents of a content
element. This is especially useful in combination with the “encoding” attribute.
P-ml does not support content representation, but a content element may contain any other
XML element designed for content representation, like an XHTML, SVG or SSML document.
However, a content element may also contain an encoded binary format or just plain text.
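The structure described above can be made concrete with a small sketch that assembles a P-ml document containing one named, base64-encoded picture and one visual content element that refers to it. The function name `build_pml` and its parameters are hypothetical; the element and attribute names follow the example in Figure 6, and the sketch makes no claim to cover the full P-ml specification [IMIXp].

```python
import base64
import xml.etree.ElementTree as ET


def build_pml(title: str, paragraph: str, image_name: str, image_bytes: bytes) -> str:
    """Assemble a P-ml document with a named picture and a visual XHTML part."""
    presentation = ET.Element("presentation", {"xmlns": "pml.xsd"})

    # Named binary content: base64 text plus a content-type hint.
    picture = ET.SubElement(presentation, "content", {
        "name": image_name,
        "encoding": "base64",
        "content-type": "image/png",
    })
    picture.text = base64.b64encode(image_bytes).decode("ascii")

    # Visual content: an XHTML fragment that refers to the picture by name.
    visual = ET.SubElement(presentation, "content", {"realization": "visual"})
    html = ET.SubElement(visual, "html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title
    ET.SubElement(body, "p").text = paragraph
    ET.SubElement(body, "img", {"src": image_name, "alt": "picture"})

    return ET.tostring(presentation, encoding="unicode")
```

The picture element carries no realization attribute, so it only affects the presentation through the reference from the visual content, in line with the description of the name attribute above.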
4.1.2 Graphical user interface
The graphical user interface receives input from the user, which it publishes to the pool
“questions”, and it displays the presentations generated by the output generation module in
return.
The graphical user interface has eight different states [IMIXg]. In Figure 7 the state transition
diagram for the user interface is presented. In the first state, “welcome”, a welcome screen is
displayed with information about the IMIX demonstrator and a start button. Then the system
goes to the second state, “modality”, in which the user is asked to choose between text input
and speech input. The third to fifth states (“wake_up”, “speech_input”, and “text_input”) deal
with the different input modes. After a question has been entered, the system goes to the state
“waiting_for_answer”. When an answer arrives, the system goes to the state “answer” and the
answer screen is displayed. This screen has two buttons: one to start a new question and one
to close the graphical user interface. In case a user chooses to start a new question, the system
returns to the “modality” state. When the graphical user interface is closed, which can be done
in any of the states, the system gets to the final state “end”.
4.2 Requirements
In the previous chapter a list of conclusions with respect to the requirements and scope of a QA
system for general practitioners was presented. This list was organized according to a subset of
the elements of the model of information seeking by professionals [LPS96] (see also section
1.3):
information needs;
awareness of information;
sources of information; and
information seeking.
Besides, a separate category was added for requirements concerning the user interface. Based
on all these conclusions, some user requirements can be specified for the IMIX demonstrator.
[Figure 7: state transition diagram with the states welcome, modality, wake_up, speech_input, text_input, waiting_for_answer and answer; the transitions are labelled start, speak, type, control.asr.ready, enter, enter [empty_textbox], timeout, presentation, new and close.]
Figure 7 State transition diagram for the first IMIX demonstrator
The conclusions concerning the information needs specify the type of questions that will be
dealt with and the user group that will be targeted. These conclusions don’t concern a single
component of the system, but the problem and scope of the entire system. The requirements
related to the awareness of information concern user education, and quality attributes like
response time, reliability, and availability. These conclusions thus also concern the entire
system. The conclusions with respect to the sources of information imply requirements on
the document collection consulted by the question answering modules.
Nine conclusions were drawn with respect to the information seeking of the system:
1. the system should be able to handle medical language and preferably also ICPC-coding in
the input question;
2. a dialogue might be needed to enable the system to correctly interpret and modify the
question;
3. the system needs to know the scope and depth of each information source;
4. bibliographic knowledge should be incorporated in the system;
5. null search results should be correctly interpreted;
6. a different information presentation module must be designed for use via a PDA;
7. the retrieved information should be presented as a concise answer, with direct links to the
sites the information was retrieved from;
8. information from several sources should be integrated into one consistent view; and
9. the system should not give the impression that the presented information is the single best
way for clinical practice.
These conclusions could be categorized according to the steps of asking and answering
questions identified by Ely et al. [EOE02] (see also section 2.2.3):
recognizing an information gap;
question formulation;
searching for relevant information;
answer formulation; and
using the answer to direct patient care.
The first two conclusions deal with the step of question formulation. This step is now dealt with
by the two question answering modules of the IMIX demonstrator, but if a dialogue were
used to correctly interpret and modify the question (as is suggested by the second
conclusion), it would be better to include a separate component to manage the dialogue.
In fact, a second version of the IMIX demonstrator that supports dialogue is currently being
developed [VP05]. For this purpose a dialogue action manager is included in the architecture.
This component continually decides which action to take next, depending on the status and the
state of the dialogue with the user [OS04]. In most states, the next action is computed on the
basis of the history of the dialogue, the semantic or pragmatic interpretation of the most recent
user input, and the information obtained from the databases the dialogue action manager can
access.
The third to fifth conclusions related to information seeking concern the step of searching for
relevant information. This step is dealt with by the two question answering modules of the IMIX
demonstrator. These conclusions therefore imply requirements for the question answering
modules.
Finally, the sixth to ninth conclusions related to information seeking concern the step of answer
formulation. This step is dealt with by the output generation module. Requirements for the
output generation module could therefore be derived from these conclusions.
The conclusions concerning the user interface imply requirements for the graphical user
interface. Besides, based on these conclusions the decision was taken to leave the speech
recognizer and the text to speech module out of the design for the QA system for Dutch general
practitioners. Further, one of the conclusions recommended developing a web portal that
provides access to all different types of information for general practitioners, including GIPS.
The design of this portal is described in section 4.4.
Summarizing, these are the problem and scope of the entire system, and the
functional requirements for each of the different components:
Problem and scope
General practitioners are confronted with questions about patient care during medical
consultations. Answers to these questions are needed during the medical
consultation, but searching for answers on the Internet takes too much time.
Therefore, a question answering system should reduce the time needed to search for
an answer on the Internet sufficiently to make searching for answers during
the consultation possible. For the present, only questions concerning patient education will be
answered by the question answering system, and the system will be aimed at
physicians who are already accustomed to using the Internet. For the system to be
successful, general practitioners will have to be convinced that answers exist to the
questions they have concerning patient education, and that they can be found with the
question answering system.
Functional requirements
Dialogue action manager
medical language and ICPC-coding in the question must be understood;
questions must be correctly interpreted and modified;
Question answering modules
the scope and depth of each information source must be known;
bibliographic knowledge should be incorporated in the system;
null search results should be correctly interpreted;
Document collection
only information sources that are current, reliable, and suitable for patient
education should be used;
information sources can best be retrieved from the Internet, but only from some
selected sites;
Output generation module
a different output generation module must be designed for information
presentation on a PDA;
the output should be presented as a concise answer, with direct links to the
documents the information was retrieved from;
information from several sources should be integrated into one consistent view;
it must be made clear that the output is not the single best way for clinical
practice;
Graphical user interface
the user interface should be very flexible;
both keyboard and mouse input must be accommodated;
a different graphical user interface must be designed for use via a PDA;
4.3 Design
Based on the requirements discussed above, the architecture of the IMIX demonstrator has
been adjusted for use by Dutch general practitioners. The revised architecture is presented in
Figure 8. Next to the two question answering modules, the output generation module, and the
graphical user interface, an extra module is incorporated in the design: the dialogue action
manager (dam.control). This component is integrated into the architecture according to the
specification of the second version of the IMIX demonstrator [VP05]. However, because the
second version of the demonstrator is not yet implemented, the components that will be
developed for this research probably cannot be integrated with the other components during this
research.
As depicted in Figure 8, the output generation module and the graphical user interface now
communicate with the question answering modules through the dialogue action manager. The
user input to the dialogue action manager consists of two pools: “user.gui” (pressed buttons),
and “user.language.raw” (text). As soon as the definitive question is known, the dialogue action
manager may write it to the pool “user.language.analysed”. The dialogue action manager
communicates with the question answering modules through the pools “questions” and
“answers”. The output from the dialogue manager is sent to the pool “dam.out”. This may be an
answer from the question answering modules, or another type of dialogue act, like a question or
an informing message. The output generator transforms the signal received in “dam.out” into
visual output and sends it to the "presentation" pool. Then the output is displayed by the
graphical user interface.
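The message flow in the revised architecture can be summarised as a table of which component reads from and writes to which pool. The sketch below encodes that table as described in this section and follows it with a breadth-first search; the names `WIRING` and `shortest_route` are hypothetical. That the graphical user interface is the writer of “user.gui” is inferred from the description rather than stated explicitly, and no reader is listed for “user.language.analysed” because none is named in the text.

```python
from collections import deque
from typing import List, Optional

# Reads and writes per component, as described for the revised architecture.
WIRING = {
    "imix.gui": {"reads": ["presentation"],
                 "writes": ["user.gui", "user.language.raw"]},
    "dam.control": {"reads": ["user.gui", "user.language.raw", "answers"],
                    "writes": ["user.language.analysed", "questions", "dam.out"]},
    "rolaquad.qa": {"reads": ["questions"], "writes": ["answers"]},
    "qadr.qa": {"reads": ["questions"], "writes": ["answers"]},
    "imogen.gen": {"reads": ["dam.out"], "writes": ["presentation"]},
}


def shortest_route(start_pool: str, end_pool: str) -> Optional[List[str]]:
    """Breadth-first search; returns alternating pool and component names."""
    queue = deque([[start_pool]])
    seen = {start_pool}
    while queue:
        route = queue.popleft()
        pool = route[-1]
        if pool == end_pool:
            return route
        for component, io in WIRING.items():
            if pool not in io["reads"]:
                continue
            for out_pool in io["writes"]:
                if out_pool not in seen:
                    seen.add(out_pool)
                    queue.append(route + [component, out_pool])
    return None
```

For example, `shortest_route("questions", "presentation")` passes through a question answering module, the dialogue action manager and the output generator, whereas the route from “user.language.raw” can bypass the question answering modules when the dialogue action manager replies with another type of dialogue act.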
[Figure 8: architecture diagram with the components imix.gui, dam.control, rolaquad.qa, qadr.qa and imogen.gen, connected through the pools user.gui, user.language.raw, user.language.analyzed, questions, answers, dam.out and presentation.]
Figure 8 Architecture for the IMIX demonstrator targeted at general practitioners
The dialogue action manager, like the two question answering modules, will be taken for
granted, because it is not part of the IMOGEN project (of which this study is a part). In the
previous section two functional requirements were specified for this component: medical
language and ICPC-coding in the question must be understood; and questions must be
correctly interpreted and modified. The second requirement actually is the goal of the dialogue
action manager. However, the dialogue action manager will not deal with the first requirement,
because the IMIX demonstrator is aimed at naïve users [VP05]. Therefore, to implement this
requirement, without changing the design of the dialogue action manager, the input to this
component should not contain any medical language or ICPC-coding. The graphical user
interface should therefore ‘translate’ these terms before sending them to the
“user.language.raw” pool.
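How such a translation step in the graphical user interface could work is sketched below. This is only an illustration under several assumptions: the function name `to_lay_language` and both lookup tables are hypothetical, the table entries are illustrative rather than taken from the official ICPC list, English terms are used for readability although the demonstrator handles Dutch, and ICPC codes are assumed to consist of one capital letter followed by two digits.

```python
import re

# Illustrative entries only; a real table would be loaded from the ICPC list.
ICPC_TO_LAY = {"K86": "high blood pressure", "T90": "diabetes"}
JARGON_TO_LAY = {
    "hypertension": "high blood pressure",
    "myocardial infarction": "heart attack",
}

# Assumed code shape: one capital letter followed by two digits.
ICPC_PATTERN = re.compile(r"\b[A-Z]\d{2}\b")


def to_lay_language(question: str) -> str:
    """Replace known ICPC codes and medical terms by lay terms."""
    def replace_code(match: "re.Match[str]") -> str:
        # Unknown codes are left untouched rather than guessed.
        return ICPC_TO_LAY.get(match.group(0), match.group(0))

    text = ICPC_PATTERN.sub(replace_code, question)
    for term, lay in JARGON_TO_LAY.items():
        text = re.sub(rf"\b{re.escape(term)}\b", lay, text, flags=re.IGNORECASE)
    return text
```

A plain dictionary lookup keeps the step transparent, but it cannot resolve terms whose lay equivalent depends on context; that would require more language technology than is assumed here.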
In the next subsections the designs for the two components of GIPS (the output generation
module and the graphical user interface) are specified.
4.3.1 Output generation module
In the first version of the IMIX demonstrator, the output generator subscribes to the pool
“answers” [IMIXa]. However, because the dialogue action manager is added to the design of the
QA system for general practitioners, the output generator now subscribes to the pool “dam.out”
[VP05]. In this pool two types of information occur: answers and other types of dialogue acts.
The messages in this pool contain three different elements: dialogue act elements that give
dialogue act information; an output element that contains the data to be used by the output
generation module; and a context element that gives dialogue context information. The output
element contains a text element (consisting of a text string) and/or qa elements that are exact
copies of the messages in the pool “answers” that already exists in the first version of the IMIX
demonstrator. Text elements represent dialogue acts other than answers.
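The distinction between the two kinds of output can be sketched as follows. The exact XML vocabulary of the “dam.out” pool is not reproduced here, so the root tag `message` and the tag names `dialogueact`, `output`, `text` and `context` in the two sample messages are assumptions made for illustration, as is the function name `split_output`; only the `qa` element is known from the first version of the demonstrator.

```python
import xml.etree.ElementTree as ET

# Hypothetical messages: root tag and element names are assumptions.
CLARIFICATION = """<message>
  <dialogueact type="question"/>
  <output><text>Which kind of RSI do you mean?</text></output>
  <context/>
</message>"""

ANSWER = """<message>
  <dialogueact type="answer"/>
  <output>
    <qa engine="qadr"><answerlist nr_answers="0"/></qa>
    <qa engine="rolaquad"><answerlist nr_answers="0"/></qa>
  </output>
  <context/>
</message>"""


def split_output(message_xml: str):
    """Return (text strings, qa elements) found in the output element."""
    root = ET.fromstring(message_xml)
    output = root.find("output")
    if output is None:
        return [], []
    texts = [(node.text or "").strip() for node in output.findall("text")]
    return texts, output.findall("qa")
```

A message that yields qa elements would be handed to the answer presentation step, while text strings would be passed on unchanged.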
When a signal is received in the “dam.out” pool, the output generation module transforms the
information into an appropriate presentation, which is then published to the “presentation” pool.
The qa elements generated by the question answering modules have to be modified to provide
the general practitioner with a comprehensible presentation of the answer to his question. In the
requirements it was stated that the answer should be presented as a concise response, with
direct links to the documents the information was retrieved from (which should be very reliable),
and that information from several sources should be integrated into one consistent view.
Besides, it must be made clear that the response is not the single best way for clinical practice.
The two question answering modules each generate a number of answers, which may originate
from different sources, but multiple answers may also have been retrieved from the same
source. To be able to provide a direct link to the source the answer was retrieved from, and to
make clear that the presented response is just a view provided by a particular source (with
which a general practitioner could disagree), answers from different sources should not be
integrated into one sentence. Instead, an overview should be provided of all different answers
integrated per source. This is illustrated in Figure 9. In this figure the three answers presented in
the example QA document shown in Figure 5 are grouped according to the two sources the
answers were retrieved from. The first and second answers
RSI is hetzelfde als een muisarm. (RSI is the same as a mouse arm.)
and
Tegenwoordig is RSI een verzamelnaam ('paraplubegrip') voor alle klachten
aan armen, nek en schouders. (Nowadays, RSI is a collective term for all
complaints of arms, neck, and shoulders.)
were retrieved from the website of the RSI Association. These two answers have been
integrated into one sentence in Figure 9:
Tegenwoordig is RSI, ook wel een muisarm genoemd, een verzamelnaam
('paraplubegrip') voor alle klachten aan armen, nek en schouders.
(Nowadays, RSI, also called a mouse arm, is a collective term for all
complaints of arms, neck, and shoulders.)
Actually, this integrated sentence is quite advanced. The first answer is transformed into an
appositive “ook wel een muisarm genoemd” (“also called a mouse arm”) to the second answer.
It is unlikely that a natural language processing system will produce such a sentence in the near
future, but it would be very nice. The third answer
Ook RSI is eigenlijk een verkeerde term. (Actually, RSI is also a wrong
term.)
was retrieved from TNO Arbeid. This answer is presented separately. Together with each group
of answers the name of the source is provided as a link to that source. This kind of presentation
can be specified in HTML and has to be contained by a visual content element in a P-ml
document.
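The grouping step can be sketched in a few lines. The sketch below groups the three example answers by web site and renders each group under a linked source name. It does not attempt the sentence integration discussed above: the answers of one source are simply placed one after the other. The names `ANSWERS`, `SOURCE_NAMES`, `group_by_site` and `render_html` are hypothetical, the table of display names is hand-made, and the `http://` scheme and the choice of link target are assumptions.

```python
from html import escape
from typing import Dict, List, Tuple

# (document id, answer text) pairs in rank order, after the example QA document.
ANSWERS: List[Tuple[str, str]] = [
    ("www.rsi-vereniging.nl#overrsi#misverstanden",
     "RSI is hetzelfde als een muisarm."),
    ("www.rsi-vereniging.nl#overrsi#misverstanden",
     "Tegenwoordig is RSI een verzamelnaam ('paraplubegrip') voor alle "
     "klachten aan armen, nek en schouders."),
    ("www.arbeid.tno.nl#kennisgebieden#rsi#index",
     "Ook RSI is eigenlijk een verkeerde term."),
]

# Hypothetical hand-made table from site to display name.
SOURCE_NAMES = {
    "www.rsi-vereniging.nl": "RSI-vereniging",
    "www.arbeid.tno.nl": "TNO Arbeid",
}


def group_by_site(answers: List[Tuple[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group answers by web site, keeping rank order within and between groups."""
    groups: Dict[str, List[Tuple[str, str]]] = {}
    for doc_id, text in answers:
        site = doc_id.split("#", 1)[0]
        groups.setdefault(site, []).append((doc_id, text))
    return groups


def render_html(question: str, answers: List[Tuple[str, str]]) -> str:
    """Render one heading per source, each linking to a source document."""
    parts = [f"<h1>{escape(question)}</h1>"]
    for site, items in group_by_site(answers).items():
        name = SOURCE_NAMES.get(site, site)
        # Assumption: link to the document of the group's best-ranked answer.
        url = "http://" + items[0][0].replace("#", "/")
        parts.append(f'<h2><a href="{escape(url)}">{escape(name)}</a></h2>')
        parts.append("<p>" + escape(" ".join(text for _, text in items)) + "</p>")
    return "\n".join(parts)
```

The resulting HTML fragment would still have to be wrapped in a visual content element of a P-ml document before it is published.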
The decision that answers from different sources should not be integrated into one sentence is
not universal to the IMOGEN project. In fact, effort is being spent on the development of
techniques to integrate similar answers retrieved from different documents into one sentence
that is more specific or more general than the original answers [MK05].
Wat is RSI?
RSI-vereniging:
Tegenwoordig is RSI, ook wel een muisarm genoemd,
een verzamelnaam ('paraplubegrip') voor alle
klachten aan armen, nek en schouders.
TNO Arbeid:
Ook RSI is eigenlijk een verkeerde term.
Figure 9 Example presentation
A separate requirement for the output generation module specified that a different output
generation module should be designed for information presentation on a PDA. In the case of a
PDA, there is not enough space to show all different answers on the screen. Therefore, only
one source should be selected for which the answer is presented. The answer should be
accompanied by the name of the source, but it is useless to provide a link to the complete
document the answer was retrieved from, because it would take too much effort to read it on
a PDA. In Figure 10 the example presentation of Figure 9 is adapted to a PDA screen. This kind
of presentation could be represented simply as plain text in a content element of a P-ml
document.
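A possible plain-text rendering for a PDA is sketched below. The text above does not say how the single source should be selected; the sketch assumes it is the source of the best-ranked answer. The function name `render_for_pda` and the default width of 24 characters are likewise assumptions.

```python
import textwrap
from typing import List, Tuple


def render_for_pda(answers: List[Tuple[int, str, str]], width: int = 24) -> str:
    """answers holds (rank, source name, text); returns wrapped plain text."""
    # Assumption: the source of the best-ranked answer is the one to show.
    _, best_source, _ = min(answers)
    texts = [text for _, source, text in sorted(answers) if source == best_source]
    lines = [f"{best_source}:"] + textwrap.wrap(" ".join(texts), width=width)
    return "\n".join(lines)
```

Because no link is included, the output is a plain string that can be placed directly in a content element.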
RSI-vereniging:
Tegenwoordig is RSI, ook
wel een muisarm genoemd,
een verzamelnaam
('paraplubegrip') voor
alle klachten aan armen,
nek en schouders.
Figure 10 Example presentation for PDA
To integrate several answers retrieved from the same source into one concise response,
possibly aided by the surrounding (annotated) context, some natural language processing is
needed. This language technology is described in chapter 5. Other types of dialogue acts will
not be modified by the output generation module. The string contained by a text element in the
“dam.out” pool will thus be directly copied (as plain text) into a content element of a P-ml
document.
4.3.2 Graphical user interface
The graphical user interface controls the dialogue with the user. It accepts typed input and
pressed buttons from the user and it displays the presentations generated by the output
generation module in return.
In the first version of the IMIX demonstrator the graphical user interface has eight different
states [IMIXg]. However, for GIPS, the states dealing with the choice of input mode (“modality”)
and speech input (“wake_up” and “speech_input”) are not needed, because only text input will
be implemented. The “welcome” state will also be omitted, because one of the requirements for
the graphical user interface stated that all functional options should be immediately visible at the
user interface. The system will therefore start in the “text_input” state.
When a question or other dialogue act is entered by pressing the ok button or typing [enter]
(satisfying the requirement that both keyboard and mouse input must be accommodated), the
user turn ends and the system turns to the “waiting_for_answer” state. When a dialogue act
arrives that is not an answer, the system returns to the “text_input” state to allow the user to
type a reply. When an answer arrives, the system goes to the “answer” state. From this state
the user can either start a new dialogue or close the graphical user interface. As in the first
version of the IMIX demonstrator, the system can go to the “end” state from any of the other
states by closing the graphical user interface. Besides, the user should also be able to start a
new dialogue from any of the states to prevent the user from getting stuck in a fruitless
dialogue. However, the dialogue will never be reset after a timeout, because if a physician turns
away from the system, it must always be in precisely the same state when he returns to it. The
new state transition diagram for GIPS is shown in Figure 11.
[Figure 11: state transition diagram. States: text_input, waiting_for_answer, answer. Transition labels: new; enter; enter [empty_textbox]; presentation [no_answer]; presentation [answer]; close.]
Figure 11 State transition diagram for GIPS
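The transitions described above can be summarized in a small Python sketch. It is only one possible reading of the description and of Figure 11: the names `State`, `next_state`, `textbox_empty` and `is_answer` are hypothetical, and treating [enter] on an empty text box as a self-loop on “text_input” is an assumption.

```python
from enum import Enum


class State(Enum):
    TEXT_INPUT = "text_input"
    WAITING_FOR_ANSWER = "waiting_for_answer"
    ANSWER = "answer"
    END = "end"


def next_state(state: State, event: str, *,
               textbox_empty: bool = False,
               is_answer: bool = False) -> State:
    """Return the state reached from `state` when `event` occurs."""
    if state is State.END:
        return State.END                      # the interface has been closed
    if event == "close":
        return State.END                      # allowed from any state
    if event == "new":
        return State.TEXT_INPUT               # a new dialogue can start from any state
    if state is State.TEXT_INPUT and event == "enter":
        # Assumption: an empty text box keeps the system in text_input.
        return State.TEXT_INPUT if textbox_empty else State.WAITING_FOR_ANSWER
    if state is State.WAITING_FOR_ANSWER and event == "presentation":
        # An answer leads to the answer state; any other dialogue act
        # returns to text_input so that the user can type a reply.
        return State.ANSWER if is_answer else State.TEXT_INPUT
    return state                              # other events are ignored


state = State.TEXT_INPUT
state = next_state(state, "enter")
state = next_state(state, "presentation", is_answer=True)
print(state.value)
```

Note that there is no timeout event in this sketch, in line with the requirement that the dialogue is never reset after a timeout.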
4.4 An information portal for general practitioners
The general practitioners participating in the interviews (described in chapter 3) were not
unanimous in their information needs and their expectations of QA systems. QA systems turned
out to be suitable primarily for answering questions for patient education based on information
retrieved from the Internet. Other types of questions might also be answered with information
retrieved from the Internet, but question answering is not a suitable technique for finding an
answer to these questions. To provide an overview of the different techniques available to
general practitioners for answering their questions, a web portal could be developed that
provides access to all types of applications useful for general practitioners.
Next to GIPS, the portal should provide access to an image retrieval system to retrieve
dermatological images, information extraction technology to retrieve data like telephone
numbers and e-mail addresses (the social map), the index of the guidelines issued by the Dutch
College of General Practitioners [NHGs], and medical information retrieval systems like
Artsennet [KNMG] and PubMed [EPM] to retrieve journal articles. Some of these applications,
like GIPS itself, image retrieval, and information extraction are not yet available for use by
general practitioners. Therefore, if a web portal were implemented, it should be made
very clear which types of information can and which cannot be found with the available
applications to avoid general practitioners being frustrated because the application does not
return the information they need.
A prototype of such a web portal has been implemented in HTML and JavaScript. The layout of
the portal resembles that of “startpagina.nl”, which is a frequently used portal for Internet users.
Each application is shown as a textbox headed with a colored title. The color of the title
indicates the availability of the application. A blue title indicates that the application is available,
a purple title means there is only a prototype of the application, and applications with a red title
are not available at all, but might be available in the future, in which case they would be useful
for general practitioners. For each application the portal includes a text field in which the
user can enter his question directly, which saves him the steps of first opening the application
and entering the question there.
GIPS is the only application with a purple title. A prototype of GIPS has also been implemented
in HTML and JavaScript. This prototype consists of a simple graphical user interface in which
the user can select a question from a list of questions. Then the answer to this question
(generated previously with the prototype described in the next chapter) is shown in an answer
field. One of the answers also includes a picture. In this prototype dialogue functionality is not
implemented, and keyboard input is not supported.
Screenshots of the web portal and GIPS are shown in Appendix C. The prototypes themselves
can be viewed at my website:
http://wwwhome.cs.utwente.nl/~langen/thesis/portal
5 Response formulation
In the previous chapter the design for GIPS (General practitioner Information Presentation
Submodule) was presented. There, the need was expressed for natural language processing
(NLP) techniques to integrate several answers into one concise and coherent response. NLP is
a subfield of artificial intelligence and linguistics. It concerns the processing and understanding
of natural language by computers. NLP is a broad field comprising, among others, speech
recognition, natural language generation, question answering, information retrieval, information
extraction, and automatic summarization.
In section 5.1 some related work on response formulation is presented. Section 5.2 describes
which response formulation methods are suitable for GIPS. These methods are elaborated in
sections 5.3 and 5.4. Finally, in section 5.5 the implementation of the response formulation task
is described.
To avoid confusion about the terminology, in this chapter the term ‘answer’ will be used for the
answer sentences returned by the question answering modules, while the term ‘response’ will
be used for the final (concise and coherent) answer generated by the output generation module.
5.1 Related work
In previous QA research, little effort has been dedicated to response formulation. Most QA
research has focused on improving system performance against a standard set of questions,
like the QA track at the TREC conferences [TREC]. These questions require short factoid
answers and system performance on these questions is measured by the rank at which the
correct answer is returned by the system. No natural language responses need to be formulated
for this task.
For the sake of better human-computer interaction, however, response formulation is receiving
more and more attention. Generally, two response formulation approaches can be discerned that
are related to existing approaches for the QA stage of answer extraction: formulation templates
(5.1.1) and query-based summarization (5.1.2). In addition, research is being conducted on the use
of sentence fusion (5.1.3) to integrate several answers into one response.
5.1.1 Formulation templates
Kosseim [KPG03] investigated the use of formulation templates. For example, the question
Who is the prime minister of Canada?
could be transformed into the templates:
The prime minister of Canada is <person-name>
or
<person-name>, prime minister of Canada,
These templates can be used for answer extraction as well as response formulation. In the case
of answer extraction, a lot of possible formulations are produced. Then the QA system searches
for these formulations in the document collection and instantiates <person-name> with the
matching noun phrase. This noun phrase is then considered the answer. In the case of
response formulation, only a few (or only one) template of good linguistic quality is produced.
Then <person-name> is instantiated with the answer to produce a linguistically correct
response.
I used formulation templates for a QA system that only answers factoid person questions
[LAN05]. It is a relatively simple method that can only be used for short, factoid answers. As far
as I know, the use of formulation templates for more extensive answers has not been
investigated.
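The double use of formulation templates can be illustrated with a minimal Python sketch. It is not the implementation of [KPG03] or [LAN05]: the function names are hypothetical, a sequence of capitalized words serves as a crude stand-in for a matched noun phrase, and the name “Jane Doe” is a placeholder.

```python
import re

SLOT = "<person-name>"
TEMPLATES = [
    "The prime minister of Canada is <person-name>",
    "<person-name>, prime minister of Canada,",
]


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn a template into a regular expression with a named group for the slot."""
    prefix, suffix = template.split(SLOT)
    # Assumption: one or more capitalized words approximate a person name.
    name = r"(?P<name>[A-Z][\w.-]*(?: [A-Z][\w.-]*)*)"
    return re.compile(re.escape(prefix) + name + re.escape(suffix))


def extract_answer(templates, text):
    """Answer extraction: return the first slot filler found in the text."""
    for template in templates:
        match = template_to_regex(template).search(text)
        if match:
            return match.group("name")
    return None


def formulate_response(template: str, answer: str) -> str:
    """Response formulation: instantiate one well-formed template."""
    return template.replace(SLOT, answer) + "."


document = "According to reports, Jane Doe, prime minister of Canada, opened the summit."
answer = extract_answer(TEMPLATES, document)
print(formulate_response(TEMPLATES[0], answer))
```

For extraction many templates would be tried against the document collection, whereas for response formulation only the first, linguistically complete template is instantiated.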
5.1.2 Query-based summarization
Query-based summarization aims at summarizing the part of a text relevant to the user’s
question. This method is used for answer extraction. For example, Cardie et al. [CNP00]
investigated the use of a variant of the vector space model (primarily used by information
retrieval systems) to generate a query-based summarization. With this method, the answer
document is divided into chunks (e.g. sentences, paragraphs, 200-word passages), a vector
representation is generated for the question and for each document chunk, then the similarity of
each chunk to the question is determined, and the most similar chunk (up to a predetermined
length) is taken to be the query-dependent summary. This summary is then used to extract
possible answers to the question.
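The chunk selection step can be sketched in Python as follows. This is a simplification and not the variant used by Cardie et al.: plain term-frequency vectors and cosine similarity are assumed, and the function names and example chunks are invented for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Represent a text as a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_chunk(question: str, chunks: list) -> str:
    """Return the chunk most similar to the question."""
    q = vectorize(question)
    return max(chunks, key=lambda chunk: cosine(q, vectorize(chunk)))


chunks = [
    "RSI is a collective term for complaints of arms, neck and shoulders.",
    "Poor posture and stress are what causes RSI in many office workers.",
    "The weather was sunny.",
]
print(best_chunk("What causes RSI?", chunks))
```

In a real system the chunks would be the sentences, paragraphs or fixed-length passages of the answer document, and the selected chunk would be cut off at the predetermined length.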
However, query-based summarizations also seem appropriate as response. According to Lin et
al. [LQS03], the most natural response presentation style for a QA system is “focus-plus-context”.
This means the system returns the answer to the user’s question, extended with the
text surrounding the answer. Lin et al. investigated which context level is preferred by users:
only the exact answer; the sentence from which the answer was extracted; the paragraph; or
the entire document. They concluded that users prefer to receive the paragraph the answer was
retrieved from.
Bosma [BOS05] investigated a more intelligent way of producing query-based summarizations.
He used discourse annotations to determine which context sentences are most related to the
sentence in which the answer is located. Only these sentences (which are not necessarily
subsequent to each other in the original document) are included in the response, instead of the
entire paragraph. This response was compared to a baseline consisting of the answer sentence
extended with the preceding and the following sentence. Users were asked to evaluate the
query-based summarizations and the baseline on the extent to which their accuracy could
be verified, on the usefulness with respect to the question, and on the amount of irrelevant
information with respect to the question. It turned out that users thought Bosma’s query-based
summarizations were more verifiable, and contained less irrelevant information than the
baseline response.
Actually, in their methods, Lin et al. and Bosma don’t use the query to produce a query-based
summarization at all. They only use the answer sentence as starting-point to determine which
sentences to include in the response. Therefore, I call this method “answer extension”, instead
of “query-based summarization”.
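The simplest form of answer extension, the baseline of one preceding and one following sentence, can be written down in a few lines of Python. The function name `extend_answer` and the `window` parameter are assumptions for this sketch; Bosma's discourse-based selection is not reproduced here.

```python
def extend_answer(sentences, answer_index, window=1):
    """Return the answer sentence with up to `window` sentences on either side."""
    start = max(0, answer_index - window)
    end = min(len(sentences), answer_index + window + 1)
    return sentences[start:end]


document = ["Sentence one.", "Sentence two.", "Answer sentence.", "Sentence four."]
print(" ".join(extend_answer(document, 2)))
```

Only the position of the answer sentence is used; the question itself plays no role, which is exactly why the term “answer extension” is preferred here.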
5.1.3 Sentence fusion
Sentence fusion is used especially in multi-document summarization to summarize the
information common to all documents. As an example Barzilay [BAR03] presents the
sentences:
IDF Spokeswoman did not confirm this, but said the Palestinians fired an
anti-tank missile at a bulldozer.
and
The clash erupted when Palestinian militants fired machine-guns and anti-tank
missiles at a bulldozer that was building an embankment in the area to
better protect Israeli forces.
These sentences would be fused into the sentence:
Palestinians fired an anti-tank missile at a bulldozer.
Sentence fusion could also be used in QA systems for response formulation. Marsi and
Krahmer [MK05] investigate a method for Dutch to integrate similar answers retrieved from
different documents into one sentence that is more specific or more general (depending on the
goal of the system) than the original answer sentences. For example, the answers
RSI can be caused by repeating the same sequence of movements many
times an hour or day.
and
RSI is generally caused by a mixture of poor ergonomics, stress and poor
posture.
might be fused into a more specific response, like:
RSI can be caused by a mixture of poor ergonomics, stress, poor posture
and by repeating the same sequence of movements many times an hour or
day.
The method developed by Marsi and Krahmer comprises three stages: alignment, merging, and
generation. During alignment, words and phrases in the different sentences that are related to
each other are aligned, and each alignment is labeled with the semantic relation holding
between the aligned phrases (e.g. equals, restates, specifies). In the merging stage it is decided
which information from either sentence should be preserved. Finally, a grammatically correct
surface representation is generated for the fused sentence. Marsi and Krahmer haven’t
investigated their method of sentence fusion for QA yet. They only evaluated it on parallel
corpora.
5.2 GIPS
To get an idea of the possible answers received as input for GIPS, ten example questions about
RSI were submitted to the question answering module “qadr.qa”. This module was used
because it provides the dependency structures of the question, the answer sentences, and the
sentences in their context, which can be useful for subsequent natural language processing.
More information on dependency structures is provided later in this chapter. The example
questions and their answers are presented in detail in Appendix D.
On inspection of the answers there appear to be two different types of answers. The first type
can be interpreted without knowing the original context of the answer sentence. I call these
“autonomous answers”. For example, on the first question in Appendix D:
Wat is RSI? (What is RSI?)
one of the answers is
RSI is een verzamelnaam voor zeer uiteenlopende vormen van
overbelasting in het gebied van nek, schouders, armen en ellebogen. (RSI is
a collective term for very different forms of overload in the area of neck,
shoulders, arms, and elbows.)
This answer can be interpreted easily, because it simply gives a definition.
However, answers of a second type can hardly be interpreted without knowing their
context. I call these “dependent answers”. Many of these answers are not relevant for the
question, but seem to have a strong relation with a sentence in their context that is necessary to
correctly interpret the answer and would be very relevant for the question. For example, on the
third question in Appendix D:
Welke spieren zijn betrokken bij RSI? (Which muscles are affected by RSI?)
one of the answers is
Uit onderzoek is bekend dat deze het vaakst betrokken is bij RSI. (It is
known from research that this one is affected by RSI most often.)
This answer cannot be entirely interpreted, because it contains a referring expression “deze”
(“this one”). This expression probably refers to a muscle. It is expected that the previous
sentence contains the name of this referred muscle, which would be very relevant for the
question.
Because there are two different types of answers, two different response formulation
strategies are proposed. Autonomous answer sentences retrieved from the same document
could be integrated into a single sentence using some sentence fusion method. This is
described in section 5.3. Dependent answers need to be extended with sentences from their
original context, using some answer extension method, to make them interpretable and
probably also more relevant, see section 5.4.
5.3 Answer integration
Autonomous answers could be integrated into one sentence to make the response more
concise and fluent. However, GIPS should only integrate answers originating from the same
document into one sentence, because the general practitioner receiving the response must be
able to check its source. Besides, it is hypothesized that general practitioners will more readily
accept and trust a response accompanied by a link to its source, than a computer generated
response that cannot be easily verified.
Within NLP, two different approaches for integrating sentences have been investigated:
aggregation and sentence fusion. Aggregation is used in natural language generation (NLG) to
make generated text more coherent by reducing redundant information and introducing
connectives [RM99, SHA02]. NLG is defined as “the process of constructing natural language
outputs from non-linguistic inputs” [JM00], which is also called concept-to-text generation
[LAP03]. In concept-to-text generation, information that is represented in a knowledge base or
other logical representation is transformed into natural language. For example, the logical
proposition “likes(John, Mary) AND likes(Mary, John)” could be transformed into the natural
language phrase “John and Mary like each other.”.
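The example above can be made concrete with a toy concept-to-text generator in Python. The representation of propositions as pairs and the function name `realize_likes` are assumptions for this sketch; it handles only the “likes” predicate.

```python
def realize_likes(propositions):
    """Realize likes(a, b) propositions, merging reciprocal pairs."""
    facts = set(propositions)
    done = set()
    sentences = []
    for a, b in propositions:
        if (a, b) in done:
            continue
        if a != b and (b, a) in facts:
            # likes(a, b) AND likes(b, a) are expressed by one reciprocal sentence.
            sentences.append(f"{a} and {b} like each other.")
            done.update({(a, b), (b, a)})
        else:
            sentences.append(f"{a} likes {b}.")
            done.add((a, b))
    return sentences


print(realize_likes([("John", "Mary"), ("Mary", "John")]))
```

The input here is a logical representation, not text, which is what distinguishes concept-to-text generation from the text-to-text setting discussed next.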
Sentence fusion is used in text-to-text generation to create a concise and fluent fusion of two or
more sentences [BAR03]. Text-to-text generation methods take information that is represented
in natural language as input, and transform it into a new natural language representation
satisfying certain constraints. For example, in automatic summarization, sentences could be
deleted or fused to represent only the most important information. Sentence fusion is used
especially in multi-document summarization to summarize the information common to all
documents.
The response formulation task investigated in this chapter concerns text-to-text generation,
because it starts with text snippets (answer sentences) retrieved by the question answering
modules and aims at integrating them into a single coherent text (response). Therefore,
sentence fusion (being a text-to-text generation task) seems the most appropriate approach for
answer integration. Actually, the use of sentence fusion for the IMIX demonstrator is
investigated at the University of Tilburg by Marsi and Krahmer [MK05]. However, I think that the
concept-to-text generation field of aggregation could also provide useful insights for the
response formulation task, because it has been investigated more thoroughly. Whereas
sentence fusion primarily concerns the fusion of similar sentences (like in multi-document
summarization or fusion of parallel corpora), aggregation concerns the integration of sentences
connected by all sorts of relations. Therefore, I investigated the possibilities to use aggregation
for response formulation.
The next subsection provides a general description of aggregation. Then in sections 5.3.2 to
5.3.7 the different linguistic levels of aggregation are elaborated. In section 5.3.8 conclusions
are drawn with respect to the aggregation types that can be used for the response formulation
task. Finally, in section 5.3.9 an algorithm is presented for answer integration.
5.3.1 Aggregation
Reape and Mellish [RM99] conducted a literature survey to investigate what aggregation is.
They concluded that aggregation is “the combination of two or more linguistic structures into a
single linguistic structure which contributes to sentence structuring and construction”.
Aggregation roughly consists of two stages [SHA02]. In the first stage, it is decided which
linguistic structures might be aggregated. Mostly, linguistic structures are aggregated when they
have a rhetorical relation or when they show some similarity. In the second stage,
transformations are applied to these structures. For this purpose linguistic constructions are
used, such as conjunction, adjective phrase attachment, quantification, and gapping. Examples
of these constructions are shown in Table 8.
Aggregation is usually used to make computer generated text more concise, cohesive, and/or
fluent and (consequently) to generate more complex sentences [SHA02]. For the sake of
fluency, the linear ordering of aggregated constituents is also important. For example, the
sentence “Mary is a sweet little girl.” is more fluent than “Mary is a little sweet girl.”, while both
sentences are grammatical. Therefore, when two sentences are aggregated, a linear ordering of
their constituents should also be specified.
There are several types of aggregation. Each author makes a different classification of
aggregation types. Reape and Mellish [RM99] tried to integrate these classifications by
introducing a classification based on levels of linguistic representation. They discern six different
levels at which aggregation can occur. Table 9 lists the different types of aggregation
associated with these levels of linguistic representation.
Linguistic construction              Example
Conjunction                          Sue invited John. Sue invited Mary.
                                     → Sue invited John and Mary.
Adjective phrase attachment          Mary is a girl. Mary is sweet.
                                     → Mary is a sweet girl.
Quantification                       John was invited. Mary was invited.
                                     John and Mary are the only children.
                                     → All children were invited.
Gapping (deletion of a second verb)  John hit Mary. Phil hit Sue.
                                     → John hit Mary and Phil ∅ Sue.
Right-node-raising                   John likes Mary. Phil hates Mary.
                                     → John likes ∅ and Phil hates Mary.
Table 8 Examples of linguistic constructions
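Three of the constructions in Table 8 can be sketched in Python for simple subject-verb-object clauses. The `Clause` type and the `aggregate` function are hypothetical and serve only to show the two stages: first deciding from the similarity of two clauses whether they can be aggregated, then applying a transformation. Verb agreement and anything beyond these flat clauses is ignored.

```python
from typing import NamedTuple, Optional


class Clause(NamedTuple):
    subject: str
    verb: str
    obj: str


def aggregate(a: Clause, b: Clause) -> Optional[str]:
    """Aggregate two clauses, or return None if no construction applies."""
    same_subject = a.subject == b.subject
    same_verb = a.verb == b.verb
    same_object = a.obj == b.obj
    if same_subject and same_verb and not same_object:
        # Conjunction of the two objects.
        return f"{a.subject} {a.verb} {a.obj} and {b.obj}."
    if same_verb and not same_subject and not same_object:
        # Gapping: the second verb is deleted.
        return f"{a.subject} {a.verb} {a.obj} and {b.subject} {b.obj}."
    if same_object and not same_subject and not same_verb:
        # Right-node-raising: the shared object is realized once, at the end.
        return f"{a.subject} {a.verb} and {b.subject} {b.verb} {b.obj}."
    return None


print(aggregate(Clause("Sue", "invited", "John"), Clause("Sue", "invited", "Mary")))
print(aggregate(Clause("John", "hit", "Mary"), Clause("Phil", "hit", "Sue")))
print(aggregate(Clause("John", "likes", "Mary"), Clause("Phil", "hates", "Mary")))
```

Clauses that share both verb and object are deliberately left alone, because conjoining their subjects would require changing the verb form.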
5.3.2 Conceptual aggregation
The first level, the conceptual or inferential level, concerns non-linguistic, language-independent,
domain knowledge contained in some sort of knowledge base. Conceptual
aggregation concerns the use of user modeling, domain knowledge, and common sense
reasoning to reduce the number of concepts [SHA98]. This type of aggregation typically
reduces the number of propositions in the message while increasing the complexity of the value
of some conceptual role. For example, the two sentences "John hit Mary" and "Mary kicked
John" might result in the aggregated sentence "John and Mary fought."
Aggregation type         Description
Conceptual aggregation   Reduction of the number of concepts
Discourse aggregation    Reduction of the complexity of the rhetorical structure
Semantic aggregation     Combination of two or more semantic entities
Syntactic aggregation    Combination of sentences using syntactic constructions
Lexical aggregation      Reduction of the number of lexical predicates and/or lexemes
Referential aggregation  Referring expression generation
Table 9 Types of aggregation
5.3.3 Discourse aggregation
The second level, discourse, concerns the coherence of a text. Text coherence implies that
utterances in a text are linguistically as well as non-linguistically connected to each other
[HAA02a]. Besides, the utterances must contribute to the rhetorical structure of the text. A
theory that describes the different rhetorical relations between sentences is the Rhetorical
Structure Theory (RST) developed by Mann and Thompson [MT87]. Most RST relations are
asymmetric. In that case one of the sentences is considered the nucleus (conveying the most
essential information) and the other the satellite. However, there are also multi-nuclear relations
in which there is no distinction between nucleus and satellite. Table 10 lists some examples of
RST relations [JM00]. RST relations are hierarchical, therefore the rhetorical structure of a text
can be represented as a tree (see for example Figure 12 and Figure 13).
Relation     Description
Elaboration  The satellite presents some additional detail concerning the nucleus.
             e.g. John likes girls. He likes Mary most.
Contrast     The nuclei present things that are different in some relevant way.
             e.g. John likes girls. He doesn’t like Sue.
Sequence     The nuclei are realized in succession.
             e.g. John invites Mary. John invites Sue.
Purpose      The satellite presents the goal of performing the activity presented in the nucleus.
             e.g. John invites Mary. He wants her to come to his party.
Result       The situation presented in the satellite results from the one presented in the nucleus.
             e.g. John invites Mary. She comes to his party.
Table 10 Examples of RST relations
Discourse or rhetorical aggregation is defined as “any operation that applies to a discourse
structure, rhetorical structure or text plan and maps it to a better structure or plan” [RM99]. An
example of discourse aggregation might be the mapping of the rhetorical structure tree
E(nuc(E(nuc(n),sat(p1))),sat(p2)) (for example: “John likes girls. He likes Mary
most. He also likes Sue.”)
into
E(nuc(n),sat(and(p1,p2))) (for example: “John likes girls. He likes Mary most
and he also likes Sue.”)
where “E” is the elaboration relation, “nuc” is the nucleus, and “sat” is the satellite. These trees
are illustrated graphically in Figure 12 and Figure 13 respectively. As this example illustrates,
discourse aggregation typically reduces the complexity of the rhetorical structure while
increasing the complexity of one of its propositional leaves.
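[Figure 12: tree with root Elaboration; its nucleus is a nested Elaboration (nucleus n, satellite p1) and its satellite is p2.]
Figure 12 Rhetorical structure tree E(nuc(E(nuc(n),sat(p1))),sat(p2))
[Figure 13: tree with root Elaboration; its nucleus is n and its satellite is the conjunction of p1 and p2.]
Figure 13 Rhetorical structure tree E(nuc(n),sat(and(p1,p2)))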
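The mapping between the two trees can be expressed as a small tree rewrite. The Python sketch below is a minimal illustration under my own assumptions: the classes `Elaboration` and `And` and the function `flatten` are hypothetical names, and only nested elaborations in nucleus position are handled.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Elaboration:
    nucleus: "Node"
    satellite: "Node"


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"


Node = Union[str, Elaboration, And]


def flatten(tree: Node) -> Node:
    """Rewrite E(nuc(E(nuc(n),sat(p1))),sat(p2)) into E(nuc(n),sat(and(p1,p2)))."""
    if not isinstance(tree, Elaboration):
        return tree
    nucleus = flatten(tree.nucleus)
    if isinstance(nucleus, Elaboration):
        # Both satellites elaborate the same nucleus, so they are conjoined.
        return Elaboration(nucleus.nucleus, And(nucleus.satellite, tree.satellite))
    return Elaboration(nucleus, tree.satellite)


before = Elaboration(Elaboration("n", "p1"), "p2")
print(flatten(before))
```

The result has one relation less, while its satellite has become a more complex leaf, which is the trade-off described above.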
5.3.4 Semantic aggregation
The semantic level concerns linguistic, language-dependent representations of meaning. In
contrast to the conceptual level, the information at this level is domain-independent [RM99].
Semantic aggregation is defined as the combination of two or more semantic entities into one
entity. Methods of semantic aggregation are semantic grouping and logical transformations.
Semantic grouping is the ordering and bracketing of semantic content. Logical transformations
concern the mapping of semantic predicates into fewer or just different predicates. For example,
the meanings of “Jamie is Chris’s sister” and “Chris is Jamie’s brother” might be mapped to the
meaning of “Chris and Jamie are brother and sister”.
The distinction between conceptual and semantic aggregation is difficult. Actually, other
authors, like Shaw [SHA02], don’t distinguish between these types of aggregation. Instead,
Shaw considers all aggregation operations that make no use of syntactic knowledge or lexicon
(except for referential aggregation) interpretive aggregation. According to Shaw, interpretive
aggregation operators perform inferences over concepts and relations across propositions.
Reape and Mellish [RM99] also admit that they couldn’t find any clear examples of semantic
aggregation “which couldn’t alternatively be classified as either conceptual, syntactic or lexical
aggregation”. Semantic aggregation as a separate type is thus questionable.
5.3.5 Syntactic aggregation
Syntax refers to the way words are arranged together [JM00]. Words belong to different word
classes, implying restrictions on the way they can be used in a sentence. Syntactic aggregation
is the most common form of aggregation [RM99]. It combines propositions using syntactic
constructions, like conjunction, gapping, etc. [SHA02]. Syntactic aggregation can be paratactic
or hypotactic. In paratactic aggregation the aggregated sentences are of equal syntactic status.
The main paratactic aggregation operator is the coordinating conjunction, a linguistic
construction that uses a coordinator (like “and”, “or”, “but”) to link linguistic units of equal
syntactic status. For example the sentences “John likes school.” and “Mary likes school.” can be
aggregated as “John and Mary like school.”. The coordinating conjunction can be used to
combine propositions that have an addition, sequence, or non-volitional result rhetorical relation.
The clauses in hypotactic constructions have unequal syntactic status [SHA02]. For example,
when two propositions have an elaboration relation, the proposition in satellite position can be
transformed into a modifying construction, such as an adjectival phrase, a prepositional phrase,
or a relative clause, like the transformation of “Mary is sweet.” into an adjective in “Mary is a
sweet girl.”. Lexical information is used to determine if the result of hypotactic aggregation
doesn’t violate any syntactic or lexical constraints. For example, when the sentences “Mary is a
girl.” and “Mary is sweet.” are aggregated as “Mary is a sweet girl.”, “girl” must be realizable
using an adjective, and “sweet” must be realizable as a prenominal modifier. Such restrictions
are coded in a lexicon.
Rhetorical relations are very important in choosing a linguistic construction for syntactic
aggregation. Shaw [SHA02] gives the following example to show how different rhetorical
relations lead to very different aggregated sentences.
Consider the following two sentences:
1. John abused the duck.
2. The duck buzzed John.
When the main rhetorical relation connecting the nucleus and satellite is elaboration, the
sentences might be aggregated by a relative clause, resulting in the following aggregated
sentences (depending on which of the sentences was the nucleus):
a. John abused the duck that had buzzed him.
b. The duck buzzed John who had abused it.
By using the past perfect tense in the relative clauses, the nucleus and satellite also have a
sequence relation; therefore sentences a and b describe very different situations. In the first,
John was the victim first before he became an aggressor, while in the second the duck was the
victim first.
When the main rhetorical relation connecting the nucleus and satellite is sequence, non-volitional
result, or addition, the sentences might be aggregated by conjunction, resulting in the
sentences c and d.
c. The duck buzzed John and he abused it.
d. John abused the duck and it buzzed him.
Thus, at least four different aggregated sentences can result from two sentences, depending on
their rhetorical relation.
5.3.6 Lexical aggregation
The lexical level represents the concatenation of morphemes making up a word [JM00]. Lexical
aggregation combines multiple lexical items to express them more concisely [SHA02]. This
operation is related to paraphrasing. Compared with hypotactic aggregation, lexical aggregation
operators use more detailed lexical information. For example, the phrase “a dog used by the
police” might be transformed into “a police dog”, transforming the reduced relative clause into a
prenominal modifier. Another type of lexical aggregation is the combination of multiple lexemes
into one, like the transformation of the phrase “rise sharply” into “shoot”.
5.3.7 Referential aggregation
The last type of aggregation, referential aggregation, is usually associated with referring
expression generation [RM99]. Referring expression generation concerns the linking of words
by introducing pronouns, demonstratives, and other types of reference [JM00]. Another type of
referential aggregation, however, is quantification [SHA02]. Quantification replaces a set of
entities with a reference to their type (based on ontology) as restricted by a quantifier. For
example, when it is known that John and Mary are the only students, the sentences “John likes
school” and “Mary likes school” could be transformed into “all students like school”. For this type
of aggregation an ontology is needed that provides information on instance-class relations,
inheritance relations, and part-of relations between different entities.
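Quantification over instance-class relations can be sketched in Python as follows. The ontology is reduced to a hypothetical mapping from class names to their instances, facts are (entity, predicate) pairs, and the function `quantify` returns structured results rather than finished sentences, so verb agreement is left to a later realization step.

```python
from collections import defaultdict


def quantify(facts, ontology):
    """Replace sets of entities by 'all <class>' when a whole class shares a predicate.

    facts:    iterable of (entity, predicate) pairs
    ontology: dict mapping a class name to the set of all its instances
    """
    by_predicate = defaultdict(set)
    for entity, predicate in facts:
        by_predicate[predicate].add(entity)

    result = []
    for predicate, entities in by_predicate.items():
        remaining = set(entities)
        for class_name, members in ontology.items():
            # Quantify only if every instance of the class has the predicate.
            if members and members <= remaining:
                result.append((f"all {class_name}", predicate))
                remaining -= members
        result.extend((entity, predicate) for entity in sorted(remaining))
    return result


facts = [("John", "like school"), ("Mary", "like school")]
print(quantify(facts, {"students": {"John", "Mary"}}))
```

If the ontology lists a third student for whom the predicate is not known, no quantification takes place and the individual facts are kept.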
However, referring expressions and quantification can introduce ambiguity, when applied
incorrectly. For example, it might be unclear to which entity a referring expression refers [JM00],
or when multiple quantifiers are synthesized in the same sentence, the scope of the quantifiers
could be ambiguous [SHA02]. Thus, referential aggregation should be used cautiously.
5.3.8 Conclusions
Because the task of response formulation is a text-to-text generation process instead of a
concept-to-text generation process, not all levels of aggregation are relevant for the response
formulation task. Conceptual, discourse, and semantic aggregation presuppose an underlying
knowledge representation and rhetorical structure of the text that are not available to GIPS.
Transforming the answers or the entire original document into a knowledge representation
would greatly increase the complexity of the aggregation task. Therefore, conceptual,
discourse, and semantic aggregation will not be used.
Syntactic aggregation is especially useful for the response formulation task, because it
concerns using linguistic constructions to integrate sentences, while preserving their
grammatical and lexical correctness. However, a specific syntactic construction can only be
used to aggregate sentences having a certain rhetorical relation. Therefore, the rhetorical
relations between the answers should be determined. This is very difficult, because the answers
to be integrated are autonomous and usually not adjacent in the original text. They are all
supposed to answer the same question, however. Their rhetorical relation could
thus be assumed to be addition. The primary syntactic construction used to aggregate
sentences having an addition relation is the coordinating conjunction. This construction could
thus be used to aggregate different autonomous answers retrieved from the same document.
For lexical aggregation a lot of morphological and semantic knowledge about Dutch words is
needed, and it primarily concerns aggregation within sentences. This type of aggregation is
probably more useful for concept-to-text generation processes in which lexical choices still have
to be made. In the case of GIPS, answers already have their lexical representation. Therefore,
this type of aggregation will not be used for this research. However, some lexical knowledge
might be needed to correctly execute syntactic aggregation.
Finally, referential aggregation would be very useful to make a text more coherent, but it can
introduce ambiguity when applied incorrectly. In an application for general practitioners
ambiguity must be avoided as much as possible. Therefore, this type of aggregation will also be
omitted.
5.3.9 Answer integration algorithm
In the previous section it was concluded that answers should be integrated using a coordinating
conjunction construction. Shaw [SHA02] presents a conjunction algorithm for an NLG system
that incorporates different types of ellipsis, like gapping and right-node-raising. With some minor
adjustments, this algorithm could also be very useful for the answer integration task. Shaw’s
algorithm consists of the following steps:
1. group propositions and order them according to their similarities while satisfying pragmatic
and contextual constraints;
2. determine recurring elements in the ordered propositions being combined;
3. create a sentence boundary when the combined clause reaches a-priori thresholds;
4. decide which recurring elements are redundant and should be deleted.
The first step, the grouping and ordering of answers, could be done by grouping the answers
per source and ordering them according to the order of the original document to preserve the
discourse structures of the original document. Then, based on the number of answers per
source and their similarity, it will be determined which answers should be aggregated. To avoid
generating overly complex sentences, no more than two answers should be integrated into a single
sentence. When there are more than two answers, the similarity of all consecutive answers
could be computed to determine which pairs could best be integrated.
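The grouping and pairing step could be sketched as follows. This is only an illustration under stated assumptions: answers are given as hypothetical `(source, position, text)` tuples, similarity is approximated by Jaccard word overlap (the text above does not prescribe a similarity measure), and consecutive answers are paired greedily, most similar first.

```python
from itertools import groupby


def jaccard(a, b):
    """Word-overlap similarity; an assumed stand-in for the similarity measure."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if (wa | wb) else 0.0


def plan_pairs(answers):
    """Group answers per source, keep document order, pair at most two answers.

    `answers` is a list of (source, position, text) tuples.
    Returns {source: [tuple of one or two texts, ...]}.
    """
    plan = {}
    ordered = sorted(answers, key=lambda a: (a[0], a[1]))
    for source, group in groupby(ordered, key=lambda a: a[0]):
        texts = [text for _, _, text in group]
        # Score every pair of consecutive answers, most similar first.
        scored = sorted(
            ((jaccard(texts[i], texts[i + 1]), i) for i in range(len(texts) - 1)),
            key=lambda s: (-s[0], s[1]),
        )
        used, pair_starts = set(), set()
        for score, i in scored:
            if score > 0 and i not in used and i + 1 not in used:
                used.update((i, i + 1))
                pair_starts.add(i)
        groups, i = [], 0
        while i < len(texts):
            if i in pair_starts:
                groups.append((texts[i], texts[i + 1]))
                i += 2
            else:
                groups.append((texts[i],))
                i += 1
        plan[source] = groups
    return plan


answers = [
    ("doc1", 3, "Sue eats a banana in the morning."),
    ("doc1", 1, "Mary eats an apple in the morning."),
    ("doc1", 7, "RSI is common."),
    ("doc2", 2, "Rest helps."),
]
print(plan_pairs(answers))
```

Answers that share no words with their neighbours stay unpaired and would be presented as separate sentences.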
When two answers are to be integrated, the second step is to determine recurring elements.
Because recurring elements may be deleted in the fourth step, they need to be exactly
identical. To check whether two constituents are identical, Shaw proposes two equivalence
tests: the alphabet equivalence test; and the sense equivalence test. Alphabet equivalence
concerns the surface form of a constituent. This can be easily checked by comparing the
respective strings. Sense equivalence concerns “the identity of the indexicals”. Shaw says that
for nouns this means that their entity identifiers should be tested, and for verbs and adjectives
their lexical senses should be tested. However, this is only possible in concept-to-text
generation. In the case of the answers returned by the question answering modules, the entire
original text should be analyzed to determine the senses of the nouns, verbs and adjectives.
This would greatly complicate the algorithm. Instead, the dependency structures with which
the answers are annotated could be used as an extra equivalence test. Dependency relations
have a tree structure. They are described in detail in the next section. In short, identical
constituents should have the same relation, POS and root tags in the case of a single word, or
the same relation, cat tags and child nodes in the case of a phrase. Otherwise, they are not
identical.
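The dependency-based equivalence test could look roughly like the sketch below. The `Node` class and its field names are assumptions made for this illustration (they mirror the relation, POS, root, and cat tags mentioned above, not the parser's actual output format), and the tag values in the example are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical dependency node: leaves carry pos/root, phrases carry cat."""
    rel: str
    pos: Optional[str] = None
    root: Optional[str] = None
    cat: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def identical(a, b):
    """Single words: same relation, POS and root. Phrases: same relation,
    cat tag and (recursively) identical child nodes in the same order."""
    if a.rel != b.rel:
        return False
    if not a.children and not b.children:
        return a.pos == b.pos and a.root == b.root
    if a.cat != b.cat or len(a.children) != len(b.children):
        return False
    return all(identical(x, y) for x, y in zip(a.children, b.children))


def in_the(noun):
    """Build an invented tree for the phrase 'in the <noun>'."""
    return Node(rel="mod", cat="pp", children=[
        Node(rel="hd", pos="prep", root="in"),
        Node(rel="obj1", cat="np", children=[
            Node(rel="det", pos="det", root="the"),
            Node(rel="hd", pos="noun", root=noun),
        ]),
    ])


print(identical(in_the("morning"), in_the("morning")))
print(identical(in_the("morning"), in_the("evening")))
```

This test would be applied in addition to the string comparison of the alphabet equivalence test, not instead of it.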
The third step, creating a sentence boundary, can be omitted, because the number of answers
to be integrated was already restricted to two.
Finally, in the fourth step the sentences are joined using the coordinator “and”, and recurring
elements are deleted. When a recurring element, or a group of recurring elements, is realized at
the end of a sentence, it should be deleted backward, meaning that the first occurrence of the
identical constituent is deleted. Otherwise, it should be deleted forward, thus deleting the
second occurrence of the identical constituent. According to Shaw, this directionality is a
universal phenomenon, also valid for Dutch. For example, consider the following sentences:
a. Mary eats an apple in the morning.
b. Sue eats a banana in the morning.
c. Mary eats an apple, and Sue a banana in the morning.
In sentences a and b two recurring elements can be identified: “eats”, and “in the morning”. The
first element, “eats”, is realized in the middle of the sentence and should thus be deleted
forward. The other is realized at the end of the sentence and is thus deleted backward. The
deleted elements are “in the morning” in sentence a and “eats” in sentence b. Sentence c is the
aggregated sentence.
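The directionality of deletion in this example can be illustrated with a small sketch. It makes strong simplifying assumptions that are not part of Shaw's algorithm: both sentences are already split into constituents, the constituents are aligned by position, and recurring elements are detected by plain string equality. The function name `conjoin` is hypothetical.

```python
def conjoin(first, second, coordinator="and"):
    """Join two aligned constituent lists, deleting recurring elements.

    Recurring elements at the end are deleted backward (from the first
    sentence); all other recurring elements are deleted forward (from the
    second sentence).
    """
    if len(first) != len(second):
        raise ValueError("this sketch only handles position-aligned sentences")
    # Count recurring constituents at the end of both sentences.
    tail = 0
    while tail < len(first) and first[-1 - tail] == second[-1 - tail]:
        tail += 1
    if tail == len(first):  # the sentences are identical
        return " ".join(first) + "."
    kept_first = first[:len(first) - tail]  # backward deletion
    body = len(second) - tail
    kept_second = [
        c for i, c in enumerate(second)
        if i >= body or c != first[i]  # forward deletion before the tail
    ]
    return f"{' '.join(kept_first)}, {coordinator} {' '.join(kept_second)}."


a = ["Mary", "eats", "an apple", "in the morning"]
b = ["Sue", "eats", "a banana", "in the morning"]
print(conjoin(a, b))  # Mary eats an apple, and Sue a banana in the morning.
```

For Dutch answers the coordinator would be “en”, and the string comparison would be complemented by the dependency-based equivalence test.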
Not all sentences can be aggregated that easily, however. Shaw [SHA02] identifies several
additional constraints on the conjunction algorithm dealing with scope ambiguities and
morphology. Scope ambiguities can occur with modifiers, negation, and quantifiers. For
example, when the phrases “tall men” and “women” are aggregated as “tall men and women”, it
is not clear whether the women are also tall. To avoid these kinds of ambiguity, the elements
should be reordered to make the scope clear, as in “women and tall men”. Morphological
problems can occur when, for example, number agreement rules are violated, as in “Mary and
Sue eats an apple”.
Compared to the sentence fusion algorithm described by Marsi and Krahmer [MK05], this
conjunction algorithm is more rigid. In this algorithm only identical constituents are deleted,
whereas in the sentence fusion algorithm also constituents having a specification relation or
restatements can be fused. Consider the example sentences presented in section 5.1.3:
RSI can be caused by repeating the same sequence of movements many
times an hour or day.
and
RSI is generally caused by a mixture of poor ergonomics, stress and poor
posture.
The sentence fusion algorithm aligns the phrases “can be caused by” and “is generally caused
by” and labels them as restatements of each other. Then one of these phrases is arbitrarily
chosen to be preserved in the fused sentence. Instead, the conjunction algorithm would mark the
words “caused” and “by” as recurring elements, deleting them forward, which would result in the
ungrammatical sentence:
RSI can be caused by repeating the same sequence of movements many
times an hour or day, and is generally a mixture of poor ergonomics, stress
and poor posture.
Thus, the sentence fusion algorithm is more complex to implement, but it seems to be a more
natural solution for answer integration than this conjunction algorithm. Sentence fusion appears
to incorporate coordinating conjunction constructions, as well as lexical aggregation by aligning
not only identical but also similar phrases.
5.4 Answer extension
When an answer cannot be interpreted without knowing its context, it should be extended with
the sentences most related to this answer to formulate a coherent response. Therefore, the
rhetorical relations between the sentences in the context of the answer and the answer itself
should be determined. In a natural language generation system, the rhetorical relation between
two sentences should be specified by a component called a content planner [SHA02] or
discourse planner [JM00]. However, the response formulation task described in this chapter is a
text-to-text generation process. The rhetorical relations must thus be inferred from the text.
Bosma [BOS05] used RST annotations for his answer extension algorithm. However, there
aren’t any automated RST analysis tools available for Dutch yet. Therefore, for GIPS a simple
algorithm has been developed that detects sentences that are strongly related to a given
answer.
It is assumed that strongly related sentences have been aggregated on some level by the
author. Conceptual, semantic, syntactic, and lexical aggregation are primarily used to aggregate
clauses or phrases within sentences or to integrate multiple sentences or propositions into one.
Aggregation operators that are used to express a relation between two sentences (instead of
within a sentence) are discourse and referential aggregation. Discourse and referential
aggregation operators can be recognized by certain linguistic markers, like cue phrases
(described in section 5.4.1), anaphoric referring expressions (5.4.2), and document structure
(5.4.3). In section 5.4.4 the algorithm is described that is used to select the sentences that
should be included in the answer.
5.4.1 Cue phrases
In discourse aggregation, linguistic devices are used to signal rhetorical relations explicitly
[KS98]. These devices are called cue phrases, or discourse connectives. Power et al. [PSB03]
discern three different types of discourse connectives: subordinating conjunctions, coordinating
conjunctions, and conjunctive adverbs. Subordinating conjunctions (like “although”, “because”)
connect a nucleus and satellite that must be expressed within the same sentence. The
conjunction can be located either in the first or in the second clause. Coordinating conjunctions
(“and”, “or”, “but”) connect two nuclei either occurring in the same sentence or in different
sentences. The conjunction always occurs in the second span. In Dutch there are five
coordinating conjunctions [KS04]: “en” (“and”), “maar” (“but”), “want” (“for”), “dus” (“so”), and “of”
(“or”). Conjunctive adverbs (“however”, “moreover”) always connect text spans occurring in
different sentences. The adverb is located in the second sentence.
In this research only cue phrases that indicate a relation between two sentences are relevant,
because they are used to determine whether a rhetorical relation exists between these
sentences. Therefore, only coordinating conjunctions connecting text spans expressed in
different sentences, and conjunctive adverbs are relevant for this research. These cue phrases
always occur in the second of the two related sentences.
Figure 14 Knott and Sanders’ Dutch cue phrase taxonomy
Knott and Sanders [KS98] constructed a cue phrase taxonomy expressing relationships
between different cue phrases, see Figure 14. This taxonomy doesn’t describe the total set of
Dutch cue phrases, however, because it was primarily used for “a first theory-driven systematic
and cross-linguistic cue phrase study” in which the use of Dutch cue phrases was compared to
that of English cue phrases. Besides, it also includes subordinating conjunctions like “omdat”
(“because”), which are not relevant for this research.
To be able to recognize the relevant cue phrases, a list of cue phrases that signal a rhetorical
relation between two sentences has been constructed. For this purpose, a small Dutch text corpus
(about 10,000 words) containing paragraphs from the IMIX document collection and NHG
patient education documents [NHGp] has been analyzed. Firstly, all conjunctions and adverbs
that signal a rhetorical relation between the sentence they occur in and the previous sentence
were marked manually. Then, it was investigated how these phrases could be recognized as
connecting two sentences, because coordinating conjunctions could also connect text spans
within a sentence, and because in Dutch some conjunctions and conjunctive adverbs are
ambiguous. For example, the additive cue phrase “ook” identified by Knott and Sanders is a
conjunctive adverb (meaning “also”) in:
Jantje houdt van zwemmen. Ook Pietje zwemt graag. (John likes swimming.
Peter also likes to swim.)
but it is an elliptic device (meaning “too”) connecting two phrases within a sentence in:
Jantje houdt van zwemmen en Pietje ook. (John likes swimming and Peter
too.)
Therefore, all occurrences of the marked words in the text were examined manually to identify
under which circumstances they could be determined to be a cue phrase connecting two
sentences. For this analysis, dependency trees were used.
Figure 15 Example dependency tree
Dependency trees make explicit the dependency relations between constituents in a sentence
[BNM01]. Each non-terminal node in a dependency tree is connected with a head-daughter and
one or more non-head daughters, whose dependency relations to the head are specified in a
relation tag. For example, in Figure 15 the dependency tree of the sentence “Jantje houdt van
zwemmen en Pietje ook.” is shown. The “top” node of this tree is connected with three leaf
nodes (belonging to the words “en Pietje ook”) and a non-terminal node “main”. The
head-daughter of the “main” node is the verb “houdt”. Besides, the main node has two non-head
daughters: one is the subject of the head (“Jantje”), the other is a prepositional complement.
This prepositional complement in turn has a head-daughter (the preposition “van”) and a
non-head daughter (the verb “zwemmen”), which is the object of the head.
One of the question answering modules of the IMIX demonstrator (“qadr.qa”) provides
dependency structures of the question, the answer sentences, and the sentences in the context
of the answers. These dependency structures are generated by the Alpino parser [BNM01]. In
the next version of the IMIX demonstrator this parser will be available for all IMIX modules,
enabling the output generation module to procure dependency structures also for the answers
returned by the other question answering module (“rolaquad.qa”). Next to the relation to the
head, the Alpino parser provides a POS-tag, the root, and the original word for each leaf node in
the dependency tree.
With the analysis described above, three coordinating conjunctions were identified: “en”, “maar”,
and “of”. These conjunctions were found to signal a relation between two sentences only when
they are the first word of the second sentence. The other two Dutch coordinating conjunctions,
“dus” and “want”, didn’t occur as the first word of a sentence in this corpus. However, because
all coordinating conjunctions are used in the same way, these conjunctions would also be
proper cue phrases when they are the first word of a sentence.
The coordinating conjunctions “of” and “dus” could also be used in another role, however
[KS04]. “Of” could also be a subordinating conjunction (“whether”). In that case it would
definitely not be a cue phrase. When a sentence starts with a coordinating conjunction, this
conjunction is attached directly to the top node of the corresponding dependency tree. When a
sentence starts with a subordinating conjunction, however, this conjunction has a parent node
labeled with the POS-tag “cp”. Therefore, when a sentence starts with “of” it should firstly be
checked whether this phrase’s parent node is the top node, before marking it as a cue phrase.
The coordinating conjunction “dus” could also be an adverb, but in that case it could still be a
cue phrase (especially when it is the first word of a sentence). Therefore, no extra checking is
needed for this phrase. The Dutch coordinating conjunctions are presented in Table 11,
together with the conditions they should satisfy to be considered a cue phrase. Some of these
conjunctions (“want” and “maar”) are also Dutch nouns. However, in that case they will probably
not be the first word of a sentence. Therefore, the POS-tags of these words do not need to be
checked.
Coordinating conjunction   Constraints
en                         Only when it is the first word of the sentence.
maar                       Only when it is the first word of the sentence.
of                         Only when it is the first word of the sentence and its parent node is the top node.
dus                        Only when it is the first word of the sentence.
want                       Only when it is the first word of the sentence.
Table 11 Coordinating conjunctions for Dutch
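The constraints of Table 11 amount to a very small check, sketched below. The sketch assumes a tokenized sentence and that the caller has already looked up the label of the first word's parent node in the dependency tree; the parameter name `first_word_parent` and the label values are hypothetical.

```python
COORDINATORS = {"en", "maar", "of", "dus", "want"}


def is_coordinating_cue(tokens, first_word_parent="top"):
    """True when the sentence opens with a coordinating conjunction that
    satisfies the constraints of Table 11.

    `first_word_parent` is an assumed label for the parent node of the first
    word; it only matters for "of", which may also be a subordinating
    conjunction ("whether") with a parent labeled "cp".
    """
    if not tokens:
        return False
    first = tokens[0].lower()
    if first not in COORDINATORS:
        return False
    if first == "of":
        return first_word_parent == "top"
    return True


print(is_coordinating_cue("En toch ontstaan vaak klachten".split()))
print(is_coordinating_cue("Jantje houdt van zwemmen en Pietje ook".split()))
```

Because only the first token is inspected, conjunctions that connect text spans within a sentence are never reported.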
Next to these coordinating conjunctions, several different conjunctive adverbs were identified.
Although according to Power et al. [PSB03] conjunctive adverbs always connect text spans
occurring in different sentences, in the analyzed corpus the adverbs are also sometimes used to
connect clauses within a sentence. Besides, they are also used as other types of adverbs, or
even other parts of speech.
Based on the analysis of the conjunctive adverbs, some regularities were discovered.
Firstly, the POS-tag of the adverb should of course be “adv”. Unfortunately, some of the
conjunctive adverbs are labeled as adjectives or prepositional phrases by the Alpino parser.
Those should still be considered cue phrases, however.
Secondly, it was found that a conjunctive adverb probably doesn’t connect the sentence with
the previous sentence in the following cases:
- when it is embraced by brackets;
- when the relation of one of its parent nodes in the dependency tree is modifier (labeled by
  the Alpino parser with the relation tag “mod”);
- when the sentence starts with a subordinated clause (when the first word of the sentence
  has a parent node labeled with the relation tag “mod” and the POS-tag “cp”);
- when it is positioned in a clause following a semi-colon or colon;
- when it is positioned after a conjunction or pronoun (labeled by the Alpino parser with the
  POS-tags “vg” or “pron”) whose direct parent node is also a parent node of the adverb,
  unless this conjunction or pronoun is the first word of the sentence;
- when it is positioned in the second part of a dependency tree whose top node has only two
  daughters, both labeled with the relation tag “dp”.
The first rule concerns the use of brackets. When a cue phrase is positioned in a bracketed
phrase, like “bijvoorbeeld” (“like”) in:
Over de bijdrage van persoonsgebonden risicofactoren (bijvoorbeeld
lichaamsbouw, het omgaan met stress) aan de kans op het krijgen van RSI
is nog vrijwel niets bekend. (Little is known of the influence of personal risk
factors (like body structure, or dealing with stress) on the chance of getting
RSI.)
it probably indicates the relation of the bracketed part to the rest of the sentence. Therefore
bracketed parts should be discarded.
The second rule concerns modifiers. All cue phrases are normally modifiers themselves.
However, when a cue phrase is located inside a larger modifier, it probably does not relate to
the previous sentence but to the head of the modifier it is part of. For example, in:
Met een normale bloedsuiker wordt de kans op bijvoorbeeld hart- en
vaatziekten kleiner. (A normal blood sugar reduces the chance of, for
example, heart and vascular diseases.)
the phrase “op bijvoorbeeld hart- en vaatziekten” (“of for example heart and vascular diseases”)
is a modifier of the noun “kans” (“chance”). In this case the cue phrase “bijvoorbeeld” (“for
example”) clearly does not signal a relation with the previous sentence. Therefore, cue phrases
that have a parent node labeled “mod” are discarded.
The third rule concerns subordinated clauses. When a sentence starts with a subordinated
clause, like:
Hoewel daarbij vrijwel continu bewogen en dus afgewisseld wordt, leidt
typen er toch toe dat steeds dezelfde spiergroepen gespannen zijn.
(Although this involves almost continuous movement and thus variation,
typing still causes the same muscle groups to be constantly tense.)
a cue phrase positioned after this subordinated clause, like “toch” (“still”), probably relates the
second clause with the subordinated clause. A cue phrase that is positioned within a
subordinated clause could signal a relation with the previous sentence as well as with the next
clause. Therefore, it was decided that sentences starting with a subordinated clause should
not be searched for cue phrases.
The fourth rule concerns colons and semicolons. When a cue phrase is positioned after a colon
or semicolon, it probably relates to the part preceding the colon or semicolon. Thus when a
sentence contains one of these punctuation marks, only the part preceding it will be searched
for cue phrases.
The fifth rule concerns conjunctions and pronouns. When a cue phrase is positioned after a
conjunction or pronoun, it probably doesn’t refer to the previous sentence, but to something
within the sentence preceding the conjunction or pronoun. However, this is only the case when
the cue phrase is contained within the scope of the conjunction or pronoun. This means that the
direct parent node of the conjunction or pronoun in the dependency tree is also a parent of the
cue phrase. For example, in:
Hoewel daarbij vrijwel continu bewogen en dus afgewisseld wordt, leidt
typen er toch toe dat steeds dezelfde spiergroepen gespannen zijn.
(Although this involves almost continuous movement and thus variation,
typing still causes the same muscle groups to be constantly tense.)
the cue phrase “dus” (“thus”) occurs after and within the scope of the conjunction “en” (“and”). It
doesn’t signal a relation with the previous sentence. On the contrary, in:
Het risico op hart- en vaatziekten wordt echter niet alleen door de bloeddruk
bepaald. (The risk of heart and vascular diseases is not only determined by
blood pressure, however.)
the cue phrase “echter” (“however”) occurs after but outside the scope of the conjunction “en”
(“and”). This cue phrase does signal a relation with the previous sentence. An exception to this
rule occurs when the conjunction or pronoun is the first word of the sentence. In that case cue
phrases occurring within the scope of the conjunction or pronoun probably still refer to the
previous sentence, like “toch” (“yet”) in:
En toch ontstaan vaak klachten, zelfs veel meer dan met die zware
typmachines van vroeger. (And yet a lot of complaints arise, even more than
with those ancient typing machines.)
Finally, the sixth rule concerns sentences consisting of two parts that might as well have been
two different sentences. Cue phrases located in the second part of such sentences probably
signal a relation with the first part instead of with the previous sentence. Therefore those second
parts should also be discarded.
The above-mentioned rules are not universal. For example, a cue phrase positioned after a
colon might still connect this sentence with the previous sentence. However, it is far more likely
that it relates the second part of the sentence to its first part.
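The six rules above can be combined into one filter, sketched here under several assumptions. The bracket and (semi)colon rules are applied to the surface string; the tree-based rules are represented by fields of a hypothetical `Candidate` record that some earlier tree traversal is assumed to have filled in. The relation labels in the example are invented for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    """Hypothetical record for one conjunctive adverb found in a sentence."""
    word: str
    ancestor_rels: List[str] = field(default_factory=list)  # rule 2
    sentence_starts_with_subclause: bool = False            # rule 3
    in_scope_of_earlier_conj_or_pron: bool = False          # rule 5
    in_second_dp_part: bool = False                         # rule 6


def searchable_text(sentence):
    """Rules 1 and 4: drop bracketed parts and everything after the first
    colon or semicolon."""
    without_brackets = re.sub(r"\([^()]*\)", " ", sentence)
    return re.split(r"[:;]", without_brackets, maxsplit=1)[0]


def links_previous_sentence(cand, sentence):
    """True when none of the six rules rules the candidate out."""
    words = re.findall(r"\w+", searchable_text(sentence).lower())
    if cand.word.lower() not in words:
        return False
    if "mod" in cand.ancestor_rels:
        return False
    return not (cand.sentence_starts_with_subclause
                or cand.in_scope_of_earlier_conj_or_pron
                or cand.in_second_dp_part)


s = ("Het risico op hart- en vaatziekten wordt echter niet alleen "
     "door de bloeddruk bepaald.")
print(links_previous_sentence(Candidate("echter", ["top"]), s))
```

The sketch looks the adverb up by word form, so a sentence containing the same adverb both inside and outside brackets would need position information to be handled correctly.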
For some of the conjunctive adverbs additional constraints could be specified. In Table 12 all
conjunctive adverbs are listed, along with their constraints. Of course, the corpus used to
extract these cue phrases is relatively small. Probably, in a larger corpus more cue phrases and
more precise constraints could be detected, but this list provides a good starting point for the
algorithm used in this research.
Specific constraints were identified for “bijvoorbeeld” (“like”, “for example”) and “wel” (“still”,
“however”). “Bijvoorbeeld” is frequently used in appositive constructions, as in:
Deskundigen (bijvoorbeeld artsen) stimuleren het gebruik van
pauzesoftware. (Experts (like physicians) stimulate the use of break
programs.)
or:
Deskundigen, bijvoorbeeld artsen, stimuleren het gebruik van
pauzesoftware. (Experts, like physicians, stimulate the use of break
programs.)
In that case, it does not connect two sentences, but is just a modifier of the phrase it is in
apposition with. Frequently, this use of “bijvoorbeeld” can be recognized because one of the
general constraints (like the use of brackets, or a parent with relation “modifier”) is met.
However, this is not always the case. A relatively easy way to recognize those occurrences not
embraced by brackets or contained in a modifier, is by looking at commas: when “bijvoorbeeld”
is directly preceded by a comma, it is probably part of an appositive. Only when it is also directly
followed by a comma, like in:
Deskundigen, bijvoorbeeld, stimuleren het gebruik van pauzesoftware.
(Experts, for example, stimulate the use of break programs.)
it is clearly not part of an appositive and, if no other constraints are violated, it might well be a
conjunctive adverb connecting two sentences.
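The comma heuristic for “bijvoorbeeld” can be sketched on the surface string as follows. The function name is hypothetical, and the sketch covers only this specific constraint; the general constraints (brackets, modifiers, and so on) are assumed to be checked separately.

```python
import re


def bijvoorbeeld_may_be_cue(sentence):
    """False when every occurrence of "bijvoorbeeld" is directly preceded by
    a comma without also being directly followed by one (probable appositive)."""
    for match in re.finditer(r"\bbijvoorbeeld\b", sentence, flags=re.IGNORECASE):
        preceded = sentence[:match.start()].rstrip().endswith(",")
        followed = sentence[match.end():].lstrip().startswith(",")
        if not preceded or followed:
            return True
    return False


print(bijvoorbeeld_may_be_cue(
    "Deskundigen, bijvoorbeeld artsen, stimuleren het gebruik van pauzesoftware."))
print(bijvoorbeeld_may_be_cue(
    "Deskundigen, bijvoorbeeld, stimuleren het gebruik van pauzesoftware."))
```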
Conjunctive adverb Constraints
bijvoorbeeld
bovendien
daarnaast
daarom
dan ook
dus
echter
evenzeer
immers
namelijk
ook
tenslotte
tevens
toch
verder
vervolgens
wel
Not when it occurs directly after a comma, unless it is also directly
followed by a comma.
This phrase is labeled with the POS-tag “pp” by the Alpino parser.
This phrase is labeled with the POS-tag “pp” by the Alpino parser.
This cue phrase poses the same conditions as its constituent adverbs
“dan” and “ook”. It will therefore not be discerned as a separate cue
phrase, though its use is different from that of “dan” and “ook”.
This phrase is labeled with the POS-tag “adj” by the Alpino parser.
Only when it is the first word of the sentence.
Table 12 Conjunctive adverbs for Dutch
The adverb “wel” has a lot of different senses (“well”, “indeed”, “rather”), only some of which are
conjunctive adverbs. The only occurrences of “wel” in the text corpus that were considered a
conjunctive adverb connecting two sentences, were positioned at the start of a sentence. And
when “wel” was the first word of the sentence, it always connected two sentences. Therefore, it
was decided that “wel” should only be considered a cue phrase when it is the first word of the
sentence.
5.4.2 Anaphoric referring expressions
There are a lot of different types of referring expressions. Anaphoric referring expressions refer
to an entity previously mentioned to the reader or hearer [HAA02b]. The most common are
pronouns and demonstratives [KS04]. They usually refer to an entity mentioned at most two
sentences ago [JM00]. Pronouns could for example be “he”, “she”, “it”, “they” or possessive
pronouns like “his”, “her”, “its”, “their”. Demonstratives, or demonstrative pronouns, are “this”,
“that”, “these”, and “those”. In Dutch, there are also four demonstratives: “dit”, “dat”, “deze”, and
“die”. Besides, in Dutch there is a possessive demonstrative “diens” [KS04]. These
demonstratives can be used as a noun phrase or adjectively. When they are used in the place
of a noun phrase combined with a preposition, their form is modified [HOU00]. For example,
“met dit” (“with this”) becomes “hiermee”, and “dit … mee” like in:
# Dit kun je mee zwemmen. (You can swim with this.)
becomes “hier … mee”:
Hier kun je mee zwemmen. (You can swim with this.)
(The # in front of the first sentence indicates that this sentence is not grammatical.) In this way,
under influence of a preposition, the demonstratives “dit” and “deze” are transformed into “hier”,
and “dat” and “die” are transformed into “daar”. When there aren’t any words between “hier” or
“daar” and the preposition, they are integrated into one word (“hierdoor”, “hierop”, “daarvan”,
etc.). In the same way, the pronoun “het” (“it”) is transformed into “er” under influence of a
preposition.
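The transformation described above can be summarized in a short sketch. The pronoun mapping follows the text; the table of changed preposition forms is an addition based on general Dutch grammar (“met” becomes “mee”, as in the example above, and “tot” becomes “toe”) and is not taken from the analyzed corpus. The function name is hypothetical.

```python
# Pronoun or demonstrative -> form used in combination with a preposition.
R_FORMS = {"dit": "hier", "deze": "hier", "dat": "daar", "die": "daar", "het": "er"}

# Prepositions whose form changes in this construction (assumption: only
# these two are listed; all other prepositions are left unchanged).
PREP_FORMS = {"met": "mee", "tot": "toe"}


def pronominal_adverb(preposition, pronoun):
    """Build the one-word form, e.g. ("met", "dit") -> "hiermee"."""
    prep = preposition.lower()
    return R_FORMS[pronoun.lower()] + PREP_FORMS.get(prep, prep)


for prep, pron in [("met", "dit"), ("door", "deze"), ("van", "dat"), ("op", "het")]:
    print(prep, pron, "->", pronominal_adverb(prep, pron))
```

Read in the opposite direction, the same tables could be used to recognize words such as “hierdoor” or “daarvan” as anaphoric referring expressions.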
Next to demonstrative pronouns, there are also demonstrative adverbs [KS04]. These refer to a
place, time, or manner. In Dutch, there are two demonstrative adverbs referring to place: “hier”
(“here”) and “daar” (“there”); two demonstrative adverbs referring to time: “toen” and “dan”
(“then”); and one demonstrative adverb referring to manner: “zo” (“in this way”). These adverbs
were not categorized as conjunctive adverbs in the previous subsection, because they don’t
signal a rhetorical relation between two sentences, but refer to something mentioned previously.
Another type of anaphoric referring expression is a definite noun phrase [JM00]. A definite noun
phrase consists of a definite determiner (“the”) and a noun phrase mentioned previously or
paraphrasing something mentioned previously. However, definite noun phrases are also used
non-anaphorically to refer to an entity that is contained in the hearer’s set of beliefs about the
world, or an entity of which the uniqueness is implied by the description itself. A definite noun
phrase thus does not always refer to an entity introduced in the previous sentence. It is therefore not
very suitable as a linguistic marker of a coherence relation between two sentences.
To construct a list of anaphoric referring expressions referring to an entity, place, time, or
manner mentioned in the previous sentence, the Dutch corpus that was also used for extracting
cue phrases has again been analyzed. Firstly, all pronouns and demonstratives that refer to
something introduced or referred to in the previous sentence were marked manually. Besides,
also adjectives referring explicitly to something mentioned in the previous sentence, like
“andere” (“other”) were marked. Then, it was investigated how these expressions could be
recognized as referring to the previous sentence, because anaphoric expressions could also
refer to an entity in the same sentence, or for example to the entire text (like in “This document
is about RSI.”). Besides, in Dutch some pronouns and demonstratives are ambiguous. For
example, the Dutch word “het” can be a pronoun (“it”), but it could also be a determiner (“the”),
in which case it is not a referring expression. Therefore, in the same way as with cue phrases,
all occurrences of the marked words in the text were examined manually to identify under which
circumstances they could be determined to be anaphoric expressions referring to an entity in
the previous sentence.
Based on the analysis, some general regularities were also discovered for anaphoric
expressions, analogous to but slightly different from those for cue phrases. It was found that a
referring expression probably doesn’t refer to an entity in the previous sentence in the following
cases:
- when it is embraced by brackets;
- when it is positioned in a clause following a subordinated clause (labeled with the relation
  tag “mod” and the POS-tag “cp”);
- when it is positioned in a clause following a semi-colon;
- when the sentence contains a colon;
- when it is positioned after a conjunction or pronoun (labeled with the POS-tags “vg” or
  “pron”) whose direct parent node is also a parent node of the referring expression, unless
  this conjunction or pronoun is the first word of the sentence;
- when it is positioned in the second part of a dependency tree whose top node has only two
  daughters, both labeled with the relation tag “dp”.
These regularities are also not universal. For example, an anaphoric expression positioned in the
second part of a sentence might still refer to something in the previous sentence. However, it is
far more likely that it refers to something mentioned in the first part of the sentence. Likewise, an
anaphoric expression that is the first word of the sentence does not necessarily refer to something in
the previous sentence. It might also refer to another sentence or another text level.
There are three differences between the regularities for anaphoric expressions, and the ones for
cue phrases. Firstly, whereas a cue phrase probably does not refer to the previous sentence when
the relation of one of its parent nodes in the dependency tree is modifier, this restriction doesn’t
hold for anaphoric expressions. Modifiers have some relation to the phrase they modify: the
head. Therefore, when a cue phrase is part of a modifier, it probably expresses the rhetorical
relation this modifier has to the head. However, when an anaphoric expression is located within
a modifier, it could still refer to something in the previous sentence, especially when the
sentence starts with this modifier, like in:
In dat stadium is het noodzakelijk ook arbeidsgebonden psychosociale en
persoonsgebonden aspecten in beschouwing te nemen. (In that stage it is
necessary to also take into account work-related psychosocial and personal aspects.)
In this sentence the phrase “in dat stadium” (“in that stage”) is a modifier, but the anaphoric
expression “dat” (“that”) definitely refers to something mentioned in a previous sentence.
The second difference has to do with subordinated clauses. When a sentence starts with a
subordinated clause, like:
Hoewel daarbij vrijwel continu bewogen en dus afgewisseld wordt, leidt
typen er toch toe dat steeds dezelfde spiergroepen gespannen zijn.
(Although this involves almost continuous movement and thus variation,
typing still causes the same muscle groups to be constantly tense.)
cue phrases or anaphoric expressions located after the subordinated clause, like “toch” (“still”),
usually refer to (something within) the preceding subordinated clause. Cue phrases located
within this clause itself might refer to the previous sentence, but they might also express the
relation with the rest of the sentence. Therefore, cue phrases located within a sentence starting
with a subordinated clause are never recognized as connecting two sentences. However,
referring expressions within a subordinated clause, like “daarbij” (“with this”) in the example
sentence, were more often used anaphorically (referring backwards) than cataphorically
(referring forwards) in the examined text corpus. Therefore, anaphoric expressions are only
discarded when they are located after a subordinated clause.
Thirdly, when a sentence contains a colon, cue phrases are only discarded when they are
located after the colon, because in that case they probably refer to the part preceding the colon.
Referring expressions preceding a colon, however, are frequently used cataphorically rather than
anaphorically, like “andere” (“other”) in:
Pauzesoftware mag niet de aandacht afleiden van risicofactoren van andere
aard: werkplek en werkorganisatie. (Break programs should not distract from
other types of risk factors: workspace and work organization.)
Therefore, when a sentence contains a colon, all referring expressions are discarded.
Referring expression   POS-tag
ander                  adj
andere                 adj
daar                   noun / adv
daarbij                pp
daarmee                pp
daarover               pp
daarvan                pp
dan                    adv
dat                    det
datzelfde              det
dergelijke             adj
deze                   det
die                    det
dit                    det
er                     noun
even                   adj
genoemde               adj
hetzelfde              det
hier                   noun / adv
hierbij                pp
hierdoor               pp
hiermee                pp
hieruit                pp
hiervan                pp
laatstgenoemde         adj
zo                     adv
zo'n                   det
Constraints
Not in “onder andere” or “geen andere”, or in “ene-andere” or
“sommige-andere” constructions. Not after a “van-naar”
construction.
Not in “if-then” constructions, recognizable by the word “als”
(“if”) occurring previously in the sentence or by a parent node
with the relation tag “nucl” and the POS-tag “smain”.
When it is the first word of the sentence.
Not in indications of the current period like “deze week”, “deze
maand”, “deze eeuw”.
Not in indications of the current period like “dit weekend”, “dit
jaar”.
Only when it is followed by a particle (labeled “part”) or
preposition (labeled “prep”) that has the same direct parent.
When it is labeled with the relation tag “mod”, and directly
followed by an adjective labeled “hd”. Not in “as-as”
constructions, recognizable by the word “als” (“as”) occurring
later in the sentence.
When it is the first word of the sentence.
Only when it is the first word of the sentence. Not in “as-as” or
“as-as possible” constructions, recognizable by the words “als”
(“as”) or “mogelijk” (“as possible”) occurring later in the
sentence.
Table 13 Anaphoric referring expressions for Dutch
For some of the anaphoric expressions additional constraints were specified, because these
expressions could also be used non-anaphorically. For example, the demonstrative adverb “zo”
(“in this way”) is also used in comparisons in the sense of “as”, as in:
Eet daarom zo min mogelijk. (Therefore, eat as little as possible.)
In Table 13 all referring expressions are listed, along with their constraints and the POS-tags
they should have to be considered an anaphoric expression.
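To illustrate how such constraints could be applied, the sketch below checks three of them in Java: the current-period constraints for “deze” and “dit”, and the constraint for “zo”. It is a hedged reading of Table 13 rather than the prototype's code. The class name is hypothetical, the period nouns are only the examples given in the table, the assignment of the first-word constraint to “zo” is an assumption, and the POS-tag check is omitted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical checker for three of the constraints listed in Table 13. */
final class ExpressionConstraints {

    // Period nouns copied from the examples in Table 13; illustrative, not complete.
    private static final Set<String> DEZE_PERIODS =
            new HashSet<>(Arrays.asList("week", "maand", "eeuw"));
    private static final Set<String> DIT_PERIODS =
            new HashSet<>(Arrays.asList("weekend", "jaar"));

    /** Returns true when the token at index passes its constraint and may count as anaphoric. */
    static boolean satisfies(List<String> tokens, int index) {
        String word = tokens.get(index).toLowerCase();
        String next = index + 1 < tokens.size() ? tokens.get(index + 1).toLowerCase() : "";
        if (word.equals("deze")) {
            return !DEZE_PERIODS.contains(next); // "deze week" names the current period
        }
        if (word.equals("dit")) {
            return !DIT_PERIODS.contains(next);  // "dit jaar" names the current period
        }
        if (word.equals("zo")) {
            if (index != 0) {
                return false;                    // assumed: only sentence-initial "zo" counts
            }
            for (int i = index + 1; i < tokens.size(); i++) {
                String later = tokens.get(i).toLowerCase();
                if (later.equals("als") || later.equals("mogelijk")) {
                    return false;                // "as ... as (possible)" comparison
                }
            }
            return true;
        }
        return true; // other expressions: no constraint modelled in this sketch
    }
}
```

With the example sentence above, tokenized as “Eet daarom zo min mogelijk .”, the check for “zo” fails, so the word is not treated as an anaphoric expression.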
One of the anaphoric expressions found in the corpus is not mentioned in this list: “het”. As
explained above, “het” could be a determiner (“the”) or a pronoun (“it”). Only when it is a
pronoun can it be a referring expression. Even in that case, however, it is still very difficult to
determine whether it really is a referring expression. For example, “het” is an anaphoric
expression, probably referring to some disease mentioned in the previous sentence, in:
In sommige families komt het meer voor dan in andere. (In some families it
is more prevalent than in others.)
but it is not a referring expression in:
In sommige gebieden regent het vaker dan in andere. (In some areas it is
raining more often than in others.)
The two sentences have very similar dependency trees. More complex linguistic knowledge would
thus be needed to detect the difference between the two occurrences of “het”. Paice and Husk
[PH87] investigated the different uses of “it” in English text. They identified seven different types
of what they call structural “it” (as opposed to referential “it”). These seven types could also be
identified for the Dutch word “het” (in addition to the use of “het” as a determiner). For each type
different rules would be needed to recognize it. Most of them could be reliably detected using
limited word lists and dependency trees, but for some of them, like expressions of time and
ambience like “it is twelve o’clock” or “it is raining”, more complex lexical or semantic knowledge
would be needed. Because of the complexity of differentiating between structural and referential
uses of “het”, it was considered better to leave “het” out of the list of anaphoric expressions for
this research.
Pronouns other than “het” (“it”) were not found, except for one occurrence of “ze” (“they”), but
this was also too difficult to distinguish from other occurrences not referring to an entity in the
previous sentence. This corpus probably doesn’t contain many pronouns like “hij” (“he”) or “zij”
(“she”) referring to an entity in the previous sentence, because the texts all deal with medical
information. Other text categories, such as newspaper articles, are expected to contain
many more pronouns, because they more often deal with people instead of diseases.
All demonstrative pronouns were found in the corpus. In addition, many occurrences of “hier” and
“daar” in combination with a preposition were retrieved. They all satisfied the same constraints.
Therefore, the list could be extended with all possible combinations of “hier” and “daar” with
prepositions. In Table 14 all combinations found in the Dutch dictionary [STE94] are presented.
These referring expressions could have the POS-tags “pp” or “adv”.
Again, the corpus used to extract all these referring expressions is relatively small. Probably, in
a larger corpus more expressions and more precise constraints could be detected, but the lists
presented in Table 13 and Table 14 provide a good starting point for the algorithm used in this
research.
Hier             Daar
hieraan          daaraan
hierachter       daarachter
hierbij          daarbeneden
hierbinnen       daarbij
hierboven        daarbinnen
hierbuiten       daarboven
hierdoor         daarbuiten
hierheen         daardoor
hierin           daardoorheen
hierlangs        daarheen
hiermede         daarin
hiermee          daarlangs
hierna           daarmede
hiernaast        daarmee
hierom           daarna
hieromheen       daarnaar
hieromtrent      daarnaast
hieronder        daarom
hierop           daaromheen
hierover         daaromtrent
hiertegen        daaronder
hiertegenover    daarop
hiertoe          daaropvolgend
hiertussen       daarover
hieruit          daaroverheen
hiervan          daartegen
hiervandaan      daartegenover
hiervoor         daartoe
                 daartussen
                 daaruit
                 daarvan
                 daarvandaan
                 daarvoor
Table 14 Prepositional anaphoric expressions for Dutch
5.4.3 Document structure
Next to cue phrases and anaphoric expressions, an answer may contain other signs of
relatedness to another sentence or graphical element, like punctuation marks and captions.
These signs deal with document structure. According to Power et al. [PSB03] document
structure describes the organization of a document into graphical constituents like sections,
paragraphs, sentences, bulleted lists, tables, and figures. Besides, document structure covers
some features within sentences like quotation. Some of these graphical constituents are made
explicit in mark-up languages, such as HTML.
Chapters, sections, paragraphs, sentences, clauses, and phrases are all levels of document
structure [PSB03] (in descending order of abstractness). A response should contain at least one
sentence, and at most an entire paragraph (for the sake of conciseness). A sentence starts
with a capital letter and ends in a full stop (a dot, question mark, or exclamation mark). When
the answer returned by the question answering module is a sentence, it could suffice as a
response. However, if it ends with a question mark, it is probably a question. In that case, the
next sentence is expected to be the answer to this question, and should thus also be included
in the response. If the question is a link in an HTML-document (enclosed by the tags
<a> and </a>), however, it would be better to include the linked document in the response.
When the answer returned by the question answering module is not a sentence, it could for
example be a clause, a caption, or a heading. In the case of a clause, ending for example in a
semicolon or colon, the next clause should also be included to make the sentence complete.
When the answer ends with a colon, it might also be followed by an image, table, or list instead
of a second clause. In that case, this image, table, or list should be included in the response. If
the answer doesn’t finish with a punctuation mark at all, the sentence is probably not finished.
This may be because it crosses a page break in the original document. In that case, the rest of
the sentence should be retrieved from the original document. When the answer is a caption
belonging to a figure or table, the corresponding figure or table should also be retrieved from the
original document and included in the response. Captions can be easily recognized, because
they start with “Table” or “Figure”. When the answer is a heading (recognizable by a heading
number, or the heading tags in an HTML-document) at least the first sentence of the headed
section should also be included in the response.
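The punctuation-based cases above amount to a small decision procedure on the final character of the answer. The sketch below shows one possible formulation in Java. The class, the enum, and its constant names are hypothetical, and headings and HTML links are not covered because they need mark-up or numbering information that is not available in a plain string.

```java
/** Hypothetical classifier for the punctuation-based document-structure cases. */
final class StructureCheck {

    enum Action {
        NONE,                 // complete sentence: can serve as a response on its own
        ADD_NEXT_SENTENCE,    // question: the following sentence is expected to answer it
        ADD_NEXT_CONSTITUENT, // ends in ':' or ';': add the next clause, image, table, or list
        COMPLETE_SENTENCE,    // no final punctuation: retrieve the rest from the source
        ADD_FIGURE_OR_TABLE   // caption: retrieve the figure or table it belongs to
    }

    static Action classify(String answer) {
        String text = answer.trim();
        // Captions are checked first, because they usually lack final punctuation.
        if (text.startsWith("Table") || text.startsWith("Figure")) {
            return Action.ADD_FIGURE_OR_TABLE;
        }
        char last = text.isEmpty() ? ' ' : text.charAt(text.length() - 1);
        switch (last) {
            case '?':
                return Action.ADD_NEXT_SENTENCE;
            case ':':
            case ';':
                return Action.ADD_NEXT_CONSTITUENT;
            case '.':
            case '!':
                return Action.NONE;
            default:
                return Action.COMPLETE_SENTENCE;
        }
    }
}
```

Returning an action instead of performing it keeps the check independent of how the surrounding sentences, figures, or tables are retrieved from the original document.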
A paragraph begins on a new line. Because the response should contain at most an entire
paragraph, it should not cross a paragraph boundary in the original document. In HTML-documents
paragraphs are separated by the tags <br> or <p>. In other documents, paragraph
boundaries may be harder to detect. For example, in PDF-documents, every line ends with a
line break to preserve the lay-out. New lines thus do not always indicate a paragraph boundary.
Lists and quotations are examples of indented structures [PSB03]. Indented structures are of a
certain level of document structure and are contained by an element possibly being of another
level. For example, the elements of a list may be paragraphs, while the list itself is contained by
a sentence. When the answer returned by the question answering module is part of or contains
part of an indented structure, the entire structure should be included in the response. The
sentence containing or preceding the indented structure should also be included in the
response, because it indicates the context of the structure. Vertical lists can be recognized by
bullets (in the case of a bulleted list) or numbers (in the case of an enumerated list). In an
HTML-document lists are enclosed by the tags <ul>, <ol>, or <dl>, and each element is
preceded by the tag <li> (or, within <dl>, by <dt> or <dd>). Quotations or other types of
comments can be recognized by the enclosing single or double quotation marks, or brackets.
Horizontal lists or enumerations are not indented structures, because they are described in plain
text. In this case the elements of the list are simply phrases within a clause, sentences within a
paragraph, or paragraphs within a section, etc. When the list is not contained within one
sentence, it might be recognized by cue phrases like “firstly … secondly … finally”. However,
horizontal lists may also be indicated with more subtle cue phrases or be implicit in the text.
They will therefore be ignored in this research.
Finally, answers can be captions belonging to a table or figure, but they can also refer to a table
or figure. Normally this is done by explicitly mentioning the word “table” or “figure” followed by a
number. In that case the referred table or figure should also be included in the response.
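Explicit references of this kind can be found with a regular expression. The following Java sketch, with a hypothetical class name, returns every mention of “table” or “figure” that is followed by a number. It matches the English words used in the description above; for a Dutch document collection the word list would presumably have to be adapted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that finds explicit references to tables and figures. */
final class FloatReferences {

    // "table" or "figure" in any case, followed by whitespace and a number.
    private static final Pattern REFERENCE =
            Pattern.compile("\\b(table|figure)\\s+(\\d+)", Pattern.CASE_INSENSITIVE);

    /** Returns the references found in the sentence, normalized to e.g. "table 13". */
    static List<String> find(String sentence) {
        List<String> references = new ArrayList<>();
        Matcher matcher = REFERENCE.matcher(sentence);
        while (matcher.find()) {
            references.add(matcher.group(1).toLowerCase() + " " + matcher.group(2));
        }
        return references;
    }
}
```

Each returned reference identifies a table or figure that would have to be retrieved from the original document and added to the response.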
5.4.4 Answer extension algorithm
In the previous sections three types of linguistic markers that signal a strong relation between
two sentences have been identified: cue phrases, anaphoric expressions, and document
structure. In this section an algorithm is described that can be used to extend an answer
sentence based on these linguistic markers. The input for the algorithm is a QA document
generated by the question answering module “qadr.qa”. This document contains up to five
different answers that have already been annotated with their dependency structures. First of all
these answers are grouped per source. Then each group of answers originating from the same
source is extended.
To generate a coherent response by extending a group of answers, the answer sentences are
first put in the order in which they occur in the original document. Then the first answer is
extended. When this answer contains any cue phrases or anaphoric expressions that satisfy the
relevant constraints, the previous sentence is included in the response. When this previous
sentence also contains a cue phrase or anaphoric expression, its predecessor is also included.
This procedure is repeated until the most recently added sentence doesn’t contain any cue phrases or
anaphoric expressions, or until a paragraph boundary is reached. Then, the first sentence
following the answer is considered. If it contains any cue phrases or anaphoric
expressions, it can also be included in the response. However, to prevent the response from
getting too long, this procedure is only repeated if the answer hasn’t already been extended with
three or more sentences. Thus, when a sentence included in the response contains any cue
phrases or anaphoric expressions, its previous sentence is always included, because otherwise
the response could not be fully interpreted. But sentences following the response that contain a
cue phrase and/or anaphoric expression are only included if the response would not grow too
large.
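The backward and forward extension of a single answer can be sketched as follows. This is one possible reading of the procedure in Java, not the code of the prototype. The paragraph is assumed to be available as a list of sentences, the predicate hasMarker stands in for the detection of cue phrases and anaphoric expressions, and the class name and the constant MAX_ADDED are hypothetical. The limit of three sentences is checked before every forward addition, which is an interpretation of the description above.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of extending one answer within its paragraph. */
final class AnswerExtension {

    static final int MAX_ADDED = 3; // no forward extension once three sentences were added

    /** Returns {first, last}: the inclusive sentence indices of the extended answer. */
    static int[] extend(List<String> paragraph, int answer, Predicate<String> hasMarker) {
        int first = answer;
        // Backward: a sentence with a marker needs its predecessor to be interpretable.
        // The loop stops at the start of the paragraph (paragraph boundary).
        while (first > 0 && hasMarker.test(paragraph.get(first))) {
            first--;
        }
        int last = answer;
        int added = answer - first;
        // Forward: a following sentence is added only if it has a marker
        // and the response has not yet grown by three sentences.
        while (last + 1 < paragraph.size() && added < MAX_ADDED
                && hasMarker.test(paragraph.get(last + 1))) {
            last++;
            added++;
        }
        return new int[] {first, last};
    }
}
```

The backward loop is unconditional because a response that starts with a marked sentence cannot be interpreted, whereas the forward loop is bounded because following sentences are optional.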
When other answers have been retrieved from the same document, it is first checked whether
they have already been included in the response. If they haven’t, the same procedure is used to
extend these answers.
Finally, the complete response is checked for any linguistic markers of document structure. For
example, when the last sentence ends with a colon or semicolon, the next sentence or other
graphical constituent is also included, and if there are any unfinished quotations, sentences are
added until they are finished (if they can’t be finished, they are omitted completely). Figures
or tables may also be included in the response.
For example, on the third question in Appendix D:
Welke spieren zijn betrokken bij RSI? (Which muscles are affected by RSI?)
four different answers were retrieved from the same document (in the order of occurrence in the
original document):
1. Gezien de neiging van RSI zich uit te breiden, blijkt dat na verloop van tijd vaak veel spieren
betrokken zijn bij het proces. (In view of the tendency of RSI to spread, after some time
often a lot of muscles seem to be affected by the process.)
2. Toch zijn er wel een aantal spieren die opvallend vaak zijn aangedaan bij RSI. (However,
there are a number of muscles that are affected by RSI notably often.)
3. Uit onderzoek is bekend dat deze het vaakst betrokken is bij RSI. (It is known from research
that this one is affected by RSI most often.)
4. Ademhalingsspieren Andere spieren die betrokken kunnen zijn bij RSI zijn de scaleni.
(Respiratory muscles Other muscles that could be affected by RSI are the scaleni.)
The first answer doesn’t contain any linguistic markers of a relation with the previous sentence.
However, the sentence following the answer
Het aantal spieren dat bij verschillende RSI patiënten mee kan doen, is dan
ook groot.
contains a cue phrase (“dan ook”). The subsequent sentence
Een beschrijving daarvan zou haast neerkomen op het dupliceren van een
anatomische atlas.
contains the anaphoric expression “daarvan”, and the following sentence
Toch zijn er wel een aantal spieren die opvallend vaak zijn aangedaan bij
RSI.
contains the cue phrase “toch”. Actually, this last sentence is the second answer. The second
answer is thus automatically included in the response by extending the first answer.
When all answers have been extended, the following response results (answers are bold, and
linguistic markers have been marked):
Gezien de neiging van RSI zich uit te breiden, blijkt dat na verloop van tijd
vaak veel spieren betrokken zijn bij het proces. Het aantal spieren dat bij
verschillende RSI patiënten mee kan doen, is dan ook groot. Een beschrijving
daarvan zou haast neerkomen op het dupliceren van een anatomische atlas. Toch
zijn er wel een aantal spieren die opvallend vaak zijn aangedaan bij RSI. De
meest beruchte spier is de monnikskapspier (trapezius). Uit onderzoek is bekend
dat deze het vaakst betrokken is bij RSI. Dat is ook niet zo verwonderlijk, want
deze spier zorgt voor het optillen en stabiliseren van de schouders. Zodra de
armen worden opgetild, zoals bij typen en telefoneren, neemt de spanning in deze
spier fors toe.
Ademhalingsspieren Andere spieren die betrokken kunnen zijn bij RSI zijn de
scaleni. Dit is een groepje spieren die vast zit aan de halswervelkolom en aan de
bovenste ribben. Bij diep ademhalen spannen deze spieren aan, maar bij normale
activiteiten nauwelijks. Vandaar dat deze spieren ook wel hulpademhalingsspieren
genoemde worden, ze heffen de ribben bij speciale omstandigheden zoals niezen,
zuchten en hoesten.
This response consists of two paragraphs, separated in the original document by several pages.
The response is very relevant to the question, but it is quite long. Actually, the original
document contains three pages of information relevant to the question. This may indicate that a
concise response is not possible for this question. On the other hand, GIPS is required to return
only concise responses. In the case of the example presented above, the response could be
restricted to the first paragraph, because that contains three out of four answers. It is not clear,
however, how often responses would grow this large and how relevant they would be. A
decision on how to prevent responses from getting too long has therefore been deferred to the
evaluation stage (described in the next chapter).
5.5 Implementation
The response formulation algorithms described in this chapter are incorporated in the output
generation module of GIPS. A prototype of this module has been implemented in Java version
1.5.0. This prototype consists of a class “GipsGen” which provides a method “generate”. This
method takes a QA document generated by the question answering module “qadr.qa” as input
(other types of input cannot be processed by this prototype). It reads the question sentence, the
answer sentences, their context sentences, and the annotations associated with these
sentences. Then it generates a P-ml document according to the answer extension algorithm
described above. It uses some subclasses to accomplish this. For example, a class “Node” has
been implemented to be able to generate dependency trees. Besides, a class “GipsTest” has
been implemented to provide “GipsGen” with example input and collect the P-ml files it
generates.
The answer integration algorithm described in section 5.3 has not been implemented, because
this would take too much time. Besides, although the sentence fusion algorithm developed at
the University of Tilburg [MK05] aims at integrating answers retrieved from different documents,
it also seems a good solution for integrating autonomous answers retrieved from the same
document. The sentence fusion algorithm would be more flexible than the answer integration
algorithm described in section 5.3. However, the sentence fusion algorithm is not implemented
for question answering yet. Autonomous answers are therefore simply treated as extended
answers (extended with zero sentences) by this prototype.
A second prototype has been constructed that generates a baseline response. The class
“BaselineGen” used for this purpose is very similar to “GipsGen”. It groups the answers per
source and puts them in the original order, just as is done with the answer extension algorithm.
However, “BaselineGen” doesn’t use the lists of cue phrases and anaphoric expressions, nor
does it generate any dependency trees. It simply includes the preceding and the following
sentence for each answer sentence.
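The baseline amounts to a fixed window of one sentence on either side of the answer. A minimal Java sketch of that idea is given below. It is not the code of “BaselineGen”: the class and method names are hypothetical, and the context is assumed to be a list of sentences in document order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the baseline: the answer plus one sentence on either side. */
final class BaselineWindow {

    /** Returns the answer sentence with its preceding and following sentence, where available. */
    static List<String> around(List<String> context, int answer) {
        int from = Math.max(0, answer - 1);            // no predecessor at the start
        int to = Math.min(context.size(), answer + 2); // exclusive; no successor at the end
        return new ArrayList<>(context.subList(from, to));
    }
}
```

When the context lacks a preceding or following sentence, the window simply shrinks, so the baseline response then contains fewer than three sentences.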
5.5.1 Limitations
The GIPS prototype has some limitations compared to the answer extension algorithm. First of
all, because it doesn’t have access to the IMIX document collection and the Alpino parser,
answers could only be extended with sentences provided as context sentences in the QA
document. Ideally, this context consists of the entire paragraph the answer was retrieved from.
However, this is not always the case. Sentences retrieved from PDF-documents in particular are
frequently of poor quality. For example, they are unfinished because they are interrupted by a page
break, or they contain a page header, like the document title or page number. In addition, it was
not possible to retrieve referred pictures or tables.
Secondly, in the sentences returned by “qadr.qa”, all words and punctuation marks are
separated by white spaces. Therefore, it was very difficult to determine whether a quotation
mark signaled the start or the end of a quotation, and thus to determine whether extra
sentences should be included to finish the quotation or not. Fortunately, quotations don’t occur
frequently in the text category used for this research. Bracketed parts did not have this problem,
because the opening bracket “(” and closing bracket “)” could easily be distinguished.
Finally, vertical lists are not recognized by the GIPS prototype either. Therefore, a list cannot be
completed when only part of it is included in the response.
All these limitations could be resolved if the system had access to the original documents
(and preferably also the HTML-tags) of the answers, and to the Alpino parser. This could have
been accomplished during this research, but it would have required considerable extra effort, while
the added value for this research would have been minimal.
5.5.2 Code
The Java code for the prototype and the baseline, including the example input and output files
used for evaluation of the system, can be retrieved from my website:
http://wwwhome.cs.utwente.nl/~langen/thesis
6 Evaluation
Based on literature research and interviews with general practitioners, an information
presentation module for a medical QA system (GIPS) has been designed. In this chapter the
design for GIPS in general and the response formulation algorithm developed for GIPS in
particular are evaluated. The evaluation of the entire design for GIPS is described in section
6.1. The evaluation of the answer extension algorithm used for the response formulation
component is described in section 6.2.
6.1 Evaluation of the entire design
To evaluate the entire design for GIPS, a prototype of the information portal for general
practitioners (including a prototype of the graphical user interface of GIPS) has been
constructed (see section 4.4). These prototypes have not been constructed to test the
functionality of the corresponding systems, but to make the design tangible for users. The
method used to evaluate this design is described in section 6.1.1. In sections 6.1.2 and 6.1.3
the results and conclusions of this evaluation are presented.
6.1.1 Evaluation method
As stated previously, this research did not aim to develop a system that would be equally
appreciated by all Dutch general practitioners. It was expected that general practitioners who
already use the Internet to search for answers to their medical questions are more likely to
appreciate and use a QA system than other general practitioners. Therefore, the design for GIPS
and the information portal resulting from this research have been
evaluated only with the two general practitioners who indicated in the previous interviews that
they already used the Internet to search for answers to their medical questions. Again,
qualitative interviews have been used.
The interviews had the same semi-structured format as the previous ones. They covered the topics of
information needs, information sources and computer use. A general outline of the interviews is
shown in Appendix E. First of all, the general practitioners were confronted with the prototypes
of the information portal and GIPS. Then they were asked how often they think they would use
such systems and whether they would use them during consultations and patient visits.
Secondly, it was explained to the general practitioners what kind of resources GIPS would use
and they were asked what they think of these resources. Thirdly, the general practitioners were
asked a few questions about whether they think the information portal and GIPS could be an
improvement for their work and what they think of the functionality of the systems.
6.1.2 Evaluation results
In this section, the results of the interviews are discussed with respect to the general
practitioners’ information needs, information sources, and computer use.
Information needs
The general practitioners both thought they would use a system like GIPS to pursue their
information needs if it were entirely functional and working properly. One of them thought she
would use it multiple times a day, the other thought he would use it a few times per week.
Example questions they would have liked to submit to the system are “How often do RSI and
the chronic fatigue syndrome coincide?” or “What kind of exercises could be done when … ?”.
The general practitioners would both use the system during the consultation in order to be able
to print its responses and give them to the patient, just as they do now with NHG patient letters
that are provided by the general practitioner information system they use. Patient letters are not
available for all topics, however. Therefore they would like any additional information GIPS
could provide.
On the use of the information portal and mobile computing, the general practitioners were less
unanimous. The information portal provides access to three existing systems (the NHG
guidelines, Artsennet.nl, and PubMed) and three future systems (GIPS, image retrieval, and a
social map). One of the general practitioners was primarily enthusiastic about the ease with
which the three existing systems could be accessed via the information portal. The other one
thought he would rather access these existing systems directly, just as he is doing now. He
really liked the idea of the three future systems, however.
Mobile computing is still science fiction for both general practitioners. One of them would really
like it. She is looking forward to having access to patient data and information systems like
GIPS during patient visits. The other general practitioner hasn’t got any problems with the way
he is working now and wouldn’t consider purchasing a laptop just to have access to his data
during patient visits.
Information sources
Both general practitioners indicated that it is very hard to determine the reliability of electronic
sources. One of them said he once read about an anesthesiologist (whose name he can’t
remember) who made an overview of reliable medical websites. He thinks such an overview
(constructed by a medical professional) could be used as document collection for GIPS. The
other general practitioner said she would trust the information provided by the NHG, but from
patient organizations she would only use practical information. She thinks these sources do not
provide reliable theoretical information.
Computer use
One of the general practitioners really thought the information portal would save her time
compared with the way she now accesses the search engines Artsennet.nl and PubMed. She
even asked whether the prototype of the information portal would remain online.
The general practitioners both appreciated the way the information is presented by GIPS and
the length of the responses. They especially liked the printing function. However, one of the
general practitioners indicated his computer wasn’t connected to a printer yet. He would like to
have a printer on his desk to be able to print the information during the consultation without
having to leave the room. This will be realized in the near future.
Finally, both general practitioners thought GIPS and the other future systems (image retrieval
and the social map) could be an improvement for their work. In addition, one of the general
practitioners said he appreciated GIPS because he could enter an entire question instead of
only a few keywords, as in traditional search engines. He indicated that he found it hard
to identify any negative aspects at this stage; once all possible questions could be entered in GIPS,
he would first like to experiment with it for a while to determine the quality of the responses.
6.1.3 Conclusions
A question answering system that answers questions for patient education would really be
suitable for use by general practitioners, especially for the ones who already use the Internet to look
up information on patient care. The design that has been made during this research (GIPS)
seems to be appreciated by general practitioners. However, special attention should be paid to
selecting the right information sources. Besides, the automation level of general practices is
currently not always sufficient for optimal use of such a question answering system. For
example, general practitioners should have access both to the Internet and to a printer in their
consulting rooms.
Next to question answering technology, image retrieval technology for retrieving dermatological
images, and information extraction technology for retrieving address information of medical
professionals and organizations (a social map) would also be very useful for general
practitioners. The information portal designed during this research seems a good way of
providing general practitioners with an overview of the kinds of information they can find on the
Internet.
6.2 Evaluation of the answer extension algorithm
An answer extension algorithm was developed for GIPS that extends an answer sentence
(returned by a question answering module) with the sentences most related to it. In this section
it is investigated whether these sentences add any relevant information and whether including
them makes the response more coherent. The prototype used for this evaluation was described
in section 5.5. The method used to evaluate the algorithm is described in section 6.2.1. In
sections 6.2.2 and 6.2.3 the results and conclusions of this evaluation are presented.
6.2.1 Evaluation method
To evaluate the answer extension algorithm, responses have been generated for a set of test
questions. This test set includes the ten example RSI questions presented in Appendix D and
ten non RSI-related medical questions randomly selected from an IMIX document with example
questions [IMIXv]. The paragraphs from which the answers to these questions were retrieved
were not part of the text corpus used for the development of the answer extension algorithm. On
manual inspection of the responses to the test questions, the algorithm appeared to have correctly
recognized the relevant cue phrases and anaphoric expressions.
The responses generated with the prototype were compared with the corresponding baseline
responses (consisting of the answer sentence extended with the preceding and successive
sentence). Then the paragraphs were selected that satisfied the following criteria:
- the paragraph generated by the answer extension algorithm differs from the baseline;
- the paragraph at least covers the same topic as the corresponding question (to prevent the
  evaluation from being influenced by the correctness of the answers generated by the
  question answering module “qadr.qa”);
- the paragraph doesn’t refer to any figures, tables, or lists;
- if multiple paragraphs from a single response satisfy these criteria, only the first one is
  selected.
In total, the twenty responses contained 53 paragraphs. Of these paragraphs 14 were identical
to the baseline. This does not necessarily mean that GIPS extended all these 14 answers with
the preceding and successive sentence, because sometimes the QA document doesn’t provide
any context sentences or for example provides only successive context sentences, in which
cases GIPS as well as the baseline were not able to extend the answers properly. 16 other
paragraphs were considered not dealing with the same topic as the question, and 5 others
contained expressions referring to a figure, table, or list. The remaining 18 paragraphs belonged
to 12 different responses. For each of these 12 responses the first suitable paragraph was
selected. Any unfinished sentences in these paragraphs were finished manually by consulting
the original document instead of the context provided by the QA document.
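The selection procedure described above can be summarized in a small sketch. This is only an illustration of the criteria, not the procedure as it was actually carried out (the selection was done by hand); the `Paragraph` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    # Hypothetical record; the judgments below were made manually in the evaluation.
    response_id: int                      # which of the twenty responses it belongs to
    differs_from_baseline: bool
    on_topic: bool                        # covers the same topic as the question
    refers_to_figure_table_or_list: bool


def select_for_evaluation(paragraphs):
    """Return the first suitable paragraph of each response, in input order."""
    selected = {}
    for p in paragraphs:
        suitable = (
            p.differs_from_baseline
            and p.on_topic
            and not p.refers_to_figure_table_or_list
        )
        # Only the first suitable paragraph of a response is kept.
        if suitable and p.response_id not in selected:
            selected[p.response_id] = p
    return list(selected.values())


paragraphs = [
    Paragraph(1, False, True, False),   # identical to the baseline: rejected
    Paragraph(1, True, True, False),    # first suitable paragraph of response 1
    Paragraph(1, True, True, False),    # also suitable, but not the first: skipped
    Paragraph(2, True, False, False),   # off-topic: rejected
    Paragraph(3, True, True, True),     # refers to a table: rejected
]
chosen = select_for_evaluation(paragraphs)

# The counts reported in the text: 53 paragraphs, minus 14 identical to the
# baseline, 16 off-topic, and 5 referring to a figure, table, or list.
remaining = 53 - 14 - 16 - 5
print([p.response_id for p in chosen], remaining)
```

The subtraction at the end reproduces the 18 remaining paragraphs mentioned above, from which one paragraph per response was taken.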
Naïve users were asked to evaluate these paragraphs. It was not necessary to ask general
practitioners to evaluate the paragraphs, because the responses were not evaluated on medical
correctness, but on linguistic characteristics. Besides, GIPS aims at answering questions for
patient education. Patients should thus be able to understand the responses. Naïve users are
all potential patients, so they are very suitable subjects for the evaluation of the responses.
However, because some feeling for language is needed to be able to evaluate the responses on the
relevant variables, only users with higher (non-medical) education participated in this evaluation.
Twenty-nine users participated in this evaluation. Though all participants had received some higher
education, the group was very heterogeneous, consisting of men and women of different ages
(from 19 to 50 years old) and from different disciplines. There were no medically educated
participants, because the evaluation of the responses should not be influenced by judgments on
the medical correctness of the responses.
The participants were randomly divided into two groups. Each group received a different
questionnaire. The two questionnaires are shown in Appendix F. The first group had to evaluate
the GIPS generated paragraphs for one half of the twelve questions, and the baseline
responses for the other questions. For the second group, it was the other way round. The GIPS
generated responses and baseline responses were randomly ordered.
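The counterbalanced design described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the seed, and the group sizes (the text does not report how the twenty-nine participants were split) are hypothetical.

```python
import random


def make_questionnaires(question_ids, rng):
    """Build two mirrored questionnaires: where one shows GIPS, the other shows the baseline."""
    ordered = list(question_ids)
    shuffled = ordered[:]
    rng.shuffle(shuffled)
    gips_in_form_a = set(shuffled[: len(shuffled) // 2])
    form_a = {q: ("GIPS" if q in gips_in_form_a else "baseline") for q in ordered}
    form_b = {q: ("baseline" if q in gips_in_form_a else "GIPS") for q in ordered}
    return form_a, form_b


def assign_groups(participants, rng):
    """Randomly divide the participants into two groups of (nearly) equal size."""
    people = list(participants)
    rng.shuffle(people)
    half = len(people) // 2
    return people[:half], people[half:]


rng = random.Random(2005)  # arbitrary seed, only to make the sketch repeatable
form_a, form_b = make_questionnaires(range(1, 13), rng)   # twelve questions
group_a, group_b = assign_groups(range(1, 30), rng)       # twenty-nine participants
```

With this design every question is judged in both its GIPS and its baseline version, while no participant sees both versions of the same response.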
The goal of this evaluation was to investigate whether the sentences included by the answer
extension algorithm add any relevant information and whether including them makes the
response more coherent. Therefore, the participants were asked to indicate for each response
on a five-point scale how useful the response is with respect to the question, how much
irrelevant information with respect to the question is contained in the response, and how
coherent the response is. The participants were told that the usefulness of the response doesn’t
relate to the amount of useful information a response provides, but only to the presence of any
useful information with respect to the question. The amount of irrelevant information was
defined as the proportion of information that is irrelevant with respect to the question. A
response can thus both be very useful and contain a lot of irrelevant information with respect to
the question. Finally, the coherence was defined as the linguistic coherence of the response. If
there are any referring expressions that could not be resolved, like “these things” in:
These things are caused by smoking.
the coherence is said to be low. However, if the response is a fluent text that can be fully
interpreted, the coherence is high. Or, as one of the participants stated it: “would I understand
the response if I had not read the question?”.
This evaluation method is very similar to that used by Bosma for the evaluation of his answer
extension algorithm [BOS05]. He used the same baseline and the same type of questionnaire.
However, instead of the coherence of the responses, he investigated to what extent the
participants were able to verify whether the responses were accurate. This “verifiability” is high
when a participant is able to verify that the response is accurate, as well as when he is able to
verify that the response is not accurate. It is only low when a response doesn’t contain enough
context to determine whether the response concerns the subject of the question or another
subject. This variable was not used for the evaluation of the responses generated by GIPS,
because in a pretest participants found it very hard to distinguish this variable from the
usefulness of the response. Besides, once they understood what was meant by
verifiability, they never evaluated it as low. Therefore, it was decided not to ask any further
participants to evaluate this variable. Instead, because the answer extension algorithm
developed for GIPS was intended to improve the coherence of a response, participants were
asked whether they thought the paragraphs formed a coherent text.
6.2.2 Evaluation results
Five-point scales were used to evaluate the responses. These scales ranged from “very low” to
“very high”. The middle score was “neutral”. For each variable (usefulness, irrelevance, and
coherence) the proportion of high scores on the GIPS generated responses was compared to
the proportion of high scores on the baseline responses. Neutral scores were ignored, because
users also used this score when they could not decide whether the score should be high or low.
The proportion of high scores is thus defined as the number of high scores divided by the total
number of high and low scores on a specific variable. Two-sided two-sample t-tests were
used to determine whether the difference between the proportions of high scores on the GIPS
generated responses and on the baseline responses differed significantly from zero, at a
significance level of 0.05.
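The computation of the proportion of high scores can be illustrated with a short sketch. Two assumptions are made here: the five scale points are coded 1 to 5, with 4 and 5 counted as high and 1 and 2 as low, and a pooled two-proportion z-test from the Python standard library is used as a stand-in for the t-test mentioned above. The example scores are invented and are not data from the evaluation.

```python
from math import sqrt
from statistics import NormalDist


def high_and_total(scores):
    """Count high scores (4 or 5) and non-neutral scores on a 1-5 scale."""
    high = sum(1 for s in scores if s >= 4)
    low = sum(1 for s in scores if s <= 2)
    return high, high + low


def proportion_high(scores):
    """Number of high scores divided by the number of high and low scores."""
    high, total = high_and_total(scores)
    if total == 0:
        raise ValueError("only neutral scores were given")
    return high / total


def two_proportion_z_test(scores_a, scores_b):
    """Two-sided pooled z-test on the difference in proportions of high scores."""
    high_a, n_a = high_and_total(scores_a)
    high_b, n_b = high_and_total(scores_b)
    pooled = (high_a + high_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all scores high or all low: no measurable difference
    z = (high_a / n_a - high_b / n_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented example scores, neutral (3) included to show that it is ignored.
gips_scores = [5, 4, 4, 3, 2, 4, 5, 1, 4, 3]
baseline_scores = [2, 3, 4, 2, 1, 5, 2, 3, 2, 4]
z, p_value = two_proportion_z_test(gips_scores, baseline_scores)
significant = p_value < 0.05
```

Because neutral scores are dropped, the two samples being compared can differ in size even when both groups judged the same number of responses.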
Only three of the twelve answers (the first, fourth, and sixth) were extended by the extension
algorithm of GIPS. The other answers were considered autonomous. To investigate whether the
non-extended answers were really autonomous and whether the extended answers were
extended better than the baseline answers, the proportions of high scores were also analyzed
separately for the three dependent answers and the nine autonomous answers.
Usefulness
In Table 15 the proportions of high scores on usefulness are shown for the GIPS generated
responses and the baseline responses. On dependent answers the usefulness of the responses
with respect to the question was equally well evaluated, but on autonomous answers the
baseline seems to perform somewhat better than GIPS. This difference is not significant (at a
significance level of 0.05), however.
There are three individual autonomous answers on which the baseline did score significantly
higher than GIPS. In these cases GIPS apparently failed to include useful sentences that were
included by the baseline. Relationships between these sentences were not signaled by cue
phrases or anaphoric expressions, however. The algorithm thus did work properly. For example,
on the question:
Komt RSI in Nederland vaker voor dan in de rest van Europa? (Is the
prevalence of RSI in the Netherlands higher than in the rest of Europe?)
the answer sentence was:
De geschiedenis leert dat RSI geen modeverschijnsel is. (History teaches us
that RSI is not a trend.)
This sentence is indeed not useful with respect to the question at all. It doesn’t have any cue
phrases or anaphoric expressions either. However, the preceding sentence is very useful with
respect to the question:
Nederland heeft niet meer RSI-klachten dan andere landen in Europa. (The
Netherlands does not have more RSI complaints than other countries in
Europe.)
The other two questions on which the baseline provided a significantly more useful response
concerned answer sentences that were more useful than the one illustrated above. However, in
these cases the preceding and successive sentences provided some additional information that
was evaluated as useful by the participants.
                          Total    Dependent answers    Autonomous answers
GIPS                      0.514    0.919                0.374
Baseline                  0.607    0.914                0.505
Significant difference    no       no                   no
Table 15 Proportion of high scores on usefulness
Irrelevance
The scores on the proportion of irrelevant information with respect to the question are shown in
Table 16. The irrelevance of GIPS generated responses was significantly lower than the
irrelevance of the baseline responses, at a significance level of 0.05. This was not the case for
the extended responses, however.
No significant differences in irrelevance were measured on the individual extended responses
either. For only one of these responses, on the sixth question:
Wat zijn de verschijnselen van griep? (What are the symptoms of influenza?)
the baseline scored better (i.e. had a lower irrelevance) than GIPS. Possibly this was due to the
sentence length of the GIPS generated response:
Griep wordt veroorzaakt door het zogenaamde influenzavirus, met
verschijnselen van koorts, neusverkoudheid, hoesten, hoofdpijn,
spierpijn en vermoeidheid. Omdat het virus erg besmettelijk is kan
iedereen griep krijgen en zal je meestal ruim een week het bed moeten
houden. Gezonde mensen knappen daarna weer op door rust en veel
drinken, maar bij mensen met een chronische aandoening, patiënten met
een verminderde weerstand, bewoners van verpleeg-verzorgingshuizen en
ouderen boven de 65 jaar kan de ziekte ernstig verlopen. Zij worden dan ook
jaarlijks door hun huisarts gevaccineerd tegen griep, de zogenaamde
influenzavaccinatie, die voor 70-80% bescherming biedt tegen het krijgen
van griep (influenza).
(Influenza is caused by the so-called influenza virus, with symptoms of fever, a head cold,
coughing, headache, muscle pain and fatigue. Because the virus is very contagious, anyone
can get influenza, and you will usually have to stay in bed for well over a week. Healthy
people then recover through rest and drinking a lot, but in people with a chronic condition,
patients with reduced resistance, residents of nursing and care homes, and elderly people
over 65, the disease can take a serious course. They are therefore vaccinated against
influenza annually by their general practitioner, the so-called influenza vaccination,
which offers 70-80% protection against getting influenza.)
The first two sentences of this response (in bold) were both answer sentences, retrieved from
the same paragraph. The third and the fourth sentences were added, because they contained
an anaphoric expression and a cue phrase respectively. The response thus counts only four
sentences, but these sentences have a mean length of 25 words. As a comparison: another
extended response counted seven sentences, but received a lower irrelevance score. Its mean
sentence length was 15 words. The baseline for the response presented above didn’t include
the last sentence (counting 25 words). Instead the sentence preceding the first answer
sentence was included. This sentence counted only 14 words.
                          Total     Dependent answers    Autonomous answers
GIPS                      0.530     0.697                0.483
Baseline                  0.699     0.686                0.704
Significant difference    -0.169    no                   -0.221
Table 16 Proportion of high scores on irrelevance
Coherence
The results on coherence are presented in Table 17. GIPS scores significantly higher on
coherence than the baseline, at a significance level of 0.05. Again, this is not true for the
extended responses, however.
On the individual extended responses no significant differences were measured concerning
coherence either. But, probably for the same reason as hypothesized before, the GIPS
generated response on the sixth question (the one presented above) scored lower than the
baseline response. Another GIPS generated response also scored lower, but this difference
was even less significant. The third extended response scored almost significantly better than
the baseline response.
                          Total    Dependent answers    Autonomous answers
GIPS                      0.807    0.795                0.812
Baseline                  0.647    0.711                0.622
Significant difference    0.160    no                   0.189
Table 17 Proportion of high scores on coherence
6.2.3 Conclusions
Concerning the usefulness of the responses with respect to the question no great differences
were found between GIPS and the baseline. There was only one question for which GIPS
apparently failed to include the most useful sentence where the baseline did. Actually, this
failure could also be attributed to the question answering module of the IMIX demonstrator,
which should have marked this sentence as the answer sentence instead of the successive
one.
Based on the evaluation of the proportion of the response that is irrelevant to the question, it
might be concluded that GIPS succeeds in filtering a lot of irrelevant information that was
included by the baseline. However, answers that are extended by GIPS should not grow too
large. The algorithm used by GIPS only restricted the number of sentences included, but maybe
it should also take into account the number of words.
Finally, on coherence the GIPS generated responses scored significantly higher than the
baseline responses. It is suggested that restricting the number of words of extended responses
might increase the coherence even more.
Especially answers that were not extended by GIPS (autonomous answers) contained
significantly less irrelevant information and were judged significantly more coherent than the
baseline responses. It might thus be concluded that these answers should indeed not be
extended. For dependent answers (which were extended) no significant differences with the
baseline responses were found. This is probably due to the small number of dependent
answers.
Bosma [BOS05] developed a similar answer extension algorithm and executed a similar
evaluation. He also concluded that the differences on usefulness between the responses
generated by his algorithm (the query-based summarizations) and the baseline responses were
not significant, and that the query-based summarizations contained less irrelevant information
than the baseline responses. He did not evaluate the responses on coherence. For his
algorithm an automatic RST-annotation tool for Dutch would be needed, which is not yet
available. The algorithm developed for GIPS only needs a tool for generating dependency trees.
For this task a Dutch tool is available: the Alpino parser. Therefore, the answer extension
algorithm developed for GIPS currently seems a good alternative for the algorithm developed by
Bosma.
7 Conclusions
The research question to be answered in this master’s thesis was:
Which information needs of Dutch general practitioners can be satisfied by a question
answering (QA) system and how should the answers be presented?
Literature research and interviews with general practitioners were used to identify the different
work roles and tasks, information needs, variables of awareness of information, and information
sources of general practitioners.
Work roles and tasks
It was concluded that a QA system could support the general practitioner primarily in his work
role of service provider, during the phase of searching databases, in order to help the general
practitioner provide explanations to the patient.
Information needs
The information needs most suitable to pursue with a QA system would be those concerning
patient education and population statistics. Most Dutch general practitioners already use a
general practitioner information system that incorporates the patient letters issued by the Dutch
College of General Practitioners (NHG). These letters can be printed and handed out to the
patient. However, not all topics are covered by these patient letters. Therefore, some of the
interviewed general practitioners indicated they would like to use a question answering system
to search for additional information to give to the patient.
Awareness of information
Concerning the awareness of information, a QA system could improve the accessibility of
electronic information sources for patient education, because the general practitioner only has
to enter a question in Dutch. A dialogue between the system and the general practitioner might
be used in order to specify the question when needed. The general practitioners participating in
the interviews did not yet use the Internet to search for answers during medical consultations,
because that would take too much time. However, when they want to hand out information to
the patient, they should look up the information during the medical consultation. Therefore, the
QA system’s response time should be short enough to enable using the system during medical
consultations. Besides, general practitioners must know which information they can find with the
QA system. To provide an overview of the information that can be found with the QA system
and the information types that can be retrieved with other information retrieval systems useful
for general practitioners, an information portal has been designed. This portal was appreciated
by the general practitioners participating in the evaluation.
Information sources
The information sources used by the QA system should be suitable for patient education.
Because the information must be up-to-date, these sources could best be retrieved from the
Internet. However, it is hard to determine the reliability of sources on the Internet. Therefore,
only information sources marked as reliable by medical professionals should be used.
Computer use
The computer use of general practitioners was also investigated. Most general practitioners
have a computer in their consulting rooms and make use of a general practitioner information
system. The use of mobile devices in the general practice is far less common, however.
Although it is expected that these devices would really improve the physician’s work during
patient visits, none of the interviewed general practitioners was planning to acquire one. Some of
them would even rather omit patient visits for this reason.
It was concluded that the general practitioners most likely to appreciate a QA system are those
who already search for answers on their medical questions on the Internet. Concerning the user
interface, the system should accommodate both keyboard and mouse input. Speech input
should only be possible when it is working perfectly. The system should be able to recognize
ICPC coding and other medical slang in the question. There shouldn’t be any timeouts and all
functional options must be immediately visible on the graphical user interface. Further, general
practitioners must have access to a computer with Internet connection and a printer in their
consulting rooms to be able to use the QA system optimally. A lot of general practitioners
already have these facilities, because they also need them for searching the Web and printing
patient letters. It is expected that most other general practitioners will acquire these facilities in
the near future. So if the QA system became available on the Internet, general
practitioners would not have to incur any extra costs to be able to use it. There only needs to be
an organization that informs the general practitioners about the QA system, and selects the right
information sources and keeps them up-to-date.
Information presentation
The presentation of the answers a QA system returns should include a few important aspects
for general practitioners. The answer selection component of a QA system (which provides the
input for the response formulation component) returns a number of answers, which could all be
correct. It was concluded that answers retrieved from different sources should not be integrated
into a single answer, because general practitioners want to be able to check (the reliability of)
the source of the answer and because they want to have the feeling that as medical
professionals they are in control of the decision process, not the system. However, answers
from different sources should be integrated into a single view, to enable general practitioners to
select the most suitable sources or answers easily, and print this selection of answers.
Therefore, each answer should be presented together with a link to its source, and a checkbox
to indicate whether it should be printed or not.
Answers originating from the same source could be integrated into one concise answer,
possibly extended with sentences from their context. An algorithm has been developed that
determines whether an answer should be extended and with which sentences it should be
extended. For this purpose, lists of cue phrases and anaphoric referring expressions were
produced and rules were extracted that determine whether an occurrence of a cue phrase or
anaphoric expression in a text signals a relation with the preceding sentence (in which case this
preceding sentence should be included in the response).
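The principle behind the answer extension algorithm can be illustrated with a strongly simplified sketch. It is not the algorithm as implemented for GIPS: the two word lists are illustrative guesses rather than the lists compiled during this research, plain word matching replaces the rules that decide whether an occurrence really signals a relation, and no limit on the number of added sentences is applied. The example sentences are abridged from the responses discussed in section 6.2.2.

```python
import re

# Illustrative entries only; not the lists compiled for GIPS.
CUE_PHRASES = ("daarom", "echter", "dan ook", "bovendien")
ANAPHORS = ("deze", "dit", "zij", "hij", "daarna")


def contains_any(sentence, expressions):
    """True if one of the expressions occurs as a whole word or phrase."""
    text = sentence.lower()
    return any(re.search(r"\b" + re.escape(e) + r"\b", text) for e in expressions)


def depends_on_previous(sentence):
    """Naive test: a cue phrase or anaphor is taken to signal a link to the previous sentence."""
    return contains_any(sentence, CUE_PHRASES) or contains_any(sentence, ANAPHORS)


def extend_answer(sentences, answer_index):
    """Return the answer sentence plus the neighbouring sentences it is linked to."""
    start = answer_index
    # Include preceding sentences as long as the current first sentence depends on one.
    while start > 0 and depends_on_previous(sentences[start]):
        start -= 1
    end = answer_index
    # Include following sentences as long as they depend on the sentence before them.
    while end + 1 < len(sentences) and depends_on_previous(sentences[end + 1]):
        end += 1
    return sentences[start:end + 1]


rsi = [
    "Nederland heeft niet meer RSI-klachten dan andere landen in Europa.",
    "De geschiedenis leert dat RSI geen modeverschijnsel is.",
]
flu = [
    "Griep wordt veroorzaakt door het zogenaamde influenzavirus.",
    "Omdat het virus erg besmettelijk is kan iedereen griep krijgen.",
    "Gezonde mensen knappen daarna weer op door rust en veel drinken.",
    "Zij worden dan ook jaarlijks door hun huisarts gevaccineerd tegen griep.",
]
autonomous = extend_answer(rsi, 1)   # no cue phrase or anaphor: left as it is
extended = extend_answer(flu, 1)     # the two following sentences are added
```

The first example shows the limitation noted in the evaluation: when a useful neighbouring sentence is not signalled by a cue phrase or anaphoric expression, an approach of this kind will not include it.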
The answer extension algorithm has been evaluated and the evaluation results have been
compared to those of Bosma [BOS05] who developed a similar algorithm. It was concluded that
the answer extension algorithm produced coherent responses that contain less irrelevant
information than a baseline response consisting of the answer sentence extended with the
preceding and successive sentence. These results were similar to those of Bosma. However,
for his algorithm an automatic RST-annotation tool for Dutch would be needed, which is not yet
available. Therefore, the answer extension algorithm developed during this research currently
seems a good alternative for the algorithm developed by Bosma.
8 Discussion
This research essentially consists of two parts. The first part concerns the information and
computer use by general practitioners. The second part deals with response formulation.
The research on the information and computer use by general practitioners concentrated on the
possibility of using question answering (QA) technology to improve general practitioners’ work.
It was concluded that QA systems would primarily be suitable to answer questions for patient
education. However, during the interviews with general practitioners, other types of information
needs that might be pursued with intelligent information retrieval technology also became
apparent. For example, when confronted with dermatological diseases, general practitioners
frequently have to look up images in a dermatology book. Systems for dermatological image
retrieval might help general practitioners by reducing the time to find the relevant picture.
Another information type needed by general practitioners is a “social map” that provides an
overview of regional health professionals and medical organizations. A lot of these
organizations can be found on the Internet. Information extraction technology might be used to
enable general practitioners to retrieve contact information of these organizations quickly. It is
therefore strongly recommended that the applicability of image retrieval and information
extraction technology for general practitioners is investigated in future research.
Another question arising from this research is whether a QA system that answers questions for
patient education could also be used by patients themselves instead of general practitioners. I
think this depends on the state of the art of the QA technology. When the system always returns
responses that make sense, it would be very useful for patients, because the system will use a
document collection consisting of reliable sources aimed at patient education. Actually, the IMIX
demonstrator (the QA system that was the starting-point of this research) is targeted towards
these naïve users. However, currently the system also returns a lot of answers that are not even
dealing with the same subject as the question. This information could be very misleading to the
patient. General practitioners could serve as an intermediary to filter these answers and give the
patient only those answers useful for him.
Further, a design was made for an information portal for general practitioners. A very simple
prototype has been implemented to illustrate this design. More work could be dedicated to
improve this portal, for example by enabling personalization. When general practitioners are
able to add and remove systems themselves, they will possibly be more likely to appreciate and
use this portal.
In the second part of this research, dealing with response formulation, two algorithms were
developed: an answer integration algorithm, and an answer extension algorithm. The answer
integration algorithm has not been implemented because of time constraints and because it was
expected that a Dutch sentence fusion algorithm which is already being investigated by Marsi
and Krahmer [MK05] would achieve better results. Their algorithm has not yet been
implemented and evaluated for application in a QA system, however. Therefore, the comparison
of (the complexity and results of) these algorithms is left for future research.
The answer extension algorithm has been implemented and evaluated. Lists of Dutch cue
phrases and anaphoric expressions were constructed for this purpose, and rules were extracted
that determine whether an occurrence of a cue phrase or anaphoric expression in a text signals
a relation with the preceding sentence. These lists and rules were based on a rather small text
corpus covering only the medical domain. Therefore, they are probably not complete. Future
research is needed to construct more complete lists, especially if the algorithm is to be used for
other domains. However, although the lists were not complete, the evaluation results of the
answer extension algorithm are promising.
The answer extension algorithm restricts the number of sentences the answer sentence could
be extended with. Sentences occurring after the answer sentence are only added when the total
number of added sentences doesn’t exceed three. However, based on the evaluation results, it
was hypothesized that this number should depend on the sentence length. For example, a
second test could be added that checks whether the number of words added doesn’t exceed
twenty before adding an extra sentence. Extra research would be needed to investigate what
numbers of words and sentences the response should maximally contain to optimize both the
relevance and coherence of the response.
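The suggested second test could look like the sketch below. The function and parameter names are hypothetical, and the word limit is interpreted here as "stop adding sentences once more than twenty words have already been added"; the limits of three sentences and twenty words are the values mentioned above, not tuned values.

```python
def add_following_sentences(candidates, already_added=0, max_sentences=3, max_words=20):
    """Select following sentences subject to a sentence limit and a word limit.

    candidates: sentences after the answer sentence that qualify for inclusion.
    already_added: number of sentences added earlier (e.g. preceding sentences).
    """
    added = []
    words_added = 0
    for sentence in candidates:
        # Existing test: the total number of added sentences may not exceed the maximum.
        if already_added + len(added) + 1 > max_sentences:
            break
        # Suggested extra test: check the number of words added so far
        # before adding another sentence.
        if words_added > max_words:
            break
        added.append(sentence)
        words_added += len(sentence.split())
    return added


# Dummy sentences of 12, 10, 5 and 4 words.
candidates = [" ".join(["woord"] * n) for n in (12, 10, 5, 4)]
with_word_limit = add_following_sentences(candidates)
sentence_limit_only = add_following_sentences(candidates, max_words=10**6)
```

In this example the word test stops the extension after two sentences (22 words), whereas the sentence limit alone would have allowed three.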
Further, it was concluded that answers that were not extended by the answer extension
algorithm, were more coherent and contained significantly less irrelevant information with
respect to the question than a baseline response consisting of the answer sentence extended
with the preceding and successive sentence, while no significant differences with respect to the
usefulness of the responses were found. It thus seems that the answer extension algorithm is
very useful to determine whether an answer should be extended or not. However, no significant
differences were found between the extended responses and the baseline responses. This is
probably due to the small number of extended responses. A more thorough evaluation would be
needed to investigate the usefulness, irrelevance, and coherence of extended responses. For
this purpose, it would be better to integrate the answer extension algorithm with the IMIX
demonstrator to enable automatic generation of example responses, and speed up the process
of finding extended answers suitable for evaluation.
It was also concluded that, for the time being, the answer extension algorithm would be a good
alternative for the algorithm developed by Bosma [BOS05], because for his algorithm an
automatic RST-annotation tool for Dutch would be needed, which is not yet available. However,
when such a tool becomes available, it would be useful to compare the performance of both
algorithms on the same set of example responses to determine which algorithm performs best
on usefulness, irrelevance, and coherence.
Finally, in this research, response formulation was restricted to selecting and integrating
sentences from the document an answer was retrieved from. More intelligent technology would
be needed to perform some reasoning with the retrieved information, and formulate responses
like “Yes, it is”, or “No, but …” on verification questions, or “100,000” on quantity questions.
However, it is expected that with the current state of the art, general practitioners would neither
appreciate nor trust systems that interpret texts for them. This issue was therefore not
dealt with in this research.
References
[APO]
Apotheek.nl. Geneesmiddelen. http://www.apotheek.nl
[BAR03]
Barzilay, R. Information fusion for multidocument summarization: Paraphrasing and
generation. PhD Thesis, Columbia University, 2003.
[BE96]
Best evidence [database on cd-rom]. Philadelphia: American College of Physicians,
1996.
[BMJ]
British Medical Journal. Clinical Evidence. London: BMJ Publishing Group Limited.
http://www.clinicalevidence.com
[BNM01]
Bouma, G., Noord, G. van, and Malouf, R. Alpino: Wide-coverage computational
analysis of Dutch. 2001. http://www.let.rug.nl/~vannoord/papers/alpino.pdf
[BOO03]
Boonstra, A. Interpretative perspectives on the acceptance of an optional
information system: Lessons from the introduction of an electronic prescription
system for general practitioners. University of Groningen: Research Institute SOM,
2003. http://www.ub.rug.nl/eldoc/som/a/03A08/03A08.pdf
[BOS05]
Bosma, W. Extending answers using discourse structure. Submitted to Crossing
Barriers in Text Summarization Research. Workshop to be held in conjunction with
RANLP, 2005.
[BOU03]
Bouma, G. Question answering for Dutch using dependency relations. September
2003, Groningen, the Netherlands. http://odur.let.rug.nl/~gosse/Imix/
project_description.pdf
[BOU04]
Bouma, G. QADR output specification. 2004. IMIX Internal Project Page (restricted):
http://imix.uvt.nl/Demonstrator/Integration/qadr_qa/xml_specs.pdf
[BW97]
Barrie, A.R. and Ward, A.M. Questioning behaviour in general practice: A pragmatic
study. In British Medical Journal 315, 1997. pp. 1512-1515.
[CBD05]
Canisius, S., Bosch, A. van den, and Daelemans, W. IMIX Rolaquad: XML output
specification. 2005. IMIX Internal Project Page (restricted):
http://imix.uvt.nl/Demonstrator/Integration/rolaquad_qa/Rolaquad-XMLspecification.pdf
[CC]
Cochrane Collaboration. Cochrane Library. http://www.cochrane.org
[CEBM]
Centre for Evidence-Based Medicine. Focusing clinical questions.
http://www.cebm.net/focus_quest.asp
[CM03]
Coumou, H. and Meijman, F. Hoe zoekt de huisarts literatuurgegevens bij
problemen van patiënten? In Huisarts en Wetenschap 46, 2003. pp. 359-63.
[CNP00]
Cardie, C., Ng, V., Pierce, D., and Buckley, C. Examining the role of statistical and
linguistic knowledge sources in a general-knowledge question-answering system. In
Proceedings of the 6th Conference on Applied Natural Language Processing. 2000.
pp. 180-187.
[COX00]
Cox, D. Uitslag enquête LHV en NHG. Huisartsen surfen thuis! In Huisarts en
Wetenschap 43, 2000. pp. 408-409.
[DHP98]
Dupuits, F.M.H.M., Hasman, A., and Pop, P. Computer-based assistance in family
medicine. In Computer Methods and Programs in Biomedicine 55, 1998. pp. 39-50.
[DN96]
Dennis, A. and Newman, W. Supporting doctor-patient interaction: Using a surrogate
application as a basis for evaluation. In Proceedings of the CHI '96 Conference
Companion on Human Factors in Computing Systems: Common Ground, April 13-18, 1996, Vancouver, BC, Canada. ACM. pp. 223-224.
[DS97]
Detmer, W.M. and Shortliffe, E.H. Using the Internet to improve knowledge diffusion
in medicine. In Communications of the ACM 40 (8), 1997. pp. 101-108.
[EOE02]
Ely, J.W., Osheroff, J.A., Ebell, M.H., Lee Chambliss, M., Vinson, D.C., Stevermer,
J.J., and Pifer, E.A. Obstacles to answering doctors’ questions about patient care
with evidence: qualitative study. In British Medical Journal 324, 2002. pp. 710-722.
[EOE99]
Ely, J.W., Osheroff, J.A., Ebell, M.H., Bergus, G.R., Levy, B.T., Lee Chambliss, M.,
and Evans, E.R. Analysis of questions asked by family doctors regarding patient
care. In British Medical Journal 319, 1999. pp. 358-361.
[EPM]
Entrez. PubMed. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
[GAW94]
Gorman, P.N., Ash, J., and Wykoff, L. Can primary care physicians' questions be
answered using the medical journal literature? In Bulletin of the Medical Library
Association 82 (2), 1994. pp. 140-146.
[GH95]
Gorman, P.N. and Helfand, M. Information seeking in primary care: how physicians
choose which clinical questions to pursue and which to leave unanswered. In
Medical Decision Making 15 (2), 1995. pp. 113-119.
[GOO]
Google. http://www.google.nl
[GOR95]
Gorman, P.N. Information needs of physicians. In Journal of the American Society
for Information Science 46 (10), 1995. pp. 729-736.
[HAA02a]
Haan, S. de. Discourse. In Appel, R., Baker, A., Hengeveld, K., Kuiken, F., and
Muysken, P. (eds.). Taal en taalwetenschap. Oxford: Blackwell Publishers, 2002. pp.
71-86.
[HAA02b]
Haan, S. de. Zinsbetekenis. In Appel, R., Baker, A., Hengeveld, K., Kuiken, F., and
Muysken, P. (eds.). Taal en taalwetenschap. Oxford: Blackwell Publishers, 2002. pp.
163-181.
[HER03]
Herzog, G. Multiplatform Testbed: A tutorial. Nijmegen, 2003. IMIX Internal Project
Page (restricted):
http://imix.uvt.nl/Demonstrator/Multiplatform/MultiplatformTutorial030428.pdf
[HOU00]
Houët, H. Prisma handboek van de Nederlandse taal. First edition. Utrecht: Het
Spectrum, 2000.
[IMIXa]
IMIX. Architecture. IMIX Internal Project Page (restricted):
http://imix.uvt.nl/Demonstrator/Integration/architecture.html
[IMIXg]
IMIX. Module imix.gui. IMIX Internal Project Page (restricted):
http://imix.uvt.nl/Demonstrator/Integration/imix_gui/doc.html
[IMIXp]
IMIX. P-ml presentation format. IMIX Internal Project Page (restricted):
http://imix.uvt.nl/Demonstrator/Integration/imogen_gen/pml.html
[IMIXv]
IMIX. Stevin vragen. IMIX Internal Project Page (restricted):
http://imix.uvt.nl/data/stevin-vragen.txt
[JM00]
Jurafsky, D. and Martin, J.H. Speech and language processing: An introduction to
natural language processing, computational linguistics, and speech recognition.
Prentice Hall, 2000.
[JMR03]
Jijkoun, V., Mishne, G. and Rijke, M. de. Building Infrastructure for Dutch Question
Answering. In: A.P. de Vries (ed.), Proceedings DIR 2003, 2003.
[KNMG]
Koninklijke Nederlandsche Maatschappij tot bevordering der Geneeskunst.
Artsennet. http://www.artsennet.nl
[KPG03]
Kosseim, L., Plamondon, L., and Guillemette, L. Answer formulation for question-answering. In Proceedings of The Sixteenth Conference of the Canadian Society for
Computational Studies of Intelligence, Canada, June 2003. pp. 24–34.
[KS98]
Knott, A. and Sanders, T. The classification of coherence relations and their
linguistic markers: An exploration of two languages. In Journal of Pragmatics, 30 (2).
1998. pp. 135-175.
[KS04]
Koenen, L. and Smits, R. Handboek Nederlands. First edition. Utrecht: Bijleveld,
2004.
[LAN05]
Langen, M.C.G. van. Building a system for answering Dutch person questions. In 2nd
Twente Student Conference on IT. Enschede, 2005.
[LAP03]
Lapata, M. Probabilistic text structuring: Experiments with sentence ordering. In
Proceedings of the 41st Annual Meeting of the Association for Computational
Linguistics. July 2003. pp. 545-552.
[LINH]
Landelijk Informatie Netwerk Huisartsenzorg: Feiten en cijfers over huisartsenzorg in
Nederland. http://www.linh.nl
[LQS03]
Lin, J., Quan, D., Sinha, V., Bakshi, K., Huynh, D., Katz, B., and Karger, D.R. What
makes a good answer? The role of context in question answering. In Proceedings of
the Ninth IFIP TC13 International Conference on Human-Computer Interaction.
Zurich, Switzerland, 2003.
[LSS01]
Lagendijk, P.J.B., Schuring, R.W. and Spil, T.A.M. Het Elektronisch Voorschrijf
Systeem: Van kwaal tot medicijn. Enschede: Universiteit Twente, 2001.
[LPS96]
Leckie, G.J., Pettigrew, K.E., and Sylvain, C. Modeling the information seeking of
professionals: A general model derived from research on engineers, health care
professionals, and lawyers. In Library Quarterly 66 (2). 1996. pp. 161-193.
[MCW05]
Magrabi, F., Coiera, E.W., Westbrook, J.I., Gosling, A.S. and Vickland, V. General
practitioners’ use of online evidence during consultations. In International Journal of
Medical Informatics 74 (1), January 2005. pp. 1-12.
[MK05]
Marsi, E. and Krahmer, E. Explorations to sentence fusion. Submitted to ENLG ’05.
2005.
[MS03]
Moldovan, D. and Surdeanu, M. On the role of information retrieval and information
extraction in question answering systems. In M. T. Pazienza (ed.). SCIE 2002. July
2002, pp. 129-147.
[MT87]
Mann, W.C. and Thompson, S.A. Rhetorical structure theory: A theory of text
organization. Technical report RS-87-190. University of Southern California,
Information Sciences Institute. 1987.
[NGC]
National Guideline Clearinghouse. http://www.guideline.gov
[NHGf]
Nederlands Huisartsen Genootschap. NHG-Formularium. http://nhg.artsennet.nl
[NHGp]
Nederlands Huisartsen Genootschap. Patiëntenvoorlichting. http://nhg.artsennet.nl
[NHGs]
Nederlands Huisartsen Genootschap. NHG-Standaarden. http://nhg.artsennet.nl
[NHS]
National Health Service. NLH Question-Answering Service.
http://www.clinicalanswers.nhs.uk
[NWOa]
Nederlandse Organisatie voor Wetenschappelijk Onderzoek. Interactieve
Multimodale Informatie Extractie. http://www.nwo.nl/imix
[NWOb]
Nederlandse Organisatie voor Wetenschappelijk Onderzoek. IMOGEN: Interactive
Multimodal Output Generation.
http://www.nwo.nl/nwohome.nsf/pages/NWOP_653H7L
[OS04]
Os, E. den (ed.). Functional specification IMIX demonstrator. 2004. IMIX Internal
Project Page (restricted):
http://imix.uvt.nl/Demonstrator/Specification/functional_specification_1_0.doc
[PH87]
Paice, C.D. and Husk, G.D. Towards the automatic recognition of anaphoric features
in English text: The impersonal pronoun “it”. In Computer Speech and Language 2.
1987. pp. 109-132.
[PSB03]
Power, R., Scott, D., and Bouayad-Agha, N. Document structure. In Computational
Linguistics 29 (4). 2003. pp. 211-260.
[RHF92]
Rector, A.L., Horan, B., Fitter, M., Kay, S., Newton, P.D., Nowlan, W.A., Robinson,
D., and Wilson, A. User centered development of a general practice medical
workstation: The PEN&PAD experience. In Proceedings of the SIGCHI conference
on Human factors in computing systems, June 1992, Monterey, California, United
States. ACM. pp. 447-453.
[RM99]
Reape, M. and Mellish, C. Just what is aggregation anyway? In Proceedings of the
7th European Workshop on Natural Language Generation. Toulouse, France. 1999.
pp. 20-29.
[SHA98]
Shaw, J.C. Clause aggregation using linguistic knowledge. In Proceedings of the 9th
International Workshop on Natural Language Generation. Canada, 1998. pp. 138-147.
[SHA02]
Shaw, J.C. Clause aggregation: An approach to generating concise text. PhD thesis,
Columbia University, 2002.
[STE94]
Sterkenburg, P.G.J. van. Van Dale handwoordenboek van hedendaags Nederlands.
Second edition. Utrecht: Van Dale Lexicografie, 1994.
[THE05]
Theune, M. QA XML output (shared part). 2005. IMIX Internal Project Page
(restricted): http://imix.uvt.nl/Demonstrator/Integration/answers/QA-messages-v1.pdf
[TREC]
Text Retrieval Conference. Question answering collections:
http://trec.nist.gov/data/qa.html
[VBM95]
Verhoeven, A.A.H., Boerma, E.J., and Meyboom-de Jong, B. Use of information
sources by family physicians: a literature survey. In Bulletin of the Medical Library
Association 83, 1995. pp. 85-90.
[VER99]
Verhoeven, A.A.H. Information-seeking by general practitioners. PhD Thesis,
Rijksuniversiteit Groningen. Groningen: Van Denderen, 1999.
[VNB99]
Verhoeven, A.A.H., Noort, C.P. van, Bosveld, H.E.P., Boerma, E.J., and Meyboom-de Jong, B. Information use and needs: a survey among Dutch general practitioners.
In Verhoeven, A.A.H. Information-seeking by general practitioners. PhD Thesis,
Rijksuniversiteit Groningen. Groningen: Van Denderen, 1999.
[VP05]
Vidiam and Paradime. Functional specification IMIX dialogue system: Version 1.
2005. http://wwwhome.cs.utwente.nl/~schooten/vidiam/funcspec2/funcspec22apr2005.pdf
[VS03]
Verhoeven, A.A.H. and Schuling, J. Op zoek naar bewijs: een vraag- en
antwoorddienst voor de huisarts. In Huisarts en Wetenschap 46, 2003. pp. 12-17.
[WHB02]
Wolters, I., Hoogen, H. van den, and Bakker, D. de. Evaluatie invoering Elektronisch
Voorschrijf Systeem Monitoringfase: de situatie in 2001. Utrecht: NIVEL, 2002.
[WM99]
Westberg, E.E. and Miller, R.A. The basis for using the Internet to support the
information needs of primary care. In Journal of the American Medical Informatics
Association 6 (1), 1999. pp. 6-25.
Appendix A: Questions
This Appendix contains a sample of typical questions asked by general practitioners. These
questions were collected in Oregon studies of information needs [GOR95, GAW94].
1. In a patient with refractory headaches, now benefiting from a calcium channel blocker, is
there a specific drug or dose that has been shown to work? Is there a study showing this?
2. After 2 courses of antibiotics in a physician’s daughter with bronchitis, what treatment is
appropriate for persistent symptoms?
3. In an octogenarian with anemia, angina, and a history of transient ischemic attacks, with a
normal creatinine, iron, and mean corpuscular volume, who refuses a bone marrow exam,
what diagnostic and therapeutic options are there?
4. Is it safe to use ibuprofen in a 50-year-old man with a history of colon cancer, now reporting
dysuria, who has cellular casts in his urine?
5. Does Norpace cause fatigue?
6. What are the cost, risk, and usefulness of dipyridamole thallium scanning in a patient with
chronic obstructive lung disease, claudication, and angina pectoris?
7. In a woman with sclerosing adenosis on breast biopsy and family history of breast cancer,
who requires estrogen therapy to control symptoms, how can the risk of breast cancer be
lowered?
8. In an 88-year-old woman with dysphagia due to past laryngeal cancer, now in respiratory
failure due to aspiration, what is the physician’s role in aggressiveness of care decisions
when the patient’s family has unrealistic expectations?
9. For a child with exacerbation of steroid dependent asthma and varicella exposure, how do
you give varicella immune globulin and where do you get it?
10. Is meclizine effective for labyrinthitis?
11. In a man with vague intermittent abdominal and back pain, what additional information will
be most useful and what is the complete differential diagnosis?
12. Can aspirin or an antiplatelet agent be used as prophylaxis against pulmonary embolism
(PE) in an elderly woman with unexplained oxygen desaturation and no clinical risk factors
for PE (none that warrant transport 100 miles for diagnostic tests)?
13. In a woman with history of delivering at 33 weeks, now having Braxton-Hicks contractions at
32 weeks, on terbutaline and bed rest, in breech position, is c-section indicated if labor
cannot be stopped?
14. How can I distinguish and manage chest pain in an older woman with known coronary
disease, status post angioplasty of the left anterior descending coronary artery, arthritis
which precludes treadmill testing, esophagitis, inadequate personality which complicates
history, given that dipyridamole testing is 180 miles away?
15. In a patient with steroid dependent chronic obstructive lung disease, does the risk of renal
or gastrointestinal complications outweigh the benefit of non-steroidal anti-inflammatory
therapy for degenerative joint disease?
16. Can an insulin-dependent diabetic be certified as a commercial driver?
17. At what age is screening prostate-specific antigen [testing] indicated in a low-risk patient?
18. What is the exact increase in risk of thrombotic events on oral contraceptives in a woman
with family history of myocardial infarction (her grandmother at age forty-nine) and of deep-vein thrombosis?
19. Are nonacetylated salicylates really safer (and how much safer) in patients with NSAID GI
intolerance (who benefit from anti-inflammatory effect)?
20. For diagnosis of deep-vein thrombosis, how good is ultrasound; does it obviate the need for
venogram (can it exclude the diagnosis)?
21. Is amoxicillin safe for use in a lactating woman?
22. What is [sic] the sensitivity and specificity of arterial ultrasound exam of the lower
extremities?
23. Is hypothyroidism associated with high cholesterol or low?
24. What is the dose of Imferon?
25. At what point is endoscopy indicated in patients with esophagitis who remain symptomatic
on medication?
26. Where can I send this patient for education about his alcoholism: more education than
Alcoholics Anonymous provides, less expense than inpatient treatment?
Appendix B: Interview general practitioners
Information needs
The following questions deal with the medical questions you are confronted with during patient
care.
1. How frequently are you confronted with questions arising from patient care?
2. Do you always search for answers to such questions? Include answers which you look up
in the Pharmacotherapeutic Directory (Farmacotherapeutisch Kompas), or which you obtain
from consulting a colleague.
3. At what moments do you search for information?
4. Do you also search for information when you are visiting patients?
5. Are the questions you are confronted with during patient visits different from those you are
confronted with in the consulting room?
Information sources
6. Which information sources do you use when you are looking for the answer to a medical
question?
7. Do you use any electronic sources?
8. Do you also use English information sources?
9. What do you think of your possibilities for finding information for patient care?
10. Are any improvements needed for finding information for patient care?
Computer use
11. Do you own any of the following items at work?
Computer
CD-ROM player
Software to search the medical literature
CD or disk with medical knowledge
Subscription to Internet
12. Which computer applications do you use at work?
The following questions deal with information retrieval and question answering systems. I will
show you some examples; see Figure 16 to Figure 18.
13. Do you ever use information retrieval systems during clinical practice?
No, continue with question 17
Yes, namely ………………………………..
14. Are you generally able to find what you need with these systems?
15. Which features of these systems do you like?
16. Are there any negative aspects of these systems?
17. Do you think a question answering system could be an improvement for your work?
18. In which information sources would you like a question answering system to search?
19. What kind of answers do you prefer when you search for an answer to a medical question?
Complete articles
Only relevant paragraphs
A concise answer
Other, namely: ………………………………..
20. Would this be the same when you were searching for information during a patient visit?
21. If you had a system that provides you with only paragraphs or concise answers to your
medical questions, what additional information would you like to have?
22. When you use a computer at work, do you generally prefer mouse or keyboard input?
23. Would you like to use speech input?
24. Can you show me a user interface of a medical information system that you really
appreciate?
Figure 16 A common information retrieval user interface [GOO]
Figure 17 A medical information retrieval user interface [EPM]
Figure 18 An example of a question answering user interface
Appendix C: Screenshots
In Figure 19 a screenshot is shown of the prototype of the web portal for general practitioners.
Figure 20 shows a screenshot of the prototype of GIPS, which can be accessed through the
web portal.
Figure 19 Screenshot of the information portal for general practitioners
Figure 20 Screenshot of the prototype of GIPS
Appendix D: Questions and answers
This appendix presents ten example RSI questions and their answers (in Dutch), as generated by
the “qadr.qa” question answering module of the IMIX demonstrator. The answers are grouped
per source.
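The per-source grouping used in this appendix can be illustrated with a minimal Python sketch. It assumes that the answer sentences are available as (source, sentence) pairs; this input shape and the function name `group_by_source` are assumptions made for illustration and do not reflect the actual output format of “qadr.qa”.

```python
from collections import defaultdict


def group_by_source(answers):
    """Group (source, sentence) pairs per source, keeping first-seen order."""
    grouped = defaultdict(list)
    for source, sentence in answers:
        grouped[source].append(sentence)
    return dict(grouped)


# Hypothetical input, using the sentences listed under question 7 below.
answers = [
    ("www.rsi-vereniging.nl/gezond/inrichting",
     "De Interne Arbodienst van de Universiteit Leiden geeft de volgende aanwijzingen :"),
    ("www.rsi-vereniging.nl/gezond/inrichting",
     "Een goed ingerichte werkplek is de eerste stap om RSI klachten te voorkomen ."),
    ("www.rsi-vereniging.nl/gezond/stap_rsi2002",
     "Zowel voor de"),
]

grouped = group_by_source(answers)

# Print in the layout used in this appendix: source line, then its sentences.
for source, sentences in grouped.items():
    print(f"{source} :")
    for sentence in sentences:
        print(sentence)
```

Because a Python dictionary keeps insertion order, the sources appear in the order in which their first sentence was returned.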
1. Wat is RSI?
www.rsi-vereniging.nl/rsi-vereniging/huisarts :
RSI is een verzamelnaam voor zeer uiteenlopende vormen van
overbelasting in het gebied van nek , schouders , armen en
ellebogen .
www.rsi-vereniging.nl/rsi-vereniging/handvat :
Beroepsziekte : met 2600 mensen per jaar in de WAO is RSI de
meest gesignaleerde beroepsziekte in 2001 .
www.rsi-vereniging.nl/rsi-vereniging/archief/muismetstaart :
Twee jaar geleden hebben we een grote beeldschermwerk-dag
georganiseerd , sinds die tijd is RSI gelukkig wel een
belangrijk onderwerp in Nederland . "
RSI is geen verkoudheid waar je na een paar weken weer vanaf
bent , zoveel is wel duidelijk .
www.rsi-vereniging.nl/rsi-vereniging/archief/internationale_rsi :
RSI is een overkoepelende term voor aandoeningen aan nek ,
schouder , arm en hand .
2. Waardoor kan RSI ontstaan?
www.rsi-vereniging.nl/rsi-vereniging/behandelplan :
RSI wordt veroorzaakt door een combinatie van risicofactoren .
www.rsi-vereniging.nl/rsi-vereniging/behandelmethoden :
Sommige ayurvedische therapeuten gaan ervan uit dat RSI wordt
veroorzaakt door een stoornis in de stofwisseling .
www.rsi-vereniging.nl/rsi-vereniging/archief/muismetstaart :
Caissicres , kappers , musici en lopende band-medewerkers : er
is nog een groot aantal andere beroepen die tot RSI leiden .
3. Welke spieren zijn betrokken bij RSI?
Review_RSI_Bulthuis_Elkhuizen :
Gezien de neiging van RSI zich uit te breiden , blijkt dat na
verloop van tijd vaak veel spieren betrokken zijn bij het proces
.
Ademhalingsspieren Andere spieren die betrokken kunnen zijn bij
RSI zijn de scaleni .
Uit onderzoek is bekend dat deze het vaakst betrokken is bij RSI
.
Toch zijn er wel een aantal spieren die opvallend vaak zijn
aangedaan bij RSI .
4. Welke beroepen worden getroffen door RSI?
www.arbobondgenoten.nl/arbothem/lichblst/rsi/tno_verzuim_en_rsi :
Tabel 4.4 Proportie RSI bij werknemers met 13 weken verzuim :
onderverdeling naar beroepsgroep
Verdeling naar beroepsgroep Aangezien beroepsgroepen en
bedrijfstakken voor een groot gedeelte overeen komen is het niet
verrassend dat er relatief veel verzuimende werknemers met
ambachtelijke en industriële beroepen RSI hebben .
39 verzuim door RSI gevonden werden zijn ambachtelijke en
industriële beroepen en dienstverlenende beroepen .
www.rsi-vereniging.nl/rsi-vereniging/grrsi :
In meer recent onderzoek is , behalve bevestiging van hoge
prevalenties in sommige van de genoemde beroepen , ook een hoge
prevalentie bij echografisten en in visverwerkende bedrijven
gevonden ( Ohl94 , Smi97 ) .
Uit een overzicht van buitenlandse onderzoeken naar een relatie
tussen arbeid en diverse klachten en aandoeningen die onder RSI
gerekend worden , komt een aantal beroepen met zeer hoge
prevalenties naar voren ( Hag95 ) .
5. Hoe is opkomende RSI te herkennen?
www.muisarm.nl/site/fysiologische_verklaring :
RSI uit zich in spier- , pees- , en zenuwklachten .
6. Welke oefeningen kan ik op mijn werkplek uitvoeren om RSI te
voorkomen?
www.rsi-vereniging.nl/gezond/inrichting :
De Interne Arbodienst van de Universiteit Leiden geeft de
volgende aanwijzingen :
www.rsi-vereniging.nl/gezond/Bewegenisgezond :
Zo blijven mensen achter de pc in vorm en kan RSI mogelijk
worden voorkomen .
www.rsi-vereniging.nl/rsi-vereniging/handvat :
Beroepsziekte : met 2600 mensen per jaar in de WAO is RSI de
meest gesignaleerde beroepsziekte in 2001 .
7. Hoe kan ik mijn werkplek het beste inrichten om RSI te voorkomen?
www.rsi-vereniging.nl/gezond/inrichting :
De Interne Arbodienst van de Universiteit Leiden geeft de
volgende aanwijzingen :
Een goed ingerichte werkplek is de eerste stap om RSI klachten
te voorkomen .
www.rsi-vereniging.nl/gezond/stap_rsi2002 :
Zowel voor de
8. Helpt pauzesoftware bij de bestrijding van RSI?
www.rsi-vereniging.nl/overrsi/links :
Het RSI-Kenniscentrum richt zich op kennis van effectiviteit van
therapeutische interventies , hulpmiddelen en voorlichting rond
de preventie en bestrijding van RSI .
www.muisarm.nl/site/opening :
Stichting RSI Nederland wil o.a. met deze website een
substantiële bijdrage leveren aan de informatieverstrekking over
RSI , en daarmee helpen bij de preventie en de bestrijding van
de muisarm .
9. Kan je door RSI in de WAO komen?
www.arbobondgenoten.nl/arbothem/lichblst/rsi/tno_verzuim_en_rsi :
Wel is van het aantal personen tussen 35 en 55 dat in de WAO
terechtkomt een relatief groter percentage door RSI in de WAO
gekomen ( 4,2% ) dan in de jongere ( 2,7% ) en oudere
leeftijdsgroepen ( 3,0% ) 3 .
Wel komen van het aantal werkende vrouwen er bijna twee keer
zoveel in de WAO ( door RSI ) als van het aantal werkende manWel is van het aantal personen tussen 35 en 55 dat in de WAO
terechtkomt een relatief iets groter percentage door RSI in de
WAO gekomen dan in de jongere en oudere leeftijdsgroepen3 .
Wel komen van het aantal werkende personen bijna twee keer
zoveel vrouwen in de WAO ( door RSI ) dan mannen .
Bedrijfssectoren waarin men een hoog risico loopt door RSI in de
WAO te komen zijn de reinigingsindustrie , de textielindustrie
en de steen- , cement- , glas- , en keramische industrie .
10. Komt RSI in Nederland vaker voor dan in de rest van Europa?
www.rsi-vereniging.nl/onderzoek/wrkrsi :
Nederland is geen koploper wat betreft RSI klachten in Europa .
Nederland is in Europa de laatste jaren koploper wat betreft
computergebruik ( Paoli , 1992 ; Paoli , 1997 ; Paoli & Merllié
, 2000 ; Andries e.a. , 2002 ) .
Ik wil nu overgaan tot de vragen van alledag die je vaak over
RSI hoort , en ik wil met u onderzoeken welke antwoorden het
wetenschappelijk onderzoek al heeft .
De geschiedenis leert dat RSI geen een modeverschijnsel is .
merck :
De ziekte komt veel in Europa voor en er zijn ook gevallen
bekend in de voormalige Sovjetunie , China , Japan en Australië
.
Appendix E: Evaluation interview general practitioners
The general practitioner is shown the prototypes of the information portal for general
practitioners and the graphical user interface of GIPS developed during this research. It is
explained that, eventually, GIPS will also be able to hold a dialogue to specify the question,
ICPC coding will be recognized, and keyboard input will be accommodated. Then the general
practitioner is asked to answer the following questions assuming both systems are working
perfectly.
Information needs
1. How frequently do you think you would use this portal or GIPS?
2. Can you recall any questions you encountered recently that you would have liked to enter in
one of these systems? If so, what were they?
3. Would you use it during patient care or after the consultation?
4. Could you also imagine yourself using it during patient visits?
Information sources
GIPS will retrieve its answers from online sources for patient education published by the NHG,
patient organizations, etc.
5. Do you think these sources are appropriate?
6. Are there any other sources you would like the system to search in?
Computer use
7. Do you think using this portal would save time compared to using the search engines you
are using now?
8. Do you think you would use the printing option provided by GIPS?
9. Do you like the presentation of the answers?
10. Are there any negative aspects of the portal or GIPS?
11. Do you think these systems could be an improvement for your work?
Appendix F: Questionnaire response formulation
The participants in this questionnaire were divided into two groups, each of which completed a
different questionnaire. Each questionnaire contains twelve question-answer pairs: half of the
answers were generated by GIPS and the other half are baseline answers. In the first
questionnaire, the first, second, fourth, seventh, ninth, and tenth answers are GIPS-generated. In
the second questionnaire it is the other way round. For each answer the participants are asked
how useful the provided information is with respect to the question, how much irrelevant
information the answer contains, and how coherent the answer is.
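The counterbalanced assignment described above can be written out as a small Python sketch. The positions of the GIPS answers in the first questionnaire are taken from the text; the function name `answer_sources` and the constant names are assumptions introduced only for this illustration, and the sketch was not part of the evaluation itself.

```python
QUESTION_COUNT = 12
# 1-based positions answered by GIPS in the first questionnaire (from the text).
GIPS_IN_FIRST = {1, 2, 4, 7, 9, 10}
# The three criteria and the five-point scale shown in the questionnaires.
CRITERIA = ("usefulness", "irrelevance", "coherence")
SCALE = ("very high", "high", "neutral", "low", "very low")


def answer_sources(questionnaire: int) -> dict[int, str]:
    """Map each question number to 'GIPS' or 'baseline' for questionnaire 1 or 2."""
    if questionnaire not in (1, 2):
        raise ValueError("questionnaire must be 1 or 2")
    sources = {}
    for number in range(1, QUESTION_COUNT + 1):
        gips_in_first = number in GIPS_IN_FIRST
        # The second questionnaire swaps the two conditions.
        is_gips = gips_in_first if questionnaire == 1 else not gips_in_first
        sources[number] = "GIPS" if is_gips else "baseline"
    return sources


first = answer_sources(1)
second = answer_sources(2)
```

With this assignment every question is seen with a GIPS answer by one group and with a baseline answer by the other, and each participant rates six answers of each kind.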
First questionnaire
Rating scale (per criterion): Very high | High | Neutral | Low | Very low
1. Welke spieren zijn betrokken bij RSI?
Gezien de neiging van RSI zich uit te breiden,
blijkt dat na verloop van tijd vaak veel spieren
betrokken zijn bij het proces. Het aantal spieren
dat bij verschillende RSI patiënten mee kan doen,
is dan ook groot. Een beschrijving daarvan zou
haast neerkomen op het dupliceren van een
anatomische atlas. Toch zijn er wel een aantal
spieren die opvallend vaak zijn aangedaan bij
RSI. De meest beruchte spier is de
monnikskapspier (trapezius). Uit onderzoek is
bekend dat deze het vaakst betrokken is bij RSI.
Dat is ook niet zo verwonderlijk, want deze spier
zorgt voor het optillen en stabiliseren van de
schouders.
Usefulness
Irrelevance
Coherence
2. Hoe voorkom je likdoorns?
Doordat likdoorns meestal ontstaan door slecht
passende schoenen, kunnen deze verdwijnen
wanneer beter passend schoeisel wordt
gedragen.
Usefulness
Irrelevance
Coherence
3. Komt RSI in Nederland vaker voor dan in de rest
van Europa?
Nederland heeft niet meer RSI-klachten dan
andere landen in Europa. De geschiedenis leert
dat RSI geen modeverschijnsel is. In de
vleesindustrie en andere beroepen met veel
repeterende armbewegingen en in kantoorwerk
kwamen de klachten al voor voordat de term RSI
bestond.
Usefulness
Irrelevance
Coherence
4. Wat is RSI?
RSI is een verzamelnaam voor zeer
uiteenlopende vormen van overbelasting in het
gebied van nek, schouders, armen en ellebogen.
Het kan zijn dat de arts aanvullend onderzoek
laat verrichten naar andere ziektebeelden met
soortgelijke symptomen.
Usefulness
Irrelevance
Coherence
5. Kan je door RSI in de WAO komen?
Specifieke risicoberoepen zijn horecapersoneel,
conciërges en schoonmakers, gezinshulpen en
bejaardenverzorgers en 'overige ambachtelijke
beroepen'. Bedrijfssectoren waarin men een hoog
risico loopt door RSI in de WAO te komen zijn de
reinigingsindustrie, de textielindustrie en de
steen-, cement-, glas-, en keramische industrie.
Administratieve beroepen vormen in het huidige
onderzoek geen risicogroep voor verzuim door
RSI; tevens vormen de zakelijke dienstverlening
en de overheid geen risicogroepen voor WAO-intrede door RSI.
Usefulness
Irrelevance
Coherence
6. Wat zijn de verschijnselen van griep?
Elke winter wordt 5 tot 20 % van de Nederlands
bevolking getroffen door griep. Griep wordt
veroorzaakt door het zogenaamde influenzavirus,
met verschijnselen van koorts, neusverkoudheid,
hoesten, hoofdpijn, spierpijn en vermoeidheid.
Omdat het virus erg besmettelijk is kan iedereen
griep krijgen en zal je meestal ruim een week het
bed moeten houden. Gezonde mensen knappen
daarna weer op door rust en veel drinken, maar
bij mensen met een chronische aandoening,
patiënten met een verminderde weerstand,
bewoners van verpleeg-verzorgingshuizen en
ouderen boven de 65 jaar kan de ziekte ernstig
verlopen.
Usefulness
Irrelevance
Coherence
7. Hoeveel procent van de Nederlandse bevolking
heeft psoriasis?
De kans op psoriasis is gerelateerd aan het
aantal familieleden dat deze aandoening heeft.
Usefulness
Irrelevance
Coherence
8. Hoe is opkomende RSI te herkennen?
Verklaringen
RSI uit zich in spier-, pees-, en zenuwklachten.
Een combinatie van onderstaande mechanismen
veroorzaakt de problemen.
Usefulness
Irrelevance
Coherence
9. Waardoor kan RSI ontstaan?
RSI wordt veroorzaakt door een combinatie van
risicofactoren.
Usefulness
Irrelevance
Coherence
10. Wat is het verschil tussen een vlokkentest en een
vruchtwaterpunctie?
Een vlokkentest kan in plaats van een
vruchtwaterpunctie worden gedaan, tenzij voor
een onderzoek juist vruchtwater nodig is,
bijvoorbeeld voor het bepalen van de concentratie
alfafoetoproteïne in het vruchtwater.
Usefulness
Irrelevance
Coherence
11. Wat is slapeloosheid?
Artsen classificeren slapeloosheid als primair of
secundair. Primaire slapeloosheid is een lang
bestaande aandoening die weinig of geen
verband lijkt te hebben met enige spanning of
bijzondere gebeurtenissen in het leven. De
secundaire vorm wordt veroorzaakt door pijn,
angst, geneesmiddelen, depressie of extreme
spanningen.
Usefulness
Irrelevance
Coherence
12. Welke beroepen worden getroffen door RSI?
Beroepsgroepen waarin men een meer dan
gemiddeld risico loopt op (kort) verzuim door RSI
zijn beroepen in de transportsector en
dienstverlenende beroepen. Beroepsgroepen die
als risicogroep voor langdurig (meer dan 13
weken durend) verzuim door RSI gevonden
werden zijn ambachtelijke en industriële
beroepen en dienstverlenende beroepen.
Specifieke risicoberoepen zijn horecapersoneel,
conciërges en schoonmakers, gezinshulpen en
bejaardenverzorgers en 'overige ambachtelijke
beroepen'.
Usefulness
Irrelevance
Coherence
Second questionnaire
Rating scale (per criterion): Very high | High | Neutral | Low | Very low
1. Welke spieren zijn betrokken bij RSI?
De één heeft vooral last in de schouders, bij de
ander ontstaan de klachten in de pols of in de
arm. Gezien de neiging van RSI zich uit te
breiden, blijkt dat na verloop van tijd vaak veel
spieren betrokken zijn bij het proces. Het aantal
spieren dat bij verschillende RSI patiënten mee
kan doen, is dan ook groot. Een beschrijving
daarvan zou haast neerkomen op het dupliceren
van een anatomische atlas. Toch zijn er wel een
aantal spieren die opvallend vaak zijn aangedaan
bij RSI. De meest beruchte spier is de
monnikskapspier (trapezius). Uit onderzoek is
bekend dat deze het vaakst betrokken is bij RSI.
Dat is ook niet zo verwonderlijk, want deze spier
zorgt voor het optillen en stabiliseren van de
schouders.
Usefulness
Irrelevance
Coherence
2. Hoe voorkom je likdoorns?
Eeltplekken kunnen worden voorkomen door de
irritatiebron weg te nemen of, als dit niet mogelijk
is, handschoenen, beschermende materialen,
bijvoorbeeld ringen te dragen. Doordat likdoorns
meestal ontstaan door slecht passende
schoenen, kunnen deze verdwijnen wanneer
beter passend schoeisel wordt gedragen. Een
middel dat de hoornlaag losweekt, bijvoorbeeld
salicylzuur, kan likdoorns sneller doen verdwijnen.
Usefulness
Irrelevance
Coherence
3. Is RSI more common in the Netherlands than in the rest of
Europe?
History shows that RSI is not a passing fad.
Usefulness
Irrelevance
Coherence
4. What is RSI?
Do not expect a definitive diagnosis at the first examination. RSI
is a collective name for widely varying forms of overload in the
area of the neck, shoulders, arms and elbows. The doctor may order
additional tests for other conditions with similar symptoms.
Usefulness
Irrelevance
Coherence
5. Can you end up in the WAO (disability benefit scheme) because
of RSI?
Business sectors with a high risk of ending up in the WAO because
of RSI are the cleaning industry, the textile industry and the
stone, cement, glass and ceramics industry.
Usefulness
Irrelevance
Coherence
6. What are the symptoms of flu?
Flu is caused by the so-called influenza virus, with symptoms of
fever, a head cold, coughing, headache, muscle ache and fatigue.
Because the virus is very contagious, anyone can get the flu, and
you will usually have to stay in bed for well over a week. Healthy
people then recover through rest and drinking plenty of fluids, but
in people with a chronic condition, patients with reduced
resistance, residents of nursing and care homes and elderly people
over 65 the disease can take a serious course. They are therefore
vaccinated against flu every year by their general practitioner,
the so-called influenza vaccination, which offers 70-80% protection
against getting the flu (influenza).
Usefulness
Irrelevance
Coherence
7. What percentage of the Dutch population has psoriasis?
The condition has a hereditary component. The likelihood of
psoriasis is related to the number of family members who have the
condition. Psoriasis cannot be cured, but in most cases it can be
treated well.
Usefulness
Irrelevance
Coherence
8. How can emerging RSI be recognized?
RSI manifests itself in muscle, tendon and nerve complaints.
Usefulness
Irrelevance
Coherence
9. What can cause RSI?
Integrated approach
RSI is caused by a combination of risk factors. Given the wide
variety of factors, it is really quite obvious that the approach to
the complaints should address all aspects that play a role in the
development of complaints.
Usefulness
Irrelevance
Coherence
10. What is the difference between chorionic villus sampling and
amniocentesis?
Chorionic villus sampling is used to detect certain abnormalities
of the fetus, usually between the tenth and twelfth week of
pregnancy. Chorionic villus sampling can be done instead of
amniocentesis, unless the examination specifically requires
amniotic fluid, for example to determine the concentration of
alpha-fetoprotein in the amniotic fluid. Before the test,
ultrasound is used to establish whether the fetus is alive, what
the age of the fetus is and where the placenta is located.
Usefulness
Irrelevance
Coherence
11. What is insomnia?
Primary insomnia is a long-standing condition that appears to have
little or no connection with any stress or particular life events.
Usefulness
Irrelevance
Coherence
12. Which occupations are affected by RSI?
Occupational groups identified as risk groups for long-term absence
(lasting more than 13 weeks) due to RSI are craft and industrial
occupations and service occupations.
Usefulness
Irrelevance
Coherence