Question answering for general practitioners
An information presentation module for the IMIX demonstrator

Mieke van Langen (M.C.G. van Langen)
Universiteit Twente, October 2005

Study: Business Information Technology
Faculty: Electrical Engineering, Mathematics and Computer Science (EEMCS)
Disciplines: Language, Knowledge and Interaction (TKI); Information Systems and Change Management (IS&CM)
Supervisors: dr. M. Theune (TKI), dr.ir. H.J.A. op den Akker (TKI), dr.ir. A.A.M. Spil (IS&CM)

Preface
This paper is my master’s thesis for the study Business Information Technology at the University of Twente. My assignment was part of the IMIX project, an NWO research program aiming at the development of knowledge and technology needed to find specific answers to specific questions in Dutch documents. I was allowed to choose a research subject relevant to the IMIX project myself. During my studies I focused on both language technology and healthcare, and I wanted to integrate these subjects in my master’s thesis. The IMIX project provided a very good context for this combination, because the results of this project are integrated in a question answering system (which incorporates a lot of language technology) for the medical domain. Together with my supervisors I decided to investigate the suitability of such a question answering system for professional use by general practitioners, and to develop part of the language technology that would be needed to accommodate the requirements of this special group. This assignment turned out to be a great combination of all the subjects I encountered during my studies. I hope you enjoy reading it!
Ede, October 2005
Mieke van Langen

Acknowledgements (translated from Dutch)
My graduation assignment went fairly smoothly. Although it was a large project and I worked alone much of the time because I graduated internally, I managed to keep reasonably close to the schedule, and the result is a report I am very pleased with. This would not have been possible without the help of others, however. I am therefore very grateful for the support and cooperation that so many people gave me during this assignment. First of all, of course, there are my supervisors, who took ample time to help me choose a good subject and afterwards spent much time and effort critically reading and commenting on my reports. Second, I owe many thanks to Anita Verhoeven and the general practitioners I was allowed to interview. Anita Verhoeven welcomed me warmly in Groningen and clarified a great deal about the possibilities for, and the practice of, information seeking by general practitioners. I am very grateful to the five general practitioners I interviewed for my research (some of them even twice) for the time they were able to free up for me, especially given the time pressure under which they currently have to work. Third, in evaluating my design I received much help from Wauter Bosma and all the participants who filled in my questionnaire. Wauter Bosma had access to a working version of the IMIX demonstrator and was always willing to generate answers to my example questions. And last, but certainly not least, there are my boyfriend and my parents. They supported me in every respect, not only during my graduation assignment but throughout the rest of my studies. Because of my health problems my time as a student was not an easy period, but thanks to their support I was able to complete my studies after all and can now, in good health, make a new start as an engineer. Thank you all very much!
Mieke

Executive summary
This research was part of the IMIX (Interactive Multimodal Information eXtraction) project.
This project is an NWO (Netherlands Organization for Scientific Research) research program aiming at the development of knowledge and technology needed to find specific answers to specific questions in Dutch documents. The results of the IMIX project are integrated in an interactive multimodal question answering system for the medical domain. In their work, general practitioners both need and are confronted with large quantities of information. Therefore, it was investigated whether such a medical question answering system would be suitable for professional use by general practitioners, and how answers should be presented for this user group. Based on literature research and interviews with general practitioners, it was concluded that general practitioners could use a medical question answering system primarily for answering questions for patient education. Such a question answering system should be accessible via the Internet, should search for answers only in information sources that medical professionals have marked as reliable, should respond quickly enough to be used during medical consultations, and should recognize ICPC codes and other medical terminology in the question. Further, to be able to use this system, general practitioners must have access to a computer with an Internet connection and a printer in their consulting rooms. In this way, they can search for answers to their questions during the medical consultation and, if desired, print the answer and give it to the patient. In addition, a web portal for general practitioners has been designed that could help them keep an overview of the different types of information they can find on the Internet. Answers retrieved by a question answering system for general practitioners should be presented together with a link to their sources and a checkbox with which the general practitioner can indicate whether he wants the answer to be printed.
In this way different answers to the same question retrieved from different sources can be integrated into one consistent view. In addition, algorithms have been developed that integrate different answers retrieved from the same source into one concise answer, possibly extended with sentences from their context. Finally, it was found that general practitioners would also like to have information technology to search for dermatological images, and for contact information of regional health professionals and medical organizations. It is therefore recommended that the applicability of image retrieval and information extraction technology for general practitioners is also investigated.

Management summary (translated from Dutch)
This research is part of the IMIX (Interactive Multimodal Information eXtraction) project. This is a research program of NWO (Netherlands Organization for Scientific Research) that aims to develop the knowledge and technology needed to find specific answers to specific questions in Dutch-language documents. The results of the IMIX project are integrated in an interactive multimodal question answering system for the medical domain. Because general practitioners need, and can make use of, a large quantity and variety of information, it was investigated whether such a question answering system would be suitable for professional use by general practitioners, and how this system should present answers to this user group. Based on literature research and interviews with general practitioners, it is concluded that a medical question answering system for general practitioners would be suitable mainly for answering questions for patient education. Such a question answering system should be available via the Internet; it should search for answers only in information sources that medical professionals have indicated to be reliable; its response time should be short enough for questions to be answered during the consultation; and it must be able to understand medical terminology and ICPC codes in the question. In addition, to be able to use such a system, general practitioners must have a computer with an Internet connection and a printer in their consulting room. In this way they can answer their questions during the consultation and, if desired, print the answers to hand to the patient. Furthermore, a web portal has been made to offer general practitioners an overview of all the different kinds of information they can find on the Internet. Answers found by a question answering system for general practitioners must be presented with links to the sources in which they were found, and with checkboxes so that the general practitioner can indicate for each answer whether he wants to print it. In this way answers to the same question that were found in different sources are integrated into one overview. In addition, algorithms have been developed with which different answers from the same source can be integrated into one concise answer, and with which answers can optionally be extended with extra sentences from the surroundings of the answer. Finally, general practitioners also appear to need information technology with which they can search for dermatological pictures and for the address details of local health care organizations. It is therefore recommended that the suitability of image retrieval and information extraction technology for use by general practitioners also be investigated, and that systems be developed with which this information can be found.
Contents
1 Introduction
1.1 Context of the research
1.2 Research question
1.3 Research method
1.4 Structure of the paper
2 Literature on information use by general practitioners
2.1 The general practice
2.2 Information needs
2.3 Information sources
2.4 Computer use
2.5 Conclusions
3 Interviews with general practitioners
3.1 Method
3.2 Results
3.3 Conclusions
3.4 Discussion
4 The information presentation module (GIPS)
4.1 The IMIX demonstrator
4.2 Requirements
4.3 Design
4.4 An information portal for general practitioners
5 Response formulation
5.1 Related work
5.2 GIPS
5.3 Answer integration
5.4 Answer extension
5.5 Implementation
6 Evaluation
6.1 Evaluation of the entire design
6.2 Evaluation of the answer extension algorithm
7 Conclusions
8 Discussion
References
Appendix A: Questions
Appendix B: Interview general practitioners
Appendix C: Screenshots
Appendix D: Questions and answers
Appendix E: Evaluation interview general practitioners
Appendix F: Questionnaire response formulation

1 Introduction
During and after consultations, general practitioners use a lot of information. Besides the information they receive from the patient, they look up information on controversial or rare topics, diagnosis, treatment and investigations, and information for patient education [MCW05]. This information is needed not only in consulting rooms, but also during patient visits. Due to the rise of evidence-based medicine the use of medical knowledge by general practitioners has become even more important, but the amount of medical information is also increasing rapidly [VER99].
To address the problem of quickly finding the relevant information in these large document collections, intelligent information technology is needed. This master’s thesis investigates how question answering technology can help general practitioners meet their information needs.

1.1 Context of the research
This research is part of the IMIX project, which concerns the development of question answering technology for Dutch. Section 1.1.1 provides information on question answering and on related work relevant for this research. Section 1.1.2 gives a general description of the IMIX project.

1.1.1 Question answering
A question answering (QA) system “accepts questions in natural language form, searches for answers over a collection of documents and extracts and formulates concise answers” [MS03]. A general QA system architecture consists of the following components (see Figure 1):
• question analysis;
• document retrieval;
• answer extraction;
• answer selection [JMR03]; and
• response formulation [MS03].

[Figure 1 General question answering system architecture. Elements shown: (graphical) user interface, question, documents, indexer, the five components listed above, (NLP) resources, and answer.]

The question analysis component transforms a natural language question into a retrieval query and classifies the question with respect to its expected answer type (e.g. the name of a person, a date, a location, etc.). The document retrieval component takes the retrieval query as input and returns a set of documents relevant for the query. These documents are retrieved from a document collection with the aid of an indexer. The answer extraction component extracts possible answers from the retrieved documents. The answer selection component returns a ranked list of the extracted answers.
Finally, the response formulation component formulates a natural language response to the natural language question. All five components may make use of (natural language processing) resources. A (graphical) user interface may be used to facilitate the interaction between the user and the QA system. Unlike traditional information retrieval (IR) technology, QA systems return only concise answers to the user, instead of entire documents in which the user has to find the answer himself. In fact, IR systems are used as the document retrieval component in QA systems. QA thus extends IR.

Two attempts are being made at developing open domain QA systems for Dutch. ‘Open domain’ refers to the document collection from which the answers are extracted. An open domain QA system could search in any type of document collection for answers in all possible domains. A ‘closed domain’ QA system is targeted at documents on a specific domain and only answers questions on this domain, for example medical questions. One of the open domain Dutch QA projects, named “Question answering for Dutch using dependency relations”, is carried out at the Rijksuniversiteit Groningen [BOU03]. This project makes use of existing QA technology combined with dependency analysis based on full syntactic parsing of both the question and the potential answer fragments. For this purpose the Alpino Dependency Parser for Dutch [BNM01] is used. The other Dutch project, carried out at the University of Amsterdam, has resulted in a multi-stream architecture for question answering [JMR03]. In this architecture each stream represents a different approach to QA, such as table lookup, pattern matching, an existing QA system for English combined with automatic translation, and web answering. Each stream has its own strengths and thus suits some question types more than others. The system’s final answer is taken from the combined pool of answers generated by the suitable QA streams.
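The multi-stream idea can be illustrated with a minimal Python sketch. It is not the architecture of [JMR03]: the two toy streams, the score values, and the rule of ranking pooled answers by summed score are assumptions made here purely for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A candidate answer paired with a confidence score (hypothetical convention).
Candidate = Tuple[str, float]
# A stream maps a question to zero or more candidate answers.
Stream = Callable[[str], List[Candidate]]


def table_lookup_stream(question: str) -> List[Candidate]:
    """Toy stand-in for a table lookup stream: exact match in a tiny fact table."""
    facts = {"what does rsi stand for?": "repetitive strain injury"}
    answer = facts.get(question.strip().lower())
    return [(answer, 0.9)] if answer else []


def pattern_matching_stream(question: str) -> List[Candidate]:
    """Toy stand-in for a pattern matching stream: fires on 'stand for' questions."""
    if "stand for" in question.lower():
        return [("repetitive strain injury", 0.6), ("rapid sequence induction", 0.2)]
    return []


def pool_answers(question: str, streams: Dict[str, Stream]) -> List[Candidate]:
    """Run every stream, pool the candidates, and rank them by summed score.

    Summing scores is an assumed combination rule, chosen only to keep the
    sketch short; a real system may weight streams per question type.
    """
    pooled: Dict[str, float] = defaultdict(float)
    for stream in streams.values():
        for answer, score in stream(question):
            pooled[answer] += score
    return sorted(pooled.items(), key=lambda item: item[1], reverse=True)


streams = {"table": table_lookup_stream, "pattern": pattern_matching_stream}
ranked = pool_answers("What does RSI stand for?", streams)
print(ranked[0][0])
```

In this sketch a stream that does not suit a question simply returns no candidates, so only the suitable streams contribute to the pool.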
Neither project concentrates on response formulation. The result of these QA systems is thus a ranked list of answers returned by the answer selection component.

1.1.2 The IMIX project
The IMIX (Interactive Multimodal Information eXtraction) project is an NWO (Netherlands Organization for Scientific Research) research program aiming at the development of knowledge and technology needed to find specific answers to specific questions in Dutch documents [NWOa]. The results of this research program are integrated in the IMIX demonstrator, an interactive multimodal QA system for the medical domain. It consists of two parts: a text-based part covering the complete medical domain and a multimodal part focusing on the RSI domain only [OS04]. The demonstrator is targeted towards naïve users who have no knowledge of the domain and little technical knowledge [VP05]. Compared to the other Dutch QA systems described in the previous section, the IMIX demonstrator has a restricted domain. However, in contrast with those systems, it incorporates modules for response formulation and for speech input and output. IMOGEN (Interactive Multimodal Output Generation) is part of the IMIX project [NWOb]. It aims at the development of multimodal information presentation modules for the output of QA systems. Besides response formulation, these modules include speech generation and the use of graphics. The information presentation modules developed for IMOGEN constitute the IMOGEN demonstrator, which is part of the IMIX demonstrator. The IMOGEN sub modules will first be developed for the RSI domain only. To investigate whether a QA system like the IMIX demonstrator is also suitable for professional use, in this master’s thesis an IMOGEN sub module is developed that is targeted towards general practitioners.
The input for this module consists of ranked lists of answers (produced by the IMIX demonstrator’s question answering modules). The output is a presentation of the answers that would be most convenient for Dutch general practitioners.

1.2 Research question
The research question to be answered in this master’s thesis is: Which information needs of Dutch general practitioners can be satisfied by a question answering system and how should the answers be presented? To answer this research question, the following sub questions have to be answered:
1. What are the information needs of Dutch general practitioners?
2. How are these information needs satisfied now?
3. How do Dutch general practitioners use and appreciate computers in their work?
The integration of the IMOGEN sub module built for this research with the general practitioner information systems currently in use falls outside the scope of this research. The sub module will not be implemented in general practices, because the IMIX demonstrator only intends to show the status, progress, and results of the research carried out in the IMIX program [NWOa]. The sub module is thus only developed to show whether and under which conditions a system like the IMIX demonstrator suits professional use.

1.3 Research method
To answer the research questions described above, information is needed on the information needs and use of Dutch general practitioners and on the way they use and appreciate computers in their work. A lot of research has been carried out on the information seeking behavior of professionals from different disciplines. Leckie et al. [LPS96] developed a model of the information seeking of professionals derived from research on engineers, health care professionals, and lawyers. This model was found to be appropriate to explain the information seeking behavior of Dutch general practitioners [VER99]. The model is presented in Figure 2.
[Figure 2 A model of the information seeking of professionals. Elements shown: work roles, tasks, characteristics of information needs, sources of information, awareness of information, information is sought, outcomes, and feedback.]

According to this model, work roles and the related tasks undertaken by professionals prompt particular information needs, which in turn give rise to the information seeking process. The information needs arising from a specific task are influenced by a number of variables, including factors relating to the individual (age, profession, specialization, career stage, geographic location, etc.), and general characteristics of the information needs. Examples of these characteristics are: context (internally or externally prompted), frequency (recurring or new), predictability (anticipated or unexpected), importance (urgency), and complexity. The information seeking process elicited by an information need is influenced (as depicted in the model) by the sources of information, the awareness of information, and the outcomes. In addition, the outcomes may influence the sources of information and the awareness of information. Sources of information can be formal or informal, internal or external, oral or written, and personal (own knowledge and experience). Awareness of information refers to direct or indirect knowledge of the various information sources and the perceptions about the information seeking process or about the retrieved information. Variables of this factor include familiarity and prior success with a certain search strategy or information source, trustworthiness, packaging (medium or format), timeliness, cost (financial, psychological, physical), quality, and accessibility (physical proximity, language). Outcomes are the results of the information seeking process. The optimal outcome is that the information need is met and the professional accomplishes his task.
However, the outcome may also be that the information need is not satisfied and further information seeking is required. In this case feedback is provided (possibly altering the factors influencing the information seeking process), and a second round of information seeking is undertaken. It is also possible that an outcome from one task associated with a specific role unexpectedly benefits the professional in another role. A QA system would be one of the possible sources of information, thus influencing the information seeking process. Its goal would be to make the information seeking process more efficient and to improve the outcomes. Therefore, it should improve some of the variables related to the awareness of information. The QA system is also an information seeker itself, however. Its performance is thus in turn influenced by the information sources it uses. The relation of the QA system to the information seeking model is depicted in Figure 3. In this figure the QA system is positioned inside the sources of information ellipse, to emphasize that it would be only one of the information sources a general practitioner could use. This source can be accessed via a user interface, which is depicted separately. The information sources used by the QA system are also positioned within the larger sources of information ellipse, because these are sources that are probably already available to general practitioners. It is expected that only a subset of the general practitioners’ information needs can be met by a QA system. This is depicted by a questions box within the information needs box. The outcomes a QA system returns are also a subset of all search results, depicted by the answers box within the outcomes box.
To answer the research question, it must first be investigated for which work roles, tasks, and information needs of Dutch general practitioners a QA system could support the information seeking process, and which variables related to the awareness of information could be improved by a QA system as opposed to other information sources. In addition, because a QA system is an information seeker itself, it must be investigated which information sources it could consult, and how it should search these sources for relevant information. Finally, because a QA system runs on a computer, it must be investigated how Dutch general practitioners use and appreciate computers in their work. To answer these questions, literature on general practitioners and on medical informatics has been reviewed. In addition, interviews have been conducted with a few Dutch general practitioners and an expert on information use by Dutch general practitioners. Secondly, based on the findings from literature and the interviews, requirements have been specified for the IMIX demonstrator, and a design has been made for an IMOGEN sub module for general practitioners consisting of a response formulation component and a graphical user interface. Prototypes have been constructed for both components. These prototypes have been evaluated by naïve users and a subset of the general practitioners who participated in the previous interviews.
[Figure 3 A QA system integrated with the model of information seeking. The model of Figure 2 extended with a questions box, a question answering system with its user interface and its own sources of information, and an answers box.]

Finally, based on the results of the evaluation of the information presentation module, conclusions were drawn and recommendations were made concerning the conditions under which a system like the IMIX demonstrator would suit professional use.

1.4 Structure of the paper
This paper is organized as follows. Chapter 2 describes existing literature on general practitioners’ work roles and tasks, information needs, information sources, and computer use. The interviews conducted with Dutch general practitioners are described in chapter 3. In chapter 4 the results of the literature research and the interviews are related to the design of the IMIX demonstrator, and the design for the information presentation module for general practitioners is presented. Chapter 5 describes the response formulation technology developed for the information presentation module. In chapter 6 the results of this research are evaluated. Finally, in chapters 7 and 8 the conclusions of this research are presented and discussed.

2 Literature on information use by general practitioners
As in all areas, the volume of information in the medical sciences grows exponentially [VER99]. Detmer and Shortliffe [DS97] stated already in 1997 that every year more than 360,000 articles are published in medical journals, making knowledge diffusion to physicians rather slow. They refer to a study which found that two years after wide publication, only 50% of the general practitioners knew that laser surgery could save the sight of some of their diabetic patients.
Westberg and Miller [WM99] state that “because of the ever-increasing size of biomedical literature and the complexity of modern health care practices, physicians could spend hours to weeks reading texts and seeking expert opinions for each patient they encounter.” It is thus increasingly difficult, but also increasingly important for physicians to find the information they need. In this chapter the information use by general practitioners is investigated. In section 2.1 the work roles and tasks of Dutch general practitioners are described. The information needs of general practitioners are described in section 2.2. Section 2.3 deals with the information sources used to pursue these needs. In section 2.4 the role and use of computers in the general practice are described. Finally, conclusions are drawn with respect to the possibility of using a question answering (QA) system for information seeking by general practitioners.

2.1 The general practice
This research deals specifically with Dutch general practitioners. Dutch general practitioners work in solo practices, in duo or group practices, or in primary health care centers. In contrast to their American colleagues, they never work in hospitals [VER99]. The Dutch general practitioner acts as a gatekeeper to secondary care. He is therefore expected to manage a wide range of medical problems, giving rise to high information needs. General practitioners not only see patients, but also have to learn, perform research, educate and manage. Verhoeven [VER99] discerns five different roles of general practitioners, see Table 1. Each role is associated with different tasks and thus with different information needs.
Work role               Tasks
Service provider        Patient care
Learner                 Professional reading, attending conferences and meetings
Researcher              Writing publications, speaking at conferences
Educator                Planning, curriculum development
Administrator/manager   Managing own practice
Table 1 Work roles and tasks of general practitioners

The role of service provider is common to all professionals [LPS96]. Physicians spend most of their time in this role and the tasks associated with patient care create their greatest need for information. Professionals also have a role of learner. They have to keep up with the advancements in their field, and upgrade their education and skills by taking courses [LPS96]. Tasks associated with this role include professional reading, and attending conferences and meetings. The third role, researcher, is not performed by all general practitioners. Most Dutch general practitioners primarily provide patient care. Only some of them combine this with research. Tasks associated with the role of researcher are writing publications and speaking at conferences. As educators, general practitioners teach medical students and general practice trainees. Tasks associated with this role include planning and curriculum development. Finally, as administrators and managers, general practitioners have to manage their own practice. In this research only the role of service provider is considered, because this is the role in which medical questions may arise that should be answered quickly, thus making a question answering system potentially useful. In the roles of learner and researcher the general practitioner also wants medical questions to be answered, but in these roles complete articles are generally needed to (scientifically) answer the questions, not just concise answers. The work as service provider consists of medical consultations both in the general practitioner’s consulting room and at patients’ homes.
Medical consultations generally consist of the following phases [DN96]: data gathering and recording; searching databases (medical records, suitable drugs, etc.); choosing a course of action; documentation; providing explanations; and arranging any future consultations. The phase of data gathering and recording provides the input for the phase of searching databases. The latter phase especially concerns searching information and might thus be supported by a QA system. The outcome of this phase is used to support the phases of choosing a course of action and providing explanations to the patient.

In principle, patient visits consist of the same phases as consultations in the consulting room. However, during patient visits the general practitioner cannot make use of the same resources he uses in his consulting room. Therefore, the phases of searching databases and documentation might be a little harder, also complicating the phases of choosing a course of action and providing explanations. A mobile device containing a question answering system could really improve the general practitioner’s possibilities of searching databases during patient visits and therefore potentially also improve patient care.

2.2 Information needs

A lot of research has been done on the needs and use of medical knowledge by general practitioners. Quantitative estimates of the information needs of physicians in their role of service provider vary greatly, however. Different studies result in different estimates because they differ on definition of terms, subjects, setting, and method of data collection. Gorman [GOR95] tries to structure these studies by defining different types of information and different types of information needs. The information types are described in section 2.2.1 and the information needs in section 2.2.2. General practitioners encounter a lot of obstacles when they try to address their information needs, however.
The factors determining whether an information need is pursued and satisfied or not are discussed in section 2.2.3.

The information needs described in the following sections are only those of the general practitioner in his role of service provider. Therefore, only questions arising during medical consultations are taken into account. Information needs that are met by regularly reading medical journals or randomly “browsing” for information without a real question in mind are considered to be the information needs of the general practitioner as a learner.

2.2.1 Types of information

Gorman [GOR95] identifies five types of information used by physicians (see Table 2). The first type, patient data, refers to information about a specific person. It includes the patient’s medical history, observations from physical examination, and results of diagnostic testing. This information is usually obtained from the patient himself, his family and friends, and the medical record. These data fall outside the scope of a potential QA system, because a patient cannot sensibly be consulted through a QA system and it is assumed that electronic medical records are organized well enough to make a QA system superfluous. Patient data might be included, however, in the questions a general practitioner would submit to a QA system.

Type of information      Description
Patient data             Refer to a single person
Medical knowledge        Generalizable to many persons
Population statistics    Aggregate patient data
Logistic information     How to get the job done
Social influences        How others get the job done

Table 2 Information types

Medical knowledge is general information that is applicable to the care of all patients. It includes scientific medical knowledge, but also the accumulated informal experience of the general practitioner.
Medical knowledge could be subclassified according to classic textbook categories (etiology, pathophysiology, clinical manifestations, diagnosis and differential diagnosis, treatment, and prevention) or according to organ system domain categories (dermatology, rheumatology, neurology, etc.). In addition to the classic textbook categories, Magrabi et al. [MCW05] discern a separate category, namely patient education. Questions about patient education deal with the need for information to better inform patients about their conditions or to increase their compliance with the treatment. Whereas information on, for example, etiology, diagnosis, or treatment is used to support the medical consultation phase of choosing a course of action, information for patient education is used for the phase of providing explanations.

Population statistics refer to aggregated data about groups or populations of patients. This includes formal population statistics, but physicians also use their personal knowledge of recent illness patterns in the community as a form of informal epidemiological information.

Logistic information refers to local knowledge about how to get the job done, often specific to a practice setting or payment mechanism. As examples Gorman [GOR95] mentions information about required forms, coverage by insurers, and referral lists of medical care organizations (which is typical for the American situation). He doesn’t mention, however, information about which physician a patient should be referred to from a medical instead of an economic point of view. I assume general practitioners sometimes need to find out which physician performs a particular treatment that they consider appropriate for a particular patient. This information seems to be on the boundary of medical knowledge and logistic information. Logistic information is usually local and can best be obtained from human sources such as office and hospital staff or colleagues.
This type of information is therefore not suitable for a QA system.

Social influences refer to knowledge about the expectations and beliefs of others, especially colleagues, but also patients, families, and others in the community. This type of information evidently cannot be provided by a QA system. It may, however, influence the general practitioner’s behavior concerning a QA system, but this falls outside the scope of this research.

Medical knowledge and population statistics can surely be dealt with by a question answering system. Actually, the IMIX demonstrator used for this research [NWOa] is a question answering system dealing with questions about medical knowledge and population statistics. Its document collection includes the Spectrum Medical Encyclopedia (aiming at the general public) and the Merck medical data (aiming at medical professionals as well as the general public). In addition, for the RSI domain, additional data are obtained from (among others) the RSI patient association, TNO Arbeid, Arbobondgenoten, Ergo-Direct, and Stichting RSI Nederland.

2.2.2 Types of information needs

Next to types of information, Gorman [GOR95] also identifies different types of information needs (see Table 3). First of all he discerns unrecognized needs. These can be inferred from measurement of physician knowledge or observation of clinical practices. Information systems that depend on the physician to seek information can’t succeed until the physician recognizes that a need exists. To address unrecognized needs, information systems should be designed to do so, for example by issuing automatic reminders or by automatically informing physicians of additional diagnostic possibilities not initially considered by the physician.
If a QA system were used for this purpose, it would thus have to extract implicit questions from the data the general practitioner enters into the general practitioner information system during the phase of data gathering and recording. The extraction of implicit questions falls outside the scope of this research, however.

Type of information need    Description
Unrecognized need           The physician is not aware of the information need or knowledge deficit
Recognized need             The physician is aware that information is needed
Pursued need                Information seeking occurs
Satisfied need              Information seeking succeeds

Table 3 Types of information needs

Secondly, Gorman identifies recognized needs. These are needs articulated by the physician. A question being articulated by a physician doesn’t guarantee, however, that the answer is actually necessary to benefit the patient or the practitioner, in other words, that it is really a ‘need’. Recognized needs for which some information seeking behavior is executed are called pursued needs. If the pursuit of a particular need is successful, this need is also called a satisfied need.

A question answering system can only answer pursued needs, because it cannot read the physician’s mind for recognized needs. The general practitioner should enter the question into the system himself, making it a pursued need. The aim of the question answering system would be to answer the entered question, in other words to turn the pursued need into a satisfied need. In addition, it should be designed to tempt the general practitioner to enter all his medical questions, turning as many recognized needs as possible into pursued needs.

Apart from the definitions for information and information needs, the definition of a question is also tricky, because medical questions tend to be multi-factorial. They can contain questions within a question [GOR95]. Most of them are complex, and patient-, problem-, and practitioner-specific.
Physicians therefore usually first need to tell the patient’s story, to explain the context of the question. These stories often contain information from several of the five different information types. See Appendix A for a sample of typical questions asked by general practitioners.

Results from different studies on the information needs of physicians can apparently differ greatly when different definitions for information and information needs are used. Estimates for general practitioners range from 0.07 to 1.8 questions per patient encounter [GOR95]. In a questionnaire administered to 226 Dutch general practitioners in 1996 [VNB99], general practitioners indicated that questions for which they needed answers arose 6.9 times a week (the reported frequencies ranged from 0.04 to 50). These general practitioners were also asked to record their most recent question, which is thus a recognized information need. Of these questions 50.5% dealt with therapeutic problems, and 24.8% with diagnostic problems. Circulatory, musculoskeletal, and digestive were the top three system domain categories the questions dealt with.

Most other studies were conducted in English-speaking countries. Below, a few of their findings that are relevant for a Dutch QA system are described.

Magrabi et al. [MCW05] analyzed the queries Australian general practitioners submitted to an experimental online evidence system, which were thus pursued information needs. This online evidence system was an information retrieval system in which users not only had to enter keywords, but could also select a search filter concerning the type of question (disease etiology, diagnosis, treatment, prescribing, or patient education). In this study 43% of the questions dealt with therapeutic problems (35% with treatment and 8% with prescribing), 40% with diagnostic problems, 10% with patient education, and 7% with disease etiology.
Gastrointestinal, dermatology, and musculoskeletal were the top three system domain categories for which information was searched. A drawback of this study is that it only deals with the questions pursued with this online evidence system. Questions that were not pursued or that were answered with other means were not considered. However, the questions pursued with an online evidence system probably resemble those that could be answered by a QA system, because both systems can only make use of electronic resources. Ely et al. [EOE99] collected 1101 questions (recognized information needs) from 103 American general practitioners. They searched these data for generic questions. The most frequently used question structures were “What is the cause of symptom X?”, “What is the dose of drug X?”, “How should I manage disease or finding X?”, “How should I treat finding or disease X?”, and “What is the cause of physical finding X?”. Besides, Ely et al. found that older patients and female patients elicited more questions than younger and male patients respectively, and that younger physicians asked more questions than their older colleagues. Barrie and Ward [BW97] collected 85 medical questions (recognized information needs) from 27 Australian general practitioners. They found that physicians in solo or duo practices asked significantly fewer questions per consultation than those in larger practices. For a Dutch question answering system this means questions can be expected (in descending order of frequency) on diagnostic problems, treatment, patient education, prescribing, and disease etiology. Gastrointestinal, dermatology, musculoskeletal, and circulatory will likely be the most frequent system domain categories. Knowledge of frequently used generic questions can be used for the design of the question analysis component of the system. 
Finally, it is expected that younger physicians and physicians working in larger practices will be more likely to use a question answering system, because they generally have more questions than older general practitioners and general practitioners working in solo or duo practices.

2.2.3 Pursuing information needs

In a questionnaire administered to 226 Dutch general practitioners in 1996 [VNB99], general practitioners were found to immediately pursue an answer to their questions in 76% of the cases, and in 85% of these cases a (partial) answer was found. This means that 65% of all recognized information needs were turned into satisfied needs, and 24% of the recognized information needs were not even pursued. Gorman and Helfand [GH95] even found that 70% of the questions arising in general practice are never pursued. This is quite a large difference. According to Gorman [GOR95], this might be due to differences in the definition of terms, subjects, setting, and method of data collection. Both studies examined general practitioners not working in hospitals, and both studies concerned recognized information needs about medical knowledge. However, Gorman and Helfand [GH95] observed American general practitioners and recorded their questions during patient care, while Verhoeven et al. [VNB99] collected Dutch general practitioners’ questions by sending them a questionnaire in which they were asked to record their most recent question and whether or not they pursued this question. This is a great difference in method of data collection. In the research of Verhoeven et al., general practitioners might have been tempted to record an information need that they pursued, because that is what they remembered or because of social influences, resulting in a much higher rate of pursued questions.
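The 65% and 24% figures derived from the questionnaire results of [VNB99] follow from multiplying the two reported rates. The short check below reproduces that arithmetic. It is only a sketch: it assumes the reported 76% and 85% are exact, and the function name and dictionary keys are illustrative, not taken from the study.

```python
def need_funnel(pursued_rate: float, answered_given_pursued: float) -> dict:
    """Split recognized needs into satisfied, pursued-but-unanswered, and not pursued."""
    satisfied = pursued_rate * answered_given_pursued
    return {
        "satisfied": satisfied,
        "pursued_unanswered": pursued_rate - satisfied,
        "not_pursued": 1.0 - pursued_rate,
    }

# Rates as reported for the 1996 questionnaire [VNB99].
shares = need_funnel(pursued_rate=0.76, answered_given_pursued=0.85)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under these assumptions roughly 65% of recognized needs end up satisfied, about 11% are pursued without an answer being found, and 24% are never pursued.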
Gorman and Helfand [GH95] found two factors that predicted the pursuit of information needs: the physician's belief that a definitive answer existed, and the urgency of the patient's problem. Ely et al. [EOE99] collected 1101 questions from 103 American general practitioners. They found that only questions about drug dose were routinely pursued and that an answer was found to 80% of the pursued questions. Both findings are consistent with those of Gorman and Helfand. In the literature lots of different barriers are identified that complicate the search for information by general practitioners. Apart from lack of time and information overload, Ely et al. [EOE02] have identified different obstacles for each of the following steps in asking and answering questions: recognizing an information gap; question formulation; searching for relevant information; answer formulation; and using the answer to direct patient care. In the research of Ely et al. the general practitioners only performed the first two steps and the last step themselves. Information searching, and answer formulation were done by experts who tried to answer questions generated by general practitioners. These are exactly the steps that would be executed by a QA system. The obstacles found for these steps are therefore highly important for this research. The obstacles identified by Ely et al. [EOE02] are summarized in Table 4. Obstacles related to recognizing an information gap deal with the transformation of an unrecognized need into a recognized need. Sometimes physicians are unaware of a gap in knowledge when they make a decision. In this case they have an information need, but they don’t recognize it. They might also suppress a recognized information need because of time pressure, embarrassment, personal characteristics, or characteristics of the clinical setting. Question formulation refers to modifying the question in order to be able to find relevant literature. 
For example, patient specific questions should be generalized, patient data could be added to focus the search, potential supplementary questions could be anticipated, specific words might be changed, etc. When a QA system is used for answering medical questions, a dialogue might be needed to overcome the obstacles related to this step.

Six different kinds of obstacles related to the step of searching for relevant information are identified. The first is failure to initiate the search. Reasons for not pursuing information needs include doubt about the existence of relevant information, insufficient justification (when the question is not important enough to justify a search), lack of time, and the availability of consultation (sometimes general practitioners just refer patients to specialists rather than learn enough about the problem to manage it themselves). Detmer and Shortliffe [DS97] also mention being unaware of the availability of relevant information as a reason for not pursuing information needs.
Steps in asking and answering questions    Obstacles
Recognizing an information gap             Lack of awareness of an information need
                                           Suppression of a recognized information need
Question formulation                       Inability to answer patient specific questions with general resources
                                           Missing patient data
                                           Uncertainty about the scope
                                           Difficulties modifying the question
Searching for relevant information         Failure to initiate the search
                                           Uncertainty about the searching strategy
                                           Inadequate (availability of) resources
                                           Inadequate information
                                           Inadequate evidence
                                           Inadequate use of evidence
Answer formulation                         Failure to directly or completely answer the question
                                           Too long or too short answer
                                           Answer directed at the wrong audience
                                           Difficulty addressing unrecognized information needs
                                           Discomfort with formulating an answer to be used in patient care
Using the answer to direct patient care    Answer not trusted
                                           Answer no longer needed
                                           Answer inadequate

Table 4 Obstacles to answering medical questions

Secondly, general practitioners may be uncertain about the right searching strategy. They may have difficulty selecting the appropriate resources, be uncertain when all the relevant evidence has been found so that the search can stop, not know how to interpret null search results, etc. The meaning of null search results is also important when developing a QA system. When no articles are found on a certain treatment, or when a relevant article doesn’t mention the treatment, this doesn’t necessarily mean that there is no treatment; sometimes, however, a null search result is itself a clear answer.

Thirdly, resources might be inadequate. They might be poorly accessible, poorly indexed, poorly organized, not clinically oriented, not trusted, not current, or not allow real time interaction with the searcher, or a certain topic might not be included in a resource that should logically include it.
Other obstacles are inappropriate descriptors of resources [VBM95], the cost of resources, difficulties learning or using many resources, and variable quality of the information [WM99].

The fourth obstacle, inadequate information, deals with information that is incorrect, not current, vague, unnecessarily cautious, or biased, or that fails to anticipate supplementary information needs, differentiate between different diagnoses, define terms, or adequately describe clinical procedures. Verhoeven et al. [VBM95] also mention the overload of irrelevant information as an obstacle to finding the right answer.

The fifth and sixth groups of obstacles related to information searching concern the evidence. When studies don’t address the medical question, don’t compare the relevant treatments, or don’t study the outcome or population of interest, they may deal with the right subject, but still not be relevant for the question. In addition, relevant evidence might be badly synthesized or hardly applicable.

Obstacles related to answer formulation include failure to directly or completely answer the question, too long or too short answers, answers directed at the wrong audience, and difficulty addressing unrecognized information needs that are evident in the question. In addition, nonphysician searchers indicated they were not comfortable formulating an answer that would direct patient care.

Finally, the step of using the answer to direct patient care was sometimes not executed, because the answers were not trusted, came too late, or were inadequate.

Different suggestions to overcome all these obstacles are summarized in the literature. Verhoeven et al. [VBM95] suggest improved accessibility of information resources by computerization, education in the use of information sources, and improved accessibility to library facilities. They also argue that journal articles should be tailored more to the general practitioner’s daily work. Magrabi et al.
[MCW05] suggest that search systems for electronic resources should be preprogrammed with specialist bibliographic knowledge to save the physician’s time. For example, the online evidence system investigated by Magrabi et al. used search filters (such as ‘diagnosis’ or ‘treatment’) that added specialist keywords to the query entered by a general practitioner. Such keywords have been shown to significantly enhance the quality of search results, but are unlikely to be known by the general practitioner.

Some of these solutions depend on each other. For example, computerization has the potential to offer general practitioners access to loads of information, but studies indicate that general practitioners have difficulty finding the most relevant resources and selecting the appropriate search terms [WM99]. Therefore, education in the use of resources and/or search programs with specialist bibliographic knowledge are also needed to make computerization a real solution. QA could also make computerization a solution, because it eliminates the need for education in the use of resources (the only thing the physician has to do is enter a question in natural language). In addition, a QA system could also incorporate bibliographic knowledge for searching the right information.

Ely et al. [EOE02] think authors should anticipate the needs of busy physicians. For example, when authors name a certain drug, they could include essential prescribing information, because this may be an unrecognized or supplementary information need when a physician has a question about this drug; resources could be written in a question and answer style; resources should be kept current by the ongoing surveillance of physicians’ changing questions; and research should be initiated and funded based on questions without adequate answers. These are all issues that could not be solved by a QA system. Ely et al.
also indicate, however, that the modification of questions from the way they were originally stated by the general practitioners often proved very helpful for searching the right information. This is an issue that might be addressed by a QA system.

A question answering system could deal with the following obstacles: obstacles related to question formulation could be overcome by modifying the query, possibly using a dialogue; because a question answering system is an information seeker itself, it could take away the uncertainty about the search strategy to be followed by the general practitioner; the system could improve the accessibility of other electronic resources, because it eliminates the need for general practitioners to directly interact with those resources; and problems with poorly organized information could be accounted for, when the system adequately synthesizes information from different sources. However, most of the obstacles regarding information searching remain when a question answering system is used, because they are inherent to the resources used. The obstacles related to answer formulation could only be overcome by a question answering system when questions are correctly interpreted and the right resources are used. In addition, the system should correctly interpret null search results.

2.3 Information sources

When general practitioners pursue their information needs, they can use a lot of different information sources. In this section only the sources of medical knowledge and population statistics are considered, because these are the information types that are relevant for a QA system. In section 2.3.1 different types of information sources are discussed. In section 2.3.2 the sources of evidence based medicine used by general practitioners are described.
Finally, the general practitioner’s information seeking behavior with respect to the different information resources (influenced by the awareness of information) is discussed in section 2.3.3.

2.3.1 Types of information sources

For medical knowledge and population statistics three different types of information sources can be discerned [VNB99] (see Table 5). Printed sources include general practitioners’ own books and journals (their private medical libraries), but Dutch general practitioners may also turn to the libraries of the local hospitals they refer their patients to. In addition, the Dutch Institute for Research of Health Care (NIVEL) in Utrecht provides medical information on demand for general practitioners from its own library, and the Royal Dutch Academy of Arts and Sciences (KNAW) in Amsterdam (owning the largest medical journal library in the Netherlands) provides journal articles to general practitioners [VER99].

Printed sources for patient education are provided by the Dutch College of General Practitioners (NHG), the Scientific Institute of Dutch Pharmacists (WINAp), patient organizations, associations of specialists, hospitals, drug manufacturers, etc. The Dutch College of General Practitioners publishes patient brochures and patient letters [NHGp]. Patient brochures provide general information on frequently occurring disorders and the measures the patient can take to prevent or cure them. Patient letters have been written especially for patients suffering from a specific disease. These letters provide detailed information on the disease and its treatments.

Printed sources of formal population statistics include published descriptions of disease prevalence in the medical journal literature [GOR95].

Human resources include colleagues, specialists, and office staff. Specialists possess a lot of domain-specific knowledge, but their time is limited.
Westberg and Miller [WM99] state that modern academic health care centers may be able to satisfy many of the general practitioners’ information needs by providing Internet-mediated access to their electronic and human information sources. They propose a triage model for doing so. With this model information requests are first mapped to electronic resources. Only when a request doesn’t seem to map well is access provided to human resources. Human information on population statistics can be provided by public health departments [GOR95].

Information sources    Examples
Printed sources        Drug reference books, private books and journals, library books and journals, journal articles received from others
Human resources        Other general practitioners, specialists, office staff
Electronic sources     Cd-rom, online databases, Internet

Table 5 Information sources for medical knowledge

Electronic sources include information on cd-rom or information that is accessible via a modem or Internet connection. The biggest electronic source is of course the World Wide Web. The World Wide Web potentially provides the general practitioner with all the latest information on medical issues. Physicians are often frustrated, however, by the difficulty in finding reliable and relevant information on the Web quickly [DS97]. They want resources that are practitioner oriented, produced by reputable sources, and cover a specific topic in medicine. Because of the Web’s rapid growth and lack of controls, its organization is poor, and validity and reliability of sources found on the Web are questionable [WM99]. These shortcomings render a substantial amount of Web information unsuitable for direct clinical application. A strategy for use of the Web to support clinical practice could be locating and using anchors of known high quality [WM99].

The major Internet search systems are not discriminatory in what they index and their index methods are word based.
Detmer and Shortliffe [DS97] argue that a medical retrieval system should make use of content based instead of word based index methods. Whereas word based index methods index documents by the words occurring in the documents, with content based index methods documents are indexed by a mostly fixed set of general terms (not necessarily occurring in the documents) describing what the documents are about. Documents could for example be indexed by the controlled-vocabulary terms from the Medical Subject Headings (MeSH). Besides, representation methods that add contextual information to portions of documents may help improve retrieval relevance by focusing retrieval in only relevant semantic regions. One of the most mentioned sources of electronic medical information is Medline. This is a bibliographic database provided by the National Library of Medicine, containing citations to the last 40 years of medical literature [DS97]. One of the programs for searching Medline is PubMed [EPM], which is publicly available through the Internet. Next to citations PubMed provides a summary for most articles. Sometimes also free full-text articles are available in PubMed Central or elsewhere on the Internet (in which case a link is provided). Otherwise, one needs to go to a library or subscribe to the publisher to get the full-text article. Medline can be helpful for answering medical questions. However, finding specific answers to questions can be time-consuming and expensive because of the effort required to search through a sometimes large set of relevant publications [WM99]. Most of the printed sources for patient education are also provided electronically. For example, patient brochures and patient letters are published on the website of the Dutch College of General Practitioners [NHGp]. Besides, the Scientific Institute of Dutch Pharmacists electronically publishes information on the most frequently used medicines [APO]. 
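The difference between word based and content based index methods described earlier in this section can be made concrete with a toy example. The sketch below is an illustration only, not the method of [DS97] or of any real retrieval system: the two documents, their texts, and the subject headings assigned to them are invented, and the helper names `word_index` and `content_index` are hypothetical.

```python
from collections import defaultdict

# Toy documents; texts and MeSH-style headings are invented for illustration.
DOCS = {
    "d1": {"text": "Aspirin lowers the risk of heart attack",
           "headings": ["Myocardial Infarction", "Aspirin"]},
    "d2": {"text": "Managing myocardial infarction in primary care",
           "headings": ["Myocardial Infarction", "Primary Health Care"]},
}

def word_index(docs):
    """Word based: index each document under every word occurring in its text."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc["text"].lower().split():
            index[word].add(doc_id)
    return index

def content_index(docs):
    """Content based: index each document under its assigned subject headings."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for heading in doc["headings"]:
            index[heading.lower()].add(doc_id)
    return index

words = word_index(DOCS)
subjects = content_index(DOCS)

# A word based lookup misses d1, which says "heart attack" instead.
print(sorted(words["infarction"]))
# A lookup on the controlled term finds both documents.
print(sorted(subjects["myocardial infarction"]))
```

In this sketch the controlled term retrieves a document whose text never uses it, which is the kind of benefit attributed to content based indexing; the cost, not modeled here, is that someone has to assign the headings.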
Electronic sources of population statistics include electronically available published descriptions of disease prevalence in the medical journal literature, but also aggregate patient data from electronic medical records [GOR95]. Published statistics are not always applicable to a given local population because of differences in ethnic composition, local vectors of disease, or lifestyle differences, but the increasing use of electronic medical records improves capabilities for analyzing data of the local population. For example, the Dutch National Information Network for Primary Care (LINH) is a network of 93 automated general practices with over 360,000 patients [LINH]. These general practitioners continually collect data on diseases, consultations, drug prescriptions, and referrals. These data are used to generate representative, continuous, quantitative, and qualitative information on the care provided by Dutch general practitioners, but could also be used to generate population statistics.

Gorman et al. [GAW94] investigated to what extent the questions arising in general practice could be answered using only online medical journal literature. They randomly collected a set of 60 questions from American general practitioners not working in hospitals. Medical librarians tried to find answers to these questions in online resources. The general practitioners themselves evaluated the information found. In 56% of the cases physicians judged the information to be relevant for their question. In 46% of the cases the information provided a clear answer to their question. These 46% might include cases where no relevant information was found, because sometimes this is a clear answer to questions such as “Is there any information on new therapies for disease X?”.
In 40% of the cases physicians expected the information would have had an impact on their patient, and in 51% of the cases they expected the information would have had an impact on themselves or on their practice. These percentages far exceed the current use of electronic resources by general practitioners [VNB99], suggesting that a QA system might be able to answer many of the questions currently being answered with human or printed resources.

A question answering system can only make use of electronic sources. Fortunately, printed sources and sources for patient education are increasingly available in electronic form, making them accessible to a question answering system as well. Human resources cannot be consulted by a question answering system, but the information they present might also be available via printed and electronic resources.

2.3.2 Sources of evidence based medicine

Evidence based medicine is an approach to clinical practice in which physicians base their decisions and actions on appropriate evidence from the patient’s history, examination, laboratory data, and scientific medical knowledge [VER99]. Practicing evidence based medicine is guided by the following principles: formulating the question; searching the literature for relevant information; selecting the articles; appraising the evidence for validity and usefulness; and applying the evidence in everyday practice. Literature research must be guided by scientific strategies and should satisfy the same criteria as research in general. It should thus be valid, reproducible, and verifiable. Bias should be limited. For example, studies yielding statistically significant differences between groups are far more likely to be reported than those in which no differences were found. Therefore, to minimize publication bias, both published and unpublished studies need to be included and criteria for inclusion and exclusion must be accounted for.
It is recognized, however, that critically evaluating all literature is unrealistic for busy physicians. Instead, general practitioners rely on evidence based resources, like guidelines, critically appraised articles, and systematic reviews.

Dutch guidelines for general practitioners are issued by the Dutch College of General Practitioners (NHG). These guidelines include the NHG-Standaarden [NHGs], which prescribe the actions a physician should undertake concerning diagnosis and treatment of certain diseases, and the NHG-formularium [NHGf], which gives pharmacotherapeutic advice. Both are evidence based, but they only provide guidelines. They don’t give any information about the effectiveness of different therapeutic options. English guidelines are provided for example by the National Guideline Clearinghouse [NGC].

Critically appraised articles are published by several journals [VER99]. For example, Evidence-Based Medicine and ACP Journal Club together publish the cd-rom Best Evidence [BE96], and the Journal of Family Practice also publishes critically appraised articles.

Systematic reviews can be subdivided into qualitative systematic reviews and quantitative systematic reviews. For a qualitative systematic review the medical literature is searched for all relevant information on a specific disease, in order to formulate the best approach to diagnosis or treatment. For example, Clinical Evidence (published by the British Medical Journal) [BMJ] is a printed and electronic source providing information on the evidence of the effectiveness of different therapies. Each subject starts with relevant medical questions, and then the best available evidence is summarized to answer these questions. In addition, a list is provided of the interventions covered, categorized according to whether they have been found to be effective or not. Clinical Evidence doesn’t make any recommendations. It is updated every six months in print and monthly online.
Quantitative systematic reviews (or meta-analyses) try to answer medical questions, using rigorous statistical analysis of pooled research studies. For example, the Cochrane Library (published by the Cochrane Collaboration) [CC] is an electronic journal that provides quantitative systematic reviews. Whereas qualitative reviews consider all reported treatments for a specific disease, quantitative reviews concentrate on the evidence for only one treatment and summarize the statistical data of all relevant studies to get more significant information about the effectiveness of this treatment. Next to guidelines, critically appraised articles, and systematic reviews, there are also question answering services that provide physicians with evidence based answers (generated by for example a clinical librarian) on their medical questions. Usually these services require that general practitioners submit their questions in PICO (Patient or Problem, Intervention, Comparison intervention, Outcomes) format in order to direct the search to relevant and precise answers [CEBM]. In this format firstly the patient’s problem and characteristics are described, then the intervention the physician is considering, then (if relevant) the alternative intervention to which the physician wants the intervention to be compared, and finally the outcomes the physician wants to reach with this intervention. For example, the NLH Question-Answering Service [NHS] is a pilot project that tries to answer medical questions. The answers provided by this service consist of the original question, the interpretation of this question, the text of the answer, and the references used. Verhoeven and Schuling [VS03] developed a question answering service for Dutch general practitioners to investigate whether general practitioners use this service and what the costs are. 
General practitioners used this service minimally, but it was found that they used the service more often when they personally knew the person who answered their question. It turned out to be possible to answer general practitioners’ questions within the required timeframe, and the costs were on average 200 Euro per question. Coumou and Meijman [CM03] suggest that these costs should be covered by the patient’s health insurer, because literature research is quicker, cheaper, and sometimes more useful than for example blood tests or MRI scans.

Because general practitioners seem very reluctant to use question answering services, and a single search executed by a medical librarian costs on average 200 Euro, a question answering system is probably not suitable to answer the type of questions submitted to a question answering service. Besides, it is not realistic to expect a question answering system to critically appraise all information sources. A question answering system should therefore only consult evidence based resources for answering medical questions. An electronic source like the World Wide Web or even Medline is thus not suitable as document collection for a medical question answering system when a physician wants to practice evidence based medicine (which he should). However, most evidence based information sources are in English and not freely available, except for the guidelines issued by the Dutch College of General Practitioners (NHG). When other resources are used, the information found by the question answering system should be presented in such a way that the general practitioner is able to evaluate the answers himself.
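The PICO format described earlier in this section can be sketched as a simple record. This is a minimal illustration only: the class name `PicoQuestion`, its field names, and the example question are assumptions introduced here, and they do not reflect the submission form of any actual question answering service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PicoQuestion:
    """Hypothetical container for a question in PICO format."""
    patient_or_problem: str           # P: the patient's problem and characteristics
    intervention: str                 # I: the intervention being considered
    outcomes: str                     # O: the outcomes the physician wants to reach
    comparison: Optional[str] = None  # C: alternative intervention, only if relevant

    def as_text(self) -> str:
        """Render the question in the fixed P, I, C, O order."""
        parts = [
            f"Patient/Problem: {self.patient_or_problem}",
            f"Intervention: {self.intervention}",
        ]
        if self.comparison:
            parts.append(f"Comparison: {self.comparison}")
        parts.append(f"Outcomes: {self.outcomes}")
        return "\n".join(parts)


# Invented example question, for illustration only.
example = PicoQuestion(
    patient_or_problem="adult office worker with RSI complaints",
    intervention="exercise therapy",
    comparison="rest",
    outcomes="reduction of pain",
)
print(example.as_text())
```

Making the comparison optional mirrors the description above, in which the alternative intervention is only given when relevant.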
2.3.3 Using information sources

The information sources most frequently used by general practitioners are human based (colleagues), followed by private books (tertiary literature), journals (primary literature) [VBM95], and continuing medical education (such as classes and conferences) [WM99]. Libraries and printed or online bibliographies are much less used. In 1999 medical computer applications, telemedicine and the World Wide Web were the least used information sources [WM99]. In a survey among Dutch general practitioners conducted in 1996 [VNB99] only 3% of the general practitioners indicated they sometimes used the Internet for answering patient-specific questions.

Variances in the use of information sources exist among general practitioners. Factors influencing this behavior are presented in Table 6 [VBM95]. The first factor is physical, functional and intellectual accessibility of the resource. Physical accessibility concerns how close the resource is to the general practitioner. Distances are diminishing however as information increasingly becomes available via general practitioners’ desktop computers. Functional accessibility concerns the time and energy needed to search the information. As for electronic sources, a QA system could seriously decrease the time and energy needed to find relevant information. Intellectual accessibility concerns the understandability of the information. This depends on the intelligence of the practitioner, but also on the organization of the information.

| Factor | Description |
|---|---|
| Physical, functional, and intellectual accessibility of the resource | Availability, searchability, and understandability |
| Age | |
| Participation in research or teaching | |
| Social context | Rural vs. urban physicians |
| Practice characteristics | Solo practice vs. health center |
| Stage of information gathering | Analysis, decision, etc. |

Table 6 Factors influencing information seeking behavior

The second factor is the age of the general practitioner.
Younger physicians tend to use libraries and printed sources more frequently than older physicians. Thirdly, physicians who engage in research or teaching use journals, conferences, libraries and online databases more often than others. The fourth factor is social context. Rural physicians tend to perform fewer online searches than urban physicians do. The fifth factor is practice characteristics. Physicians in solo practices use journals most, whereas physicians in health centers usually consult colleagues. Finally, the sixth factor is the stage of information gathering. Printed material is mostly used for calling attention to new information, during analysis personal contact is most important, and in the decision stage refresher courses are the most important information source.

Human based resources are preferred for several reasons [GOR95]. Firstly, because many medical questions have a narrative character, they are more easily put to a colleague than to printed or electronic sources. Secondly, information seeking behavior by general practitioners is not only determined by the need for medical knowledge, but also by the need for commiseration, affirmation of professional relationships, feedback about their own knowledge and practices relative to those of others, etc. These needs can’t be easily fulfilled by printed or electronic sources. Thirdly, there may be a need for higher-order information than descriptive medical knowledge, such as confirmation, explanation, analysis, synthesis, and judgment that takes into account the complexity of the patient’s case and combines it with an expert understanding of the issues involved. Finally, general practitioners need an answer to a patient care problem, not just information relevant to a query. In this respect, human sources understand best what the general practitioner needs.
Electronic information sources are not frequently used, because the variables of awareness of electronic sources are generally negative. In the survey conducted in 1996 [VNB99] Dutch general practitioners indicated that they wanted improvements in the World Wide Web on ease of searching, financial costs, time to search, chance of success, and usefulness. No relation was found between the use of electronic sources and the age or type of practice of the general practitioners in this survey. Ideally, electronic information retrieval systems should automatically display relevant summary information and provide links to supporting evidence and analysis. To realize this vision a combination of content, information science methods and technology is required [DS97]. Content already becomes increasingly available electronically. Information science methods are needed to structure this content to achieve optimal retrieval, select resources to best answer particular questions, integrate information from several sources into one consistent view, and provide search interfaces that help users select the appropriate search terms. Technology concerns high-speed data networks, standard protocols, open-systems architectures, and cross-platform applications. In this area the Internet already provides most functionality. Moreover, electronic content increasingly becomes available via the Internet and integration of lots of electronic sources is thus possible. It is therefore most important to concentrate on the information science methods. 
A question answering system could implement most of the information science methods mentioned above: content can be structured by indexing the document collection, preferably content based (as opposed to word based); information from several sources is integrated into one consistent view when the system composes a single answer from all retrieved information; and a question answering system naturally provides a search interface that helps users select the appropriate search terms, because the user only has to enter a question in natural language, which the system will then transform into the appropriate query. A challenging issue, however, is selecting the right resources. Physicians need assistance with this, because they are not aware of all the resources available to answer a particular question, nor do they have time to assess which resource is best [DS97]. To automate the resource-selection process, the system must have knowledge of what questions each resource can answer. It therefore needs to know the scope, depth, intended audience, currency of information, and reputability of each source.

As for sources of evidence based medicine, Barrie and Ward [BW97] state that general practitioners may be reluctant to change towards these sources, because they highly value the human judgment and accessibility of their current sources. If general practitioners experienced an excess of unanswered questions or if they were dissatisfied with their current sources, they might be motivated to change resources. However, they seem to find answers to most of their medical questions and are satisfied with the information sources they currently use. Apart from the established preference of physicians for human information sources, the most important determinants of whether a knowledge resource will be used are its availability and clinical applicability [GAW94].
In a survey among Dutch general practitioners the most important characteristics for choosing an information source were found to be its reliability, usefulness, and ease of searching [VNB99]. Financial costs were considered the least important. The general practitioners indicated they think the following improvements in obtaining information are desirable: better-organized and more practical journal articles; more high quality, evidence based reviews; computer-accessible abstracts of most important articles; computer-accessible medical questions and answers; and more medical guidelines.

A medical question answering system could help in providing computer-accessible abstracts of relevant articles, medical questions and answers and medical guidelines. However, in order to be used by general practitioners, the information presented by a question answering system should be reliable and easy to apply in clinical practice.

2.4 Computer use

Computer applications developed to support clinical decision making for general practitioners can be divided into three categories [WM99], see Table 7. The first category consists of clinical information systems. These systems manage electronic medical records. They support general practitioners by reliably and efficiently storing and retrieving patient data. To this end, structured ways of data registration have been developed [DHP98]. An approach for the registration of medical data is SOAP (Subjective findings, Objective findings, Assessment, and Plan). This is a problem-oriented way of recording. Firstly the symptoms stated by the patient (subjective findings) are recorded, then the signs found in medical examination (objective findings), thirdly the diagnosis concluded by the general practitioner (assessment), and finally the prescribed therapy (plan). Further, symptoms, diagnoses and therapies can be specified by specific codes.
A coding instrument that is accepted by the Dutch College of General Practitioners (NHG) and the World Health Organization is the ICPC (International Classification of Primary Care). Criteria for the ICPC codes are defined in the ICHPPC-II-Defined (International Classification of Health Problems in Primary Care-II-Defined). There are also ICPC-like classifications for physical examination results and test outcomes (OCO), and for drugs.

| Type of general practitioner application | Description |
|---|---|
| Clinical information systems | Electronic medical record systems |
| Clinical decision support systems | Provide information about diagnosis, therapy, and prognosis, based on patient data |
| Bibliographic and full text information retrieval systems | Search for information relevant for answering medical questions |

Table 7 Categories of applications developed for general practitioners

Secondly, clinical decision support systems provide general practitioners with information regarding diagnosis, therapy, and prognosis, based on patient data. A Dutch clinical decision support system that has been introduced countrywide is the Electronic Prescription System (EPS) [LSS01]. This system suggests drug prescriptions and other therapies for a patient, based on patient data, the (ICPC coded) diagnosis, and prescription guidelines issued by the Dutch College of General Practitioners [NHGf, NHGs]. The system can also print a drug prescription and send it directly to the pharmacist. Besides, patient letters [NHGp] can be retrieved with the system. To be able to use this system, the general practitioner should also use a clinical information system, the SOAP approach, and ICPC coding.

Finally, bibliographic and full text information retrieval systems support general practitioners in finding information to answer their questions.
An example of a bibliographic information retrieval system is PubMed [EPM] (see the part on electronic resources in section 2.3.1 for more information on this system).

A question answering system would belong to the category of bibliographic and full text information retrieval systems. Theoretically, a question answering system could also be used as a decision support system, but a lot of reasoning is needed for that. Therefore, the IMIX demonstrator used for this research excludes advisory questions. The question types dealt with by the IMIX demonstrator include: questions about facts (e.g. What is RSI?); verification questions (e.g. Is RSI chronic?); multiple choice questions (e.g. Is exercise good or bad for RSI?); and quantity questions (e.g. How many people are suffering from RSI?).

During the evaluation of the implementation of the Electronic Prescription System (EPS) different surveys were carried out on the computer use by Dutch general practitioners. In 1999, 95% of the general practitioners reported they owned a computer, and in 2001 this percentage was already 97% [WHB02]. In 2001, 100% of the computer-owning general practitioners also owned a clinical information system, 94% of them actually used the electronic medical record, 71% also owned the EPS, and 87% of them actually used it. The EPS was even found to be used more often than any printed resource. The electronic medical record was used during the consultation in 86% of the cases, and mostly the SOAP approach was used as well. ICPC coding is less commonly used. Only 25% of the general practitioners using the electronic medical record indicated they nearly always used ICPC coding, but this percentage had already significantly increased since 1999. The Internet use by Dutch general practitioners was investigated by order of the Dutch College of General Practitioners in 2000 [COX00].
In this research 76% of the general practitioners said they had access to the Internet, but only 37% could access the Internet from their general practice. Almost half the general practitioners without an Internet connection indicated they were considering implementing one before the end of the year. The general practitioners primarily used their Internet connection for e-mail, mostly with colleagues. However, 43% of the general practitioners also indicated they wanted to use websites with medical knowledge (at that time the guidelines issued by the Dutch College of General Practitioners were not yet available via the Internet). When medical knowledge is available electronically, the general practitioners would rather access it through the Internet than on cd-rom. 34% of the general practitioners indicated they would like to receive some education on searching medical knowledge on the World Wide Web.

Several general practitioner information systems and their implementations and evaluations have been described in literature. They range from simple administrative systems to advanced decision support systems. None of them really uses QA technology, though some information retrieval systems get close. In the remainder of this section findings from literature are described that could be useful for the design of the information presentation module for a QA system.

During the evaluation of several versions of a Dutch general practitioner information system by Dupuits et al. [DHP98], general practitioners indicated they didn’t appreciate the use of mouse input. They found the act of moving a mouse for data entry during a consultation too disturbing. Besides, the mouse with the underlay took too much space on their desks. Interfaces based on the use of windows and function keys were well appreciated. Speech input was also not feasible for practical reasons.
Due to the diversity of voices, only the voices of a limited number of persons were recognized. Besides, disturbing sounds from the immediate surroundings occasionally caused the voice controller to randomly generate words.

Findings from a user centered development process of a general practice medical workstation [RHF92] included that there is no best way to perform the work process, because medical practice is extremely complicated and users are highly variable. General practitioner information systems must therefore be equally flexible. Further, it was found that simple clear presentations are often more effective than sophisticated attempts to provide intelligent summaries, because physicians are good at recognizing patterns if the information is clearly presented. This research also indicated that some physicians find the use of a mouse difficult. It is questionable, however, whether this is still the case, because more than ten years have passed since then. Further, several overall constraints were abstracted from the results of this research. Firstly, a general practitioner information system must always be interruptible. If the general practitioner turns away from the system, it must be in precisely the same state when he returns to it. Secondly, all functional options must be immediately visible at the user interface. The use of multi-state buttons should thus be minimized. Thirdly, there should be no modes. All functions should be possible at every phase of the consultation, since the course of the consultation is highly variable and unpredictable.

Different evaluations of the EPS have been carried out. The evaluation of the implementation of the system by the Dutch Institute for Research of Health Care (NIVEL) [WHB02] concentrated on the frequency of use and practical barriers for using the system (such as problems concerning the ICPC coding system, or lack of training). Lagendijk et al.
[LSS01] primarily suggest different implementation methods to increase the general practitioners’ motivation to work with the system. They state that the most important issues for general practitioners are better communication (with hospitals, pharmacies and colleagues) and reduction of time pressure. An application for general practitioners should therefore contribute to one or both of these issues to increase general practitioners’ motivation. Boonstra [BOO03] investigated the subjective reasons for the limited acceptance of the EPS. He concluded that to improve general practitioners’ acceptance of the system it should, among other things, be designed to fit the consultation process (it should not disturb, but enhance the communication with the patient); it should suggest alternative therapies rather than only one therapy (this would recognize and strengthen the self esteem of general practitioners as medical professionals); it should be designed so that users could add new therapies or local agreements on therapies; and patients should be informed about the features and advantages of the system (to avoid patients having a lower esteem of physicians using the system).

For a question answering presentation module these findings imply that the module should be very flexible. It should preferably accommodate multiple input modes, but at least input should not be restricted to mouse or speech input only. Further, the module should provide simple clear presentations as much as possible, and all functional options must be immediately visible at the user interface. Because the module must always be interruptible, it should not incorporate any timeouts.
Finally, to motivate general practitioners to work with the system it should at least save the general practitioner time (for it cannot improve communication), and the system should not give the impression that the presented answer is the single best way, because general practitioners want to have the feeling that as medical professionals they are in control of the decision process, not the system.

2.5 Conclusions

Based on the literature reviewed in this chapter, the following conclusions can be drawn with respect to the work roles, tasks, and information needs of Dutch general practitioners suitable to be supported by a QA system, the variables related to the awareness of information that could be improved by a QA system, the information sources that could be consulted by a QA system, the way the system should search these sources for relevant information, and the user interface of the system.

Work roles and tasks

A QA system could support the general practitioner primarily in his work role of service provider, during the phase of searching databases, in order to help the general practitioner choose a course of action and provide explanations to the patient. Especially when the system is available on a mobile device, it could really improve the physician’s tasks associated with the role of service provider, namely patient care.

Information needs

The types of information that can be provided by a QA system are medical knowledge and population statistics. General practitioners’ questions regarding these information types generally deal with diagnosis, treatment, patient education, prescribing, and disease etiology. The most frequently needed medical knowledge categories are gastrointestinal, dermatological, musculoskeletal, and circulatory knowledge. The document collection of the system could therefore consist of scientific medical literature on these subjects, sources for patient education, and formal population statistics.
Besides, a Dutch QA system might provide some information specific for the Dutch situation that is on the boundary of medical knowledge and logistic information, like which hospital performs which treatments. It is expected that younger physicians and physicians working in larger practices are more likely to use a QA system, because they generally have more questions than their older colleagues and general practitioners working in solo or duo practices.

Awareness of information

The aim of the system should be to transform most of the recognized information needs concerning the information types provided by the system into satisfied needs. A lot of recognized needs are never pursued, however. The system should be designed to encourage general practitioners to pursue their information needs. Therefore, physicians should be convinced that a definitive answer exists and that this can be found with the QA system. Besides, the system should save the general practitioner time. Further, there are some variables related to the awareness of information that could be improved by a QA system as opposed to other information sources. A QA system could take away the uncertainty about the search strategy to be followed by the general practitioner, it could improve the functional accessibility of other electronic resources, and it could improve the intellectual accessibility of poorly organized electronic resources, when it adequately synthesizes information, possibly from different sources.

Sources of information

General practitioners seem to use very few electronic information sources, while a QA system can only access electronic resources. This shouldn’t be a problem, however, because research indicated that about half of the recognized information needs could be satisfactorily answered with online resources (also meaning that the other half of the information needs might not be met by a QA system).
Only resources that are current, reliable, and suitable for direct clinical application should be used. Further, to allow for evidence based medicine, the system should primarily use evidence based sources such as those provided by the Dutch College of General Practitioners (NHG).

Information seeking

The QA system should of course satisfy the pursued needs. Therefore, it is important that the system correctly interprets and modifies the question in order to find the right information and formulate a good answer, because general practitioners’ questions generally are very complex. A dialogue might be needed to accomplish this. To achieve optimal retrieval and select the right information sources to best answer particular questions, the system should index the available information (preferably using a content based indexing method). Further, the system needs to know the scope and depth of each source, it must incorporate bibliographic knowledge, and it should correctly interpret null search results. Concerning the answer formulation, the system should not give the impression that the presented information is the single best way for clinical practice. To allow for evidence based medicine, the system should enable critical evaluation of the answers by providing links to supporting evidence and analysis.

User interface

The information presentation module of a QA system should be very flexible. Input should not be restricted to mouse or speech input only, information should be presented as simply and clearly as possible, and all functional options must be immediately visible at the user interface. The module must not incorporate any timeouts.

3 Interviews with general practitioners

Most literature cited in the previous chapter is a few years old and based on non-Dutch research.
Coumou and Meijman [CM03] summarized the available literature published between 1992 and 2002 on the information seeking behavior of general practitioners. The only Dutch research on the information needs and use of general practitioners was carried out by Verhoeven et al. [VBM95, VER99, VNB99]. Besides, the only articles dealing with answering general practitioners’ questions concerned bibliographic information or document retrieval systems, or human question answering services. Computerized question answering (QA) systems have not yet been used in the medical field. To get more in-depth information about Dutch general practitioners and their expectations of a QA system, interviews were conducted with a few Dutch general practitioners and Anita Verhoeven.

In section 3.1 the research method is described, section 3.2 presents the results of the interviews, and in section 3.3 the conclusions with respect to QA for general practitioners are discussed. The results of the interviews are not always consistent with the findings from literature described in the previous chapter. In section 3.4 the differences and agreements between the literature and the interviews are analyzed.

3.1 Method

Because great variations exist among general practitioners and their use of information systems, this research does not aim to develop a system that would be equally appreciated by all Dutch general practitioners. Requirements for the system will therefore not be based on quantitative research but on qualitative interviews with Anita Verhoeven and only a few Dutch general practitioners.

Verhoeven was also a general practitioner, but because of a shortage of available general practices, she turned to another job. Nowadays, she is a medical librarian and information specialist at the University of Groningen. In the 1990s she carried out PhD research on the information needs of general practitioners [VER99] (which is frequently cited in the previous chapter).
The information she gave in the interview is partly used to complete the literature survey in the previous chapter, and partly to comment on the results of the interviews with general practitioners described in the next subsection. For the interviews, thirteen general practitioners, working in ten different general practices (mostly in Ede, the Netherlands), were sent an introductory letter. The next week they were asked by telephone whether they were willing to contribute to this research. Five of them, each working in a different general practice, agreed to participate: four men and one woman. Two of them worked in a primary health care center; the others worked in duo practices. The interviews were semi-structured and covered the topics of information needs, information sources and computer use. A general outline of the interview is shown in Appendix B. First of all, it was explained to the general practitioners that this research is about the information needs of general practitioners and concentrates on questions about medical knowledge. Then they were asked how often they are confronted with such questions and whether and when they search for an answer to these questions. Subsequently they were asked which resources they generally use to pursue their information needs and whether they think their possibilities for finding answers to their questions are sufficient. The third part of the interview dealt with computer use. First, the general practitioners were asked whether they owned a computer at work and which applications they use. Then they were shown a few paper examples of user interfaces for information retrieval systems and a QA system (see Appendix B) to give them an idea of what these systems are about. They were asked whether they used any information retrieval systems for medical purposes and whether they think a QA system might be useful for their work. Finally, a few questions were asked about their preferences concerning user interfaces.
3.2 Results

In the following subsections the results of the interviews are discussed with respect to the general practitioners’ information needs, information sources, and computer use.

3.2.1 Information needs

Most general practitioners indicated they rarely encounter medical questions for which they have to look up an answer. Especially the three older physicians said they have a good memory and extensive experience and only sometimes have to look up information on uncommon cases. Besides, the general practitioners are all supported by a general practitioner information system (Medicom) that incorporates an electronic prescription system, which frequently eliminates the need to look up any information on medication (though one of the general practitioners indicated he rarely agreed with the information provided by the electronic prescription system). All general practitioners said they normally pursue their information needs. If possible, they immediately search for an answer during the consultation. Otherwise, they do it afterwards or in the evening hours. During patient visits they rarely search for answers. Some general practitioners even avoid visiting patients as much as possible, because they lack access to most of their information sources when they are not in their consulting room. Besides, their computers are connected to those of the local pharmacy (which is a feature specific to the general practitioner information systems provided by Medicom). Therefore, prescriptions are only executed in the consulting room, from where they are directly sent to the pharmacy. When general practitioners are confronted with a medical question during a patient visit that needs an immediate answer, they call a specialist. Less urgent questions are answered when the general practitioner is back in his consulting room.
Most general practitioners indicated that the questions they are confronted with during patient visits don’t differ from those met in the consulting room. Only one of them thought they did differ, because the patients she visits differ from those who come to her consulting room. The visited patients are generally elderly (having fewer questions than other patients) or terminal patients (eliciting questions about palliative care). She therefore always carries a manual on palliative care with her when she is visiting patients.

3.2.2 Information sources

Printed information sources are used by all general practitioners. The most frequently mentioned resources are the Pharmacotherapeutic Directory (Farmacotherapeutisch Kompas), the Diagnostic Directory (Diagnostisch Kompas), the guidelines issued by the Dutch College of General Practitioners (NHG-Standaarden), and books on internal medicine (Interne geneeskunde), dermatology (especially for the images), and microbiology. These books are all in Dutch and organized specifically for direct clinical application by general practitioners. The general practitioners indicated they rarely use books about the basics of medicine, like physiology, because they are not suited for direct clinical application. Dutch journals, like Nederlands Tijdschrift voor Geneeskunde (Dutch Medical Journal) and Huisarts en Wetenschap (General Practitioner and Science), were primarily used by the general practitioners in their roles of learner (for ongoing medical education), not for searching for answers to medical questions in their roles of service provider. One of the general practitioners even indicated he thinks journal articles merely present opinions, rather than objective knowledge. He thinks even systematic reviews are essentially subjective, because scientific research is always executed with a specific goal in mind. Human resources are also used by all general practitioners.
They consult both their colleagues in the general practice and specialists. One of the general practitioners indicated he usually sends an e-mail to a specialist when he has any questions. The others prefer to consult specialists by phone. However, one of them said that when he has diagnostic problems concerning a dermatological disorder, he takes a picture of it with his digital camera and sends the image to a dermatologist by e-mail to ask for advice, which he called ‘tele-dermatology’. Most general practitioners don’t use many electronic resources. Verhoeven indicated in the interview that the primary reason for this is that general practitioners lack time and capabilities to search for information in electronic resources, and also lack time to improve their searching capabilities. The electronic prescription system is the only electronic resource used by all general practitioners. The system was used for retrieving prescription information and patient letters. Some general practitioners also electronically accessed the guidelines issued by the Dutch College of General Practitioners (NHG). The Internet was used to search for answers to medical questions by only two of the general practitioners. One of them primarily used PubMed, the other Artsennet. PubMed provides access to a bibliographic database (Medline). This general practitioner thus frequently ends up with only a summary of a relevant article. When he wants to have the full text, he needs to go to the library, but he hasn’t got time for that. Artsennet is a website published by the Royal Dutch Medical Association (KNMG) presenting Dutch medical information for and written by Dutch physicians [KNMG].
Another general practitioner indicated he did sometimes search the Internet for medical information, but primarily in his role of learner, not as service provider, because he thought the information found on the Internet is very general and not suitable for direct clinical application. When answers to medical questions are searched on the Internet, this is usually done after the consultation, because it takes too much time to do this during the consultation. Verhoeven indicated in the interview that Dutch general practitioners rarely use English information sources. Their primary evidence based resource is the guidelines issued by the Dutch College of General Practitioners (NHG-Standaarden). These are publicly available to all Dutch general practitioners, whereas English evidence based resources generally are not. When general practitioners are searching for medical knowledge that is not available in the guidelines, Verhoeven would recommend firstly consulting Clinical Evidence [BMJ], then the Cochrane Library [CC], and finally scientific literature via PubMed [EPM]. However, none of the interviewed general practitioners used Clinical Evidence or the Cochrane Library. The only English resources used are journal articles. Most of the general practitioners indicated they think their possibilities of finding answers on their medical questions are sufficient. Especially the older general practitioners generally have few questions. However, one of the younger physicians indicated she sometimes finds it difficult to decide where to search, but when she doesn’t know where to find the information, she asks one of her colleagues. 3.2.3 Computer use All general practitioners participating in this research had a computer with cd-rom player in their consulting room. They used a general practitioner information system that incorporated an electronic prescription system. 
Most of the general practitioners owned some disk with medical knowledge, but they haven’t got time to use it or to learn to work with it. Especially the older physicians indicated they are used to their working procedures without electronic resources. They see that their younger colleagues or the general practice trainees they teach in their roles of educator more often consult electronic resources, and they think it could be useful, but they don’t think they really miss anything in their possibilities of finding answers to their medical questions. Other applications used by the general practitioners are Microsoft Word (mostly for writing letters) and an e-mail program. The general practitioners all had access to the Internet, four of them also from their consulting rooms, and the other one was planning it, but they vary greatly in the ways they use it. One of the general practitioners indicated that “whenever he needs it, it is down”, another primarily used it as a communication channel (for e-mailing medical specialists) instead of an information resource, and the others really used it to look up medical knowledge. Apart from the guidelines issued by the Dutch College of General Practitioners, they used Google, PubMed and Artsennet. Google was thought to be easy for searching, because with all possible keywords lots of results are returned, but difficulties were experienced judging the reliability and applicability of these results. With PubMed and Artsennet it is harder to select the right keywords, but the results are always scientific and aimed at medical professionals. However, most articles cited by PubMed are not freely available. Most general practitioners found it hard to imagine themselves using a QA system. They think the printed sources they use now are already easily accessible and suitable for direct clinical application. Besides, they prefer to look up information in these books, because of the context they provide.
The guidelines issued by the Dutch College of General Practitioners can already be easily found electronically, eliminating the need for a QA system. Two of the general practitioners indicated they would rather use a QA system for patient education, because they have their own ways of searching for the information they need for themselves, while they think the patient letters provided by the electronic prescription system are not always sufficient. One of them suggested he also might want to use a QA system for retrieving dermatological images. He had a book for this, but he thought a QA system might provide access to more images. Another general practitioner would like to use a QA system to retrieve summaries of recent Dutch scientific publications from, for example, Huisarts en Wetenschap (General Practitioner and Science). And yet another general practitioner would like to use it for retrieving ‘data’ like e-mail addresses, telephone numbers, patient organizations, waiting lists, and logistic information. Verhoeven thought a QA system could use the database of an evidence based question answering service as a resource. However, this would restrict the system to answering questions that already have been answered by humans. All general practitioners would prefer concise answers or relevant paragraphs to complete documents. However, they all want some link to the original document, to enable them to read more about the subject if they want to. Besides, they all emphasized the information sources used by a QA system should be up-to-date and reliable. They could thus best be retrieved from the Internet, but only from some selected sites. Finally, one of the general practitioners indicated he would like to use medical language, or even ICPC-codes, in the questions he would submit to a QA system.
The general practitioners participating in the interviews tend to prefer mouse input to function keys, because function keys are harder to learn, and mouse input is quicker than switching between different menus with function keys. However, the general practitioner information system they use now does not yet support mouse input, but the next version will. Only one of the general practitioners indicated she would rather use keyboard input to avoid RSI problems. The general practitioners reacted quite differently to the suggestion of speech input. Some thought it would be excellent, because it would be much quicker than typing with two fingers, others thought it would only be useful for dictating letters, and one of them indicated using speech input during consultations would be rather disturbing in the contact with the patient. In any case they only want to use speech input when it is working perfectly.

3.3 Conclusions

Based on the interviews, the following conclusions can be drawn with respect to the work roles, tasks, information needs, awareness of information, and sources of information of Dutch general practitioners relevant for QA, and the user interface of a QA system.

Work roles and tasks
The general practitioners participating in the interviews indicated they lacked their information sources during patient visits. A QA system that is accessible during patient visits could thus be very useful. However, the general practitioners haven’t got a personal digital assistant (PDA) or laptop computer yet.

Information needs
Because general practitioners vary greatly in their information needs and use, and in their use of computers, they can’t be expected to use a QA system equally often. Younger physicians seem to have more questions and more often use the Internet to search for an answer. They are therefore expected to appreciate a QA system more than their older colleagues.
Awareness of information
Most general practitioners preferably search for an answer to their medical questions during the consultation. However, they indicated that searching for an answer on the Internet takes too much time to do this. Therefore, a QA system should reduce the time needed to search for an answer on the Internet sufficiently to make searching for answers possible during the consultation.

Sources of information
The sources general practitioners currently use in printed form (textbooks) are already well organized and easily accessible. There is thus no need to make them accessible via a QA system. Only one of the general practitioners indicated he might want to search for dermatological images (also found in his manual) with a QA system, but other techniques like image retrieval would be more suitable for this. Human sources can’t be made accessible via a QA system, but their telephone numbers and e-mail addresses can. One of the general practitioners indicated he would very much like a system that supplied him with this type of information. However, information extraction techniques might be more suitable for retrieving contact information. Finally, electronic resources are especially suitable for use by a QA system, because the system can improve their functional accessibility. Information retrieval systems currently used by general practitioners either return too much irrelevant information or require very specific keywords. A QA system should act as an intermediary selecting the right sources and keywords for the general practitioner. Not all electronic sources are suitable for a QA system however. The guidelines issued by the Dutch College of General Practitioners (NHG) are already easily accessible via an index.
Electronically available Dutch journal articles could be used, but only if they are suitable for direct clinical application, otherwise they could better be accessed via an information retrieval system to allow the general practitioner to read the context of the information. The general practitioners participating in the interviews disagreed on the clinical applicability of journal articles. It is therefore not clear whether they should be included or not. Patient education is freely available on the Internet and widely dispersed. Electronic sources for patient education are therefore especially suitable for a QA system. Moreover, some of the general practitioners indicated they would primarily use a QA system for patient education. Thus, for the present, only information for patient education should be included in a QA system for general practitioners. Most general practitioners participating in the interviews have access to the Internet from their consulting rooms. Besides, they all indicated they would like a QA system to supply them with the most recent information. Information sources used by a QA system can therefore best be retrieved from the Internet. However, to avoid irrelevant or unreliable information, the QA system should only consult some selected sites. Information seeking A QA system for general practitioners should be able to handle medical language and preferably also ICPC-coding in the input question. The answer should be presented as a concise answer, with direct links to the sites the information was retrieved from, so the general practitioners can easily retrieve the complete documents if they want to. User interface Most general practitioners preferably use mouse input, but keyboard input should be accommodated as well, because of RSI prevention. Speech input should only be used when it is working perfectly. 
3.4 Discussion

Some of the conclusions drawn from the literature (described in chapter 2) are confirmed by the results of the interviews described in this chapter. However, there also are some differences between the conclusions of both chapters. In this section the results of the literature study are related to those of the interviews.

Work roles and tasks
The literature study was restricted to the information needs of the general practitioner in his role of service provider, because this is the role in which there is a need for concise answers. Some of the information sources mentioned in literature seem to serve the general practitioner primarily in his role of learner however. For example, journal articles, whether in printed or electronic form, are consulted to answer medical questions by only some of the interviewed general practitioners. The others primarily use them for ongoing medical education, because they think the information presented in journal articles is not suitable for direct clinical application. It is therefore not clear whether journal articles should be consulted by a QA system or not. In chapter 2 it was concluded that a QA system could support the general practitioner primarily during the medical consultation phase of searching databases. It is therefore most convenient to use the system during this phase of the consultation, instead of after the consultation. This conclusion is supported by the interviews, in which the general practitioners indicated they preferably search for an answer to their medical questions during the consultation. Therefore, a QA system should reduce the time needed to search for an answer on the Internet sufficiently to make searching for answers possible during the consultation. The assumption, based on literature, that the medical consultation phase of searching databases might be harder during patient visits than in the consulting room is also supported by the results of the interviews.
There are even general practitioners who avoid visiting patients as much as possible for this reason. It is therefore expected that in the end general practitioners will turn to mobile computers, because they surely need mobile access to their information sources. With a laptop, the general practitioner generally has the same possibilities as he has with his desktop computer, thus a QA system wouldn’t need to be adjusted for use on a laptop. However, when a general practitioner would like to address the QA system via a PDA, a different user interface would be needed. Perhaps the presentation of the information would also have to be different, because less space is available for information presentation on a PDA.

Information needs
Both in literature and in the interviews general practitioners varied greatly in the number of questions they are confronted with during patient care. In a questionnaire executed in 1996 this number ranged from 0.04 to 50 times a week [VNB99]. In this research no relations were investigated between the number of questions and the age of the general practitioner. However, Ely et al. [EOE99] found that younger physicians generally have more questions than their older colleagues. This finding seems consistent with the results of the interviews. Based on the literature it was expected that general practitioners’ questions generally deal with diagnosis, treatment, patient education, prescribing, and disease etiology, and that the most frequently needed medical knowledge categories are gastrointestinal, dermatological, musculoskeletal, and circulatory knowledge [VNB99, MCW05]. These findings seem consistent with the information sources consulted by the general practitioners participating in the interviews. In the literature it was stated that a lot of recognized information needs are never pursued [GH95]. However, the general practitioners participating in the interviews said they normally pursue all their information needs.
This great difference might be due to a difference in research method. In the interviews the general practitioners were asked directly how often they pursued their information needs, whereas in other studies general practitioners were observed [GH95] or they were sent a questionnaire [VNB99]. Questions may be forgotten shortly after they arose. In the case of an observation, such questions are recorded as recognized but not pursued. However, when physicians are directly asked for the percentage of recognized questions they pursue, they don’t remember their forgotten questions (which are then taken for unrecognized questions), making the percentage of pursued questions much higher. Besides, the general practitioners participating in the interviews all used an electronic prescription system, making it much easier to answer questions concerning prescription. On the other hand, these questions were already routinely pursued without such a system [EOE99]. It is therefore expected that the actual fraction of pursued questions is somewhat lower than the interviewed general practitioners suggest. The assumption that questions may be easily forgotten is another reason why a QA system should reduce the time needed to search for an answer on the Internet sufficiently to make searching for answers possible during the consultation. To prevent general practitioners from not pursuing their information needs, they must be able to search for an answer to their questions immediately.

Awareness of information
Concerning electronic information sources, the literature finding that general practitioners have difficulties in selecting the right sources and keywords [WM99] is confirmed by the interviews. The general practitioners indicated that the information retrieval systems they used either return too much irrelevant information or require very specific keywords.
Sources of information
Whereas in literature no relation was found between the use of electronic sources and the age of the general practitioners [VBM95], the interviews suggest that younger physicians tend to use the Internet for searching information more often than their older colleagues. This might be due to the fact that they generally have more questions. Besides, younger physicians probably have learned to use the Internet during their medical education, whereas older physicians mostly have had to learn to use the Internet themselves (for which they generally lack time), because it is only for the last ten years that the Internet has come into use among the general public. Moreover, older physicians already have their ways of finding answers to their medical questions and they are generally satisfied with these ways. They are thus not really motivated to learn to work with new information sources. It is expected that the younger general practitioners who have more questions and are already accustomed to using the Internet will be most motivated to use a QA system, because for them such a system could really save time when it makes finding the relevant information on the Internet faster and easier. Besides, the time needed to learn to work with the system will probably be the lowest for this group of general practitioners. A QA system is not suitable for finding all relevant information on the Internet, however. Some types of information could better be found with other searching techniques, such as indexes, information retrieval, image retrieval, or information extraction. To help general practitioners choose the application most suitable for retrieving each information type, some sort of web portal could be constructed that provides access to all different applications and thus to all different information types.
In the previous chapter it was stated that to allow for evidence based medicine, a QA system should primarily use evidence based resources such as those provided by the Dutch College of General Practitioners (NHG) and enable critical evaluation of the answers by providing links to supporting evidence and analysis. However, the general practitioners participating in the interviews (especially the older three) seemed to rely more on ‘experience based medicine’ than on evidence based medicine. Besides, the electronic sources provided by the Dutch College of General Practitioners are thought to be already easily searchable and general practitioners indicated they appreciate the context provided by these sources. Only the suggestion that links should be provided to supporting evidence is confirmed by the interviews. User interface While in previous research general practitioners seemed to prefer keyboard input to mouse input [DHP98, RHF92], the general practitioners participating in the interviews don’t mind using a mouse. They even tend to prefer mouse input above using function keys. Probably this is because nowadays general practitioners have become accustomed to using a mouse for their non-medical applications like Microsoft Word and Internet Explorer. They are all looking forward to the next version of their current general practitioner information system that will be accommodated for mouse input. Still, because of concerns of RSI prevention, applications for general practitioners should not be restricted to a single input mode. 
Summarizing from both chapters, the following conclusions can be drawn with respect to the requirements and scope of a question answering system for general practitioners:

Information needs
- for the present, only questions for patient education will be answered by the question answering system;
- the system should be aimed at physicians who are already accustomed to using the Internet;

Awareness of information
- physicians should be convinced that answers exist to the questions they have concerning patient education, and that they can be found with the question answering system;
- the system should reduce the time needed to search for an answer on the Internet sufficiently to make searching for answers possible during the consultation;

Sources of information
- only information sources that are current, reliable, and suitable for patient education should be used;
- information sources can best be retrieved from the Internet, but only from some selected sites;

Information seeking
- the system should be able to handle medical language and preferably also ICPC-coding in the input question;
- a dialogue might be needed to enable the system to correctly interpret and modify the question;
- the system needs to know the scope and depth of each information source;
- bibliographic knowledge should be incorporated in the system;
- null search results should be correctly interpreted;
- a different information presentation module must be designed for use via a PDA;
- the retrieved information should be presented as a concise answer, with direct links to the sites the information was retrieved from;
- information from several sources should be integrated into one consistent view;
- the system should not give the impression that the presented information is the single best way for clinical practice;

User interface
- a web portal could be constructed that provides access to all different types of information for general practitioners (including the question answering system);
- the user interface of the question answering system should be very flexible;
- input should not be restricted to a single input mode;
- speech input should only be used when it is working perfectly;
- a different user interface must be designed for use via a PDA;
- information should be presented as simply and clearly as possible;
- all functional options must be immediately visible at the user interface;
- there shouldn’t be any timeouts.

4 The information presentation module (GIPS)

Based on literature research and interviews with Dutch general practitioners (described in chapters 2 and 3 respectively) conclusions were drawn with respect to the requirements and scope of a question answering (QA) system for general practitioners. Based on these conclusions an information presentation module (GIPS: General practitioner Information Presentation Submodule) has been designed for the IMIX demonstrator. Besides, requirements have been specified for a web portal that provides access to all different information retrieval systems relevant for general practitioners (including GIPS). In section 4.1 the first version of the IMIX demonstrator and its components are described. The conclusions presented in the previous chapter are related to the components of the IMIX demonstrator in section 4.2. The design of GIPS is presented in section 4.3. In section 4.4 a prototype of an information portal for general practitioners is described.

4.1 The IMIX demonstrator

The IMIX demonstrator is an interactive multimodal QA system. The architecture of the first version of the demonstrator is presented in Figure 4 on the next page [IMIXa]. It comprises six components (shown as rectangles in the figure): a speech recognizer (norisc.asr), two question answering modules (rolaquad.qa and qadr.qa), an output generator (imogen.gen), a text to speech module (imogen.tts), and a graphical user interface (imix.gui).
The components interact with each other by reading and writing structured (XML-based) data on global data stores (pools, shown as ellipses in Figure 4). When a component subscribes to a data pool, it automatically receives messages published to that pool. An advantage of using pools is that the framework is very flexible [HER03]. Because the data producer and data consumer do not have to know each other, components can be added or replaced without causing changes to other components.

The first component, the speech recognizer, is responsible for audio recording. It receives a signal from the pool "control.asr.start" when it can start speech recording. The recognizer may take some time to start up, and as soon as it is ready to receive speech input, it sends out a signal to the pool "control.asr.ready". The recognizer decides by itself when the speech recording ends. When recording is finished and the analysis is complete, a message containing the word graph is sent to the pool "questions".

The two question answering modules are triggered by a message arriving in "questions". This message originates either from the speech recognizer (in the case of speech input) or from the graphical user interface (text input). Each of the question answering modules processes the question and sends a message containing its answer(s) to the pool "answers".

When two answer messages have arrived in "answers", the output generation module transforms them into visual output (containing text and/or pictures) and speech output. Both outputs are simultaneously sent to the "presentation" pool.

The text to speech module starts speaking the text when it arrives in the "presentation" pool. It stops speaking as soon as a signal is received from the pool "control.tts.stop".

The graphical user interface controls the dialogue with the user. After a start screen, the user chooses between text and speech using a button.
In case of text, the user types text in a text field, which is sent to the pool "questions". In case of speech, the speech recognizer is started by sending a signal to "control.asr.start", while a "please wait" prompt is presented to the user. When a signal is received in "control.asr.ready", a microphone symbol is displayed. Then, the graphical user interface waits for information to arrive in the "presentation" pool, which it displays. A "continue" button then takes the system back to the beginning, at which point a signal is sent to "control.tts.stop" to stop any speech output still going on.

Thus, the information presentation (collected in the pool “presentation”) is generated by the output generation module and displayed by the graphical user interface. These are the components that will constitute GIPS. The speech recognizer and text to speech module will be ignored in this study, because they are not yet working perfectly, which is required for use by general practitioners. The two question answering modules will be taken for granted, because they are not part of the IMOGEN project (of which this study is a part). Thus, requirements will be drafted for these modules, but they will not be implemented.

Figure 4 Architecture for the first IMIX demonstrator

In the next subsections the current specifications of the output generation module and the graphical user interface are presented.

4.1.1 Output generation module

The output generator subscribes to the pool “answers”. The messages in this pool are XML-based QA documents [THE05]. An example of this content is shown in Figure 5. The root element of a QA document is “qa”. The attribute “engine” of this element specifies which of the two question answering modules produced the message (“rolaquad” or “qadr”). A QA document further contains a question element and a list of answers. The question element contains the question. Its attribute “mode” specifies the input mode.
Its value could be “typed” or “spoken”. In GIPS, only typed questions will occur. The question element in turn contains two elements: the question string and the question analysis. The question string is the original string of the question as typed in by the user. The question analysis contains the annotation of the question. The tags used for the annotation differ per question answering module. The attribute “type” of the question analysis element indicates the type of the question. The classifications of question types are also different for each of the two question answering modules.

<?xml version="1.0" encoding="iso-8859-1"?>
<qa engine="qadr">
  <question mode="typed">
    <string> Wat is RSI ? </string>
    <question-analysis type="definition(rsi)">
      <node rel="top" cat="whq" begin="0" end="3"> .... </node>
    </question-analysis>
  </question>
  <answerlist nr_answers="3">
    <answer rank="1" conf="6.58">
      <id source="www.rsi-vereniging.nl#overrsi#misverstanden" doc="www.rsi-vereniging.nl#overrsi#misverstanden" par="1"/>
      <context>
        <core> RSI is hetzelfde als een muisarm </core>
      </context>
    </answer>
    <answer rank="2" conf="6.57">
      <id source="www.rsi-vereniging.nl#overrsi#misverstanden" doc="www.rsi-vereniging.nl#overrsi#misverstanden" par="3"/>
      <context>
        <core> Tegenwoordig is RSI een verzamelnaam ( ' paraplubegrip ' ) voor alle klachten aan armen , nek en schouders . </core>
      </context>
    </answer>
    <answer rank="3" conf="6.57">
      <id source="www.arbeid.tno.nl#kennisgebieden#rsi#index" doc="www.arbeid.tno.nl#kennisgebieden#rsi#index" par="2"/>
      <context>
        <core> Ook RSI is eigenlijk een verkeerde term . </core>
      </context>
    </answer>
  </answerlist>
</qa>

Figure 5 Example QA document

The list of answers contains the answers produced by the question answering module. The number of answers is specified by the attribute “nr_answers”. Each answer element contained in the list has two attributes: “rank” and “conf”.
The “rank” indicates the answer’s rank within the answer list. Higher ranked answers are more likely to be a good answer to the user’s question than lower ranked ones. The “conf” attribute indicates how confident the question answering module is that this answer is a good answer to the user’s question. Confidence scores issued by “rolaquad.qa” range between 0.00 and 1.00, and those issued by “qadr.qa” range between 1.00 and 8.00.

In addition to these attributes, each answer in the list contains two elements: an id and the annotated text of the answer. The id indicates the location of the answer within the document collection. For web documents “qadr.qa” uses the form www.arbeid.tno.nl#kennisgebieden#rsi#index [BOU04]. Compared to the original URL, slashes are replaced by hashes to be able to treat the ids as filenames. The other question answering module, “rolaquad.qa”, indicates the answer document with an integer index key assigned to the document in the private Rolaquad document database [CBD05]. The text of the answer is contained in a core element, which is in turn contained by a context element that possibly gives the surrounding context of the answer. In principle, the core comprises one sentence.

<?xml version="1.0" encoding="iso-8859-15"?>
<!-- the default namespace is defined in pml.xsd -->
<presentation xmlns="pml.xsd">
  <!-- presentation content is contained in content elements -->
  <!-- this element is supposed to be realized as "speech" -->
  <content realization="speech">
    This is a P-ml document with a picture.
  </content>
  <!-- this is a named content element, which can be referred to -->
  <!-- from inside other content elements -->
  <content name="picture.png" encoding="base64" content-type="image/png">
    <!-- the actual picture is a base64 encoded PNG file -->
    iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAIAAACQd1PeAAAADElEQVQI12P4//8/AAX+Av7c
  </content>
  <!-- the visual part of the presentation is actually defined -->
  <!-- in HTML -->
  <content realization="visual">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head><title>IMIX</title></head>
      <body>
        <h1>P-ml document</h1>
        This is a P-ml document with a
        <!-- "picture.png" refers to the named content element -->
        <img src="picture.png" alt="picture" />
      </body>
    </html>
  </content>
</presentation>

Figure 6 Example of a P-ml document

The output generation module transforms the information received in the “answers” pool into an appropriate presentation of the information, which is then published to the “presentation” pool. Presentations are exchanged using P-ml (presentation mark-up language) [IMIXp]. This is a mark-up language that was designed specifically for the IMIX project. An example of a P-ml document is shown in Figure 6.

The root element of P-ml is “presentation”. Within the presentation element several content elements are specified, each of which contains a part of the presentation. A content element has four optional attributes: “realization”, “name”, “encoding”, and “content-type”. The value of the realization attribute specifies the method by which the content element is to be realized. For instance, "speech" means that the content is to be realized by a speech synthesizer, and "visual" means that the content is to be visualized. For GIPS only “visual” content elements will be used. A content element that has a name attribute can be referred to from elsewhere in the P-ml document.
Content elements without a “name” attribute can only affect the presentation if a realization attribute is specified. The encoding attribute specifies the encoding of non-XML compatible file formats. For instance, PNG images can be base64 encoded. Finally, the content-type attribute can be used as a hint to interpret the contents of a content element. This is especially useful in combination with the “encoding” attribute. P-ml does not support content representation, but a content element may contain any other XML element designed for content representation, like an XHTML, SVG or SSML document. However, a content element may also contain an encoded binary format or just plain text.

4.1.2 Graphical user interface

The graphical user interface receives input from the user, which it publishes to the pool “questions”, and it displays the presentations generated by the output generation module in return. The graphical user interface has eight different states [IMIXg]. In Figure 7 the state transition diagram for the user interface is presented. In the first state, “welcome”, a welcome screen is displayed with information about the IMIX demonstrator and a start button. Then the system goes to the second state, “modality”, in which the user is asked to choose between text input and speech input. The third to fifth states (“wake_up”, “speech_input”, and “text_input”) deal with the different input modes. After a question has been entered, the system goes to the state “waiting_for_answer”. When an answer arrives, the system goes to the state “answer” and the answer screen is displayed. This screen has two buttons: one to start a new question and one to close the graphical user interface. In case a user chooses to start a new question, the system returns to the “modality” state. When the graphical user interface is closed, which can be done in any of the states, the system gets to the final state “end”.
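The eight states described above can be sketched as a small transition table. This is only an illustration based on the prose description: the state names come from the text, but the event name "question_entered" and the choice to ignore events that are not defined for a state are assumptions made for this sketch, not part of the IMIX specification.

```python
# Transition table for the first IMIX graphical user interface.
# State names follow the text; the event name "question_entered" is an
# assumption made for this sketch.
TRANSITIONS = {
    ("welcome", "start"): "modality",
    ("modality", "type"): "text_input",
    ("modality", "speak"): "wake_up",
    ("wake_up", "control.asr.ready"): "speech_input",
    ("speech_input", "question_entered"): "waiting_for_answer",
    ("text_input", "question_entered"): "waiting_for_answer",
    ("waiting_for_answer", "presentation"): "answer",
    ("answer", "new"): "modality",
}


def next_state(state, event):
    """Return the state that follows `state` when `event` occurs."""
    if state == "end":
        raise ValueError("the interface has been closed")
    if event == "close":
        # Closing the interface is possible in every state.
        return "end"
    # Assumption: events that are undefined for a state are ignored.
    return TRANSITIONS.get((state, event), state)


def run(events, state="welcome"):
    """Feed a sequence of events to the interface and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

For example, run(["start", "type", "question_entered", "presentation"]) ends in the “answer” state, and a "close" event leads to “end” from any state.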
4.2 Requirements

In the previous chapter a list of conclusions with respect to the requirements and scope of a QA system for general practitioners was presented. This list was organized according to a subset of the elements of the model of information seeking by professionals [LPS96] (see also section 1.3): information needs; awareness of information; sources of information; and information seeking. In addition, a separate category was added for requirements concerning the user interface. Based on all these conclusions, some user requirements can be specified for the IMIX demonstrator.

Figure 7 State transition diagram for the first IMIX demonstrator (states: welcome, modality, wake_up, speech_input, text_input, waiting_for_answer, answer; transition labels include start, speak, type, control.asr.ready, enter, timeout, presentation, new, close)

The conclusions concerning the information needs specify the type of questions that will be dealt with and the user group that will be targeted. These conclusions don’t concern a single component of the system, but the problem and scope of the entire system. The requirements related to the awareness of information concern user education, and quality attributes like response time, reliability, and availability. These conclusions thus also concern the entire system. The conclusions with respect to the sources of information imply requirements on the document collection consulted by the question answering modules.

Nine conclusions were drawn with respect to the information seeking of the system:

1. the system should be able to handle medical language and preferably also ICPC-coding in the input question;
2. a dialogue might be needed to enable the system to correctly interpret and modify the question;
3. the system needs to know the scope and depth of each information source;
4. bibliographic knowledge should be incorporated in the system;
5. null search results should be correctly interpreted;
6. a different information presentation module must be designed for use via a PDA;
7. the retrieved information should be presented as a concise answer, with direct links to the sites the information was retrieved from;
8. information from several sources should be integrated into one consistent view; and
9. the system should not give the impression that the presented information is the single best way for clinical practice.

These conclusions could be categorized according to the steps of asking and answering questions identified by Ely et al. [EOE02] (see also section 2.2.3): recognizing an information gap; question formulation; searching for relevant information; answer formulation; and using the answer to direct patient care.

The first two conclusions deal with the step of question formulation. This step is now dealt with by the two question answering modules of the IMIX demonstrator, but if a dialogue were used to correctly interpret and modify the question (as is suggested by the second conclusion), it would be better to include a separate component to manage the dialogue. In fact, a second version of the IMIX demonstrator that supports dialogue is currently being developed [VP05]. For this purpose a dialogue action manager is included in the architecture. This component continually decides which action to take next, depending on the status and the state of the dialogue with the user [OS04]. In most states, the next action is computed on the basis of the history of the dialogue, the semantic or pragmatic interpretation of the most recent user input, and the information obtained from the databases the dialogue action manager can access.

The third to fifth conclusions related to information seeking concern the step of searching for relevant information. This step is dealt with by the two question answering modules of the IMIX demonstrator.
These conclusions therefore imply requirements for the question answering modules.

Finally, the sixth to ninth conclusions related to information seeking concern the step of answer formulation. This step is dealt with by the output generation module. Requirements for the output generation module could therefore be derived from these conclusions.

The conclusions concerning the user interface imply requirements for the graphical user interface. In addition, based on these conclusions the decision was taken to leave the speech recognizer and the text to speech module out of the design for the QA system for Dutch general practitioners. Further, one of the conclusions recommended developing a web portal that provides access to all different types of information for general practitioners, including GIPS. The design of this portal is described in section 4.4.

Summarizing, these are the problem and scope of the entire system, and the functional requirements for each of the different components:

Problem and scope

General practitioners are confronted with questions about patient care during medical consultations. Answers to these questions are needed during the medical consultation, but searching for answers on the Internet takes too much time. Therefore, a question answering system should reduce the time needed to search for an answer on the Internet sufficiently to make searching for answers during the consultation possible. For the present, only questions concerning patient education will be answered by the question answering system, and the system will be aimed at physicians who are already accustomed to using the Internet. For the system to be successful, general practitioners will have to be convinced that answers exist to the questions they have concerning patient education, and that they can be found with the question answering system.
Functional requirements

Dialogue action manager: medical language and ICPC-coding in the question must be understood; questions must be correctly interpreted and modified.

Question answering modules: the scope and depth of each information source must be known; bibliographic knowledge should be incorporated in the system; null search results should be correctly interpreted.

Document collection: only information sources that are current, reliable, and suitable for patient education should be used; information sources can best be retrieved from the Internet, but only from some selected sites.

Output generation module: a different output generation module must be designed for information presentation on a PDA; the output should be presented as a concise answer, with direct links to the documents the information was retrieved from; information from several sources should be integrated into one consistent view; it must be made clear that the output is not the single best way for clinical practice.

Graphical user interface: the user interface should be very flexible; both keyboard and mouse input must be accommodated; a different graphical user interface must be designed for use via a PDA.

4.3 Design

Based on the requirements discussed above, the architecture of the IMIX demonstrator has been adjusted for use by Dutch general practitioners. The revised architecture is presented in Figure 8. Next to the two question answering modules, the output generation module, and the graphical user interface, an extra module is incorporated in the design: the dialogue action manager (dam.control). This component is integrated into the architecture according to the specification of the second version of the IMIX demonstrator [VP05]. However, because the second version of the demonstrator is not yet implemented, the components that will be developed for this research probably cannot be integrated with the other components during this research.
As depicted in Figure 8, the output generation module and the graphical user interface now communicate with the question answering modules through the dialogue action manager. The user input to the dialogue action manager consists of two pools: “user.gui” (pressed buttons), and “user.language.raw” (text). As soon as the definitive question is known, the dialogue action manager may write it to the pool “user.language.analysed”. The dialogue action manager communicates with the question answering modules through the pools “questions” and “answers”. The output from the dialogue action manager is sent to the pool “dam.out”. This may be an answer from the question answering modules, or another type of dialogue act, like a question or an informing message. The output generator transforms the signal received in “dam.out” into visual output and sends it to the "presentation" pool. Then the output is displayed by the graphical user interface.

Figure 8 Architecture for the IMIX demonstrator targeted at general practitioners (components: rolaquad.qa, qadr.qa, dam.control, imogen.gen, imix.gui; pools: user.gui, user.language.raw, user.language.analyzed, questions, answers, dam.out, presentation)

The dialogue action manager, like the two question answering modules, will be taken for granted, because it is not part of the IMOGEN project (of which this study is a part). In the previous section two functional requirements were specified for this component: medical language and ICPC-coding in the question must be understood; and questions must be correctly interpreted and modified. The second requirement actually is the goal of the dialogue action manager. However, the dialogue action manager will not deal with the first requirement, because the IMIX demonstrator is aimed at naïve users [VP05].
Therefore, to implement this requirement without changing the design of the dialogue action manager, the input to this component should not contain any medical language or ICPC-coding. The graphical user interface should therefore ‘translate’ these terms before sending them to the “user.language.raw” pool.

In the next subsections the designs for the two components of GIPS (the output generation module and the graphical user interface) are specified.

4.3.1 Output generation module

In the first version of the IMIX demonstrator, the output generator subscribes to the pool “answers” [IMIXa]. However, because the dialogue action manager is added to the design of the QA system for general practitioners, the output generator now subscribes to the pool “dam.out” [VP05]. In this pool two types of information occur: answers and other types of dialogue acts. The messages in this pool contain three different elements: dialogue act elements that give dialogue act information; an output element that contains the data to be used by the output generation module; and a context element that gives dialogue context information. The output element contains a text element (consisting of a text string) and/or qa elements that are exact copies of the messages in the pool “answers” that already exists in the first version of the IMIX demonstrator. Text elements represent dialogue acts other than answers.

When a signal is received in the “dam.out” pool, the output generation module transforms the information into an appropriate presentation, which is then published to the “presentation” pool. The qa elements generated by the question answering modules have to be modified to provide the general practitioner with a comprehensible presentation of the answer to his question.
In the requirements it was stated that the answer should be presented as a concise response, with direct links to the documents the information was retrieved from (which should be very reliable), and that information from several sources should be integrated into one consistent view. In addition, it must be made clear that the response is not the single best way for clinical practice.

The two question answering modules each generate a number of answers, which may originate from different sources, but multiple answers may also have been retrieved from the same source. To be able to provide a direct link to the source the answer was retrieved from, and to make clear that the presented response is just a view provided by a particular source (with which a general practitioner could disagree), answers from different sources should not be integrated into one sentence. Instead, an overview should be provided of all different answers integrated per source. This is illustrated in Figure 9. In this figure the three answers presented in the example QA document shown in Figure 5 are grouped according to the two sources the answers were retrieved from. The first and second answers

  RSI is hetzelfde als een muisarm.
  (RSI is the same as a mouse arm.)

and

  Tegenwoordig is RSI een verzamelnaam ('paraplubegrip') voor alle klachten aan armen, nek en schouders.
  (Nowadays, RSI is a collective term for all complaints of arms, neck, and shoulders.)

were retrieved from the website of the RSI Association. These two answers have been integrated into one sentence in Figure 9:

  Tegenwoordig is RSI, ook wel een muisarm genoemd, een verzamelnaam ('paraplubegrip') voor alle klachten aan armen, nek en schouders.
  (Nowadays, RSI, also called a mouse arm, is a collective term for all complaints of arms, neck, and shoulders.)

This integrated sentence is in fact quite advanced. The first answer is transformed into an appositive, “ook wel een muisarm genoemd” (“also called a mouse arm”), attached to the second answer.
It is unlikely that a natural language processing system will produce such a sentence in the near future, although it would be very desirable. The third answer

  Ook RSI is eigenlijk een verkeerde term.
  (Actually, RSI is also a wrong term.)

was retrieved from TNO Arbeid. This answer is presented separately. Together with each group of answers the name of the source is provided as a link to that source. This kind of presentation can be specified in HTML and has to be contained by a visual content element in a P-ml document.

The decision that answers from different sources should not be integrated into one sentence is not universal to the IMOGEN project. In fact, effort is being spent on the development of techniques to integrate similar answers retrieved from different documents into one sentence that is more specific or more general than the original answers [MK05].

  Wat is RSI?
  RSI-vereniging: Tegenwoordig is RSI, ook wel een muisarm genoemd, een verzamelnaam ('paraplubegrip') voor alle klachten aan armen, nek en schouders.
  TNO Arbeid: Ook RSI is eigenlijk een verkeerde term.

Figure 9 Example presentation

A separate requirement for the output generation module specified that a different output generation module should be designed for information presentation on a PDA. In the case of a PDA, there is not enough space to show all different answers on the screen. Therefore, only one source should be selected for which the answer is presented. The answer should be accompanied by the name of the source, but it is useless to provide a link to the complete document the answer was retrieved from, because it would take too much effort to read it on a PDA. In Figure 10 the example presentation of Figure 9 is adapted to a PDA screen. This kind of presentation could be represented simply as plain text in a content element of a P-ml document.
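The grouping of answers per source, and the two kinds of presentation derived from it, can be sketched as follows. This is a minimal illustration, not the GIPS implementation. It makes several assumptions: the QA document is a shortened version of Figure 5, an id is turned back into a URL by replacing hashes with slashes and prefixing "http://", the host name stands in for the display name of the source, the PDA variant picks the source of the highest ranked answer, and the answers of one source are simply listed one after another (the integration into one sentence is the subject of chapter 5).

```python
import xml.etree.ElementTree as ET
from html import escape

# Shortened, well-formed version of the QA document in Figure 5.
QA_DOCUMENT = """\
<qa engine="qadr">
  <question mode="typed"><string> Wat is RSI ? </string></question>
  <answerlist nr_answers="3">
    <answer rank="1" conf="6.58">
      <id source="www.rsi-vereniging.nl#overrsi#misverstanden" par="1"/>
      <context><core> RSI is hetzelfde als een muisarm </core></context>
    </answer>
    <answer rank="2" conf="6.57">
      <id source="www.rsi-vereniging.nl#overrsi#misverstanden" par="3"/>
      <context><core> Tegenwoordig is RSI een verzamelnaam ( ' paraplubegrip ' ) voor alle klachten aan armen , nek en schouders . </core></context>
    </answer>
    <answer rank="3" conf="6.57">
      <id source="www.arbeid.tno.nl#kennisgebieden#rsi#index" par="2"/>
      <context><core> Ook RSI is eigenlijk een verkeerde term . </core></context>
    </answer>
  </answerlist>
</qa>"""


def group_by_source(qa_xml):
    """Map each source id to its answer sentences, best ranked source first."""
    root = ET.fromstring(qa_xml)
    answers = sorted(root.iter("answer"), key=lambda a: int(a.get("rank")))
    groups = {}
    for answer in answers:
        source = answer.find("id").get("source")
        sentence = " ".join(answer.find("context/core").text.split())
        groups.setdefault(source, []).append(sentence)
    return groups


def source_url(source):
    # Assumption: hashes map back to slashes and the scheme is http.
    return "http://" + source.replace("#", "/")


def visual_presentation(question, groups):
    """Build a P-ml document with one visual (XHTML) content element."""
    parts = ["<h1>%s</h1>" % escape(question)]
    for source, sentences in groups.items():
        site = source.split("#")[0]  # host name used as link text
        parts.append('<p><a href="%s">%s</a>: %s</p>' % (
            escape(source_url(source)), escape(site),
            escape(" ".join(sentences))))
    return ('<presentation xmlns="pml.xsd"><content realization="visual">'
            '<html xmlns="http://www.w3.org/1999/xhtml">'
            '<head><title>IMIX</title></head><body>%s</body></html>'
            '</content></presentation>' % "".join(parts))


def pda_presentation(groups):
    """Plain text for a PDA: only the source of the best ranked answer."""
    source, sentences = next(iter(groups.items()))
    return "%s: %s" % (source.split("#")[0], " ".join(sentences))
```

Calling group_by_source(QA_DOCUMENT) yields two groups, one for the RSI Association page and one for the TNO Arbeid page, which can then be passed to visual_presentation or pda_presentation.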
  RSI-vereniging: Tegenwoordig is RSI, ook wel een muisarm genoemd, een verzamelnaam ('paraplubegrip') voor alle klachten aan armen, nek en schouders.

Figure 10 Example presentation for PDA

To integrate several answers retrieved from the same source into one concise response, possibly aided by the surrounding (annotated) context, some natural language processing is needed. This language technology is described in chapter 5.

Other types of dialogue acts will not be modified by the output generation module. The string contained by a text element in the “dam.out” pool will thus be directly copied (as plain text) into a content element of a P-ml document.

4.3.2 Graphical user interface

The graphical user interface controls the dialogue with the user. It accepts typed input and pressed buttons from the user and it displays the presentations generated by the output generation module in return. In the first version of the IMIX demonstrator the graphical user interface has eight different states [IMIXg]. However, for GIPS, the states dealing with the choice of input mode (“modality”) and speech input (“wake_up” and “speech_input”) are not needed, because only text input will be implemented. The “welcome” state will also be omitted, because one of the requirements for the graphical user interface stated that all functional options should be immediately visible at the user interface. The system will therefore start in the “text_input” state. When a question or other dialogue act is entered by pressing the ok button or typing [enter] (satisfying the requirement that both keyboard and mouse input must be accommodated), the user turn ends and the system turns to the “waiting_for_answer” state. When a dialogue act arrives that is not an answer, the system returns to the “text_input” state to allow the user to type a reply. When an answer arrives, the system goes to the “answer” state.
From this state the user can either start a new dialogue or close the graphical user interface. As in the first version of the IMIX demonstrator, the system can go to the “end” state from any of the other states by closing the graphical user interface. In addition, the user should also be able to start a new dialogue from any of the states, to prevent the user from getting stuck in a fruitless dialogue. However, the dialogue will never be reset after a timeout, because if a physician turns away from the system, it must always be in precisely the same state when he returns to it. The new state transition diagram for GIPS is shown in Figure 11.

Figure 11 State transition diagram for GIPS (states: text_input, waiting_for_answer, answer; transition labels: new, enter, enter [empty_textbox], presentation [no_answer], presentation [answer], close)

4.4 An information portal for general practitioners

The general practitioners participating in the interviews (described in chapter 3) were not unanimous in their information needs and their expectations of QA systems. QA systems turned out to be suitable primarily for answering questions for patient education based on information retrieved from the Internet. Other types of questions might also be answered with information retrieved from the Internet, but question answering is not a suitable technique for finding an answer to these questions. To provide an overview of the different techniques available to general practitioners for answering their questions, a web portal could be developed that provides access to all types of applications useful for general practitioners.
Next to GIPS, the portal should provide access to an image retrieval system to retrieve dermatological images, information extraction technology to retrieve data like telephone numbers and e-mail addresses (the social map), the index of the guidelines issued by the Dutch College of General Practitioners [NHGs], and medical information retrieval systems like Artsennet [KNMG] and PubMed [EPM] to retrieve journal articles. Some of these applications, like GIPS itself, image retrieval, and information extraction, are not yet available for use by general practitioners. Therefore, if a web portal were implemented, it should be made very clear which types of information can and which cannot be found with the available applications, to avoid general practitioners being frustrated because the application does not return the information they need.

A prototype of such a web portal has been implemented in HTML and JavaScript. The layout of the portal resembles that of “startpagina.nl”, which is a frequently used portal for Internet users. Each application is shown as a textbox headed with a colored title. The color of the title indicates the availability of the application. A blue title indicates that the application is available, a purple title means there is only a prototype of the application, and applications with a red title are not available at all, but might be available in the future, in which case they would be useful for general practitioners. For each application, text fields are included in the portal in which the user can directly enter his question, reducing the number of steps a user would need compared with opening the application first and entering the question there.

GIPS is the only application with a purple title. A prototype of GIPS has also been implemented in HTML and JavaScript.
This prototype consists of a simple graphical user interface in which the user can select a question from a list of questions. Then the answer to this question (generated previously with the prototype described in the next chapter) is shown in an answer field. One of the answers also includes a picture. In this prototype dialogue functionality is not implemented, and keyboard input has not been accommodated.

Screenshots of the web portal and GIPS are shown in Appendix C. The prototypes themselves can be viewed at my website: http://wwwhome.cs.utwente.nl/~langen/thesis/portal

5 Response formulation

In the previous chapter the design for GIPS (General practitioner Information Presentation Submodule) was presented. There, the need was expressed for natural language processing (NLP) techniques to integrate several answers into one concise and coherent response. NLP is a subfield of artificial intelligence and linguistics. It concerns the processing and understanding of natural language by computers. NLP is a broad field comprising, among others, speech recognition, natural language generation, question answering, information retrieval, information extraction, and automatic summarization.

In section 5.1 some related work on response formulation is presented. Section 5.2 describes which response formulation methods are suitable for GIPS. These methods are elaborated in sections 5.3 and 5.4. Finally, in section 5.5 the implementation of the response formulation task is described.

To avoid confusion about the terminology, in this chapter the term ‘answer’ will be used for the answer sentences returned by the question answering modules, while the term ‘response’ will be used for the final (concise and coherent) answer generated by the output generation module.

5.1 Related work

In previous QA research, little effort has been dedicated to response formulation.
Most QA research has focused on improving system performance against a standard set of questions, like the QA track at the TREC conferences [TREC]. These questions require short factoid answers, and system performance on these questions is measured by the rank at which the system returns the correct answer. No natural language responses need to be formulated for this task. For the sake of better human-computer interaction, however, response formulation is receiving more and more attention. Generally, two response formulation approaches can be discerned that are related to existing approaches for the QA stage of answer extraction: formulation templates (5.1.1) and query-based summarization (5.1.2). In addition, research is being carried out on the use of sentence fusion (5.1.3) to integrate several answers into one response.

5.1.1 Formulation templates

Kosseim [KPG03] investigated the use of formulation templates. For example, the question

Who is the prime minister of Canada?

could be transformed into the templates:

The prime minister of Canada is <person-name>

or

<person-name>, prime minister of Canada,

These templates can be used for answer extraction as well as response formulation. In the case of answer extraction, a lot of possible formulations are produced. Then the QA system searches for these formulations in the document collection and instantiates <person-name> with the matching noun phrase. This noun phrase is then considered the answer. In the case of response formulation, only a few templates (or only one) of good linguistic quality are produced. Then <person-name> is instantiated with the answer to produce a linguistically correct response.

I used formulation templates for a QA system that only answers factoid person questions [LAN05]. It is a relatively simple method that can only be used for short, factoid answers. As far as I know, the use of formulation templates for more extensive answers has not been investigated.
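To make the response formulation use of templates more concrete, the following minimal Python sketch instantiates the two templates from the example above. It is an illustration only and not the implementation used in [KPG03] or [LAN05]: the question pattern, the function name formulate_responses, and the placeholder answer “Jane Doe” are assumptions introduced here.

```python
import re

# Hypothetical template table: each entry pairs a question pattern with the
# response templates derived from it (only the example question is covered).
TEMPLATES = [
    (
        re.compile(r"^Who is (?:the )?(?P<role>.+)\?$", re.IGNORECASE),
        ["The {role} is {answer}", "{answer}, {role},"],
    ),
]


def formulate_responses(question, answer):
    """Return the instantiated templates for a question, or [] if none match."""
    for pattern, templates in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            role = match.group("role")
            # Fill the <person-name> slot with the answer found by the QA system.
            return [t.format(role=role, answer=answer) for t in templates]
    return []


# "Jane Doe" is a placeholder answer, not a real fact.
print(formulate_responses("Who is the prime minister of Canada?", "Jane Doe"))
```

A question that does not match any pattern yields an empty list, which reflects the limitation noted above: the method only covers short, factoid questions for which a template exists.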
5.1.2 Query-based summarization

Query-based summarization aims at summarizing the part of a text relevant to the user’s question. This method is used for answer extraction. For example, Cardie et al. [CNP00] investigated the use of a variant of the vector space model (primarily used by information retrieval systems) to generate a query-based summarization. With this method, the answer document is divided into chunks (e.g. sentences, paragraphs, or 200-word passages), and a vector representation is generated for the question and for each document chunk. Then the similarity of each chunk to the question is determined, and the most similar chunk (up to a predetermined length) is taken to be the query-dependent summary. This summary is then used to extract possible answers to the question.

However, query-based summarizations also seem appropriate as responses. According to Lin et al. [LQS03], the most natural response presentation style for a QA system is “focus-plus-context”. This means the system returns the answer to the user’s question, extended with the text surrounding the answer. Lin et al. investigated which context level is preferred by users: only the exact answer; the sentence from which the answer was extracted; the paragraph; or the entire document. They concluded that users prefer to receive the paragraph the answer was retrieved from.

Bosma [BOS05] investigated a more intelligent way of producing query-based summarizations. He used discourse annotations to determine which context sentences are most related to the sentence in which the answer is located. Only these sentences (which are not necessarily adjacent to each other in the original document) are included in the response, instead of the entire paragraph. This response was compared to a baseline consisting of the answer sentence extended with the preceding and the following sentence.
Users were asked to evaluate the query-based summarizations and the baseline on the extent to which their accuracy could be verified, on their usefulness with respect to the question, and on the amount of irrelevant information with respect to the question. It turned out that users thought Bosma’s query-based summarizations were more verifiable and contained less irrelevant information than the baseline response.

Actually, in their methods, Lin et al. and Bosma do not use the query to produce a query-based summarization at all. They only use the answer sentence as a starting point to determine which sentences to include in the response. Therefore, I call this method “answer extension” instead of “query-based summarization”.

5.1.3 Sentence fusion

Sentence fusion is used especially in multi-document summarization to summarize the information common to all documents. As an example Barzilay [BAR03] presents the sentences:

IDF Spokeswoman did not confirm this, but said the Palestinians fired an anti-tank missile at a bulldozer.

and

The clash erupted when Palestinian militants fired machine-guns and antitank missiles at a bulldozer that was building an embankment in the area to better protect Israeli forces.

These sentences would be fused into the sentence:

Palestinians fired an anti-tank missile at a bulldozer.

Sentence fusion could also be used in QA systems for response formulation. Marsi and Krahmer [MK05] investigate a method for Dutch to integrate similar answers retrieved from different documents into one sentence that is more specific or more general (depending on the goal of the system) than the original answer sentences. For example, the answers

RSI can be caused by repeating the same sequence of movements many times an hour or day.

and

RSI is generally caused by a mixture of poor ergonomics, stress and poor posture.

might be fused into a more specific response, like:

RSI can be caused by a mixture of poor ergonomics, stress, poor posture and by repeating the same sequence of movements many times an hour or day.

The method developed by Marsi and Krahmer comprises three stages: alignment, merging, and generation. During alignment, words and phrases in the different sentences that are related to each other are aligned, and each alignment is labeled with the semantic relation holding between the aligned phrases (e.g. equals, restates, specifies). In the merging stage it is decided which information from either sentence should be preserved. Finally, a grammatically correct surface representation is generated for the fused sentence. Marsi and Krahmer have not yet investigated their method of sentence fusion for QA. They only evaluated it on parallel corpora.

5.2 GIPS

To get an idea of the possible answers received as input for GIPS, ten example questions about RSI were submitted to the question answering module “qadr.qa”. This module was used because it provides the dependency structures of the question, the answer sentences, and the sentences in their context, which can be useful for subsequent natural language processing. More information on dependency structures is provided later in this chapter. The example questions and their answers are presented in detail in Appendix D.

On inspection, the answers appear to be of two different types. The first type can be interpreted without knowing the original context of the answer sentence. I call these “autonomous answers”. For example, in response to the first question in Appendix D:

Wat is RSI? (What is RSI?)

one of the answers is

RSI is een verzamelnaam voor zeer uiteenlopende vormen van overbelasting in het gebied van nek, schouders, armen en ellebogen. (RSI is a collective term for very different forms of overload in the area of neck, shoulders, arms, and elbows.)
This answer can be interpreted easily, because it simply gives a definition. However, there is a second type of answer that can hardly be interpreted without knowing its context. I call these “dependent answers”. Many of these answers are not relevant for the question themselves, but seem to have a strong relation with a sentence in their context that is necessary to interpret the answer correctly and that would be very relevant for the question. For example, in response to the third question in Appendix D:

Welke spieren zijn betrokken bij RSI? (Which muscles are affected by RSI?)

one of the answers is

Uit onderzoek is bekend dat deze het vaakst betrokken is bij RSI. (It is known from research that this one is affected by RSI most often.)

This answer cannot be entirely interpreted, because it contains a referring expression “deze” (“this one”). This expression probably refers to a muscle. It is expected that the previous sentence contains the name of the muscle referred to, which would be very relevant for the question.

Because there are two different types of answers, two different response formulation strategies are proposed. Autonomous answer sentences retrieved from the same document could be integrated into a single sentence using some sentence fusion method. This is described in section 5.3. Dependent answers need to be extended with sentences from their original context, using some answer extension method, to make them interpretable and probably also more relevant (see section 5.4).

5.3 Answer integration

Autonomous answers could be integrated into one sentence to make the response more concise and fluent. However, GIPS should only integrate answers originating from the same document into one sentence, because the general practitioner receiving the response must be able to check its source.
Besides, it is hypothesized that general practitioners will more readily accept and trust a response accompanied by a link to its source, than a computer generated response that cannot be easily verified. Within NLP, two different approaches for integrating sentences have been investigated: aggregation and sentence fusion. Aggregation is used in natural language generation (NLG) to make generated text more coherent by reducing redundant information and introducing connectives [RM99, SHA02]. NLG is defined as “the process of constructing natural language outputs from non-linguistic inputs” [JM00], which is also called concept-to-text generation [LAP03]. In concept-to-text generation, information that is represented in a knowledge base or other logical representation is transformed into natural language. For example, the logical proposition “likes(John, Mary) AND likes(Mary, John)” could be transformed into the natural language phrase “John and Mary like each other.”. Sentence fusion is used in text-to-text generation to create a concise and fluent fusion of two or more sentences [BAR03]. Text-to-text generation methods take information that is represented in natural language as input, and transform it into a new natural language representation satisfying certain constraints. For example, in automatic summarization, sentences could be deleted or fused to represent only the most important information. Sentence fusion is used especially in multi-document summarization to summarize the information common to all documents. The response formulation task investigated in this chapter concerns text-to-text generation, because it starts with text snippets (answer sentences) retrieved by the question answering modules and aims at integrating them into a single coherent text (response). Therefore, sentence fusion (being a text-to-text generation task) seems the most appropriate approach for answer integration. 
Actually, the use of sentence fusion for the IMIX demonstrator is being investigated at the University of Tilburg by Marsi and Krahmer [MK05]. However, I think that aggregation, from the field of concept-to-text generation, could also provide useful insights for the response formulation task, because it has been investigated more thoroughly. Whereas sentence fusion primarily concerns the fusion of similar sentences (as in multi-document summarization or the fusion of parallel corpora), aggregation concerns the integration of sentences connected by all sorts of relations. Therefore, I investigated the possibilities of using aggregation for response formulation.

The next subsection provides a general description of aggregation. Then in sections 5.3.2 to 5.3.7 the different linguistic levels of aggregation are elaborated. In section 5.3.8 conclusions are drawn with respect to the aggregation types that can be used for the response formulation task. Finally, in section 5.3.9 an algorithm is presented for answer integration.

5.3.1 Aggregation

Reape and Mellish [RM99] conducted a literature survey to investigate what aggregation is. They concluded that aggregation is “the combination of two or more linguistic structures into a single linguistic structure which contributes to sentence structuring and construction”.

Aggregation roughly consists of two stages [SHA02]. In the first stage, it is decided which linguistic structures might be aggregated. Mostly, linguistic structures are aggregated when they have a rhetorical relation or when they show some similarity. In the second stage, transformations are applied to these structures. For this purpose linguistic constructions are used, such as conjunction, adjective phrase attachment, quantification, and gapping. Examples of these constructions are shown in Table 8.
Aggregation is usually used to make computer-generated text more concise, cohesive, and/or fluent and (consequently) to generate more complex sentences [SHA02]. For the sake of fluency, the linear ordering of aggregated constituents is also important. For example, the sentence “Mary is a sweet little girl.” is more fluent than “Mary is a little sweet girl.”, while both sentences are grammatical. Therefore, when two sentences are aggregated, a linear ordering of their constituents should also be specified.

There are several types of aggregation. Each author makes a different classification of aggregation types. Reape and Mellish [RM99] tried to integrate these classifications by introducing a classification based on levels of linguistic representation. They discern six different levels at which aggregation can occur. Table 9 lists the different types of aggregation associated with these levels of linguistic representation.

Linguistic construction | Example
Conjunction | Sue invited John. Sue invited Mary. → Sue invited John and Mary.
Adjective phrase attachment | Mary is a girl. Mary is sweet. → Mary is a sweet girl.
Quantification | John was invited. Mary was invited. John and Mary are the only children. → All children were invited.
Gapping (deletion of a second verb) | John hit Mary. Phil hit Sue. → John hit Mary and Phil ∅ Sue.
Right-node-raising | John likes Mary. Phil hates Mary. → John likes ∅ and Phil hates Mary.
Table 8 Examples of linguistic constructions

5.3.2 Conceptual aggregation

The first level, the conceptual or inferential level, concerns non-linguistic, language-independent domain knowledge contained in some sort of knowledge base. Conceptual aggregation concerns the use of user modeling, domain knowledge, and common sense reasoning to reduce the number of concepts [SHA98]. This type of aggregation typically reduces the number of propositions in the message while increasing the complexity of the value of some conceptual role.
For example, the two sentences “John hit Mary” and “Mary kicked John” might result in the aggregated sentence “John and Mary fought.”

Aggregation type | Description
Conceptual aggregation | Reduction of the number of concepts
Discourse aggregation | Reduction of the complexity of the rhetorical structure
Semantic aggregation | Combination of two or more semantic entities
Syntactic aggregation | Combination of sentences using syntactic constructions
Lexical aggregation | Reduction of the number of lexical predicates and/or lexemes
Referential aggregation | Referring expression generation
Table 9 Types of aggregation

5.3.3 Discourse aggregation

The second level, discourse, concerns the coherence of a text. Text coherence implies that utterances in a text are linguistically as well as non-linguistically connected to each other [HAA02a]. In addition, the utterances must contribute to the rhetorical structure of the text. A theory that describes the different rhetorical relations between sentences is the Rhetorical Structure Theory (RST) developed by Mann and Thompson [MT87]. Most RST relations are asymmetric. In that case one of the sentences is considered the nucleus (conveying the most essential information) and the other the satellite. However, there are also multi-nuclear relations in which there is no distinction between nucleus and satellite. Table 10 lists some examples of RST relations [JM00]. RST relations are hierarchical; therefore the rhetorical structure of a text can be represented as a tree (see for example Figure 12 and Figure 13).

Relation | Description
Elaboration | The satellite presents some additional detail concerning the nucleus. e.g. John likes girls. He likes Mary most.
Contrast | The nuclei present things that are different in some relevant way. e.g. John likes girls. He doesn’t like Sue.
Sequence | The nuclei are realized in succession. e.g. John invites Mary. John invites Sue.
Purpose | The satellite presents the goal of performing the activity presented in the nucleus. e.g. John invites Mary. He wants her to come to his party.
Result | The situation presented in the satellite results from the one presented in the nucleus. e.g. John invites Mary. She comes to his party.
Table 10 Examples of RST relations

Discourse or rhetorical aggregation is defined as “any operation that applies to a discourse structure, rhetorical structure or text plan and maps it to a better structure or plan” [RM99]. An example of discourse aggregation might be the mapping of the rhetorical structure tree

E(nuc(E(nuc(n),sat(p1))),sat(p2)) (for example: “John likes girls. He likes Mary most. He also likes Sue.”)

into

E(nuc(n),sat(and(p1,p2))) (for example: “John likes girls. He likes Mary most and he also likes Sue.”)

where “E” is the elaboration relation, “nuc” is the nucleus, and “sat” is the satellite. These trees are illustrated graphically in Figure 12 and Figure 13 respectively. As this example illustrates, discourse aggregation typically reduces the complexity of the rhetorical structure while increasing the complexity of one of its propositional leaves.

Figure 12 Rhetorical structure tree E(nuc(E(nuc(n),sat(p1))),sat(p2))

Figure 13 Rhetorical structure tree E(nuc(n),sat(and(p1,p2)))

5.3.4 Semantic aggregation

The semantic level concerns linguistic, language-dependent representations of meaning. In contrast to the conceptual level, the information at this level is domain-independent [RM99]. Semantic aggregation is defined as the combination of two or more semantic entities into one entity. Methods of semantic aggregation are semantic grouping and logical transformations. Semantic grouping is the ordering and bracketing of semantic content. Logical transformations concern the mapping of semantic predicates into fewer or just different predicates.
For example, the meanings of “Jamie is Chris’s sister” and “Chris is Jamie’s brother” might be mapped to the meaning of “Chris and Jamie are brother and sister”.

The distinction between conceptual and semantic aggregation is difficult to make. In fact, other authors, like Shaw [SHA02], do not distinguish between these types of aggregation. Instead, Shaw considers all aggregation operations that make no use of syntactic knowledge or a lexicon (except for referential aggregation) to be interpretive aggregation. According to Shaw, interpretive aggregation operators perform inferences over conceptions and relations across propositions. Reape and Mellish [RM99] also admit that they could not find any clear examples of semantic aggregation “which couldn’t alternatively be classified as either conceptual, syntactic or lexical aggregation”. The status of this type of aggregation is thus somewhat doubtful.

5.3.5 Syntactic aggregation

Syntax refers to the way words are arranged together [JM00]. Words belong to different word classes, implying restrictions on the way they can be used in a sentence. Syntactic aggregation is the most common form of aggregation [RM99]. It combines propositions using syntactic constructions, like conjunction, gapping, etc. [SHA02]. Syntactic aggregation can be paratactic or hypotactic.

In paratactic aggregation the aggregated sentences are of equal syntactic status. The main paratactic aggregation operator is the coordinating conjunction, a linguistic construction that uses a coordinator (like “and”, “or”, “but”) to link linguistic units of equal syntactic status. For example, the sentences “John likes school.” and “Mary likes school.” can be aggregated as “John and Mary like school.”. The coordinating conjunction can be used to combine propositions that have an addition, sequence, or non-volitional result rhetorical relation.

The clauses in hypotactic constructions have unequal syntactic status [SHA02].
For example, when two propositions have an elaboration relation, the proposition in satellite position can be transformed into a modifying construction, such as an adjectival phrase, a prepositional phrase, or a relative clause, like the transformation of “Mary is sweet.” into an adjective in “Mary is a sweet girl.”. Lexical information is used to check that the result of hypotactic aggregation does not violate any syntactic or lexical constraints. For example, when the sentences “Mary is a girl.” and “Mary is sweet.” are aggregated as “Mary is a sweet girl.”, “girl” must be realizable with an adjectival modifier, and “sweet” must be realizable as a prenominal modifier. Such restrictions are coded in a lexicon.

Rhetorical relations are very important in choosing a linguistic construction for syntactic aggregation. Shaw [SHA02] gives the following example to show how different rhetorical relations lead to very different aggregated sentences. Consider the following two sentences:

1. John abused the duck.
2. The duck buzzed John.

When the main rhetorical relation connecting the nucleus and satellite is elaboration, the sentences might be aggregated by a relative clause, resulting in the following aggregated sentences (depending on which of the sentences was the nucleus):

a. John abused the duck that had buzzed him.
b. The duck buzzed John who had abused it.

Because the past perfect tense is used in the relative clauses, the nucleus and satellite also have a sequence relation. Sentences a and b therefore describe very different situations. In the first, John was the victim before he became an aggressor, while in the second the duck was the victim first.

When the main rhetorical relation connecting the nucleus and satellite is sequence, non-volitional result, or addition, the sentences might be aggregated by conjunction, resulting in the sentences c and d.

c. The duck buzzed John and he abused it.
d. John abused the duck and it buzzed him.
Thus, at least four different aggregated sentences can result from two sentences, depending on their rhetorical relation.

5.3.6 Lexical aggregation

The lexical level represents the concatenation of morphemes making up a word [JM00]. Lexical aggregation combines multiple lexical items to express them more concisely [SHA02]. This operation is related to paraphrasing. Compared with hypotactic aggregation, lexical aggregation operators use more detailed lexical information. For example, the phrase “a dog used by the police” might be transformed into “a police dog”, transforming the reduced relative clause into a prenominal modifier. Another type of lexical aggregation is the combination of multiple lexemes into one, like the transformation of the phrase “rise sharply” into “shoot”.

5.3.7 Referential aggregation

The last type of aggregation, referential aggregation, is usually associated with referring expression generation [RM99]. Referring expression generation concerns the linking of words by introducing pronouns, demonstratives, and other types of reference [JM00]. Another type of referential aggregation, however, is quantification [SHA02]. Quantification replaces a set of entities with a reference to their type (based on an ontology) as restricted by a quantifier. For example, when it is known that John and Mary are the only students, the sentences “John likes school” and “Mary likes school” could be transformed into “all students like school”. For this type of aggregation an ontology is needed that provides information on instance-class relations, inheritance relations, and part-of relations between different entities.

However, referring expressions and quantification can introduce ambiguity when applied incorrectly.
For example, it might be unclear to which entity a referring expression refers [JM00], or, when multiple quantifiers are synthesized in the same sentence, the scope of the quantifiers could be ambiguous [SHA02]. Thus, referential aggregation should be used cautiously.

5.3.8 Conclusions

Because the task of response formulation is a text-to-text generation process instead of a concept-to-text generation process, not all levels of aggregation are relevant for it. Conceptual, discourse, and semantic aggregation presuppose an underlying knowledge representation and rhetorical structure representation of the text that are not available to GIPS. Transforming the answers or the entire original document into a knowledge representation would greatly increase the complexity of the aggregation task. Therefore, conceptual, discourse, and semantic aggregation will not be used.

Syntactic aggregation is especially useful for the response formulation task, because it concerns using linguistic constructions to integrate sentences, while preserving their grammatical and lexical correctness. However, a specific syntactic construction can only be used to aggregate sentences having a certain rhetorical relation. Therefore, the rhetorical relations between the answers should be determined. This is very difficult, because the answers to be integrated are autonomous and usually not adjacent in the original text. They are all assumed to answer the same question, however. Their rhetorical relation can thus be assumed to be addition. The primary syntactic construction used to aggregate sentences having an addition relation is the coordinating conjunction. This construction could thus be used to aggregate different autonomous answers retrieved from the same document.

For lexical aggregation a lot of morphological and semantic knowledge about Dutch words is needed, and it primarily concerns aggregation within sentences.
This type of aggregation is probably more useful for concept-to-text generation processes in which lexical choices still have to be made. In the case of GIPS, answers already have their lexical representation. Therefore, this type of aggregation will not be used for this research. However, some lexical knowledge might be needed to correctly execute syntactic aggregation.

Finally, referential aggregation would be very useful to make a text more coherent, but it can introduce ambiguity when applied incorrectly. In an application for general practitioners ambiguity must be avoided as much as possible. Therefore, this type of aggregation will also be omitted.

5.3.9 Answer integration algorithm

In the previous section it was concluded that answers should be integrated using a coordinating conjunction construction. Shaw [SHA02] presents a conjunction algorithm for an NLG system that incorporates different types of ellipsis, like gapping and right-node-raising. With some minor adjustments, this algorithm could also be very useful for the answer integration task. Shaw’s algorithm consists of the following steps:

1. group propositions and order them according to their similarities while satisfying pragmatic and contextual constraints;
2. determine recurring elements in the ordered propositions being combined;
3. create a sentence boundary when the combined clause reaches a priori thresholds;
4. decide which recurring elements are redundant and should be deleted.

The first step, the grouping and ordering of answers, could be done by grouping the answers per source and ordering them according to their order in the original document, to preserve the discourse structure of that document. Then, based on the number of answers per source and their similarity, it will be determined which answers should be aggregated. To avoid generating overly complex sentences, no more than two answers should be integrated into a single sentence.
When there are more than two answers, the similarity of each pair of consecutive answers could be computed to determine which pairs could best be integrated.

When two answers are to be integrated, the second step is to determine recurring elements. Because recurring elements may be deleted in the fourth step, they need to be exactly identical. To check whether two constituents are identical, Shaw proposes two equivalence tests: the alphabet equivalence test and the sense equivalence test. Alphabet equivalence concerns the surface form of a constituent. This can be easily checked by comparing the respective strings. Sense equivalence concerns “the identity of the indexicals”. Shaw says that for nouns this means that their entity identifiers should be tested, and for verbs and adjectives their lexical senses should be tested. However, this is only possible in concept-to-text generation. In the case of the answers returned by the question answering modules, the entire original text would have to be analyzed to determine the senses of the nouns, verbs and adjectives. This would greatly complicate the algorithm. Instead, the dependency structures with which the answers are annotated could be used as an extra equivalence test. Dependency relations have a tree structure. They are described in detail in the next section. In short, identical constituents should have the same relation, POS and root tags in the case of a single word, or the same relation, cat tags and child nodes in the case of a phrase. Otherwise, they are not identical.

The third step, creating a sentence boundary, can be omitted, because the number of answers to be integrated was already restricted to two.

Finally, in the fourth step the sentences are joined using the coordinator “and”, and recurring elements are deleted.
When a recurring element, or a group of recurring elements, is realized at the end of a sentence, it should be deleted backward, meaning that the first occurrence of the identical constituent is deleted. Otherwise, it should be deleted forward, thus deleting the second occurrence of the identical constituent. According to Shaw, this directionality is a universal phenomenon, also valid for Dutch. For example, consider the following sentences:

a. Mary eats an apple [in the morning].
b. Sue [eats] a banana in the morning.
c. Mary eats an apple, and Sue a banana in the morning.

In sentences a and b two recurring elements can be identified: “eats”, and “in the morning”. The first element, “eats”, is realized in the middle of the sentence and should thus be deleted forward. The other is realized at the end of the sentence and is thus deleted backward. The deleted elements are marked with square brackets in sentences a and b. Sentence c is the aggregated sentence.

Not all sentences can be aggregated that easily, however. Shaw [SHA02] identifies several additional constraints on the conjunction algorithm dealing with scope ambiguities and morphology. Scope ambiguities can occur with modifiers, negation, and quantifiers. For example, when the phrases “tall men” and “women” are aggregated as “tall men and women”, it is not clear whether the women are also tall. To avoid these kinds of ambiguity, the elements should be reordered to make the scope clear, as in “women and tall men”. Morphological problems can occur when, for example, number agreement rules are violated, as in “Mary and Sue eats an apple.”.

Compared to the sentence fusion algorithm described by Marsi and Krahmer [MK05], this conjunction algorithm is more rigid. In this algorithm only identical constituents are deleted, whereas in the sentence fusion algorithm constituents that are restatements of each other or that have a specification relation can also be fused.
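As an aside, the forward and backward deletion rule illustrated with the Mary and Sue sentences can be sketched in a few lines of Python. This is a minimal sketch under strong simplifying assumptions that are not part of Shaw’s algorithm: both clauses are given as lists of constituent strings, they have the same number of constituents, recurring elements occur at the same position, and identity is tested on the surface string only. The function name conjoin is hypothetical.

```python
def conjoin(first, second):
    """Join two clauses with "and", deleting recurring constituents.

    Assumption for this sketch: both clauses are position-aligned lists of
    constituent strings, and identity is plain string equality.
    """
    if len(first) != len(second):
        raise ValueError("sketch only handles position-aligned clauses")
    same = [a == b for a, b in zip(first, second)]

    # Recurring elements realized at the end of the sentence form a trailing run.
    tail = len(same)
    while tail > 0 and same[tail - 1]:
        tail -= 1

    # Backward deletion: drop the FIRST occurrence of sentence-final repeats.
    kept_first = [c for i, c in enumerate(first) if not (same[i] and i >= tail)]
    # Forward deletion: drop the SECOND occurrence of all other repeats.
    kept_second = [c for i, c in enumerate(second) if not (same[i] and i < tail)]

    return " ".join(kept_first) + ", and " + " ".join(kept_second) + "."


a = ["Mary", "eats", "an apple", "in the morning"]
b = ["Sue", "eats", "a banana", "in the morning"]
print(conjoin(a, b))  # Mary eats an apple, and Sue a banana in the morning.
```

The sketch ignores the scope and agreement constraints discussed above, and it does not use the dependency-based equivalence test, so it only reproduces the simplest case.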
Consider the example sentences presented in section 5.1.3:
RSI can be caused by repeating the same sequence of movements many times an hour or day.
and
RSI is generally caused by a mixture of poor ergonomics, stress and poor posture.
The sentence fusion algorithm aligns the phrases “can be caused by” and “is generally caused by” and labels them as restatements of each other. Then one of these phrases is arbitrarily chosen to be preserved in the fused sentence. Instead, the conjunction algorithm would mark the words “caused” and “by” as recurring elements, deleting them forward, which would result in the ungrammatical sentence:
RSI can be caused by repeating the same sequence of movements many times an hour or day, and is generally a mixture of poor ergonomics, stress and poor posture.
Thus, the sentence fusion algorithm is more complex to implement, but it seems to be a more natural solution for answer integration than this conjunction algorithm. Sentence fusion appears to incorporate coordinating conjunction constructions, as well as lexical aggregation by aligning not only identical but also similar phrases.

5.4 Answer extension

When an answer cannot be interpreted without knowing its context, it should be extended with the sentences most related to this answer to formulate a coherent response. Therefore, the rhetorical relations between the sentences in the context of the answer and the answer itself should be determined. In a natural language generation system, the rhetorical relation between two sentences should be specified by a component called a content planner [SHA02] or discourse planner [JM00]. However, the response formulation task described in this chapter is a text-to-text generation process. The rhetorical relations must thus be inferred from the text. Bosma [BOS05] used RST annotations for his answer extension algorithm. However, there aren’t any automated RST analysis tools available for Dutch yet.
Therefore, for GIPS a simple algorithm has been developed that detects sentences that are strongly related to a given answer. It is assumed that strongly related sentences have been aggregated on some level by the author. Conceptual, semantic, syntactic, and lexical aggregation are primarily used to aggregate clauses or phrases within sentences or to integrate multiple sentences or propositions into one. Aggregation operators that are used to express a relation between two sentences (instead of within a sentence) are discourse and referential aggregation. Discourse and referential aggregation operators can be recognized by certain linguistic markers, like cue phrases (described in section 5.4.1), anaphoric referring expressions (5.4.2), and document structure (5.4.3). In section 5.4.4 the algorithm is described that is used to select the sentences that should be included in the answer.

5.4.1 Cue phrases

In discourse aggregation, linguistic devices are used to signal rhetorical relations explicitly [KS98]. These devices are called cue phrases, or discourse connectives. Power et al. [PSB03] discern three different types of discourse connectives: subordinating conjunctions, coordinating conjunctions, and conjunctive adverbs. Subordinating conjunctions (like “although”, “because”) connect a nucleus and satellite that must be expressed within the same sentence. The conjunction can be located either in the first or in the second clause. Coordinating conjunctions (“and”, “or”, “but”) connect two nuclei either occurring in the same sentence or in different sentences. The conjunction always occurs in the second span. In Dutch there are five coordinating conjunctions [KS04]: “en” (“and”), “maar” (“but”), “want” (“for”), “dus” (“so”), and “of” (“or”). Conjunctive adverbs (“however”, “moreover”) always connect text spans occurring in different sentences. The adverb is located in the second sentence.
In this research only cue phrases that indicate a relation between two sentences are relevant, because they are used to determine whether a rhetorical relation exists between these sentences. Therefore, only coordinating conjunctions connecting text spans expressed in different sentences, and conjunctive adverbs are relevant for this research. These cue phrases always occur in the second of the two related sentences.

Figure 14 Knott and Sanders’ Dutch cue phrase taxonomy

Knott and Sanders [KS98] constructed a cue phrase taxonomy expressing relationships between different cue phrases, see Figure 14. This taxonomy doesn’t describe the total set of Dutch cue phrases, however, because it was primarily used for “a first theory-driven systematic and cross-linguistic cue phrase study” in which the use of Dutch cue phrases was compared to that of English cue phrases. Besides, it also includes subordinating conjunctions like “omdat” (“because”), which are not relevant for this research. To be able to recognize the relevant cue phrases, a list of cue phrases that signal a rhetorical relation between two sentences has been constructed. To this end, a small Dutch text corpus (about 10,000 words) containing paragraphs from the IMIX document collection and NHG patient education documents [NHGp] has been analyzed. Firstly, all conjunctions and adverbs that signal a rhetorical relation between the sentence they occur in and the previous sentence were marked manually. Then, it was investigated how these phrases could be recognized as connecting two sentences, because coordinating conjunctions could also connect text spans within a sentence, and because in Dutch some conjunctions and conjunctive adverbs are ambiguous. For example, the additive cue phrase “ook” identified by Knott and Sanders is a conjunctive adverb (meaning “also”) in:
Jantje houdt van zwemmen. Ook Pietje zwemt graag. (John likes swimming. Peter also likes to swim.)
but it is an elliptic device (meaning “too”) connecting two phrases within a sentence in:
Jantje houdt van zwemmen en Pietje ook. (John likes swimming and Peter too.)
Therefore, all occurrences of the marked words in the text were examined manually to identify under which circumstances they could be determined to be a cue phrase connecting two sentences. For this analysis, dependency trees were used.

Figure 15 Example dependency tree

Dependency trees make explicit the dependency relations between constituents in a sentence [BNM01]. Each non-terminal node in a dependency tree is connected with a head-daughter and one or more non-head daughters, whose dependency relations to the head are specified in a relation tag. For example, in Figure 15 the dependency tree of the sentence “Jantje houdt van zwemmen en Pietje ook.” is shown. The “top” node of this tree is connected with three leaf nodes (belonging to the words “en Pietje ook”) and a non-terminal node “main”. The head-daughter of the “main” node is the verb “houdt”. Besides, the main node has two non-head daughters: one is the subject of the head (“Jantje”), the other is a prepositional complement. This prepositional complement in turn has a head-daughter (the preposition “van”) and a non-head daughter (the verb “zwemmen”), which is the object of the head. One of the question answering modules of the IMIX demonstrator (“qadr.qa”) provides dependency structures of the question, the answer sentences, and the sentences in the context of the answers. These dependency structures are generated by the Alpino parser [BNM01]. In the next version of the IMIX demonstrator this parser will be available for all IMIX modules, enabling the output generation module to procure dependency structures also for the answers returned by the other question answering module (“rolaquad.qa”).
Next to the relation to the head, the Alpino parser provides a POS-tag, the root, and the original word for each leaf node in the dependency tree. With the analysis described above, three coordinating conjunctions were identified: “en”, “maar”, and “of”. These conjunctions were found to signal a relation between two sentences only when they are the first word of the second sentence. The other two Dutch coordinating conjunctions, “dus” and “want”, didn’t occur as the first word of a sentence in this corpus. However, because all coordinating conjunctions are used in the same way, these conjunctions would also be proper cue phrases when they are the first word of a sentence. The coordinating conjunctions “of” and “dus” could also be used in another role, however [KS04]. “Of” could also be a subordinating conjunction (“whether”). In that case it would definitely not be a cue phrase. When a sentence starts with a coordinating conjunction, this conjunction is attached directly to the top node of the corresponding dependency tree. When a sentence starts with a subordinating conjunction, however, this conjunction has a parent node labeled with the POS-tag “cp”. Therefore, when a sentence starts with “of” it should first be checked whether this phrase’s parent node is the top node, before marking it as a cue phrase. The coordinating conjunction “dus” could also be an adverb, but in that case it could still be a cue phrase (especially when it is the first word of a sentence). Therefore, no extra checking is needed for this phrase. The Dutch coordinating conjunctions are presented in Table 11, together with the conditions they should satisfy to be considered a cue phrase. Some of these conjunctions (“want” and “maar”) are also Dutch nouns. However, in that case they will probably not be the first word of a sentence. Therefore, the POS-tags of these words do not need to be checked.
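The check for coordinating conjunctions described above can be sketched as follows. This is a minimal illustration, not the GIPS implementation: the `Node` class is a hypothetical stand-in for a node of an Alpino dependency tree, and the tag values used in the example calls (other than “top” and “cp”, which the text mentions) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# The five Dutch coordinating conjunctions listed in the text.
COORDINATORS = {"en", "maar", "of", "dus", "want"}


@dataclass
class Node:
    """Hypothetical, simplified dependency tree node."""
    label: str                       # "top", a cat tag such as "cp", or a POS-tag
    word: Optional[str] = None       # surface form; None for non-terminal nodes
    parent: Optional["Node"] = None


def starts_with_coordinating_cue(leaves):
    """Return True if the first word of the sentence is a coordinating
    conjunction that may be taken as a cue phrase.

    `leaves` holds the leaf nodes of one sentence in surface order.
    """
    if not leaves:
        return False
    first = leaves[0]
    word = first.word.lower()
    if word not in COORDINATORS:
        return False
    if word == "of":
        # "of" can also be a subordinating conjunction ("whether"); it is
        # only a cue phrase when it hangs directly under the top node.
        return first.parent is not None and first.parent.label == "top"
    return True


top = Node("top")
print(starts_with_coordinating_cue([Node("vg", "En", top)]))    # True
clause = Node("cp", parent=top)
print(starts_with_coordinating_cue([Node("comp", "Of", clause)]))  # False
```

In line with the text, no POS-tag check is made for “want” and “maar”, and “dus” is accepted whether it is parsed as a conjunction or as an adverb.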
Coordinating conjunction   Constraints
en                         Only when it is the first word of the sentence.
maar                       Only when it is the first word of the sentence.
of                         Only when it is the first word of the sentence and its parent node is the top node.
dus                        Only when it is the first word of the sentence.
want                       Only when it is the first word of the sentence.

Table 11 Coordinating conjunctions for Dutch

Next to these coordinating conjunctions, several different conjunctive adverbs were identified. Although according to Power et al. [PSB03] conjunctive adverbs always connect text spans occurring in different sentences, in the analyzed corpus the adverbs are also sometimes used to connect clauses within a sentence. Besides, they are also used as other types of adverbs, or even other parts of speech. Based on the analysis of the conjunctive adverbs, some regularities were discovered. Firstly, the POS-tag of the adverb should of course be “adv”. Unfortunately, some of the conjunctive adverbs are labeled as adjectives or prepositional phrases by the Alpino parser. Those should still be considered cue phrases, however.
Secondly, it was found that a conjunctive adverb probably doesn’t connect the sentence with the previous sentence in the following cases:
- when it is embraced by brackets;
- when the relation of one of its parent nodes in the dependency tree is modifier (labeled by the Alpino parser with the relation tag “mod”);
- when the sentence starts with a subordinated clause (when the first word of the sentence has a parent node labeled with the relation tag “mod” and the POS-tag “cp”);
- when it is positioned in a clause following a semi-colon or colon;
- when it is positioned after a conjunction or pronoun (labeled by the Alpino parser with the POS-tags “vg” or “pron”) whose direct parent node is also a parent node of the adverb, unless this conjunction or pronoun is the first word of the sentence;
- when it is positioned in the second part of a dependency tree whose top node has only two daughters, both labeled with the relation tag “dp”.
The first rule concerns the use of brackets. When a cue phrase is positioned in a bracketed phrase, like “bijvoorbeeld” (“like”) in:
Over de bijdrage van persoonsgebonden risicofactoren (bijvoorbeeld lichaamsbouw, het omgaan met stress) aan de kans op het krijgen van RSI is nog vrijwel niets bekend. (Little is known of the influence of personal risk factors (like body structure, or dealing with stress) on the chance of getting RSI.)
it probably indicates the relation of the bracketed part to the rest of the sentence. Therefore bracketed parts should be discarded. The second rule concerns modifiers. All cue phrases are normally modifiers themselves. However, when a cue phrase is located inside a larger modifier, it probably does not relate to the previous sentence but to the head of the modifier it is part of. For example, in:
Met een normale bloedsuiker wordt de kans op bijvoorbeeld hart- en vaatziekten kleiner. (A normal blood sugar level reduces the chance of, for example, heart and vascular diseases.)
the phrase “op bijvoorbeeld hart- en vaatziekten” (“on for example heart and vascular diseases”) is a modifier of the noun “kans” (“chance”). In this case the cue phrase “bijvoorbeeld” (“for example”) clearly does not signal a relation with the previous sentence. Therefore, cue phrases that have a parent node labeled “mod” are discarded. The third rule concerns subordinated clauses. When a sentence starts with a subordinated clause, like:
Hoewel daarbij vrijwel continu bewogen en dus afgewisseld wordt, leidt typen er toch toe dat steeds dezelfde spiergroepen gespannen zijn. (Although this involves almost continuous movement and thus variation, typing still causes the same muscle groups to be constantly tense.)
a cue phrase positioned after this subordinated clause, like “toch” (“still”), probably relates the second clause with the subordinated clause. A cue phrase that is positioned within a subordinated clause could signal a relation with the previous sentence as well as with the next clause. Therefore, it was decided that sentences starting with a subordinated clause had better not be searched for cue phrases. The fourth rule concerns colons and semicolons. When a cue phrase is positioned after a colon or semicolon, it probably relates to the part preceding the colon or semicolon. Thus when a sentence contains one of these punctuation marks, only the part preceding it will be searched for cue phrases. The fifth rule concerns conjunctions and pronouns. When a cue phrase is positioned after a conjunction or pronoun, it probably doesn’t refer to the previous sentence, but to something within the sentence preceding the conjunction or pronoun. However, this is only the case when the cue phrase is contained within the scope of the conjunction or pronoun. This means that the direct parent node of the conjunction or pronoun in the dependency tree is also a parent of the cue phrase.
For example, in:
Hoewel daarbij vrijwel continu bewogen en dus afgewisseld wordt, leidt typen er toch toe dat steeds dezelfde spiergroepen gespannen zijn. (Although this involves almost continuous movement and thus variation, typing still causes the same muscle groups to be constantly tense.)
the cue phrase “dus” (“thus”) occurs after and within the scope of the conjunction “en” (“and”). It doesn’t signal a relation with the previous sentence. In contrast, in:
Het risico op hart- en vaatziekten wordt echter niet alleen door de bloeddruk bepaald. (The risk of heart and vascular diseases is not only determined by blood pressure, however.)
the cue phrase “echter” (“however”) occurs after but outside the scope of the conjunction “en” (“and”). This cue phrase does signal a relation with the previous sentence. An exception to this rule occurs when the conjunction or pronoun is the first word of the sentence. In that case cue phrases occurring within the scope of the conjunction or pronoun probably still refer to the previous sentence, like “toch” (“yet”) in:
En toch ontstaan vaak klachten, zelfs veel meer dan met die zware typmachines van vroeger. (And yet a lot of complaints arise, even more than with those ancient typing machines.)
Finally, the sixth rule concerns sentences consisting of two parts that might as well have been two different sentences. Cue phrases located in the second part of such sentences probably signal a relation with the first part instead of with the previous sentence. Therefore those second parts should also be discarded. The above mentioned rules are not universal. For example, a cue phrase positioned after a colon might still connect this sentence with the previous sentence. However, it is far more likely that it relates the second part of the sentence to its first part. For some of the conjunctive adverbs additional constraints could be specified. In Table 12 all conjunctive adverbs are listed, along with their constraints.
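Four of the six rules above (brackets, modifier ancestors, an initial subordinated clause, and colons or semicolons) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the GIPS implementation: the `Token` class and the function name are hypothetical, each token is assumed to carry the (relation tag, cat or POS tag) pairs of its ancestor nodes, and the fifth and sixth rules, which need the full tree, are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical token annotated with its dependency tree ancestors."""
    word: str
    # (relation tag, cat/POS tag) of each ancestor node, nearest parent first
    ancestors: list = field(default_factory=list)


def find_cue_adverbs(tokens, cue_adverbs):
    """Return the adverbs in `tokens` that may link to the previous sentence."""
    if not tokens:
        return []
    # Rule 3: a sentence starting with a subordinated clause is not searched.
    if ("mod", "cp") in tokens[0].ancestors:
        return []
    found = []
    depth = 0  # current nesting depth of brackets
    for tok in tokens:
        if tok.word in (":", ";"):
            break  # Rule 4: ignore everything after a colon or semicolon.
        if tok.word == "(":
            depth += 1
        elif tok.word == ")":
            depth = max(0, depth - 1)
        elif (
            depth == 0  # Rule 1: skip bracketed parts.
            and tok.word.lower() in cue_adverbs
            # Rule 2: skip adverbs located inside a larger modifier.
            and not any(rel == "mod" for rel, _ in tok.ancestors)
        ):
            found.append(tok.word.lower())
    return found


cues = {"echter", "bijvoorbeeld", "toch"}
sentence = [Token("Het"), Token("risico"), Token("wordt"), Token("echter"),
            Token("niet"), Token("bepaald"), Token(".")]
print(find_cue_adverbs(sentence, cues))  # ['echter']
```

The adverb-specific constraints, such as the comma test for “bijvoorbeeld”, would be applied as a further filter on the returned candidates.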
Of course, the corpus used to extract these cue phrases is relatively small. Probably, in a larger corpus more cue phrases and more precise constraints could be detected, but this list provides a good starting point for the algorithm used in this research. Specific constraints were identified for “bijvoorbeeld” (“like”, “for example”) and “wel” (“still”, “however”). “Bijvoorbeeld” is frequently used in appositive constructions, as in:
Deskundigen (bijvoorbeeld artsen) stimuleren het gebruik van pauzesoftware. (Experts (like physicians) stimulate the use of break programs.)
or:
Deskundigen, bijvoorbeeld artsen, stimuleren het gebruik van pauzesoftware. (Experts, like physicians, stimulate the use of break programs.)
In that case, it does not connect two sentences, but is just a modifier of the phrase it is in apposition with. Frequently, this use of “bijvoorbeeld” can be recognized because one of the general constraints (like the use of brackets, or a parent with relation “modifier”) is met. However, this is not always the case. A relatively easy way to recognize those occurrences not embraced by brackets or contained in a modifier is by looking at commas: when “bijvoorbeeld” is directly preceded by a comma, it is probably part of an appositive. Only when it is also directly followed by a comma, as in:
Deskundigen, bijvoorbeeld, stimuleren het gebruik van pauzesoftware. (Experts, for example, stimulate the use of break programs.)
it is surely not part of an appositive and, if no other constraints are violated, it might well be a conjunctive adverb connecting two sentences.

Conjunctive adverb Constraints bijvoorbeeld bovendien daarnaast daarom dan ook dus echter evenzeer immers namelijk ook tenslotte tevens toch verder vervolgens wel Not when it occurs directly after a comma, unless it is also directly followed by a comma. This phrase is labeled with the POS-tag “pp” by the Alpino parser. This phrase is labeled with the POS-tag “pp” by the Alpino parser. This cue phrase poses the same conditions as its constituent adverbs “dan” and “ook”. It will therefore not be discerned as a separate cue phrase, though its use is different from that of “dan” and “ook”. This phrase is labeled with the POS-tag “adj” by the Alpino parser. Only when it is the first word of the sentence.

Table 12 Conjunctive adverbs for Dutch

The adverb “wel” has a lot of different senses (“well”, “indeed”, “rather”), only some of which are conjunctive adverbs. The only occurrences of “wel” in the text corpus that were considered a conjunctive adverb connecting two sentences were positioned at the start of a sentence. And when “wel” was the first word of the sentence, it always connected two sentences. Therefore, it was decided that “wel” should only be considered a cue phrase when it is the first word of the sentence.

5.4.2 Anaphoric referring expressions

There are a lot of different types of referring expressions. Anaphoric referring expressions refer to an entity previously mentioned to the reader or hearer [HAA02b]. The most common are pronouns and demonstratives [KS04]. They usually refer to an entity mentioned at most two sentences ago [JM00]. Pronouns could for example be “he”, “she”, “it”, “they” or possessive pronouns like “his”, “her”, “its”, “their”. Demonstratives, or demonstrative pronouns, are “this”, “that”, “these”, and “those”. In Dutch, there are also four demonstratives: “dit”, “dat”, “deze”, and “die”. Besides, in Dutch there is a possessive demonstrative “diens” [KS04]. These demonstratives can be used as a noun phrase or adjectivally. When they are used in the place of a noun phrase combined with a preposition, their form is modified [HOU00]. For example, “met dit” (“with this”) becomes “hiermee”, and “dit … mee” as in:
# Dit kun je mee zwemmen. (You can swim with this.)
becomes “hier … mee”:
Hier kun je mee zwemmen.
(You can swim with this.)
(The # in front of the first sentence indicates that this sentence is not grammatical.) In this way, under influence of a preposition, the demonstratives “dit” and “deze” are transformed into “hier”, and “dat” and “die” are transformed into “daar”. When there aren’t any words between “hier” or “daar” and the preposition, they are integrated into one word (“hierdoor”, “hierop”, “daarvan”, etc.). In the same way, the pronoun “het” (“it”) is transformed into “er” under influence of a preposition. Next to demonstrative pronouns, there are also demonstrative adverbs [KS04]. These refer to a place, time, or manner. In Dutch, there are two demonstrative adverbs referring to place: “hier” (“here”) and “daar” (“there”); two demonstrative adverbs referring to time: “toen” and “dan” (“then”); and one demonstrative adverb referring to manner: “zo” (“in this way”). These adverbs were not categorized as conjunctive adverbs in the previous subsection, because they don’t signal a rhetorical relation between two sentences, but refer to something mentioned previously. Another type of anaphoric referring expression is a definite noun phrase [JM00]. A definite noun phrase consists of a definite determiner (“the”) and a noun phrase mentioned previously or paraphrasing something mentioned previously. However, definite noun phrases are also used non-anaphorically to refer to an entity that is contained in the hearer’s set of beliefs about the world, or an entity of which the uniqueness is implied by the description itself. A definite noun phrase thus does not always refer to an entity introduced in the previous sentence. It is therefore not very suitable as a linguistic marker of a coherence relation between two sentences. To construct a list of anaphoric referring expressions referring to an entity, place, time, or manner mentioned in the previous sentence, the Dutch corpus that was also used for extracting cue phrases has again been analyzed.
Firstly, all pronouns and demonstratives that refer to something introduced or referred to in the previous sentence were marked manually. Adjectives referring explicitly to something mentioned in the previous sentence, like “andere” (“other”), were marked as well. Then, it was investigated how these expressions could be recognized as referring to the previous sentence, because anaphoric expressions could also refer to an entity in the same sentence, or for example to the entire text (as in “This document is about RSI.”). Besides, in Dutch some pronouns and demonstratives are ambiguous. For example, the Dutch word “het” can be a pronoun (“it”), but it could also be a determiner (“the”), in which case it is not a referring expression. Therefore, in the same way as with cue phrases, all occurrences of the marked words in the text were examined manually to identify under which circumstances they could be determined to be anaphoric expressions referring to an entity in the previous sentence. Based on the analysis, some general regularities were also discovered for anaphoric expressions, analogous to but slightly different from those for cue phrases. It was found that a referring expression probably doesn’t refer to an entity in the previous sentence in the following cases:
- when it is embraced by brackets;
- when it is positioned in a clause following a subordinated clause (labeled with the relation tag “mod” and the POS-tag “cp”);
- when it is positioned in a clause following a semi-colon;
- when the sentence contains a colon;
- when it is positioned after a conjunction or pronoun (labeled with the POS-tags “vg” or “pron”) whose direct parent node is also a parent node of the referring expression, unless this conjunction or pronoun is the first word of the sentence;
- when it is positioned in the second part of a dependency tree whose top node has only two daughters, both labeled with the relation tag “dp”.
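The three cases in this list that depend only on punctuation (brackets, semi-colons, and colons) can be sketched without a dependency tree. This is a minimal illustration, not the GIPS implementation: the function name is hypothetical, the sentence is assumed to be tokenized with punctuation marks as separate tokens, and the three tree-based cases are not covered.

```python
def candidate_referring_positions(words, referring_expressions):
    """Return the indices of words that may refer to the previous sentence.

    Only the punctuation-based cases are applied; the cases that need
    the dependency tree are outside the scope of this sketch.
    """
    # A sentence containing a colon is not searched at all.
    if ":" in words:
        return []
    positions = []
    depth = 0  # current nesting depth of brackets
    for index, word in enumerate(words):
        if word == ";":
            break  # ignore the clause following a semi-colon
        if word == "(":
            depth += 1
        elif word == ")":
            depth = max(0, depth - 1)
        elif depth == 0 and word.lower() in referring_expressions:
            positions.append(index)
    return positions


expressions = {"dat", "deze", "die", "dit", "andere"}
words = "In dat stadium is het noodzakelijk .".split()
print(candidate_referring_positions(words, expressions))  # [1]
```

Unlike the corresponding rule for cue phrases, a colon anywhere in the sentence rules out every candidate here, including those that precede it.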
These regularities are also not universal. For example, anaphoric expressions positioned in the second part of a sentence might still refer to something in the previous sentence. However, it is far more likely that they refer to something mentioned in the first part of the sentence. Likewise, an anaphoric expression that is the first word of the sentence does not necessarily refer to something in the previous sentence. It might also refer to another sentence or another text level. There are three differences between the regularities for anaphoric expressions and the ones for cue phrases. Firstly, whereas cue phrases probably do not refer to the previous sentence when the relation of one of their parent nodes in the dependency tree is modifier, this restriction doesn’t hold for anaphoric expressions. Modifiers have some relation to the phrase they modify: the head. Therefore, when a cue phrase is part of a modifier, it probably expresses the rhetorical relation this modifier has to the head. However, when an anaphoric expression is located within a modifier, it could still refer to something in the previous sentence, especially when the sentence starts with this modifier, as in:
In dat stadium is het noodzakelijk ook arbeidsgebonden psychosociale en persoonsgebonden aspecten in beschouwing te nemen. (In that stage it is necessary to also take into account psychosocial and personal aspects.)
In this sentence the phrase “in dat stadium” (“in that stage”) is a modifier, but the anaphoric expression “dat” (“that”) definitely refers to something mentioned in a previous sentence. The second difference has to do with subordinated clauses. When a sentence starts with a subordinated clause, like:
Hoewel daarbij vrijwel continu bewogen en dus afgewisseld wordt, leidt typen er toch toe dat steeds dezelfde spiergroepen gespannen zijn. (Although this involves almost continuous movement and thus variation, typing still causes the same muscle groups to be constantly tense.)
cue phrases or anaphoric expressions located after the subordinated clause, like “toch” (“still”), usually refer to (something within) the preceding subordinated clause. Cue phrases located within this clause itself might refer to the previous sentence, but they might also express the relation with the rest of the sentence. Therefore, cue phrases located within a sentence starting with a subordinated clause are never recognized as connecting two sentences. However, referring expressions within a subordinated clause, like “daarbij” (“with this”) in the example sentence, were more often used anaphorically (referring backwards) than cataphorically (referring forwards) in the examined text corpus. Therefore, anaphoric expressions are only discarded when they are located after a subordinated clause. Thirdly, when a sentence contains a colon, cue phrases are only discarded when they are located after the colon, because in that case they probably refer to the part preceding the colon. Referring expressions preceding a colon, however, are frequently used cataphorically instead of anaphorically, like “andere” (“other”) in:
Pauzesoftware mag niet de aandacht afleiden van risicofactoren van andere aard: werkplek en werkorganisatie. (Break programs should not distract from other types of risk factors: workspace and work organization.)
Therefore, when a sentence contains a colon, all referring expressions are discarded.

Referring expression POS-tag ander adj andere adj daar noun / adv daarbij pp daarmee pp daarover pp daarvan pp dan adv dat det datzelfde det dergelijke adj deze det die det dit det er noun even adj genoemde adj hetzelfde det hier noun / adv hierbij pp hierdoor pp hiermee pp hieruit pp hiervan pp laatstgenoemde adj zo adv zo'n det Constraints Not in “onder andere” or “geen andere”, or in “ene-andere” or “sommige-andere” constructions. Not after a “van-naar” construction. Not in “if-then” constructions, recognizable by the word “als” (“if”) occurring previously in the sentence or by a parent node with the relation tag “nucl” and the POS-tag “smain”. When it is the first word of the sentence. Not in indications of the current period like “deze week”, “deze maand”, “deze eeuw”. Not in indications of the current period like “dit weekend”, “dit jaar”. Only when it is followed by a particle (labeled “part”) or preposition (labeled “prep”) that has the same direct parent. When it is labeled with the relation tag “mod”, and directly followed by an adjective labeled “hd”. Not in “as-as” constructions, recognizable by the word “als” (“as”) occurring later in the sentence. When it is the first word of the sentence. Only when it is the first word of the sentence. Not in “as-as” or “as-as possible” constructions, recognizable by the words “als” (“as”) or “mogelijk” (“as possible”) occurring later in the sentence.

Table 13 Anaphoric referring expressions for Dutch

For some of the anaphoric expressions additional constraints were specified, because these expressions could also be used non-anaphorically. For example, the demonstrative adverb “zo” (“in this way”) is also used in comparisons in the sense of “as”, as in:
Eet daarom zo min mogelijk. (Therefore, eat as little as possible.)
In Table 13 all referring expressions are listed, along with their constraints and the POS-tags they should have to be considered an anaphoric expression. One of the anaphoric expressions found in the corpus is not mentioned in this list: “het”. As explained above, “het” could be a determiner (“the”) or a pronoun (“it”). Only when it is a pronoun can it be a referring expression. However, in that case it is still very difficult to determine whether it really is a referring expression.
For example, “het” is an anaphoric expression, probably referring to some disease mentioned in the previous sentence, in:
In sommige families komt het meer voor dan in andere. (In some families it is more prevalent than in others.)
but it is not a referring expression in:
In sommige gebieden regent het vaker dan in andere. (In some areas it rains more often than in others.)
Both sentences have very similar dependency trees. More complex linguistic knowledge would thus be needed to detect the difference between both occurrences of “het”. Paice and Husk [PH87] investigated the different uses of “it” in English text. They identified seven different types of what they call structural “it” (as opposed to referential “it”). These seven types could also be identified for the Dutch word “het” (next to the use of “het” as a determiner). For each type different rules would be needed to recognize it. Most of them could be reliably detected using limited word lists and dependency trees, but for some of them, like expressions of time and ambience such as “it is twelve o’clock” or “it is raining”, more complex lexical or semantic knowledge would be needed. Because of the complexity of differentiating between structural and referential uses of “het”, it was considered better to leave “het” out of the list of anaphoric expressions for this research. Pronouns other than “het” (“it”) were not found, except for one occurrence of “ze” (“they”), but this was also too difficult to distinguish from other occurrences not referring to an entity in the previous sentence. This corpus probably doesn’t contain many pronouns like “hij” (“he”) or “zij” (“she”) referring to an entity in the previous sentence, because the texts all deal with medical information. Other text categories, such as newspaper articles, are expected to contain many more pronouns, because they more often deal with people instead of diseases. All demonstrative pronouns were found in the corpus.
Besides, a lot of occurrences of “hier” and “daar” in combination with a preposition were retrieved. They all satisfied the same constraints. Therefore, the list could be extended with all possible combinations of “hier” and “daar” with prepositions. In Table 14 all combinations found in the Dutch dictionary [STE94] are presented. These referring expressions can have the POS-tags “pp” or “adv”.

Again, the corpus used to extract all these referring expressions is relatively small. Probably, in a larger corpus more expressions and more precise constraints could be detected, but the lists presented in Table 13 and Table 14 provide a good starting point for the algorithm used in this research.

Hier: hieraan, hierachter, hierbij, hierbinnen, hierboven, hierbuiten, hierdoor, hierheen, hierin, hierlangs, hiermede, hiermee, hierna, hiernaast, hierom, hieromheen, hieromtrent, hieronder, hierop, hierover, hiertegen, hiertegenover, hiertoe, hiertussen, hieruit, hiervan, hiervandaan, hiervoor

Daar: daaraan, daarachter, daarbeneden, daarbij, daarbinnen, daarboven, daarbuiten, daardoor, daardoorheen, daarheen, daarin, daarlangs, daarmede, daarmee, daarna, daarnaar, daarnaast, daarom, daaromheen, daaromtrent, daaronder, daarop, daaropvolgend, daarover, daaroverheen, daartegen, daartegenover, daartoe, daartussen, daaruit, daarvan, daarvandaan, daarvoor

Table 14 Prepositional anaphoric expressions for Dutch

5.4.3 Document structure

Next to cue phrases and anaphoric expressions, an answer may contain other signs of relatedness to another sentence or graphical element, like punctuation marks and captions. These signs deal with document structure. According to Power et al. [PSB03] document structure describes the organization of a document into graphical constituents like sections, paragraphs, sentences, bulleted lists, tables, and figures. Besides, document structure covers some features within sentences like quotation. Some of these graphical constituents are made explicit in mark-up languages, such as HTML.
Chapters, sections, paragraphs, sentences, clauses, and phrases are all levels of document structure [PSB03] (in descending order of abstractness). A response should contain at least one sentence, and at most an entire paragraph (for the sake of conciseness).

A sentence starts with a capital letter and ends with a terminal punctuation mark (a full stop, question mark, or exclamation mark). When the answer returned by the question answering module is a sentence, it could suffice as a response. However, if it ends with a question mark, it is probably a question. In that case, the next sentence is expected to be the answer to this question, and should thus also be included in the response. However, if the question is a link in an HTML document (enclosed by the tags <a> and </a>), it would be better to include the linked document in the response.

When the answer returned by the question answering module is not a sentence, it could for example be a clause, a caption, or a heading. In the case of a clause, ending for example in a semicolon or colon, the next clause should also be included to make the sentence complete. When the answer ends with a colon, it might also be followed by an image, table, or list instead of a second clause. In that case, this image, table, or list should be included in the response. If the answer doesn’t finish with a punctuation mark at all, the sentence is probably not finished. This may be because it crosses a page break in the original document. In that case, the rest of the sentence should be retrieved from the original document.

When the answer is a caption belonging to a figure or table, the corresponding figure or table should also be retrieved from the original document and included in the response. Captions can be easily recognized, because they start with “Table” or “Figure”.
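The punctuation-based rules above can be illustrated with a small Java sketch. It is not part of the GIPS prototype: the class name AnswerShape, the enum values, and the method classify are hypothetical, and the sketch assumes that the answer is available as a plain string that still carries its original final punctuation.

```java
/** Hypothetical helper (not from the thesis prototype): classifies an answer by how it starts and ends. */
class AnswerShape {

    /** What kind of text unit the answer appears to be. */
    enum Kind { SENTENCE, QUESTION, CLAUSE, CAPTION, UNFINISHED }

    static Kind classify(String answer) {
        String text = answer.trim();
        if (text.isEmpty()) {
            return Kind.UNFINISHED;
        }
        // Captions start with "Table" or "Figure": the table or figure itself should be retrieved too.
        if (text.startsWith("Table") || text.startsWith("Figure")) {
            return Kind.CAPTION;
        }
        char last = text.charAt(text.length() - 1);
        switch (last) {
            case '?':
                return Kind.QUESTION;   // the next sentence is expected to hold the answer
            case '.':
            case '!':
                return Kind.SENTENCE;   // could suffice as a response on its own
            case ':':
            case ';':
                return Kind.CLAUSE;     // the next clause, image, table, or list is needed
            default:
                return Kind.UNFINISHED; // no final punctuation: the rest of the sentence is missing
        }
    }
}
```

A caller would map each Kind to the corresponding action described above, for instance fetching the next sentence for QUESTION or the remainder of the sentence for UNFINISHED.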
When the answer is a heading (recognizable by a heading number, or the heading tags in an HTML document) at least the first sentence of the headed section should also be included in the response.

A paragraph begins on a new line. Because the response should contain at most an entire paragraph, it should not cross a paragraph boundary in the original document. In HTML documents paragraphs are separated by the tags <br> or <p>. In other documents, paragraph boundaries may be harder to detect. For example, in PDF documents, every line starts on a new line to preserve the layout. New lines thus do not always indicate a paragraph boundary.

Lists and quotations are examples of indented structures [PSB03]. Indented structures are of a certain level of document structure and are contained by an element possibly being of another level. For example, the elements of a list may be paragraphs, while the list itself is contained by a sentence. When the answer returned by the question answering module is part of, or contains part of, an indented structure, the entire structure should be included in the response. The sentence containing or preceding the indented structure should also be included in the response, because it indicates the context of the structure. Vertical lists can be recognized by bullets (in the case of a bulleted list) or numbers (in the case of an enumerated list). In an HTML document lists are enclosed by the tags <ul>, <ol>, or <dl>, and each element is preceded by the tag <li>. Quotations or other types of comments can be recognized by the enclosing single or double quotation marks, or brackets.

Horizontal lists or enumerations are not indented structures, because they are described in plain text. In this case the elements of the list are simply phrases within a clause, sentences within a paragraph, or paragraphs within a section, etc.
When the list is not contained within one sentence, it might be recognized by cue phrases like “firstly … secondly … finally”. However, horizontal lists may also be indicated with more subtle cue phrases or be implicit in the text. They will therefore be ignored in this research.

Finally, answers can be captions belonging to a table or figure, but they can also refer to a table or figure. Normally this is done by explicitly mentioning the word “table” or “figure” followed by a number. In that case the referred table or figure should also be included in the response.

5.4.4 Answer extension algorithm

In the previous sections three types of linguistic markers that signal a strong relation between two sentences have been identified: cue phrases, anaphoric expressions, and document structure. In this section an algorithm is described that can be used to extend an answer sentence based on these linguistic markers.

The input for the algorithm is a QA document generated by the question answering module “qadr.qa”. This document contains up to five different answers that have already been annotated with their dependency structures. First of all these answers are grouped per source. Then each group of answers originating from the same source is extended.

To generate a coherent response by extending a group of answers, the answer sentences are first ordered according to the order of the original document. Then the first answer is extended. When this answer contains any cue phrases or anaphoric expressions that satisfy the relevant constraints, the previous sentence is included in the response. When this previous sentence also contains a cue phrase or anaphoric expression, its predecessor is also included. This procedure is repeated until the most recently added sentence doesn’t contain any cue phrases or anaphoric expressions, or until a paragraph boundary is reached. Then, the first sentence following the answer could be considered.
If it contains any cue phrases or anaphoric expressions, it could also be included in the response. However, to prevent the response from getting too long, this procedure is only repeated if the answer hasn’t already been extended with three or more sentences. Thus, when a sentence included in the response contains any cue phrases or anaphoric expressions, its previous sentence is always included, because otherwise the response could not be fully interpreted. But sentences following the response that contain a cue phrase and/or anaphoric expression are only included if the response would not grow too large.

When there are other answers retrieved from the same document, it is first checked whether they have already been included in the response. If they haven’t, the same procedure is used to extend these answers.

Finally, the complete response is checked for any linguistic markers of document structure. For example, when the last sentence ends with a colon or semicolon, the next sentence or other graphical constituent is also included, and if there are any unfinished quotations, sentences are added until they are finished (if they can’t be finished, they are completely omitted). Possibly also figures or tables are included in the response.

For example, for the third question in Appendix D:

Welke spieren zijn betrokken bij RSI? (Which muscles are affected by RSI?)

four different answers were retrieved from the same document (in the order of occurrence in the original document):

1. Gezien de neiging van RSI zich uit te breiden, blijkt dat na verloop van tijd vaak veel spieren betrokken zijn bij het proces. (In view of the tendency of RSI to spread, after some time often a lot of muscles seem to be affected by the process.)
2. Toch zijn er wel een aantal spieren die opvallend vaak zijn aangedaan bij RSI. (However, there are a number of muscles that are affected by RSI notably often.)
3. Uit onderzoek is bekend dat deze het vaakst betrokken is bij RSI. (It is known from research that this one is affected by RSI most often.)
4. Ademhalingsspieren Andere spieren die betrokken kunnen zijn bij RSI zijn de scaleni. (Respiratory muscles Other muscles that could be affected by RSI are the scaleni.)

The first answer doesn’t contain any linguistic markers of a relation with the previous sentence. However, the sentence following the answer

Het aantal spieren dat bij verschillende RSI patiënten mee kan doen, is dan ook groot.

contains a cue phrase (“dan ook”). The subsequent sentence

Een beschrijving daarvan zou haast neerkomen op het dupliceren van een anatomische atlas.

contains the anaphoric expression “daarvan”, and the following sentence

Toch zijn er wel een aantal spieren die opvallend vaak zijn aangedaan bij RSI.

contains the cue phrase “toch”. Actually, this last sentence is the second answer. The second answer is thus automatically included in the response by extending the first answer.

When all answers have been extended, the following response results (in the original layout, answers are bold and linguistic markers are marked):

Gezien de neiging van RSI zich uit te breiden, blijkt dat na verloop van tijd vaak veel spieren betrokken zijn bij het proces. Het aantal spieren dat bij verschillende RSI patiënten mee kan doen, is dan ook groot. Een beschrijving daarvan zou haast neerkomen op het dupliceren van een anatomische atlas. Toch zijn er wel een aantal spieren die opvallend vaak zijn aangedaan bij RSI. De meest beruchte spier is de monnikskapspier (trapezius). Uit onderzoek is bekend dat deze het vaakst betrokken is bij RSI. Dat is ook niet zo verwonderlijk, want deze spier zorgt voor het optillen en stabiliseren van de schouders. Zodra de armen worden opgetild, zoals bij typen en telefoneren, neemt de spanning in deze spier fors toe.
Ademhalingsspieren Andere spieren die betrokken kunnen zijn bij RSI zijn de scaleni. Dit is een groepje spieren die vast zit aan de halswervelkolom en aan de bovenste ribben. Bij diep ademhalen spannen deze spieren aan, maar bij normale activiteiten nauwelijks. Vandaar dat deze spieren ook wel hulpademhalingsspieren genoemde worden, ze heffen de ribben bij speciale omstandigheden zoals niezen, zuchten en hoesten.

This response consists of two paragraphs, separated in the original document by several pages. The response is very relevant for the question, but it is quite long. Actually, the original document contains three pages of information relevant for the question. This may indicate that a concise response is not possible for this question. On the other hand, GIPS is required to return only concise responses. In the case of the example presented above, the response could be restricted to the first paragraph, because that contains three out of four answers. It is not clear, however, how often a response would grow this large and how relevant such responses would be. A decision on how to prevent responses from getting too long has therefore been deferred to the evaluation stage (described in the next chapter).

5.5 Implementation

The response formulation algorithms described in this chapter are incorporated in the output generation module of GIPS. A prototype of this module has been implemented in Java version 1.5.0. This prototype consists of a class “GipsGen” which provides a method “generate”. This method takes a QA document generated by the question answering module “qadr.qa” as input (other types of input cannot be processed by this prototype). It reads the question sentence, the answer sentences, their context sentences, and the annotations associated with these sentences. Then it generates a P-ml document according to the answer extension algorithm described above. It uses some subclasses to accomplish this.
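As an illustration of the backward and forward extension step from section 5.4.4, a minimal Java sketch follows. It is not the code of the GIPS prototype: the class AnswerExtender, the methods extend and markerFrom, and the constant MAX_EXTENSION are hypothetical names, and the sketch uses library features newer than Java 1.5. It assumes that one paragraph is available as a list of sentences, and it replaces the cue phrase and anaphor lists with their POS-tag and dependency constraints by a crude single-word lookup. Document structure checks are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of the answer extension step; not the thesis prototype. */
class AnswerExtender {

    /** Following sentences are no longer added once this many sentences have been added in total. */
    static final int MAX_EXTENSION = 3;

    /** Crude stand-in for marker detection: true if any whitespace-separated token is in the set. */
    static Predicate<String> markerFrom(Set<String> markers) {
        return sentence -> {
            for (String token : sentence.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (markers.contains(token)) {
                    return true;
                }
            }
            return false;
        };
    }

    /** Extends the answer at answerIndex within a single paragraph and returns the response sentences. */
    static List<String> extend(List<String> paragraph, int answerIndex, Predicate<String> hasMarker) {
        // Backward: a sentence with a marker always pulls in its predecessor,
        // until a sentence without markers or the paragraph start is reached.
        int first = answerIndex;
        while (first > 0 && hasMarker.test(paragraph.get(first))) {
            first--;
        }
        // Forward: a following sentence is added only if it contains a marker; this is
        // repeated only while fewer than MAX_EXTENSION sentences have been added in total.
        int last = answerIndex;
        int next = answerIndex + 1;
        while (next < paragraph.size() && hasMarker.test(paragraph.get(next))) {
            last = next;
            next++;
            if (last - first >= MAX_EXTENSION) {
                break;
            }
        }
        return new ArrayList<>(paragraph.subList(first, last + 1));
    }
}
```

Because the paragraph is passed in as a list, the paragraph boundary condition is enforced simply by never indexing outside that list. Answers from the same source that already fall inside a returned range would not need to be extended again.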
For example, a class “Node” has been implemented to be able to generate dependency trees. Besides, a class “GipsTest” has been implemented to provide “GipsGen” with example input and collect the P-ml files it generates.

The answer integration algorithm described in section 5.3 has not been implemented, because this would take too much time. Besides, although the sentence fusion algorithm developed at the University of Tilburg [MK05] aims at integrating answers retrieved from different documents, it also seems a good solution for integrating autonomous answers retrieved from the same document. The sentence fusion algorithm would be more flexible than the answer integration algorithm described in section 5.3. However, the sentence fusion algorithm is not implemented for question answering yet. Autonomous answers are therefore simply treated as extended answers (extended with zero sentences) by this prototype.

A second prototype has been constructed that generates a baseline response. The class “BaselineGen” used for this purpose is very similar to “GipsGen”. It groups the answers per source and puts them in the original order, just as is done with the answer extension algorithm. However, “BaselineGen” doesn’t use the lists of cue phrases and anaphoric expressions. Nor does it generate any dependency trees. It simply includes the preceding and the following sentence for each answer sentence.

5.5.1 Limitations

The GIPS prototype has some limitations compared to the answer extension algorithm. First of all, because it doesn’t have access to the IMIX document collection and the Alpino parser, answers could only be extended with sentences provided as context sentences in the QA document. Ideally, this context consists of the entire paragraph the answer was retrieved from. However, this is not always the case. Especially sentences retrieved from PDF documents are frequently poor.
For example, they are not finished because they are interrupted by a page break, or they contain a page header, like the document title or page number. Besides, it was also not possible to retrieve referred pictures or tables.

Secondly, in the sentences returned by “qadr.qa”, all words and punctuation marks are separated by white spaces. Therefore, it was very difficult to determine whether a quotation mark signaled the start or the end of a quotation, and thus to determine whether extra sentences should be included to finish the quotation or not. Fortunately, quotations don’t occur frequently in the text category used for this research. Bracketed parts did not have this problem, because the opening bracket “(” and closing bracket “)” could easily be distinguished.

Finally, vertical lists are also not recognized by the GIPS prototype. Therefore, they cannot be completed when only part of a list is included in the response.

All these limitations could be resolved if the system had access to the original documents (and preferably also the HTML tags) of the answers, and to the Alpino parser. This could have been accomplished during this research, but that would have taken considerable extra effort, while the extra value for this research would have been minimal.

5.5.2 Code

The Java code for the prototype and the baseline, including the example input and output files used for evaluation of the system, can be retrieved from my website: http://wwwhome.cs.utwente.nl/~langen/thesis

6 Evaluation

Based on literature research and interviews with general practitioners, an information presentation module for a medical QA system (GIPS) has been designed. In this chapter the design for GIPS in general and the response formulation algorithm developed for GIPS in particular are evaluated. The evaluation of the entire design for GIPS is described in section 6.1.
The evaluation of the answer extension algorithm used for the response formulation component is described in section 6.2.

6.1 Evaluation of the entire design

To evaluate the entire design for GIPS, a prototype of the information portal for general practitioners (including a prototype of the graphical user interface of GIPS) has been constructed (see section 4.4). These prototypes have not been constructed to test the functionality of the corresponding systems, but to make the design tangible for users. The method used to evaluate this design is described in section 6.1.1. In sections 6.1.2 and 6.1.3 the results and conclusions of this evaluation are presented.

6.1.1 Evaluation method

As stated previously, this research did not aim to develop a system that would be equally appreciated by all Dutch general practitioners. It was expected that general practitioners who already use the Internet to search for answers to their medical questions are more likely to appreciate and use a QA system than other general practitioners. Therefore, the design for GIPS and the information portal resulting from this research have been evaluated only with the two general practitioners who indicated in the previous interviews that they already used the Internet to search for answers to their medical questions.

Again, qualitative interviews have been used. The interviews had the same semi-structured format as the previous ones. They covered the topics of information needs, information sources and computer use. A general outline of the interviews is shown in Appendix E.

First of all, the general practitioners were presented with the prototypes of the information portal and GIPS. Then they were asked how often they thought they would use such systems and whether they would use them during consultations and patient visits. Secondly, it was explained to the general practitioners what kind of resources GIPS would use and they were asked what they thought of these resources.
Thirdly, the general practitioners were asked a few questions about whether they thought the information portal and GIPS could be an improvement for their work and what they thought of the functionality of the systems.

6.1.2 Evaluation results

In this section, the results of the interviews are discussed with respect to the general practitioners’ information needs, information sources, and computer use.

Information needs

The general practitioners both thought they would use a system like GIPS to pursue their information needs if it were entirely functional and working properly. One of them thought she would use it multiple times a day, the other thought he would use it a few times per week. Example questions they would have liked to submit to the system are “How often do RSI and the chronic fatigue syndrome coincide?” or “What kind of exercises could be done when … ?”. The general practitioners would both use the system during the consultation in order to be able to print its responses and give them to the patient, just as they do now with NHG patient letters that are provided by the general practitioner information system they use. Patient letters are not available for all topics, however. Therefore they would like any additional information GIPS could provide.

On the use of the information portal and mobile computing, the general practitioners were less unanimous. The information portal provides access to three existing systems (the NHG guidelines, Artsennet.nl, and PubMed) and three future systems (GIPS, image retrieval, and a social map). One of the general practitioners was primarily enthusiastic about the ease with which the three existing systems could be accessed via the information portal. The other one thought he would rather access these existing systems directly, just as he is doing now. He really liked the idea of the three future systems, however.

Mobile computing is still science fiction for both general practitioners.
One of them would really like it. She is looking forward to having access to patient data and information systems like GIPS during patient visits. The other general practitioner hasn’t got any problems with the way he is working now and wouldn’t consider purchasing a laptop just to have access to his data during patient visits.

Information sources

Both general practitioners indicated that it is very hard to determine the reliability of electronic sources. One of them said he once read about an anesthesiologist (whose name he can’t remember) who made an overview of reliable medical websites. He thinks such an overview (constructed by a medical professional) could be used as the document collection for GIPS. The other general practitioner said she would trust the information provided by the NHG, but from patient organizations she would only use practical information. She thinks these sources do not provide reliable theoretical information.

Computer use

One of the general practitioners really thought the information portal would save her time compared with the way she currently accesses the search engines Artsennet.nl and PubMed. She even asked whether the prototype of the information portal would remain online.

The general practitioners both appreciated the way the information is presented by GIPS and the length of the responses. They especially liked the printing function. However, one of the general practitioners indicated his computer wasn’t connected to a printer yet. He would like to have a printer on his desktop to be able to print the information during the consultation without having to leave the room. This will be realized in the near future.

Finally, both general practitioners thought GIPS and the other future systems (image retrieval and the social map) could be an improvement for their work.
Besides, one of the general practitioners said he appreciated GIPS, because he could enter an entire question instead of only a few keywords, as in traditional search engines. He found it hard to identify any negative aspects at this stage, however; once all possible questions can be entered in GIPS, he would first like to experiment with it for a while to determine the quality of the responses.

6.1.3 Conclusions

A question answering system that answers questions for patient education would really be suitable for use by general practitioners, especially for the ones who already use the Internet to look up information on patient care. The design that has been made during this research (GIPS) seems to be appreciated by general practitioners. However, special attention should be paid to selecting the right information sources. Besides, the automation level of general practices is currently not always sufficient for optimal use of such a question answering system. For example, general practitioners should have access both to the Internet and to a printer in their consulting rooms.

Next to question answering technology, image retrieval technology for retrieving dermatological images, and information extraction technology for retrieving address information of medical professionals and organizations (a social map) would also be very useful for general practitioners. The information portal designed during this research seems a good way of providing general practitioners with an overview of the kinds of information they can find on the Internet.

6.2 Evaluation of the answer extension algorithm

An answer extension algorithm was developed for GIPS that extends an answer sentence (returned by a question answering module) with the sentences most related to it. In this section it is investigated whether these sentences add any relevant information and whether including them makes the response more coherent.
The prototype used for this evaluation was described in section 5.5. The method used to evaluate the algorithm is described in section 6.2.1. In sections 6.2.2 and 6.2.3 the results and conclusions of this evaluation are presented.

6.2.1 Evaluation method

To evaluate the answer extension algorithm, responses have been generated for a set of test questions. This test set includes the ten example RSI questions presented in Appendix D and ten non-RSI-related medical questions randomly selected from an IMIX document with example questions [IMIXv]. The paragraphs from which the answers to these questions were retrieved were not part of the text corpus used for the development of the answer extension algorithm. On manual inspection of the responses to the test questions, the algorithm seems to have correctly recognized the relevant cue phrases and anaphoric expressions.

The responses generated with the prototype were compared with the corresponding baseline responses (consisting of the answer sentence extended with the preceding and following sentence). Then the paragraphs were selected that satisfied the following criteria:

- the paragraph generated by the answer extension algorithm differs from the baseline;
- the paragraph at least covers the same topic as the corresponding question (to prevent the evaluation from being influenced by the correctness of the answers generated by the question answering module “qadr.qa”);
- the paragraph doesn’t refer to any figures, tables, or lists;
- if multiple paragraphs from a single response satisfy these criteria, only the first one is selected.

In total, the twenty responses contained 53 paragraphs. Of these paragraphs 14 were identical to the baseline.
This does not necessarily mean that GIPS extended all these 14 answers with the preceding and following sentence, because sometimes the QA document doesn’t provide any context sentences, or provides only following context sentences, in which cases neither GIPS nor the baseline was able to extend the answers properly. Sixteen other paragraphs were considered not to deal with the same topic as the question, and five others contained expressions referring to a figure, table, or list. The remaining 18 paragraphs belonged to 12 different responses. For each of these 12 responses the first suitable paragraph was selected. Any unfinished sentences in these paragraphs were finished manually by consulting the original document instead of the context provided by the QA document.

Naïve users were asked to evaluate these paragraphs. It was not necessary to ask general practitioners to evaluate the paragraphs, because the responses were not evaluated on medical correctness, but on linguistic characteristics. Besides, GIPS aims at answering questions for patient education. Patients should thus be able to understand the responses. Naïve users are all potential patients, and are thus very good subjects for the evaluation of the responses. However, because some linguistic intuition is needed to be able to evaluate the responses on the relevant variables, only higher (non-medically) educated users participated in this evaluation.

Twenty-nine users participated in this evaluation. Though all participants had received some higher education, the group was very heterogeneous, consisting of men and women of different ages (from 19 to 50 years old) and from different disciplines. There were no medically educated participants, because the evaluation of the responses should not be influenced by judgments on the medical correctness of the responses.

The participants were randomly divided into two groups. Each group received a different questionnaire.
The two questionnaires are shown in Appendix F. The first group had to evaluate the GIPS generated paragraphs for one half of the twelve questions, and the baseline responses for the other questions. For the second group, it was the other way round. The GIPS generated responses and baseline responses were randomly ordered.

The goal of this evaluation was to investigate whether the sentences included by the answer extension algorithm add any relevant information and whether including them makes the response more coherent. Therefore, the participants were asked to indicate for each response on a five-point scale how useful the response is with respect to the question, how much irrelevant information with respect to the question is contained in the response, and how coherent the response is.

The participants were told that the usefulness of the response doesn’t relate to the amount of useful information a response provides, but only to the presence of any useful information with respect to the question. The amount of irrelevant information was defined as the proportion of information that is irrelevant with respect to the question. A response can thus both be very useful and contain a lot of irrelevant information with respect to the question. Finally, the coherence was defined as the linguistic coherence of the response. If there are any referring expressions that could not be resolved, like “these things” in:

These things are caused by smoking.

the coherence is said to be low. However, if the response is a fluent text that can be fully interpreted, the coherence is high. Or, as one of the participants put it: “would I understand the response if I had not read the question?”.

This evaluation method is very similar to that used by Bosma for the evaluation of his answer extension algorithm [BOS05]. He used the same baseline and the same type of questionnaire.
However, instead of the coherence of the responses, he investigated to what extent the participants were able to verify whether the responses were accurate. This “verifiability” is high when a participant is able to verify that the response is accurate, as well as when he is able to verify that the response is not accurate. It is only low when a response doesn’t contain enough context to determine whether the response concerns the subject of the question or another subject. This variable was not used for the evaluation of the responses generated by GIPS, because in a pretest participants found it very hard to distinguish this variable from the usefulness of the response. Besides, when they understood what was meant by verifiability, they never evaluated it as low. Therefore, it was decided not to ask any further participants to evaluate this variable. Instead, because the answer extension algorithm developed for GIPS was intended to improve the coherence of a response, participants were asked whether they thought the paragraphs formed a coherent text.

6.2.2 Evaluation results

Five-point scales were used to evaluate the responses. These scales ranged from “very low” to “very high”. The middle score was “neutral”. For each variable (usefulness, irrelevance, and coherence) the proportion of high scores on the GIPS generated responses was compared to the proportion of high scores on the baseline responses. Neutral scores were ignored, because users also used this score when they could not decide whether the score should be high or low. The proportion of high scores is thus defined as the number of high scores divided by the total number of high and low scores on a specific variable. Two-sided two-sample t-tests were used to determine whether the difference between the proportions of high scores on the GIPS generated responses and on the baseline responses differed significantly from zero, at a significance level of 0.05.
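The definition of the proportion of high scores can be made concrete with a small Java sketch. The class ScoreSummary and its methods are hypothetical names, and coding the five-point scale as the integers 1 (“very low”) to 5 (“very high”) is an assumption. How the t-test was applied to proportions is not specified in this section; the second method assumes that every non-neutral score is treated as a 0/1 observation and computes the pooled-variance two-sample t statistic, which would still have to be compared with the critical value of the t distribution.

```java
/** Hypothetical helper for the score analysis; not part of the thesis prototype. */
class ScoreSummary {

    /**
     * Proportion of high scores among all non-neutral scores.
     * Assumption: scores are coded 1..5, with 4 and 5 high, 1 and 2 low, 3 neutral.
     */
    static double proportionHigh(int[] scores) {
        int high = 0;
        int low = 0;
        for (int s : scores) {
            if (s >= 4) {
                high++;
            } else if (s <= 2) {
                low++;
            }
            // s == 3 (neutral) is ignored
        }
        if (high + low == 0) {
            return Double.NaN; // no non-neutral scores, proportion undefined
        }
        return (double) high / (high + low);
    }

    /**
     * Pooled-variance two-sample t statistic on 0/1 observations (assumed procedure).
     * high1 of n1 and high2 of n2 non-neutral scores are high; needs n1 + n2 > 2.
     * The result is NaN or infinite when the pooled variance is zero.
     */
    static double tStatistic(int high1, int n1, int high2, int n2) {
        double p1 = (double) high1 / n1;
        double p2 = (double) high2 / n2;
        // Sum of squared deviations of a 0/1 sample with h ones out of n is h * (n - h) / n.
        double ss1 = (double) high1 * (n1 - high1) / n1;
        double ss2 = (double) high2 * (n2 - high2) / n2;
        double pooledVariance = (ss1 + ss2) / (n1 + n2 - 2);
        double standardError = Math.sqrt(pooledVariance * (1.0 / n1 + 1.0 / n2));
        return (p1 - p2) / standardError;
    }
}
```

With this coding, a set of scores such as 5, 4, 3, 3, 2, 1, 4 has three high and two low scores, so its proportion of high scores is 3 / 5 = 0.6.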
Only three of the twelve answers (the first, fourth, and sixth) were extended by the extension algorithm of GIPS. The other answers were considered autonomous. To investigate whether the answers that were not extended were really autonomous and whether the extended answers were extended better than the baseline answers, the proportions of high scores were also analyzed separately for the three dependent answers and the nine autonomous answers.
Usefulness
In Table 15 the proportions of high scores on usefulness are shown for the GIPS generated responses and the baseline responses. On dependent answers the usefulness of the responses with respect to the question was evaluated equally well, but on autonomous answers the baseline seems to perform somewhat better than GIPS. This difference is not significant (at a significance level of 0.05), however. There are three individual autonomous answers on which the baseline did score significantly higher than GIPS. In these cases GIPS apparently failed to include useful sentences that were included by the baseline. Relationships between these sentences were not signaled by cue phrases or anaphoric expressions, however. The algorithm thus did work properly. For example, on the question:
Komt RSI in Nederland vaker voor dan in de rest van Europa? (Is the prevalence of RSI in the Netherlands higher than in the rest of Europe?)
the answer sentence was:
De geschiedenis leert dat RSI geen modeverschijnsel is. (History teaches us that RSI is not a trend.)
This sentence is indeed not useful with respect to the question at all. It doesn’t have any cue phrases or anaphoric expressions either. However, the preceding sentence is very useful with respect to the question:
Nederland heeft niet meer RSI-klachten dan andere landen in Europa. (The Netherlands does not have more RSI complaints than other countries in Europe.)
The other two questions on which the baseline provided a significantly more useful response concerned answer sentences that were more useful than the one illustrated above. However, in these cases the preceding and successive sentences provided some additional information that was evaluated as useful by the participants.
                      GIPS     Baseline   Significant difference
Total                 0.514    0.607      no
Dependent answers     0.919    0.914      no
Autonomous answers    0.374    0.505      no
Table 15 Proportion of high scores on usefulness
Irrelevance
The scores on the proportion of irrelevant information with respect to the question are shown in Table 16. The irrelevance of GIPS generated responses was significantly lower than the irrelevance of the baseline responses, at a significance level of 0.05. This was not the case for the extended responses, however. No significant differences concerning irrelevance were measured on the individual extended responses either. For only one of these responses, on the sixth question:
Wat zijn de verschijnselen van griep? (What are the symptoms of influenza?)
the baseline scored better (i.e. had a lower irrelevance) than GIPS. Possibly this was due to the sentence length of the GIPS generated response:
Griep wordt veroorzaakt door het zogenaamde influenzavirus, met verschijnselen van koorts, neusverkoudheid, hoesten, hoofdpijn, spierpijn en vermoeidheid. Omdat het virus erg besmettelijk is kan iedereen griep krijgen en zal je meestal ruim een week het bed moeten houden. Gezonde mensen knappen daarna weer op door rust en veel drinken, maar bij mensen met een chronische aandoening, patiënten met een verminderde weerstand, bewoners van verpleeg-verzorgingshuizen en ouderen boven de 65 jaar kan de ziekte ernstig verlopen. Zij worden dan ook jaarlijks door hun huisarts gevaccineerd tegen griep, de zogenaamde influenzavaccinatie, die voor 70-80% bescherming biedt tegen het krijgen van griep (influenza).
(Influenza is caused by the so-called influenza virus, with symptoms of fever, a head cold, coughing, headache, muscle pain and fatigue. Because the virus is very contagious, anyone can get influenza, and you will usually have to stay in bed for more than a week. Healthy people then recover through rest and drinking plenty of fluids, but in people with a chronic condition, patients with reduced resistance, residents of nursing and care homes and elderly people over 65 the disease can take a serious course. They are therefore vaccinated against influenza every year by their general practitioner, the so-called influenza vaccination, which offers 70-80% protection against getting influenza.)
The first two sentences of this response (in bold) were both answer sentences, retrieved from the same paragraph. The third and the fourth sentences were added, because they contained an anaphoric expression and a cue phrase respectively. The response thus contains only four sentences, but these sentences have a mean length of 25 words. As a comparison: another extended response contained seven sentences, but received a lower irrelevance score. Its mean sentence length was 15 words. The baseline for the response presented above didn’t include the last sentence (counting 25 words). Instead the sentence preceding the first answer sentence was included. This sentence counted only 14 words.
                      GIPS     Baseline   Significant difference
Total                 0.530    0.699      -0.169
Dependent answers     0.697    0.686      no
Autonomous answers    0.483    0.704      -0.221
Table 16 Proportion of high scores on irrelevance
Coherence
The results on coherence are presented in Table 17. GIPS scores significantly higher on coherence than the baseline, at a significance level of 0.05. Again, this is not true for the extended responses, however. On the individual extended responses no significant differences were measured concerning coherence either. But, probably for the same reason as hypothesized before, the GIPS generated response on the sixth question (the one presented above) scored lower than the baseline response. Another GIPS generated response also scored lower, but this difference was even less significant. The third extended response scored almost significantly better than the baseline response.
                      GIPS     Baseline   Significant difference
Total                 0.807    0.647      0.160
Dependent answers     0.795    0.711      no
Autonomous answers    0.812    0.622      0.189
Table 17 Proportion of high scores on coherence
6.2.3 Conclusions
Concerning the usefulness of the responses with respect to the question no great differences were found between GIPS and the baseline.
There was only one question for which GIPS apparently failed to include the most useful sentence where the baseline did. Actually, this failure could also be attributed to the question answering module of the IMIX demonstrator, which should have marked this sentence as the answer sentence instead of the successive one. Based on the evaluation of the proportion of the response that is irrelevant to the question, it might be concluded that GIPS succeeds in filtering a lot of irrelevant information that was included by the baseline. However, answers that are extended by GIPS should not grow too large. The algorithm used by GIPS only restricted the number of sentences included, but maybe it should also take into account the number of words. Finally, on coherence the GIPS generated responses scored significantly higher than the baseline responses. It is suggested that restricting the number of words of extended responses might increase the coherence even more. Especially answers that were not extended by GIPS (autonomous answers) contained significantly less irrelevant information and were judged significantly more coherent than the baseline responses. It might thus be concluded that these answers should indeed not be extended. For dependent answers (which were extended) no significant differences with the baseline responses were found. This is probably due to the small number of dependent answers. Bosma [BOS05] developed a similar answer extension algorithm and executed a similar evaluation. He also concluded that the differences on usefulness between the responses generated by his algorithm (the query-based summarizations) and the baseline responses were not significant, and that the query-based summarizations contained less irrelevant information than the baseline responses. He did not evaluate the responses on coherence. For his algorithm an automatic RST-annotation tool for Dutch would be needed, which is not yet available. 
The algorithm developed for GIPS only needs a tool for generating dependency trees. For this task a Dutch tool is available: the Alpino parser. Therefore, the answer extension algorithm developed for GIPS currently seems a good alternative to the algorithm developed by Bosma.
7 Conclusions
The research question to be answered in this master’s thesis was:
Which information needs of Dutch general practitioners can be satisfied by a question answering (QA) system and how should the answers be presented?
Literature research and interviews with general practitioners were used to identify the different work roles and tasks, information needs, variables of awareness of information, and information sources of general practitioners.
Work roles and tasks
It was concluded that a QA system could support the general practitioner primarily in his work role of service provider, during the phase of searching databases, in order to help the general practitioner provide explanations to the patient.
Information needs
The information needs most suitable to pursue with a QA system would be those concerning patient education and population statistics. Most Dutch general practitioners already use a general practitioner information system that incorporates the patient letters issued by the Dutch College of General Practitioners (NHG). These letters can be printed and handed out to the patient. However, not all topics are covered by these patient letters. Therefore, some of the interviewed general practitioners indicated they would like to use a question answering system to search for additional information to give to the patient.
Awareness of information
Concerning the awareness of information, a QA system could improve the accessibility of electronic information sources for patient education, because the general practitioner only has to enter a question in Dutch.
A dialogue between the system and the general practitioner might be used in order to specify the question when needed. The general practitioners participating in the interviews did not yet use the Internet to search for answers during medical consultations, because that would take too much time. However, when they want to hand out information to the patient, they have to look up the information during the medical consultation. Therefore, the QA system’s response time should be short enough to enable using the system during medical consultations. Besides, general practitioners must know which information they can find with the QA system. To provide an overview of the information that can be found with the QA system and the information types that can be retrieved with other information retrieval systems useful for general practitioners, an information portal has been designed. This portal was appreciated by the general practitioners participating in the evaluation.
Information sources
The information sources used by the QA system should be suitable for patient education. Because the information must be up-to-date, these sources could best be retrieved from the Internet. However, it is hard to determine the reliability of sources on the Internet. Therefore, only information sources marked as reliable by medical professionals should be used.
Computer use
The computer use of general practitioners was also investigated. Most general practitioners have a computer in their consulting rooms and make use of a general practitioner information system. The use of mobile devices in the general practice is far less common, however. Although it is expected that these devices would really improve the physician’s work during patient visits, none of the interviewed general practitioners was planning to acquire one. Some of them would even rather not visit patients for this reason.
It was concluded that the general practitioners most likely to appreciate a QA system are those who already search for answers to their medical questions on the Internet. Concerning the user interface, the system should accommodate both keyboard and mouse input. Speech input should only be possible when it is working perfectly. The system should be able to recognize ICPC coding and other medical slang in the question. There shouldn’t be any timeouts and all functional options must be immediately visible on the graphical user interface. Further, general practitioners must have access to a computer with Internet connection and a printer in their consulting rooms to be able to use the QA system optimally. A lot of general practitioners already have these facilities, because they also need them for searching the Web and printing patient letters. It is expected that most other general practitioners will acquire these facilities in the near future. So if the QA system became available on the Internet, general practitioners would not have to incur any extra costs to be able to use it. There only needs to be an organization that informs the general practitioners about the QA system, selects the right information sources, and keeps them up-to-date.
Information presentation
The presentation of the answers a QA system returns should include a few important aspects for general practitioners. The answer selection component of a QA system (which provides the input for the response formulation component) returns a number of answers, which could all be correct. It was concluded that answers retrieved from different sources should not be integrated into a single answer, because general practitioners want to be able to check (the reliability of) the source of the answer and because they want to have the feeling that as medical professionals they are in control of the decision process, not the system.
However, answers from different sources should be integrated into a single view, to enable general practitioners to select the most suitable sources or answers easily, and print this selection of answers. Therefore, each answer should be presented together with a link to its source, and a checkbox to indicate whether it should be printed or not. Answers originating from the same source could be integrated into one concise answer, possibly extended with sentences from their context. An algorithm has been developed that determines whether an answer should be extended and with which sentences it should be extended. For this purpose, lists of cue phrases and anaphoric referring expressions were produced and rules were extracted that determine whether an occurrence of a cue phrase or anaphoric expression in a text signals a relation with the preceding sentence (in which case this preceding sentence should be included in the response). The answer extension algorithm has been evaluated and the evaluation results have been compared to those of Bosma [BOS05], who developed a similar algorithm. It was concluded that the answer extension algorithm produced coherent responses that contain less irrelevant information than a baseline response consisting of the answer sentence extended with the preceding and successive sentence. These results were similar to those of Bosma. However, for his algorithm an automatic RST-annotation tool for Dutch would be needed, which is not yet available. Therefore, the answer extension algorithm developed during this research currently seems a good alternative to the algorithm developed by Bosma.
8 Discussion
This research essentially consists of two parts. The first part concerns the information and computer use by general practitioners. The second part deals with response formulation.
The research on the information and computer use by general practitioners concentrated on the possibility of using question answering (QA) technology to improve general practitioners’ work. It was concluded that QA systems would primarily be suitable to answer questions for patient education. However, during the interviews with general practitioners, other types of information needs that might be pursued with intelligent information retrieval technology also became apparent. For example, when confronted with dermatological diseases, general practitioners frequently have to look up images in a dermatology book. Systems for dermatological image retrieval might help general practitioners by reducing the time to find the relevant picture. Another information type needed by general practitioners is a “social map” that provides an overview of regional health professionals and medical organizations. A lot of these organizations can be found on the Internet. Information extraction technology might be used to enable general practitioners to retrieve contact information of these organizations quickly. It is therefore strongly recommended that the applicability of image retrieval and information extraction technology for general practitioners is investigated in future research. Another question arising from this research is whether a QA system that answers questions for patient education could also be used by patients themselves instead of general practitioners. I think this depends on the state of the art of the QA technology. When the system always returns responses that make sense, it would be very useful for patients, because the system will use a document collection consisting of reliable sources aimed at patient education. Actually, the IMIX demonstrator (the QA system that was the starting-point of this research) is targeted towards these naïve users. 
However, currently the system also returns a lot of answers that do not even deal with the same subject as the question. This information could be very misleading to the patient. General practitioners could serve as an intermediary to filter these answers and give the patient only those answers that are useful to him. Further, a design was made for an information portal for general practitioners. A very simple prototype has been implemented to illustrate this design. More work could be dedicated to improve this portal, for example by enabling personalization. When general practitioners are able to add and remove systems themselves, they will possibly be more likely to appreciate and use this portal. In the second part of this research, dealing with response formulation, two algorithms were developed: an answer integration algorithm, and an answer extension algorithm. The answer integration algorithm has not been implemented because of time constraints and because it was expected that a Dutch sentence fusion algorithm which is already being investigated by Marsi and Krahmer [MK05] would achieve better results. Their algorithm has not yet been implemented and evaluated for application in a QA system, however. Therefore, the comparison of (the complexity and results of) these algorithms is left for future research. The answer extension algorithm has been implemented and evaluated. Lists of Dutch cue phrases and anaphoric expressions were constructed for this purpose, and rules were extracted that determine whether an occurrence of a cue phrase or anaphoric expression in a text signals a relation with the preceding sentence. These lists and rules were based on a rather small text corpus covering only the medical domain. Therefore, they are probably not complete. Future research is needed to construct more complete lists, especially if the algorithm is to be used for other domains.
However, although the lists were not complete, the evaluation results of the answer extension algorithm are promising. The answer extension algorithm restricts the number of sentences the answer sentence could be extended with. Sentences occurring after the answer sentence are only added when the total number of added sentences doesn’t exceed three. However, based on the evaluation results, it was hypothesized that this number should depend on the sentence length. For example, a second test could be added that checks whether the number of words added doesn’t exceed twenty before adding an extra sentence. Extra research would be needed to investigate what numbers of words and sentences the response should maximally contain to optimize both the relevance and coherence of the response. Further, it was concluded that answers that were not extended by the answer extension algorithm were more coherent and contained significantly less irrelevant information with respect to the question than a baseline response consisting of the answer sentence extended with the preceding and successive sentence, while no significant differences with respect to the usefulness of the responses were found. It thus seems that the answer extension algorithm is very useful to determine whether an answer should be extended or not. However, no significant differences were found between the extended responses and the baseline responses. This is probably due to the small number of extended responses. A more thorough evaluation would be needed to investigate the usefulness, irrelevance, and coherence of extended responses. For this purpose, it would be better to integrate the answer extension algorithm with the IMIX demonstrator to enable automatic generation of example responses, and speed up the process of finding extended answers suitable for evaluation.
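The length restriction discussed above can be sketched as follows. This Python fragment is a simplified illustration, not the implementation used in GIPS: the function names, the toy English marker list standing in for the Dutch cue phrase and anaphor rules, and the example sentences are all hypothetical. It adds following sentences as long as they link back to the preceding text, stops after three added sentences, and optionally stops once more than twenty words have been added, which is one possible reading of the proposed second test.

```python
# Toy stand-in for the cue phrase and anaphor lists (hypothetical).
MARKERS = {"this", "these", "they", "therefore", "however"}


def links_back(sentence):
    """Crude check: does the sentence open with a cue phrase or anaphor?"""
    words = sentence.split()
    return bool(words) and words[0].strip(".,;:").lower() in MARKERS


def extend_forward(sentences, answer_idx, links_back,
                   max_sentences=3, max_words=None):
    """Return indices of sentences after the answer sentence to add."""
    added = []
    words_added = 0
    for i in range(answer_idx + 1, len(sentences)):
        if not links_back(sentences[i]):
            break  # the chain of dependent sentences ends here
        if len(added) >= max_sentences:
            break  # sentence budget used up
        if max_words is not None and words_added > max_words:
            break  # proposed second test: word budget already exceeded
        added.append(i)
        words_added += len(sentences[i].split())
    return added


sentences = [
    "Influenza is caused by a virus.",  # answer sentence
    "This virus is very contagious and can keep you in bed for more than a week.",
    "Therefore people at risk are vaccinated every year.",
    "They usually recover with rest.",
    "These vaccinations are given by the general practitioner.",
    "Exercise is healthy.",
]

by_sentences = extend_forward(sentences, 0, links_back)
by_words = extend_forward(sentences, 0, links_back, max_words=20)
```

In this example the sentence limit alone admits three following sentences, while the additional word limit stops after two, because the first two added sentences already contain 24 words. Which limits work best is exactly the open question raised above.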
It was also concluded that, for the time being, the answer extension algorithm would be a good alternative to the algorithm developed by Bosma [BOS05], because for his algorithm an automatic RST-annotation tool for Dutch would be needed, which is not yet available. However, when such a tool becomes available, it would be useful to compare the performance of both algorithms on the same set of example responses to determine which algorithm performs best on usefulness, irrelevance, and coherence. Finally, in this research, response formulation was restricted to selecting and integrating sentences from the document an answer was retrieved from. More intelligent technology would be needed to perform some reasoning with the retrieved information, and formulate responses like “Yes, it is”, or “No, but …” on verification questions, or “100,000” on quantity questions. However, it is expected that with the current state of the art, general practitioners would not very much appreciate and trust systems interpreting texts for them. This issue was therefore not dealt with in this research.
References
[APO] Apotheek.nl. Geneesmiddelen. http://www.apotheek.nl
[BAR03] Barzilay, R. Information fusion for multidocument summarization: Paraphrasing and generation. PhD Thesis, Columbia University, 2003.
[BE96] Best evidence [database on cd-rom]. Philadelphia: American College of Physicians, 1996.
[BMJ] British Medical Journal. Clinical Evidence. London: BMJ Publishing Group Limited. http://www.clinicalevidence.com
[BNM01] Bouma, G., Noord, G. van, and Malouf, R. Alpino: Wide-coverage computational analysis of Dutch. 2001. http://www.let.rug.nl/~vannoord/papers/alpino.pdf
[BOO03] Boonstra, A. Interpretative perspectives on the acceptance of an optional information system: Lessons from the introduction of an electronic prescription system for general practitioners. University of Groningen: Research Institute SOM, 2003. http://www.ub.rug.nl/eldoc/som/a/03A08/03A08.pdf
[BOS05] Bosma, W. Extending answers using discourse structure. Submitted to Crossing Barriers in Text Summarization Research. Workshop to be held in conjunction with RANLP, 2005.
[BOU03] Bouma, G. Question answering for Dutch using dependency relations. September 2003, Groningen, the Netherlands. http://odur.let.rug.nl/~gosse/Imix/project_description.pdf
[BOU04] Bouma, G. QADR output specification. 2004. IMIX Internal Project Page (restricted): http://imix.uvt.nl/Demonstrator/Integration/qadr_qa/xml_specs.pdf
[BW97] Barrie, A.R. and Ward, A.M. Questioning behaviour in general practice: A pragmatic study. In British Medical Journal 315, 1997. pp. 1512-1515.
[CBD05] Canisius, S., Bosch, A. van den, and Daelemans, W. IMIX Rolaquad: XML output specification. 2005. IMIX Internal Project Page (restricted): http://imix.uvt.nl/Demonstrator/Integration/rolaquad_qa/Rolaquad-XMLspecification.pdf
[CC] Cochrane Collaboration. Cochrane Library. http://www.cochrane.org
[CEBM] Centre for Evidence-Based Medicine. Focusing clinical questions. http://www.cebm.net/focus_quest.asp
[CM03] Coumou, H. and Meijman, F. Hoe zoekt de huisarts literatuurgegevens bij problemen van patiënten? In Huisarts en Wetenschap 46, 2003. pp. 359-63.
[CNP00] Cardie, C., Ng, V., Pierce, D., and Buckley, C. Examining the role of statistical and linguistic knowledge sources in a general-knowledge question-answering system. In Proceedings of the 6th Conference on Applied Natural Language Processing. 2000. pp. 180-187.
[COX00] Cox, D. Uitslag enquête LHV en NHG. Huisartsen surfen thuis! In Huisarts en Wetenschap 43, 2000. pp. 408-409.
[DHP98] Dupuits, F.M.H.M., Hasman, A., and Pop, P. Computer-based assistance in family medicine. In Computer Methods and Programs in Biomedicine 55, 1998. pp. 39-50.
[DN96] Dennis, A. and Newman, W. Supporting doctor-patient interaction: Using a surrogate application as a basis for evaluation.
In Proceedings of the CHI '96 Conference Companion on Human Factors in Computing Systems: Common Ground, April 13-18, 1996, Vancouver, BC, Canada. ACM. pp. 223-224.
[DS97] Detmer, W.M. and Shortliffe, E.H. Using the Internet to improve knowledge diffusion in medicine. In Communications of the ACM 40 (8), 1997. pp. 101-108.
[EOE02] Ely, J.W., Osheroff, J.A., Ebell, M.H., Lee Chambliss, M., Vinson, D.C., Stevermer, J.J., and Pifer, E.A. Obstacles to answering doctors’ questions about patient care with evidence: qualitative study. In British Medical Journal 324, 2002. pp. 710-722.
[EOE99] Ely, J.W., Osheroff, J.A., Ebell, M.H., Bergus, G.R., Levy, B.T., Lee Chambliss, M., and Evans, E.R. Analysis of questions asked by family doctors regarding patient care. In British Medical Journal 319, 1999. pp. 358-361.
[EPM] Entrez. PubMed. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
[GAW94] Gorman, P.N., Ash, J., and Wykoff, L. Can primary care physicians' questions be answered using the medical journal literature? In Bulletin of the Medical Library Association 82 (2), 1994. pp. 140-146.
[GH95] Gorman, P.N. and Helfand, M. Information seeking in primary care: how physicians choose which clinical questions to pursue and which to leave unanswered. In Medical Decision Making 15 (2), 1995. pp. 113-119.
[GOO] Google. http://www.google.nl
[GOR95] Gorman, P.N. Information needs of physicians. In Journal of the American Society for Information Science 46 (10), 1995. pp. 729-736.
[HAA02a] Haan, S. de. Discourse. In Appel, R., Baker, A., Hengeveld, K., Kuiken, F., and Muysken, P. (eds.). Taal en taalwetenschap. Oxford: Blackwell Publishers, 2002. pp. 71-86.
[HAA02b] Haan, S. de. Zinsbetekenis. In Appel, R., Baker, A., Hengeveld, K., Kuiken, F., and Muysken, P. (eds.). Taal en taalwetenschap. Oxford: Blackwell Publishers, 2002. pp. 163-181.
[HER03] Herzog, G. Multiplatform Testbed: A tutorial. Nijmegen, 2003. IMIX Internal Project Page (restricted): http://imix.uvt.nl/Demonstrator/Multiplatform/MultiplatformTutorial030428.pdf
[HOU00] Houët, H. Prisma handboek van de Nederlandse taal. First edition. Utrecht: Het Spectrum, 2000.
[IMIXa] IMIX. Architecture. IMIX Internal Project Page (restricted): http://imix.uvt.nl/Demonstrator/Integration/architecture.html
[IMIXg] IMIX. Module imix.gui. IMIX Internal Project Page (restricted): http://imix.uvt.nl/Demonstrator/Integration/imix_gui/doc.html
[IMIXp] IMIX. P-ml presentation format. IMIX Internal Project Page (restricted): http://imix.uvt.nl/Demonstrator/Integration/imogen_gen/pml.html
[IMIXv] IMIX. Stevin vragen. IMIX Internal Project Page (restricted): http://imix.uvt.nl/data/stevin-vragen.txt
[JM00] Jurafsky, D. and Martin, J.H. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall, 2000.
[JMR03] Jijkoun, V., Mishne, G. and Rijke, M. de. Building Infrastructure for Dutch Question Answering. In: A.P. de Vries (ed.), Proceedings DIR 2003, 2003.
[KNMG] Koninklijke Nederlandsche Maatschappij tot bevordering der Geneeskunst. Artsennet. http://www.artsennet.nl
[KPG03] Kosseim, L., Plamondon, L., and Guillemette, L. Answer formulation for question-answering. In Proceedings of The Sixteenth Conference of the Canadian Society for Computational Studies of Intelligence, Canada, June 2003. pp. 24-34.
[KS98] Knott, A. and Sanders, T. The classification of coherence relations and their linguistic markers: An exploration of two languages. In Journal of Pragmatics, 30 (2). 1998. pp. 135-175.
[KS04] Koenen, L. and Smits, R. Handboek Nederlands. First edition. Utrecht: Bijleveld, 2004.
[LAN05] Langen, M.C.G. van. Building a system for answering Dutch person questions. In 2nd Twente Student Conference on IT. Enschede, 2005.
[LAP03] Lapata, M.
Probabilistic text structuring: Experiments with sentence ordering. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics. July 2003. pp. 545-552.
[LINH] Landelijk Informatie Netwerk Huisartsenzorg: Feiten en cijfers over huisartsenzorg in Nederland. http://www.linh.nl
[LQS03] Lin, J., Quan, D., Sinha, V., Bakshi, K., Huynh, D., Katz, B., and Karger, D.R. What makes a good answer? The role of context in question answering. In Proceedings of the Ninth IFIP TC13 International Conference on Human-Computer Interaction. Zurich, Switzerland, 2003.
[LSS01] Lagendijk, P.J.B., Schuring, R.W. and Spil, T.A.M. Het Elektronisch Voorschrijf Systeem: Van kwaal tot medicijn. Enschede: Universiteit Twente, 2001.
[LPS96] Leckie, G.J., Pettigrew, K.E., and Sylvain, C. Modeling the information seeking of professionals: A general model derived from research on engineers, health care professionals, and lawyers. In Library Quarterly 66 (2). 1996. pp. 161-193.
[MCW05] Magrabi, F., Coiera, E.W., Westbrook, J.I., Gosling, A.S. and Vickland, V. General practitioners’ use of online evidence during consultations. In International Journal of Medical Informatics 74 (1), January 2005. pp. 1-12.
[MK05] Marsi, E. and Krahmer, E. Explorations to sentence fusion. Submitted to ENLG ’05. 2005.
[MS03] Moldovan, D. and Surdeanu, M. On the role of information retrieval and information extraction in question answering systems. In M. T. Pazienza (ed.). SCIE 2002. July 2002, pp. 129-147.
[MT87] Mann, W.C. and Thompson, S.A. Rhetorical structure theory: A theory of text organization. Technical report RS-87-190. University of Southern California, Information Sciences Institute. 1987.
[NGC] National Guideline Clearinghouse. http://www.guideline.gov
[NHGf] Nederlands Huisartsen Genootschap. NHG-Formularium. http://nhg.artsennet.nl
[NHGp] Nederlands Huisartsen Genootschap. Patiëntenvoorlichting. http://nhg.artsennet.nl
[NHGs] Nederlands Huisartsen Genootschap. NHG-Standaarden. http://nhg.artsennet.nl
[NHS] National Health Service. NLH Question-Answering Service. http://www.clinicalanswers.nhs.uk
[NWOa] Nederlandse Organisatie voor Wetenschappelijk Onderzoek. Interactieve Multimodale Informatie Extractie. http://www.nwo.nl/imix
[NWOb] Nederlandse Organisatie voor Wetenschappelijk Onderzoek. IMOGEN: Interactive Multimodal Output Generation. http://www.nwo.nl/nwohome.nsf/pages/NWOP_653H7L
[OS04] Os, E. den (ed.). Functional specification IMIX demonstrator. 2004. IMIX Internal Project Page (restricted): http://imix.uvt.nl/Demonstrator/Specification/functional_specification_1_0.doc
[PH87] Paice, C.D. and Husk, G.D. Towards the automatic recognition of anaphoric features in English text: The impersonal pronoun “it”. In Computer Speech and Language 2. 1987. pp. 109-132.
[PSB03] Power, R., Scott, D., and Bouayad-Agha, N. Document structure. In Computational Linguistics 29 (4). 2003. pp. 211-260.
[RHF92] Rector, A.L., Horan, B., Fitter, M., Kay, S., Newton, P.D., Nowlan, W.A., Robinson, D., and Wilson, A. User centered development of a general practice medical workstation: The PEN&PAD experience. In Proceedings of the SIGCHI conference on Human factors in computing systems, June 1992, Monterey, California, United States. ACM. pp. 447-453.
[RM99] Reape, M. and Mellish, C. Just what is aggregation anyway? In Proceedings of the 7th European Workshop on Natural Language Generation. Toulouse, France. 1999. pp. 20-29.
[SHA98] Shaw, J.C. Clause aggregation using linguistic knowledge. In Proceedings of the 9th International Workshop on Natural Language Generation. Canada, 1998. pp. 138-147.
[SHA02] Shaw, J.C. Clause aggregation: An approach to generating concise text. PhD thesis, Columbia University, 2002.
[STE94] Sterkenburg, P.G.J. van. Van Dale handwoordenboek van hedendaags Nederlands. Second edition. Utrecht: Van Dale Lexicografie, 1994.
[THE05] Theune, M. QA XML output (shared part). 2005.
IMIX Internal Project Page (restricted): http://imix.uvt.nl/Demonstrator/Integration/answers/QA-messages-v1.pdf [TREC] Text Retrieval Conference. Question answering collections: http://trec.nist.gov/data/qa.html [VBM95] Verhoeven, A.A.H., Boerma, E.J., and Meyboom-de Jong, B. Use of information sources by family physicians: a literature survey. In Bulletin of the Medical Library Association 83, 1995. pp. 85-90. [VER99] Verhoeven, A.A.H. Information-seeking by general practitioners. PhD Thesis, Rijksuniversiteit Groningen. Groningen: Van Denderen, 1999. [VNB99] Verhoeven, A.A.H., Noort, C.P. van, Bosveld, H.E.P., Boerma, E.J., and Meyboomde Jong, B. Information use and needs: a survey among Dutch general practitioners. In Verhoeven, A.A.H. Information-seeking by general practitioners. PhD Thesis, Rijksuniversiteit Groningen. Groningen: Van Denderen, 1999. [VP05] Vidiam and Paradime. Functional specification IMIX dialogue system: Version 1. 2005. http://wwwhome.cs.utwente.nl/~schooten/vidiam/funcspec2/funcspec22apr2005.pdf [VS03] Verhoeven, A.A.H. and Schuling, J. Op zoek naar bewijs: een vraag- en antwoorddienst voor de huisarts. In Huisarts en Wetenschap 46, 2003. pp. 12-17. 104 Question answering for general practitioners [WHB02] Wolters, I., Hoogen, H. van den, and Bakker, D. de. Evaluatie invoering Elektronisch Voorschrijf Systeem Monitoringfase: de situatie in 2001. Utrecht: NIVEL, 2002. [WM99] Westberg, E.E. and Miller, R.A. The basis for using the Internet to support the information needs of primary care. In Journal of the American Medical Informatics Association 6 (1), 1999. pp. 6-25. 105 Mieke van Langen 106 Question answering for general practitioners Appendix A: Questions This Appendix contains a sample of typical questions asked by general practitioners. These questions were collected in Oregon studies of information needs [GOR95, GAW94]. 1. 
In a patient with refractory headaches, now benefiting from a calcium channel blocker, is there a specific drug or dose that has been shown to work? Is there a study showing this? 2. After 2 courses of antibiotics in a physician’s daughter with bronchitis, what treatment is appropriate for persistent symptoms? 3. In an octogenarian with anemia, angina, and a history of transient ischemic attacks, with a normal creatinine, iron, and mean corpuscular volume, who refuses a bone marrow exam, what diagnostic and therapeutic options are there? 4. Is it safe to use ibuprofen in a 50-year-old man with a history of colon cancer, now reporting dysuria, who has cellular casts in his urine? 5. Does Norpace cause fatigue? 6. What are the cost, risk, and usefulness of dipyridamole thallium scanning in a patient with chronic obstructive lung disease, claudication, and angina pectoris? 7. In a woman with sclerosing adenosis on breast biopsy and family history of breast cancer, who requires estrogen therapy to control symptoms, how can the risk of breast cancer be lowered? 8. In an 88-year-old woman with dysphagia due to past laryngeal cancer, now in respiratory failure due to aspiration, what is the physician’s role in aggressiveness of care decisions when the patient’s family has unrealistic expectations? 9. For a child with exacerbation of steroid dependent asthma and varicella exposure, how do you give varicella immune globulin and where do you get it? 10. Is meclizine effective for labyrinthitis? 11. In a man with vague intermittent abdominal and back pain, what additional information will be most useful and what is the complete differential diagnosis? 12. Can aspirin or an antiplatelet agent be used as prophylaxis against pulmonary embolism (PE) in an elderly woman with unexplained oxygen desaturation and no clinical risk factors for PE (none that warrant transport 100 miles for diagnostic tests)? 13. 
In a woman with history of delivering at 33 weeks, now having Braxton-Hicks contractions at 32 weeks, on terbutaline and bed rest, in breech position, is c-section indicated if labor cannot be stopped? 14. How can I distinguish and manage chest pain in an older woman with known coronary disease, status post angioplasty of the left anterior descending coronary artery, arthritis which precludes treadmill testing, esophagitis, inadequate personality which complicates history, given that dipyridamole testing is 180 miles away? 15. In a patient with steroid dependent chronic obstructive lung disease, does the risk of renal or gastrointestinal complications outweigh the benefit of non-steroidal anti-inflammatory therapy for degenerative joint disease? 16. Can an insulin-dependent diabetic be certified as a commercial driver? 17. At what age is screening prostate-specific antigen [testing] indicated in a low-risk patient? 18. What is the exact increase in risk of thrombotic events on oral contraceptives in a woman with family history of myocardial infarction (her grandmother at age forty-nine) and of deepvein thrombosis? 19. Are nonacetylated salicylates really safer (and how much safer) in patients with NSAID GI intolerance (who benefit from anti-inflammatory effect)? 20. For diagnosis of deep-vein thrombosis, how good is ultrasound; does it obviate the need for venogram (can it exclude the diagnosis)? 21. Is amoxicillin safe for use in a lactating woman? 22. What is [sic] the sensitivity and specificity of arterial ultrasound exam of the lower extremities? 23. Is hypothyroidism associated with high cholesterol or low? 24. What is the dose of Imferon? 107 Mieke van Langen 25. At what point is endoscopy indicated in patients with esophagitis who remain symptomatic on medication? 26. Where can I send this patient for education about his alcoholism: more education than Alcoholics Anonymous provides, less expense than inpatient treatment? 

Appendix B: Interview general practitioners

Information needs

The following questions deal with the medical questions you are confronted with during patient care.

1. How frequently are you confronted with questions from patient care?
2. Do you always search for answers to such questions? Include answers which you look up in the Pharmacotherapeutic Directory (Farmacotherapeutisch Kompas), or which you obtain from consulting a colleague.
3. At what moments do you search for information?
4. Do you also search for information when you are visiting patients?
5. Are the questions you are confronted with during patient visits different from those you are confronted with in the consulting room?

Information sources

6. Which information sources do you use when you are looking for the answer to a medical question?
7. Do you use any electronic sources?
8. Do you also use English information sources?
9. What do you think of your possibilities for finding information for patient care?
10. Are any improvements needed for finding information for patient care?

Computer use

11. Do you own any of the following items at work?
    Computer
    CD-ROM player
    Software to search the medical literature
    CD or disk with medical knowledge
    Subscription to Internet
12. Which computer applications do you use at work?

The following questions deal with information retrieval and question answering systems. I will show you some examples; see Figure 16 to Figure 18.

13. Do you ever use information retrieval systems during clinical practice?
    No, continue with question 17
    Yes, namely ………………………………..
14. Are you generally able to find what you need with these systems?
15. Which features of these systems do you like?
16. Are there any negative aspects of these systems?
17. Do you think a question answering system could be an improvement for your work?
18. In which information sources would you like a question answering system to search?
19. What kind of answers do you prefer when you search for an answer to a medical question?
    Complete articles
    Only relevant paragraphs
    A concise answer
    Other, namely: ………………………………..
20. Would this be the same if you were searching for information during a patient visit?
21. If you had a system that provides you with only paragraphs or concise answers to your medical questions, what additional information would you like to have?
22. When you use a computer at work, do you generally prefer mouse or keyboard input?
23. Would you like to use speech input?
24. Can you show me a user interface of a medical information system that you really appreciate?

Figure 16 A common information retrieval user interface [GOO]
Figure 17 A medical information retrieval user interface [EPM]
Figure 18 An example of a question answering user interface

Appendix C: Screenshots

Figure 19 shows a screenshot of the prototype of the web portal for general practitioners. Figure 20 shows a screenshot of the prototype of GIPS, which can be accessed through the web portal.

Figure 19 Screenshot of the information portal for general practitioners
Figure 20 Screenshot of the prototype of GIPS

Appendix D: Questions and answers

This appendix presents ten example RSI questions and their answers, as generated by the “qadr.qa” question answering module of the IMIX demonstrator (in Dutch). The answers are grouped per source.

1. Wat is RSI?
www.rsi-vereniging.nl/rsi-vereniging/huisarts : RSI is een verzamelnaam voor zeer uiteenlopende vormen van overbelasting in het gebied van nek , schouders , armen en ellebogen .
www.rsi-vereniging.nl/rsi-vereniging/handvat : Beroepsziekte : met 2600 mensen per jaar in de WAO is RSI de meest gesignaleerde beroepsziekte in 2001 .
www.rsi-vereniging.nl/rsi-vereniging/archief/muismetstaart : Twee jaar geleden hebben we een grote beeldschermwerk-dag georganiseerd , sinds die tijd is RSI gelukkig wel een belangrijk onderwerp in Nederland . " RSI is geen verkoudheid waar je na een paar weken weer vanaf bent , zoveel is wel duidelijk .
www.rsi-vereniging.nl/rsi-vereniging/archief/internationale_rsi : RSI is een overkoepelende term voor aandoeningen aan nek , schouder , arm en hand .

2. Waardoor kan RSI ontstaan?
www.rsi-vereniging.nl/rsi-vereniging/behandelplan : RSI wordt veroorzaakt door een combinatie van risicofactoren .
www.rsi-vereniging.nl/rsi-vereniging/behandelmethoden : Sommige ayurvedische therapeuten gaan ervan uit dat RSI wordt veroorzaakt door een stoornis in de stofwisseling .
www.rsi-vereniging.nl/rsi-vereniging/archief/muismetstaart : Caissicres , kappers , musici en lopende band-medewerkers : er is nog een groot aantal andere beroepen die tot RSI leiden .

3. Welke spieren zijn betrokken bij RSI?
Review_RSI_Bulthuis_Elkhuizen : Gezien de neiging van RSI zich uit te breiden , blijkt dat na verloop van tijd vaak veel spieren betrokken zijn bij het proces . Ademhalingsspieren Andere spieren die betrokken kunnen zijn bij RSI zijn de scaleni . Uit onderzoek is bekend dat deze het vaakst betrokken is bij RSI . Toch zijn er wel een aantal spieren die opvallend vaak zijn aangedaan bij RSI .

4. Welke beroepen worden getroffen door RSI?
www.arbobondgenoten.nl/arbothem/lichblst/rsi/tno_verzuim_en_rsi : Tabel 4.4 Proportie RSI bij werknemers met 13 weken verzuim : onderverdeling naar beroepsgroep Verdeling naar beroepsgroep Aangezien beroepsgroepen en bedrijfstakken voor een groot gedeelte overeen komen is het niet verrassend dat er relatief veel verzuimende werknemers met ambachtelijke en industriële beroepen RSI hebben . 39 verzuim door RSI gevonden werden zijn ambachtelijke en industriële beroepen en dienstverlenende beroepen .
www.rsi-vereniging.nl/rsi-vereniging/grrsi : In meer recent onderzoek is , behalve bevestiging van hoge prevalenties in sommige van de genoemde beroepen , ook een hoge prevalentie bij echografisten en in visverwerkende bedrijven gevonden ( Ohl94 , Smi97 ) . Uit een overzicht van buitenlandse onderzoeken naar een relatie tussen arbeid en diverse klachten en aandoeningen die onder RSI gerekend worden , komt een aantal beroepen met zeer hoge prevalenties naar voren ( Hag95 ) .

5. Hoe is opkomende RSI te herkennen?
www.muisarm.nl/site/fysiologische_verklaring : RSI uit zich in spier- , pees- , en zenuwklachten .

6. Welke oefeningen kan ik op mijn werkplek uitvoeren om RSI te voorkomen?
www.rsi-vereniging.nl/gezond/inrichting : De Interne Arbodienst van de Universiteit Leiden geeft de volgende aanwijzingen :
www.rsi-vereniging.nl/gezond/Bewegenisgezond : Zo blijven mensen achter de pc in vorm en kan RSI mogelijk worden voorkomen .
www.rsi-vereniging.nl/rsi-vereniging/handvat : Beroepsziekte : met 2600 mensen per jaar in de WAO is RSI de meest gesignaleerde beroepsziekte in 2001 .

7. Hoe kan ik mijn werkplek het beste inrichten om RSI te voorkomen?
www.rsi-vereniging.nl/gezond/inrichting : De Interne Arbodienst van de Universiteit Leiden geeft de volgende aanwijzingen : Een goed ingerichte werkplek is de eerste stap om RSI klachten te voorkomen .
www.rsi-vereniging.nl/gezond/stap_rsi2002 : Zowel voor de

8. Helpt pauzesoftware bij de bestrijding van RSI?
www.rsi-vereniging.nl/overrsi/links : Het RSI-Kenniscentrum richt zich op kennis van effectiviteit van therapeutische interventies , hulpmiddelen en voorlichting rond de preventie en bestrijding van RSI .
www.muisarm.nl/site/opening : Stichting RSI Nederland wil o.a. met deze website een substantiële bijdrage leveren aan de informatieverstrekking over RSI , en daarmee helpen bij de preventie en de bestrijding van de muisarm .

9. Kan je door RSI in de WAO komen?
www.arbobondgenoten.nl/arbothem/lichblst/rsi/tno_verzuim_en_rsi : Wel is van het aantal personen tussen 35 en 55 dat in de WAO terechtkomt een relatief groter percentage door RSI in de WAO gekomen ( 4,2% ) dan in de jongere ( 2,7% ) en oudere leeftijdsgroepen ( 3,0% ) 3 . Wel komen van het aantal werkende vrouwen er bijna twee keer zoveel in de WAO ( door RSI ) als van het aantal werkende manWel is van het aantal personen tussen 35 en 55 dat in de WAO terechtkomt een relatief iets groter percentage door RSI in de WAO gekomen dan in de jongere en oudere leeftijdsgroepen3 . Wel komen van het aantal werkende personen bijna twee keer zoveel vrouwen in de WAO ( door RSI ) dan mannen . Bedrijfssectoren waarin men een hoog risico loopt door RSI in de WAO te komen zijn de reinigingsindustrie , de textielindustrie en de steen- , cement- , glas- , en keramische industrie .

10. Komt RSI in Nederland vaker voor dan in de rest van Europa?
www.rsi-vereniging.nl/onderzoek/wrkrsi : Nederland is geen koploper wat betreft RSI klachten in Europa . Nederland is in Europa de laatste jaren koploper wat betreft computergebruik ( Paoli , 1992 ; Paoli , 1997 ; Paoli & Merllié , 2000 ; Andries e.a. , 2002 ) . Ik wil nu overgaan tot de vragen van alledag die je vaak over RSI hoort , en ik wil met u onderzoeken welke antwoorden het wetenschappelijk onderzoek al heeft . De geschiedenis leert dat RSI geen een modeverschijnsel is .
merck : De ziekte komt veel in Europa voor en er zijn ook gevallen bekend in de voormalige Sovjetunie , China , Japan en Australië .

Appendix E: Evaluation interview general practitioners

The general practitioner is confronted with the prototypes of the information portal for general practitioners and the graphical user interface of GIPS developed during this research.
It is explained that, eventually, GIPS will also be able to hold a dialogue to specify the question, ICPC coding will be recognized, and keyboard input will be accommodated. Then the general practitioner is asked to answer the following questions assuming both systems are working perfectly.

Information needs

1. How frequently do you think you would use this portal or GIPS?
2. Can you recall any questions you encountered recently that you would have liked to enter into one of these systems? If yes, what were they?
3. Would you use it during patient care or after the consultation?
4. Could you also imagine yourself using it during patient visits?

Information sources

GIPS will retrieve its answers from online sources for patient education published by the NHG, patient organizations, etc.

5. Do you think these sources are appropriate?
6. Are there any other sources you would like the system to search in?

Computer use

7. Do you think using this portal would save time compared to using the search engines you are using now?
8. Do you think you would use the printing option provided by GIPS?
9. Do you like the presentation of the answers?
10. Are there any negative aspects of the portal or GIPS?
11. Do you think these systems could be an improvement for your work?

Appendix F: Questionnaire response formulation

The participants in this questionnaire were divided into two groups, each of which completed a different questionnaire. Each questionnaire contains twelve question-answer pairs; half of the answers were generated by GIPS and the other half are baseline answers. In the first questionnaire, the first, second, fourth, seventh, ninth, and tenth answers are GIPS-generated. In the second questionnaire it is the other way round.
For each answer the participants are asked how useful the provided information is with respect to the question, how much irrelevant information is contained in the answer, and how coherent the answer is.

First questionnaire

Rating scale: Very high | High | Neutral | Low | Very low

1. Welke spieren zijn betrokken bij RSI?
Gezien de neiging van RSI zich uit te breiden, blijkt dat na verloop van tijd vaak veel spieren betrokken zijn bij het proces. Het aantal spieren dat bij verschillende RSI patiënten mee kan doen, is dan ook groot. Een beschrijving daarvan zou haast neerkomen op het dupliceren van een anatomische atlas. Toch zijn er wel een aantal spieren die opvallend vaak zijn aangedaan bij RSI. De meest beruchte spier is de monnikskapspier (trapezius). Uit onderzoek is bekend dat deze het vaakst betrokken is bij RSI. Dat is ook niet zo verwonderlijk, want deze spier zorgt voor het optillen en stabiliseren van de schouders.
Usefulness | Irrelevance | Coherence

2. Hoe voorkom je likdoorns?
Doordat likdoorns meestal ontstaan door slecht passende schoenen, kunnen deze verdwijnen wanneer beter passend schoeisel wordt gedragen.
Usefulness | Irrelevance | Coherence

3. Komt RSI in Nederland vaker voor dan in de rest van Europa?
Nederland heeft niet meer RSI-klachten dan andere landen in Europa. De geschiedenis leert dat RSI geen modeverschijnsel is. In de vleesindustrie en andere beroepen met veel repeterende armbewegingen en in kantoorwerk kwamen de klachten al voor voordat de term RSI bestond.
Usefulness | Irrelevance | Coherence

4. Wat is RSI?
RSI is een verzamelnaam voor zeer uiteenlopende vormen van overbelasting in het gebied van nek, schouders, armen en ellebogen. Het kan zijn dat de arts aanvullend onderzoek laat verrichten naar andere ziektebeelden met soortgelijke symptomen.
Usefulness | Irrelevance | Coherence

5. Kan je door RSI in de WAO komen?
Specifieke risicoberoepen zijn horecapersoneel, conciërges en schoonmakers, gezinshulpen en bejaardenverzorgers en 'overige ambachtelijke beroepen'. Bedrijfssectoren waarin men een hoog risico loopt door RSI in de WAO te komen zijn de reinigingsindustrie, de textielindustrie en de steen-, cement-, glas-, en keramische industrie. Administratieve beroepen vormen in het huidige onderzoek geen risicogroep voor verzuim door RSI; tevens vormen de zakelijke dienstverlening en de overheid geen risicogroepen voor WAO-intrede door RSI.
Usefulness | Irrelevance | Coherence

6. Wat zijn de verschijnselen van griep?
Elke winter wordt 5 tot 20 % van de Nederlands bevolking getroffen door griep. Griep wordt veroorzaakt door het zogenaamde influenzavirus, met verschijnselen van koorts, neusverkoudheid, hoesten, hoofdpijn, spierpijn en vermoeidheid. Omdat het virus erg besmettelijk is kan iedereen griep krijgen en zal je meestal ruim een week het bed moeten houden. Gezonde mensen knappen daarna weer op door rust en veel drinken, maar bij mensen met een chronische aandoening, patiënten met een verminderde weerstand, bewoners van verpleeg-verzorgingshuizen en ouderen boven de 65 jaar kan de ziekte ernstig verlopen.
Usefulness | Irrelevance | Coherence

7. Hoeveel procent van de Nederlandse bevolking heeft psoriasis?
De kans op psoriasis is gerelateerd aan het aantal familieleden dat deze aandoening heeft.
Usefulness | Irrelevance | Coherence

8. Hoe is opkomende RSI te herkennen?
Verklaringen RSI uit zich in spier-, pees-, en zenuwklachten. Een combinatie van onderstaande mechanismen veroorzaakt de problemen.
Usefulness | Irrelevance | Coherence

9. Waardoor kan RSI ontstaan?
RSI wordt veroorzaakt door een combinatie van risicofactoren.
Usefulness | Irrelevance | Coherence

10. Wat is het verschil tussen een vlokkentest en een vruchtwaterpunctie?
Een vlokkentest kan in plaats van een vruchtwaterpunctie worden gedaan, tenzij voor een onderzoek juist vruchtwater nodig is, bijvoorbeeld voor het bepalen van de concentratie alfafoetoproteïne in het vruchtwater.
Usefulness | Irrelevance | Coherence

11. Wat is slapeloosheid?
Artsen classificeren slapeloosheid als primair of secundair. Primaire slapeloosheid is een lang bestaande aandoening die weinig of geen verband lijkt te hebben met enige spanning of bijzondere gebeurtenissen in het leven. De secundaire vorm wordt veroorzaakt door pijn, angst, geneesmiddelen, depressie of extreme spanningen.
Usefulness | Irrelevance | Coherence

12. Welke beroepen worden getroffen door RSI?
Beroepsgroepen waarin men een meer dan gemiddeld risico loopt op (kort) verzuim door RSI zijn beroepen in de transportsector en dienstverlenende beroepen. Beroepsgroepen die als risicogroep voor langdurig (meer dan 13 weken durend) verzuim door RSI gevonden werden zijn ambachtelijke en industriële beroepen en dienstverlenende beroepen. Specifieke risicoberoepen zijn horecapersoneel, conciërges en schoonmakers, gezinshulpen en bejaardenverzorgers en 'overige ambachtelijke beroepen'.
Usefulness | Irrelevance | Coherence

Second questionnaire

Rating scale: Very high | High | Neutral | Low | Very low

1. Welke spieren zijn betrokken bij RSI?
De één heeft vooral last in de schouders, bij de ander ontstaan de klachten in de pols of in de arm. Gezien de neiging van RSI zich uit te breiden, blijkt dat na verloop van tijd vaak veel spieren betrokken zijn bij het proces. Het aantal spieren dat bij verschillende RSI patiënten mee kan doen, is dan ook groot. Een beschrijving daarvan zou haast neerkomen op het dupliceren van een anatomische atlas. Toch zijn er wel een aantal spieren die opvallend vaak zijn aangedaan bij RSI. De meest beruchte spier is de monnikskapspier (trapezius). Uit onderzoek is bekend dat deze het vaakst betrokken is bij RSI. Dat is ook niet zo verwonderlijk, want deze spier zorgt voor het optillen en stabiliseren van de schouders.
Usefulness | Irrelevance | Coherence

2. Hoe voorkom je likdoorns?
Eeltplekken kunnen worden voorkomen door de irritatiebron weg te nemen of, als dit niet mogelijk is, handschoenen, beschermende materialen, bijvoorbeeld ringen te dragen. Doordat likdoorns meestal ontstaan door slecht passende schoenen, kunnen deze verdwijnen wanneer beter passend schoeisel wordt gedragen. Een middel dat de hoornlaag losweekt, bijvoorbeeld salicylzuur, kan likdoorns sneller doen verdwijnen.
Usefulness | Irrelevance | Coherence

3. Komt RSI in Nederland vaker voor dan in de rest van Europa?
De geschiedenis leert dat RSI geen modeverschijnsel is.
Usefulness | Irrelevance | Coherence

4. Wat is RSI?
Verwacht bij het eerste onderzoek geen definitieve diagnose. RSI is een verzamelnaam voor zeer uiteenlopende vormen van overbelasting in het gebied van nek, schouders, armen en ellebogen. Het kan zijn dat de arts aanvullend onderzoek laat verrichten naar andere ziektebeelden met soortgelijke symptomen.
Usefulness | Irrelevance | Coherence

5. Kan je door RSI in de WAO komen?
Bedrijfssectoren waarin men een hoog risico loopt door RSI in de WAO te komen zijn de reinigingsindustrie, de textielindustrie en de steen-, cement-, glas-, en keramische industrie.
Usefulness | Irrelevance | Coherence

6. Wat zijn de verschijnselen van griep?
Griep wordt veroorzaakt door het zogenaamde influenzavirus, met verschijnselen van koorts, neusverkoudheid, hoesten, hoofdpijn, spierpijn en vermoeidheid. Omdat het virus erg besmettelijk is kan iedereen griep krijgen en zal je meestal ruim een week het bed moeten houden. Gezonde mensen knappen daarna weer op door rust en veel drinken, maar bij mensen met een chronische aandoening, patiënten met een verminderde weerstand, bewoners van verpleeg-verzorgingshuizen en ouderen boven de 65 jaar kan de ziekte ernstig verlopen. Zij worden dan ook jaarlijks door hun huisarts gevaccineerd tegen griep, de zogenaamde influenzavaccinatie, die voor 70-80% bescherming biedt tegen het krijgen van griep (influenza).
Usefulness | Irrelevance | Coherence

7. Hoeveel procent van de Nederlandse bevolking heeft psoriasis?
De aandoening heeft een erfelijke component. De kans op psoriasis is gerelateerd aan het aantal familieleden dat deze aandoening heeft. Psoriasis is niet te genezen, maar in de meeste gevallen wel goed te behandelen.
Usefulness | Irrelevance | Coherence

8. Hoe is opkomende RSI te herkennen?
RSI uit zich in spier-, pees-, en zenuwklachten.
Usefulness | Irrelevance | Coherence

9. Waardoor kan RSI ontstaan?
Integrale aanpak RSI wordt veroorzaakt door een combinatie van risicofactoren. Gezien de grote verscheidenheid aan factoren ligt het eigenlijk voor de hand dat de aanpak van de klachten zich dient te richten op alle aspecten die een rol hebben bij het ontstaan van klachten.
Usefulness | Irrelevance | Coherence

10. Wat is het verschil tussen een vlokkentest en een vruchtwaterpunctie?
Met een vlokkentest worden bepaalde afwijkingen van de foetus opgespoord, meestal tussen de tiende en twaalfde week van de zwangerschap. Een vlokkentest kan in plaats van een vruchtwaterpunctie worden gedaan, tenzij voor een onderzoek juist vruchtwater nodig is, bijvoorbeeld voor het bepalen van de concentratie alfafoetoproteïne in het vruchtwater. Vóór de test wordt met behulp van echografie vastgesteld of de foetus leeft, wat de leeftijd van de foetus is en wat de ligging van de placenta is.
Usefulness | Irrelevance | Coherence

11. Wat is slapeloosheid?
Primaire slapeloosheid is een lang bestaande aandoening die weinig of geen verband lijkt te hebben met enige spanning of bijzondere gebeurtenissen in het leven.
Usefulness | Irrelevance | Coherence

12. Welke beroepen worden getroffen door RSI?
Beroepsgroepen die als risicogroep voor langdurig (meer dan 13 weken durend) verzuim door RSI gevonden werden zijn ambachtelijke en industriële beroepen en dienstverlenende beroepen.
Usefulness | Irrelevance | Coherence
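
The counterbalanced assignment described in the introduction to this appendix can be sketched in a few lines of Python. This is only an illustration and not part of the original study materials: the names GIPS_ITEMS_FIRST, N_ITEMS, condition and layout are hypothetical, and only the item positions (1, 2, 4, 7, 9, 10) and the number of items (12) are taken from the text.

```python
# Hypothetical sketch of the counterbalanced questionnaire design.
# Only the item positions and the item count come from the appendix text;
# all names are assumptions made for this illustration.

N_ITEMS = 12
# Positions of the GIPS-generated answers in the first questionnaire.
GIPS_ITEMS_FIRST = frozenset({1, 2, 4, 7, 9, 10})


def condition(questionnaire: int, item: int) -> str:
    """Return "GIPS" or "baseline" for a 1-based item position."""
    if questionnaire not in (1, 2):
        raise ValueError("questionnaire must be 1 or 2")
    if not 1 <= item <= N_ITEMS:
        raise ValueError("item must be between 1 and 12")
    in_first = item in GIPS_ITEMS_FIRST
    # The second questionnaire is "the other way round".
    is_gips = in_first if questionnaire == 1 else not in_first
    return "GIPS" if is_gips else "baseline"


def layout(questionnaire: int) -> dict[int, str]:
    """Map every item position to its condition for one questionnaire."""
    return {item: condition(questionnaire, item)
            for item in range(1, N_ITEMS + 1)}


if __name__ == "__main__":
    for q in (1, 2):
        print(f"Questionnaire {q}:", layout(q))
```

Under this assignment every question is rated once with a GIPS answer and once with a baseline answer across the two groups, while each participant sees six answers of each kind.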