Appendix 2
United States Agency for International Development Performance Monitoring and Evaluation TIPS

NUMBER 1, 2011 Printing
PERFORMANCE MONITORING & EVALUATION TIPS
CONDUCTING A PARTICIPATORY EVALUATION

ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to performance monitoring and evaluation. This publication is a supplemental reference to the Automated Directives System (ADS) Chapter 203.

WHAT IS PARTICIPATORY EVALUATION?
USAID is promoting participation in all aspects of its development work. Participatory evaluation provides for active involvement in the evaluation process of those with a stake in the program: providers, partners, customers (beneficiaries), and any other interested parties. Participation typically takes place throughout all phases of the evaluation: planning and design; gathering and analyzing the data; identifying the evaluation findings, conclusions, and recommendations; disseminating results; and preparing an action plan to improve program performance. This TIPS outlines how to conduct a participatory evaluation.

CHARACTERISTICS OF PARTICIPATORY EVALUATION
Participatory evaluations typically share several characteristics that set them apart from traditional evaluation approaches. These include:

Participant focus and ownership. Participatory evaluations are primarily oriented to the information needs of program stakeholders rather than of the donor agency. The donor agency simply helps the participants conduct their own evaluations, thus building their ownership and commitment to the results and facilitating their follow-up action.

Scope of participation. The range of participants included and the roles they play may vary. For example, some evaluations may target only program providers or beneficiaries, while others may include the full array of stakeholders.

Participant negotiations. Participating groups meet to communicate and negotiate to reach a consensus on evaluation findings, solve problems, and make plans to improve performance.

Diversity of views. Views of all participants are sought and recognized. More powerful stakeholders allow participation of the less powerful.

Learning process. The process is a learning experience for participants. Emphasis is on identifying lessons learned that will help participants improve program implementation, as well as on assessing whether targets were achieved.

Flexible design. While some preliminary planning for the evaluation may be necessary, design issues are decided (as much as possible) in the participatory process. Generally, evaluation questions and data collection and analysis methods are determined by the participants, not by outside evaluators.

Empirical orientation. Good participatory evaluations are based on empirical data. Typically, rapid appraisal techniques are used to determine what happened and why.

Use of facilitators. Participants actually conduct the evaluation, not outside evaluators as is traditional. However, one or more outside experts usually serve as facilitator, that is, provide supporting roles as mentor, trainer, group processor, negotiator, and/or methodologist.

WHY CONDUCT A PARTICIPATORY EVALUATION?
Experience has shown that participatory evaluations improve program performance. Listening to and learning from program beneficiaries, field staff, and other stakeholders who know why a program is or is not working is critical to making improvements. Also, the more these insiders are involved in identifying evaluation questions and in gathering and analyzing data, the more likely they are to use the information to improve performance. Participatory evaluation empowers program providers and beneficiaries to act on the knowledge gained.

Advantages to participatory evaluations are that they:
• Examine relevant issues by involving key players in evaluation design
• Promote participants’ learning about the program and its performance and enhance their understanding of other stakeholders’ points of view
• Improve participants’ evaluation skills
• Mobilize stakeholders, enhance teamwork, and build shared commitment to act on evaluation recommendations
• Increase likelihood that evaluation information will be used to improve performance

But there may be disadvantages. For example, participatory evaluations may:
• Be viewed as less objective because program staff, customers, and other stakeholders with possible vested interests participate
• Be less useful in addressing highly technical aspects
• Require considerable time and resources to identify and involve a wide array of stakeholders
• Take participating staff away from ongoing activities
• Be dominated and misused by some stakeholders to further their own interests

STEPS IN CONDUCTING A PARTICIPATORY EVALUATION

Step 1: Decide if a participatory evaluation approach is appropriate. Participatory evaluations are especially useful when there are questions about implementation difficulties or program effects on beneficiaries, or when information is wanted on stakeholders’ knowledge of program goals or their views of progress. Traditional evaluation approaches may be more suitable when there is a need for independent outside judgment, when specialized information is needed that only technical experts can provide, when key stakeholders don’t have time to participate, or when such serious lack of agreement exists among stakeholders that a collaborative approach is likely to fail.

Step 2: Decide on the degree of participation. What groups will participate and what roles will they play? Participation may be broad, with a wide array of program staff, beneficiaries, partners, and others. It may, alternatively, target one or two of these groups. For example, if the aim is to uncover what hinders program implementation, field staff may need to be involved. If the issue is a program’s effect on local communities, beneficiaries may be the most appropriate participants. If the aim is to know whether all stakeholders understand a program’s goals and view progress similarly, broad participation may be best. Roles may range from serving as a resource or informant to participating fully in some or all phases of the evaluation.

Step 3: Prepare the evaluation scope of work. Consider the evaluation approach: the basic methods, schedule, logistics, and funding. Special attention should go to defining roles of the outside facilitator and participating stakeholders. As much as possible, decisions such as the evaluation questions to be addressed and the development of data collection instruments and analysis plans should be left to the participatory process rather than be predetermined in the scope of work.

Step 4: Conduct the team planning meeting. Typically, the participatory evaluation process begins with a workshop of the facilitator and participants. The purpose is to build consensus on the aim of the evaluation; refine the scope of work and clarify roles and responsibilities of the participants and facilitator; review the schedule, logistical arrangements, and agenda; and train participants in basic data collection and analysis. Assisted by the facilitator, participants identify the evaluation questions they want answered. The approach taken to identify questions may be open ended or may stipulate broad areas of inquiry. Participants then select appropriate methods and develop data-gathering instruments and analysis plans needed to answer the questions.

Step 5: Conduct the evaluation. Participatory evaluations seek to maximize stakeholders’ involvement in conducting the evaluation in order to promote learning. Participants define the questions, consider the data collection skills, methods, and commitment of time and labor required. Participatory evaluations usually use rapid appraisal techniques, which are simpler, quicker, and less costly than conventional sample surveys. They include methods such as those in the box below. Typically, facilitators are skilled in these methods, and they help train and guide other participants in their use.

Step 6: Analyze the data and build consensus on results. Once the data are gathered, participatory approaches to analyzing and interpreting them help participants build a common body of knowledge. Once the analysis is complete, facilitators work with participants to reach consensus on findings, conclusions, and recommendations. Facilitators may need to negotiate among stakeholder groups if disagreements emerge. Developing a common understanding of the results, on the basis of empirical evidence, becomes the cornerstone for group commitment to a plan of action.

Step 7: Prepare an action plan. Facilitators work with participants to prepare an action plan to improve program performance. The knowledge shared by participants about a program’s strengths and weaknesses is turned into action. Empowered by knowledge, participants become agents of change and apply the lessons they have learned to improve performance.

WHAT’S DIFFERENT ABOUT PARTICIPATORY EVALUATIONS?
Participatory Evaluation
• participant focus and ownership of evaluation
• broad range of stakeholders participate
• focus is on learning
• flexible design
• rapid appraisal methods
• outsiders are facilitators

Traditional Evaluation
• donor focus and ownership of evaluation
• stakeholders often don’t participate
• focus is on accountability
• predetermined design
• formal methods
• outsiders are evaluators

Rapid Appraisal Methods

Key informant interviews. This involves interviewing 15 to 35 individuals selected for their knowledge and experience in a topic of interest. Interviews are qualitative, in-depth, and semistructured. They rely on interview guides that list topics or open-ended questions. The interviewer subtly probes the informant to elicit information, opinions, and experiences.

Focus group interviews. In these, 8 to 12 carefully selected participants freely discuss issues, ideas, and experiences among themselves. A moderator introduces the subject, keeps the discussion going, and tries to prevent domination of the discussion by a few participants. Focus groups should be homogeneous, with participants of similar backgrounds as much as possible.

Community group interviews. These take place at public meetings open to all community members. The primary interaction is between the participants and the interviewer, who presides over the meeting and asks questions, following a carefully prepared questionnaire.

Direct observation. Using a detailed observation form, observers record what they see and hear at a program site. The information may be about physical surroundings or about ongoing activities, processes, or discussions.

Minisurveys. These are usually based on a structured questionnaire with a limited number of mostly close-ended questions. They are usually administered to 25 to 50 people. Respondents may be selected through probability or nonprobability sampling techniques, or through “convenience” sampling (interviewing stakeholders at locations where they’re likely to be, such as a clinic for a survey on health care programs). The major advantage of minisurveys is that the data can be collected and analyzed within a few days. It is the only rapid appraisal method that generates quantitative data.

Case studies. Case studies record anecdotes that illustrate a program’s shortcomings or accomplishments. They tell about incidents or concrete events, often from one person’s experience.

Village imaging. This involves groups of villagers drawing maps or diagrams to identify and visualize problems and solutions.

1996, Number 2
Performance Monitoring and Evaluation TIPS
USAID Center for Development Information and Evaluation
CONDUCTING KEY INFORMANT INTERVIEWS

What Are Key Informant Interviews?
USAID reengineering emphasizes listening to and consulting with customers, partners, and other stakeholders as we undertake development activities. Rapid appraisal techniques offer systematic ways of getting such information quickly and at low cost. This TIPS advises how to conduct one such method: key informant interviews. They are qualitative, in-depth interviews of 15 to 35 people selected for their first-hand knowledge about a topic of interest. The interviews are loosely structured, relying on a list of issues to be discussed. Key informant interviews resemble a conversation among acquaintances, allowing a free flow of ideas and information. Interviewers frame questions spontaneously, probe for information, and take notes, which are elaborated on later.

When Are Key Informant Interviews Appropriate?

This method is useful in all phases of development activities: identification, planning, implementation, and evaluation. For example, it can provide information on the setting for a planned activity that might influence project design. Or, it could reveal why intended beneficiaries aren’t using services offered by a project. Specifically, it is useful in the following situations:

1. When qualitative, descriptive information is sufficient for decision-making.

2. When there is a need to understand motivation, behavior, and perspectives of our customers and partners. In-depth interviews of program planners and managers, service providers, host government officials, and beneficiaries concerning their attitudes and behaviors about a USAID activity can help explain its successes and shortcomings.

3. When a main purpose is to generate recommendations. Key informants can help formulate recommendations that can improve a program’s performance.

4. When quantitative data collected through other methods need to be interpreted. Key informant interviews can provide the how and why of what happened.
If, for example, a sample survey showed farmers were failing to make loan repayments, key informant interviews could uncover the reasons.

5. When preliminary information is needed to design a comprehensive quantitative study. Key informant interviews can help frame the issues before the survey is undertaken.

PN-ABS-541

Advantages and Limitations

Advantages of key informant interviews include:
• they provide information directly from knowledgeable people
• they provide flexibility to explore new ideas and issues not anticipated during planning
• they are inexpensive and simple to conduct

Some disadvantages:
• they are not appropriate if quantitative data are needed
• they may be biased if informants are not carefully selected
• they are susceptible to interviewer biases
• it may be difficult to prove validity of findings

Once the decision has been made to conduct key informant interviews, following the step-by-step advice outlined below will help ensure high-quality information.

Steps in Conducting the Interviews

Step 1. Formulate study questions. These relate to specific concerns of the study. Study questions generally should be limited to five or fewer.

Step 2. Prepare a short interview guide. Key informant interviews do not use rigid questionnaires, which inhibit free discussion. However, interviewers must have an idea of what questions to ask. The guide should list major topics and issues to be covered under each study question. Because the purpose is to explore a few issues in depth, guides are usually limited to 12 items. Different guides may be necessary for interviewing different groups of informants.

Step 3. Select key informants. The number should not normally exceed 35. It is preferable to start with fewer (say, 25), since often more people end up being interviewed than is initially planned. Key informants should be selected for their specialized knowledge and unique perspectives on a topic. Planners should take care to select informants with various points of view. Selection consists of two tasks. First, identify the groups and organizations from which key informants should be drawn, for example, host government agencies, project implementing agencies, contractors, and beneficiaries. It is best to include all major stakeholders so that divergent interests and perceptions can be captured. Second, select a few people from each category after consulting with people familiar with the groups under consideration. In addition, each informant may be asked to suggest other people who may be interviewed.

Step 4. Conduct interviews.

Establish rapport. Begin with an explanation of the purpose of the interview, the intended uses of the information, and assurances of confidentiality. Often informants will want assurances that the interview has been approved by relevant officials. Except when interviewing technical experts, questioners should avoid jargon.

Sequence questions. Start with factual questions. Questions requiring opinions and judgments should follow. In general, begin with the present and move to questions about the past or future.

Phrase questions carefully to elicit detailed information. Avoid questions that can be answered by a simple yes or no. For example, prompts such as “Please tell me about the vaccination campaign.” are better than “Do you know about the vaccination campaign?”

Use probing techniques. Encourage informants to detail the basis for their conclusions and recommendations. For example, an informant’s comment, such as “The water program has really changed things around here,” can be probed for more details, such as “What changes have you noticed?” “Who seems to have benefitted most?” “Can you give me some specific examples?”

Maintain a neutral attitude. Interviewers should be sympathetic listeners and avoid giving the impression of having strong views on the subject under discussion.
Neutrality is essential because some informants, trying to be polite, will say what they think the interviewer wants to hear.

Minimize translation difficulties. Sometimes it is necessary to use a translator, which can change the dynamics and add difficulties. For example, differences in status between the translator and informant may inhibit the conversation. Often information is lost during translation. Difficulties can be minimized by using translators who are not known to the informants, briefing translators on the purposes of the study to reduce misunderstandings, and having translators repeat the informant’s comments verbatim.

Step 5. Take adequate notes. Interviewers should take notes and develop them in detail immediately after each interview to ensure accuracy. Use a set of common subheadings for interview texts, selected with an eye to the major issues being explored. Common subheadings ease data analysis.

Step 6. Analyze interview data.

Interview summary sheets. At the end of each interview, prepare a 1-2 page interview summary sheet reducing information into manageable themes, issues, and recommendations. Each summary should provide information about the key informant’s position, reason for inclusion in the list of informants, main points made, implications of these observations, and any insights or ideas the interviewer had during the interview.

Descriptive codes. Coding involves a systematic recording of data. While numeric codes are not appropriate, descriptive codes can help organize responses. These codes may cover key themes, concepts, questions, or ideas, such as sustainability, impact on income, and participation of women. A usual practice is to note the codes or categories on the left-hand margins of the interview text. Then a summary lists the page numbers where each item (code) appears.
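Where a computer is available, the code-to-page summary can be kept as a simple lookup. The following is only a minimal sketch in Python, not part of the original guidance: the function name `build_code_summary` and the codes "income" and "sustain" are hypothetical, and the code "wom-par" is written with a plain hyphen.

```python
from collections import defaultdict


def build_code_summary(margin_notes):
    """Map each descriptive code to the sorted list of pages where it appears.

    margin_notes: iterable of (page_number, code) pairs, one per margin note.
    """
    pages_by_code = defaultdict(set)
    for page, code in margin_notes:
        pages_by_code[code].add(page)
    # Sort pages so the summary reads like an index.
    return {code: sorted(pages) for code, pages in pages_by_code.items()}


# Hypothetical margin notes taken from one interview text.
margin_notes = [
    (7, "wom-par"), (13, "wom-par"), (13, "income"),
    (21, "wom-par"), (46, "wom-par"), (52, "sustain"), (67, "wom-par"),
]

summary = build_code_summary(margin_notes)
for code in sorted(summary):
    print(code, summary[code])
```

A set is used for each code so that a code noted twice on the same page is listed once in the summary.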
For example, women’s participation might be given the code “wom–par,” and the summary sheet might indicate it is discussed on pages 7, 13, 21, 46, and 67 of the interview text.

Categories and subcategories for coding (based on key study questions, hypotheses, or conceptual frameworks) can be developed before interviews begin, or after the interviews are completed. Precoding saves time, but the categories may not be appropriate. Postcoding helps ensure empirically relevant categories, but is time consuming. A compromise is to begin developing coding categories after 8 to 10 interviews, as it becomes apparent which categories are relevant.

Storage and retrieval. The next step is to develop a simple storage and retrieval system. Access to a computer program that sorts text is very helpful. Relevant parts of interview text can then be organized according to the codes. The same effect can be accomplished without computers by preparing folders for each category, cutting relevant comments from the interview and pasting them onto index cards according to the coding scheme, then filing them in the appropriate folder. Each index card should have an identification mark so the comment can be attributed to its source.

Presentation of data. Visual displays such as tables, boxes, and figures can condense information, present it in a clear format, and highlight underlying relationships and trends. This helps communicate findings to decision-makers more clearly, quickly, and easily. The three examples below illustrate how data from key informant interviews might be displayed.

Table 1. Problems Encountered in Obtaining Credit

Male Farmers
1. Collateral requirements
2. Burdensome paperwork
3. Long delays in getting loans

Female Farmers
1. Collateral requirements
2. Burdensome paperwork
3. Long delays in getting loans
4. Land registered under male’s name
5. Difficulty getting to bank location

Table 2.
Impacts on Income of a Microenterprise Activity

“In a survey I did of the participants last year, I found that a majority felt their living conditions have improved.” - university professor

“I have doubled my crop and profits this year as a result of the loan I got.” - participant

“I believe that women have not benefitted as much as men because it is more difficult for us to get loans.” - female participant

Table 3. Recommendations for Improving Training

Recommendation (Number of Informants)
Develop need-based training courses (39)
Develop more objective selection procedures (20)
Plan job placement after training (11)

Step 7. Check for reliability and validity. Key informant interviews are susceptible to error, bias, and misinterpretation, which can lead to flawed findings and recommendations.

Check representativeness of key informants. Take a second look at the key informant list to ensure no significant groups were overlooked.

Assess reliability of key informants. Assess informants’ knowledgeability, credibility, impartiality, willingness to respond, and presence of outsiders who may have inhibited their responses. Greater weight can be given to information provided by more reliable informants.

Check interviewer or investigator bias. One’s own biases as an investigator should be examined, including tendencies to concentrate on information that confirms preconceived notions and hypotheses, seek consistency too early and overlook evidence inconsistent with earlier findings, and be partial to the opinions of elite key informants.

Check for negative evidence. Make a conscious effort to look for evidence that questions preliminary findings. This brings out issues that may have been overlooked.

Get feedback from informants. Ask the key informants for feedback on major findings. A summary report of the findings might be shared with them, along with a request for written comments. Often a more practical approach is to invite them to a meeting where key findings are presented and ask for their feedback.

Selected Further Reading

These tips are drawn from Conducting Key Informant Interviews in Developing Countries, by Krishna Kumar (AID Program Design and Evaluation Methodology Report No. 13. December 1986. PN-AAX-226).

U.S. Agency for International Development, Washington, D.C. 20523

For further information on this topic, contact Annette Binnendijk, CDIE Senior Evaluation Advisor, via phone (703) 875-4235, fax (703) 875-4866, or e-mail. Copies of TIPS can be ordered from the Development Information Services Clearinghouse by calling (703) 351-4006 or by faxing (703) 351-4039. Please refer to the PN number. To order via the Internet, address a request to [email protected]

NUMBER 3, 2ND EDITION, 2010
PERFORMANCE MONITORING & EVALUATION TIPS
PREPARING AN EVALUATION STATEMENT OF WORK

ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to performance management and evaluation. This publication is a supplemental reference to the Automated Directives System (ADS) Chapter 203.

PARTICIPATION IS KEY
Use a participatory process to ensure resulting information will be relevant and useful. Include a range of staff and partners that have an interest in the evaluation to:
• Participate in planning meetings and review the SOW;
• Elicit input on potential evaluation questions; and
• Prioritize and narrow the list of questions as a group.

WHAT IS AN EVALUATION STATEMENT OF WORK (SOW)?
The statement of work (SOW) is viewed as the single most critical document in the development of a good evaluation. The SOW states (1) the purpose of an evaluation, (2) the questions that must be answered, (3) the expected quality of the evaluation results, (4) the expertise needed to do the job, and (5) the time frame and budget available to support the task.

WHY IS THE SOW IMPORTANT?
The SOW is important because it is a basic road map of all the elements of a well-crafted evaluation.
It is the substance of a contract with external evaluators, as well as the framework for guiding an internal evaluation team. It contains the information that anyone who implements the evaluation needs to know about the purpose of the evaluation, the background and history of the program being evaluated, and the issues/questions that must be addressed. Writing a SOW is about managing the first phase of the evaluation process. Ideally, the writer of the SOW will also exercise management oversight of the evaluation process.

PREPARATION – KEY ISSUES

BALANCING FOUR DIMENSIONS
A well drafted SOW is a critical first step in ensuring the credibility and utility of the final evaluation report. Four key dimensions of the SOW are interrelated and should be balanced against one another (see Figure 1):
• The number and complexity of the evaluation questions that need to be addressed;
• Adequacy of the time allotted to obtain the answers;
• Availability of funding (budget) to support the level of evaluation design and rigor required; and
• Availability of the expertise needed to complete the job.

The development of the SOW is an iterative process in which the writer has to revisit, and sometimes adjust, each of these dimensions. Finding the appropriate balance is the main challenge faced in developing any SOW.

ADVANCE PLANNING
It is a truism that good planning is a necessary (but not the only) condition for success in any enterprise. The SOW preparation process is itself an exercise in careful and thorough planning. The writer must consider several principles when beginning the process.

As USAID and other donors place more emphasis on rigorous impact evaluation, it is essential that evaluation planning form an integral part of the initial program or project design. This includes factoring in baseline data collection, possible comparison or “control” site selection, and the preliminary design of data collection protocols and instruments. Decisions about evaluation design must be reflected in implementation planning and in the budget.

There will always be unanticipated problems and opportunities that emerge during an evaluation. It is helpful to build in ways to accommodate necessary changes.

The writer of the SOW is, in essence, the architect of the evaluation. It is important to commit adequate time and energy to the task.

Adequate time is required to gather information and to build productive relationships with stakeholders (such as program sponsors, participants, or partners) as well as the evaluation team, once selected.

The sooner that information can be made available to the evaluation team, the more efficient they can be in providing credible answers to the important questions outlined in the SOW.

The quality of the evaluation is dependent on providing quality guidance in the SOW.

WHO SHOULD BE INVOLVED?
Participation in all or some part of the evaluation is an important decision for the development of the SOW. USAID and evaluation experts strongly recommend that evaluations maximize stakeholder participation, especially in the initial planning process. Stakeholders may encompass a wide array of persons and institutions, including policy makers, program managers, implementing partners, host country organizations, and beneficiaries. In some cases, stakeholders may also be involved throughout the evaluation and with the dissemination of results. The benefits of stakeholder participation include the following:
• Learning across a broader group of decision-makers, thus increasing the likelihood that the evaluation findings will be used to improve development effectiveness;
• Acceptance of the purpose and process of evaluation by those concerned;
• A more inclusive and better focused list of questions to be answered;
• Increased acceptance and ownership of the process, findings, and conclusions; and
• Increased possibility that the evaluation will be used by decision makers and other stakeholders.
USAID operates in an increasingly complex implementation world with many players, including other USG agencies such as the Departments of State, Defense, Justice and others. If the activity engages other players, it is important to include them in the process.

Within USAID, there are useful synergies that can emerge when the SOW development process is inclusive. For example, a SOW that focuses on civil society advocacy might benefit from input by those who are experts in rule of law.

Participation by host government and local organizational leaders and beneficiaries is less common among USAID supported evaluations. It requires sensitivity and careful management; however, the benefits to development practitioners can be substantial.

Participation of USAID managers in evaluations is an increasingly common practice and produces many benefits. To ensure against bias or conflict of interest, the USAID manager's role can be limited to participating in the fact finding phase and contributing to the analysis. However, the final responsibility for analysis, conclusions and recommendations will rest with the independent members and team leader.

FIGURE 2. ELEMENTS OF A GOOD EVALUATION SOW
1. Describe the activity, program, or process to be evaluated
2. Provide a brief background on the development hypothesis and its implementation
3. State the purpose and use of the evaluation
4. Clarify the evaluation questions
5. Identify the evaluation method(s)
6. Identify existing performance information sources, with special attention to monitoring data
7. Specify the deliverable(s) and the timeline
8. Identify the composition of the evaluation team (one team member should be an evaluation specialist) and participation of customers and partners
9. Address schedule and logistics
10. Clarify requirements for reporting and dissemination
11. Include a budget

THE ELEMENTS OF A GOOD EVALUATION SOW

1. DESCRIBE THE ACTIVITY, PROGRAM, OR PROCESS TO BE EVALUATED

Be as specific and complete as possible in describing what is to be evaluated. The more information provided at the outset, the more time the evaluation team will have to develop the data needed to answer the SOW questions.

If the USAID manager does not have the time and resources to bring together all the relevant information needed to inform the evaluation in advance, the SOW might require the evaluation team to submit a document review as a first deliverable. This will, of course, add to the amount of time and budget needed in the evaluation contract.

2. PROVIDE A BRIEF BACKGROUND

Give a brief description of the context, history and current status of the activities or programs, names of implementing agencies and organizations involved, and other information to help the evaluation team understand background and context. In addition, this section should state the development hypothesis(es) and clearly describe the program (or project) theory that underlies the program's design.

USAID activities, programs and strategies, as well as most policies, are based on a set of "if-then" propositions that predict how a set of interventions will produce intended results. A development hypothesis is generally represented in a results framework (or sometimes a logical framework at the project level) and identifies the causal relationships among various objectives sought by the program (see TIPS 13: Building a Results Framework). That is, if one or more objectives are achieved, then the next higher order objective will be achieved. Whether the development hypothesis is the correct one, or whether it remains valid at the time of the evaluation, is an important question for most evaluation SOWs to consider.

3. STATE THE PURPOSE AND USE OF THE EVALUATION

Why is an evaluation needed? The clearer the purpose, the more likely it is that the evaluation will produce credible and useful findings, conclusions and recommendations.
In defining the purpose, several questions should be considered.

• Who wants the information? Will higher level decision makers be part of the intended audience?
• What do they want to know?
• For what purpose will the information be used?
• When will it be needed?
• How accurate must it be?

ADS 203.3.6.1 identifies a number of triggers that may inform the purpose and use of an evaluation, as follows:

• A key management decision is required for which there is inadequate information;
• Performance information indicates an unexpected result (positive or negative) that should be explained (such as gender differential results);
• Customer, partner, or other informed feedback suggests that there are implementation problems, unmet needs, or unintended consequences or impacts;
• Issues of impact, sustainability, cost-effectiveness, or relevance arise;
• The validity of the development hypotheses or critical assumptions is questioned, for example, due to unanticipated changes in the host country environment; and
• Periodic portfolio reviews have identified key questions that need to be answered or require consensus.

4. CLARIFY THE EVALUATION QUESTIONS

The core element of an evaluation SOW is the list of questions posed for the evaluation. One of the most common problems with evaluation SOWs is that they contain a long list of questions that are poorly defined or difficult to answer given the time, budget and resources provided. While a participatory process ensures wide ranging input into the initial list of questions, it is equally important to reduce this list to a manageable number of key questions. Keeping in mind the relationship between budget, time, and expertise needed, every potential question should be thoughtfully examined by asking a number of questions.

• Is this question of essential importance to the purpose and the users of the evaluation?
• Is this question clear, precise and "researchable"?
• What level of reliability and validity is expected in answering the question?
• Does determining an answer to the question require a certain kind of experience and expertise?
• Are we prepared to provide the management commitment, time and budget to secure a credible answer to this question?

If these questions can be answered yes, then the team probably has a good list of questions that will inform the evaluation team and drive the evaluation process to a successful result.

5. IDENTIFY EVALUATION METHODS

The SOW manager has to decide whether the evaluation design and methodology should be specified in the SOW (see USAID ADS 203.3.6.4 on Evaluation Methodologies). This depends on whether the writer has expertise, or has internal access to evaluation research knowledge and experience. If so, and the writer is confident of the "on the ground" conditions that will allow for different evaluation designs, then it is appropriate to include specific requirements in the SOW.

If the USAID SOW manager does not have the kind of evaluation experience needed, especially for more formal and rigorous evaluations, it is good practice to: 1) require that the team (or bidders, if it is contracted out) include a description of (or approach for developing) the proposed research design and methodology, or 2) require a detailed design and evaluation plan to be submitted as a first deliverable. In this way, the SOW manager benefits from external evaluation expertise. In either case, the design and methodology should not be finalized until the team has an opportunity to gather detailed information and discuss final issues with USAID.

The selection of the design and data collection methods must be a function of the type of evaluation and the level of statistical and quantitative data confidence needed. If the project is selected for a rigorous impact evaluation, then the design and methods used will be more sophisticated and technically complex.
If external assistance is necessary, the evaluation SOW will be issued as part of the initial RFP/RFA (Request for Proposal or Request for Application) solicitation process. All methods and evaluation designs should be as rigorous as reasonably possible. In some cases, a rapid appraisal is sufficient and appropriate (see TIPS 5: Using Rapid Appraisal Methods). At the other extreme, planning for a sophisticated and complex evaluation process requires greater up-front investment in baselines, outcome monitoring processes, and carefully constructed experimental or quasi-experimental designs.

6. IDENTIFY EXISTING PERFORMANCE INFORMATION

Identify the existence and availability of relevant performance information sources, such as performance monitoring systems and/or previous evaluation reports. Including a summary of the types of data available, the timeframe, and an indication of their quality and reliability will help the evaluation team to build on what is already available.

7. SPECIFY DELIVERABLES AND TIMELINE

The SOW must specify the products, the time frame, and the content of each deliverable that is required to complete the evaluation contract. Some SOWs simply require delivery of a draft evaluation report by a certain date. In other cases, a contract may require several deliverables, such as a detailed evaluation design, a work plan, a document review, and the evaluation report.

The most important deliverable is the final evaluation report. TIPS 17: Constructing an Evaluation Report provides a suggested outline of an evaluation report that may be adapted and incorporated directly into this section.

The evaluation report should differentiate between findings, conclusions, and recommendations, as outlined in Figure 3. As evaluators move beyond the facts, greater interpretation is required. By ensuring that the final report is organized in this manner, decision makers can clearly understand the facts on which the evaluation is based.
In addition, it facilitates greater understanding of where there might be disagreements concerning the interpretation of those facts. While individuals may disagree on recommendations, they should not disagree on the basic facts.

Another consideration is whether a section on "lessons learned" should be included in the final report. A good evaluation will produce knowledge about best practices, point out what works and what does not, and contribute to the more general fund of tested experience on which other program designers and implementers can draw.

Because unforeseen obstacles may emerge, it is helpful to be as realistic as possible about what can be accomplished within a given time frame. Also, include some wording that allows USAID and the evaluation team to adjust schedules in consultation with the USAID manager should this be necessary.

8. DISCUSS THE COMPOSITION OF THE EVALUATION TEAM

USAID evaluation guidance for team selection strongly recommends that at least one team member have credentials and experience in evaluation design and methods. The team leader must have strong team management skills, and sufficient experience with evaluation standards and practices to ensure a credible product. The appropriate team leader is a person with whom the SOW manager can develop a working partnership as the team moves through the evaluation research design and planning process. He/she must also be a person who can deal effectively with senior U.S. and host country officials and other leaders.

Experience with USAID is often an important factor, particularly for management focused evaluations, and in formative evaluations designed to establish the basis for a future USAID program or the redesign of an existing program. If the evaluation entails a high level of complexity, survey research and other sophisticated methods, it may be useful to add a data collection and analysis expert to the team.
Generally, evaluation skills will be supplemented with additional subject matter experts. As the level of research competence increases in many countries where USAID has programs, it makes good sense to include local collaborators, whether survey research firms or independents, as full members of the evaluation team.

9. ADDRESS SCHEDULING, LOGISTICS AND OTHER SUPPORT

Good scheduling and effective local support contribute greatly to the efficiency of the evaluation team. This section defines the time frame and the support structure needed to answer the evaluation questions at the required level of validity. For evaluations involving complex designs and sophisticated survey research data collection methods, the schedule must allow enough time, for example, to develop sample frames, prepare and pretest survey instruments, train interviewers, and analyze data. New data collection and analysis technologies can accelerate this process, but need to be provided for in the budget. In some cases, an advance trip to the field by the team leader and/or methodology expert may be justified where extensive pretesting and revision of instruments is required or when preparing for an evaluation in difficult or complex operational environments.

Adequate logistical and administrative support is also essential. USAID often works in countries with poor infrastructure, frequently in conflict/post-conflict environments where security is an issue. If the SOW requires the team to make site visits to distant or difficult locations, such planning must be incorporated into the SOW.

Particularly overseas, teams often rely on local sources for administrative support, including scheduling of appointments, finding translators and interpreters, and arranging transportation. In many countries where foreign assistance experts have been active, local consulting firms have developed this kind of expertise.
Good interpreters are in high demand, and are essential to any evaluation team's success, especially when using qualitative data collection methods.

10. CLARIFY REQUIREMENTS FOR REPORTING AND DISSEMINATION

Most evaluations involve several phases of work, especially for more complex designs. The SOW can set up the relationship between the evaluation team, the USAID manager and other stakeholders. If a working group was established to help define the SOW questions, continue to use the group as a forum for interim reports and briefings provided by the evaluation team. The SOW should specify the timing and details for each briefing session.

Examples of what might be specified include:

• Due dates for draft and final reports;
• Dates for oral briefings (such as a mid-term and final briefing);
• Number of copies needed;
• Language requirements, where applicable;
• Formats and page limits;
• Requirements for datasets, if primary data has been collected;
• A requirement to submit all evaluations to the Development Experience Clearinghouse for archiving (this is the responsibility of the evaluation contractor); and
• Other needs for communicating, marketing and disseminating results that are the responsibility of the evaluation team.

The SOW should specify when working drafts are to be submitted for review, the time frame allowed for USAID review and comment, and the time frame to revise and submit the final report.

11. INCLUDE A BUDGET

With the budget section, the SOW comes full circle. As stated, budget considerations have to be part of the decision making process from the beginning. The budget is a product of the questions asked, human resources needed, logistical and administrative support required, and the time needed to produce a high quality, rigorous and useful evaluation report in the most efficient and timely manner.
It is essential for contractors to understand the quality, validity and rigor required so they can develop a responsive budget that will meet the standards set forth in the SOW.

For more information: TIPS publications are available online at [insert website].

Acknowledgements: Our thanks to those whose experience and insights helped shape this publication, including USAID's Office of Management Policy, Budget and Performance (MPBP). This publication was written by Richard Blue, Ph.D. of Management Systems International.

Comments regarding this publication can be directed to: Gerald Britan, Ph.D. Tel: (202) 712-1158 [email protected]

Contracted under RAN-M-00-04-00049-A-FY0S-84 Integrated Managing for Results II

1996, Number 4
Performance Monitoring and Evaluation TIPS
USAID Center for Development Information and Evaluation
PN-ABY-208

USING DIRECT OBSERVATION TECHNIQUES

USAID's reengineering guidance encourages the use of rapid, low cost methods for collecting information on the performance of our development activities. Direct observation, the subject of this Tips, is one such method.

What is Direct Observation?

Most evaluation teams conduct some fieldwork, observing what's actually going on at assistance activity sites. Often, this is done informally, without much thought to the quality of data collection. Direct observation techniques allow for a more systematic, structured process, using well-designed observation record forms.

Advantages and Limitations

The main advantage of direct observation is that an event, institution, facility, or process can be studied in its natural setting, thereby providing a richer understanding of the subject. For example, an evaluation team that visits microenterprises is likely to better understand their nature, problems, and successes after directly observing their products, technologies, employees, and processes, than by relying solely on documents or key informant interviews. Another advantage is that it may reveal conditions, problems, or patterns many informants may be unaware of or unable to describe adequately.

On the negative side, direct observation is susceptible to observer bias. The very act of observation also can affect the behavior being studied.

When Is Direct Observation Useful?

Direct observation may be useful:

• When performance monitoring data indicate results are not being accomplished as planned, and when implementation problems are suspected, but not understood. Direct observation can help identify whether the process is poorly implemented or required inputs are absent.
• When details of an activity's process need to be assessed, such as whether tasks are being implemented according to standards required for effectiveness.
• When an inventory of physical facilities and inputs is needed and not available from existing sources.
• When interview methods are unlikely to elicit needed information accurately or reliably, either because the respondents don't know or may be reluctant to say.

Steps in Using Direct Observation

The quality of direct observation can be improved by following these steps.

Step 1. Determine the focus

Because of typical time and resource constraints, direct observation has to be selective, looking at a few activities, events, or phenomena that are central to the evaluation questions. For example, suppose an evaluation team intends to study a few health clinics providing immunization services for children. Obviously, the team can assess a variety of areas: physical facilities and surroundings, immunization activities of health workers, recordkeeping and managerial services, and community interactions. The team should narrow its focus to one or two areas likely to generate the most useful information and insights.

Next, break down each activity, event, or phenomenon into subcomponents. For example, if the team decides to look at immunization activities of health workers, prepare a list of the tasks to observe, such as preparation of vaccine, consultation with mothers, and vaccine administration. Each task may be further divided into subtasks; for example, administering vaccine likely includes preparing the recommended doses, using the correct administration technique, using sterile syringes, and protecting vaccine from heat and light during use.

If the team also wants to assess physical facilities and surroundings, it will prepare an inventory of items to be observed.

Step 2. Develop direct observation forms

The observation record form should list the items to be observed and provide spaces to record observations. These forms are similar to survey questionnaires, but investigators record their own observations, not respondents' answers. Observation record forms help standardize the observation process and ensure that all important items are covered. They also facilitate better aggregation of data gathered from various sites or by various investigators. An excerpt from a direct observation form used in a study of primary health care in the Philippines provides an illustration below.

OBSERVATION OF GROWTH MONITORING SESSION
Name of the Observer
Date
Time
Place
Was the scale set to 0 at the beginning of the growth session? Yes______ No______
How was age determined? By asking______ From growth chart______ Other______
When the child was weighed, was it stripped to practical limit? Yes______ No______
Was the weight read correctly? Yes______ No______
Process by which weight and age transferred to record: Health Worker wrote it______ Someone else wrote it______ Other______
Did Health Worker interpret results for the mother? Yes______ No______

When preparing direct observation forms, consider the following:

1. Identify in advance the possible response categories for each item, so that the observer can answer with a simple yes or no, or by checking the appropriate answer. Closed response categories help minimize observer variation, and therefore improve the quality of data.
2. Limit the number of items in a form. Forms should normally not exceed 40–50 items. If necessary, it is better to use two or more smaller forms than a single large one that runs several pages.
3. Provide adequate space to record additional observations for which response categories were not determined.
4. Use of computer software designed to create forms can be very helpful. It facilitates a neat, clear form that can be easily completed.

Step 3. Select the sites

Once the forms are ready, the next step is to decide where the observations will be carried out and whether they will be based on one or more sites.

A single site observation may be justified if a site can be treated as a typical case or if it is unique. Consider a situation in which all five agricultural extension centers established by an assistance activity have not been performing well. Here, observation at a single site may be justified as a typical case. A single site observation may also be justified when the case is unique; for example, if only one of five centers had been having major problems, and the purpose of the evaluation is to discover why. However, single site observations should be avoided generally, because cases the team assumes to be typical or unique may not be. As a rule, several sites are necessary to obtain a reasonable understanding of a situation.

In most cases, teams select sites based on experts' advice. The investigator develops criteria for selecting sites, then relies on the judgment of knowledgeable people. For example, if a team evaluating a family planning project decides to observe three clinics (one highly successful, one moderately successful, and one struggling), it may request USAID staff, local experts, or other informants to suggest a few clinics for each category. The team will then choose three after examining their recommendations. Using more than one expert reduces individual bias in selection.

Alternatively, sites can be selected based on data from performance monitoring. For example, activity sites (clinics, schools, credit institutions) can be ranked from best to worst based on performance measures, and then a sample drawn from them.

Step 4. Decide on the best timing

Timing is critical in direct observation, especially when events are to be observed as they occur. Wrong timing can distort findings. For example, rural credit organizations receive most loan applications during the planting season, when farmers wish to purchase agricultural inputs. If credit institutions are observed during the nonplanting season, an inaccurate picture of loan processing may result.

People and organizations follow daily routines associated with set times. For example, credit institutions may accept loan applications in the morning; farmers in tropical climates may go to their fields early in the morning and return home by noon. Observation periods should reflect work rhythms.

Step 5. Conduct the field observation

Establish rapport. Before embarking on direct observation, a certain level of rapport should be established with the people, community, or organization to be studied. The presence of outside observers, especially if officials or experts, may generate some anxiety among those being observed. Often informal, friendly conversations can reduce anxiety levels. Also, let them know the purpose of the observation is not to report on individuals' performance, but to find out what kind of problems in general are being encountered.

Allow sufficient time for direct observation. Brief visits can be deceptive partly because people tend to behave differently in the presence of observers. It is not uncommon, for example, for health workers to become more caring or for extension workers to be more persuasive when being watched. However, if observers stay for relatively longer periods, people become less self-conscious and gradually start behaving naturally. It is essential to stay at least two or three days on a site to gather valid, reliable data.

Use a team approach. If possible, two observers should observe together. A team can develop more comprehensive, higher quality data, and avoid individual bias.

Train observers. If many sites are to be observed, nonexperts can be trained as observers, especially if observation forms are clear, straightforward, and mostly closed-ended.

Step 6. Complete forms

Take notes as inconspicuously as possible. The best time for recording is during observation. However, this is not always feasible because it may make some people self-conscious or disturb the situation. In these cases, recording should take place as soon as possible after observation.

Step 7. Analyze the data

Data from closed-ended questions from the observation form can be analyzed using basic procedures such as frequency counts and cross-tabulations. Statistical software packages such as SAS or SPSS facilitate such statistical analysis and data display.

Analysis of any open-ended interview questions can also provide extra richness of understanding and insights. Here, use of database management software with text storage capabilities, such as dBase, can be useful.

Step 8. Check for reliability and validity

Direct observation techniques are susceptible to error and bias that can affect reliability and validity. These can be minimized by following some of the procedures suggested, such as checking the representativeness of the sample of sites selected; using closed-ended, unambiguous response categories on the observation forms; recording observations promptly; and using teams of observers at each site.

Direct Observation of Primary Health Care Services in the Philippines

An example of structured direct observation was an effort to identify deficiencies in the primary health care system in the Philippines. It was part of a larger, multicountry research project, the Primary Health Care Operations Research Project (PRICOR). The evaluators prepared direct observation forms covering the activities, tasks, and subtasks health workers must carry out in health clinics to accomplish clinical objectives. These forms were closed-ended and in most cases observations could simply be checked to save time. The team looked at 18 health units from a "typical" province, including samples of units that were high, medium and low performers in terms of key child survival outcome indicators.

The evaluation team identified and quantified many problems that required immediate government attention. For example, in 40 percent of the cases where follow-up treatment was required at home, health workers failed to tell mothers the timing and amount of medication required. In 90 percent of cases, health workers failed to explain to mothers the results of child weighing and growth plotting, thus missing the opportunity to involve mothers in the nutritional care of their child. Moreover, numerous errors were made in weighing and plotting.

This case illustrates that use of closed-ended observation instruments promotes the reliability and consistency of data. The findings are thus more credible and likely to influence program managers to make needed improvements.

Selected Further Reading

Information in this Tips is based on "Rapid Data Collection Methods for Field Assessments" by Krishna Kumar, in Team Planning Notebook for Field-Based Program Assessments (USAID PPC/CDIE, 1991).

For more on direct observation techniques applied to the Philippines health care system, see Stewart N.
Blumenfeld, Manuel Roxas, and Maricor de los Santos, "Systematic Observation in the Analysis of Primary Health Care Services," in Rapid Appraisal Methods, edited by Krishna Kumar (The World Bank: 1993).

CDIE's Tips series provides advice and suggestions to USAID managers on how to plan and conduct performance monitoring and evaluation activities. They are supplemental references to the reengineering automated directives system (ADS), chapter 203. For further information, contact Annette Binnendijk, CDIE Senior Evaluation Advisor, phone (703) 875–4235, fax (703) 875–4866, or e-mail. Tips can be ordered from the Development Information Services Clearinghouse by calling (703) 351–4006 or by faxing (703) 351–4039. Please refer to the PN number. To order via Internet, address requests to [email protected]

NUMBER 5
2ND EDITION, 2010
PERFORMANCE MONITORING & EVALUATION TIPS

USING RAPID APPRAISAL METHODS

ABOUT TIPS

These TIPS provide practical advice and suggestions to USAID managers on issues related to performance monitoring and evaluation. This publication is a supplemental reference to the Automated Directive System (ADS) Chapter 203.

WHAT IS RAPID APPRAISAL?

Rapid Appraisal (RA) is an approach that draws on multiple evaluation methods and techniques to quickly, yet systematically, collect data when time in the field is limited. RA practices are also useful when there are budget constraints or limited availability of reliable secondary data. For example, time and budget limitations may preclude the option of using representative sample surveys.

BENEFITS – WHEN TO USE RAPID APPRAISAL METHODS

Rapid appraisals are quick and can be done at relatively low cost. Rapid appraisal methods can help gather, analyze, and report relevant information for decision-makers within days or weeks. This is not possible with sample surveys.
RAs can be used in the following cases:

• for formative evaluations, to make mid-course corrections in project design or implementation when customer or partner feedback indicates a problem (see ADS 203.3.6.1);
• when a key management decision is required and there is inadequate information;
• for performance monitoring, when data are collected and the techniques are repeated over time for measurement purposes;
• to better understand the issues behind performance monitoring data; and
• for project pre-design assessment.

LIMITATIONS – WHEN RAPID APPRAISALS ARE NOT APPROPRIATE

Findings from rapid appraisals may have limited reliability and validity, and cannot be generalized to the larger population. Accordingly, rapid appraisal should not be the sole basis for summative or impact evaluations. Data can be biased and inaccurate unless multiple methods are used to strengthen the validity of findings and careful preparation is undertaken prior to beginning field work.

WHEN ARE RAPID APPRAISAL METHODS APPROPRIATE?

Choosing between rapid appraisal methods for an assessment or more time-consuming methods, such as sample surveys, should depend on balancing several factors, listed below.

• Purpose of the study. The importance and nature of the decision depending on it.
• Confidence in results. The accuracy, reliability, and validity of findings needed for management decisions.
• Time frame. When a decision must be made.
• Resource constraints (budget).
• Evaluation questions to be answered (see TIPS 3: Preparing an Evaluation Statement of Work).

USE IN TYPES OF EVALUATION

Rapid appraisal methods are often used in formative evaluations. Findings are strengthened when evaluators use triangulation (employing more than one data collection method) as a check on the validity of findings from any one method.

Rapid appraisal methods are also used in the context of summative evaluations.
The data from rapid appraisal methods and techniques complement the use of quantitative methods such as surveys based on representative sampling. For example, a randomized survey of smallholder farmers may tell you that farmers have a difficult time selling their goods at market, but may not provide you with the details of why this is occurring. A researcher could then use interviews with farmers to determine the details necessary to construct a more complete theory of why it is difficult for smallholder farmers to sell their goods.

KEY PRINCIPLES FOR ENSURING USEFUL RAPID APPRAISAL DATA COLLECTION

No set of rules dictates which methods and techniques should be used in a given field situation; however, a number of key principles can be followed to ensure the collection of useful data in a rapid appraisal.

• Preparation is key. As in any evaluation, the evaluation design and selection of methods must begin with a thorough understanding of the evaluation questions and the client’s needs for evaluative information. The client’s intended uses of data must guide the evaluation design and the types of methods that are used.
• Triangulation increases the validity of findings. To lessen bias and strengthen the validity of findings from rapid appraisal methods and techniques, it is imperative to use multiple methods. In this way, data collected using one method can be compared to that collected using other methods, thus giving a researcher the ability to generate valid and reliable findings. If, for example, data collected using Key Informant Interviews reveal the same findings as data collected from Direct Observation and Focus Group Interviews, there is less chance that the findings from the first method were due to researcher bias or due to the findings being outliers.

Table 1 summarizes common rapid appraisal methods and suggests how findings from any one method can be strengthened by the use of other methods.
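The cross-checking idea behind triangulation can be sketched in a few lines of code. The following is only an illustration, not part of USAID guidance: it assumes that the findings from each method have already been coded into short labels, and the function name `triangulate`, the finding labels, and the two-method threshold are all hypothetical choices made for this example.

```python
from collections import defaultdict


def triangulate(findings_by_method, min_methods=2):
    """Split coded findings into those supported by several methods and those
    supported by fewer than `min_methods` (an assumed, adjustable threshold)."""
    support = defaultdict(set)
    for method, findings in findings_by_method.items():
        for finding in findings:
            support[finding].add(method)

    corroborated = {}
    single_source = {}
    for finding, methods in support.items():
        target = corroborated if len(methods) >= min_methods else single_source
        target[finding] = sorted(methods)
    return corroborated, single_source


# Hypothetical coded findings from three rapid appraisal methods.
findings = {
    "key_informant_interviews": {"late_input_delivery", "low_market_prices"},
    "direct_observation": {"late_input_delivery"},
    "focus_group_interviews": {"late_input_delivery", "poor_road_access"},
}

corroborated, single_source = triangulate(findings)

for finding, methods in sorted(corroborated.items()):
    print(f"corroborated: {finding} ({', '.join(methods)})")
for finding, methods in sorted(single_source.items()):
    print(f"needs follow-up: {finding} ({', '.join(methods)} only)")
```

A finding that appears under only one method is not necessarily wrong; in this sketch it is simply flagged so the team can decide whether to probe it with a second method before reporting it.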
COMMON RAPID APPRAISAL METHODS

EVALUATION METHODS COMMONLY USED IN RAPID APPRAISAL
• Interviews
• Community Discussions
• Exit Polling
• Transect Walks
• Focus Groups
• Minisurveys
• Community Mapping
• Secondary Data Collection
• Group Discussions
• Customer Service Surveys
• Direct Observation

INTERVIEWS

This method involves one-on-one interviews with individuals or key informants selected for their knowledge or diverse views. Interviews are qualitative, in-depth, and semi-structured. Interview guides are usually used, and questions may be further framed during the interview, using subtle probing techniques. Individual interviews may be used to gain information on a general topic but cannot provide the in-depth inside knowledge on evaluation topics that key informants may provide.

MINISURVEYS

A minisurvey consists of interviews with between five and fifty individuals, usually selected using nonprobability sampling (sampling in which respondents are chosen based on their understanding of issues related to a purpose or specific questions, usually used when sample sizes are small and time or access to areas is limited). Structured questionnaires are used with a limited number of close-ended questions. Minisurveys generate quantitative data that can often be collected and analyzed quickly.

FOCUS GROUPS

The focus group is a gathering of a homogeneous body of five to twelve participants to discuss issues and experiences among themselves. Focus groups are used to test an idea or to get a reaction on specific topics. A moderator introduces the topic, stimulates and focuses the discussion, and prevents domination of the discussion by a few, while another evaluator documents the conversation.

THE ROLE OF TECHNOLOGY IN RAPID APPRAISAL

Certain equipment and technologies can aid the rapid collection of data and help to decrease the incidence of errors.
These include, for example, handheld computers or personal digital assistants (PDAs) for data input, cellular phones, digital recording devices for interviews, videotaping and photography, and the use of geographic information systems (GIS) data and aerial photographs.

GROUP DISCUSSIONS

This method involves the selection of approximately five participants who are knowledgeable about a given topic and are comfortable enough with one another to freely discuss the issue as a group. The moderator introduces the topic and keeps the discussion going while another evaluator records the discussion. Participants talk among themselves rather than respond directly to the moderator.

COMMUNITY DISCUSSIONS

This method takes place at a public meeting that is open to all community members; it can be successfully moderated with as many as 100 or more people. The primary interaction is between the participants while the moderator leads the discussion and asks questions following a carefully prepared interview guide.

DIRECT OBSERVATION

Teams of observers record what they hear and see at a program site using a detailed observation form. Observation may be of the physical surroundings or of ongoing activities, processes, or interactions.

COLLECTING SECONDARY DATA

This method involves the on-site collection of existing secondary data, such as export sales, loan information, health service statistics, etc. These data are an important augmentation to information collected using qualitative methods such as interviews, focus groups, and community discussions. The evaluator must be able to quickly determine the validity and reliability of the data (see TIPS 12: Indicator and Data Quality).

TRANSECT WALKS

The transect walk is a participatory approach in which the evaluator asks a selected community member to walk with him or her, for example, through the center of town, from one end of a village to the other, or through a market.
The evaluator asks the individual, usually a key informant, to point out and discuss important sites, neighborhoods, businesses, etc., and to discuss related issues.

COMMUNITY MAPPING

Community mapping is a technique that requires the participation of residents on a program site. It can be used to help locate natural resources, routes, service delivery points, regional markets, trouble spots, etc., on a map of the area, or to use residents’ feedback to drive the development of a map that includes such information.

Table 1. COMMON RAPID APPRAISAL METHODS

INDIVIDUAL INTERVIEWS

Method: Interviews
Useful for providing:
− A general overview of the topic from someone who has a broad knowledge and in-depth experience and understanding (key informant), or in-depth information on a very specific topic or subtopic (individual)
− Suggestions and recommendations to improve key aspects of a program
Example:
− Key informant: Interview with program implementation director; interview with director of a regional trade association
− Individual: Interview with an activity manager within an overall development program; interview with a local entrepreneur trying to enter export trade
Advantages:
− Provides in-depth, inside information on specific issues from the individual’s perspective and experience
− Flexibility permits exploring unanticipated topics
− Easy to administer
− Low cost
Limitations:
− Susceptible to interviewer and selection biases
− Individual interviews lack the broader understanding and insight that a key informant can provide
Further references:
TIPS No. 2, Conducting Key Informant Interviews
K. Kumar, Conducting Key Informant Surveys in Developing Countries, 1986
Bamberger, Rugh, and Mabry, RealWorld Evaluation, 2006
UNICEF Website: M&E Training Modules: Overview of RAP Techniques

Method: Minisurveys
Useful for providing:
− Quantitative data on narrowly focused questions, for a relatively homogeneous population, when representative sampling is not possible or required
− Quick data on attitudes, beliefs, behaviors of beneficiaries or partners
Example:
− A customer service assessment
− Rapid exit interviews after voting
Advantages:
− Quantitative data from multiple respondents
− Low cost
Limitations:
− Findings are less generalizable than those from sample surveys unless the universe of the population is surveyed
Further references:
TIPS No. 9, Conducting Customer Service Assessments
K. Kumar, Conducting Mini Surveys in Developing Countries, 1990
Bamberger, Rugh, and Mabry, RealWorld Evaluation, 2006, on purposeful sampling

GROUP INTERVIEWS

Method: Focus Groups
Useful for providing:
− Customer views on services, products, benefits
− Information on implementation problems
− Suggestions and recommendations for improving specific activities
Example:
− Discussion on experience related to a specific program intervention
− Effects of a new business regulation or proposed price changes
Advantages:
− Group discussion may reduce inhibitions, allowing free exchange of ideas
− Low cost
Limitations:
− Discussion may be dominated by a few individuals unless the process is facilitated/managed well
Further references:
TIPS No. 10, Conducting Focus Group Interviews
K. Kumar, Conducting Group Interviews in Developing Countries, 1987
T. Greenbaum, Moderating Focus Groups: A Practical Guide for Group Facilitation, 2000

Method: Group Discussions
Useful for providing:
− Understanding of issues from different perspectives and experiences of participants from a specific subpopulation
Example:
− Discussion with young women on access to prenatal and infant care
− Discussion with entrepreneurs about export regulations
Advantages:
− Small group size allows full participation
− Allows good understanding of specific topics
− Low cost
Limitations:
− Findings cannot be generalized to a larger population
Further references:
Bamberger, Rugh, and Mabry, RealWorld Evaluation, 2006
UNICEF Website: M&E Training Modules: Community Meetings

Method: Community Discussions
Useful for providing:
− Understanding of an issue or topic from a wide range of participants from key evaluation sites within a village, town, city, or city neighborhood
Example:
− A Town Hall meeting
Advantages:
− Yields a wide range of opinions on issues important to participants
− A great deal of information can be obtained at one point of time
Limitations:
− Findings cannot be generalized to larger population or to subpopulations of concern
− Larger groups difficult to moderate
Further references:
Bamberger, Rugh, and Mabry, RealWorld Evaluation, 2006
UNICEF Website: M&E Training Modules: Community Meetings

ADDITIONAL COMMONLY USED TECHNIQUES

Method: Direct Observation
Useful for providing:
− Visual data on physical infrastructure, supplies, conditions
− Information about an agency’s or business’s delivery systems, services
− Insights into behaviors or events
Example:
− Market place to observe goods being bought and sold, who is involved, sales interactions
Advantages:
− Confirms data from interviews
− Low cost
Limitations:
− Observer bias unless two to three evaluators observe same place or activity
Further references:
TIPS No. 4, Using Direct Observation Techniques
WFP Website: Monitoring & Evaluation Guidelines: What Is Direct Observation and When Should It Be Used?

Method: Collecting Secondary Data
Useful for providing:
− Validity to findings gathered from interviews and group discussions
Example:
− Microenterprise bank loan info.
− Value and volume of exports
− Number of people served by a health clinic, social service provider
Advantages:
− Quick, low cost way of obtaining important quantitative data
Limitations:
− Must be able to determine reliability and validity of data
Further references:
TIPS No. 12, Guidelines for Indicator and Data Quality

PARTICIPATORY TECHNIQUES

Method: Transect Walks
Useful for providing:
− Important visual and locational information and a deeper understanding of situations and issues
Example:
− Walk with key informant from one end of a village or urban neighborhood to another, through a market place, etc.
Advantages:
− Insider’s viewpoint
− Quick way to find out location of places of interest to the evaluator
− Low cost
Limitations:
− Susceptible to interviewer and selection biases
Further references:
Bamberger, Rugh, and Mabry, RealWorld Evaluation, 2006
UNICEF Website: M&E Training Modules: Overview of RAP Techniques

Method: Community Mapping
Useful for providing:
− Info. on locations important for data collection that could be difficult to find
− Quick comprehension on spatial location of services/resources in a region, which can give insight to access issues
Example:
− Map of village and surrounding area with locations of markets, water and fuel sources, conflict areas, etc.
Advantages:
− Important locational data when there are no detailed maps of the program site
Limitations:
− Rough locational information
Further references:
Bamberger, Rugh, and Mabry, RealWorld Evaluation, 2006
UNICEF Website: M&E Training Modules: Overview of RAP Techniques

References Cited

M. Bamberger, J. Rugh, and L. Mabry, RealWorld Evaluation: Working Under Budget, Time, Data, and Political Constraints. Sage Publications, Thousand Oaks, CA, 2006.
T. Greenbaum, Moderating Focus Groups: A Practical Guide for Group Facilitation. Sage Publications, Thousand Oaks, CA, 2000.
K. Kumar, “Conducting Mini Surveys in Developing Countries,” USAID Program Design and Evaluation Methodology Report No. 15, 1990 (revised 2006).
K. Kumar, “Conducting Group Interviews in Developing Countries,” USAID Program Design and Evaluation Methodology Report No. 8, 1987.
K. Kumar, “Conducting Key Informant Interviews in Developing Countries,” USAID Program Design and Evaluation Methodology Report No. 13, 1989.

For more information: TIPS publications are available online at [insert website].

Acknowledgements: Our thanks to those whose experience and insights helped shape this publication, including USAID’s Office of Management Policy, Budget and Performance (MPBP). This publication was authored by Patricia Vondal, Ph.D., of Management Systems International.

Comments regarding this publication can be directed to:
Gerald Britan, Ph.D.
Tel: (202) 712-1158
[email protected]

Contracted under RAN-M-00-04-00049-A-FY0S-84
Integrated Managing for Results II

PERFORMANCE MONITORING & EVALUATION TIPS
NUMBER 6, 2ND EDITION, 2010

SELECTING PERFORMANCE INDICATORS

ABOUT TIPS

These TIPS provide practical advice and suggestions to USAID managers on issues related to performance monitoring and evaluation. This publication is a supplemental reference to the Automated Directive System (ADS) Chapter 203.

WHAT ARE PERFORMANCE INDICATORS?

Performance indicators define a measure of change for the results identified in a Results Framework (RF). When well chosen, they convey whether key objectives are achieved in a meaningful way for performance management. While a result (such as an Assistance Objective or an Intermediate Result) identifies what we hope to accomplish, indicators tell us by what standard that result will be measured. Targets define whether there will be an expected increase or decrease, and by what magnitude.1

Indicators may be quantitative or qualitative in nature. Quantitative indicators are numerical: an example is a person’s height or weight. On the other hand, qualitative indicators require subjective evaluation. Qualitative data are sometimes reported in numerical form, but those numbers do not have arithmetic meaning on their own. Some examples are a score on an institutional capacity index or progress along a milestone scale.
When developing quantitative or qualitative indicators, the important point is that the indicator be constructed in a way that permits consistent measurement over time.

1 For further information, see TIPS 13: Building a Results Framework and TIPS 8: Baselines and Targets.

Selecting an optimal set of indicators to track progress against key results lies at the heart of an effective performance management system. This TIPS provides guidance on how to select effective performance indicators.

USAID has developed many performance indicators over the years. Some examples include the dollar value of nontraditional exports, private investment as a percentage of gross domestic product, contraceptive prevalence rates, child mortality rates, and progress on a legislative reform index.

WHY ARE PERFORMANCE INDICATORS IMPORTANT?

Performance indicators provide objective evidence that an intended change is occurring. Performance indicators lie at the heart of developing an effective performance management system: they define the data to be collected and enable actual results achieved to be compared with planned results over time. Hence, they are an indispensable management tool for making evidence-based decisions about program strategies and activities.

Performance indicators can also be used:
• To assist managers in focusing on the achievement of development results.
• To provide objective evidence that results are being achieved.
• To orient and motivate staff and partners toward achieving results.
• To communicate USAID achievements to host country counterparts, other partners, and customers.
• To more effectively report results achieved to USAID's stakeholders, including the U.S. Congress, Office of Management and Budget, and citizens.

FOR WHAT RESULTS ARE PERFORMANCE INDICATORS REQUIRED?

THE PROGRAM LEVEL

USAID’s ADS requires that at least one indicator be chosen for each result in the Results Framework in order to measure progress (see ADS 203.3.3.1)2. This includes the Assistance Objective (the highest-level objective in the Results Framework) as well as supporting Intermediate Results (IRs)3. These indicators should be included in the Mission or Office Performance Management Plan (PMP) (see TIPS 7: Preparing a PMP).

PROJECT LEVEL

AO teams are required to collect data regularly for projects and activities, including inputs, outputs, and processes, to ensure they are progressing as expected and are contributing to relevant IRs and AOs. These indicators should be included in a project-level monitoring and evaluation (M&E) plan. The M&E plan should be integrated in project management and reporting systems (e.g., quarterly, semiannual, or annual reports).

2 For further discussion of AOs and IRs (which are also termed impact and outcomes respectively in other systems) refer to TIPS 13: Building a Results Framework.

3 Note that some results frameworks incorporate IRs from other partners if those results are important for USAID to achieve the AO. This is discussed in further detail in TIPS 13: Building a Results Framework. If these IRs are included, then it is recommended that they be monitored, although less rigorous standards apply.

TYPES OF INDICATORS IN USAID SYSTEMS

Several different types of indicators are used in USAID systems. It is important to understand the different roles and functions of these indicators so that managers can construct a performance management system that effectively meets internal management and Agency reporting needs.

CUSTOM INDICATORS

Custom indicators are performance indicators that reflect progress within each unique country or program context. While they are useful for managers on the ground, they often cannot be aggregated across a number of programs as standard indicators can.
Example: Progress on a milestone scale reflecting legal reform and implementation to ensure credible elections, as follows:
• Draft law is developed in consultation with nongovernmental organizations (NGOs) and political parties.
• Public input is elicited.
• Draft law is modified based on feedback.
• The secretariat presents the draft to the Assembly.
• The law is passed by the Assembly.
• The appropriate government body completes internal policies or regulations to implement the law.

The example above would differ for each country depending on its unique process for legal reform.

PARTICIPATION IS ESSENTIAL

Experience suggests that participatory approaches are an essential aspect of developing and maintaining effective performance management systems. Collaboration with development partners (including host country institutions, civil society organizations (CSOs), and implementing partners) as well as customers has important benefits. It allows you to draw on the experience of others, obtain buy-in to achieving results and meeting targets, and ensure that systems are as streamlined and practical as possible.

STANDARD INDICATORS

Standard indicators are used primarily for Agency reporting purposes. Standard indicators produce data that can be aggregated across many programs. Optimally, standard indicators meet both Agency reporting and on-the-ground management needs. However, in many cases, standard indicators do not substitute for performance (or custom) indicators because they are designed to meet different needs. There is often a tension between measuring a standard across many programs and selecting indicators that best reflect true program results and that can be used for internal management purposes.

Example: Number of Laws or Amendments to Ensure Credible Elections Adopted with USG Technical Assistance.
In comparing the standard indicator above with the previous example of a custom indicator, it becomes clear that the custom indicator is more likely to be useful as a management tool, because it provides greater specificity and is more sensitive to change. Standard indicators also tend to measure change at the output level, because they are precisely the types of measures that are, at face value, more easily aggregated across many programs, as the following example demonstrates.

Example: The number of people trained in policy and regulatory practices.

CONTEXTUAL INDICATORS

Contextual indicators are used to understand the broader environment in which a program operates, to track assumptions, or to examine externalities that may affect success, failure, or progress. They do not represent program performance, because the indicator measures very high-level change.

Example: Score on the Freedom House Index or Gross Domestic Product (GDP).

This sort of indicator may be important to track to understand the context for USAID programming (e.g., a severe drop in GDP is likely to affect economic growth programming), but represents a level of change that is outside the manageable interest of program managers. In most cases, it would be difficult to say that USAID programming has affected the overall level of freedom within a country or GDP (given the size of most USAID programs in comparison to the host country economy, for example).

INDICATORS AND DATA: SO WHAT’S THE DIFFERENCE?

Indicators define the particular characteristic or dimension that will be used to measure change. Height is an example of an indicator. The data are the actual measurements or factual information that result from the indicator. Five feet seven inches is an example of data.

WHAT ARE USAID’S CRITERIA FOR SELECTING INDICATORS?
USAID policies (ADS 203.3.4.2) identify seven key criteria to guide the selection of performance indicators:
• Direct
• Objective
• Useful for Management
• Attributable
• Practical
• Adequate
• Disaggregated, as necessary

These criteria are designed to assist managers in selecting optimal indicators. The extent to which performance indicators meet each of the criteria must be consistent with the requirements of good management. As managers consider these criteria, they should use a healthy measure of common sense and reasonableness. While we always want the “best” indicators, there are inevitably trade-offs among various criteria. For example, data for the most direct or objective indicators of a given result might be very expensive to collect or might be available too infrequently. Table 1 includes a summary checklist that can be used during the selection process to assess these trade-offs.

Two overarching factors determine the extent to which performance indicators function as useful tools for managers and decision-makers:
• The degree to which performance indicators accurately reflect the process or phenomenon they are being used to measure.
• The level of comparability of performance indicators over time: that is, can we measure results in a consistent and comparable manner over time?

1. DIRECT

An indicator is direct to the extent that it clearly measures the intended result. This criterion is, in many ways, the most important. Although the concept may appear simple, a lack of directness is one of the more common problems with indicators. Indicators should either be widely accepted for use by specialists in a subject area, exhibit readily understandable face validity (i.e., be intuitively understandable), or be supported by research. Managers should place greater confidence in indicators that are direct.
Consider the following example:

Result: Increased Transparency of Key Public Sector Institutions
Indirect Indicator: Passage of the Freedom of Information Act (FOIA)
Direct Indicator: Progress on a milestone scale demonstrating enactment and enforcement of policies that require open hearings

The passage of FOIA, while an important step, does not actually measure whether a target institution is more transparent. The better example outlined above is a more direct measure.

Level

Another dimension of whether an indicator is direct relates to whether it measures the right level of the objective. A common problem is that there is often a mismatch between the stated result and the indicator. The indicator should not measure a higher or lower level than the result. For example, if a program measures improved management practices through the real value of agricultural production, the indicator is measuring a higher-level effect than is stated (see Figure 1).

Figure 1. Levels
RESULT: Increased Production
INDICATOR: Real value of agricultural production.
RESULT: Improved Management Practices
INDICATOR: Number and percent of farmers using a new technology.
RESULT: Improved Knowledge and Awareness
INDICATOR: Number and percent of farmers who can identify five out of eight steps for implementing a new technology.

Understanding levels is rooted in understanding the development hypothesis inherent in the Results Framework (see TIPS 13: Building a Results Framework). Tracking indicators at each level facilitates better understanding and analysis of whether the development hypothesis is working. For example, if farmers are aware of how to implement a new technology, but the number or percent that actually use the technology is not increasing, there may be other issues that need to be addressed. Perhaps the technology is not readily available in the community, or there is not enough access to credit. This flags the issue for managers and provides an opportunity to make programmatic adjustments.
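As a rough illustration of tracking indicators at adjacent levels, the sketch below computes "number and percent" indicators for awareness and for use of a new technology, then reports the gap between them. The farmer counts, period labels, and function names are hypothetical and are not drawn from any USAID data set or system.

```python
def number_and_percent(count, total):
    """Return (count, percent of total), the 'number and percent' indicator form."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count, round(100.0 * count / total, 1)


def adoption_gap(aware, using, total):
    """Percentage-point gap between farmers who know a technology and those using it."""
    _, pct_aware = number_and_percent(aware, total)
    _, pct_using = number_and_percent(using, total)
    return round(pct_aware - pct_using, 1)


# Hypothetical monitoring data for 400 farmers over two reporting periods.
periods = [
    {"period": "Year 1", "total": 400, "aware": 120, "using": 60},
    {"period": "Year 2", "total": 400, "aware": 300, "using": 70},
]

for p in periods:
    gap = adoption_gap(p["aware"], p["using"], p["total"])
    # A widening gap suggests awareness is rising without a matching rise in use.
    print(p["period"], number_and_percent(p["using"], p["total"]), gap)
```

In this invented data, awareness rises sharply while use barely moves, which is the pattern the text describes as a signal to look for other constraints such as availability or credit.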
Proxy Indicators

Proxy indicators are linked to the result by one or more assumptions. They are often used when the most direct indicator is not practical (e.g., data collection is too costly or the program is being implemented in a conflict zone). When proxies are used, the relationship between the indicator and the result should be well understood and clearly articulated. The more assumptions the indicator is based upon, the weaker the indicator. Consider the following examples:

Result: Increased Household Income
Proxy Indicator: Dollar value of household expenditures

The proxy indicator above makes the assumption that an increase in income will result in increased household expenditures; this assumption is well grounded in research.

Result: Increased Access to Justice
Proxy Indicator: Number of new courts opened

The indicator above is based on the assumption that physical access to new courts is the fundamental development problem, as opposed to corruption, the costs associated with using the court system, or lack of knowledge of how to obtain legal assistance and/or use court systems. Proxies can be used when assumptions are clear and when there is research to support those assumptions.

2. OBJECTIVE

An indicator is objective if it is unambiguous about 1) what is being measured and 2) what data are being collected. In other words, two people should be able to collect performance information for the same indicator and come to the same conclusion. Objectivity is critical to collecting comparable data over time, yet it is one of the most common problems noted in audits.
As a result, pay particular attention to the definition of the indicator to ensure that each term is clearly defined, as the following examples demonstrate:

Poor Indicator: Number of successful firms
Objective Indicator: Number of firms with an annual increase in revenues of at least 5%

The better example outlines the exact criteria for how “successful” is defined and ensures that changes in the data are not attributable to differences in what is being counted.

Objectivity can be particularly challenging when constructing qualitative indicators. Good qualitative indicators permit regular, systematic judgment about progress and reduce subjectivity (to the extent possible). This means that there must be clear criteria or protocols for data collection.

3. USEFUL FOR MANAGEMENT

An indicator is useful to the extent that it provides a meaningful measure of change over time for management decision-making. One aspect of usefulness is to ensure that the indicator is measuring the “right change” in order to achieve development results. For example, the number of meetings between Civil Society Organizations (CSOs) and government is something that can be counted but does not necessarily reflect meaningful change. By selecting indicators, managers are defining program success in concrete ways. Managers will focus on achieving targets for those indicators, so it is important to consider the intended and unintended incentives that performance indicators create. As a result, the system may need to be fine-tuned to ensure that incentives are focused on achieving true results.

A second dimension is whether the indicator measures a rate of change that is useful for management purposes. This means that the indicator is constructed so that change can be monitored at a rate that facilitates management actions (such as corrections and improvements).
Consider the following examples:

Result: Targeted legal reform to promote investment
Less Useful for Management: Number of laws passed to promote direct investment.
More Useful for Management: Progress toward targeted legal reform based on the following stages:
Stage 1. Interested groups propose that legislation is needed on issue.
Stage 2. Issue is introduced in the relevant legislative committee/executive ministry.
Stage 3. Legislation is drafted by relevant committee or executive ministry.
Stage 4. Legislation is debated by the legislature.
Stage 5. Legislation is passed by full approval process needed in legislature.
Stage 6. Legislation is approved by the executive branch (where necessary).
Stage 7. Implementing actions are taken.
Stage 8. No immediate need identified for amendments to the law.

The less useful example may be useful for reporting; however, it is so general that it does not provide a good way to track progress for performance management. The process of passing or implementing laws is a long-term one, so that over the course of a year or two the AO team may only be able to report that one or two such laws have passed when, in reality, a high degree of effort is invested in the process. In this case, the more useful example better articulates the important steps that must occur for a law to be passed and implemented and facilitates management decision-making. If there is a problem in meeting interim milestones, then corrections can be made along the way.

4. ATTRIBUTABLE

An indicator is attributable if it can be plausibly associated with USAID interventions. The concept of “plausible association” has been used in USAID for some time. It does not mean that X input equals Y output. Rather, it is based on the idea that a case can be made to other development practitioners that the program has materially affected the identified change. It is important to consider the logic behind what is proposed to ensure attribution.
If a Mission is piloting a project in three schools, but claims national level impact in school completion, this would not pass the common sense test. Consider the following examples:

Result: Improved Budgeting Capacity
Less Attributable: Budget allocation for the Ministry of Justice (MOJ)
More Attributable: The extent to which the budget produced by the MOJ meets established criteria for good budgeting

If the program works with the Ministry of Justice to improve budgeting capacity (by providing technical assistance on budget analysis), the quality of the budget submitted by the MOJ may improve. However, it is often difficult to attribute changes in the overall budget allocation to USAID interventions, because there are a number of externalities that affect a country’s final budget, much like in the U.S. For example, in tough economic times, the budget for all government institutions may decrease. A crisis may emerge that requires the host country to reallocate resources. The better example above is more attributable (and directly linked) to USAID’s intervention.

5. PRACTICAL

A practical indicator is one for which data can be collected on a timely basis and at a reasonable cost. There are two dimensions that determine whether an indicator is practical. The first is time and the second is cost.

Time

First, consider whether resulting data are available with enough frequency for management purposes (i.e., timely enough to correspond to USAID performance management and reporting purposes). Second, examine whether data are current when available. If reliable data are available each year, but the data are a year old, then this may be problematic.

Cost

Performance indicators should provide data to managers at a cost that is reasonable and appropriate as compared with the management utility of the data. As a very general rule of thumb, it is suggested that between 5% and 10% of program or project resources be allocated for monitoring and evaluation (M&E) purposes.
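The rule of thumb above amounts to simple arithmetic, sketched here for a hypothetical program budget. The function names, the dollar figures, and the treatment of the 5% and 10% bounds as inclusive are assumptions made for illustration only.

```python
def me_budget_range(program_budget, low_pct=5, high_pct=10):
    """Return the (low, high) M&E allocation implied by the rule-of-thumb percentages."""
    if program_budget < 0:
        raise ValueError("program_budget must be non-negative")
    return program_budget * low_pct / 100, program_budget * high_pct / 100


def within_rule_of_thumb(planned_me, program_budget):
    """True if a planned M&E amount falls inside the rule-of-thumb range (bounds inclusive)."""
    low, high = me_budget_range(program_budget)
    return low <= planned_me <= high


# Hypothetical program: $2,000,000 total budget, $150,000 planned for M&E.
print(me_budget_range(2_000_000))
print(within_rule_of_thumb(150_000, 2_000_000))
```

A result outside the range is only a prompt for discussion, since the appropriate share depends on priorities and program context.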
However, it is also important to consider priorities and program context. A program would likely be willing to invest more resources in measuring changes that are central to decision-making and fewer resources in measuring more tangential results. A more mature program may have to invest more in demonstrating higher-level changes or impacts as compared to a new program.

6. ADEQUATE

Taken as a group, the indicator (or set of indicators) should be sufficient to measure the stated result. In other words, they should be the minimum number necessary and cost-effective for performance management. The number of indicators required to adequately measure a result depends on 1) the complexity of the result being measured, 2) the amount of information needed to make reasonably confident decisions, and 3) the level of resources available. Too many indicators create information overload and become overly burdensome to maintain. Too few indicators are also problematic, because the data may only provide a partial or misleading picture of performance. The following demonstrates how one indicator can be adequate to measure the stated objective:

Result: Increased Traditional Exports in Targeted Sectors
Adequate Indicator: Value of traditional exports in targeted sectors

In contrast, an objective focusing on improved maternal health may require two or three indicators to be adequate. A general rule of thumb is to select between two and three performance indicators per result. If many more indicators are needed to adequately cover the result, then it may signify that the objective is not properly focused.

7. DISAGGREGATED, AS NECESSARY

The disaggregation of data by gender, age, location, or some other dimension is often important from both a management and reporting point of view. Development programs often affect population cohorts or institutions in different ways.
For example, it might be important to know to what extent youth (up to age 25) or adults (25 and older) are participating in vocational training, or in which districts schools have improved. Disaggregated data help track whether or not specific groups participate in and benefit from activities intended to include them. In particular, USAID policies (ADS 203.3.4.3) require that performance management systems and evaluations at the AO and project or activity levels include gender-sensitive indicators and sex-disaggregated data if the activities or their anticipated results involve or affect women and men differently. If so, this difference would be an important factor in managing for sustainable program impact. Consider the following example:

Result: Increased Access to Credit
Gender-Sensitive Indicator: Value of loans disbursed, disaggregated by male/female.

WHAT IS THE PROCESS FOR SELECTING PERFORMANCE INDICATORS?

Selecting appropriate and useful performance indicators requires careful thought, iterative refining, collaboration, and consensus-building. The following describes a series of steps to select optimal performance indicators [4]. Although presented as discrete steps, in practice some of these can be effectively undertaken simultaneously or in a more iterative manner. These steps may be applied as a part of a larger process to develop a new PMP, or in part, when teams have to modify individual indicators.

[4] This process focuses on presenting greater detail related specifically to indicator selection. Refer to TIPS 7: Preparing a PMP for a broader set of steps on how to develop a full PMP.

STEP 1. DEVELOP A PARTICIPATORY PROCESS FOR IDENTIFYING PERFORMANCE INDICATORS

The most effective way to identify indicators is to set up a process that elicits the participation and feedback of a number of partners and stakeholders. This allows managers to:

• Draw on different areas of expertise.
• Ensure that indicators measure the right changes and represent part of a larger approach to achieve development impact.
• Build commitment and understanding of the linkage between indicators and results. This will increase the utility of the performance management system among key stakeholders.
• Build capacity for performance management among partners, such as NGOs and partner country institutions.
• Ensure that systems are as practical and streamlined as possible. Often development partners can provide excellent insight on the practical issues associated with indicators and data collection.

A common way to begin the process is to hold working sessions. Start by reviewing the Results Framework. Next, identify indicators for the Assistance Objective, then move down to the Intermediate Results. In some cases, the AO team establishes the first round of indicators and then provides them to other partners for input. In other cases, key partners may be included in the working sessions.

It is important to task the group with identifying the set of minimal indicators necessary and sufficient to manage the program effectively. That is, the group must go through a process of prioritization in order to narrow down the list. While participatory processes may take more time at the front end, they almost always result in a more coherent and effective system.

STEP 2. CLARIFY THE RESULT

Carefully define the result desired. Good performance indicators are based on clearly articulated and focused objectives. Review the precise wording and intention of the objective. Determine what exactly is meant by the result. For example, if the result is “improved business environment,” what does that mean? What specific aspects of the business environment will be improved? Optimally, the result should be stated with as much specificity as possible.
If the result is broad (and the team doesn’t have the latitude to change the objective), then the team might further define its meaning.

Example: One AO team further defined their IR, “Improved Business Environment,” as follows: Making it easier to do business in terms of resolving disputes, obtaining licenses from the government, and promoting investment. An identified set of key policies are in place to support investment. Key policies include laws, regulations, and policies related to the simplification of investment procedures, bankruptcy, and starting a business.

As the team gains greater clarity and consensus on what results are sought, ideas for potential indicators begin to emerge.

Be clear about what type of change is implied. What is expected to change: a situation, a condition, the level of knowledge, an attitude, or a behavior? For example, changing a country's voting law(s) is very different from changing citizens' awareness of their right to vote (which is different from voting). Each type of change is measured by different types of performance indicators.

Identify more precisely the specific targets for change. Who or what are the specific targets for the change? For example, if individuals, which individuals? For an economic growth program designed to increase exports, does the program target all exporters or only exporters of non-traditional agricultural products? This is known as identifying the “unit of analysis” for the performance indicator.

STEP 3. IDENTIFY POSSIBLE INDICATORS

Usually there are many possible indicators for a particular result, but some are more appropriate and useful than others. In selecting indicators, don’t settle too quickly on the first ideas that come most conveniently or obviously to mind. Create an initial list of possible indicators, using the following approaches:

• Conduct a brainstorming session with colleagues to draw upon the expertise of the full Assistance Objective Team. Ask, “How will we know if the result is achieved?”
• Consider other resources. Many organizations have databases or indicator lists for various sectors available on the internet.
• Consult with technical experts.
• Review the PMPs and indicators of previous programs or similar programs in other Missions.

STEP 4. ASSESS THE BEST CANDIDATE INDICATORS, USING THE INDICATOR CRITERIA

Next, from the initial list, select the best candidates as indicators. The seven basic criteria for judging an indicator’s appropriateness and utility, described in the previous section, are summarized in Table 1. When assessing and comparing possible indicators, it is helpful to use this type of checklist to guide the assessment process. Remember that there will be trade-offs between the criteria. For example, the optimal indicator may not be the most cost-effective to select.

STEP 5. SELECT THE “BEST” PERFORMANCE INDICATORS

Select the best indicators to incorporate in the performance management system. They should be the optimum set of measures that are useful to management and can be obtained at reasonable cost.

Be Strategic and Streamline Where Possible. In recent years, there has been a substantial increase in the number of indicators used to monitor and track programs. It is important to remember that there are costs, in terms of time and money, to collect data for each indicator. AO teams should:

• Select indicators based on strategic thinking about what must truly be achieved for program success.
• Review indicators to determine whether any final narrowing can be done. Are some indicators not useful? If so, discard them.
• Use participatory approaches in order to discuss and establish priorities that help managers focus on key indicators that are necessary and sufficient.
• Ensure that the rationale for indicator selection is recorded in the PMP.
There are rarely perfect indicators in the development environment; it is more often a case of weighing different criteria and making the optimal choices for a particular program. It is important to ensure that the rationale behind these choices is recorded in the PMP so that new staff, implementers, or auditors understand why each indicator was selected.

STEP 6. FINE TUNE WHEN NECESSARY

Indicators are part of a larger system that is ultimately designed to assist managers in achieving development impact. On the one hand, indicators must remain comparable over time but, on the other hand, some refinements will invariably be needed to ensure the system is as effective as possible. (Of course, there is no value in continuing to collect bad data, for example.) As a result, these two issues need to be balanced. Remember that indicator issues are often flags for other underlying problems. If a large number of indicators are frequently changed, this may signify a problem with program management or focus. At the other end of the continuum, if no indicators were to change over a long period of time, it is possible that a program is not adapting and evolving as necessary. In our experience, some refinements are inevitable as data are collected and lessons learned. After some rounds of data collection are completed, it is often useful to discuss indicator issues and refinements among AO team members and/or with partners and implementers. In particular, the period following portfolio reviews is a good time to refine PMPs if necessary.

TABLE 1. INDICATOR SELECTION CRITERIA CHECKLIST
(Columns: Criteria, Definition, Checklist, Comments)

1. Direct
   Direct. The indicator clearly represents the intended result. An outsider or an expert in the field would agree that the indicator is a logical measure for the stated result.
   Level. The indicator reflects the right level; that is, it does not measure a higher or lower level than the stated result.
   Proxies. The indicator is a proxy measure. If the indicator is a proxy, note what assumptions the proxy is based upon.
2. Objective
   The indicator is clear and unambiguous about what is being measured.
3. Useful for Management
   The indicator is useful for management decision-making.
4. Attributable
   The indicator can be plausibly associated with USAID interventions.
5. Practical
   Time. Data are produced with enough frequency for management purposes (i.e., timely enough to correspond to USAID performance management and reporting purposes). Data are current when available.
   Cost. Data are worth the cost to USAID managers.
6. Adequate
   The indicators, taken as a group, are sufficient to measure the stated result. All major aspects of the result are measured.
7. Disaggregated, as necessary
   The indicators are appropriately disaggregated by gender, age, location, or some other dimension that is important for programming. In particular, gender disaggregation has been considered as required (see ADS 203.3.4.3).

For more information: TIPS publications are available online at [insert website].

Acknowledgements: Our thanks to those whose experience and insights helped shape this publication, including Gerry Britan and Subhi Mehdi of USAID’s Office of Management Policy, Budget and Performance (MPBP). This publication was updated by Michelle Adams-Matson of Management Systems International.

Comments can be directed to: Gerald Britan, Ph.D. Tel: (202) 712-1158 [email protected]

Contracted under RAN-M-00-04-00049-A-FY0S-84 Integrated Managing for Results II

1996, Number 7
Performance Monitoring and Evaluation TIPS
USAID Center for Development Information and Evaluation

PREPARING A PERFORMANCE MONITORING PLAN

What Is a Performance Monitoring Plan?

A performance monitoring plan (PMP) is a tool USAID operating units use to plan and manage the collection of performance data. Sometimes the plan also includes plans for data analysis, reporting, and use.
USAID's reengineering guidance requires operating units to prepare a Performance Monitoring Plan for the systematic and timely collection of performance data. This Tips offers advice for preparing such a plan.

Reengineering guidance requires operating units to prepare PMPs once their strategic plans are approved. At a minimum, PMPs should include:

• a detailed definition of each performance indicator
• the source, method, frequency and schedule of data collection, and
• the office, team, or individual responsible for ensuring data are available on schedule

As part of the PMP process, it is also advisable (but not mandated) for operating units to plan for:

• how the performance data will be analyzed, and
• how it will be reported, reviewed, and used to inform decisions

While PMPs are required, they are for the operating unit's own use. Review by central or regional bureaus is not mandated, although some bureaus encourage sharing PMPs. PMPs should be updated as needed to ensure plans, schedules, and assignments remain current.

Why Are PMPs Important?

A performance monitoring plan is a critical tool for planning, managing, and documenting data collection. It contributes to the effectiveness of the performance monitoring system by assuring that comparable data will be collected on a regular and timely basis. These are essential to the operation of a credible and useful performance-based management approach.

PMPs promote the collection of comparable data by sufficiently documenting indicator definitions, sources, and methods of data collection. This enables operating units to collect comparable data over time even when key personnel change.

PMPs support timely collection of data by documenting the frequency and schedule of data collection as well as by assigning responsibilities.

Operating units should also consider developing plans for data analysis, reporting, and review efforts as part of the PMP process.
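The minimum PMP contents listed above can be pictured as one structured record per indicator. The following is a minimal sketch in Python, not a USAID schema: the class name, the field names, and every example value are assumptions made for illustration. It simply flags which required elements of an entry are still blank.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class IndicatorPlan:
    """One indicator's entry in a PMP (field names are illustrative only)."""
    indicator: str
    definition: str
    data_source: str
    collection_method: str
    frequency: str
    schedule: str
    responsible: str

    def missing_elements(self) -> List[str]:
        """Return the names of required elements that were left blank."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]


# Hypothetical entry; every value below is invented for illustration.
plan = IndicatorPlan(
    indicator="Number of small enterprises receiving loans",
    definition="Enterprises with 20 or fewer employees",
    data_source="Central bank lending records",
    collection_method="Secondary data, annual extract",
    frequency="Annual",
    schedule="Each January",
    responsible="",  # office, team, or individual not yet assigned
)
print(plan.missing_elements())  # ['responsible']
```

Keeping each element as a separate field makes gaps visible early, which is the point of documenting definitions, sources, and responsibilities before data collection starts.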
It makes sense to think through data collection, analysis, reporting, and review as an integrated process. This will help keep the performance monitoring system on track and ensure performance data informs decision-making. While there are strong arguments for including such integrated plans in the PMP document, this is not mandated in the reengineering guidance. Some operating units may wish to prepare these plans separately.

PN-ABY-215

Use a Participatory Approach

The Agency's reengineering directives require that operating units involve USAID's partners, customers, and stakeholders in planning approaches to monitoring performance. Experience indicates the value of collaborating with relevant host government officials, implementing agency staff, contractors and grantees, other donors, and customer groups, when preparing PMPs. They typically have the most familiarity with the quality, availability,

Elements of a PMP

The following elements should be considered for inclusion in a performance monitoring plan. Elements 1-5 are required in the reengineering guidance, whereas elements 6-9 are suggested as useful practices.

I. Plans for Data Collection (Required)

In its strategic plan, an operating unit will have identified a few preliminary performance indicators for each of its strategic objectives, strategic support objectives, and special objectives (referred to below simply as SOs), and USAID-supported intermediate results (IRs). In most cases, preliminary baselines and targets will also have been provided in the strategic plan. The PMP builds on this initial information, verifying or modifying the performance indicators, baselines and targets, and documenting decisions.

PMPs are required to include information outlined below (elements 1-5) on each performance indicator that has been identified in the Strategic Plan for SOs and IRs.
Plans should also address how critical assumptions and results supported by partners (such as the host government, other donors, NGOs) will be monitored, although the same standards and requirements for developing indicators and collecting data do not apply. Furthermore, it is useful to include in the PMP lower-level indicators of inputs, outputs, and processes at the activity level, and how they will be monitored and linked to IRs and SOs.

1. Performance Indicators and Their Definitions

Each performance indicator needs a detailed definition. Be precise about all technical elements of the indicator statement. As an illustration, consider the indicator "number of small enterprises receiving loans from the private banking system." How are small enterprises defined -- all enterprises with 20 or fewer employees, or 50 or 100? What types of institutions are considered part of the private banking sector -- credit unions, government-private sector joint-venture financial institutions?

Include in the definition the unit of measurement. For example, an indicator on the value of exports might be otherwise well defined, but it is also important to know whether the value will be measured in current or constant terms and in U.S. dollars or local currency.

The definition should be detailed enough to ensure that different people at different times, given the task of collecting data for a given indicator, would collect identical types of data.

2. Data Source

Identify the data source for each performance indicator. The source is the entity from which the data are obtained, usually the organization that conducts the data collection effort. Data sources may include government departments, international organizations, other donors, NGOs, private firms, USAID offices, contractors, or activity implementing agencies. Be as specific about the source as possible, so the same source can be used routinely.
Switching data sources for the same indicator over time can lead to inconsistencies and misinterpretations and should be avoided. For example, switching from estimates of infant mortality rates based on national sample surveys to estimates based on hospital registration statistics can lead to false impressions of change.

Plans may refer to needs and means for strengthening the capacity of a particular data source to collect needed data on a regular basis, or for building special data collection efforts into USAID activities.

3. Method of Data Collection

Specify the method or approach to data collection for each indicator. Note whether it is primary data collection or is based on existing secondary data. For primary data collection, consider:

• the unit of analysis (individuals, families, communities, clinics, wells)
• data disaggregation needs (by gender, age, ethnic groups, location)
• sampling techniques for selecting cases (random sampling, purposive sampling); and
• techniques or instruments for acquiring data on these selected cases (structured questionnaires, direct observation forms, scales to weigh infants)

For indicators based on secondary data, give the method of calculating the specific indicator data point and the sources of data. Note issues of data quality and reliability. For example, using secondary data from existing sources cuts costs and effort, but the data may be less reliable.

Provide sufficient detail on the data collection or calculation method to enable it to be replicated.

4. Frequency and Schedule of Data Collection

Performance monitoring systems must gather comparable data periodically to measure progress. But depending on the performance indicator, it may make sense to collect data on a quarterly, annual, or less frequent basis.
For example, because of the expense and because changes are slow, fertility rate data from sample surveys may only be collected every few years whereas data on contraceptive distributions and sales from clinics' record systems may be gathered quarterly. PMPs can also usefully provide the schedules (dates) for data collection efforts. When planning the frequency and scheduling of data collection, an important factor to consider is management's needs for timely information for decision-making.

5. Responsibilities for Acquiring Data

For each performance indicator, the responsibility within the operating unit for the timely acquisition of data from their source should be clearly assigned to a particular office, team, or individual.

II. Plans for Data Analysis, Reporting, Review, and Use

An effective performance monitoring system needs to plan not only for the collection of data, but also for data analysis, reporting, review, and use. It may not be possible to include everything in one document at one time, but units should take the time early on for careful planning of all these aspects in an integrated fashion.

6. Data Analysis Plans

To the extent possible, plan in advance how performance data for individual indicators or groups of related indicators will be analyzed. Identify data analysis techniques and data presentation formats to be used. Consider if and how the following aspects of data analysis will be undertaken:

Comparing disaggregated data. For indicators with disaggregated data, plan how it will be compared, displayed, and analyzed.

Comparing current performance against multiple criteria. For each indicator, plan how actual performance data will be compared with a) past performance, b) planned or targeted performance or c) other relevant benchmarks.

Analyzing relationships among performance indicators. Plan how internal analyses of the performance data will examine interrelationships.
For example: How will a set of indicators (if there is more than one) for a particular SO or IR be analyzed to reveal progress? What if only some of the indicators reveal progress? How will cause-effect relationships among SOs and IRs within a results framework be analyzed? How will USAID activities be linked to achieving IRs and SOs?

Analyzing cost-effectiveness. When practical and feasible, plan for using performance data to compare systematically alternative program approaches in terms of costs as well as results. The Government Performance and Results Act (GPRA) encourages this.

7. Plans for Complementary Evaluations

Reengineering stresses that evaluations should be conducted only if there is a clear management need. It may not always be possible or desirable to predict years in advance when or why they will be needed. Nevertheless, operating units may find it useful to plan on a regular basis what evaluation efforts are needed to complement information from the performance monitoring system. The operating unit's internal performance reviews, to be held periodically during the year, may be a good time for such evaluation planning. For example, if the reviews reveal that certain performance targets are not being met, and if the reasons why are unclear, then planning evaluations to investigate why would be in order.

8. Plans for Communicating and Using Performance Information

Planning how performance information will be reported, reviewed, and used is critical for effective managing for results. For example, plan, schedule, and assign responsibilities for internal and external reviews, briefings, and reports. Clarify what, how and when management decisions will consider performance information. Specifically, plan for the following:

Operating unit performance reviews. Reengineering guidance requires operating units to conduct internal reviews of performance information at regular intervals during the year to assess progress toward achieving SOs and IRs.
In addition, activity-level reviews should be planned regularly by SO teams to assess if activities' inputs, outputs, and processes are supporting achievement of IRs and SOs. USAID/Washington reviews and the R4 Report. Reengineering requires operating units to prepare and submit to USAID/Washington an annual Results Review and Resource Request (R4) report, which is the basis for a joint review with USAID/W of performance and resource requirements. Help plan R4 preparation by scheduling tasks and making assignments. External reviews, reports, and briefings. Plan for reporting and disseminating performance information to key external audiences, such as host government counterparts, collaborating NGOs, other partners, donors, customer groups, and stakeholders. Communication techniques may include reports, oral briefings, videotapes, memos, newspaper articles. Influencing management decisions. The ultimate aim of performance monitoring systems is to promote performance-based decision-making. To the extent possible, plan in advance what management decisionmaking processes should be influenced by performance information. For example, budget discussions, programming decisions, evaluation designs/scopes of work, office retreats, management contracts, and personnel appraisals often benefit from the consideration of performance information. 9. Budget Estimate roughly the costs to the operating unit of collecting, analyzing, and reporting performance data for a specific indicator (or set of related indicators). Identify the source of funds. If adequate data are already available from secondary sources, costs may be minimal. If primary data must be collected at the operating unit's expense, costs can vary depending on scope, method, and frequency of data collection. Sample surveys may cost more than $100,000, whereas rapid appraisal methods can be conducted for much less. 
However, often these low-cost methods do not provide quantitative data that are sufficiently reliable or representative. Reengineering guidance gives a range of 3 to 10 percent of the total budget for an SO as a reasonable level to spend on performance monitoring and evaluation. CDIE's Tips series provides advice and suggestions to USAID managers on how to plan and conduct performance monitoring and evaluation activities effectively. They are supplemental references to the reengineering automated directives system (ADS), chapter 203. For further information, contact Annette Binnendijk, CDIE Senior Evaluation Advisor, via phone (703) 875-4235, fax (703) 875-4866, or email. Copies of TIPS can be ordered from the Development Information Services Clearinghouse by calling (703) 351-4006 or by faxing (703) 351-4039. Please refer to the PN number. To order via Internet, address requests to [email protected] NUMBER 8 2ND EDITION, 2010 PERFORMANCE MONITORING & EVALUATION TIPS BASELINES AND TARGETS ABOUT TIPS These TIPS provide practical advice and suggestions to USAID managers on issues related to performance monitoring and evaluation. This publication is a supplemental reference to the Automated Directive System (ADS) Chapter 203. INTRODUCTION The achievement of planned results is at the heart of USAID’s performance management system. In order to understand where we, as project managers, are going, we need to understand where we have been. Establishing quality baselines and setting ambitious, yet achievable, targets are essential for the successful management of foreign assistance programs. WHAT ARE BASELINES AND TARGETS? A baseline is the value of a performance indicator before the implementation of projects or activities, while a target is the specific, planned level of result to be achieved within an explicit timeframe (see ADS 203.3.4.5). Targets are set for indicators at the Assistance Objective (AO), Intermediate Result (IR), and output levels. 
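One way to relate a baseline, a target, and a reported value is to express progress as the share of the planned change achieved so far. The sketch below is an assumption, not a formula prescribed by this TIPS or by ADS 203; the function name and all figures are hypothetical. It works for indicators that are meant to rise and for those meant to fall.

```python
def share_of_target_achieved(baseline: float, target: float, actual: float) -> float:
    """Fraction of the planned change (baseline -> target) achieved so far.

    Illustrative convention only: 0.0 means no movement from the baseline,
    1.0 means the target was met, and values above 1.0 mean it was exceeded.
    """
    if target == baseline:
        raise ValueError("target must differ from baseline")
    return (actual - baseline) / (target - baseline)


# Hypothetical decreasing indicator: drop-out rate, baseline 20%, target 10%,
# latest reported value 16%.
print(round(share_of_target_achieved(20, 10, 16), 2))  # 0.4

# Hypothetical zero-baseline output: teachers trained, target 500, 125 so far.
print(round(share_of_target_achieved(0, 500, 125), 2))  # 0.25
```

A ratio like this is only a flag for managers; it says nothing about why progress is ahead of or behind plan.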
WHY ARE BASELINES IMPORTANT?

Baselines help managers determine progress in achieving outputs and outcomes. They also help identify the extent to which change has happened at each level of result. USAID ADS 203.3.3 requires a PMP for each AO. Program managers should provide baseline and target values for every indicator in the PMP.

Lack of baseline data not only presents challenges for management decision-making purposes, but also hinders evaluation efforts. For example, it is generally not possible to conduct a rigorous impact evaluation without solid baseline data (see TIPS 19: Rigorous Impact Evaluation).

ESTABLISHING THE BASELINE

Four common scenarios provide the context for establishing baseline data:

1. BASELINE IS ESTABLISHED

If baseline data exist prior to the start of a project or activity, additional data collected over the life of the project must be collected in a consistent manner in order to facilitate comparisons. For example, consider the drop-out rate for girls 16 and under. If baseline data are obtained from the Ministry of Education, the project should continue to collect these data from this same source, ensuring that the data collection methodology remains the same.

Data may also be obtained from a prior implementing partner’s project, provided that the data collection protocols, instruments, and scoring procedures can be replicated. For example, a policy index might be used to measure progress of legislation (see TIPS 14: Monitoring the Policy Reform Process). If these activities become a part of a new project, program managers should consider the benefit of using the same instrument.

In cases where baseline data exist from primary or secondary sources, it is important that the data meet USAID’s data quality standards for validity, reliability, precision, integrity, and timeliness (see TIPS 12: Data Quality Standards).

2. BASELINES MUST BE COLLECTED

In cases where there are no existing data with which to establish a baseline, USAID and/or its implementing partners will have to collect them if the required data are not already being collected by, for example, a host-country government, an international organization, or another donor. Primary data collection can be expensive, particularly if data are collected through a formal survey or a new index. Program managers should consider this cost and incorporate it into program or project planning.

Ideally, data should be collected prior to the initiation of the program. If this is not feasible, baselines should be collected as soon as possible. For example, an implementing partner may collect perception data on the level of corruption in targeted municipalities for USAID’s PMP sixty days after approval of a project’s work plan; in another case, a score on an advocacy capacity index may not be collected until Community Service Organizations (CSOs) are awarded grants. If baseline data cannot be collected until later in the course of implementing an activity, the AO Team should document when and how the baseline data will be collected (ADS 203.3.4.5).

Participation of key stakeholders in setting targets helps establish a common understanding about what the project will accomplish and when. USAID staff, implementing partners, host country governments, other donors, and civil society partners, among others, should attend working sessions at the outset of program implementation to review baseline data and other information to set interim and final targets.

3. BASELINES ARE ESTABLISHED ON A ROLLING BASIS

In some cases, it is possible to collect baseline data on a rolling basis as implementation proceeds. For example, imagine that a health project is being rolled out sequentially across three provinces over a three-year period.
Data collected in the first province will serve as baseline for that province in Year One; data collected in the second province will serve as baseline for the second province in Year Two; and data collected in the third province will serve as baseline for that province in Year Three.

4. BASELINE IS ZERO

For some indicators, baselines will be zero. For example, if a new program focuses on building the teaching skills of teachers, the baseline for the indicator “the number of teachers trained” is zero. Similarly, if an output of a new program is the number of grants awarded, the baseline is zero.

The achievement of results requires the joint action of many stakeholders. Manageable interest means we, as program managers, have sufficient reason to believe that the achievement of our planned results can be significantly influenced by interventions of USAID’s program and staff resources. When setting targets, take into account how other actors will affect outcomes and what it means for USAID to achieve success.

WHY ARE TARGETS IMPORTANT?

Beyond meeting USAID requirements, performance targets are important for several reasons. They help justify a program by describing in concrete terms what USAID’s investment will produce. Targets orient stakeholders to the tasks to be accomplished and motivate individuals involved in a program to do their best to ensure the targets are met. Targets also help to establish clear expectations for USAID staff, implementing partners, and key stakeholders. Once a program is underway, they serve as the guideposts for monitoring whether progress is being made on schedule and at the levels originally envisioned. Lastly, targets promote transparency and accountability by making available information on whether results have been achieved or not over time.

A natural tension exists between the need to set realistic targets and the value, from a motivational perspective, of setting targets ambitious enough to ensure that staff and stakeholders will stretch to meet them; when motivated, people can often achieve more than they imagine. Targets that are easily achievable are not useful for management and reporting purposes since they are, in essence, pro forma. AO Teams should plan ahead for the analysis and interpretation of actual data against their performance targets (ADS 203.3.4.5).

USING TARGETS FOR PERFORMANCE MANAGEMENT IN A LEARNING ORGANIZATION

Targets can be important tools for effective program management. However, the extent to which targets are or are not met should not be the only criterion for judging the success or failure of a program. Targets are essentially flags for managers; if the targets are wildly exceeded or well below expectations, the program manager should ask, “Why?”

FIGURE 1. PORTFOLIO REVIEWS AND PERFORMANCE TARGETS
To prepare for Portfolio Reviews, AO Teams should conduct analysis of program data, including achievement of planned targets. ADS 203.3.7.2 provides illustrative questions for these reviews:
• Are the desired results being achieved?
• Are the results within USAID’s manageable interest?
• Will planned targets be met?
• Is the performance management system currently in place adequate to capture data on the achievement of results?

Consider an economic growth project. If a country experiences an unanticipated downturn in its economy, the underlying assumptions upon which that project was designed may be affected. If the project does not meet targets, then it is important for managers to focus on understanding 1) why targets were not met, and 2) whether the project can be adjusted to allow for an effective response to changed circumstances.
In this scenario, program managers may need to reexamine the focus or priorities of the project and make related adjustments in indicators and/or targets. Senior managers, staff, and implementing partners should review performance information and targets as part of ongoing project management responsibilities and in Portfolio Reviews (see Figure 1).

TYPES OF TARGETS

FINAL AND INTERIM TARGETS

A final target is the planned value of a performance indicator at the end of the AO or project. For AOs, the final targets are often set three to five years away, while for IRs they are often set one to three years away. Interim targets should be set for the key points of time in between the baseline and final target in cases where change is expected and data can be collected.

QUANTITATIVE AND QUALITATIVE TARGETS

Targets may be either quantitative or qualitative, depending on the nature of the associated indicator. Targets for quantitative indicators are numerical, whereas targets for qualitative indicators are descriptive. To facilitate comparison of baselines, targets, and performance data for descriptive data, and to maintain data quality, some indicators convert qualitative data into a quantitative measure (see Figure 2). Nonetheless, baseline and target data for quantitative and qualitative indicators must be collected using the same instrument so that change can be captured and progress towards results measured accurately (see TIPS 6: Selecting Performance Indicators).

FIGURE 2. TARGET SETTING FOR QUANTITATIVE AND QUALITATIVE INDICATORS - WHAT’S THE DIFFERENCE?
Quantitative indicators and targets are numerical. Examples include the dropout rate, the value of revenues, or number of children vaccinated.
Qualitative indicators and targets are descriptive. However, descriptions must be based on a set of pre-determined criteria. It is much easier to establish baselines and set targets when qualitative data are converted into a quantitative measure. For example, the Advocacy Index is used to measure the capacity of a target organization, based on agreed-upon standards that are rated and scored. Other examples include scales, indexes, and scorecards (see Figure 3).

EXPRESSING TARGETS

As with performance indicators, targets can be expressed differently. There are several possible ways to structure targets to answer questions about the quantity of expected change:
• Absolute level of achievement – e.g., 75% of all trainees obtained jobs by the end of the program or 7,000 people were employed by the end of the program.
• Change in level of achievement – e.g., math test scores for students in grade nine increased by 10% in Year One, or math test scores for students in grade nine increased by three points in Year One. Yields per hectare under improved management practices increased by 25% or yields per hectare increased by 100 bushels from 2010 to 2013.
• Change in relation to the scale of the problem – e.g., 35% of total births in target area attended by skilled health personnel by the end of year two, or the proportion of households with access to reliable potable water increased by 50% by 2013.
• Creation or provision of something new – e.g., 4,000 doses of tetanus vaccine distributed in Year One, or a law permitting non-government organizations to generate income is passed by 2012.

FIGURE 3. SETTING TARGETS FOR QUALITATIVE MEASURES
For the IR Improvements in the Quality of Maternal and Child Health Services, a service delivery scale was used as the indicator to measure progress. The scale, as shown below, transforms qualitative information about services into a rating system against which targets can be set:
0 points = Service not offered
1 point = Offers routine antenatal care
1 point = Offers recognition and appropriate management of high risk pregnancies
1 point = Offers routine deliveries
1 point = Offers appropriate management of complicated deliveries
1 point = Offers post-partum care
1 point = Offers neonatal care
Score = Total number of service delivery points
Illustrative Target: Increase average score to 5 by the end of year.

Other targets may be concerned with the quality of expected results. Such targets can relate to indicators measuring customer satisfaction, public opinion, responsiveness rates, enrollment rates, complaints, or failure rates. For example, the average customer satisfaction score for registration of a business license (based on a seven-point scale) increases to six by the end of the program, or the percentage of mothers who return six months after delivery for postnatal care increases to 20% by 2011.

Targets relating to cost efficiency or producing outcomes at the least expense are typically measured in terms of unit costs. Examples of such targets might include: cost of providing a couple-year-of-protection is reduced to $10 by 1999 or per-student costs of a training program are reduced by 20% between 2010 and 2013.

DISAGGREGATING TARGETS

When a program’s progress is measured in terms of its effects on different segments of the population, disaggregated targets can provide USAID with nuanced information that may not be obvious in the aggregate. For example, a program may seek to increase the number of micro-enterprise loans received by businesses in select rural provinces. By disaggregating targets, program inputs can be directed to reach a particular target group. Targets can be disaggregated along a number of dimensions including gender, location, income level, occupation, administration level (e.g., national vs. local), and social groups.

For USAID programs, performance management systems must include gender-sensitive indicators and sex-disaggregated data when the technical analyses supporting the AO or project to be undertaken demonstrate that:
• The different roles and status of women and men affect the activities differently; and
• The anticipated results of the work would affect women and men differently.

A gender-sensitive indicator can be defined as an indicator that captures gender-related changes in society over time. For example, a program may focus on increasing enrollment of children in secondary education. Program managers may not only want to look at increasing enrollment rates, but also at the gap between girls and boys. One way to measure performance would be to disaggregate the total number of girls and boys attending school at the beginning and at the end of the school year (see Figure 4). Another indicator might look at the quality of the participation levels of girls vs. boys with a target of increasing the amount of time girls engage in classroom discussions by two hours per week. Gender-sensitive indicators can use qualitative or quantitative methodologies to assess impact directly on beneficiaries. They can also be used to assess the differential impacts of policies, programs, or practices supported by USAID on women and men (ADS 201.3.4.3).

Program managers should think carefully about disaggregates prior to collecting baseline data and setting targets. Expanding the number of disaggregates can increase the time and costs associated with data collection and analysis.

FIGURE 4. AN EXAMPLE OF DISAGGREGATED TARGETS FOR GENDER SENSITIVE INDICATORS
Indicator: Number of children graduating from secondary school; percent gap between boys and girls. B=boys; G=girls

Year              Planned                      Actual
2010 (baseline)                                145 (115B; 30G), gap 58.6%
2011              175 (120B; 55G), gap 50.0%   160 (120B; 40G), gap 56.3%
2012              200 (120B; 80G), gap 25.0%   200 (130B; 70G), gap 30.0%
2013              200 (115B; 92G)              205 (110B; 95G)

SETTING TARGETS

Targets should be realistic, evidence-based, and ambitious. Setting meaningful targets provides staff, implementing partners, and stakeholders with benchmarks to document progress toward achieving results.
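As a small illustration of using disaggregated targets as benchmarks, the sketch below compares planned and actual values for a sex-disaggregated indicator like the one in Figure 4. It is a minimal sketch under stated assumptions, not a USAID method: the class and function names are hypothetical, the 2010 and 2012 values are read from the flattened Figure 4 (the row assignment is itself an assumption), and the gap formula, (boys - girls) / total, is assumed because it reproduces the 58.6% and 30.0% entries, though not every percentage printed in the figure.

```python
from dataclasses import dataclass


@dataclass
class Disaggregated:
    """Hypothetical container for one year's sex-disaggregated count."""
    boys: int
    girls: int

    @property
    def total(self) -> int:
        return self.boys + self.girls

    def gap_percent(self) -> float:
        # Assumed formula: boy-girl difference as a share of all graduates.
        return 100 * (self.boys - self.girls) / self.total


def percent_of_target(actual: int, planned: int) -> float:
    """Share of the planned value that was actually achieved."""
    return 100 * actual / planned


# Values as read from Figure 4 (row assignment is an assumption).
baseline = Disaggregated(boys=115, girls=30)       # 2010 actual
planned_2012 = Disaggregated(boys=120, girls=80)   # 2012 planned
actual_2012 = Disaggregated(boys=130, girls=70)    # 2012 actual

print(round(baseline.gap_percent(), 1))       # 58.6
print(round(actual_2012.gap_percent(), 1))    # 30.0
print(percent_of_target(actual_2012.total, planned_2012.total))   # 100.0
print(percent_of_target(actual_2012.girls, planned_2012.girls))   # 87.5
```

With these values the aggregate target is fully met (200 of 200) while the target for girls is not (70 of 80), which is the kind of information that disaggregation makes visible and the aggregate alone would hide.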
Targets need to take into account program resources, the implementation period, and the development hypothesis implicit in the results framework.

PROGRAM RESOURCES

The level of funding, human resources, material goods, and institutional capacity contribute to determining project outputs and effecting change at different levels of results and the AO. Increases or decreases in planned program resources should be considered when setting targets.

ASSISTANCE OBJECTIVES AND RESULTS FRAMEWORKS

Performance targets represent commitments that USAID AO Teams make about the level and timing of results to be achieved by a program. Determining targets is easier when objectives and indicators are within USAID’s manageable interest. Where a result sits in the causal chain, critical assumptions, and other contributors to achievement of the AO will affect targets.

Other key considerations include:

1. Historical Trends: Perhaps even more important than examining a single baseline value is understanding the underlying historical trend in the indicator value over time. What pattern of change has been evident in the past five to ten years on the performance indicator? Is there a trend, upward or downward, that can be drawn from existing reports, records, or statistics? Trends are not always a straight line; there may be a period during which a program plateaus before improvements are seen (see Figure 5).

2. Expert Judgments: Another option is to solicit expert opinions as to what is possible or feasible with respect to a particular indicator and country setting. Experts should be knowledgeable about the program area as well as local conditions. Experts will be familiar with what is and what is not possible from a technical and practical standpoint, an important input for any target-setting exercise.

3. Research Findings: Similarly, reviewing development literature, especially research and evaluation findings, may help in choosing realistic targets. In some program areas, such as population and health, extensive research findings on development trends are already widely available and what is possible to achieve may be well-known. In other areas, such as democracy, research on performance indicators and trends may be scarce.

4. Stakeholder Expectations: While targets should be defined on the basis of an objective assessment of what can be accomplished given certain conditions and resources, it is useful to get input from stakeholders regarding what they want, need, and expect from USAID activities. What are the expectations of progress? Soliciting expectations may involve formal interviews, rapid appraisals, or informal conversations. Not only end users should be surveyed; intermediate actors (e.g., implementing agency staff) can be especially useful in developing realistic targets.

5. Achievement of Similar Programs: Benchmarking is the process of comparing or checking the progress of other similar programs. It may be useful to analyze progress of other USAID Missions or offices, or other development agencies and partners, to understand the rate of change that can be expected in similar circumstances.

FIGURE 5. PROGRESS IS NOT ALWAYS A STRAIGHT LINE
While it is easy to establish annual targets by picking an acceptable final performance level and dividing expected progress evenly in the years between, such straight-line thinking about progress is often inconsistent with the way development programs really work. More often than not, no real progress, in terms of measurable impacts or results, is evident during the start-up period. Then, in the first stage of implementation, which may take the form of a pilot test, some but not much progress is made, while the program team adjusts its approaches. During the final two or three years of the program, all of this early work comes to fruition. Progress leaps upward, and then rides a steady path at the end of the program period. If plotted on a graph, it would look like “stair steps,” not a straight line.

FIGURE 6. BENCHMARKING
One increasingly popular way of setting targets and comparing performance is to look at the achievement of another program or process by one or a collection of high-performing organizations. USAID is contributing to the development of benchmarks for programs such as water governance (http://www.rewab.net), financial management (www.fdirisk.com) and health care systems (www.healthsystems2020.org). Targets may be set to reflect this “best in the business” experience, provided of course that consideration is given to the comparability of country conditions, resource availability, and other factors likely to influence the performance levels that can be achieved.

APPROACHES FOR TARGET SETTING

There is no single best approach to use when setting targets; the process is an art and a science. Although much depends on available information, the experience and knowledge of AO Team members will add to the thinking behind performance targets. Alternative approaches include the following:

1. Projecting a future trend, then adding the “value added” by USAID activities. Probably the most rigorous and credible approach, this involves estimating the future trend without USAID’s program, and then adding whatever gains can be expected as a result of USAID’s efforts. This is no simple task, as projecting the future can be very tricky. The task is made somewhat easier if historical data are available and can be used to establish a trend line.

2. Establishing a final performance target for the end of the planning period, and then planning the progress from the baseline level. This approach involves deciding on the program’s performance target for the final year, and then defining a path of progress for the years in between. Final targets may be based on benchmarking techniques or on judgments of experts, program staff, customers, or partners about the expectations of what can be reasonably achieved within the planning period. When setting interim targets, remember that progress is not always a straight line. All targets, both final and interim, should be based on a careful analysis of what is realistic to achieve, given the stage of program implementation, resource availability, country conditions, technical constraints, etc.

3. Setting annual performance targets. Similar to the previous approach, judgments are made about what can be achieved each year, instead of starting with a final performance level and working backwards. In both cases, consider variations in performance, e.g., seasons and timing of activities and expected results.

DOCUMENT AND FILE

Typically, USAID project baselines, targets, and actual data are kept in a data table for analysis either in the PMP, as a separate document, or electronically. Furthermore, it is important to document in the PMP how targets were selected and why target values were chosen. Documentation serves as a future reference for:
• Explaining a target-setting methodology.
• Analyzing actual performance data.
• Setting targets in later years.
• Responding to inquiries or audits.

For more information: TIPS publications are available online at [insert website].

Acknowledgements: Our thanks to those whose experience and insights helped shape this publication, including Gerry Britan and Subhi Mehdi of USAID’s Office of Management Policy, Budget and Performance (MPBP). This publication was updated by Jill Tirnauer of Management Systems International.

Comments can be directed to:
Gerald Britan, Ph.D.
Tel: (202) 712-1158
[email protected]

Contracted under RAN-M-00-04-00049-A-FY0S-84 Integrated Managing for Results II

NUMBER 9
2011 Printing
PERFORMANCE MONITORING & EVALUATION TIPS
CONDUCTING CUSTOMER SERVICE ASSESSMENTS

ABOUT TIPS

These TIPS provide practical advice and suggestions to USAID managers on issues related to performance monitoring and evaluation. This publication is a supplemental reference to the Automated Directive Service (ADS) Chapter 203.

WHAT IS A CUSTOMER SERVICE ASSESSMENT?

Under USAID’s new operations system, Agency operating units are required to routinely and systematically assess customer needs for, perceptions of, and reactions to USAID programs. This TIPS gives practical advice about customer service assessments, for example, when they should be conducted, what methods may be used, and what information can be usefully included.

A customer service assessment is a management tool for understanding USAID’s programs from the customer’s perspective. Most often these assessments seek feedback from customers about a program’s service delivery performance. The Agency seeks views from both ultimate customers (the end-users, or beneficiaries, of USAID activities, usually disadvantaged groups) and intermediate customers (persons or organizations using USAID resources, services, or products to serve the needs of the ultimate customers).

Customer service assessments may also be used to elicit opinions from customers or potential customers about USAID’s strategic plans, development objectives, or other planning issues. For example, the operating unit may seek their views on development needs and priorities to help identify new, relevant activities.

WHY CONDUCT CUSTOMER SERVICE ASSESSMENTS?

USAID’s reengineered operating system calls for regularly conducting customer service assessments for all program activities. Experience indicates that effective customer feedback on service delivery improves performance, achieves better results, and creates a more participatory working environment for programs, and thus increases sustainability.

These assessments provide USAID staff with the information they need for making constructive changes in the design and execution of development programs. This information may also be shared with partners and customers as an element in a collaborative, ongoing relationship. In addition, customer service assessments provide input for reporting on results, allocating resources, and presenting the operating unit’s development programs to external audiences.

Customer service assessments are relevant not only to program-funded activities directed to customers external to USAID. They can also be very useful in assessing services provided to internal USAID customers.

Moreover, customer service assessments are federally mandated. The Government Performance and Results Act of 1993 and Executive Order 12862 of 1993 direct federal agencies to reorient their programs toward achievement of measurable results that reflect customers’ needs and to systematically assess those needs. Agencies must report annually to the Administration on customer service performance.

WHO DOES CUSTOMER SERVICE ASSESSMENTS?

USAID guidance specifies that all operating units should develop a customer service plan. The plan should include information about customers’ needs, preferences, and reactions as an element in a unit’s planning, achieving, performance monitoring and evaluation functions (see Box 1).

Depending on the scope of its program operations, an operating unit may find it needs to plan several customer service assessments. The various assessments might be tailored to different strategic objectives, program activities and services, or customer groups (differentiated, for example, by gender, ethnicity, or income). Responsibility for designing and managing these assessments typically is assigned to the relevant development objective.

Box 1. The Customer Service Plan
The customer service plan presents the operating unit’s vision for including customers and partners to achieve its objectives. It explains how customer feedback will be incorporated to determine customer needs and perceptions of services provided, and how this feedback will be regularly incorporated into the unit’s operations. The customer service plan is a management tool for the operating unit and does not require USAID/W approval. Specifically, the plan
• Identifies the ultimate and intermediate customers for service delivery and segments customer groups for different programs, products, or services
• Describes and regularly schedules appropriate means for assessing service delivery, performance, and customer satisfaction
• Establishes service principles and specifies measurable service performance standards
• Indicates staff responsibilities for managing customer service activities, including assessments
• Specifies the resources required for customer service activities and assessments.

HOW DO CUSTOMER SERVICE ASSESSMENTS COMPLEMENT PERFORMANCE MONITORING AND EVALUATION?

Performance monitoring and evaluation broadly addresses the results or outcomes of a program. These results reflect objectives chosen by the operating unit (in consultation with partners and customer representatives) and may encompass several types of results.

Often they are medium- to longer-term developmental changes or impacts. Examples: reductions in fertility rates, increases in income, improvements in agricultural yields, reductions in forest land destroyed.

Another type of result often included in performance monitoring and evaluation involves customer perceptions and responses to goods or services delivered by a program, for example, the percentage of women satisfied with the maternity care they receive, or the proportion of farmers who have tried a new seed variety and intend to use it again. Customer service assessments look at this type of result: customer satisfaction, perceptions, preferences, and related opinions about the operating unit’s performance in delivering the program’s products and services.

Unless the service or product delivery is satisfactory (i.e., timely, relevant, accessible, good quality) from the perspective of the customers, it is unlikely that the program will achieve its substantive development results, which, after all, ultimately depend on customers’ participation and use of the service or product. For example, a family planning program is unlikely to achieve reduced fertility rates unless customers are satisfied with the contraceptive products it offers and the delivery mechanism it uses to provide them. If not sufficiently satisfied, customers will simply not use them.

Customer service assessments thus complement broader performance monitoring and evaluation systems by monitoring a specific type of result: service delivery performance from the customer’s perspective. By providing managers with information on whether customers are satisfied with and using a program’s products and services, these assessments are especially useful for giving early indications of whether longer term substantive development results are likely to be met.

Both customer service assessments and performance monitoring and evaluation use the same array of standard social science investigation techniques: surveys, rapid and participatory appraisal, document reviews, and the like. In some cases, the same survey or rapid appraisal may even be used to gather both types of information. For example, a survey of customers of an irrigation program might ask questions about service delivery aspects (e.g., access, timeliness, quality, use of irrigation water) and questions concerning longer term development results (e.g., yields, income).

STEPS IN CONDUCTING A CUSTOMER SERVICE ASSESSMENT

Step 1. Decide when the assessment should be done.

Customer service assessments should be conducted whenever the operating unit requires customer information for its management purposes. The general timing and frequency of customer service assessments is typically outlined in the unit’s customer service plan.

Customer service assessments are likely to be most effective if they are planned to coordinate with critical points in cycles associated with the program being assessed (crop cycles, local school year cycles, host country fiscal year cycles, etc.) as well as with the Agency’s own annual reporting and funding cycles.

Customer service assessments will be most valuable as management and reporting tools if they are carried out some months in advance of the operating unit’s annual planning and reporting process. For example, if a unit’s results review and resources request (R4) report is to be completed by February, the customer service assessment might be conducted in November.

However, the precise scheduling and execution of assessments is a task appropriate for those responsible for results in a program sector: members of the strategic objective or results package team.

Step 2. Design the assessment.

Depending on the scale of the effort, an operating unit may wish to develop a scope of work for a customer service assessment. At a minimum, planning the assessment should 1) identify the purpose and intended uses of the information, 2) clarify the program products or services being assessed, 3) identify the customer groups involved, and 4) define the issues the study will address. Moreover, the scope of work typically discusses data collection methods, analysis techniques, reporting and dissemination plans, and a budget and time schedule.

Specific issues to be assessed will vary with the development objective, program activities under way, socioeconomic conditions, and other factors. However, customer service assessments generally aim at understanding
• Customer views regarding the importance of various USAID-provided services (e.g., training, information, commodities, technical assistance) to their own needs and priorities
• Customer judgments, based on measurable service standards, on how well USAID is performing service delivery
• Customer comparisons of USAID service delivery with that of other providers.

Open-ended inquiry is especially well suited for addressing the first issue. The other two may be measured and analyzed quantitatively or qualitatively by consulting with ultimate or intermediate customers with respect to a number of service delivery attributes or criteria important to customer satisfaction (see Box 2).

Box 2. Illustrative Criteria For Assessing Service Delivery
Convenience. Ease of working with the operating unit, simple processes, minimal red tape, easy physical access to contacts
Responsiveness. Follow up promptly, meet changing needs, solve problems, answer questions, return calls
Reliability. On-time delivery that is thorough, accurate, complete
Quality of products and services. Perform as intended; flexible in meeting local needs; professionally qualified personnel
Breadth of choice. Sufficient choices to meet customer needs and preferences
Contact personnel. Professional, knowledgeable, understand local culture, language skills

In more formal surveys, for example, customers may be asked to rate services and products on, say, a 1-to-5 scale indicating their level of satisfaction with specific service characteristics or attributes they consider important (e.g., quality, reliability, responsiveness). In addition to rating the actual services, customers may be asked what they would consider “excellent” service, referring to the same service attributes and using the same 5-point scale. Analysis of the gap between what customers expect as an ideal standard and what they perceive they actually receive indicates the areas of service delivery needing improvement.

In more qualitative approaches, such as focus groups, customers discuss these issues among themselves while researchers listen carefully to their perspectives. Operating units and teams should design their customer assessments to collect customer feedback on service delivery issues and attributes they believe are most important to achieving sustainable results toward a clearly defined strategic objective. These issues will vary with the nature of the objective and program activity.

With its objective clearly in mind, and the information to be collected carefully specified, the operating unit may decide to use in-house resources, external assistance consultants, or a combination of the two to conduct the assessment.

Step 3. Conduct the assessment.

Select from a broad range of methods. A customer service assessment is not just a survey. It may use a broad repertory of inquiry tools designed to elicit information about the needs, preferences, or reactions of customers regarding a USAID activity, product or service. Methods may include the following:
• Formal customer surveys
• Rapid appraisal methods (e.g., focus groups, town meetings, interviews with key informants)
• Participatory appraisal techniques, in which customers plan, analyze, self-monitor, evaluate or set priorities for activities
• Document reviews, including systematic use of social science research conducted by others.

Use systematic research methods. A hastily prepared and executed effort does not provide quality customer service assessment information. Sound social science methods are essential.

Practice triangulation. To the extent resources and time permit, it is preferable to gather information from several sources and methods, rather than relying on just one. Such triangulation will build confidence in findings and provide adequate depth of information for good decision-making and program management. In particular, quantitative surveys and qualitative studies often complement each other. Whereas a quantitative survey can produce statistical measurements of customer satisfaction (e.g., with quality, timeliness, or other aspects of a program operation) that can be generalized to a whole population, qualitative studies can provide an in-depth understanding and insight into customer perceptions and expectations on these issues.

Conduct assessments routinely. Customer service assessments are designed to be consciously iterative. In other words, they are undertaken periodically to enable the operating unit to build a foundation of findings over time to inform management of changing customer needs and perceptions. Maintaining an outreach orientation will help the program adapt to changing circumstances as reflected in customer views.

Step 4. Broadly disseminate and use assessment findings to improve performance.

Customer service assessments gain value when broadly disseminated within the operating unit, to other operating units active in similar program sectors, to partners, and more widely within USAID. Sharing this information is also important to maintaining open, transparent relations with customers themselves.

Assessment findings provide operating unit managers with insight on what is important to customers and how well the unit is delivering its programs. They also can help identify operations that need quality improvement, provide early detection of problems, and direct attention to areas where remedial action may be taken to improve delivery of services.

Customer assessments form the basis for review of and recommitment to service principles. They enable measurement of service delivery performance against service standards and encourage closer rapport with customers and partners. Moreover, they encourage a more collaborative, participatory, and effective approach to achievement of objectives.

Selected Further Reading

H. S. Plunkett and Elizabeth Baltimore, Customer Focus Cookbook, USAID/M/ROR, August 1996.
Resource Manual for Customer Surveys. Statistical Policy Office, Office of Management and Budget. October 1993.
Zeithaml, Valarie A; A. Parasuraman; and Leonard L. Berry. Delivering Quality Service. New York: Free Press

NUMBER 10
2011 Printing
PERFORMANCE MONITORING & EVALUATION TIPS
CONDUCTING FOCUS GROUP INTERVIEWS

ABOUT TIPS

These TIPS provide practical advice and suggestions to USAID managers on issues related to performance monitoring and evaluation. This publication is a supplemental reference to the Automated Directive Service (ADS) Chapter 203.

WHAT IS A FOCUS GROUP INTERVIEW?

USAID’s guidelines encourage use of rapid, low-cost methods to collect information on the performance of development assistance activities. Focus group interviews, the subject of this TIPS, are one such method.

A focus group interview is an inexpensive, rapid appraisal technique that can provide managers with a wealth of qualitative information on performance of development activities, services, and products, or other issues. A facilitator guides 7 to 11 people in a discussion of their experiences, feelings, and preferences about a topic. The facilitator raises issues identified in a discussion guide and uses probing techniques to solicit views, ideas, and other information. Sessions typically last one to two hours.

ADVANTAGES AND LIMITATIONS

This technique has several advantages. It is low cost and provides speedy results. Its flexible format allows the facilitator to explore unanticipated issues and encourages interaction among participants. In a group setting participants provide checks and balances, thus minimizing false or extreme views.

Focus groups have some limitations, however. The flexible format makes the technique susceptible to facilitator bias, which can undermine the validity and reliability of findings. Discussions can be sidetracked or dominated by a few vocal individuals. Focus group interviews generate relevant qualitative information, but no quantitative data from which generalizations can be made for a whole population. Moreover, the information can be difficult to analyze; comments should be interpreted in the context of the group setting.

WHEN ARE FOCUS GROUP INTERVIEWS USEFUL?

Focus group interviews can be useful in all phases of development activities: planning, implementation, monitoring, and evaluation.

cannot be explained
• recommendations and suggestions are needed from customers, partners, experts, or other stakeholders

For example, focus groups were used to uncover problems in a Nepal family planning program where facilities were underutilized, and to obtain suggestions for improvements from customers. The focus groups revealed that rural women considered family planning important. However, they did not use the clinics because of caste system barriers and the demeaning manner of clinic staff. Focus group participants suggested appointing staff of the same social status to ensure that rural women were treated with respect. They also suggested that rural women disseminate information to their neighbors about the health clinic.

Before deciding whether to use focus group interviews as a source of information, the study purpose needs to be clarified. This requires identifying who will use the information, determining what information is needed, and understanding why the information is needed. Once this is done, an appropriate methodology can be selected. (See Tips 5 Using Rapid Appraisal Methods for additional information on selecting
They can appraisal techniques.) be used to solicit views, insights, and recommendations of program staff, customers, stakeholders, technical experts, or other groups. STEPS IN CONDUCTING FOCUS GROUP INTERVIEWS They are especially appropriate when: • program activities are being planned and it is important for managers to understand customers’ and other stakeholders’ attitudes, preferences or needs Follow this step-by-step advice to help ensure high-quality results. • specific services or outreach approaches Step 1. Select the team have to take into account customers’ prefConducting a focus group interview requires a erences small team, with at least a facilitator to guide • major program implementation problems the discussion and a rapporteur to record it. The facilitator should be a native speaker who 2 can put people at ease. The team should have Step 3. Decide on timing and location substantive knowledge of the topic under discussion. Discussions last one to two hours and should be conducted in a convenient location with Skills and experience in conducting focus some degree of privacy. Focus groups in a small groups are also important. If the interviews village arouse curiosity and can result in uninare to be conducted by members of a broader vited participants. Open places are not good evaluation team without previous experience spots for discussions. in focus group techniques, training is suggested. This training can take the form of role playing, Step 4. Prepare the discussion guide formalized instruction on topic sequencing and probing for generating and managing group dis- The discussion guide is an outline, prepared in cussions, as well as pre-testing discussion guides advance, that covers the topics and issues to be in pilot groups. discussed. It should contain few items, allowing some time and flexibility to pursue unanticipatStep 2. Select the participants ed but relevant issues. 
First, identify the types of groups and institutions that should be represented (such as program managers, customers, partners, technical experts, government officials) in the focus groups. This will be determined by the informtion needs of the study. Often separate focus groups are held for each type of group. Second, identify the most suitable people in each group. One of the best approaches is to consult key informants who know about local conditions. It is prudent to consult several informants to minimize the biases of individual preferences. Excerpt from a Discussion Guide on Curative Health Services (20-30 minutes) Q. Who treats/cures your children when they get sick? Why? Note: Look for opinions about Each focus group should be 7 to 11 people to allow the smooth flow of conversation. • outcomes and results • provider-user relations • costs (consultations, transportation, medicine) • waiting time • physical aspects (privacy, cleanliness) • availability of drugs, lab services • access (distance, availability of transportation) • follow-up at home Participants should be homogenous, from similar socioeconomic and cultural backgrounds. They should share common traits related to the discussion topic. For example, in a discussion on contraceptive use, older and younger women should participate in separate focus groups. Younger women may be reluctant to discuss sexual behavior among their elders, especially if it deviates from tradition. Ideally, people should not know each other. Anonymity lowers inhibition and prevents formation of cliques. 3 The guide provides the framework for the facilitator to explore, probe, and ask questions. Initiating each topic with a carefully crafted question will help keep the discussion focused. Using a guide also increases the comprehensiveness of the data and makes data collection more efficient. Its flexibility, however can mean that different focus groups are asked different questions, reducing the credibility of the findings. 
An excerpt from a discussion guide used in Bolivia to assess child survival services provides an illustration. (See box on page 3) • What do you think about corruption in the criminal justice system? • How do you feel about the three parties running in upcoming national elections? Use probing techniques. When participants give incomplete or irrelevant answers, the facilitator can probe for fuller, clearer responses. A few suggested techniques: Repeat the question—repetition gives more time to think Step 5. Conduct the interview Adopt sophisticated naivete” posture—convey limited understanding of the issue and ask for specific details Establish rapport. Often participants do not know what to expect from focus group discussions. It is helpful for the facilitator to outline the purpose and format of the discussion at the beginning of the session, and set the group at ease. Participants should be told that the discussion is informal, everyone is expected to participate, and divergent views are welcome. Pause for the answer—a thoughtful nod or expectant look can convey that you want a fuller answer Repeat the reply—hearing it again sometimes Phrase questions carefully. Certain types of ques- stimulates conversation. Ask when, what, tions impede group discussions. For example, where, which, and how questions—they proyes-or-no questions are one dimensional and voke more detailed information do not stimulate discussion. “Why” questions put people on the defensive and cause them to Use neutral comments— Anything else?” Why do take “politically correct” sides on controversial you feel this way?” issues. Control the discussion. In most groups a few indiOpen-ended questions are more useful be- viduals dominate the discussion. To balance out cause they allow participants to tell their story participation: in their own words and add details that can re• Address questions to individuals who are sult in unanticipated findings. 
For example: reluctant to talk • What do you think about the criminal justice system? • Give nonverbal cues (look in another direction or stop taking notes when an individual • How do you feel about the upcoming natalks for an extended period) tional elections? • Intervene, politely summarize the point, If the discussion is too broad the facilitator can then refocus the discussion narrow responses by asking such questions as: 4 • Take advantage of a pause and say, “Thank you for that interesting idea, perhaps we can discuss it in a separate session. Meanwhile with your consent, I would like to move on to another item.” trends andpatterns, strongly held or frequently aired opinions. Read each transcript. Highlight sections that correspond to the discussion guide questions and mark comments that could be used in the final report. Minimize group pressure. When an idea is being adopted without any general discussion or disagreement, more than likely group pressure is occurring. To minimize group pressure the facilitator can probe for alternate views. For example, the facilitator can raise another issue, or say, “We had an interesting discussion but let’s explore other alter natives.” Analyze each question separately. After reviewing all the responses to a question or topic, write a summary statement that describes the discussion. In analyzing the results, the team should consider: • Words. Weigh the meaning of words participants used. Can a variety of words and phrases categorize similar responses? Step 6. Record the discussion A rapporteur should perform this function. Tape recordings in conjunction with written • Framework. Consider the circumstances in notes are useful. Notes should be extensive which a comment was made (context of and reflect the content of the discussion as well previous discussions, tone and intensity of as nonverbal behavior (facial expressions, hand the comment). movements). • Internal agreement. 
Figure out whether shifts Shortly after each group interview, the team in opinions during the discussion were should summarize the information, the team’s caused by group pressure. impressions, and implications of the information for the study. • Precision of responses. Decide which responses were based on personal experience and Discussion should be reported in participants’ give them greater weight than those based language, retaining their phrases and grammation vague impersonal impressions. cal use. Summarizing or paraphrasing responses can be misleading. For instance, a verbatim reply • The big picture. Pinpoint major ideas. Allo“Yes, indeed! I am positive,” loses its intensity cate time to step back and reflect on major when recorded as “Yes.” findings. Step 7. Analyze results • Purpose of the report. Consider the objectives of the study and the information needed for decisionmaking. The type and After each session, the team should assemble scope of reporting will guide the analytical the interview notes (transcripts of each focus process. For example, focus group reports group interview), the summaries, and any other typically are: (1) brief oral reports that highrelevant data to analyze trends and patterns. light key findings; (2) descriptive reports The following method can be used. that summarize the discussion; and (3) analytical reports that provide trends, patterns, Read summaries all at one time. Note potential 5 or findings and include selected comments. Focus Group Interviews of Navarongo Community Health and Family Planning Project in Ghana The Ghanaian Ministry of Health launched a small pilot project in three villages in 1994 to assess community reaction to family planning and elicit community advice on program design and management. A new model of service deliverywas introduced: community health nurses were retrained as community health officers living in the communities and providing village-based clinical services. 
Focus group discussions were used to identify constraints to introducing family planning services and clarify ways to design operations that villagers value. Discussions revealed that many women want more control over their ability to reproduce, but believe their preferences are irrelevant to decisions made in the male-dominated lineage system. This indicated that outreach programs aimed primarily at women are insufficient. Social groups must be included to legitimize and support individuals' family-planning decisions. Focus group discussions also revealed women's concerns about the confidentiality of information and services. These findings preclude development of a conventional community-based distribution program, since villagers clearly prefer outside service delivery workers to those who are community members.

Selected Further Reading

Krishna Kumar, Conducting Group Interviews in Developing Countries, A.I.D. Program Design and Evaluation Methodology Report No. 8, 1987 (PN-AAL-088).
Richard A. Krueger, Focus Groups: A Practical Guide for Applied Research, Sage Publications, 1988.

2009, NUMBER 12, 2ND EDITION
PERFORMANCE MONITORING & EVALUATION TIPS
DATA QUALITY STANDARDS

ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to performance monitoring and evaluation. This publication is a supplemental reference to the Automated Directive System (ADS) Chapter 203.

WHY IS DATA QUALITY IMPORTANT?

Results-focused development programming requires managers to design and implement programs based on evidence. Since data play a central role in establishing effective performance management systems, it is essential to ensure good data quality (see Figure 1). Without this, decision makers do not know whether to have confidence in the data, or worse, could make decisions based on misleading data. Attention to data quality assists in:

• Ensuring that limited development resources are used as effectively as possible
• Ensuring that Agency program and budget decisions in Washington and the field are as well informed as practically possible
• Meeting the requirements of the Government Performance and Results Act (GPRA)
• Reporting the impact of USAID programs to external stakeholders, including senior management, OMB, the Congress, and the public with confidence

[Figure 1. Data Quality Plays a Central Role in Developing Effective Performance Management Systems. A cycle with Data Quality at its center. Plan: Identify or Refine Key Program Objectives. Design: Develop or Refine the Performance Management Plan. Analyze Data. Use Data: Use Findings from Data Analysis to Improve Program Effectiveness.]

DATA QUALITY STANDARDS

Data quality is one element of a larger interrelated performance management system. Data quality flows from a well designed and logical strategic plan where Assistance Objectives (AOs) and Intermediate Results (IRs) are clearly identified. If a result is poorly defined, it is difficult to identify quality indicators, and further, without quality indicators, the resulting data will often have data quality problems.

One key challenge is to determine what level of data quality is acceptable (or "good enough") for management purposes. It is important to understand that we rarely require the same degree of rigor as needed in research or for laboratory experiments. Standards for data quality must be keyed to our intended use of the data. That is, the level of accuracy, currency, precision, and reliability of performance information should be consistent with the requirements of good management.

The Five Data Quality Standards
1. Validity
2. Reliability
3. Precision
4. Integrity
5. Timeliness

Determining appropriate or adequate thresholds of indicator and data quality is not an exact science. This task is made even more difficult by the complicated and often data-poor development settings in which USAID operates.
As with performance indicators, we sometimes have to consider trade-offs, or make informed judgments, when applying the standards for data quality. This is especially true if, as is often the case, USAID relies on others to provide data for indicators. For example, if our only existing source of data for a critical economic growth indicator is the Ministry of Finance, and we know that the Ministry's data collection methods are less than perfect, we may have to weigh the alternatives of relying on less-than-ideal data, having no data at all, or conducting a potentially costly USAID-funded primary data collection effort. In this case, a decision must be made as to whether the Ministry's data would allow the Assistance Objective team to have confidence when assessing program performance or whether they are so flawed as to be useless, or perhaps misleading, in reporting and managing for results. The main point is that managers should not let the ideal drive out the good.

1. VALIDITY

Validity refers to the extent to which a measure actually represents what we intend to measure.1 Though simple in principle, validity can be difficult to assess in practice, particularly when measuring social phenomena. For example, how can we measure political power or sustainability? Is the poverty gap a good measure of the extent of a country's poverty? However, even valid indicators have little value if the data collected do not correctly measure the variable or characteristic encompassed by the indicator. It is quite possible, in other words, to identify valid indicators but to then collect inaccurate, unrepresentative, or incomplete data. In such cases, the quality of the indicator is moot. It would be equally undesirable to collect good data for an invalid indicator.

1 This criterion is closely related to "directness" criteria for indicators.

There are a number of ways to organize or present concepts related to data validity.
In the USAID context, we focus on three key dimensions of validity that are most often relevant to development programming: face validity, attribution, and measurement error.

FACE VALIDITY

Face validity means that an outsider or an expert in the field would agree that the data is a true measure of the result. For data to have high face validity, the data must be true representations of the indicator, and the indicator must be a valid measure of the result. For example:

Result: Increased household income in a target district
Indicator: Value of median household income in the target district

In this case, the indicator has a high degree of face validity when compared to the result. That is, an external observer is likely to agree that the data measure the intended objective. On the other hand, consider the following example:

Result: Increased household income in a target district
Indicator: Number of houses in the target community with tin roofs

This example does not appear to have a high degree of face validity as a measure of increased income, because it is not immediately clear how tin roofs are related to increased income. The indicator above is a proxy indicator for increased income. Proxy indicators measure results indirectly, and their validity hinges on the assumptions made to relate the indicator to the result. If we assume that 1) household income data are too costly to obtain and 2) research shows that when the poor have increased income, they are likely to spend it on tin roofs, then this indicator could be an appropriate proxy for increased income.

ATTRIBUTION

Attribution focuses on the extent to which a change in the data is related to USAID interventions. The concept of attribution is discussed in detail as a criterion for indicator selection, but reemerges when assessing validity. Attribution means that changes in the data can be plausibly associated with USAID interventions.
For example, an indicator that measures changes at the national level is not usually appropriate for a program targeting a few areas or a particular segment of the population. Consider the following:

Result: Increased revenues in targeted municipalities.
Indicator: Number of municipalities where tax revenues have increased by 5%.

In this case, assume that increased revenues are measured among all municipalities nationwide, while the program only focuses on a targeted group of municipalities. This means that the data would not be a valid measure of performance because the overall result is not reasonably attributable to program activities.

MEASUREMENT ERROR

Measurement error results primarily from the poor design or management of data collection processes. Examples include leading questions, unrepresentative sampling, or inadequate training of data collectors. Even if data have high face validity, they still might be an inaccurate measure of our result due to bias or error in the measurement process. Judgments about acceptable measurement error should reflect technical assessments about what level of reductions in measurement error are possible and practical. This can be assessed on the basis of cost as well as management judgments about what level of accuracy is needed for decisions.

Some degree of measurement error is inevitable, particularly when dealing with social and economic changes, but the level of measurement error associated with all performance data collected or used by operating units should not be so large as to 1) call into question either the direction or degree of change reflected by the data or 2) overwhelm the amount of anticipated change in an indicator (making it impossible for managers to determine whether progress reflected in the data is a result of actual change or of measurement error). The two main sources of measurement error are sampling and nonsampling error.
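Before turning to those two sources, the test described above (whether measurement error could overwhelm the change an indicator reports) can be illustrated with a short calculation. This is a minimal sketch, not a USAID procedure. It assumes, as a simplification, that error can be summarized as a symmetric relative margin on each data point; the function name and the figures are hypothetical.

```python
def change_is_distinguishable(baseline, latest, relative_error):
    """Return True when the observed change exceeds the combined error margin.

    Assumption (not from the source): each reported value may be off by
    +/- relative_error times its own size.
    """
    observed_change = abs(latest - baseline)
    # Worst case: the two data points are off in opposite directions,
    # so their individual margins add up.
    combined_margin = relative_error * (abs(baseline) + abs(latest))
    return observed_change > combined_margin


# Hypothetical indicator values with an assumed 10 percent error margin.
large_change = change_is_distinguishable(200, 260, 0.10)
small_change = change_is_distinguishable(200, 230, 0.10)
```

Under these assumed figures the first change (60 units against a combined margin of 46) stands clear of the error, while the second (30 units against a margin of 43) does not, so a manager could not tell whether it reflects real progress. Adding the two margins is a deliberately cautious choice; a less conservative rule would flag fewer indicators.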
Sampling Error (or representativeness)

Data are said to be representative if they accurately reflect the population they are intended to describe. The representativeness of data is a function of the process used to select a sample of the population from which data will be collected. It is often not possible, or even desirable, to collect data from every individual, household, or community involved in a program due to resource or practical constraints. In these cases, data are collected from a sample to infer the status of the population as a whole. If we are interested in describing the characteristics of a country's primary schools, for example, we would not need to examine every school in the country. Depending on our focus, a sample of a hundred schools might be enough. However, when the sample used to collect data is not representative of the population as a whole, significant bias can be introduced into the data. For example, if we only use data from 100 schools in the capital area of the country, our data will not likely be representative of all primary schools in the country.

Drawing a sample that will allow managers to confidently generalize data/findings to the population requires that two basic criteria are met: 1) that all units of a population (e.g., households, schools, enterprises) have an equal chance of being selected for the sample and 2) that the sample is of adequate size. The sample size necessary to ensure that resulting data are representative to any specified degree can vary substantially, depending on the unit of analysis, the size of the population, the variance of the characteristics being tracked, and the number of characteristics that we need to analyze. Moreover, during data collection it is rarely possible to obtain data for every member of an initially chosen sample. Rather, there are established techniques for determining acceptable levels of non-response or for substituting new respondents.
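The first criterion above, an equal chance of selection for every unit, is what simple random sampling provides. The sketch below is illustrative only: the school identifiers, the population of 2,000, and the function name are hypothetical, and it does not address the second criterion (adequate sample size) or the handling of non-response.

```python
import random


def draw_simple_random_sample(units, sample_size, seed=None):
    """Draw a sample without replacement, giving every unit an equal chance."""
    if sample_size > len(units):
        raise ValueError("sample_size cannot exceed the number of units")
    # A seeded generator makes the draw reproducible for later verification.
    rng = random.Random(seed)
    return rng.sample(units, sample_size)


# Hypothetical sampling frame: every primary school in the country,
# not only those in the capital area.
schools = [f"school_{i:04d}" for i in range(1, 2001)]
sample = draw_simple_random_sample(schools, 100, seed=42)
```

The important design point is the sampling frame: the list passed in must cover the whole population of interest. Drawing randomly from a list of capital-area schools alone would still produce the bias described above.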
If a sample is necessary, it is important for managers to consider the sample size and method relative to the data needs. While data validity should always be a concern, there may be situations where accuracy is a particular priority. In these cases, it may be useful to consult a sampling expert to ensure the data are representative.

Non-Sampling Error

Non-sampling error includes poor design of the data collection instrument, poorly trained or partisan enumerators, or the use of questions (often related to sensitive subjects) that elicit incomplete or untruthful answers from respondents. Consider the earlier example:

Result: Increased household income in a target district
Indicator: Value of median household income in the target district

While these data appear to have high face validity, there is the potential for significant measurement error through reporting bias. If households are asked about their income, they might be tempted to under-report income to demonstrate the need for additional assistance (or over-report to demonstrate success). A similar type of reporting bias may occur when data is collected in groups or with observers, as respondents may modify their responses to match group or observer norms. This can be a particular source of bias when collecting data on vulnerable groups. Likewise, survey or interview questions and sequencing should be developed in a way that minimizes the potential for the leading of respondents to predetermined responses. In order to minimize non-sampling measurement error, managers should carefully plan and vet the data collection process with a careful eye towards potential sources of bias.

Minimizing Measurement Error

Keep in mind that USAID is primarily concerned with learning, with reasonable confidence, that anticipated improvements have occurred, not with reducing error below some arbitrary level.2 Since it is impossible to completely eliminate measurement error, and reducing error tends to become increasingly expensive or difficult, it is important to consider what an acceptable level of error would be. Unfortunately, there is no simple standard that can be applied across all of the data collected for USAID's varied programs and results. As performance management plans (PMPs) are developed, teams should:

• Identify the existing or potential sources of error for each indicator and document this in the PMP.
• Assess how this error compares with the magnitude of expected change. If the anticipated change is less than the measurement error, then the data are not valid.
• Decide whether alternative data sources (or indicators) need to be explored as better alternatives or to complement the data to improve data validity.

2 For additional information, refer to Common Problems/Issues with Using Secondary Data in the CDIE Resource Book on Strategic Planning and Performance Monitoring, April 1997.

2. RELIABILITY

Data should reflect stable and consistent data collection processes and analysis methods over time. Reliability is important so that changes in data can be recognized as true changes rather than reflections of poor or changed data collection methods. For example, if we use a thermometer to measure a child's temperature repeatedly and the results vary from 95 to 105 degrees, even though we know the child's temperature hasn't changed, the thermometer is not a reliable instrument for measuring fever. In other words, if a data collection process is unreliable due to changes in the data collection instrument, different implementation across data collectors, or poor question choice, it will be difficult for managers to determine if changes in data over the life of the project reflect true changes or random error in the data collection process. Consider the following examples:

Indicator: Percent increase in income among target beneficiaries.
The first year, the project reports increased total income, including income as a result of off-farm resources. The second year a new manager is responsible for data collection, and only farm-based income is reported. The third year, questions arise as to how "farm-based income" is defined. In this case, the reliability of the data comes into question because managers are not sure whether changes in the data are due to real change or changes in definitions. The following is another example:

Indicator: Increased volume of agricultural commodities sold by farmers.

A scale is used to measure volume of agricultural commodities sold in the market. The scale is jostled around in the back of the truck. As a result, it is no longer properly calibrated at each stop. Because of this, the scale yields unreliable data, and it is difficult for managers to determine whether changes in the data truly reflect changes in volume sold.

What's the Difference Between Validity and Reliability?
Validity refers to the extent to which a measure actually represents what we intend to measure. Reliability refers to the stability of the measurement process. That is, assuming there is no real change in the variable being measured, would the same measurement process provide the same result if the process were repeated over and over?

3. PRECISION

Precise data have a sufficient level of detail to present a fair picture of performance and enable management decision-making. The level of precision or detail reflected in the data should be smaller (or finer) than the margin of error, or the tool of measurement is considered too imprecise. For some indicators, for which the magnitude of expected change is large, even relatively large measurement errors may be perfectly tolerable; for other indicators, small amounts of change will be important and even moderate levels of measurement error will be unacceptable.

Example: The number of politically active nongovernmental organizations (NGOs) is 900.
Preliminary data show that after a few years this had grown to 30,000 NGOs. In this case, a 10 percent measurement error (+/- 3,000 NGOs) would be essentially irrelevant. Similarly, it is not important to know precisely whether there are 29,999 or 30,001 NGOs. A less precise level of detail is still sufficient to be confident in the magnitude of change. Consider an alternative scenario. If the second data point is 1,000, a 10 percent measurement error (+/- 100) would be completely unacceptable because it would represent all of the apparent change in the data.

4. INTEGRITY

Integrity focuses on whether there is improper manipulation of data. Data that are collected, analyzed, and reported should have established mechanisms in place to reduce manipulation. There are generally two types of issues that affect data integrity. The first is transcription error. The second, and somewhat more complex issue, is whether there is any incentive on the part of the data source to manipulate the data for political or personal reasons.

Transcription Error

Transcription error refers to simple data entry errors made when transcribing data from one document (electronic or paper) or database to another. Transcription error is avoidable, and Missions should seek to eliminate any such error when producing internal or external reports and other documents. When the data presented in a document produced by an operating unit are different from the data (for the same indicator and time frame) presented in the original source simply because of data entry or copying mistakes, a transcription error has occurred. Such differences (unless due to rounding) can be easily avoided by careful cross-checking of data against the original source. Rounding may result in a slight difference from the source data but may be readily justified when the underlying data do not support such specificity, or when the use of the data does not benefit materially from the originally reported level of detail.
To illustrate when rounding is acceptable: when making cost or budget projections, we typically round numbers. When we make payments to vendors, we do not round the amount paid in the accounting ledger. Different purposes can accept different levels of specificity.

Technology can help to reduce transcription error. Systems can be designed so that the data source can enter data directly into a database, reducing the need to send in a paper report that is then entered into the system. However, this requires access to computers and reliable internet services. Additionally, databases can be developed with internal consistency or range checks to minimize transcription errors.

The use of preliminary or partial data should not be confused with transcription error. There are times when it makes sense to use partial data (clearly identified as preliminary or partial) to inform management decisions or to report on performance because these are the best data currently available. When preliminary or partial data are updated by the original source, USAID should quickly follow suit, and note that it has done so. Any discrepancy between preliminary data included in a dated USAID document and data that were subsequently updated in an original source does not constitute transcription error.

Manipulation
A somewhat more complex issue is whether data are manipulated. Manipulation should be considered 1) if there may be incentive on the part of those that report data to skew the data to benefit the project or program and managers suspect that this may be a problem, 2) if managers believe that numbers appear to be unusually favorable, or 3) if the data are of high value and managers want to ensure the integrity of the data.

There are a number of ways in which managers can address manipulation. First, simply understand the data collection process.
A well organized and structured process is less likely to be subject to manipulation because each step in the process is clearly documented and handled in a standard way. Second, be aware of potential issues. If managers have reason to believe that data are manipulated, then they should further explore the issues. Managers can do this by periodically spot checking or verifying the data. This establishes a principle that the quality of the data is important and helps to determine whether manipulation is indeed a problem. If there is substantial concern about this issue, managers might conduct a Data Quality Assessment (DQA) for the AO, IR, or specific data in question.

Example: A project assists the Ministry of Water to reduce water loss for agricultural use. The Ministry reports key statistics on water loss to the project. These statistics are critical for the Ministry, the project and USAID to understand program performance. Because of the importance of the data, a study is commissioned to examine data quality and more specifically whether there is any tendency for the data to be inflated. The study finds that there is a very slight tendency to inflate the data, but it is within an acceptable range.

5. TIMELINESS
Data should be available and up to date enough to meet management needs. There are two key aspects of timeliness. First, data must be available frequently enough to influence management decision making. For performance indicators for which annual data collection is not practical, operating units will collect data regularly, but at longer time intervals. Second, data should be current or, in other words, sufficiently up to date to be useful in decision-making. As a general guideline, data should lag no more than three years. Certainly, decisionmaking should be informed by the most current data that are practically available.
Frequently, though, data obtained from a secondary source, and at times even USAID-funded primary data collection, will reflect substantial time lags between initial data collection and final analysis and publication. Many of these time lags are unavoidable, even if considerable additional resources were to be expended. Sometimes preliminary estimates may be obtainable, but they should be clearly flagged as such and replaced as soon as possible as the final data become available from the source.

The following example demonstrates issues related to timeliness:
Result: Primary school attrition in a targeted region reduced.
Indicator: Rate of student attrition at targeted schools.
In August 2009, the Ministry of Education published full enrollment analysis for the 2007 school year. In this case, currency is a problem because there is a 2-year time lag for these data.

While it is optimal to collect and report data based on the U.S. Government fiscal year, there are often a number of practical challenges in doing so. We recognize that data may come from preceding calendar or fiscal years. Moreover, data often measure results for the specific point in time that the data were collected, not from September to September, or December to December. Often the realities of the recipient country context will dictate the appropriate timing of the data collection effort, rather than the U.S. fiscal year. For example, if agricultural yields are at their peak in July, then data collection efforts to measure yields should be conducted in July of each year. Moreover, to the extent that USAID relies on secondary data sources and partners for data collection, we may not be able to dictate exact timing.

ASSESSING DATA QUALITY
Approaches and steps for how to assess data quality are discussed in more detail in TIPS 18: Conducting Data Quality Assessments. USAID policy requires managers to understand the strengths and weaknesses of the data they use on an on-going basis.
In addition, a Data Quality Assessment (DQA) must be conducted at least once every 3 years for those data reported to Washington (ADS 203.3.5.2).

For more information: TIPS publications are available online at [insert website]

Acknowledgements: Our thanks to those whose experience and insights helped shape this publication including Gerry Britan and Subhi Mehdi of USAID’s Office of Management Policy, Budget and Performance (MPBP). This publication was updated by Michelle Adams-Matson of Management Systems International (MSI). Comments regarding this publication can be directed to: Gerald Britan, Ph.D. Tel: (202) 712-1158 [email protected] Contracted under RAN-M-00-04-00049-A-FY0S-84 Integrated Managing for Results II

NUMBER 13 2ND EDITION, 2010 DRAFT
PERFORMANCE MONITORING & EVALUATION TIPS
BUILDING A RESULTS FRAMEWORK

ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to performance monitoring and evaluation. This publication is a supplemental reference to the Automated Directive System (ADS) Chapter 203.

WHAT IS A RESULTS FRAMEWORK?
The Results Framework (RF) is a graphic representation of a strategy to achieve a specific objective that is grounded in cause-and-effect logic. The RF includes the Assistance Objective (AO) and Intermediate Results (IRs), whether funded by USAID or partners, necessary to achieve the objective (see Figure 1 for an example). The RF also includes the critical assumptions that must hold true for the strategy to remain valid. The Results Framework represents a development hypothesis or a theory about how intended change will occur. The RF shows how the achievement of lower level objectives (IRs) leads to the achievement of the next higher order of objectives, ultimately resulting in the AO. In short, a person looking at a Results Framework should be able to understand the basic theory for how key program objectives will be achieved.
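The structure described above (one AO, the IRs beneath it, and a set of critical assumptions) can be pictured as a small tree. The following is a minimal sketch in Python and not a USAID tool: the class names `Result` and `ResultsFramework` and the `outline` method are hypothetical. The result statements and assumptions are taken from the Figure 1 illustration, but the nesting of the two lower IRs under the capital-access IR is an assumption, because the figure's layout is not recoverable from the text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Result:
    """One box in the framework: the AO at the top, IRs beneath it."""
    statement: str
    children: List["Result"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, str]]:
        # Depth-first: a result, then the lower-level results that support it.
        yield depth, self.statement
        for child in self.children:
            yield from child.walk(depth + 1)


@dataclass
class ResultsFramework:
    assistance_objective: Result
    critical_assumptions: List[str] = field(default_factory=list)

    def outline(self) -> List[str]:
        """Render the framework as indented text, one result per line."""
        lines = []
        for depth, statement in self.assistance_objective.walk():
            label = "AO" if depth == 0 else "IR"
            lines.append("  " * depth + f"{label}: {statement}")
        lines.extend(f"Assumption: {a}" for a in self.critical_assumptions)
        return lines


framework = ResultsFramework(
    assistance_objective=Result(
        "Increased Production by Farmers in the Upper River Zone",
        [
            Result(
                "Farmers' Access to Commercial Capital Increased",
                [  # nesting assumed for illustration
                    Result("Farmers' Capacity to Develop Bank Loan Applications Increased"),
                    Result("Banks' Loan Policies Become More Favorable for the Rural Sector"),
                ],
            ),
            Result("Farmers' Transport Costs Decreased"),
        ],
    ),
    critical_assumptions=[
        "Market prices for farmers' products remain stable or increase.",
        "Rainfall and other critical weather conditions remain stable.",
    ],
)

print("\n".join(framework.outline()))
```

Reading the printed outline from the bottom up mirrors the cause-and-effect logic: each indented result is hypothesized to contribute to the result above it, while the assumptions sit outside the tree because they are conditions rather than results.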
The Results Framework is an important tool because it helps managers identify and focus on key objectives within a complex development environment.

A RESULTS FRAMEWORK INCLUDES:
• An Assistance Objective (AO)
• Intermediate Results (IR)
• Hypothesized cause and effect linkages
• Critical Assumptions

What’s the Difference Between a Results Framework and the Foreign Assistance Framework (FAF)?
In one word, accountability. The results framework identifies an objective that a Mission or Office will be held accountable for achieving in a specific country or program environment. The Foreign Assistance Framework outlines broad goals and objectives (e.g. Peace and Security) or, in other words, programming categories. Achievement of Mission or Office AOs should contribute to those broader FAF objectives.

WHY IS THE RESULTS FRAMEWORK IMPORTANT?
The development of a Results Framework represents an important first step in forming the actual strategy. It facilitates analytic thinking and helps program managers gain clarity around key objectives. Ultimately, it sets the foundation not only for the strategy, but also for numerous other management and planning functions downstream, including project design, monitoring, evaluation, and program management. To summarize, the Results Framework:

• Provides an opportunity to build consensus and ownership around shared objectives not only among AO team members but also, more broadly, with host-country representatives, partners, and stakeholders.
• Facilitates agreement with other actors (such as USAID/Washington, other USG entities, the host country, and other donors) on the expected results and resources necessary to achieve those results. The AO is the focal point of the agreement between USAID/Washington and the Mission. It is also the basis for Assistance Agreements (formerly called Strategic Objective Assistance Agreements).
• Functions as an effective communication tool because it succinctly captures the key elements of a program’s intent and content.
• Establishes the foundation to design monitoring and evaluation systems. Information from performance monitoring and evaluation systems should also inform the development of new RFs.
• Identifies the objectives that drive project design.

In order to be an effective tool, a Results Framework should be current. RFs should be revised when 1) results are not achieved or are completed sooner than expected, 2) critical assumptions are no longer valid, 3) the underlying development theory must be modified, or 4) critical problems with policy, operations, or resources were not adequately recognized.

KEY CONCEPTS

THE RESULTS FRAMEWORK IS PART OF A BROADER STRATEGY
While the Results Framework is one of the core elements of a strategy, it alone does not constitute a complete strategy. Typically it is complemented by narrative that further describes the thinking behind the RF, the relationships between the objectives, and the identification of synergies. As a team develops the RF, broader strategic issues should be considered, including the following:
• What has led the team to propose the Results Framework?
• What is strategic about what is being proposed (that is, does it reflect a comparative advantage or a specific niche)?
• What are the main strategic issues?
• What is different in the new strategy when compared to the old?
• What synergies emerge?
• How are cross-cutting issues addressed?
• How can these issues be tackled in project level planning and implementation?

THE UNDERPINNING OF THE RESULTS FRAMEWORK
A good Results Framework is not only based on logic. It draws on analysis, standard theories in a technical sector, and the expertise of on-the-ground managers.
Supporting Analysis
Before developing a Results Framework, the team should determine what analysis exists and what analysis must yet be completed to construct a development hypothesis with a reasonable level of confidence. Evaluations constitute an important source of analysis, identify important lessons from past programs, and may explore the validity of causal linkages that can be used to influence future programming. Analysis of past performance monitoring data is also an important source of information.

[FIGURE 2. SETTING THE CONTEXT FOR PARTICIPATION. Labels: External Forces (Host Country Strategy); USAID Mission/Vision; Internal Capacity; The “Fit”]

Standard Sector Theories
Sectors, particularly those that USAID has worked in for some time, often identify a set of common elements that constitute theories for how to accomplish certain objectives. These common elements form a basic “template” of sorts to consider in developing an RF. For example, democracy and governance experts often refer to addressing supply and demand. Supply represents the ability of government to play its role effectively or provide effective services. Demand represents the ability of civil society to demand or advocate for change. Education generally requires improved quality in teaching and curriculum, community engagement, and adequate facilities. Health often requires improved quality of services, as well as access to, and greater awareness of, those services. An understanding of these common strategic elements is useful because they lay out a standard set of components that a team must consider in developing a good RF. Although not all of these elements will apply to all countries in the same way, they form a starting point to inform the team’s thinking. As the team makes decisions about what (or what not) to address, this becomes a part of the logic that is presented in the narrative. Technical experts can assist teams in understanding standard sector theories.
In addition, a number of USAID publications outline broader sector strategies or provide guidance on how to develop strategies in particular technical areas.¹

¹ Examples include: Hansen, Gary. 1996. Constituencies for Reform: Strategic Approaches for Donor-Supported Civic Advocacy Groups or USAID. 2008. Securing the Future: A Strategy for Economic Growth.

On-the-Ground Knowledge and Experience
Program managers are an important source of knowledge on the unique program or in-country factors that should be considered in the development of the Results Framework. They are best able to examine different types of information, including analyses and standard sector theories, and tailor a strategy for a specific country or program environment.

PARTICIPATION AND OWNERSHIP
Development of a Results Framework presents an important opportunity for USAID to engage its own teams, the host country, civil society, other donors, and other partners in defining program objectives. Experience has shown that a Results Framework built out of a participatory process results in a more effective strategy. Recent donor commitments to the Paris Declaration and the Accra Agenda for Action reinforce these points. USAID has agreed to increase ownership, align systems with country-led strategies, use partner systems, harmonize aid efforts, manage for development results, and establish mutual accountability.

Common questions include, “how do we manage participation?” or “how do we avoid raising expectations that we cannot meet?” One approach for setting the context for effective participation is to simply set expectations with participants before engaging in strategic discussions. In essence, USAID is looking for the “strategic fit” (see Figure 2). That is, USAID seeks the intersection between what the host country wants, what USAID is capable of delivering, and the vision for the program.
WHOLE-OF-GOVERNMENT APPROACHES
Efforts are underway to institute planning processes that take into account the U.S. Government’s overall approach in a particular country. A whole-of-government approach may identify larger goals or objectives to which many USG entities contribute. Essentially, those objectives would be at a higher level or above the level of accountability of any one USG agency alone. USAID Assistance Objectives should clearly contribute to those larger goals, but also reflect what the USAID Mission can be held accountable for within a specified timeframe and within budget parameters. The whole-of-government approach may be reflected at a lower level in the Results Framework as well. The RF provides flexibility to include the objectives of other actors (whether other USG entities, donors, the host country, or other partners) where the achievement of those objectives is essential for USAID to achieve its AO. For example, if a program achieves a specific objective that contributes to USAID’s AO, it should be reflected as an IR. This can facilitate greater coordination of efforts.

GUIDELINES FOR CONSTRUCTING AOs AND IRs
AOs and IRs should be:

Results Statements. AOs and IRs should express an outcome. In other words, the results of actions, not the actions or processes themselves. For example, the statement “increased economic growth in target sectors” is a result, while the statement “increased promotion of market-oriented policies” is more process oriented.

Clear and Measurable. AOs and IRs should be stated clearly and precisely, and in a way that can be objectively measured. For example, the statement “increased ability of entrepreneurs to respond to an improved policy, legal, and regulatory environment” is both ambiguous and subjective. How one defines or measures “ability to respond” to a changing policy environment is unclear and open to different interpretations. A more precise and measurable results statement in this case is “increased level of investment.” It is true that USAID often seeks results that are not easily quantified. In these cases, it is critical to define what exactly is meant by key terms. For example, what is meant by “improved business environment”? As this is discussed, appropriate measures begin to emerge.

Unidimensional. AOs or IRs ideally consist of one clear overarching objective. The Results Framework is intended to represent a discrete hypothesis with cause-and-effect linkages. When too many dimensions are included, that function is lost because lower level results do not really “add up” to higher level results. Unidimensional objectives permit a more straightforward assessment of performance. For example, the statement “healthier, better educated, higher-income families” is an unacceptable multidimensional result because it includes diverse components that may not be well-defined and may be difficult to manage and measure. There are limited exceptions. It may be appropriate for a result to contain more than one dimension when the result is 1) achievable by a common set of mutually-reinforcing Intermediate Results or 2) implemented in an integrated manner (ADS 201.3.8).

THE LINKAGE TO PROJECTS
The RF should form the foundation for project planning. Project teams may continue to flesh out the Results Framework in further detail or may use the Logical Framework.² Either way, all projects and activities should be designed to accomplish the AO and some combination of one or more IRs.

² The Logical Framework (or logframe for short) is a project design tool that complements the Results Framework. It is also based on cause-and-effect linkages. For further information reference ADS 201.3.11.8.

THE PROCESS FOR DEVELOPING A RESULTS FRAMEWORK

SETTING UP THE PROCESS
Missions may use a variety of approaches to develop their respective results frameworks. In setting up the process, consider the following three questions.
When should the results frameworks be developed? It is often helpful to think about a point in time at which the team will have enough analysis and information to confidently construct a results framework.

Who is going to participate (and at what points in the process)? It is important to develop a schedule and plan out the process for engaging partners and stakeholders. There are a number of options (or a combination) that might be considered:
• Invite key partners or stakeholders to results framework development sessions. If this is done, it may be useful to incorporate some training on the results framework methodology in advance. Figure 3 outlines the basic building blocks and defines terms used in strategic planning across different organizations.
• The AO team may develop a preliminary results framework and hold sessions with key counterparts to present the draft strategy and obtain feedback.
• Conduct a strategy workshop for AO teams to present their RFs and discuss strategic issues.
Although these options require some time and effort, the results framework will be more complete and representative.

What process and approach will be used to develop the results frameworks? We strongly recommend that the AO team hold group sessions to construct the results framework. It is often helpful to have one person (preferably with experience in strategic planning and facilitation) lead these sessions. This person should focus on drawing out the ideas of the group and translating them into the results framework.

STEP 1. IDENTIFY THE ASSISTANCE OBJECTIVE
The Assistance Objective (AO) is the center point for any results framework and is defined as: The most ambitious result (intended measurable change) that a USAID Mission/Office, along with its partners, can materially affect, and for which it is willing to be held accountable (ADS 201.3.8). Defining an AO at an appropriate level of impact is one of the most critical and difficult tasks a team faces.
The AO forms the standard by which the Mission or Office is willing to be judged in terms of its performance. The concept of “managing for results” (a USAID value also reflected in the Paris Declaration) is premised on this idea.

“It is critical to stress the importance of not rushing to finalize a results framework. It is necessary to take time for the process to mature and to be truly participative.” (USAID staff member in Africa)

The task can be challenging, because an AO should reflect a balance of two conflicting considerations: ambition and accountability. On the one hand, every team wants to deliver significant impact for a given investment. On the other hand, there are a number of factors outside the control of the team. In fact, as one moves up the Results Framework toward the AO, USAID is more dependent on other development partners to achieve the result. Identifying an appropriate level of ambition for an AO depends on a number of factors and will be different for each country context. For example, in one country it may be appropriate for the AO to be “increased use of family planning methods” while in another, “decreased total fertility” (a higher level objective) would be more suitable. Where to set the objective is influenced by the following factors:
• Programming history. There are different expectations for more mature programs, where higher level impacts and greater sustainability are expected.
• The magnitude of the development problem.
• The timeframe for the strategy.
• The range of resources available or expected.

[Figure 3. Results Framework Logic. Labels: So What?; How?; Necessary and Sufficient]

The AO should represent the team’s best assessment of what can realistically be achieved. In other words, the AO team should be able to make a plausible case that the appropriate analysis has been done and the likelihood of success is great enough to warrant investing resources in the AO.

STEP 2.
IDENTIFY INTERMEDIATE RESULTS
After agreeing on the AO, the team must identify the set of “lower level” Intermediate Results necessary to achieve the AO. An Intermediate Result is defined as: An important result that is seen as an essential step to achieving a final result or outcome. IRs are measurable results that may capture a number of discrete and more specific results (ADS 201.3.8.4).

As the team moves down from the AO to IRs, it is useful to ask “how” can the AO be achieved? By answering this question, the team begins to formulate the IRs (see Figure 3). The team should assess relevant country and sector conditions and draw on development experience in other countries to better understand the changes that must occur if the AO is to be attained.

The Results Framework methodology is sufficiently flexible to allow the AO team to include Intermediate Results that are supported by other actors when they are relevant and critical to achieving the AO. For example, if another donor is building schools that are essential for USAID to accomplish an education AO (e.g. increased primary school completion), then that should be reflected as an IR because it is a necessary ingredient for success.

Initially, the AO team might identify a large number of possible results relevant to the AO. However, it is important to eventually settle on the critical set of Intermediate Results. There is no set number for how many IRs (or levels of IRs) are appropriate. The number of Intermediate Results will vary with the scope and complexity of the AO. Eventually, the team should arrive at a final set of IRs that members believe are reasonable. It is customary for USAID Missions to submit a Results Framework with one or two levels of IRs to USAID/Washington for review. The key point is that there should be enough information to adequately convey the development hypothesis.

So What is Causal Logic Anyway?
Causal logic is based on the concept of cause-and-effect. That is, the accomplishment of lower-level objectives “causes” the next higher-level objective (or the effect) to occur. In the following example, the hypothesis is that if IR 1, 2, and 3 occur, it will lead to the AO.
AO: Increased Completion of Primary School
IR 1: Improved Quality of Teaching
IR 2: Improved Curriculum
IR 3: Increased Parental Commitment to Education

STEP 3. CLARIFY THE RESULTS FRAMEWORK LOGIC
Through the process of identifying Intermediate Results, the team begins to construct the cause-and-effect logic that is central to the Results Framework. Once the team has identified the Intermediate Results that support an objective, it must review and confirm this logic.

The accomplishment of lower level results, taken as a group, should result in the achievement of the next higher objective. As the team moves up the Results Framework, they should ask, “so what?” If we accomplish these lower level objectives, is something of significance achieved at the next higher level? The higher-order result establishes the “lens” through which lower-level results are viewed. For example, if one IR is “Increased Opportunities for Out-of-School Youth to Acquire Life Skills,” then, by definition, all lower level IRs would focus on the target population established (out-of-school youth).

As the team looks across the Results Framework, it should ask whether the Intermediate Results are necessary and sufficient to achieve the AO.

Results Framework logic is not always linear. There may be relationships across results or even with other AOs. This can sometimes be demonstrated on the graphic (e.g., through the use of arrows or dotted boxes with some explanation) or simply in the narrative. In some cases, teams find a number of causal connections in an RF. However, teams have to find a balance between the two extremes: on the one hand, where logic is too simple and linear and, on the other, a situation where all objectives are related to all others.

STEP 4.
IDENTIFY CRITICAL ASSUMPTIONS
The next step is to identify the set of critical assumptions that are relevant to the achievement of the AO. A critical assumption is defined as: “… a general condition under which the development hypothesis will hold true. Critical assumptions are outside the control or influence of USAID and its partners (in other words, they are not results), but they reflect conditions that are likely to affect the achievement of results in the Results Framework. Critical assumptions may also be expressed as risks or vulnerabilities …” (ADS 201.3.8.3)

Identifying critical assumptions, assessing associated risks, and determining how they should be addressed is a part of the strategic planning process. Assessing risk is a matter of balancing the likelihood that the critical assumption will hold true with the ability of the team to address the issue. For example, consider the critical assumption “adequate rainfall.” If this assumption has held true for the target region only two of the past six years, the risk associated with this assumption is so great that it poses a risk to the strategy. In cases like this, the AO team should attempt to identify ways to actively address the problem. For example, the team might include efforts to improve water storage or irrigation methods, or increase use of drought-resistant seeds or farming techniques. This would then become an IR (a specific objective to be accomplished by the program) rather than a critical assumption. Another option for the team is to develop contingency plans for the years when a drought may occur.

What is NOT Causal Logic?
Categorical Logic. Lower level results are simply sub-categories rather than cause and effect, as demonstrated in the example below.
AO: Increased Completion of Primary School
IR 1: Improved Pre-Primary School
IR 2: Improved Primary Education
IR 3: Improved Secondary Education
Definitional Logic. Lower-level results are a restatement (or further definition) of a higher-level objective. The use of definitional logic results in a problem later when identifying performance indicators because it is difficult to differentiate indicators at each level.
IR: Strengthened Institution
IR: Institutional Capacity to Deliver Goods & Services

STEP 5. COMPLETE THE RESULTS FRAMEWORK
As a final step, the AO team should step back from the Results Framework and review it as a whole. The RF should be straightforward and understandable. Check that the results contained in the RF are measurable and feasible with anticipated USAID and partner resource levels. This is also a good point at which to identify synergies between objectives and across AOs.

STEP 6. IDENTIFY PRELIMINARY PERFORMANCE MEASURES
Agency policies (ADS 201.3.8.6) require that the AO team present proposed indicators for the AO with baseline data and targets. The AO, along with indicators and targets, represents the specific results that will be achieved vis-a-vis the investment. To the extent possible, indicators for IRs with baseline and targets should be included as well.

Figure 1. Illustrative Results Framework
Critical Assumptions
1. Market prices for farmers’ products remain stable or increase.
2. Prices of agricultural inputs remain stable or decrease.
3. Roads needed to get produce to market are maintained.
4. Rainfall and other critical weather conditions remain stable.
AO: Increased Production by Farmers in the Upper River Zone
IR: Farmers’ Access to Commercial Capital Increased
IR: Farmers’ Capacity to Develop Bank Loan Applications Increased (4 years)
IR: Banks’ Loan Policies Become More Favorable for the Rural Sector (3 years)
IR: Farmers’ Transport Costs Decreased
IR: Additional Local Wholesale Market Facilities Constructed (with the World Bank)
IR: Farmers’ Knowledge About Effective Production Methods Increased
IR: Village Associations Capacity to Negotiate Contracts Increased (4 years)
IR: New Technologies Available (World Bank)
IR: Farmers’ Exposure to On-Farm Experiences of Peers Increased
Key: USAID Responsible; Partner(s) Responsible; USAID + Partner(s) Responsible

Figure 3. The Fundamental Building Blocks for Planning
ASSISTANCE OBJECTIVE (AO): The highest level objective for which USAID is willing to be held accountable. AOs may also be referred to as outcomes, impacts, or results. Example: Increased Primary School Completion
INTERMEDIATE RESULTS (IRs): Interim events, occurrences, or conditions that are essential for achieving the AO. IRs may also be referred to as outcomes or results. Example: Teaching Skills Improved
OUTPUT: Products or services produced as a result of internal activity. Example: Number of teachers trained
INPUT: Resources used to produce an output. Example: Funding or person days of training

Figure 4. Sample Results Framework and Crosswalk of FAF Program Hierarchy and a Results Framework

Illustrative Results Framework for Program Planning
Assistance Objective: Economic Competitiveness of Private Enterprises Improved
IR 1: Enabling Environment for Enterprises Improved
IR 1.1 Licensing and registration requirements for enterprises streamlined
IR 1.2 Commercial laws that support market-oriented transactions promoted
IR 1.3 Regulatory environment for micro and small enterprises improved
IR 2: Private Sector Capacity Strengthened
IR 2.1 Competitiveness of targeted enterprises improved
IR 2.2 Productivity of microenterprises in targeted geographic regions increased
IR 2.3 Information Exchange Improved

F Program Hierarchy for Budgeting and Reporting
The Illustrative Results Framework links to the FAF Program Hierarchy as follows:
• Objective 4 Economic Growth
• Program Areas 4.6 (Private Sector Competitiveness) and 4.7 (Economic Opportunity)
• Program Elements 4.6.1, 4.6.2, 4.7
• Sub-Elements 4.6.12 and 4.7.2.1
• Sub-Element 4.6.1.3
• Sub-Element 4.7.2.2
• Sub-Element 4.6.2.1
• Sub-Element 4.7.3
• Sub-Element 4.6.2.4

Critical Assumptions:
• Key political leaders, including the President and the Minister of Trade and Labor, will continue to support policy reforms that advance private enterprise-led growth.
• Government will sign the Libonia Free Trade Agreement, which will open up opportunities for enterprises targeted under IR 2.1.

Note: The arrows demonstrate the linkage of AO1, IR 1, and IR 1.1 to the FAF. As an example, IR1 links to the program element 4.6.1 “Business Enabling Environment”. IR 1.1 links to 4.7.2.1 “Reduce Barriers to Registering Micro and Small Business”.

For more information: TIPS publications are available online at [insert website].

Acknowledgements: Our thanks to those whose experience and insights helped shape this publication including Gerry Britan and Subhi Mehdi of USAID’s Office of Management Policy, Budget and Performance (MPBP).
This publication was updated by Michelle Adams-Matson, of Management Systems International.

Comments can be directed to: Gerald Britan, Ph.D. Tel: (202) 712-1158 [email protected]

Contracted under RAN-M-00-04-00049-A-FY0S-84 Integrated Managing for Results II

NUMBER 15 2011 Printing
PERFORMANCE MONITORING & EVALUATION TIPS
MEASURING INSTITUTIONAL CAPACITY

ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to performance monitoring and evaluation. This publication is a supplemental reference to the Automated Directive Service (ADS) Chapter 203.

INTRODUCTION

This PME Tips gives USAID managers information on measuring institutional capacity,* including some tools that measure the capacity of an entire organization as well as others that look at individual components or functions of an organization. The discussion concentrates on the internal capacities of individual organizations, rather than on the entire institutional context in which organizations function. This Tips is not about how to actually strengthen an institution, nor is it about how to assess the eventual impact of an organization’s work. Rather, it is limited to a specific topic: how to measure an institution’s capacities.

It addresses the following questions:
• Which measurement approaches are most useful for particular types of capacity building?
• What are the strengths and limitations of each approach with regard to internal bias, quantification, or comparability over time or across organizations?
• How will the data be collected and how participatory can and should the measurement process be?

Measuring institutional capacity might be one important aspect of a broader program in institutional strengthening; it may help managers make strategic, operational, or funding decisions; or it may help explain institutional strengthening activities and related performance. Whatever the reason for assessing institutional capacity, this Tips presents managers with several tools for identifying institutional strengths and weaknesses.

The paper will define and discuss capacity assessment in general and present several approaches for measuring institutional capacity. We assess the measurement features of each approach to help USAID managers select the tool that best fits their diverse management and reporting needs. The paper is organized as follows:
1. Background: Institutional Capacity Building and USAID
2. How to Measure Institutional Capacity
3. Measurement Issues
4. Institutional Assessment Tools
5. Measuring Individual Organizational Components
6. Developing Indicators
7. Practical Tips for a Busy USAID Manager

BACKGROUND: INSTITUTIONAL CAPACITY BUILDING AND USAID

USAID operating units must work closely with partner and customer organizations to meet program objectives across all Agency goal areas, among them Peace and Security, Governing Justly and Democratically, Economic Growth, Investing in People, and Humanitarian Assistance. In the course of planning, implementing, and measuring their programs, USAID managers often find that a partner or customer organization’s lack of capacity stands in the way of achieving results. Increasing the capacity of partner and customer organizations helps them carry out their mandate effectively and function more efficiently. Strong organizations are more able to accomplish their mission and provide for their own needs in the long run.

USAID operating units build capacity with a broad spectrum of partner and customer organizations. These include but are not limited to:
• American private voluntary organizations (PVOs)
• Local and international nongovernmental organizations (NGOs) and other civil society organizations (CSOs)
• Community-based membership cooperatives, such as a water users group
• Networks and associations of organizations
• Political parties
• Government entities (ministries, departments, agencies, subunits, policy analysis units, health clinics, schools)
• Private sector organizations (financial institutions, companies, small businesses and other for-profit organizations)
• Regional institutions

The Agency uses a variety of techniques to build organizational capacity. The most common involve providing technical assistance, advisory services, and long-term consultants to organizations, to help them build the skills and experience necessary to contribute successfully to sustainable development. Other techniques include providing direct inputs, such as financial, human, and technological resources. Finally, USAID helps establish mentoring relationships; provides opportunities for formal study in-country, in the United States, or in third countries; and sets up internships or apprenticeships with other organizations. The goal of strengthening an institution is usually to improve the organization’s overall performance and viability by improving administrative and management functions, increasing the effectiveness of service provision, enhancing the organization’s structure and culture, and furthering its sustainability. Institutional strengthening programs may address one or more of these components.

In most cases, USAID managers are concerned with institutional strengthening because they are interested in the eventual program-level results (and the sustainability of these results) that these stronger organizations can help achieve. While recognizing the need to address eventual results, this Tips looks primarily at ways to measure institutional capacity. Understanding and measuring institutional capacity are critical and often more complex than measuring the services and products an organization delivers.

Measuring organizational capacity is important because it both guides USAID interventions and allows managers to demonstrate and report on progress. The data that emerge from measuring institutional capacity are commonly used in a number of valuable ways. These data establish baselines and provide the basis for setting targets for improvements. They help explain where or why something is going wrong; they identify changes to specific program interventions and activities that address areas of poor performance; they inform managers of the impact of an intervention or the effectiveness of an intervention strategy; and they identify lessons learned. They are also useful for reporting to Washington and to partners.

It is important to note the difference between assessing capacity for contracting and grantmaking decisions versus for a “capacity building” relationship with partner/customer organizations. A USAID manager may want to assess the capacity of an organization to help make decisions about awarding grants or holding grantees accountable for results. In this case, the assessment is more of an external oversight/audit of an organization hired to carry out Agency programs. Or, the manager may have a programmatic commitment to strengthen the abilities of customer and partner organizations. Different tools and methods are available for both situations. This paper deals primarily with programs that fit the latter description.

Within USAID, the former Office of Private and Voluntary Cooperation (PVC) took the lead on building the capacity of nongovernmental organization (NGO) and private voluntary organization (PVO) partners. PVC has defined development objectives and intermediate results aimed specifically at improving the internal capacity of U.S. PVOs. PVC has studied different approaches to institutional capacity building and has begun to develop a comprehensive capacity assessment tool called discussion-oriented organizational self-assessment (DOSA), described in example 1 in this paper. In addition to DOSA, PVC has developed several indicators for measuring institutional capacity development.

PVC specifically targets NGOs and PVOs and is particularly concerned with enhancing partnerships. USAID missions, by contrast, work with a broader range of organizations on activities aimed at increasing institutional capacity. Such programs usually view institutional capacity as a means to achieve higher level program results, rather than as an end in itself.

HOW TO MEASURE INSTITUTIONAL CAPACITY

An organization can be thought of as a system of related components that work together to achieve an agreed-upon mission. The following list of organizational components is not all-inclusive, nor does it apply universally to all organizations. Rather, the components are representative of most organizations involved in development work and will vary according to the type of organization and the context in which it functions.

Administrative and Support Functions
• Administrative procedures and management systems
• Financial management (budgeting, accounting, fundraising, sustainability)
• Human resource management (staff recruitment, placement, support)
• Management of other resources (information, equipment, infrastructure)

Technical/Program Functions
• Service delivery system
• Program planning
• Program monitoring and evaluation
• Use and management of technical knowledge and skills

Structure and Culture
• Organizational identity and culture
• Vision and purpose
• Leadership capacity and style
• Organizational values
• Governance approach
• External relations

Resources
• Human
• Financial
• Other

MEASUREMENT ISSUES

This TIPS presents capacity-assessment tools and other measurement approaches that, while similar in some ways, vary in both their emphasis and their method for evaluating an organization’s capacity. Some use scoring systems and others don’t; some use questionnaires while others employ focus groups; some use external evaluators, and others use self-assessments; some emphasize problem solving, while others concentrate on appreciating organizational strengths. Some tools can be used to measure the same standard across many organizations, while others are organization specific. Many of the tools are designed so that the measurement process is just as important as, if not more important than, the resulting information. They may involve group discussions, workshops, or exercises, and may explicitly attempt to be participatory. Such tools try to create a learning opportunity for the organization’s members, so that the assessment itself becomes an integral part of the capacity-building effort.

Because of each user’s different needs, it would be difficult to use this TIPS as a screen to predetermine the best capacity-assessment tool for each situation. Rather, managers are encouraged to adopt the approaches most appropriate to their program and to adapt the tools best suited for local needs. To assist managers in identifying the most useful tools and approaches, we consider the following issues for each of the tools presented:

• Type of organization measured. Many of the instruments developed to measure institutional capacity are designed specifically for measuring NGOs and PVOs. Most of these can be adapted easily for use with other types of organizations, including government entities.

• Comparability across organizations. To measure multiple organizations, to compare them with each other, or to aggregate the results of activities aimed at strengthening more than one organization, the tool used should measure the same capacity areas for all the organizations and use the same scoring criteria and measurement processes. Note, however, that a standard tool, applied to diverse organizations, is less able to respond to specific organizational or environmental circumstances. This is less of a problem if a group of organizations, using the same standard tool, has designed its diagnostic instrument together (see the following discussion of PROSE).

• Comparability over time. In many cases, the value of measuring institutional capacity lies in the ability to track changes in one organization over time. That requires consistency in method and approach. A measurement instrument, once selected and adapted to the needs of a particular organization, must be applied the same way each time it is used. Otherwise, any shifts that are noted may reflect a change in the measurement technique rather than an actual change in the organization.

• Data collection. Data can be collected in a variety of ways: questionnaires, focus groups, interviews, document searches, and observation, to name only some. Some methods are hands-on and highly participatory, involving a wide range of customers, partners, and stakeholders, while others are more exclusive, relying on the opinion of one or two specialists. In most cases, it is best to use more than one data collection method.

• Objectivity. By their nature, measures of institutional capacity are subjective. They rely heavily on individual perception, judgment, and interpretation. Some tools are better than others at limiting this subjectivity. For instance, they balance perceptions with more empirical observations, or they clearly define the capacity area being measured and the criteria against which it is being judged. Nevertheless, users of these tools should be aware of the limitations to the findings.

• Quantification. Using numbers to represent capacity can be helpful when they are recognized as relative and not absolute measures. Many tools for measuring institutional capacity rely on ordinal scales. Ordinal scales are scales in which values can be ranked from high to low or more to less in relation to each other. They are useful in ordering by rank along a continuum, but they can also be misleading. Despite the use of scoring criteria and guidelines, one person’s “3” may be someone else’s “4.” In addition, ordinal scales do not indicate how far apart one score is from another. (For example, is the distance between “agree” and “strongly agree” the same as the distance between “disagree” and “strongly disagree”?) Qualitative descriptions of an organization’s capacity level are a good complement to ordinal scales.

• Internal versus external assessments. Some tools require the use of external facilitators or assessors; others offer a process that the organization itself can follow. Both methods can produce useful data, and neither is automatically better than the other. Internal assessments can facilitate increased management use and better understanding of an assessment’s findings, since the members of the organization themselves are carrying out the assessment. By contrast, the risk of bias and subjectivity is higher in internal assessments. External assessments may be more objective. They are less likely to introduce internal bias and can make use of external expertise. The downside is that external assessors may be less likely to uncover what is really going on inside an organization.

• Practicality. The best measurement systems are designed to be as simple as possible: not too time consuming, not unreasonably costly, yet able to provide managers with good information often enough to meet their management needs. Managers should take practicality into account when selecting a measurement tool. They should consider the level of effort and resources required to develop the instrument and collect and analyze the data, and think about how often and at what point during the management cycle the data will be available to managers.

INSTITUTIONAL ASSESSMENT TOOLS

This section describes capacity measurement tools that USAID and other development organizations use. You can find complete references and Web sites in the resources section at the end of the paper. For each tool, we follow the same format:
• Background of the methodology/tool
• Process (how the methodology/tool is used in the field)
• Product (the types of outputs expected)
• Assessment (a discussion of the uses and relative strengths of each methodology/tool)
• An example of what the methodology/tool looks like

PARTICIPATORY, RESULTS-ORIENTED SELF-EVALUATION

Background

The participatory, results-oriented self-evaluation (PROSE) method was developed by Evan Bloom of Pact and Beryl Levinger of the Education Development Center. It has the dual purpose of both assessing and enhancing organizational capacities. The PROSE method produces an assessment tool customized to the organizations being measured. It is designed to compare capacities across a set of peer organizations, called a cohort group, which allows for benchmarking and networking among the organizations. PROSE tools measure and profile organizational capacities and assess, over time, how strengthening activities affect organizational capacity. In addition, through a facilitated workshop, PROSE tools are designed to allow organizations to build staff capacity; create consensus around future organizational capacity-building activities; and select, implement, and track organizational change and development strategies.

One example of an instrument developed using the PROSE method is the discussion-oriented organizational self-assessment. DOSA was developed in 1997 for the Office of Private and Voluntary Cooperation and was designed specifically for a cohort of USAID PVO grantees.
Participatory, Results-Oriented Self-Evaluation
Type of Organization Measured: NGOs/PVOs; adaptable to other types of organizations
Features
• Cross-organizational comparisons can be made
• Measures change in one organization or a cohort of organizations over time
• Measures well-defined capacity areas against well-defined criteria
• Assessment based primarily upon perceived capacities
• Produces numeric score on capacity areas
• Assessment should be done with the help of an outside facilitator or trained insider
• Data collected through group discussion and individual questionnaires given to a cross-section of the organization’s staff

Process

Developers of the PROSE method recommend that organizations participate in DOSA or develop a customized DOSA-like tool to better fit their organization’s specific circumstances. The general PROSE process for developing such a tool is as follows: After a cohort group of organizations is defined, the organizations meet in a workshop setting to design the assessment tool. With the help of a facilitator, they begin by pointing to the critical organizational capacities they want to measure and enhance. The cohort group then develops two sets of questions: discussion questions and individual questionnaire items. The discussion questions are designed to get the group thinking about key issues. Further, these structured discussion questions minimize bias by pointing assessment team members toward a common set of events, policies, or conditions. The questionnaire items then capture group members’ assessments of those issues on an ordinal scale. During the workshop, both sets of questions are revised until the cohort group is satisfied. Near the end of the process, tools or standards from similar organizations can be introduced to check the cohort group’s work against an external example. If the tool is expected to compare several organizations within the same cohort group, the tool must be implemented by facilitators trained to administer it effectively and consistently across the organizations.

Once the instrument is designed, it is applied to each of the organizations in the cohort. In the case of DOSA, the facilitator leads a team of the organization’s members through a series of group discussions interspersed with individual responses to 100 questionnaire items. The team meets for four to six hours and should represent a cross-functional, cross-hierarchical sample from the organization. Participants respond anonymously to a questionnaire, selecting the best response to statements about the organization’s practices (1=strongly disagree, 2=disagree, 3=neutral, 4=agree, 5=strongly agree) in six capacity areas:
• External Relations (constituency development, fund-raising, and communications)
• Financial Resource Management (budgeting, forecasting, and cash management)
• Human Resource Management (staff training, supervision, and personnel practices)
• Organizational Learning (teamwork and information sharing)
• Strategic Management (planning, governance, mission, and partnering)
• Service Delivery (field-based program practices and sustainability issues)

Although the analysis is statistically complex, questionnaires can be scored and graphics produced using instructions provided on the DOSA Web site. In the case of DOSA, the DOSA team in Washington processes the results and posts them on the Internet. The assessment tool can be readministered annually to monitor organizational changes.

Example 1. Excerpt From DOSA, a PROSE Tool

The DOSA questionnaire can be found in annex 1a

The following is a brief example drawn from the Human Resource Management section of the DOSA questionnaire:

Discussion Questions
a. When was our most recent staff training?
b. How often over the last 12 months have we held staff training events?

Questionnaire items for individual response
(1=Strongly Disagree, 2=Disagree, 3=Neutral, 4=Agree, 5=Strongly Agree)
1. We routinely offer staff training.   1   2   3   4   5

Discussion Questions
a. What are three primary, ongoing functions (e.g., monitoring and evaluation, proposal writing, resource mobilization) that we carry out to achieve our mission?
b. To what extent does staff, as a group, have the requisite skills to carry out these functions?
c. To what extent is the number of employees carrying out these functions commensurate with work demands?

Questionnaire items for individual response
(1=Strongly Disagree, 2=Disagree, 3=Neutral, 4=Agree, 5=Strongly Agree)
2. We have the appropriate staff skills to achieve our mission.   1   2   3   4   5
3. We have the appropriate staff numbers to achieve our mission.   1   2   3   4   5

*The annexes for this paper are available separately and can be obtained through the USAID Development Experience Clearinghouse at http://dec.usaid.gov/index.cfm

Product

PROSE instruments produce two types of scores and accompanying graphics. The first is a capacity score, which indicates how an organization perceives its strengths and weaknesses in each of the capacity and subcapacity areas. The second is a consensus score, which shows the degree to which the assessment team members agree on their evaluation of the organization’s capacity.

Assessment

Unless the existing DOSA questions are used, developing a PROSE instrument from scratch can be time consuming and generally requires facilitators to guide the process of developing and using the instrument. PROSE, like most other such instruments, is based on perceived capacities and does not currently include a method for measuring externally observable performance in various capacity areas (although this is under consideration). It is unique among the instruments in this paper in its use of a consensus score. The consensus score acts as a check on the perceived capacities reported by individual organizational members. It also helps identify capacity areas that all members agree need immediate attention.

Because the cohort organizations develop the specifics of the instrument together and share a common understanding and application of the approach, PROSE is relatively good at comparing organizations with each other or rolling up results to report on a group of organizations together. However, the discussions could influence the scoring if facilitators are not consistent in their administration of the tool.

INSTITUTIONAL DEVELOPMENT FRAMEWORK

Background

The institutional development framework (IDF) is a tool kit developed by Mark Renzi of Management Systems International. It has been used in USAID/Namibia’s Living in a Finite Environment project as well as several other USAID programs. Designed specifically to help nonprofit organizations improve efficiency and become more effective, the IDF is best suited for the assessment of a single organization, rather than a cohort group (as opposed to PROSE). The kit contains three tools (Institutional Development Framework, Institutional Development Profile, and Institutional Development Calculation Sheet), which help an organization determine where it stands on a variety of organizational components, identify priority areas of improvement, set targets, and measure progress over time. While it can be adapted for any organization, the IDF was originally formulated for environmental NGOs.

Process

An organization can use the IDF tools either with or without the help of a facilitator.
The IDF identifies five organizational capacity areas, 9 Institutional Development Framework Type of Organization Measured NGOs/PVOs; adaptable to other types of organizations Features • Can be used, with limitations, to compare across organizations • Measures change in the same organization over time • Measures well-defined capacity areas against well-defined criteria • Assessment based primarily upon perceived capacities • Produces numeric score on capacity areas • Produces qualitative description of an organization’s capacity in terms of developmental stages • Assessment can be done internally or with help of an outside facilitator • • Data collected through group discussion with as many staff as feasible called resource characteristics. Each capacity (public relations, ability to work with local area is further broken down into key compo- communities, ability to work with government nents, including: bodies, ability to work with other NGOs) Each key component within a capacity area is • Oversight/Vision rated at one of four stages along an organiza(board, mission, autonomy) tional development continuum (1= start up, 2= development, 3= expansion/consolidation, and • Management Resources 4= sustainability). IDF offers criteria describing (leadership style, participatory managment, each stage of development for each of the key management systems, planning, community components (see example 2 below). participation, monitoring, evaluation) Different processes can be used depending on • Human Resources the organization’s size and the desired out(staff skills, staff development, organizational come. Small organizations usually involve as diversity) many staff as possible; larger organizations may work in small groups or use a few key infor• Financial Resources mants. Members of the organization can modify (financial management, financial vulnerability, the Institutional Development Framework to fit financial solvency) their organization. 
Nonapplicable areas can be ignored and new areas can be added, although • External Resources the creator of the tool warns against complete10 ly rewriting the criteria. Through discussion, the participating members then use the criteria to determine where along the development continuum their organization is situated for each component. The resulting graphic, the Institutional Development Profile (IDP), uses bars or “x”s to show where the organization ranks on each key component.Through a facilitated meeting or group discussion, organization members then determine which areas of organizational capacity are most important to the organization and which need priority attention for improvement. Using the IDP, they can visually mark their targets for the future. The IDF also provides numeric ratings. Each key component can be rated on a scale of 1 to 4, and all components can be averaged together to provide a summary score for each capacity area. This allows numeric targets to be set and monitored. The Institutional Development Calculation Sheet is a simple table that permits the organization to track progress over time by recording the score of each component along the development continuum. Example 2. Excerpt From the IDF Tool The following is an excerpt from the Financial Management section of the Institutional Development Framework. The entire framework appears in annex 2. Resource Key Characteristic Component Financial Management Budget as Management Tools Cash Controls Financial Security Criteria for Each Progressive Stage (the Development Continuum) Start Up Development Expansion and Consolidation 1 2 3 Total expendiBudgets are Budgets are not used as developed for ture is usually within 20% of management project actools. tivities, but are budget, but often over- or actual activity underspent often diverge by more than from budget 20%. predictions. 
Sustainability (stage 4): Budgets are an integral part of project management and are adjusted as project implementation warrants.

No clear procedures exist for handling payables and receivables. | Financial controls exist but lack a systematic office procedure. | Improved financial control systems exist. | Sustainability (stage 4): Excellent cash controls for payables and receivables and established budget procedures.

Financing comes from only one source. | Financing comes from multiple sources, but 90% or more from one source. | No single source of funding provides more than 60% of funding. | Sustainability (stage 4): No single source provides more than 40% of funding.

Product

The IDF produces a graphic that shows the component parts of an organization and the organization’s ratings for each component at different points in time. It also provides a numeric score/rating of capacity in each key component and capacity area.

Assessment

The IDF is an example of a tool that not only helps assess and measure an organization’s capacity but also sets priorities for future change and improvements. Compared with some of the other tools, IDF is relatively good at tracking one organization’s change over time because of the consistent criteria used for each progressive stage of development. It is probably not as well suited for making cross-organizational comparisons, because it allows for adjustment to fit the needs of each individual organization.

ORGANIZATIONAL CAPACITY ASSESSMENT TOOL

Background

Pact developed the organizational capacity assessment tool (OCAT) in response to a need to examine the impact of NGO capacity-building activities. Like the Institutional Development Framework, OCAT is better suited for measuring one organization over time. The OCAT differs substantially from the IDF in its data collection technique. It is designed to identify an organization’s relative strengths and weaknesses and provides the baseline information needed to develop strengthening interventions. It can also be used to monitor progress. The OCAT is well known; other development organizations have widely adapted it. Designed to be modified for each measurement situation, the OCAT can also be standardized and used across organizations.

Process

The OCAT is intended to be a participatory self-assessment but may be modified to be an external evaluation. An assessment team, composed of organizational members (representing different functions of the organization) plus some external helpers, modifies the OCAT assessment sheet to meet its needs (annex 3). The assessment sheet consists of a series of statements under seven capacity areas (with sub-elements). The assessment team then identifies sources of information, assigns tasks, and uses a variety of techniques (individual interviews, focus groups, among others) to collect the information they will later record on the assessment sheet. The assessment team assigns a score to each capacity area statement (1=needs urgent attention and improvement; 2=needs attention; 3=needs improvement; 4=needs improvement in limited aspects, but not major or urgent; 5=room for some improvement; 6=no need for immediate improvement). The assessment team would have to develop precise criteria for what rates as a “1” or a “2,” etc.

The capacity areas and sub-elements are:

• Governance (board, mission/goal, constituency, leadership, legal status)
• Management Practices (organizational structure, information management, administration procedures, personnel, planning, program development, program reporting)
• Human Resources (human resources development, staff roles, work organization, diversity issues, supervisory practices, salary and benefits)
• Financial Resources (accounting, budgeting, financial/inventory controls, financial reporting)
• Service Delivery (sectoral expertise, constituency, impact assessment)
• External Relations (constituency relations, inter-NGO collaboration, public relations, local resources, media)
• Sustainability (program/benefit sustainability, organizational sustainability, financial sustainability, resource base sustainability)

Example 3. Excerpt From an Adaptation of the OCAT

USAID/Madagascar developed a capacity assessment tool based on the OCAT, but tailored it to its own need to measure 21 partner institutions implementing reproductive health programs, including the Ministry of Health. The mission tried to measure different types of organizations and compare them by creating a standardized instrument to use with all the organizations.

Combining the OCAT results with additional information from facilitated discussions, the mission was able to summarize how different types of organizations perceived different aspects of their capacity and recommend future strengthening programs.

Some of the difficulties that USAID/Madagascar encountered when using the tool included having to translate questions from French to Malagasy, possibly losing some of their meaning; finding that some respondents were unable to answer some questions because they had no experience with the part of the organization to which the questions referred; discovering that some respondents had difficulty separating the subject area of the questionnaire (family planning) from their work in other health areas; and having difficulty scheduling meetings because of the organizations’ heavy workload. Moreover, the mission noted that the instrument is based on perceptions and is self-scored, with the resulting potential for bias.

Below is an excerpt from the “communications/extension to customers” component of the OCAT used by USAID/Madagascar. The entire questionnaire is in annex 4.

Classification Scale
0 Nonexistent or out of order
1 Requires urgent attention and upgrading
2 Requires overall attention and upgrading
3 Requires upgrading in certain areas, but neither major nor urgent
4 Operating, but could benefit from certain improvements
5 Operating well in all regards

Communications/Extension to Customers
a. The institution has in each clinic a staff trained and competent in counseling all customers. 1 2 3 4 5
b. The institution is able to identify and develop key messages for extension among potential customers, and it can produce or obtain materials for communicating such messages. 1 2 3 4 5
c. A well-organized community extension is practiced by the clinic’s staff or other workers affiliated with the institution, whether they are salaried or volunteers. A system exists for supervising extension workers and monitoring their effectiveness. 1 2 3 4 5

The IDF and the OCAT are similar in several ways, but the processes differ. The OCAT uses an assessment team that conducts research before completing the assessment sheet. For the IDF, organization members meet and fill out the sheet (determine their capacities) without the intermediate data collection step (the OCAT, by design, relies on evidence to supplement perceptions when conducting an assessment, and the IDF does not). The OCAT’s data-gathering step allows for systematic cross-checking of perceived capacities with actual or observable “facts.” It is more inductive, building up to the capacity description, while the IDF attempts to characterize the organization along the development continuum from the beginning.

The OCAT categorizes an organization’s capacity areas into one of four developmental stages. Unlike the IDF, which uses the stages as the criteria by which members rate their organization, the OCAT uses them as descriptors once the rating has been done.

After gathering data, the assessment team meets to reach a consensus on the rating of each element. With the help of an OCAT rating sheet, averages can be calculated for each capacity area.
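The averaging step can be illustrated with a short sketch. It is not the OCAT rating sheet itself: the ratings, the choice of sub-elements, and the function name `capacity_area_averages` are all invented for illustration, and the only things taken from the text are the 1 to 6 scale and the idea of one average per capacity area.

```python
# Hypothetical consensus ratings on the OCAT 1-6 scale, grouped by
# capacity area and keyed by assessment-sheet element. All numbers are
# made up for illustration; they are not from a real assessment.
consensus_ratings = {
    "Governance": {"board": 4, "mission/goal": 5, "leadership": 3},
    "Financial Resources": {"accounting": 2, "budgeting": 3, "financial reporting": 4},
    "External Relations": {"public relations": 5, "media": 6},
}


def capacity_area_averages(ratings):
    """Return the mean consensus rating for each capacity area."""
    return {
        area: round(sum(scores.values()) / len(scores), 2)
        for area, scores in ratings.items()
    }


averages = capacity_area_averages(consensus_ratings)

# List areas from lowest to highest average. On this scale 1 means
# "needs urgent attention", so the lowest average comes first.
for area, avg in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{area}: {avg}")
```

A simple unweighted mean is assumed here; a team that treats some elements as more important than others would need to agree on weights before averaging.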
These numeric scores indicate the relative need for improvement in each area. They also correspond to a more qualitative description of the organization’s developmental stage. Each capacity area can be characterized as nascent, emerging, expanding, or mature. OCAT provides a table (similar to the IDF), “NGO Organizational Development: Stages and Characteristics,” that describes organizational capacities at each stage of development.

Product

The OCAT provides numeric ratings for each capacity area. In addition, it gives organizations a description of their capacity areas in terms of progressive stages of organizational development. This information can be presented graphically as well as in narrative form.

Assessment

The OCAT identifies areas of organizational strength and weakness and tracks related changes from one measurement period to the next.

DYNAMIC PARTICIPATORY INSTITUTIONAL DIAGNOSIS

Background

The dynamic participatory institutional diagnosis (DPID) was developed by the Senegal PVO/NGO support project in conjunction with the New TransCentury Foundation and Yirawah International. It is a rapid and intensive facilitated assessment of the overall strengths and weaknesses of an organization. This methodology explores member perceptions of an organization and the organization’s relationship with its environment. DPID is highly participatory; an organization assesses itself in the absence of external benchmarks or objectives to take full advantage of its specific context, such as culture and attitudes.

Example 4. An Application of DPID

Since the DPID is such an individualized and flexible tool, every application will be different. The DPID does not lend itself easily to an example as do the other tools in this Tips. Below is an anecdote about one West African organization’s use of the DPID as reported by the Senegal PVO/NGO support project.

A Federation of Farmers’ Cooperatives with about 15,000 members in the Sahel was looking for a unique and efficient approach to redress some of the organization’s problems. The federation suffered from internal strife and a tarnished reputation, impeding its ability to raise funds. Through DPID, the federation conducted a critical in-depth analysis of its operational and management systems, resulting in the adoption of “10 emergency measures” addressing leadership weaknesses, management systems, and operational procedures. Subsequently, the organization underwent internal restructuring, including an overhaul of financial and administrative systems. One specific result of the DPID analysis was that federation members gained more influence over the operations of the federation.

Process

An outside facilitator conducts the DPID over 5 to 10 days. It takes place during a series of working sessions in which the facilitator leads an organization’s members through several stages: discussion of the services, operations, and results of the organization; exploration of the issues affecting the organization; and summarization of the “state of the organization.” During the discussions, members analyze the following features of the organization:

• Identity
• Mission
• Means and Resources
• Environment
• Management
• Internal Operations
• Service Provided and Results

They examine each element with reference to institutional behavior, human behavior, management, administration, know-how, philosophy and values, and sensitive points.

Product

A written description of the state of the organization can result from the working sessions. The analysis is qualitative without numeric scoring.

Assessment

Unlike the previously described tools, the DPID does not use ranking, scoring, or questionnaires, nor does it assess the organization along a continuum of developmental stages. Assessment is based purely on group reflection. The DPID requires a facilitator experienced in leading a group through this type of analysis.

The DPID is open ended but somewhat systematic in covering a predefined set of organizational functions. Because of its flexibility, the DPID is organization specific and should not be used to compare organizations. Nor is it a rigorous means of monitoring an organization’s change over time. Since the DPID does not use external standards to assess institutional capacities, it should not be used to track accountability. Collecting information from the DPID, as well as using it, should offer organizations a process to assess their needs, improve communications, and solve problems around a range of organizational issues at a given moment.

Dynamic Participatory Institutional Diagnosis

Type of Organization Measured
NGOs/PVOs; adaptable to other types of organizations

Features
• Difficult to compare across organizations
• Difficult to compare the same organization over time
• Capacity areas and criteria for measurement are loosely defined
• Assessment based primarily upon perceived capacities
• Produces qualitative description of an organization’s capacity
• Assessment done with the help of an outside facilitator
• Data collected through group discussion with the organization’s staff

ORGANIZATIONAL CAPACITY INDICATOR

Background

From 1994 through 1997, the Christian Reformed World Relief Committee (CRWRC) conducted research on organizational capacity building with the Weatherhead School of Management at Case Western Reserve University and more than 100 local NGOs around the world. The results of this research led them to replace their earlier system, the Skill Rating System, with an approach to capacity building and assessment based on “appreciative inquiry.” Appreciative inquiry is a methodology that emphasizes an organization’s strengths and potential more than its problems. It highlights those qualities that give life to an organization and sustain its ongoing capacity. Rather than providing a standardized tool, the organizational capacity indicator assumes that capacity monitoring is unique to each organization and in the organization’s own self-interest. The organizational capacity indicator (OCI) builds ownership because each organization creates its own capacity assessment tool. Capacity areas are self-defined and vary from organization to organization.

Process

Although organizations create their own tool under the OCI, they all follow a similar process in doing so. As they involve all partners and stakeholders as much as possible, the participants “appreciate” the organization’s history and culture. Together they explore peak experiences, best practices, and future hopes for the organization. Next, the participants identify the forces and factors that have made the organization’s positive experiences possible. These become the capacity areas that the organization tries to monitor and improve.

Next, the participants develop a list of “provocative propositions” for each capacity area. These propositions, visions of what each capacity area should ideally look like in the future, contribute to the overall objective: that each organization will be able to measure itself against its own vision for the future, not some external standard. Each capacity area is defined by the most ambitious vision of what the organization can become in that area. Specific indicators or behaviors are then identified to show the capacity area in practice. Next, the organization designs a process for assessing itself and sharing experiences related to each capacity component. The organization should monitor itself by this process twice a year. The results of the assessment should be used to encourage future development, plans, and aspirations.

Product

Each time a different organization uses the methodology, a different product specific to that organization is developed. Thus, each tool will contain a unique set of capacity areas, an evaluation process, and scoring methods. In general, the product comprises a written description of where the organization wants to be in each capacity area, a list of indicators that can be used to track progress toward the targeted level in a capacity area, and a scoring system.

Example 5. Excerpt From an OCI Tool

The following is an excerpt of one section from the capacity assessment tool developed by CRWRC’s partners in Asia, using the OCI method. (The entire tool can be found in annex 5.) It offers a menu of capacity areas and indicators from which an organization can choose and then modify for its own use. It identifies nine capacity areas, and under each area is a “provocative proposition” or vision of where the organization wants to be in that area. It provides an extensive list of indicators for each capacity area, and it describes the process for developing and using the tool. Staff and partners meet regularly to determine their capacity on the chosen indicators. Capacity level can be indicated pictorially, for example by the stages of growth of a tree or degrees of happy faces.

Capacity Area
A clear vision, mission, strategy, and set of shared values

Proposition
Our vision expresses our purpose for existing: our dreams, aspirations, and concerns for the poor. Our mission expresses how we reach our vision. Our strategy expresses the approach we use to accomplish our goals. The shared values that we hold create a common understanding and inspire us to work together to achieve our goal.

Selected Indicators
• Every person can state the mission and vision in his or her own words
• There is a yearly or a six-month plan, checked monthly
• Operations/activities are within the vision, mission, and goal of the organization
• Staff know why they do what they’re doing
• Every staff member has a clear workplan for meeting the strategy
• Regular meetings review and affirm the strategy

Assessment

Like the DPID, the OCI is highly participatory and values internal standards and perceptions. Both tools explicitly reject the use of external standards. However, the OCI does not designate organization capacity areas like the DPID does. The OCI is the only tool presented in this paper in which the capacity areas are entirely self defined. It is also unique in its emphasis on the positive, rather than on problems. Further, the OCI is more rigorous than the DPID, in that it asks each organization to set goals and develop indicators as part of the assessment process. It also calls for a scoring system to be developed, like the more formal tools (PROSE, IDF, OCAT). Because indicators and targets are developed for each capacity area, the tool allows for relatively consistent measurement over time. OCI is not designed to compare organizations with each other or to aggregate the capacity measures of a number of organizations; however, it has proven useful in allowing organizations to learn from each other and in helping outsiders assess and understand partner organizations.

Organizational Capacity Indicator

Type of Organization Measured
NGOs/PVOs; adaptable to other types of organizations

Features
• Difficult to comparably measure across organizations
• Measures change in the same organization over time
• Possible to measure well-defined capacity areas across well-defined criteria
• Assessment based primarily upon perceived capacities
• Produces numeric or pictorial score on capacity areas
• Assessment done internally
• Data collected through group discussion with organization’s staff

THE YES/NO CHECKLIST OR “SCORECARD”

Background

A scorecard/checklist is a list of characteristics or events against which a yes/no score is assigned. These individual scores are aggregated and presented as an index. Checklists can effectively track processes, outputs, or more general characteristics of an organization. In addition, they may be used to measure processes or outputs of an organization correlated to specific areas of capacity development.

Scorecards/checklists can be used either to measure a single capacity component of an organization or several rolled together. Scorecards/checklists are designed to produce a quantitative score that can be used by itself or as a target (though a scorecard/checklist without an aggregate score is also helpful).

Process

To construct a scorecard, follow these general steps: First, clarify what the overall phenomena to be measured are and identify the components that, when combined, cover the phenomenon fairly well. Next, develop a set of characteristics or indicators that together capture the relevant phenomena. If desired, and if evidence and analysis show that certain characteristics are truly more influential in achieving the overall result being addressed, define a weight to be assigned to each characteristic/indicator. Then rate the organization(s) on each characteristic using a well-defined data collection approach. The approach could range from interviewing organization members to reviewing organization documents, or it could consist of a combination of methods. Finally, if desired and appropriate, sum the score for the organization(s).

Product

A scorecard/checklist results in a scored listing of important characteristics of an organization and can also be aggregated to get a summary score.

Assessment

A scorecard/checklist should be used when the characteristics to be scored are unambiguous. There is no room for “somewhat” or “yes, but . . .” with the scorecard technique. The wording of each characteristic should be clear and terms should be well defined. Because scorecards/checklists are usually based on observable facts, processes, and documents, they are more objective than most of the tools outlined in this Tips. This, in turn, makes them particularly useful for cross-organizational comparisons, or tracking organizations over time; that is, they achieve better measurement consistency and comparability. Yet concentrating on observable facts can be limiting, if such facts are not complemented with descriptive and perception-based information. Though a person outside the organization frequently completes the scorecard/checklist, self-assessment is also possible. Unlike other tools that require facilitators to conduct or interpret them, individuals who are not highly trained can also use scorecards. Further, since scorecards are usually tightly defined and specific, they are often a cheaper measurement tool.

The Yes/No Checklist “Scorecard”

Type of Organization Measured
All types of organizations

Features
• Cross-organizational comparisons can be made
• Measures change in the same organization over time
• Measures well-defined capacity areas against well-defined criteria
• Possible to balance perceptions with empirical observations
• Produces numeric score on capacity areas
• Assessment can be done by an external evaluator or internally
• Data collected through interviews, observation, documents, involving a limited number of staff

Example 6. A Scorecard

USAID/Mozambique developed the following scorecard to measure various aspects of institutional capacity in partner civil society organizations. The following example measures democratic governance.

Increased Democratic Governance Within Civil Society Organizations

Characteristics | Score | Multiplied By | Weight | Weighted Score
1. Leaders (board member or equivalent) of the CSO elected by secret ballot. No=0 pts. Yes=1 pt. | | X | 3 |
2. General assembly meetings are adequately announced at least two weeks in advance to all members (1 pt.) and held at least twice a year (1 pt.). Otherwise=0 pt. | | X | 2 |
3. Annual budget presented for member approval. No=0 pts. Yes=1 pt. | | X | 2 |
4. Elected leaders separate from paid employees. No=0 pts. Yes=1 pt. | | X | 2 |
5. Board meetings open to ordinary members (nonboard members). No=0 pts. Yes=1 pt. | | X | 1 |
Total | | | |

MEASURING INDIVIDUAL ORGANIZATIONAL COMPONENTS

In some cases, USAID is not trying to strengthen the whole organization, but rather specific parts of it that need special intervention. In many cases, the best way of measuring more specific organizational changes is to use portions of the instruments described. For instance, the IDF has a comparatively well-developed section on management resources (leadership style, participatory management, planning, monitoring and evaluation, and management systems). Similarly, the OCAT has some good sections on external relations and internal governance.

Organizational development professionals also use other tools to measure specific capacity areas. Some drawbacks of these tools are that they require specialized technical expertise and they can be costly to use on a regular basis. Other tools may require some initial training but can be much more easily institutionalized. Below we have identified some tools for measuring selected organizational components. (You will find complete reference information for these tools in the resources section of this Tips.)

STRUCTURE AND CULTURE

The Preferred Organizational Structure instrument is designed to assess many aspects of organizational structure, such as formality of rules, communication lines, and decision-making. This tool requires organizational development skills, both to conduct the assessment and to interpret the results.

HUMAN RESOURCES AND THEIR MANAGEMENT

Many personnel assessments exist, including the Job Description Index and the Job Diagnostic Survey, both of which measure different aspects of job satisfaction, skills, and task significance. However, skilled human resource practitioners must administer them. Other assessments, such as the Alexander Team Effectiveness Critique, have been used to examine the state and functioning of work teams and can easily be applied in the field.

SERVICE DELIVERY

Often, a customer survey is one of the best ways to measure the efficiency and effectiveness of a service delivery system. A specific customer survey would need to be designed relative to each situation. Example 7 shows a sample customer service assessment.

Example 7. A Customer Service Assessment

1. In the past 12 months, have you ever contacted a municipal office to complain about something such as poor city services or a rude city official, or any other reason?
____No ____Yes

If YES:

1a. How many different problems or complaints did you contact the municipality about in the last 12 months?
____One ____Two ____Three to five ____More than five

1b. Please describe briefly the nature of the complaint starting with the one you feel was most important.
1.____________________
2.____________________
3.____________________

2. Which department or officials did you contact initially regarding these complaints?
____Mayor’s office
____Council member
____Police
____Sanitation
____Public works
____Roads
____Housing
____Health
____Other____________________

2a. Were you generally satisfied with the city’s response? (IF DISSATISFIED, ASK: What were the major reasons for your dissatisfaction?)
____Response not yet completed
____Satisfied
____Dissatisfied, never responded or corrected condition
____Dissatisfied, poor quality or incorrect response was provided
____Dissatisfied, took too long to complete response, had to keep pressuring for results, red tape, etc.
____Dissatisfied, personnel were discourteous, negative, etc.
____Dissatisfied, other____________________

3. Overall, are you satisfied with the usefulness, courtesy and effectiveness of the municipal department or official that you contacted?
____Definitely yes
____Generally yes
____Generally no (explain)____________________
____Definitely no (explain)____________________

Survey adapted from Hatry, Blair, and others, 1992.

DEVELOPING INDICATORS

Indicators permit managers to track and understand activity/program performance at both the operational (inputs, outputs, processes) and strategic (development objectives and intermediate results) levels. To managers familiar with the development and use of indicators, it may seem straightforward to derive indicators from the instruments presented in the preceding pages. However, several critical points will ensure that the indicators developed within the context of these instruments are useful to managers.

First, the development of indicators should be driven by the informational needs of managers, from both USAID and the relevant organizations, to inform strategic and operational decisions and to assist in reporting and communicating to partners and other stakeholders. At times, there is a tendency to identify or design a data collection instrument without giving too much thought to exactly what information will be needed for management and reporting. In these situations, indicators tend to be developed on the basis of the data that have been collected, rather than on what managers need. More to the point, the development of indicators should follow a thorough assessment of informational needs and precede the identification of a data collection instrument. Managers should first determine their informational needs; from these needs, they should articulate and define indicators; and only then, with this information in hand, should they identify or develop an instrument to collect the required data. This means that, in most cases, indicators should not be derived, post facto, from a data collection tool. Rather, the data collection tool should be designed with the given indicators in mind.

Second, indicators should be developed for management decisions at all levels (input indicators, output indicators, process indicators, and outcome/impact indicators). With USAID’s increased emphasis on results, managers sometimes may concentrate primarily on strategic indicators (for development objectives and intermediate results). While an emphasis on results is appropriate, particularly for USAID managers, tracking operational-level information for the organizations supported through a given Agency program is critical if managers are to understand if, to what degree, and how the organizations are increasing their capacities. The instruments outlined in this paper can provide data for indicators defined at various management levels.

Finally, indicators should meet the criteria outlined in USAID’s Automated Directives System and related pieces of Agency guidance such as CDIE’s Performance Monitoring and Evaluation Tips #6, “Selecting Performance Indicators,” and Tips #12, “Guidelines for Indicator and Data Quality.” That is, indicators should be direct, objective, practical, and adequate.
Once an indicator has been decided upon, it is important to document the relevant technical details: a precise definition of the indicator; a detailed description of the data source; and a thorough explanation of the data collection method. (Refer to Tips #7, “Preparing a Performance Monitoring Plan.”)

RESULTS-LEVEL INDICATORS

USAID managers spend substantial time and energy developing indicators for development objectives and intermediate results related to institutional capacity. The range of the Agency’s institutional strengthening programs is broad, as is the range of the indicators that track the programs’ results. Some results reflect multiple organizations and others relate to a single organization. Additionally, of those results that relate to multiple organizations, some may refer to organizations from only one sector while others may capture organizations from a number of sectors. Results related to institutional strengthening also vary relative to the level of change they indicate (such as an increase in institutional capacity versus the eventual impact generated by such an increase) and with regard to whether they reflect strengthening of the whole organization(s) or just one or several elements.

It is relatively easy to develop indicators for all types of results and to use the instruments outlined in this Tips to collect the necessary data. For example, when a result refers to strengthening a single organization, across all elements, an aggregate index or “score” of institutional strength may be an appropriate indicator (an instrument based on the IDF or the scorecard model might be used to collect such data). If a result refers to multiple organizations, it might be useful to frame an indicator in terms of the number or percent of the organizations that meet or exceed a given threshold score or development stage, on the basis of an aggregate index or the score of a single element for each organization. The key is to ensure that the indicator reflects the result and to then identify the most appropriate and useful measurement instrument.

Example 8 includes real indicators used by USAID missions in 1998 to report on strategic objectives and intermediate results in institutional capacity strengthening.

Example 8. Selected Institutional Capacity Indicators From USAID Missions

To measure: Institutions strengthened (entire organization)
• Number of institutions meeting at least 80% of their targeted improvements

To measure: Institutions more financially sustainable
• Amount of funds raised from non-USAID sources
• Number of organizations where USAID contribution is less than 25% of revenues
• Number of organizations where at least five funding sources contribute at least 10% each

To measure: Organization’s service delivery systems strengthened
• Percent of suspected polio cases investigated within 48 hours

To measure: Local government management capacities improved
• Number of governmental units displaying improved practices, such as open and transparent financial systems, set organizational procedures, accountability, participatory decision-making, by-laws and elections

PRACTICAL TIPS FOR A BUSY USAID MANAGER

This TIPS introduces critical issues related to measuring institutional capacity. It presents a number of approaches that managers of development programs and activities currently use in the field. In this section we summarize the preceding discussion by offering several quick tips that USAID managers should find useful as they design, modify, and implement their own approaches for measuring institutional capacity.

1. Carefully review the informational needs of the relevant managers and the characteristics of the organization to be measured to facilitate development of indicators. Identify your information needs and develop indicators before you choose an instrument.

2. To assist you in selecting an appropriate measurement tool, ask yourself the following questions as they pertain to your institutional capacity measurement situation. Equipped with the answers to these questions, you can scan the “features list” that describes every tool in this paper to identify which measurement approaches to explore further.

• Is the objective to measure the entire organization? Or is it to measure specific elements of the organization? If the latter, what are the specific capacity areas or functions to be measured?
• How will the information be used? To measure change in an organization over time? To compare organizations with each other? To inform procurement decisions? To hold an organization accountable for achieving results or implementing reforms?
• What type of organizations are you measuring? Are there any particular measurement issues pertaining to this type of organization that must be considered?
• How participatory do you want the measurement process to be?
• What is the purpose of the intervention? To strengthen an organization?
• Will organization members themselves or outsiders conduct the assessment?
• What product do you want the measurement tool to generate?
• Do you want the measurement process to be an institution-strengthening exercise in itself?
• Do you need an instrument that measures one organization? Several organizations against individual criteria? Or several organizations against standard criteria?

3. If you are concerned about data reliability, apply measurement instruments consistently over time and across organizations. You can adapt and adjust tools as needed, but once you develop the instrument, use it consistently.

4. When interpreting and drawing conclusions from collected data, remember the limits of the relevant measurement tool. Most methods for measuring institutional capacity are subjective, as they are based on the perceptions of those participating in the assessment, and involve some form of ordinal scaling/scoring. When reviewing data, managers should therefore zero in on the direction and general degree of change. Do not be overly concerned about small changes; avoid false precision.

5. Cost matters, and so does the frequency and timing of data collection. Data need to be available frequently enough, and at the right point in the program cycle, to inform operational and strategic management decisions. Additionally, the management benefits of data should exceed the costs associated with their collection.

6. The process of measuring institutional capacity can contribute substantially to increasing an organization’s strength. A number of measurement approaches are explicitly designed as learning opportunities for organizations; that is, to identify problems and suggest related solutions, to improve communication, or to facilitate a consensus around future priorities.

This TIPS was prepared for CDIE by Alan Lessik and Victoria Michener of Management Systems International.

RESOURCES

Bibliography

Booth, W.; and R. Morin. 1996. Assessing Organizational Capacity Through Participatory Monitoring and Evaluation Handbook. Prepared for the Pact Ethiopian NGO Sector Enhancement Initiative. Washington: USAID.
Center for Democracy and Governance. 1998. Handbook of Democracy and Governance Program Indicators. Washington: U.S. Agency for International Development.
Christian Reformed World Relief Committee. 1997. Partnering to Build and Measure Organizational Capacity. Grand Rapids, Mich.
Cooper, S.; and R. O’Connor. 1993. “Standards for Organizational Consultation: Assessment and Evaluation Instruments.” Journal of Counseling and Development 71: 651-9.
Counterpart International. N.d. “CAP Monitoring and Evaluation Questionnaire.”
Counterpart International. N.d. “Manual for the Workshop on Development of a Training and Technical Assistance Plan (TTAP).”
Counterpart International. N.d. “Institutional Assessment Indicators.”
Drucker, P.; and C. Roseum. 1993. How to Assess Your Nonprofit Organization with Peter Drucker’s Five Important Questions: User Guide for Boards, Staff, Volunteers and Facilitators. Jossey-Bass.
Eade, D. 1997. Capacity-Building: An Approach to People-Centred Development. Oxford: Oxfam.
Fowler, A.; L. Goold; and R. James. 1995. Participatory Self Assessment of NGO Capacity. INTRAC Occasional Papers Series No. 10. Oxford.
Hatry, H.; L. Blair; D. Fisk; J. Grenier; J. Hall; and P. Schaenman. 1992. How Effective Are Your Community Services? Procedures for Measuring Their Quality. Washington: The Urban Institute.
International Working Group on Capacity Building of Southern NGOs. 1998. “Southern NGO Capacity Building: Issues and Priorities.” New Delhi: Society for Participatory Research in Asia.
International Working Group on Capacity Building for NGOs. 1998. “Strengthening Southern NGOs: The Donor Perspective.” Washington: USAID and The World Bank.
Kelleher, D. and K. McLaren with R. Bisson. 1996. “Grabbing the Tiger by the Tail: NGOs Learning for Organizational Change.” Canadian Council for International Cooperation.
Lent, D. October 1996. “What is Institutional Capacity?” On Track: The Reengineering Digest. 2 (7): 3. Washington: U.S. Agency for International Development.
Levinger, B. and E. Bloom. 1997. Introduction to DOSA: An Outline Presentation. http://www.edc.org/int/capdev/dosafile/dosintr.htm.
Lusthaus, C., G. Anderson, and E. Murphy. 1995. “Institutional Assessment: A Framework for Strengthening Organizational Capacity for IDRC’s Research Partners.” IDRC.
Mentz, J.C.N. 1997. “Personal and Institutional Factors in Capacity Building and Institutional Development.” European Centre for Development Policy Management Working Paper No. 14.
Morgan, P.; and A. Qualman. 1996. “Institutional and Capacity Development, Results-Based Management and Organisational Performance.” Canadian International Development Agency.
New TransCentury Foundation. 1996. Practical Approaches to PVO/NGO Capacity Building: Lessons from the Field (five monographs). Washington: U.S. Agency for International Development.
Pact. N.d. “What is Prose?”
Pact. 1998. “Pact Organizational Capacity Assessment Training of Trainers.” 7-8 January.
Renzi, M. 1996. “An Integrated Tool Kit for Institutional Development.” Public Administration and Development 16: 469-83.
Renzi, M. N.d. “The Institutional Framework: Frequently Asked Questions.” Unpublished paper. Management Systems International.
Sahley, C. 1995. “Strengthening the Capacity of NGOs: Cases of Small Enterprise Development Agencies in Africa.” INTRAC NGO Management and Policy Series. Oxford.
Save the Children. N.d. Institutional Strengthening Indicators: Self Assessment for NGOs.
UNDP. 1997. Capacity Assessment and Development. Technical Advisory Paper No. 3, Management Development and Governance Division. New York.
Bureau for Policy and Program Coordination. 1995. USAID-U.S. PVO Partnership. Policy Guidance. Washington: U.S. Agency for International Development.
Office of Private and Voluntary Cooperation. 1998. USAID Support for NGO Capacity-Building: Approaches, Examples, Mechanisms. Washington: U.S. Agency for International Development.
Office of Private and Voluntary Cooperation. 1998. Results Review Fiscal Year 1997. Washington: U.S. Agency for International Development.
NPI Learning Team. 1997. New Partnerships Initiative: A Strategic Approach to Development Partnering. Washington: U.S. Agency for International Development.
USAID/Brazil. 1998. Fiscal Year 2000 Results Review and Resource Request.
USAID/Guatemala. 1998. Fiscal Year 2000 Results Review and Resource Request.
USAID/Indonesia. 1998. Fiscal Year 2000 Results Review and Resource Request.
USAID/Madagascar. 1998. Fiscal Year 2000 Results Review and Resource Request.
USAID/Madagascar. 1997. Institutional Capacity Needs Assessment.
USAID/Mexico. 1998. The FY 1999-FY 2003 Country Strategy for USAID in Mexico.
USAID/Mozambique. 1998. Fiscal Year 2000 Results Review and Resource Request.
USAID/West Bank-Gaza. 1998. Fiscal Year 2000 Results Review and Resource Request.
Whorton, J.; and D. Morgan. 1975. Measuring Community Performance: A Handbook of Indicators, University of Oklahoma.
World Bank. 1996. Partnership for Capacity Building in Africa: Strategy and Program of Action. Washington.
World Learning. 1998. Institutional Analysis Instrument: An NGO Development Tool.

Sources of Information on Institutional Capacity Measurement Tools
Discussion-Oriented Organizational Self-Assessment: http://www.edc.org/int/capdev/dosafile/dosintr.htm.
Institutional Development Framework: Management Systems International. Washington.
Organizational Capacity Assessment Tool: http://www.pactworld.org/ocat.html Pact. Washington.
Dynamic Participatory Institutional Diagnostic: New TransCentury Foundation. Arlington, Va.
Organizational Capacity Indicator: Christian Reformed World Relief Committee. Grand Rapids, Mich.
Smith, P.; L. Kendall; and C. Hulin. 1969. The Measurement of Satisfaction in Work and Retirement. Rand McNally.
Hackman, J.R.; and G.R. Oldham. 1975. “Job Diagnostic Survey: Development of the Job Diagnostic Survey.” Journal of Applied Psychology 60: 159-70.
Goodstein, L.D.; and J.W. Pfeiffer, eds. 1985. Alexander Team Effectiveness Critique: The 1995 Annual: Developing Human Resources. Pfeiffer & Co.
Bourgeois, L.J.; D.W. McAllister; and T.R. Mitchell. 1978. “Preferred Organizational Structure: The Effects of Different Organizational Environments Upon Decisions About Organizational Structure.” Academy of Management Journal 21: 508-14.
Kraut, A. 1996. Customer and Employee Surveys: Organizational Surveys: Tools for Assessment and Change. Jossey-Bass Publishers.

NUMBER 16
1ST EDITION 2010
PERFORMANCE MONITORING & EVALUATION TIPS
CONDUCTING MIXED-METHOD EVALUATIONS

ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to performance monitoring and evaluation. This publication is a supplemental reference to the Automated Directive System (ADS) Chapter 203.

INTRODUCTION
This TIPS provides guidance on using a mixed-methods approach for evaluation research. Frequently, evaluation statements of work specify that a mix of methods be used to answer evaluation questions. This TIPS includes the rationale for using a mixed-method evaluation design, guidance for selecting among methods (with an example from an evaluation of a training program) and examples of techniques for analyzing data collected with several different methods (including “parallel analysis”).

MIXED-METHOD EVALUATIONS DEFINED
A mixed-method evaluation is one that uses two or more techniques or methods to collect the data needed to answer one or more evaluation questions. Some of the different data collection methods that might be combined in an evaluation include structured observations, key informant interviews, pre- and post-test surveys, and reviews of government statistics. This could involve the collection and use of both quantitative and qualitative data to analyze and identify findings and to develop conclusions in response to the evaluation questions.
RATIONALE FOR USING A MIXED-METHOD EVALUATION DESIGN
There are several possible cases in which it would be highly beneficial to employ mixed-methods in an evaluation design:
• When a mix of different methods is used to collect data from different sources to provide independent estimates of key indicators, and those estimates complement one another, it increases the validity of conclusions related to an evaluation question. This is referred to as triangulation. (See TIPS 5: Rapid Appraisal, and Bamberger, Rugh and Mabry [2006] for further explanation and descriptions of triangulation strategies used in evaluations.)
• When reliance on one method alone may not be sufficient to answer all aspects of each evaluation question.
• When the data collected from one method can help interpret findings from the analysis of data collected from another method. For example, qualitative data from in-depth interviews or focus groups can help interpret statistical patterns from quantitative data collected through a random-sample survey. This yields a richer data analysis and can also provide a better understanding of the context in which a program operates.

There are a number of additional benefits derived from using a mix of methods in any given evaluation.
• Using mixed-methods can more readily yield examples of unanticipated changes or responses.
• Mixed-method evaluations have the potential of surfacing other key issues and providing a deeper understanding of program context that should be considered when analyzing data and developing findings and conclusions.
• Mixed-method evaluations often yield a wider range of points of view that might otherwise be missed.

Key Steps in Developing a Mixed-Method Evaluation Design and Analysis Strategy
1. In order to determine the methods that will be employed, carefully review the purpose of the evaluation and the primary evaluation questions. Then select the methods that will be the most useful and cost-effective to answer each question in the time period allotted for the evaluation. Sometimes it is apparent that there is one method that can be used to answer most, but not all, aspects of the evaluation question.
2. Select complementary methods to cover different aspects of the evaluation question (for example, the how and why issues) that the first method selected cannot alone answer, and/or to enrich and strengthen data analysis and interpretation of findings.
3. In situations when the strength of findings and conclusions for a key question is absolutely essential, employ a triangulation strategy. What additional data sources and methods can be used to obtain information to answer the same question in order to increase the validity of findings from the first method selected?
4. Re-examine the purpose of the evaluation and the methods initially selected to ensure that all aspects of the primary evaluation questions are covered thoroughly. This is the basis of the evaluation design. Develop data collection instruments accordingly.
5. Design a data analysis strategy to analyze the data that will be generated from the selection of methods chosen for the evaluation.
6. Ensure that the evaluation team composition includes members that are well-versed and experienced in applying each type of data collection method and subsequent analysis.
7. Ensure that there is sufficient time in the evaluation statement of work for evaluators to fully analyze data generated from each method employed and to realize the benefits of conducting a mixed-method evaluation.

DETERMINING WHICH METHODS TO USE
In a mixed-method evaluation, the evaluator may use a combination of methods, such as a survey using comparison groups in a quasi-experimental or experimental design, a review of key documents, a reanalysis of government statistics, in-depth interviews with key informants, focus groups, and structured observations. The selection of methods, or mix, depends on the nature of the evaluation purpose and the key questions to be addressed.
SELECTION OF DATA COLLECTION METHODS – AN EXAMPLE
The selection of which methods to use in an evaluation is driven by the key evaluation questions to be addressed. Frequently, one primary evaluation method is apparent. For example, suppose an organization wants to know about the effectiveness of a pilot training program conducted for 100 individuals to set up their own small businesses after the completion of the training.

The evaluator should ask what methods are most useful and cost-effective to assess the question of the effectiveness of that training program within the given time frame allotted for the evaluation. The answer to this question must be based on the stated outcome expected from the training program. In this example, let us say that the organization’s expectations were that, within one year, 70 percent of the 100 individuals that were trained will have used their new skills and knowledge to start a small business. What is the best method to determine whether this outcome has been achieved? The most cost-effective means of answering this question is to survey 100 percent of the individuals who graduated from the training program using a close-ended questionnaire. It follows that a survey instrument should be designed to determine if these individuals have actually succeeded in starting up a new business.

While this sounds relatively straightforward, organizations are often interested in related issues. If fewer than 70 percent of the individuals started a new business one year after completion of the training, the organization generally wants to know why some graduates from the program were successful while others were not. Did the training these individuals received actually help them start up a small business? Were there topics that should have been covered to more thoroughly prepare them for the realities of setting up a business? Were there other topics that should have been addressed?
In summary, this organization wants to learn not only whether at least 70 percent of the individuals trained have started up a business, but also how effectively the training equipped them to do so. It also wants to know both the strengths and the shortcomings of the training so that it can improve future training programs.

The organization may also want to know if there were factors outside the actual intervention that had a bearing on the training’s success or failure. For example, did some individuals find employment instead? Was access to finance a problem? Did they conduct an adequate market analysis? Did some individuals start with prior business skills? Are there factors in the local economy, such as local business regulations, that either promote or discourage small business start-ups? There are numerous factors which could have influenced this outcome.

The selection of additional methods to be employed is, again, based on the nature of each aspect of the issue or set of related questions that the organization wants to probe. To continue with this example, the evaluator might expand the number of survey questions to address issues related to the effectiveness of the training and external factors such as access to finance. These additional questions can be designed to yield additional quantitative data and to probe for information such as the level of satisfaction with the training program, the usefulness of the training program in establishing a business, whether the training graduate received a small business start-up loan, if the size of the loan the graduate received was sufficient, and whether graduates are still in the process of starting up their businesses or instead have found employment. Intake data from the training program on characteristics of each trainee can also be examined to see if there are any particular characteristics, such as sex or ethnic background, that can be correlated with the survey findings.
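As an illustration only, the check against the 70 percent target and the comparison of survey results with intake characteristics might be sketched as follows. The field names (`started_business`, `sex`) and the survey records are hypothetical and are not drawn from any actual evaluation.

```python
from collections import defaultdict

TARGET_RATE = 0.70  # expected share of graduates starting a business (from the example)


def start_up_rate(responses):
    """Share of survey respondents who report having started a business."""
    started = sum(1 for r in responses if r["started_business"])
    return started / len(responses)


def rate_by_characteristic(responses, key):
    """Start-up rate disaggregated by one intake characteristic (e.g., sex)."""
    counts = defaultdict(lambda: [0, 0])  # characteristic value -> [started, total]
    for r in responses:
        counts[r[key]][1] += 1
        if r["started_business"]:
            counts[r[key]][0] += 1
    return {value: started / total for value, (started, total) in counts.items()}


# Hypothetical survey records for all 100 graduates (illustration only).
responses = (
    [{"sex": "F", "started_business": True}] * 36
    + [{"sex": "F", "started_business": False}] * 14
    + [{"sex": "M", "started_business": True}] * 26
    + [{"sex": "M", "started_business": False}] * 24
)

overall = start_up_rate(responses)
target_met = overall >= TARGET_RATE
by_sex = rate_by_characteristic(responses, "sex")

print(f"Overall start-up rate: {overall:.0%} (target met: {target_met})")
for value, rate in sorted(by_sex.items()):
    print(f"  {value}: {rate:.0%}")
```

A difference between groups in such a tabulation is only a pattern to investigate further with the qualitative methods discussed in this TIPS; it does not by itself explain why some graduates succeeded and others did not.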
It is important to draw on additional methods to help explain the statistical findings from the survey, probe the strengths and shortcomings of the training program, further understand issues related to access to finance, and identify external factors affecting success in starting a business. In this case, the evaluation design could focus on a sub-set of the 100 individuals to obtain additional qualitative information. A selected group of 25 people could be asked to answer an additional series of open-ended questions during the same interview session, expanding it from 30 minutes to 60 minutes. Whereas asking 100 people open-ended questions would be better than just 25 people, costs prohibit interviewing the entire group. Using the same example, suppose the organization has learned through informal feedback that access to finance is likely a key factor in determining success in business start-up in addition to the training program itself. Depending on the evaluation findings, the organization may want to design a finance program that increases access to loans for small business start-ups. To determine the validity of this assumption, the evaluation design relies on a triangulation approach to assess whether and how access to finance for business start-ups provides further explanations regarding success or failure outcomes. The design includes a plan to collect data from two other sources using a separate data collection method for each source. The first data source includes the quantitative data from the survey of the 100 training graduates. The evaluation designers determine that the second data source will be the managers of local banks and credit unions that survey respondents reported having approached for start-up loans. In-depth interviews will be conducted to record and understand policies for lending to entrepreneurs trying to establish small businesses, the application of those policies, and other business practices with respect to prospective clients. 
The third data source consists of bank loan statistics for entrepreneurs who have applied to start up small businesses. Now there are three independent data sources using different data collection methods to assess whether access to finance is an additional key factor in determining small business start-up success.

In this example, the total mix of methods the evaluator would use includes the following: the survey of all 100 training graduates, data from open-ended questions from a subset of graduates selected for longer interviews, analysis of training intake data on trainee characteristics, in-depth interviews with managers of lending institutions, and an examination of loan data. The use of mixed-methods was necessary because the client organization in this case not only wanted to know how effective the pilot training course was based on its own measure of program success, but also whether access to finance contributed to either success or failure in starting up a new business. The analysis of the data will be used to strengthen the training design and content employed in the pilot training course, and as previously stated, perhaps to design a microfinance program.

The last step in the process of designing a mixed-method evaluation is to determine how the data derived from using mixed-methods will be analyzed to produce findings and to determine the key conclusions.

ANALYZING DATA FROM A MIXED-METHOD EVALUATION – DESIGNING A DATA ANALYSIS STRATEGY
It is important to design the data analysis strategy before the actual data collection begins. Having done so, the evaluator can begin thinking about trends in findings from different sets of data to see if findings converge or diverge. Analyzing data collected from a mixture of methods is admittedly more complicated than analyzing the data derived from one method.
This entails a process in which quantitative and qualitative data analysis strategies are eventually connected to determine and understand key findings. Several different techniques can be used to analyze data from mixed-methods approaches, including parallel analysis, conversion analysis, sequential analysis, multilevel analysis, and data synthesis. The choice of analytical techniques should be matched with the purpose of the evaluation using mixed-methods. Table 1 briefly describes the different analysis techniques and the situations in which each method is best applied. In complex evaluations with multiple issues to address, skilled evaluators may use more than one of these techniques to analyze the data.

EXAMPLE OF APPLICATION
Here we present an example of parallel mixed-data analysis, because it is the most widely used analytical technique in mixed-method evaluations. This is followed by examples of how to resolve situations where divergent findings arise from the analysis of data collected through a triangulation process.

PARALLEL MIXED-DATA ANALYSIS
Parallel mixed-data analysis consists of two major steps:

Step 1: This involves two or more analytical processes. The data collected from each method employed must be analyzed separately. For example, a statistical analysis of quantitative data derived from a survey, a set of height/weight measures, or a set of government statistics is conducted. Then, a separate and independent analysis is conducted of qualitative data derived from, for example, in-depth interviews, case studies, focus groups, or structured observations to determine emergent themes, broad patterns, and contextual factors. The main point is that the analysis of data collected from each method must be conducted independently.
Step 2: Once the analysis of the data generated by each data collection method is completed, the evaluator focuses on how the analysis and findings from each data set can inform, explain, and/or strengthen findings from the other data set. There are two possible primary analytical methods for doing this, and sometimes both methods are used in the same evaluation. Again, the method used depends on the purpose of the evaluation.
• In cases where more than one method is used specifically to strengthen and validate findings for the same question through a triangulation design, the evaluator compares the findings from the independent analysis on each data set to determine if there is a convergence of findings. This method is used when it is critical to produce defensible conclusions that can be used to inform major program decisions (e.g., end or extend a program).
• To interpret or explain findings from quantitative analysis, evaluators use findings from the analysis of qualitative data. This method can provide a richer analysis and set of explanations affecting program outcomes that enhance the utility of the evaluation for program managers. Conversely, patterns and associations arising from the analysis of quantitative data can inform additional patterns to look for in analyzing qualitative data. The analysis of qualitative data can also enhance the understanding of important program context data. This method is often used when program managers want to know not only whether or not a program is achieving its intended results, but also, why or why not.

WHEN FINDINGS DO NOT CONVERGE
In cases where mixed-method evaluations employ triangulation, it is not unusual that findings from the separate analysis of each data set do not automatically converge. If this occurs, the evaluator must try to resolve the conflict among divergent findings. This is not a disaster.
Often this kind of situation can present an opportunity to generate more nuanced explanations and important additional findings that are of great value. One method evaluators use when findings from different methods diverge is to carefully re-examine the raw qualitative data through a second and more in-depth content analysis. This is done to determine if there were any factors or issues that were missed when these data were first being organized for analysis. The results of this third layer of analysis can produce a deeper understanding of the data, and can then be used to generate new interpretations. In some cases, other factors external to the program might be discovered through contextual analysis of economic, social or political conditions or an analysis of operations and interventions across program sites. Another approach is to reanalyze all the disaggregated data in each data set separately, by characteristics of the respondents as appropriate to the study, such as age, gender, educational background, economic strata, etc., and/or by geography/locale of respondents. The results of this analysis may yield other information that can help to resolve the divergence of findings. In this case, the evaluator should attempt to rank order these factors in terms of frequency of occurrence. This further analysis will provide additional explanations for the variances in findings. While most professionals build this type of disaggregation into the analysis of the data during the design phase of the evaluation, it is worth reexamining patterns from disaggregated data. Evaluators should also check for data quality issues, such as the validity of secondary data sources or possible errors in survey data from incomplete recording or incorrect coding of responses. (See TIPS 12: Data Quality Standards.) 
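The reanalysis of disaggregated data and the rank ordering of factors by frequency of occurrence could be sketched as follows. This is a minimal illustration under stated assumptions: the factor codes, the locales, and the interview records are hypothetical, and an actual content analysis would rest on the evaluation team's own coding scheme.

```python
from collections import Counter

# Hypothetical coded interview data: (respondent locale, factors mentioned).
coded_interviews = [
    ("urban", ["loan denied", "found employment"]),
    ("rural", ["loan denied", "no market analysis"]),
    ("rural", ["loan denied"]),
    ("urban", ["found employment"]),
    ("rural", ["local regulations", "loan denied"]),
]


def rank_factors(interviews, locale=None):
    """Rank factors by the number of interviews that mention them.

    If a locale is given, only interviews from that locale are counted,
    which is one simple way to disaggregate the data.
    """
    counts = Counter()
    for where, factors in interviews:
        if locale is None or where == locale:
            counts.update(set(factors))  # count each factor once per interview
    return counts.most_common()


overall_ranking = rank_factors(coded_interviews)
rural_ranking = rank_factors(coded_interviews, locale="rural")

print("All respondents:", overall_ranking)
print("Rural respondents only:", rural_ranking)
```

Comparing the overall ranking with the ranking for each subgroup can show whether a factor is concentrated among particular respondents, which may help explain why findings from different data sets diverge.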
If the evaluators are still at the program site, it is possible to resolve data quality issues with limited follow-up data collection by, for example, conducting in-depth interviews with key informants (if time and budget permit).

In cases where an overall summative program conclusion is required, another analytical tool that is used to resolve divergent findings is the data synthesis method. (See Table 1.) This method rates the strength of findings generated from the analysis of each data set based on the intensity of the impact (e.g., on a scale from very high positive to very high negative) and the quality and validity of the data. An overall rating is assigned for each data set, but different weights can then be assigned to different data sets if the evaluator knows that certain data sources or methods for collecting data are stronger than others. Ultimately, an index is created based on the average of those ratings to synthesize an overall program effect on the outcome. See McConney, Rudd and Ayres (2002) to learn more about this method.

REPORTING ON MIXED-METHOD EVALUATIONS
Mixed-method evaluations generate a great deal of data, and, to profit from the use of those methods, evaluators must use and analyze all of the data sets. Through the use of mixed-method evaluations, findings and conclusions can be enriched and strengthened. Yet there is a tendency to underuse, or even not to use, all the data collected for the evaluation. Evaluators can rely too heavily on one particular data source if it generates easily digestible and understandable information for a program manager. For example, in many cases data generated from qualitative methods are insufficiently analyzed. In some cases only findings from one source are reported.
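To make the weighted averaging behind the data synthesis method more concrete, a simplified sketch follows. It is not the procedure published by McConney, Rudd and Ayres (2002): the five-point numeric scale, the weights standing in for quality-of-evidence ratings, and the three ratings themselves are all assumptions made for illustration.

```python
# Assumed numeric scale for effect ratings (illustration only).
EFFECT_SCALE = {
    "large negative": -2,
    "small negative": -1,
    "no discernible effect": 0,
    "small positive": 1,
    "large positive": 2,
}


def synthesis_index(ratings):
    """Weighted average of effect ratings across data sets.

    Each rating is (data set name, effect label, weight); the weight is a
    stand-in for the evaluator's judgment of data quality and validity.
    """
    total_weight = sum(weight for _, _, weight in ratings)
    weighted_sum = sum(EFFECT_SCALE[effect] * weight for _, effect, weight in ratings)
    return weighted_sum / total_weight


# Hypothetical ratings for the three data sources in the training example.
ratings = [
    ("graduate survey", "large positive", 3),
    ("lender interviews", "small positive", 2),
    ("bank loan statistics", "no discernible effect", 1),
]

index = synthesis_index(ratings)
print(f"Overall effectiveness index: {index:.2f} on a scale from -2 to +2")
```

With these assumed values the index falls between "small positive" and "large positive". The result is only as defensible as the ratings and weights behind it, so the basis for each should be documented in the evaluation report.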
One way to prevent underutilization of findings is to write a statement of work that provides the evaluator sufficient time to analyze the data sets from each method employed, and hence to develop valid findings, explanations, and strong conclusions that a program manager can use with confidence. Additionally, statements of work for evaluation should require evidence of, and reporting on, the analysis of data sets from each method that was used to collect data, or methodological justification for having discarded any data sets.

REFERENCES
Bamberger, Michael, Jim Rugh and Linda Mabry. Real World Evaluation: Working Under Budget, Time, Data and Political Constraints, Chapter 13, “Mixed-Method Evaluation,” pp. 303-322, Sage Publications Inc., Thousand Oaks, CA, 2006.
Greene, Jennifer C. and Valerie J. Caracelli. “Defining and Describing the Paradigm Issue in Mixed-methods Evaluation,” in Advances in Mixed-Method Evaluation: The Challenges and Benefits of Integrating Diverse Paradigms, Greene and Caracelli eds. New Directions for Evaluation. Jossey-Bass Publishers, No. 74, Summer 1997, pp. 5-17.
Mark, Melvin M., Irwin Feller and Scott B. Button. “Integrating Qualitative Methods in a Predominantly Quantitative Evaluation: A Case Study and Some Reflections,” in Advances in Mixed-Method Evaluation: The Challenges and Benefits of Integrating Diverse Paradigms, Greene and Caracelli eds. New Directions for Evaluation. Jossey-Bass Publishers, No. 74, Summer 1997, pp. 47-59.
McConney, Andrew, Andy Rudd, and Robert Ayres. “Getting to the Bottom Line: A Method for Synthesizing Findings Within Mixed-method Program Evaluations,” in American Journal of Evaluation, Vol. 23, No. 2, 2002, pp. 121-140.
Teddlie, Charles and Abbas Tashakkori, Foundations of Mixed-methods Research: Integrating Quantitative and Qualitative Approaches in the Social and Behavioral Sciences, Sage Publications, Inc., Los Angeles, 2009.
TABLE 1 – METHODS FOR ANALYZING MIXED-METHODS DATA (see note 1)

Analytical method: Parallel
Brief description: Two or more data sets collected using a mix of methods (quantitative and qualitative) are analyzed independently. The findings are then combined or integrated.
Best for: Triangulation designs to look for convergence of findings when the strength of the findings and conclusions is critical, or to use analysis of qualitative data to yield deeper explanations of findings from quantitative data analysis.

Analytical method: Conversion
Brief description: Two types of data are generated from one data source beginning with the form (quantitative or qualitative) of the original data source that was collected. Then the data are converted into either numerical or narrative data. A common example is the transformation of qualitative narrative data into numerical data for statistical analysis (e.g., on the simplest level, frequency counts of certain responses).
Best for: Extending the findings of one data set, say, quantitative, to generate additional findings and/or to compare and potentially strengthen the findings generated from a complementary set of, say, qualitative data.

Analytical method: Sequential
Brief description: A chronological analysis of two or more data sets (quantitative and qualitative) where the results of the analysis from the first data set are used to inform the analysis of the second data set. The type of analysis conducted on the second data set is dependent on the outcome of the first data set.
Best for: Testing hypotheses generated from the analysis of the first data set.

Analytical method: Multilevel
Brief description: Qualitative and quantitative techniques are used at different levels of aggregation within a study from at least two data sources to answer interrelated evaluation questions. One type of analysis (qualitative) is used at one level (e.g., patient) and another type of analysis (quantitative) is used in at least one other level (e.g., nurse).
Best for: Evaluations where organizational units for study are nested (e.g., patient, nurse, doctor, hospital, hospital administrator in an evaluation to understand the quality of patient treatment).

Analytical method: Data Synthesis
Brief description: A multi-step analytical process in which: 1) a rating of program effectiveness using the analysis of each data set is conducted (e.g., large positive effect, small positive effect, no discernible effect, small negative effect, large negative effect); 2) quality of evidence assessments are conducted for each data set using “criteria of worth” to rate the quality and validity of each data set gathered; 3) using the ratings collected under the first two steps, develop an aggregated equation for each outcome under consideration to assess the overall strength and validity of each finding; and 4) average outcome-wise effectiveness estimates to produce one overall program-wise effectiveness index.
Best for: Providing a bottom-line measure in cases where the evaluation purpose is to provide a summative program-wise conclusion when findings from mixed-method evaluations using a triangulation strategy do not converge and appear to be irresolvable, yet a defensible conclusion is needed to make a firm program decision. Note: there may still be some divergence in the evaluation findings from mixed data sets that the evaluator can still attempt to resolve and/or explore to further enrich the analysis and findings.

Note 1: See Teddlie and Tashakkori (2009) and Mark, Feller and Button (1997) for examples and further explanations of parallel data analysis. See Teddlie and Tashakkori (2009) on conversion, sequential, multilevel, and fully integrated mixed-methods data analysis; and McConney, Rudd, and Ayres (2002) for a further explanation of data synthesis analysis.

For more information: TIPS publications are available online at [insert website].

Acknowledgements: Our thanks to those whose experience and insights helped shape this publication including USAID’s Office of Management Policy, Budget and Performance (MPBP). This publication was written by Dr. Patricia Vondal of Management Systems International.
Comments regarding this publication can be directed to: Gerald Britan, Ph.D. Tel: (202) 712-1158 [email protected]

Contracted under RAN-M-00-04-00049-A-FY0S-84 Integrated Managing for Results II

NUMBER 17
1ST EDITION, 2010

PERFORMANCE MONITORING & EVALUATION TIPS
CONSTRUCTING AN EVALUATION REPORT

ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to performance monitoring and evaluation. This publication is a supplemental reference to the Automated Directives System (ADS) Chapter 203.

INTRODUCTION
This TIPS has three purposes. First, it provides guidance for evaluators on the structure, content, and style of evaluation reports. Second, it offers USAID officials, who commission evaluations, ideas on how to define the main deliverable. Third, it provides USAID officials with guidance on reviewing and approving evaluation reports.

The main theme is a simple one: how to make an evaluation report useful to its readers. Readers typically include a variety of development stakeholders and professionals; yet the most important are the policymakers and managers who need credible information for program or project decision-making. Informing this audience is usually part of the primary purpose of an evaluation.

To be useful, an evaluation report should address the evaluation questions and issues with accurate and data-driven findings, justifiable conclusions, and practical recommendations. It should reflect the use of sound evaluation methodology and data collection, and report the limitations of each. Finally, an evaluation report should be written with a structure and style that promote learning and action.

Five common problems emerge in relation to evaluation reports. These problems are as follows:
• An unclear description of the program strategy and the specific results it is designed to achieve.
• Inadequate description of the evaluation’s purpose, intended uses, and the specific evaluation questions to be addressed.
• Imprecise analysis and reporting of quantitative and qualitative data collected during the evaluation.
• A lack of clear distinctions between findings and conclusions.
• Conclusions that are not grounded in the facts and recommendations that do not flow logically from conclusions.

This guidance offers tips that apply to an evaluation report for any type of evaluation, whether formative, summative (or impact), a rapid appraisal evaluation, or one using more rigorous methods.

Evaluation reports should be readily understood and should identify key points clearly, distinctly, and succinctly. (ADS 203.3.6.6)

A PROPOSED REPORT OUTLINE
Table 1 presents a suggested outline and approximate page lengths for a typical evaluation report. The evaluation team can, of course, modify this outline as needed. As indicated in the table, however, some elements are essential parts of any report. This outline can also help USAID managers define the key deliverable in an Evaluation Statement of Work (SOW) (see TIPS 3: Preparing an Evaluation SOW). We will focus particular attention on the section of the report that covers findings, conclusions, and recommendations. This section represents the core element of the evaluation report.

BEFORE THE WRITING BEGINS
Before the report writing begins, the evaluation team must complete two critical tasks: 1) establish clear and defensible findings, conclusions, and recommendations that clearly address the evaluation questions; and 2) decide how to organize the report in a way that conveys these elements most effectively.

FINDINGS, CONCLUSIONS, AND RECOMMENDATIONS
One of the most important tasks in constructing an evaluation report is to organize the report into three main elements: findings, conclusions, and recommendations (see Figure 1). This structure brings rigor to the evaluation and ensures that each element can ultimately be traced back to the basic facts. It is this structure that sets evaluation apart from other types of analysis.
FIGURE 1. ORGANIZING KEY ELEMENTS OF THE EVALUATION REPORT
Recommendations: Proposed actions for management
Conclusions: Interpretations and judgments based on the findings
Findings: Empirical facts collected during the evaluation

Once the research stage of an evaluation is complete, the team has typically collected a great deal of data in order to answer the evaluation questions. Depending on the methods used, these data can include observations, responses to survey questions, opinions and facts from key informants, secondary data from a ministry, and so on. The team’s first task is to turn these raw data into findings.

TYPICAL PROBLEMS WITH FINDINGS
Findings that:
1. Are not organized to address the evaluation questions; the reader must figure out where they fit.
2. Lack precision and/or context; the reader cannot interpret their relative strength.
Incorrect: “Some respondents said ‘x,’ a few said ‘y,’ and others said ‘z.’”
Correct: “Twelve of the 20 respondents (60 percent) said ‘x,’ five (25 percent) said ‘y,’ and three (15 percent) said ‘z.’”
3. Mix findings and conclusions.
Incorrect: “The fact that 82 percent of the target group was aware of the media campaign indicates its effectiveness.”
Correct: Finding: “Eighty-two percent of the target group was aware of the media campaign.” Conclusion: “The media campaign was effective.”

TYPICAL PROBLEMS WITH CONCLUSIONS
Conclusions that:
1. Restate findings.
Incorrect: “The project met its performance targets with respect to outputs and results.”
Correct: “The project’s strategy was successful.”
2. Are vaguely stated.
Incorrect: “The project could have been more responsive to its target group.”
Correct: “The project failed to address the different needs of targeted women and men.”
3. Are based on only one of several findings and data sources.
4. Include respondents’ conclusions, which are really findings.
Incorrect: “All four focus groups of project beneficiaries judged the project to be effective.”
Correct: “Based on our focus group data and quantifiable data on key results indicators, we conclude that the project was effective.”

Suppose, for example, that USAID has charged an evaluation team with answering the following evaluation question (among others): “How adequate are the prenatal services provided by the Ministry of Health’s rural clinics in Northeastern District?” To answer this question, their research in the district included site visits to a random sample of rural clinics, discussions with knowledgeable health professionals, and a survey of women who have used clinic prenatal services during the past year. The team analyzed the raw, qualitative data and identified the following findings:
• Of the 20 randomly-sampled rural clinics visited, four clinics met all six established standards of care, while the other 16 (80 percent) failed to meet at least two standards. The most commonly unmet standard (13 clinics) was “maintenance of minimum staff-patient ratios.”
• In 14 of the 16 clinics failing to meet two or more standards, not one of the directors was able to state the minimum staff-patient ratios for nurse practitioners, nurses, and prenatal educators.
• Of 36 women who had used their rural clinics’ prenatal services during the past year, 27 (75 percent) stated that they were “very dissatisfied” or “dissatisfied,” on a scale of 1-5 from “very dissatisfied” to “very satisfied.” The most frequently cited reason for dissatisfaction was “long waits for service” (cited by 64 percent of the 27 dissatisfied women).
• Six of the seven key informants who offered an opinion on the adequacy of prenatal services for the rural poor in the district noted that an insufficient number of prenatal care staff was a “major problem” in rural clinics.

These findings are the empirical facts collected by the evaluation team.
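The advice on precise findings (report the count, the sample size, and the percentage together) can be illustrated with a small script. This is only a sketch under stated assumptions: the function names `percent` and `summarize_responses` are hypothetical helpers invented for illustration, and the response data simply mirror the “Twelve of the 20 respondents” example from the sidebar on typical problems with findings.

```python
from collections import Counter


def percent(count, total):
    """Whole-number percentage of count out of total (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total)


def summarize_responses(responses, unit="respondents"):
    """List each answer with its count, the sample size, and its percentage."""
    total = len(responses)
    counts = Counter(responses)
    # most_common() orders answers from most to least frequent
    return [
        f"{count} of {total} {unit} ({percent(count, total)} percent) said '{answer}'"
        for answer, count in counts.most_common()
    ]


# Illustrative data only: 20 respondents, as in the sidebar example
responses = ["x"] * 12 + ["y"] * 5 + ["z"] * 3
for line in summarize_responses(responses):
    print(line)
```

Computing the percentage from the raw counts, rather than typing it by hand, is one way to keep the figures in a finding consistent with each other and with the sample size shown alongside them.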
Evaluation findings are analogous to the evidence presented in a court of law or a patient’s symptoms identified during a visit to the doctor. Once the evaluation team has correctly laid out all the findings against each evaluation question, only then should conclusions be drawn for each question. This is where many teams tend to confuse findings and conclusions, both in their analysis and in the final report.

Conclusions represent the team’s judgments based on the findings. These are analogous to a court jury’s decision to acquit or convict based on the evidence presented or a doctor’s diagnosis based on the symptoms. The team must keep findings and conclusions distinctly separate from each other. However, there must also be a clear and logical relationship between findings and conclusions.

In our example of the prenatal services evaluation, examples of reasonable conclusions might be as follows:
• In general, the levels of prenatal care staff in Northeastern District’s rural clinics are insufficient.
• The Ministry of Health’s periodic informational bulletins to clinic directors regarding the standards of prenatal care are not sufficient to ensure that standards are understood and implemented.

However, sometimes the team’s findings from different data sources are not as clear-cut in one direction as they are in this example. In those cases, the team must weigh the relative credibility of the data sources and the quality of the data, and make a judgment call. The team might state that a definitive conclusion cannot be made, or it might draw a more guarded conclusion such as the following: “The preponderance of the evidence suggests that prenatal care is weak.”

The team should never omit contradictory findings from its analysis and report in order to have more definitive conclusions. Remember, conclusions are interpretations and judgments made on the basis of the findings.

Sometimes we see reports that include conclusions derived from preconceived notions or opinions developed through experience gained outside the evaluation, especially by members of the team who have substantive expertise on a particular topic. We do not recommend this, because it can distort the evaluation. That is, the role of the evaluator is to present the findings, conclusions, and recommendations in a logical order. Opinions outside this framework are then, by definition, not substantiated by the facts at hand. If any of these opinions are directly relevant to the evaluation questions and come from conclusions drawn from prior research or secondary sources, then the data upon which they are based should be presented among the evaluation’s findings.

TYPICAL PROBLEMS WITH RECOMMENDATIONS
Recommendations that:
1. Are unclear about the action to be taken.
Incorrect: “Something needs to be done to improve extension services.”
Correct: “To improve extension services, the Ministry of Agriculture should implement a comprehensive introductory training program for all new extension workers and annual refresher training programs for all extension workers.”
2. Fail to specify who should take action.
Incorrect: “Sidewalk ramps for the disabled should be installed.”
Correct: “Through matching grant funds from the Ministry of Social Affairs, municipal governments should install sidewalk ramps for the disabled.”
3. Are not supported by any findings and conclusions.
4. Are not realistic with respect to time and/or costs.
Incorrect: “The Ministry of Social Affairs should ensure that all municipal sidewalks have ramps for the disabled within two years.”
Correct: “The Ministry of Social Affairs should implement a gradually expanding program to ensure that all municipal sidewalks have ramps for the disabled within 15 years.”
FIGURE 3. OPTIONS FOR REPORTING FINDINGS, CONCLUSIONS, AND RECOMMENDATIONS
OPTION 1
FINDINGS: Evaluation Question 1, Evaluation Question 2
CONCLUSIONS: Evaluation Question 1, Evaluation Question 2
RECOMMENDATIONS: Evaluation Question 1, Evaluation Question 2
OPTION 2
EVALUATION QUESTION 1: Findings, Conclusions, Recommendations
EVALUATION QUESTION 2: Findings, Conclusions, Recommendations
OPTION 3
Mix the two approaches. Identify which evaluation questions are distinct and which are interrelated. For distinct questions, use option 1 and for the latter, use option 2.

FIGURE 2. Tracking the linkages is one way to help ensure a credible report, with information that will be useful.
Evaluation Question #1: three columns, FINDINGS (XXXXXX), CONCLUSIONS (YYYYYY), and RECOMMENDATIONS (ZZZZZZ).

Once conclusions are complete, the team is ready to make its recommendations. Too often recommendations do not flow from the team’s conclusions or, worse, they are not related to the original evaluation purpose and evaluation questions. They may be good ideas, but they do not belong in this section of the report. As an alternative, they could be included in an annex with a note that they are derived from coincidental observations made by the team or from team members’ experiences elsewhere.

Using our example related to rural health clinics, a few possible recommendations could emerge as follows:
• The Ministry of Health’s Northeastern District office should develop and implement an annual prenatal standards-of-care training program for all its rural clinic directors. The program would cover….
• The Northeastern District office should conduct a formal assessment of prenatal care staffing levels in all its rural clinics.
• Based on the assessment, the Northeastern District office should establish and implement a five-year plan for hiring and placing needed prenatal care staff in its rural clinics on a most-needy-first basis.
Although the basic recommendations should be derived from conclusions and findings, this is where the team can include ideas and options for implementing recommendations that may be based on their substantive expertise and best practices drawn from experience outside the evaluation itself. Usefulness is paramount. When developing recommendations, consider practicality. Circumstances or resources may limit the extent to which a recommendation can be implemented. If practicality is an issue — as is often the case — the evaluation team may need to ramp down recommendations, present them in terms of incremental steps, or suggest other options. In order to be useful, it is essential that recommendations be actionable or, in other words, feasible in light of the human, technical, and financial resources available. Weak connections between findings, conclusions, and recommendations can undermine the user’s confidence in evaluation results. As a result, we encourage teams—or, better yet, a colleague who has not been involved—to review the logic before beginning to write the report. For each evaluation question, present all the findings, conclusions, and recommendations in a format similar to the one outlined in Figure 2. Starting with the conclusions in the center, track each one back to the findings that support it, and decide whether the findings truly warrant the conclusion being made. If not, revise the conclusion as needed. Then track each recommendation to the conclusion(s) from which it flows, and revise if necessary. CHOOSE THE BEST APPROACH FOR STRUCTURING THE REPORT Depending on the nature of the evaluation questions and the findings, conclusions, and recommendations, the team has a few options for structuring this part of the report (see Figure 3). The objective is to present the report in a way that makes it as easy as possible for the reader to digest all of the information. Options are discussed below. 
Option 1- Distinct Questions If all the evaluation questions are distinct from one another and the relevant findings, conclusions, and recommendations do not cut across questions, then one option is to organize the report around each evaluation question. That is, each question will include a section including its relevant findings, conclusions, and recommendations. Option 2- Interrelated Questions If, however, the questions are closely interrelated and there are findings, conclusions, and/or recommendations that apply to more than one question, then it may be preferable to put all the findings for all the evaluation questions in one section, all the conclusions in another, and all the recommendations in a third. Option 3- Mixed If the situation is mixed—where a few but not all the questions are closely interrelated—then use a mixed approach. Group the interrelated questions and their findings, conclusions, and recommendations into one subsection, and treat the stand-alone questions and their respective findings, conclusions, and recommendations in separate subsections. The important point is that the team should be sure to keep findings, conclusions, and recommendations separate and distinctly labeled as such. Finally, some evaluators think it more useful to present the conclusions first, and then follow with the findings supporting them. This helps the reader see the “bottom line” first and then make a judgment as to whether the conclusions are warranted by the findings. OTHER KEY SECTIONS OF THE REPORT THE EXECUTIVE SUMMARY The Executive Summary should stand alone as an abbreviated version of the entire report. Often it is the only thing that busy managers read. The Executive Summary should be a “mirror image” of the full report—it should contain no new information that is not in the main report. This principle also applies to making the Executive Summary and the full report equivalent with respect to presenting positive and negative evaluation results. 
Although all sections of the full report are summarized in the Executive Summary, less emphasis is given to an overview of the project and the description of the evaluation purpose and methodology than is given to the findings, conclusions, and recommendations. Decision-makers are generally more interested in the latter. The Executive Summary should be written after the main report has been drafted. Many people believe that a good Executive Summary should not exceed two pages, but there is no formal rule in USAID on this. Finally, an Executive Summary should be written in a way that will entice interested stakeholders to go on to read the full report.

FIGURE 4. SUMMARY OF EVALUATION DESIGN AND METHODS (an illustration)

Evaluation Question: 1. How adequate are the prenatal services provided by the Ministry of Health’s (MOH) rural clinics in Northeastern District?

Type of Analysis Conducted: Comparison of rural clinics’ prenatal service delivery to national standards
Data Sources and Methods Used: MOH manual of rural clinic standards of care. Structured observations and staff interviews at rural clinics
Type and Size of Sample: Twenty clinics, randomly sampled from 68 total in Northeastern District
Limitations: Three of the originally sampled clinics were closed when the team visited. To replace each, the team visited the closest open clinic. As a result, the sample was not totally random.

Type of Analysis Conducted: Description, based on a content analysis of expert opinions
Data Sources and Methods Used: Key informant interviews with health care experts in the district and the MOH
Type and Size of Sample: Ten experts identified by project & MOH staff
Limitations: Only seven of the 10 experts had an opinion about prenatal care in the district.

Type of Analysis Conducted: Description and comparison of ratings among women in the district and two other similar rural districts
Data Sources and Methods Used: In-person survey of recipients of prenatal services at clinics in the district and two other districts
Type and Size of Sample: Random samples of 40 women listed in clinic records as having received prenatal services during the past year from each of the three districts’ clinics
Limitations: Of the total 120 women sampled, the team was able to conduct interviews with only 36 in the district, and 24 and 28 in the other two districts. The levels of confidence for generalizing to the populations of service recipients were __, __, and __, respectively.

DESCRIPTION OF THE PROJECT
Many evaluation reports give only cursory attention to the development problem (or opportunity) that motivated the project in the first place, or to the “theory of change” that underpins USAID’s intervention. The “theory of change” includes what the project intends to do and the results which the activities are intended to produce. TIPS 13: Building a Results Framework is a particularly useful reference and provides additional detail on logic models.

If the team cannot find a description of these hypotheses or any model of the project’s cause-and-effect logic such as a Results Framework or a Logical Framework, this should be noted. The evaluation team will then have to summarize the project strategy in terms of the “if-then” propositions that show how the project designers envisioned the interventions as leading to desired results.

In describing the project, the evaluation team should be clear about what USAID tried to improve, eliminate, or otherwise change for the better. What was the “gap” between conditions at the start of the project and the more desirable conditions that USAID wanted to establish with the project? The team should indicate whether the project design documents and/or the recall of interviewed project designers offered a clear picture of the specific economic and social factors that contributed to the problem, with baseline data, if available.
Sometimes photographs and maps of before-project conditions, such as the physical characteristics and locations of rural prenatal clinics in our example, can be used to illustrate the main problem(s).

It is equally important to include basic information about when the project was undertaken, its cost, its intended beneficiaries, and where it was implemented (e.g., country-wide or only in specific districts). It can be particularly useful to include a map that shows the project’s target areas. A good description also identifies the organizations that implement the project, the kind of mechanism used (e.g., contract, grant, or cooperative agreement), and whether and how the project has been modified during implementation. Finally, the description should include information about context, such as conflict or drought, and other government or donor activities focused on achieving the same or parallel results.

THE EVALUATION PURPOSE AND METHODOLOGY
The credibility of an evaluation team’s findings, conclusions, and recommendations rests heavily on the quality of the research design, as well as on data collection methods and analysis used. The reader needs to understand what the team did and why in order to make informed judgments about credibility. Presentation of the evaluation design and methods is often best done through a short summary in the text of the report and a more detailed methods annex that includes the evaluation instruments. Figure 4 provides a sample summary of the design and methodology that can be included in the body of the evaluation report.

From a broad point of view, what research design did the team use to answer each evaluation question? Did the team use description (e.g., to document what happened), comparisons (e.g., of baseline data or targets to actual data, of actual practice to standards, among target sub-populations or locations), or cause-effect research (e.g., to determine whether the project made a difference)?
To do cause-effect analysis, for example, did the team use one or more quasi-experimental approaches, such as time-series analysis or use of non-project comparison groups (see TIPS 11: The Role of Evaluation)?

More specifically, what data collection methods did the team use to get the evidence needed for each evaluation question? Did the team use key informant interviews, focus groups, surveys, on-site observation methods, analyses of secondary data, and other methods? How many people did they interview or survey, how many sites did they visit, and how did they select their samples?

Most evaluations suffer from one or more constraints that affect the comprehensiveness and validity of findings and conclusions. These may include overall limitations on time and resources, unanticipated problems in reaching all the key informants and survey respondents, unexpected problems with the quality of secondary data from the host-country government, and the like. In the methodology section, the team should address these limitations and their implications for answering the evaluation questions and developing the findings and conclusions that follow in the report. The reader needs to know these limitations in order to make informed judgments about the evaluation’s credibility and usefulness.

READER-FRIENDLY STYLE
When writing its report, the evaluation team must always remember the composition of its audience. The team is writing for policymakers, managers, and stakeholders, not for fellow social science researchers or for publication in a professional journal. To that end, the style of writing should make it as easy as possible for the intended audience to understand and digest what the team is presenting. For further suggestions on writing an evaluation in reader-friendly style, see Table 2.

TABLE 1. SUGGESTED OUTLINE FOR AN EVALUATION REPORT1
Columns: Element; Approximate Number of Pages; Description and Tips for the Evaluation Team

Title Page. Pages: 1 (but no page number). Essential.
Should include the words “U.S. Agency for International Development” with the acronym “USAID,” the USAID logo, and the project/contract number under which the evaluation was conducted. See USAID Branding and Marking Guidelines (http://www.usaid.gov/branding/) for logo and other specifics. Give the title of the evaluation; the name of the USAID office receiving the evaluation; the name(s), title(s), and organizational affiliation(s) of the author(s); and the date of the report.

Contents. Pages: as needed, starting with Roman numeral ii. Essential. Should list all the sections that follow, including Annexes. For multi-page chapters, include chapter headings and first- and second-level headings. List (with page numbers) all figures, tables, boxes, and other titled graphics.

Foreword. Pages: 1. Optional. An introductory note written by someone other than the author(s), if needed. For example, it might mention that this evaluation is one in a series of evaluations or special studies being sponsored by USAID.

Acknowledgements. Pages: 1. Optional. The authors thank the various people who provided support during the evaluation.

Preface. Pages: 1. Optional. Introductory or incidental notes by the authors, but not material essential to understanding the text. Acknowledgements could be included here if desired.

Executive Summary. Pages: 2-3; 5 at most. Essential, unless the report is so brief that a summary is not needed. (See discussion on p. 5.)

Glossary. Pages: 1. Optional. Useful if the report uses technical or project-specific terminology that would be unfamiliar to some readers.

Acronyms and Abbreviations. Pages: 1. Essential, if they are used in the report. Include only those acronyms that are actually used. See Table 3 for more advice on using acronyms.

I. Introduction. Pages: 5-10, starting with Arabic numeral 1. Optional. The two sections listed under Introduction here could be separate, stand-alone chapters. If so, a separate Introduction may not be needed.
Description of the Project. Essential. Describe the context in which the USAID project took place (e.g., relevant history, demography, political situation, etc.). Describe the specific development problem that prompted USAID to implement the project, the theory underlying the project, and details of project implementation to date. (See more tips on p. 6.)

The Evaluation Purpose and Methodology. Essential. Describe who commissioned the evaluation, why they commissioned it, what information they want, and how they intend to use the information (and refer to the Annex that includes the Statement of Work). Provide the specific evaluation questions, and briefly describe the evaluation design and the analytical and data collection methods used to answer them. Describe the evaluation team (i.e., names, qualifications, and roles), what the team did (e.g., reviewed relevant documents, analyzed secondary data, interviewed key informants, conducted a survey, conducted site visits), and when and where they did it. Describe the major limitations encountered in data collection and analysis that have implications for reviewing the results of the evaluation. Finally, refer to the Annex that provides a fuller description of all of the above, including a list of documents/data sets reviewed, a list of individuals interviewed, copies of the data collection instruments used, and descriptions of sampling procedures (if any) and data analysis procedures. (See more tips on p. 6.)

II. Findings, Conclusions, and Recommendations. Pages: 20-30. Essential. However, in some cases, the evaluation user does not want recommendations, only findings and conclusions. This material may be organized in different ways and divided into several chapters. (A detailed discussion of developing defensible findings, conclusions, and recommendations and structural options for reporting them is on p. 2 and p. 5.)

III. Summary of Recommendations. Pages: 1-2. Essential or optional, depending on how findings, conclusions and recommendations are presented in the section above. (See a discussion of options on p. 4.) If all the recommendations related to all the evaluation questions are grouped in one section of the report, this summary is not needed. However, if findings, conclusions, and recommendations are reported together in separate sections for each evaluation question, then a summary of all recommendations, organized under each of the evaluation questions, is essential.

IV. Lessons Learned. Pages: as needed. Required if the SOW calls for it; otherwise optional. Lessons learned and/or best practices gleaned from the evaluation provide other users, both within USAID and outside, with ideas for the design and implementation of related or similar projects in the future.

Annexes. Some are essential and some are optional as noted.

Statement of Work. Essential. Lets the reader see exactly what USAID initially expected in the evaluation.

Evaluation Design and Methodology. Essential. Provides a more complete description of the evaluation questions, design, and methods used. Also includes copies of data collection instruments (e.g., interview guides, survey instruments, etc.) and describes the sampling and analysis procedures that were used.

List of Persons Interviewed. Essential. However, specific names of individuals might be withheld in order to protect their safety.

List of Documents Reviewed. Essential. Includes written and electronic documents reviewed, background literature, secondary data sources, citations of websites consulted.

Dissenting Views. If needed. Include if a team member or a major stakeholder does not agree with one or more findings, conclusions, or recommendations.

Recommendation Action Checklist. Optional.
As a service to the user organization, this chart can help with follow-up to the evaluation. It includes a list of all recommendations organized by evaluation question, a column for decisions to accept or reject each recommendation, a column for the decision maker’s initials, a column for the reason a recommendation is being rejected, and, for each accepted recommendation, columns for the actions to be taken, by when, and by whom.

1 The guidance and suggestions in this table were drawn from the writers’ experience and from the “CDIE Publications Style Guide: Guidelines for Project Managers, Authors, & Editors,” compiled by Brian Furness and John Engels, December 2001. The guide, which includes many tips on writing style, editing, referencing citations, and using Word and Excel, is available online at http://kambing.ui.ac.id/bebas/v01/DEC-USAID/Other/publications-style-guide.pdf. Other useful guidance: ADS 320 (http://www.usaid.gov/policy/ads/300/320.pdf); http://www.usaid.gov/branding; and http://www.usaid.gov/branding/Graphic Standards Manual.pdf.

TABLE 2. THE QUICK REFERENCE GUIDE FOR A READER-FRIENDLY TECHNICAL STYLE

Writing Style: Keep It Simple and Correct!
Avoid meaningless precision. Decide how much precision is really necessary. Instead of “62.45 percent,” might “62.5 percent” or “62 percent” be sufficient? The same goes for averages and other calculations.
Use technical terms and jargon only when necessary. Make sure to define them for the unfamiliar readers.
Don’t overuse footnotes. Use them only to provide additional information which, if included in the text, would be distracting and cause a loss of the train of thought.

Use Tables, Charts and Other Graphics to Enhance Understanding
Avoid long, “data-dump” paragraphs filled with numbers and percentages. Use tables, line graphs, bar charts, pie charts, and other visual displays of data, and summarize the main points in the text.
In addition to increasing understanding, these displays provide visual relief from long narrative tracts. Be creative—but not too creative. Choose and design tables and charts carefully with the reader in mind. Make every visual display of data a self-contained item. It should have a meaningful title and headings for every column; a graph should have labels on each axis; a pie or bar chart should have labels for every element. Choose shades and colors carefully. Expect that consumers will reproduce the report in black and white and make copies of copies. Make sure that the reader can distinguish clearly among colors or shades among multiple bars and pie-chart segments. Consider using textured fillings (such as hatch marks or dots) rather than colors or shades. Provide “n’s” in all displays which involve data drawn from samples or populations. For example, the total number of cases or survey respondents should be under the title of a table (n = 100). If a table column includes types of responses from some, but not all, survey respondents to a specific question, say, 92 respondents, the column head should include the total number who responded to the question (n = 92). Refer to every visual display of data in the text. Present it after mentioning it in the text and as soon after as practical, without interrupting paragraphs. Number tables and figures separately, and number each consecutively in the body of the report. Consult the CDIE style guide for more detailed recommendations on tables and graphics. Punctuate the Text with Other Interesting Features Put representative quotations gleaned during data collection in text boxes. Maintain balance between negative and positive comments to reflect the content of the report. Identify the sources of all quotes. If confidentiality must be maintained, identify sources in general terms, such as “a clinic care giver” or “a key informant.” Provide little “stories” or cases that illustrate findings. 
For example, a brief anecdotal story in a text box about how a woman used a clinic’s services to ensure a healthy pregnancy can enliven, and humanize, the quantitative findings. Use photos and maps where appropriate. For example, a map of a district with all the rural clinics providing prenatal care and the concentrations of rural residents can effectively demonstrate adequate or inadequate access to care. Don’t overdo it. Strike a reader-friendly balance between the main content and illustrative material. In using illustrative material, select content that supports the main points rather than distracting from them.
Finally… Remember that the reader’s need to understand, not the writer’s need to impress, is paramount. Be consistent with the chosen format and style throughout the report.
Sources: “CDIE Publications Style Guide: Guidelines for Project Managers, Authors, & Editors,” compiled by Brian Furness and John Engels, December 2001 (http://kambing.ui.ac.id/bebas/v01/DEC-USAID/Other/publications-styleguide.pdf); USAID’s Graphics Standards Manual (http://www.usaid.gov/branding/USAID_Graphic_Standards_Manual.pdf); and the authors’ extensive experience with good and difficult-to-read evaluation reports.
For more information: TIPS publications are available online at [insert website].
Acknowledgements: Our thanks to those whose experience and insights helped shape this publication, including Gerry Britan and Subhi Mehdi of USAID’s Office of Management Policy, Budget and Performance (MPBP). This publication was written by Larry Beyna of Management Systems International (MSI).
Comments regarding this publication can be directed to: Gerald Britan, Ph.D.
Tel: (202) 712-1158 [email protected]
Contracted under RAN-M-00-04-00049-A-FY0S-84 Integrated Managing for Results II

NUMBER 18 1ST EDITION, 2010
PERFORMANCE MONITORING & EVALUATION TIPS
CONDUCTING DATA QUALITY ASSESSMENTS

ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to performance monitoring and evaluation. This publication is a supplemental reference to the Automated Directive System (ADS) Chapter 203.

THE PURPOSE OF THE DATA QUALITY ASSESSMENT
Data quality assessments (DQAs) help managers to understand how confident they should be in the data used to manage a program and report on its success. USAID’s ADS notes that the purpose of the Data Quality Assessment is to: “…ensure that the USAID Mission/Office and Assistance Objective (AO) Team are aware of the strengths and weaknesses of the data, as determined by applying the five data quality standards …and are aware of the extent to which the data integrity can be trusted to influence management decisions.” (ADS 203.3.5.2)
This purpose is important to keep in mind when considering how to do a data quality assessment. A data quality assessment is of little use unless front line managers comprehend key data quality issues and are able to improve the performance management system.

THE DATA QUALITY STANDARDS
Five key data quality standards are used to assess quality. These are:
• Validity
• Reliability
• Precision
• Integrity
• Timeliness
A more detailed discussion of each standard is included in TIPS 12: Data Quality Standards.

WHAT IS REQUIRED?
USAID POLICY
While managers are required to understand data quality on an ongoing basis, a data quality assessment must also be conducted at least once every three years for those data reported to Washington. As a matter of good management, program managers may decide to conduct DQAs more frequently or for a broader range of data where potential issues emerge. The ADS does not prescribe a specific way to conduct a DQA.
A variety of approaches can be used. Documentation may be as simple as a memo to the files, or it could take the form of a formal report. The most appropriate approach will reflect a number of considerations, such as management need, the type of data collected, the data source, the importance of the data, or suspected data quality issues. The key is to document the findings, whether formal or informal.
A DQA focuses on applying the data quality standards and examining the systems and approaches for collecting data to determine whether they are likely to produce high quality data over time. In other words, if the data quality standards are met and the data collection methodology is well designed, then it is likely that good quality data will result. This “systematic approach” is valuable because it assesses a broader set of issues that are likely to ensure data quality over time (as opposed to whether one specific number is accurate or not). For example, it is possible to report a number correctly, but that number may not be valid (see footnote 1), as the following example demonstrates.
Example: A program works across a range of municipalities (both urban and rural). It is reported that local governments have increased revenues by 5%. These data may be correct. However, if only major urban areas have been included, these data are not valid. That is, they do not measure the intended result.

VERIFICATION OF DATA
Verification of data means that the reviewer follows a specific datum to its source, confirming that it has supporting documentation and is accurate, as is often done in audits. The DQA may not necessarily verify that all individual numbers reported are accurate. The ADS notes that when assessing data from partners, the DQA should focus on “the apparent accuracy and consistency of the data.” As an example, Missions often report data on the number of individuals trained.
Rather than verifying each number reported, the DQA might examine each project’s system for collecting and maintaining those data. If there is a good system in place, it is highly likely that the data produced will be of high quality. Having said this, it is certainly advisable to periodically verify actual data as part of the larger performance management system. Project managers may:
• Choose a few indicators to verify periodically throughout the course of the year.
• Occasionally spot check data (for example, when visiting the field).
1 Refer to TIPS 12: Data Quality Standards for a full discussion of all the data quality standards.

HOW GOOD DO DATA HAVE TO BE?
In development, there are rarely perfect data. Moreover, data used for management purposes have different standards than data used for research. There is often a direct trade-off between cost and quality. Each manager is responsible for ensuring the highest quality data possible given the resources and the management context. In some cases, simpler, lower-cost approaches may be most appropriate. In other cases, where indicators measure progress in major areas of investment, higher data quality is expected.

OPTIONS AND APPROACHES FOR CONDUCTING DQAS
A data quality assessment is both a process for reviewing data to understand their strengths and weaknesses and the documentation of that review. A DQA can be done in a variety of ways ranging from the more informal to the formal (see Figure 1). In our experience, a combination of informal, ongoing, and systematic assessments works best, in most cases, to ensure good data quality.

INFORMAL OPTIONS
Informal approaches can be ongoing or driven by specific issues as they emerge. These approaches depend more on the front line manager’s in-depth knowledge of the program. Findings are documented by the manager in memos or notes in the Performance Management Plan (PMP).
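One informal check a manager might make is to compare a newly reported figure with the values reported in earlier periods. The following is a minimal sketch of that idea, not a method prescribed by the ADS or this TIPS; the function name `flag_unusual_value`, the ratio threshold of 2.0, and the sample figures are all hypothetical.

```python
from statistics import median


def flag_unusual_value(previous, current, max_ratio=2.0):
    """Return True when `current` is far above the median of earlier reports.

    `previous` is a list of earlier reported values for the same indicator.
    `max_ratio` is an arbitrary, illustrative threshold (an assumption).
    """
    if not previous:
        # Nothing to compare against, so the check cannot flag anything.
        return False
    baseline = median(previous)
    if baseline == 0:
        # Any positive report after a zero baseline deserves a closer look.
        return current > 0
    return current / baseline > max_ratio


# Hypothetical figures for one indicator over four earlier periods.
earlier_reports = [12, 15, 11, 14]
print(flag_unusual_value(earlier_reports, 50))  # flagged for follow-up
print(flag_unusual_value(earlier_reports, 16))  # within the usual range
```

A flag of this kind only signals that a conversation with the implementer is warranted; it says nothing about whether the reported number is wrong.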
Example: An implementer reports that civil society organizations (CSOs) have initiated 50 advocacy campaigns. This number seems unusually high. The project manager calls the implementer to understand why the number is so high in comparison to previously reported numbers and explores whether a consistent methodology for collecting the data has been used (i.e., whether the standard of reliability has been met). The project manager documents his or her findings in a memo and maintains that information in the files.
Informal approaches should be incorporated into Mission systems as a normal part of performance management. The advantages and disadvantages of this approach are as follows:
Advantages
• Managers incorporate data quality as a part of ongoing work processes.
• Issues can be addressed and corrected quickly.
• Managers establish a principle that data quality is important.
Disadvantages
• It is not systematic and may not be complete. That is, because informal assessments are normally driven by more immediate management concerns, the manager may miss larger issues that are not readily apparent (for example, whether the data are attributable to USAID programs).
• There is no comprehensive document that addresses the DQA requirement.
• Managers may not have enough expertise to identify more complicated data quality issues, audit vulnerabilities, and formulate solutions.

FIGURE 1. OPTIONS FOR CONDUCTING DATA QUALITY ASSESSMENTS: THE CONTINUUM
Informal Options
• Conducted internally by the AO team
• Ongoing (driven by emerging and specific issues)
• More dependent on the AO team and individual manager’s expertise & knowledge of the program
• Conducted by the program manager
• Product: Documented in memos, notes in the PMP
Semi-Formal Partnership
• Draws on both management expertise and M&E expertise
• Periodic & systematic
• Facilitated and coordinated by the M&E expert, but AO team members are active participants
• Product: Data Quality Assessment Report
Formal Options
• Driven by broader programmatic needs, as warranted
• More dependent on external technical expertise and/or specific types of data expertise
• Product: Either a Data Quality Assessment report or addressed as a part of another report

SEMI-FORMAL / PARTNERSHIP OPTIONS
Semi-formal or partnership options are characterized by a more periodic and systematic review of data quality. These DQAs should ideally be led and conducted by USAID staff. One approach is to partner a monitoring and evaluation (M&E) expert with the Mission’s AO team to conduct the assessment jointly. The M&E expert can organize the process, develop standard approaches, facilitate sessions, assist in identifying potential data quality issues and solutions, and may document the outcomes of the assessment. This option draws on the experience of AO team members as well as the broader knowledge and skills of the M&E expert. Engaging front line managers in the DQA process has the additional advantage of making them more aware of the strengths and weaknesses of the data, one of the stated purposes of the DQA. The advantages and disadvantages of this approach are summarized below:
Advantages
• Produces a systematic and comprehensive report with specific recommendations for improvement.
• Engages AO team members in the data quality assessment.
• Draws on the complementary skills of front line managers and M&E experts.
• Assessing data quality is a matter of understanding trade-offs and context in terms of deciding what data is “good enough” for a program. An M&E expert can be useful in guiding AO team members through this process in order to ensure that audit vulnerabilities are adequately addressed.
• Does not require a large external team.
Disadvantages
• The Mission may use an internal M&E expert or hire someone from the outside. However, hiring an outside expert will require additional resources, and external contracting requires some time.
• Because of the additional time and planning required, this approach is less useful for addressing immediate problems.

FORMAL OPTIONS
At the other end of the continuum, there may be a few select situations where Missions need a more rigorous and formal data quality assessment.
Example: A Mission invests substantial funding into a high-profile program that is designed to increase the efficiency of water use. Critical performance data comes from the Ministry of Water, and is used both for performance management and reporting to key stakeholders, including the Congress. The Mission is unsure as to the quality of those data. Given the high level of interest and the level of resources invested in the program, a data quality assessment is conducted by a team including technical experts to review data and identify specific recommendations for improvement. Recommendations will be incorporated into the technical assistance provided to the Ministry to improve their own capacity to track these data over time.
These types of data quality assessments require a high degree of rigor and specific, in-depth technical expertise. Advantages and disadvantages are as follows:
Advantages
• Produces a systematic and comprehensive assessment, with specific recommendations.
• Examines data quality issues with rigor and based on specific, in-depth technical expertise.
• Fulfills two important purposes, in that it can be designed to improve data collection systems both within USAID and for the beneficiary.
Disadvantages
• Often conducted by an external team of experts, entailing more time and cost than other options.
• Generally less direct involvement by front line managers.
• Often examines data through a very technical lens. It is important to ensure that broader management issues are adequately addressed.

THE PROCESS
For purposes of this TIPS, we will outline a set of illustrative steps for the middle (or semi-formal/partnership) option. In reality, these steps are often iterative.

STEP 1. IDENTIFY THE DQA TEAM
Identify one person to lead the DQA process for the Mission. This person is often the Program Officer or an M&E expert. The leader is responsible for setting up the overall process and coordinating with the AO teams. The Mission will also have to determine whether outside assistance is required. Some Missions have internal M&E staff with the appropriate skills to facilitate this process. Other Missions may wish to hire an outside M&E expert(s) with experience in conducting DQAs. AO team members should also be part of the team.

STEP 2. DEVELOP AN OVERALL APPROACH AND SCHEDULE
The team leader must convey the objectives, process, and schedule for conducting the DQA to team members. This option is premised on the idea that the M&E expert(s) work closely in partnership with AO team members and implementing partners to jointly assess data quality. This requires active participation and encourages managers to fully explore and understand the strengths and weaknesses of the data.

DATA SOURCES
Primary Data: Collected directly by USAID.
Secondary Data: Collected from other sources, such as implementing partners, host country governments, other donors, etc.

STEP 3. IDENTIFY THE INDICATORS TO BE INCLUDED IN THE REVIEW
It is helpful to compile a list of all indicators that will be included in the DQA. This normally includes:
• All indicators reported to USAID/Washington (required).
• Any indicators with suspected data quality issues.
• Indicators for program areas that are of high importance.
This list can also function as a central guide as to how each indicator is assessed and to summarize where follow-on action is needed.

STEP 4. CATEGORIZE INDICATORS
With the introduction of standard indicators, the number of indicators that Missions report to USAID/Washington has increased substantially.
This means that it is important to develop practical and streamlined approaches for conducting DQAs. One way to do this is to separate indicators into two categories, as follows:
Outcome Level Indicators
Outcome level indicators measure AOs or Intermediate Results (IRs). Figure 2 provides examples of indicators at each level. The standards for good data quality are applied to results level data in order to assess data quality. The data quality assessment worksheet (see Table 1) has been developed as a tool to assess each indicator against each of these standards.
Output Indicators
Many of the data quality standards are not applicable to output indicators in the same way as outcome level indicators. For example, the number of individuals trained by a project is an output indicator. Whether data are valid, timely, or precise is almost never an issue for this type of an indicator. However, it is important to ensure that there are good data collection and data maintenance systems in place. Hence, a simpler and more streamlined approach can be used to focus on the most relevant issues. Table 2 outlines a sample matrix for assessing output indicators. This matrix:
• Identifies the indicator.
• Clearly outlines the data collection method.
• Identifies key data quality issues.
• Notes whether further action is necessary.
• Provides specific information on who was consulted and when.

STEP 5. HOLD WORKING SESSIONS TO REVIEW INDICATORS
Hold working sessions with AO team members. Implementing partners may be included at this point as well. In order to use time efficiently, the team may decide to focus these sessions on results-level indicators. These working sessions can be used to:
• Explain the purpose and process for conducting the DQA.
• Review data quality standards for each results-level indicator, including the data collection systems and processes.
• Identify issues or concerns that require further review.

STEP 6. HOLD SESSIONS WITH IMPLEMENTING PARTNERS TO REVIEW INDICATORS
If the implementing partner was included in the previous working session, results-level indicators will already have been discussed. This session may then focus on reviewing the remaining output-level indicators with implementers, who often maintain the systems to collect the data for these types of indicators. Focus on reviewing the systems and processes to collect and maintain data. This session provides a good opportunity to identify solutions or recommendations for improvement.

STEP 7. PREPARE THE DQA DOCUMENT
As information is gathered, the team should record findings on the worksheets provided. It is particularly important to include recommendations for action at the conclusion of each worksheet. Once this is completed, it is often useful to include an introduction to:
• Outline the overall approach and methodology used.
• Highlight key data quality issues that are important for senior management.
• Summarize recommendations for improving performance management systems.
AO team members and participating implementers should have an opportunity to review the first draft. Any comments or issues can then be incorporated and the DQA finalized.

STEP 8. FOLLOW UP ON ACTIONS
Finally, it is important to ensure that there is a process to follow up on recommendations. Some recommendations may be addressed internally by the team handling management needs or audit vulnerabilities. For example, the AO team may need to work with a Ministry to ensure that data can be disaggregated in a way that correlates precisely to the target group. Other issues may need to be addressed during the Mission’s portfolio reviews.

CONSIDER THE SOURCE – PRIMARY VS. SECONDARY DATA
PRIMARY DATA
USAID is able to exercise a higher degree of control over primary data that it collects itself than over secondary data collected by others. As a result, specific standards should be incorporated into the data collection process.
Primary data collection requires that:
• Written procedures are in place for data collection.
• Data are collected from year to year using a consistent collection process.
• Data are collected using methods to address and minimize sampling and nonsampling errors.
• Data are collected by qualified personnel and these personnel are properly supervised.
• Duplicate data are detected.
• Safeguards are in place to prevent unauthorized changes to the data.
• Source documents are maintained and readily available.
• If the data collection process is contracted out, these requirements should be incorporated directly into the statement of work.

SECONDARY DATA
Secondary data are collected from other sources, such as host country governments, implementing partners, or other organizations. The degree of control that USAID has over secondary data varies. For example, if USAID uses data from a survey commissioned by another donor, then there is little control over the data collection methodology. On the other hand, USAID does have more influence over data derived from implementing partners. In some cases, specific data quality requirements may be included in the contract. In addition, project performance management plans (PMPs) are often reviewed or approved by USAID. Some ways in which to address data quality are summarized below.
Data from Implementing Partners
• Spot check data.
• Incorporate specific data quality requirements as part of the SOW, RFP, or RFA.
• Review data quality collection and maintenance procedures.
Data from Other Secondary Sources
Data from other secondary sources includes data from host country governments and other donors.
• Understand the methodology. Documentation often includes a description of the methodology used to collect data. It is important to understand this section so that limitations (and what the data can and cannot say) are clearly understood by decision makers.
• Request a briefing on the methodology, including data collection and analysis procedures, potential limitations of the data, and plans for improvement (if possible).
• If data are derived from host country organizations, then it may be appropriate to discuss how assistance can be provided to strengthen the quality of the data. For example, projects may include technical assistance to improve management and/or M&E systems.

TABLE 1. THE DQA WORKSHEET FOR OUTCOME LEVEL INDICATORS
Directions: Use the following worksheet to complete an assessment of data for outcome level indicators against the five data quality standards outlined in the ADS. A comprehensive discussion of each criterion is included in TIPS 12 Data Quality Standards.

Data Quality Assessment Worksheet
Assistance Objective (AO) or Intermediate Result (IR):
Indicator:
Reviewer(s):
Date Reviewed:
Data Source:
Is the Indicator Reported to USAID/W?

Worksheet columns: Criterion; Definition; Yes or No; Explanation

1. Validity
Definition: Do the data clearly and adequately represent the intended result? Some issues to consider are:
Face Validity. Would an outsider or an expert in the field agree that the indicator is a valid and logical measure for the stated result?
Attribution. Does the indicator measure the contribution of the project?
Measurement Error. Are there any measurement errors that could affect the data? Both sampling and non-sampling error should be reviewed.

2. Integrity
Definition: Do the data collected, analyzed and reported have established mechanisms in place to reduce manipulation or simple errors in transcription?
Note: This criterion requires the reviewer to understand what mechanisms are in place to reduce the possibility of manipulation or transcription error.

3. Precision
Definition: Are data sufficiently precise to present a fair picture of performance and enable management decision-making at the appropriate levels?

4. Reliability
Definition: Do data reflect stable and consistent data collection processes and analysis methods over time?
Note: This criterion requires the reviewer to ensure that the indicator definition is operationally precise (i.e. it clearly defines the exact data to be collected) and to verify that the data is, in fact, collected according to that standard definition consistently over time.

5. Timeliness
Definition: Are data timely enough to influence management decision-making (i.e., in terms of frequency and currency)?

A Summary of Key Issues and Recommendations:

TABLE 2. SAMPLE DQA FOR OUTPUT INDICATORS: THE MATRIX APPROACH
Matrix columns: AO or IR Indicators; Document Source; Data Source; Data Collection Method/Key Data Quality Issue; Further Action; Additional Comments/Notes

1. Number of investment measures made consistent with international investment agreements as a result of USG assistance
Document Source: Quarterly Report
Data Source: Project A
Data Collection Method/Key Data Quality Issue: A consultant works directly with the committee in charge of simplifying procedures and updates the number of measures regularly on the website (www.mdspdres.com). The implementer has stated that data submitted includes projections for the upcoming fiscal year rather than actual results.
Further Action: Yes. Ensure that only actual results within specified timeframes are used for reporting.
Additional Comments/Notes: Meeting with COTR 6/20/10 and 7/6/10.

2. Number of public and private sector standards-setting bodies that have adopted internationally accepted guidelines for standards setting as a result of USG assistance
Document Source: Semi-Annual Report
Data Source: Project A
Data Collection Method/Key Data Quality Issue: No issues. Project works only with one body (the Industrial Standards-Setting Service) and maintains supporting documentation.
Further Action: No.
Additional Comments/Notes: Meeting with COTR and COP on 6/20/10.

3. Number of legal, regulatory, or institutional actions taken to improve implementation or compliance with international trade and investment agreements due to support from USG-assisted organizations
Document Source: Quarterly Report
Data Source: Project A
Data Collection Method/Key Data Quality Issue: Project has reported “number of Regional Investment Centers”. This is not the same as counting “actions”, so this must be corrected.
Further Action: Yes. Ensure that the correct definition is applied.
Additional Comments/Notes: Meeting with COTR, COP, and Finance Manager and M&E specialist on 6/20/10. The indicator was clarified and the data collection process will be adjusted accordingly.

4. Number of Trade and Investment Environment diagnostics conducted
Document Source: Quarterly Report
Data Source: Projects A and B
Data Collection Method/Key Data Quality Issue: No issues. A study on the investment promotion policy was carried out by the project. When the report is presented and validated the project considers it “conducted”.
Further Action: No.
Additional Comments/Notes: Meeting with CTO and COPs on 6/25/10.

For more information: TIPS publications are available online at [insert website].
Acknowledgements: Our thanks to those whose experience and insights helped shape this publication including Gerry Britan and Subhi Mehdi of USAID’s Office of Management Policy, Budget and Performance (MPBP). This publication was written by Michelle Adams-Matson, of Management Systems International.
Comments can be directed to: Gerald Britan, Ph.D. Tel: (202) 712-1158 [email protected]
Contracted under RAN-M-00-04-00049-A-FY0S-84 Integrated Managing for Results II

NUMBER 19 1ST EDITION, 2010 DRAFT
PERFORMANCE MONITORING & EVALUATION TIPS
RIGOROUS IMPACT EVALUATION

ABOUT TIPS
These TIPS provide practical advice and suggestions to USAID managers on issues related to performance monitoring and evaluation. This publication is a supplemental reference to the Automated Directive System (ADS) Chapter 203.

WHAT IS RIGOROUS IMPACT EVALUATION?
Rigorous impact evaluations are useful for determining the effects of USAID programs on outcomes. This type of evaluation allows managers to test development hypotheses by comparing changes in one or more specific outcomes to changes that occur in the absence of the program. Evaluators term this the counterfactual. Rigorous impact evaluations typically use comparison groups, composed of individuals or communities that do not participate in the program. The comparison group is examined in relation to the treatment group to determine the effects of the USAID program or project.
Impact evaluations may be defined in a number of ways (see Figure 1). For purposes of this TIPS, rigorous impact evaluation is defined by the evaluation design (quasi-experimental and experimental) rather than the topic being evaluated. These methods can be used to attribute change at any program or project outcome level, including Intermediate Results (IR), sub-IRs, and Assistance Objectives (AO). Decisions about whether a rigorous impact evaluation would be appropriate and what type of rigorous impact evaluation to conduct are best made during the program or project design phase, since many types of rigorous impact evaluation can only be utilized if comparison groups are established and baseline data is collected before a program or project intervention begins.

FIGURE 1. DEFINITIONS OF IMPACT EVALUATION
• An evaluation that looks at the impact of an intervention on final welfare outcomes, rather than only at project outputs, or a process evaluation which focuses on implementation.
• An evaluation carried out some time (five to ten years) after the intervention has been completed, to allow time for impact to appear.
• An evaluation considering all interventions within a given sector or geographical area.
• An evaluation concerned with establishing the counterfactual, i.e., the difference the project made (how indicators behaved with the project compared to how they would have been without it).

WHY ARE RIGOROUS IMPACT EVALUATIONS IMPORTANT?
A rigorous impact evaluation enables managers to determine the extent to which a USAID program or project actually caused observed changes. A Performance Management Plan (PMP) should contain all of the tools necessary to track key objectives (see also TIPS 7 Preparing a Performance Management Plan).
However, comparing data from performance indicators against baseline values demonstrates only whether change has occurred, with very little information about what actually caused the observed change. USAID program managers can only say that the program is correlated with changes in outcome, but cannot confidently attribute that change to the program.

FIGURE 2. A WORD ABOUT WORDS
Many of the terms used in rigorous evaluations hint at the origin of these methods: medical and laboratory experimental research. The activities of a program or project are often called the intervention or the independent variable, and the outcome variables of interest are known as dependent variables. The target population is the group of all individuals (if the unit of analysis or unit is the individual) who share certain characteristics sought by the program, whether or not those individuals actually participate in the program. Those from the target population who actually participate are known as the treatment group, and the group used to measure what would have happened to the treatment group had they not participated in the program (the counterfactual) is known as a control group if they are selected randomly, as in an experimental evaluation, or, more generally, as a comparison group if they are selected by other means, as in a quasi-experimental evaluation.

There are normally a number of factors, outside of the program, that might influence an outcome. These are called confounding factors. Examples of confounding factors include programs run by other donors, natural events (e.g., rainfall, drought, earthquake, etc.), government policy changes, or even maturation (the natural changes that happen in an individual or community over time). Because of the potential contribution of these confounding factors, the program manager cannot claim with full certainty that the program caused the observed changes or results. In some cases, the intervention causes all observed change.
That is, the group receiving USAID assistance will have improved significantly while a similar, nonparticipating group will have stayed roughly the same. In other situations, the target group may have already been improving and the program helped to accelerate that positive change. Rigorous evaluations are designed to identify the effects of the program of interest even in these cases, where the target group and nonparticipating groups may both have changed, only at different rates. By identifying the effects caused by a program, rigorous evaluations help USAID, implementing partners, and key stakeholders learn which programs or approaches are most effective, which is critical for effective development programming.

WHEN SHOULD THESE METHODS BE USED?
Rigorous impact evaluations can yield very strong evidence of program effects. Nevertheless, this method is not appropriate for all situations. Rigorous impact evaluations often involve extra costs for data collection and always require careful planning during program implementation. To determine whether a rigorous impact evaluation is appropriate, potential cost should be weighed against the need for and usefulness of the information.

Rigorous impact evaluations answer evaluation questions concerning the causal effects of a program. However, other evaluation designs may be more appropriate for answering other types of evaluation questions. For example, the analysis of ‘why’ and ‘how’ observed changes, particularly unintended changes, were produced may be more effectively answered using other evaluation methods, including participatory evaluations or rapid appraisals. Similarly, there are situations when rigorous evaluations, which often use comparison groups, will not be advisable, or even possible.
For example, assistance focusing on political parties can be difficult to evaluate using rigorous methods, as this type of assistance is typically offered to all parties, making the identification of a comparison group difficult or impossible. Other methods may be more appropriate and yield conclusions with sufficient credibility for programmatic decision-making.

While rigorous impact evaluations are sometimes used to examine the effects of only one program or project approach, they are also extremely useful for answering questions about the effectiveness of alternative approaches for achieving a given result, e.g., which of several approaches for improving farm productivity, or for delivering legal services, are most effective. Missions should consider using rigorous evaluations strategically to answer specific questions about the effectiveness of key approaches. When multiple rigorous evaluations are carried out across Missions on a similar topic or approach, the results can be used to identify approaches that can be generalized to other settings, leading to significant advances in programmatic knowledge.

Rigorous methods are often useful when:
• Multiple approaches to achieving desired results have been suggested, and it is unclear which approach is the most effective or efficient;
• An approach is likely to be replicated if successful, and clear evidence of program effects is desired before scaling up;
• A program uses a large amount of resources or affects a large number of people; and
• In general, little is known about the effects of an important program or approach, as is often the case with new or innovative approaches.

PLANNING
Rigorous methods require strong performance management systems to be built around a clear, logical results framework (see TIPS 13 Building a Results Framework).
The development hypothesis should clearly define the logic of the program, with particular emphasis on the intervention (independent variable) and the principal anticipated results (dependent variables), and provide the basis for the questions that will be addressed by the rigorous evaluation. Rigorous evaluation builds upon the indicators defined for each level of result, from inputs to outcomes, and requires high data quality.

Because quasi-experimental and experimental designs typically answer very specific evaluation questions and are generally analyzed using quantitative methods, they can be paired with other evaluation tools and methods to provide context, triangulate evaluation conclusions, and examine how and why effects were produced (or not) by a program. This is termed mixed method evaluation (see TIPS 16, Mixed Method Evaluations).

Unlike most evaluations conducted by USAID, rigorous impact evaluations are usually only possible, and are always most effective, when planned before project implementation begins. Evaluators need time prior to implementation to identify appropriate indicators, identify a comparison group, and set baseline values. If rigorous evaluations are not planned prior to implementation, the number of potential evaluation design options is reduced, often leaving alternatives that are either more complicated or less rigorous. As a result, Missions should consider the feasibility of and need for a rigorous evaluation prior to and during project design.

FIGURE 3. CONFOUNDING EFFECTS
[Figure: outcome of interest plotted at baseline and follow-up for the target group and the comparison group, with labels for observed change, program effect, and confounding effect.]

DESIGN
Although there are many variations, rigorous evaluations are divided into two categories: quasi-experimental and experimental. Both categories of rigorous evaluations rely on the same basic concept - using the counterfactual to estimate the changes caused by the program.
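The labels in Figure 3 (observed change, confounding effect, program effect) suggest a simple piece of arithmetic, sketched below in Python. This is an illustration only and not part of the original TIPS: the function name and all numbers are hypothetical, and the sketch assumes that the change in the comparison group is a fair stand-in for the confounding effect on the target group. The resulting difference of changes is commonly called a difference-in-differences estimate.

```python
def estimate_program_effect(target_baseline, target_followup,
                            comparison_baseline, comparison_followup):
    """Split the target group's observed change into confounding and program effects.

    Hypothetical helper: assumes the comparison group's change equals the
    change the target group would have experienced without the program.
    """
    observed_change = target_followup - target_baseline
    confounding_effect = comparison_followup - comparison_baseline
    program_effect = observed_change - confounding_effect
    return observed_change, confounding_effect, program_effect


# Hypothetical average outcome scores at baseline and follow-up.
observed, confounding, program = estimate_program_effect(
    target_baseline=20, target_followup=30,
    comparison_baseline=20, comparison_followup=24,
)
print(f"Observed change:    {observed}")
print(f"Confounding effect: {confounding}")
print(f"Program effect:     {program}")
```

In this made-up case both groups improved, but only the portion of the target group's change that exceeds the comparison group's change is credited to the program.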
The counterfactual answers the question, “What would have happened to program participants if they had not participated in the program?” The comparison of the counterfactual to the observed change in the group receiving USAID assistance is the true measurement of a program’s effects. While before-and-after measurements of a single group using a baseline allow the measurement of a single group both with and without program participation, this design does not control for all the other confounding factors that might influence the participating group during program implementation.

Well-constructed comparison groups provide a clear picture of the effects of program or project interventions on the target group by differentiating program/project effects from the effects of multiple other factors in the environment that affect both the target and comparison groups. This means that in situations where economic or other factors affecting both groups make everyone better off, it will still be possible to see the additional or incremental improvement caused by the program or project, as Figure 3 illustrates.

QUASI-EXPERIMENTAL EVALUATIONS
To estimate program effects, quasi-experimental designs rely on measurements of a nonrandomly selected comparison group. The most common means for selecting a comparison group is matching, wherein the evaluator ‘hand-picks’ a group of similar units based on observable characteristics that are thought to influence the outcome. For example, the evaluation of an agriculture program aimed at increasing crop yield might seek to compare participating communities against other communities with similar weather patterns, soil types, and traditional crops, as communities sharing these critical characteristics would be most likely to behave similarly to the treatment group in the absence of the program.
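The idea of matching on observable characteristics can be sketched in a few lines of Python. The sketch below is a hypothetical illustration, not the procedure prescribed by this TIPS: it assumes each community is described by two characteristics already scaled to the range 0 to 1 (for example, a rainfall index and a soil quality index), and it uses a simple greedy nearest-neighbor rule, pairing each participating community with the closest unused non-participating community. All names and values are invented.

```python
import math


def match_comparison_units(treated, candidates):
    """Pair each treated unit with its closest unused candidate.

    Distance is Euclidean over the observable characteristics; matching is
    greedy and without replacement (an assumption made for simplicity).
    """
    available = dict(candidates)
    matches = {}
    for name, traits in treated.items():
        best = min(available, key=lambda c: math.dist(traits, available[c]))
        matches[name] = best
        del available[best]  # each candidate can be used only once
    return matches


# Hypothetical (rainfall index, soil quality index) per community.
treated = {"T1": (0.8, 0.3), "T2": (0.2, 0.9)}
candidates = {"C1": (0.1, 0.8), "C2": (0.7, 0.4), "C3": (0.5, 0.5)}

matches = match_comparison_units(treated, candidates)
print(matches)
```

A rule like this can only use characteristics that have been measured, so it cannot balance unobserved traits such as motivation.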
However, program participants are often selected based on certain characteristics, whether it is level of need, motivation, location, social or political factors, or some other factor. While evaluators can often identify and match many of these variables, it is impossible to match all factors that might create differences between the treatment and comparison groups, particularly characteristics that are more difficult to measure or are unobservable, such as motivation or social cohesion. For example, if a program is targeted at communities that are likely to succeed, then the target group might be expected to improve relative to a comparison group that was not chosen based on the same factors. Failing to account for this in the selection of the comparison group would lead to a biased estimate of program impact. Selection bias is the difference between the comparison group and the treatment group caused by the inability to completely match on all characteristics, and the uncertainty or error this generates in the measurement of program effects.

WHAT IS EXPERIMENTAL AND QUASI-EXPERIMENTAL EVALUATION?
Experimental design is based on the selection of the comparison and treatment group through random sampling. Quasi-experimental design is based on a comparison group that is chosen by the evaluator (that is, not based on random sampling).

FIGURE 4. QUASI-EXPERIMENTAL EVALUATION OF THE KENYA NATIONAL CIVIC EDUCATION PROGRAM PHASE II (NCEP II)
NCEP II, funded by USAID in collaboration with other donors, reached an estimated 10 million individuals through workshops, drama events, cultural gatherings and mass media campaigns aimed at changing individuals’ awareness, competence and engagement in issues related to democracy, human rights, governance, constitutionalism, and nation-building. To determine the program’s impacts on these outcomes of interest, NCEP was evaluated using a quasi-experimental design with a matched comparison group. Evaluators matched participants to a comparison group of non-participating individuals who shared geographic and demographic characteristics (such as age, gender, education, and involvement with CSOs). This comparison group was compared to the treatment group along the outcomes of interest to identify program effects. The evaluators found that the program had significant long-term effects, particularly on ‘civic competence and involvement’ and ‘identity and ethnic group relations’, but had only negligible impact on ‘Democratic Values, Rights, and Responsibilities’. The design also allowed the evaluators to assess the conditions under which the program was most successful. They confirmed prior assertions that multiple exposures to civic education programs through multiple participatory methods play a critical role in creating lasting impact.
- ‘The Impact of the Second National Kenya Civic Education Programme (NCEP II-URAIA) on Democratic Attitudes, Values, and Behavior’, Steven E. Finkel and Jeremy Horowitz, MSI

Other common quasi-experimental designs, in addition to matching, are described below.

Non-Equivalent Group Design. This is the most common quasi-experimental design, in which a comparison group is hand-picked to match the treatment group as closely as possible. Since hand-picking the comparison group cannot completely match all characteristics with the treatment group, the groups are considered to be ‘non-equivalent’.

Regression Discontinuity. Programs often have eligibility criteria based on a cut-off score or value of a targeting variable. Examples include programs accepting only households with income below 2,000 USD, organizations registered for at least two years, or applicants scoring above a 65 on a pre-test. In each of these cases, it is likely that individuals or organizations just above and just below the cut-off value would demonstrate only marginal or incremental differences in the absence of USAID assistance, as families earning 2,001 USD compared to 1,999 USD are unlikely to be significantly different except in terms of eligibility for the program. Because of this, the group just above the cut-off serves as a comparison group for those just below (or vice versa) in a regression discontinuity design.

Propensity Score Matching. This method is based on the same rationale as regular matching: a comparison group is selected based on shared observable characteristics with the treatment group. However, rather than ‘hand-picking’ matches based on a small number of variables, propensity score matching uses a statistical process to combine information from all data collected on the target population to create the most accurate matches possible based on observable characteristics.

Interrupted Time Series.¹ Some programs will encounter situations where a comparison group is not possible, often because the intervention affects everyone at once, as is typically the case with policy change. In these cases, data on the outcome of interest are recorded at numerous intervals before and after the program or activity takes place. The data form a time series or trend, which the evaluator analyzes for significant changes around the time of the intervention. Large spikes or drops immediately after the intervention can signal changes caused by the program. This method is slightly different from the other rigorous methods as it does not use a comparison group to rule out potentially confounding factors, leading to increased uncertainty in evaluation conclusions.
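One simple way to look for a change around the time of the intervention in a time series is to fit a trend to the pre-intervention observations, project that trend forward, and compare it with what was actually observed afterwards. The Python sketch below illustrates this under stated assumptions: the pre-intervention trend is taken to be linear, the data are invented, and the function names are hypothetical. It is not the analysis method specified by this TIPS, and it does not test statistical significance or rule out confounding factors.

```python
def fit_linear_trend(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def average_shift_after_intervention(pre, post):
    """Mean gap between observed post-intervention values and the projected pre-trend."""
    slope, intercept = fit_linear_trend(pre)
    projected = [intercept + slope * (len(pre) + i) for i in range(len(post))]
    gaps = [observed - expected for observed, expected in zip(post, projected)]
    return sum(gaps) / len(gaps)


# Hypothetical outcome measured at regular intervals.
pre_series = [10, 11, 12, 13, 14, 15]   # before the policy change
post_series = [21, 22, 23, 24]          # after the policy change

shift = average_shift_after_intervention(pre_series, post_series)
print(f"Average shift relative to the projected trend: {shift:.1f}")
```

The longer the pre-intervention series, the more credible the projected trend, which is one reason long, regularly collected series matter for this design.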
Interrupted time series are most effective when data are collected regularly both before and after the intervention, leading to a long time series, and alternative causes are monitored.

¹ Interrupted time series is normally viewed as a type of impact evaluation. It is typically considered quasi-experimental although it does not use a comparison group.

EXPERIMENTAL EVALUATION
In an experimental evaluation, the treatment and comparison groups are selected from the target population by a random process. For example, from a target population of 50 communities that meet the eligibility (or targeting) criteria of a program, the evaluator uses a coin flip, lottery, computer program, or some other random process to determine the 25 communities that will participate in the program (treatment group) and the 25 communities that will not (control group, as the comparison group is called when it is selected randomly). Because they use random selection processes, experimental evaluations are often called randomized evaluations or randomized controlled trials (RCTs).

Random selection from a target population into treatment and control groups is the most effective tool for eliminating selection bias because it removes the possibility of any individual characteristic influencing selection. Because units are not assigned to treatment or control groups based on specific characteristics, but rather are divided randomly, all characteristics that might lead to selection bias, such as motivation, poverty level, or proximity, will be roughly equally divided between the treatment and control groups. If an evaluator uses random assignment to determine treatment and control groups, she might, by chance, get two or three very motivated communities in a row assigned to the treatment group, but if the program is working in more than a handful of communities, the number of motivated communities will likely balance out between treatment and control in the end.
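The 50-community example above can be sketched with a computer-based lottery. The Python snippet below is a minimal illustration, not a prescribed procedure: the community names, the function name, and the seed value are all hypothetical. Fixing the seed is a design choice made here so that the draw can be reproduced and audited later.

```python
import random


def assign_randomly(units, n_treatment, seed=None):
    """Shuffle the eligible units and split them into treatment and control groups."""
    rng = random.Random(seed)  # a fixed seed makes the lottery reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    return shuffled[:n_treatment], shuffled[n_treatment:]


# Hypothetical list of 50 communities that meet the eligibility criteria.
communities = [f"community_{i:02d}" for i in range(1, 51)]

treatment, control = assign_randomly(communities, n_treatment=25, seed=2011)
print(len(treatment), "treatment communities;", len(control), "control communities")
```

Because the split depends only on the shuffle, no characteristic of a community, observed or unobserved, can influence which group it lands in.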
Because random selection completely eliminates selection bias, experimental evaluations are often easier to analyze and provide more credible evidence than quasi-experimental designs. Random assignment can be done with any type of unit, whether the unit is the individual, groups of individuals (e.g., communities or districts), organizations, or facilities (e.g., health center or school) and usually follows one of the designs discussed below.

Simple Random Assignment. When the number of program participants has been decided and additional eligible individuals are identified, simple random assignment through a coin flip or lottery can be used to select the treatment and control groups. Programs often encounter ‘excess demand’ naturally (for example in training programs, participation in study tours, or where resources limit the number of partner organizations), and simple random assignment can be an easy and fair way to determine participation while maximizing the potential for credible evaluation conclusions.

FIGURE 5. EXPERIMENTAL EVALUATION OF THE IMPACTS OF EXPANDING CREDIT ACCESS IN SOUTH AFRICA
While commercial loans are a central component of most microfinance strategies, there is much less consensus on whether consumer loans are also beneficial for economic development. Microfinance in the form of loans for household consumption or investment has been criticized as unproductive, usurious, and a contributor to debt cycles or traps. In an evaluation partially funded by USAID, researchers used an experimental evaluation designed to test the impacts of access to consumer loans on household consumption, investment, education, health, wealth, and well-being. From a group of 787 applicants who were just below the credit score needed for loan acceptance, the researchers randomly selected 325 (treatment group) that would be approved for a loan. The treatment group was surveyed, along with the remaining 462 who were randomly denied (control group), eight months after their loan application to estimate the effects of receiving access to consumer credit. The evaluators found that the treatment group was more likely to retain wage employment, less likely to experience severe hunger in their households, and less likely to be impoverished than the control group, providing strong evidence of the benefits of expanding access to consumer loans.
- ‘Expanding Credit Access: Estimating the Impacts’, Dean Karlan and Jonathan Zinman, http://www.povertyactionlab.org/projects/print.php?pid=62

Phased-In Selection. In some programs, the delivery of the intervention does not begin everywhere at the same time. For capacity or logistical reasons, some units receive the program intervention earlier than others. This type of schedule creates a natural opportunity for using an experimental design. Consider a project where the delivery of a radio-based civic education program was scheduled to operate in 100 communities during year one, another 100 during year two, and a final 100 during year three. The year of participation can be randomly assigned. Communities selected to participate in year one would be designated as the first treatment group (T1). For that year, all the other communities that would participate in years two and three form the initial control group. In the second year, the next 100 communities would become the second treatment group (T2), while the final 100 communities would continue to serve as the control group. Random assignment to the year of participation ensures that all communities will participate in the program but also maximizes evaluation rigor by reducing selection bias, which could be significant if only the most motivated communities participate in year one.

Blocked (or Stratified) Assignment.
When it is known in advance that the units to which a program intervention could be delivered differ in one or more ways that might influence the program outcome (e.g., age, size of the community in which they are located, ethnicity, etc.), evaluators may wish to take extra steps to ensure that such conditions are evenly distributed between an evaluation’s treatment and control groups. In a simple block (stratified) design, an evaluation might separate men and women, and then use randomized assignment within each block to construct the evaluation’s treatment and control groups, thus ensuring a specified number or percentage of men and women in each group.

Multiple Treatments. It is possible that multiple approaches will be proposed or implemented for the achievement of a given result. If a program is interested in testing the relative effectiveness of three different strategies or approaches, eligible units can be randomly divided into three groups. Each group participates in one approach, and the results can be compared to determine which approach is most effective. Variations on this design can include additional groups to test combined or holistic approaches and a control group to test the overall effectiveness of each approach.

COMMON QUESTIONS AND CHALLENGES
While rigorous evaluations require significant attention to detail in advance, they need not be impossibly complex. Many of the most common questions and challenges can be anticipated and minimized.

COST
Rigorous evaluations will almost always cost more than standard evaluations that do not require comparison groups. However, the additional cost can sometimes be quite low depending on the type and availability of data to be collected. Moreover, findings from rigorous evaluations may lead to future cost-savings, through improved programming and more efficient use of resources over the longer term.
Nevertheless, program managers must anticipate these additional costs, including the additional planning requirements, in terms of staffing and budget needs.

ETHICS
The use of comparison groups is sometimes criticized for denying treatment to potential beneficiaries. However, every program has finite resources and must select a limited number of program participants. Random selection of program participants is often viewed, even by those beneficiaries who are not selected, as being the fairest and most transparent method for determining participation.

A second, more powerful, ethical question emerges when a program seeks to target participants that are thought to be most in need of the program. In some cases, rigorous evaluations require a relaxing of targeting requirements (as discussed in Figure 6) in order to identify enough similar units to constitute a comparison group, meaning that perhaps some of those identified as the ‘neediest’ might be assigned to the comparison group. However, it is often the case that the criteria used to target groups do not provide the degree of precision required to confidently rank-order potential participants. Moreover, rigorous evaluations can help identify which groups benefit most, thereby improving targeting for future programs.

FIGURE 6. TARGETING IN RIGOROUS EVALUATIONS
Programs often have specific eligibility requirements without which a potential participant could not feasibly participate. Other programs target certain groups because of perceived need or likelihood of success. Targeting is still possible with rigorous evaluations, whether experimental or quasi-experimental, but must be approached in a slightly different manner. If a program intends to work in 25 communities, rather than defining one group of 25 communities that meet the criteria and participate in the program, it might be necessary to identify a group of 50 communities that meet the eligibility or targeting criteria and will be split into the treatment and comparison group. This reduces the potential for selection bias while still permitting the program to target certain groups. In situations where no additional communities meet the eligibility criteria and the criteria cannot be relaxed, phase-in or multiple treatment approaches, as discussed above, might be appropriate.

SPILLOVER
Programs are often designed to incorporate ‘multiplier effects’ whereby program effects in one community naturally spread to others nearby. While these effects help to broaden the impact of a program, they can result in bias in conclusions when the effects on the treatment group spill over to the comparison group. When comparison groups also benefit from a program, then they no longer measure only the confounding effects, but also a portion of the program effect. This leads to underestimation of program impact, since the comparison groups appear better off than they would have been in the absence of the program. In some cases, spillovers can be mapped and measured but, most often, they must be controlled in advance by selecting treatment and control groups or units that are unlikely to significantly interact with one another.

A special case of spillover occurs in substitution bias wherein governments or other donors target only the comparison group to fill in gaps of service. This is best avoided by ensuring coordination between the program and other development actors.

SAMPLE SIZE
During the analysis phase, rigorous evaluations typically use statistical tests to determine whether any observed differences between treatment and comparison groups represent actual differences (that would then, in a well-designed evaluation, be attributed to the program) or whether the difference could have occurred due to chance alone. The ability to make this distinction depends principally on the size of the change and the total number of units in the treatment and comparison groups, or sample size.
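The way the size of the change and the number of units interact can be illustrated with a standard textbook approximation for comparing the means of two equally sized groups. The Python sketch below is an illustration under stated assumptions, not a substitute for a proper power calculation: it assumes a two-sided test, a known standard deviation, units assigned individually (no clustering), and a normal approximation. The function name and the default values for the significance level (0.05) and power (0.80) are choices made for this example.

```python
import math
from statistics import NormalDist


def units_per_group(effect_size, std_dev, alpha=0.05, power=0.80):
    """Approximate units needed in each group to detect a difference in means.

    Uses n = 2 * ((z_alpha + z_power) * std_dev / effect_size) ** 2,
    a normal approximation for two equally sized groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against false positives
    z_power = NormalDist().inv_cdf(power)          # guards against false negatives
    n = 2 * ((z_alpha + z_power) * std_dev / effect_size) ** 2
    return math.ceil(n)


# Hypothetical anticipated changes, expressed in standard deviation units.
for effect in (0.25, 0.5, 1.0):
    print(f"Anticipated change of {effect} SD: "
          f"{units_per_group(effect, std_dev=1.0)} units per group")
```

Under these assumptions, halving the anticipated change roughly quadruples the number of units required, which is why small expected effects call for large samples.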
The more units, or the larger the sample size, the easier it is to attribute change to the program rather than to random variations. During the design phase, rigorous impact evaluations typically calculate the number of units (or sample size) required to confidently identify changes of the size anticipated by the program. An adequate sample size helps prevent declaring a successful project ineffectual (false negative) or declaring an ineffectual project successful (false positive). Although sample size calculations should be done before each program, as a rule of thumb, rigorous impact evaluations are rarely undertaken with fewer than 50 units of analysis.

RESOURCES
This TIPS is intended to provide an introduction to rigorous impact evaluations. Additional resources are provided on the next page for further reference.

Further Reference

Initiatives and Case Studies:
- Office of Management and Budget (OMB):
  o http://www.whitehouse.gov/OMB/part/2004_program_eval.pdf
  o http://www.whitehouse.gov/omb/assets/memoranda_2010/m10-01.pdf
- U.S. Government Accountability Office (GAO):
  o http://www.gao.gov/new.items/d1030.pdf
- USAID:
  o Evaluating Democracy and Governance Effectiveness (EDGE): http://www.usaid.gov/our_work/democracy_and_governance/technical_areas/dg_office/evaluation.html
  o Measure Evaluation: http://www.cpc.unc.edu/measure/approaches/evaluation/evaluation.html
  o The Private Sector Development (PSD) Impact Evaluation Initiative: www.microlinks.org/psdimpact
- Millennium Challenge Corporation (MCC) Impact Evaluations: http://www.mcc.gov/mcc/panda/activities/impactevaluation/index.shtml
- World Bank:
  o The Spanish Trust Fund for Impact Evaluation: http://web.worldbank.org/WBSITE/EXTERNAL/EXTABOUTUS/ORGANIZATION/EXTHDNETWORK/EXTHDOFFICE/0,,contentMDK:22383030~menuPK:6508083~pagePK:64168445~piPK:64168309~theSitePK:5485727,00.html
  o The Network of Networks on Impact Evaluation: http://www.worldbank.org/ieg/nonie/
  o The Development Impact Evaluation Initiative: http://web.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTDEVIMPEVAINI/0,,menuPK:3998281~pagePK:64168427~piPK:64168435~theSitePK:3998212,00.html
- Others:
  o Center for Global Development’s ‘Evaluation Gap Working Group’: http://www.cgdev.org/section/initiatives/_active/evalgap
  o International Initiative for Impact Evaluation: http://www.3ieimpact.org/

Additional Information:
- Sample Size and Power Calculations:
  o http://www.statsoft.com/textbook/stpowan.html
  o http://www.mdrc.org/publications/437/full.pdf
- World Bank: ‘Evaluating the Impact of Development Projects on Poverty: A Handbook for Practitioners’:
  o http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTPOVERTY/EXTISPMA/0,,contentMDK:20194198~pagePK:148956~piPK:216618~theSitePK:384329,00.html
- Poverty Action Lab’s ‘Evaluating Social Programs’ Course: http://www.povertyactionlab.org/course/

For more information:
TIPS publications are available online at [insert website]

Acknowledgements:
Our thanks to those whose experience and insights helped shape this publication, including USAID’s Office of Management Policy, Budget and Performance (MPBP). This publication was written by Michael Duthie of Management Systems International.

Comments regarding this publication can be directed to:
Gerald Britan, Ph.D.
Tel: (202) 712-1158
[email protected]

Contracted under RAN-M-00-04-00049-A-FY0S-84
Integrated Managing for Results II