 
Evaluation: Testing, Objective-to-Test-Item Matching and Judgments of Worth
EDTEC 540
James Marshall

Session Overview
Evaluation approaches
Testing: one possible data point in evaluation
  Norm-referenced
  Criterion-referenced
Objective-to-test-item matching
Measurement error, reliability and validity

Evaluation, Typically
Typically, it doesn't happen! That said, it should, and it is required for many funded projects.
What happened? Were the goals and objectives achieved? How can we find that out?
The end is NOT the only time to measure worth. When else?
Strategies: tests, observations, surveys, chats with managers, looking at work, results.

Evaluation Approaches: Objectivist
Belief in a reality that can be known and measured.
Prevalent in education and in our business.
Objectives-based and deceptively simple: establish goals --> set objectives --> tailor instruction to objectives --> judge effectiveness.
Measures are analytical/quantitative in nature.
Examples:
  Do first-graders know the letters of the alphabet?
  Can the new account representative describe the features of each checking account, as defined by the bank?
  Others?
Advantages/disadvantages?

Evaluation Approaches: Constructivist
Belief that people construct their own realities.
Advocates believe that truth is a matter of consensus, not of measurement against an objective reality.
Evaluation creates detailed descriptions of what is inside the learner's head.
Measures are qualitative in nature.
Relies on open-ended exercises, observation, cases and immersion in the field.
Observation is useful for us, because IDs build prototypes, conduct formative evaluations, revise and cycle again.
Examples:
  Role-play exercise: dealing with a hostile customer
  Theme Park Tycoon: running a theme park for a year
  Essay question asking you to describe your understanding of educational technology
Advantages/disadvantages?

Evaluation Approaches: Postmodern/Critical
Objectivists proclaim objectivity.
Constructivists approve of subjectivity.
Postmoderns are social activists.
They focus on questions of power: "Who are you to set objectives for others?"
They use deconstruction to see what is inside texts and materials.
They are most interested in the hidden curriculum, such as the teaching of traditional gender roles. What does the curriculum teach?
Why should IDs care about this evaluation approach?

Evaluation Frameworks: Kirkpatrick's Model
Level 4 (Results): Does it matter? Does it advance strategy?
Level 3 (Behavior): Are they doing it (the objectives) consistently and appropriately?
Level 2 (Learning): Can they do it (the objectives)? Do they show the skills and abilities?
Level 1 (Reaction): Did they like the experience? Satisfaction? Use? Repeat use?

Evaluation Frameworks: CIPP
Context assesses program/product needs, problems or opportunities specific to the project environment.
Input assesses, evaluates and allocates project resources in order to meet identified needs and objectives, solve problems and optimize program impact.
Process assesses project implementation.
Product assesses planned and unintended (unforeseen) outcomes, both to keep a project on track and to determine effectiveness or impact.

Types of Tests
Used to evaluate changes in skills and knowledge.
Is testing alone sufficient?

Test Types: Norm-Referenced
Compare an individual's performance to the performance of other people.
Require varying item difficulties.
Assume not everybody is going to "get it."
Discern those who "got it" from those who didn't.
(Figure: normal distribution)

Test Types: Norm-Referenced
Norm-referenced tests compare the individual to the group.
This is accomplished statistically by "norming" the test with large numbers of people.
Consider: You sat for the GRE and received the following scores. You need to retake the test. What is your study plan?
(Score report figure; extracted values: 570, 51, 22, 28)

Test Types: Norm-Referenced
Limitations: not especially helpful for
  identifying individual skill deficiencies
  identifying weaknesses in the instruction

Test Types: Criterion-Referenced
Compares an individual's performance to the acceptable standard of performance for the specified tasks.
Requires completely specified objectives.
Asks: Can this person do what the objectives specify?
Results in yes/no decisions about competence.

Test Types: Criterion-Referenced
Applications:
  Diagnosis of individual skill deficiencies
  Certification of skills
  Evaluation and revision of instruction
Limitations:
  Tend to focus on specific skills
  Results may not reflect general aptitudes
  Everyone may get an "A"

Which Test Is Which? (Mark each as norm-referenced, NR, or criterion-referenced, CRT.)
  IQ test
  GRE
  SDSU Writing Competency
  Red Cross Lifesaving Certificate
  EDTEC 540 midterm and final exams

Which Test Is Which? (NR or CRT?)
  Give out a CA driver's license
  Pick students for Russian language training
  Determine entrance into medical school
  PADI Scuba Certification
  Select one EDTEC scholarship recipient
  Figure out where to revise a course
  Decide which students need remediation

Utility of Test Scores
Selection and screening (before):
  mastery of prerequisites, for remediation/placement
  mastery of course objectives, for acceleration ("testing out")
Individual diagnosis and prescription (along the way)
Practice (along the way)
Grades and summative scores (at or after the end):
  promotion
  certification and licensure
Administrative:
  course evaluation
  trainer accountability

Criterion-Referenced Test Items (each objective is followed by its matching test item)
Objective: Given a map of the USA with state borders marked, the learner will be able to write the abbreviations for 45 of the 50 states in 15 minutes.
Test item: Here is a map of the USA with the states outlined but not named. Fill in the state abbreviations; you have 15 minutes to get at least 45.
Objective: Given a pair of well-worn shoes, the learner will be able to identify what's wrong with the shoes and the tools and materials needed to fix them.
Test item: Take a look at this pair of shoes. What problems do you see? What will you need to fix them?
Objective: Given a goal, the learner will be able to write at least two appropriate objectives with all four ABCD parts.
Test item: The goal of the instruction is: "IDs will know how to write resumes." Write at least two objectives with all four parts.

Matching Test Items to Objectives
Matching helps ensure validity.
Validity is the extent to which the test measures what is important to performance. Does a high score on the test equate to high performance on the job?
The validity of a criterion-referenced test is enhanced when:
  objectives match real-world performances (based on solid analysis);
  test items match the stated objectives (including the conditions).

Match, or Not?
Objective: Given any stocked fruit or vegetable, the Ralphs grocery checker will be able to verbally state the code that matches the produce provided, with 100% accuracy.
Test item: Here is a persimmon from the produce department and the produce code job aid. Please state the produce code for this item. You may examine the persimmon and refer to the job aid.

Match, or Not?
Objective: Given a tree in need of pruning, the gardener's apprentice will be able to select the correct tree pruning device, based upon the type of tree presented.
Test item: Here is an overgrown elm tree. Please select the appropriate tool with which you will prune the tree.

Match, or Not?
Objective: Given a descriptive order for a Café Mocha, including size, caf/decaf and type of milk, the barista will be able to create the drink as specified in the Starbucks Guide to Coffee Creations.
Test item: A customer has just ordered a Grande, non-fat mocha. Please list the ingredients you will need, and describe the steps you would take to create the drink.

Evaluating a Training Program
Consider: Your evaluation uses a criterion-referenced test to see whether the new account representatives can describe the different types of accounts the bank offers.
All representatives were able to meet the specified criteria.
Case closed… or do you want to know more?
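The following minimal Python sketch is not part of the original course materials. It contrasts the norm-referenced and criterion-referenced interpretations described earlier by scoring the same hypothetical results in two ways. The first is a percentile rank within the group. The second is a yes/no decision against the 45-of-50 standard from the state-abbreviation objective. The learner names, scores and function names (`percentile_rank`, `meets_criterion`) are invented for illustration. The percentile rule used here, the percent of the group scoring strictly below, is only one of several conventions.

```python
from bisect import bisect_left


def percentile_rank(score: float, norm_group: list[float]) -> float:
    """Norm-referenced reading: percent of the norm group scoring strictly below `score`.

    One simple convention (hypothetical helper); published tests may also
    count half of the tied scores.
    """
    ordered = sorted(norm_group)
    below = bisect_left(ordered, score)  # number of scores strictly below `score`
    return 100.0 * below / len(ordered)


def meets_criterion(score: float, cut_score: float) -> bool:
    """Criterion-referenced reading: a yes/no decision against a fixed standard."""
    return score >= cut_score


# Hypothetical results (items correct out of 50) on the state-abbreviation test;
# the objective sets the standard at 45 of 50.
CUT_SCORE = 45
scores = {"Ana": 45, "Ben": 47, "Cy": 48, "Dee": 50, "Eli": 46}
group = list(scores.values())

for name, s in scores.items():
    rank = percentile_rank(s, group)
    print(f"{name}: {s}/50, percentile rank {rank:.0f}, "
          f"meets criterion: {meets_criterion(s, CUT_SCORE)}")
```

With these made-up scores, every learner meets the criterion, yet the percentile ranks still spread from 0 to 80. This illustrates the "Everyone may get an A" limitation. It is also one reason the "Case closed?" question above is worth asking.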
Ideas in Testing
Measurement error
Validity
Reliability

Measurement Error
Many causes:
  mechanical or scoring errors
  poor wording (confusing, ambiguous)
  poor subject matter or content (validity)
  score variation from one time to another (reliability)
  score variation between "equivalent" tests
  test administration procedures
  disagreement between raters (inter-rater reliability)
  mood of the student

Validity
Does the test assess what's important? Does it really tap the skills and knowledge linked to real-world performance? (content validity)
Types:
  Content validity (most important to us)
  Predictive validity (e.g., SAT, GRE)

Reliability
Are the scores produced by the test trustworthy and stable over time?
Assessed by:
  parallel (equivalent) forms or test-retest
  internal consistency

Testing and Evaluation: A Look Ahead
ED 690: Procedures of Investigation
  Introduces evaluation procedures and methods
  Introduces the research process and statistical analysis
ED 791A, 791B, 791C
  The evaluation sequence, which EDTEC students complete more often than writing a thesis
  Conduct a full-scale evaluation (design, research, report) for a living, breathing client over two semesters
© Copyright 2025