Evaluation: Testing, Objective-to-Test-Item Matching and Judgments of Worth

EDTEC 540
James Marshall
Session Overview

Evaluation Approaches
Testing – one possible data point in evaluation
  Norm-referenced
  Criterion-referenced
  Objective-to-test-item matching
  Measurement error, reliability and validity
Evaluation, typically

Typically, it doesn’t happen! That said, it should.
And it is required for many funded projects.
What happened? Were goals and objectives achieved? How can we find that out?
The end is NOT the only time to measure worth. When else?
Strategies: tests, observations, surveys, chats with managers, looking at work products, results
Evaluation Approaches
Objectivist

Belief in a reality that can be known and measured. Prevalent in education and our business.
Objectives-based, deceptively simple: establish goals --> set objectives --> tailor instruction to objectives --> judge effectiveness.
Measures are analytical/quantitative in nature.
Examples
  Do first-graders know the letters of the alphabet?
  Can the new account representative describe the features of each checking account – as defined by the bank?
  Others?
Advantages/disadvantages?
Evaluation Approaches
Constructivist

Belief that people construct their own realities. Advocates believe that truth is a matter of consensus, not measurement against an objective reality.
Evaluation creates detailed descriptions of that which is inside the head of the learner.
Measures are qualitative in nature: reliance upon open-ended exercises, observation, cases and immersion in the field.
Observation is useful for us, in that IDs build prototypes, conduct formative evaluations, revise and cycle again.
Examples
  Role-play exercise to deal with a hostile customer
  Theme Park Tycoon – running a theme park for a year
  Essay question asking you to describe your understanding of Educational Technology
Advantages/disadvantages?
Evaluation Approaches
Postmodern/Critical

Objectivists proclaim objectivity. Constructivists approve of subjectivity. Postmodernists are social activists.
Focus on questions of power: “Who are you to set objectives for others?” Use of deconstruction to see what’s inside texts and materials.
Most interested in the hidden curriculum, such as the teaching of traditional gender roles.
  What does the curriculum teach?
Why should IDs care about this evaluation approach?
Evaluation Frameworks: Kirkpatrick’s Model

Level 4: Does it matter? Does it advance strategy?
Level 3: Are they doing it (objectives) consistently and appropriately?
Level 2: Can they do it (objectives)? Do they show the skills and abilities?
Level 1: Did they like the experience? Satisfaction? Use? Repeat use?
Evaluation Frameworks: CIPP

Context assesses program/product needs, problems or opportunities specific to the project environment.
Input assesses, evaluates and allocates project resources in order to meet identified needs and objectives, solve problems, and optimize program impact.
Process assesses project implementation.
Product assesses planned and unintended (unforeseen) outcomes, both to keep a project on track and to determine effectiveness or impact.
Types of Tests
Used to evaluate changes in skills and knowledge
Is testing alone sufficient?
Test Types: Norm-Referenced

Compare an individual's performance to the performance of other people.
Require varying item difficulties.
Assume not everybody is going to "get it"
  Discern those who "got it" from those who didn't.
(Figure: normal distribution)
Test Types: Norm-Referenced

Norm-referenced tests compare the individual to the group.
  Accomplished statistically by “norming” the test with large numbers of people (a short sketch of this idea follows this slide).
Consider: You sat for the GRE and received the following scores: 570, 51, 22, 28. You need to retake the test. What is your study plan?
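Not part of the original slides: a minimal Python sketch of what “norming” buys you, assuming a made-up, roughly normal norm group. The mean and standard deviation below are invented for illustration, not real GRE norms.

```python
# Hypothetical illustration of norming: a raw score gains meaning only when
# placed against the score distribution of a large norm group.
from statistics import NormalDist

def percentile_rank(raw_score, norm_mean, norm_sd):
    """Percent of the norm group scoring at or below raw_score,
    assuming scores are roughly normally distributed."""
    return round(NormalDist(mu=norm_mean, sigma=norm_sd).cdf(raw_score) * 100)

# Invented norm-group parameters for a GRE-like 200-800 scale.
print(percentile_rank(570, norm_mean=500, norm_sd=100))  # -> 76 (76th percentile)
```

The same raw score would earn a different percentile rank against a different norm group, which is exactly why norm-referenced scores describe relative standing rather than specific skills.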
Test Types: Norm-Referenced

Limitations
  Not especially helpful for:
    identifying individual skill deficiencies
    identifying weaknesses in the instruction
Test Types: Criterion-Referenced

Compares an individual's performance to the acceptable standard of performance for those tasks.
Requires completely specified objectives.
  Asks: Can this person do that which has been specified in the objectives?
Results in yes-no decisions about competence (a small sketch of such a decision follows this slide).
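Not from the slides: a minimal sketch of the yes-no character of a criterion-referenced decision, assuming hypothetical objectives and cutoffs. The point is that each learner is judged against the stated criterion, never against other learners.

```python
# Hypothetical objectives and mastery criteria (as proportions correct).
CUTOFFS = {"state_abbreviations": 45 / 50, "write_abcd_objectives": 1.0}

def mastery_decisions(scores):
    """Return a pass/fail decision per objective: score >= criterion."""
    return {obj: scores[obj] >= cutoff for obj, cutoff in CUTOFFS.items()}

print(mastery_decisions({"state_abbreviations": 47 / 50, "write_abcd_objectives": 1.0}))
# -> {'state_abbreviations': True, 'write_abcd_objectives': True}
```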
Test Types: Criterion-Referenced

Applications
  Diagnosis of individual skill deficiencies
  Certification of skills
  Evaluation and revision of instruction
Limitations
  Tend to focus on specific skills
  Results may not reflect general aptitudes
  Everyone may get an “A”
Which Test is Which?
For each, is it norm-referenced (NR) or criterion-referenced (CRT)?
  IQ test
  GRE
  SDSU Writing Competency
  Red Cross Lifesaving Certificate
  EDTEC 540 midterm and final exams
Which Test is Which?
For each, is it norm-referenced (NR) or criterion-referenced (CRT)?
  Give out a CA driver's license
  Pick students for Russian lang. training
  Determine entrance into medical school
  PADI Scuba Certification
  Select one EDTEC scholarship recipient
  Figure out where to revise a course
  Decide which students need remediation
Utility of Test Scores

Selection & screening (before):
  mastery of prerequisites -- for remediation/placement
  mastery of course objectives -- for acceleration (“testing out”)
Individual diagnosis and prescription (along the way)
Practice (along the way)
Grades & summative scores (at or after the end):
  promotion
  certification and licensure
Administrative:
  course evaluation
  trainer accountability
Criterion-referenced Test Items
Sample objectives paired with matching items:

Objective: Given a map of the USA with state borders marked, the lwbat write the abbreviation for 45 of 50 states in 15 mins.
Item: Here is a map of the USA with the states outlined -- but no names. Use the state abbreviations and fill them in -- you've got 15 mins to get at least 45.

Objective: Given a pair of well-worn shoes, the lwbat identify what's wrong with the shoes and the tools and materials necessary to fix them.
Item: Take a look at this pair of shoes. What problems do you see? What will you need to fix them?

Objective: Given a goal, the lwbat write at least two appropriate objectives with proper ABCD parts.
Item: The goal of the instruction is: "IDs will know how to write resumes." Write at least 2 objectives with all four parts.
Matching Test Items to Objectives

Matching ensures validity
  Validity is the extent to which the test measures what is important to performance. Does a high score on the test equate to high performance on the job?
The validity of a criterion-referenced test is enhanced when:
  objectives match real-world performances (based on solid analysis);
  test items match stated objectives (including condition).
A crude coverage-check sketch follows this list.
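My illustration, not from the deck: before polishing item wording, a quick first pass at content validity is simply checking that every stated objective has at least one test item mapped to it. The objective and item names below are hypothetical.

```python
# Map each objective to the test items written for it (names invented).
objective_to_items = {
    "state_abbreviations": ["map_fill_in_item"],
    "shoe_repair_diagnosis": ["worn_shoes_item"],
    "write_abcd_objectives": [],  # no item yet: a validity gap
}

# Objectives with no matching item cannot be measured by this test.
uncovered = [obj for obj, items in objective_to_items.items() if not items]
print("Objectives with no matching test item:", uncovered)
# -> Objectives with no matching test item: ['write_abcd_objectives']
```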
Match, or Not?

Objective: Given any stocked fruit or vegetable, the Ralphs Grocery Checker will be able to verbally state the code which matches the produce provided with 100% accuracy.
Item: Here is a persimmon from the produce department and the produce code job aid. Please state the produce code for this item. You may examine the persimmon and reference the job aid.
Match, or Not?

Objective: Given a tree in need of pruning, the gardener’s apprentice will be able to select the correct tree pruning device, based upon the type of tree presented.
Item: Here is an overgrown elm tree. Please select the appropriate tool with which you will prune the tree.
Match, or Not?

Objective: Given a descriptive order for a Café Mocha, including size, caf/decaf, and type of milk, the barista will be able to create the drink as specified in the Starbuck’s Guide to Coffee Creations.
Item: A customer has just ordered a Grande, non-fat mocha. Please list the ingredients you will need, and describe the steps you would take to create the drink.
Evaluating a Training Program
Consider:
  Your evaluation uses a criterion-based test to see if the new account representatives can describe the different types of accounts offered by the bank.
  All representatives were able to meet the specified criteria.
  Case closed… or do you want to know more?
Ideas in Testing

Measurement Error
Validity
Reliability
Measurement Error
Many causes (a short simulation sketch follows this list):
  mechanical or scoring errors
  poor wording (confusing, ambiguous)
  poor subject matter, content (validity)
  score variation from one time to another (reliability)
  score variation from "equivalent" tests
  test administration procedure
  inter-rater reliability
  mood of the student
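Not from the slides: a tiny simulation, with invented numbers, of the classical-test-theory view that an observed score is a true score plus random error from sources like those above. The more error, the further observed scores wander from what the learner actually knows.

```python
import random

random.seed(540)

def observed_score(true_score, error_sd):
    """One test administration: true ability blurred by random error."""
    return true_score + random.gauss(0, error_sd)

true_score = 80  # what the learner actually knows (hypothetical)
for error_sd in (2, 10):
    sittings = [round(observed_score(true_score, error_sd)) for _ in range(5)]
    print(f"error SD {error_sd:>2}: five administrations -> {sittings}")
```

Reliability, discussed below, is essentially the question of how large that random error is relative to true differences between learners.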
Validity

Does the test assess what's important? Does it really seek out the skill and knowledge linked to the world? (content validity)
Types:
  Content Validity (most important to us)
  Predictive Validity (e.g., SAT, GRE)
Reliability

Are the scores produced by the test trustworthy and stable over time?
Assessed by:
  parallel (equivalent) forms or test-retest (a short sketch follows this list)
  internal consistency
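Not from the slides: a minimal sketch of the test-retest idea, using invented scores. Give the same group the test twice; if the scores are trustworthy, the two sittings should be highly correlated.

```python
from statistics import correlation  # Python 3.10+

# Invented scores for the same seven learners on two sittings of one test.
first_sitting  = [72, 85, 90, 64, 78, 88, 70]
second_sitting = [70, 88, 91, 66, 75, 90, 73]

r = correlation(first_sitting, second_sitting)
print(f"test-retest reliability estimate: r = {r:.2f}")  # close to 1.0 here
```

Parallel forms work the same way with two "equivalent" versions of the test instead of two sittings; internal consistency instead looks at how strongly the items within a single sitting agree with one another.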
Testing and Evaluation
A Look Ahead:

ED 690 – Procedures of Investigation
  Provides an introduction to evaluation procedures and methods
  Introduces the research process and statistical analysis
ED 791A, 791B, 791C
  Evaluation sequence most often completed by EDTEC students, rather than writing a thesis
  Conduct a full-scale evaluation (design, research, report) for a living, breathing client over a two-semester timeframe