ONLINE SURVEY SAMPLE AND DATA QUALITY PROTOCOLS

Socratic Technologies, Inc. © 1994-2014. Reproduction in whole or in part without written permission is prohibited. Federal law provides severe civil and criminal penalties for unauthorized duplication or use of this material in physical or digital form, including for internal use. ISSN 1084-2624. sotech.com | 800-576-2728

Sample and Data Quality: Historical Perspective

Socratic Technologies, Inc. has developed sophisticated sample-scanning and quality-assessment programs to identify and correct problems that can reduce data reliability and introduce bias. From the earliest days of research there have been problems with sample quality (poor recruiting, inaccurate screening, bias in sample pools, and so on): potential respondents have submitted multiple surveys (paper and pencil), lied to get into compensated studies (mall intercepts and focus groups), and displayed lazy answering habits (in all forms of data collection). In the age of Internet surveying the topic receives far more discussion, because we now have the technology to measure sample problems and to detect exactly how many people engage in "bad survey behaviors." While this puts a keen spotlight on the nature of the problems, we also have the technology to correct many of these issues in real time. So while we are now more aware of potential issues, we are also better prepared than at any time in the past to deal with threats to data quality. This paper details the steps and procedures Socratic Technologies uses to ensure the highest data quality by correcting problems in both sample sourcing and bad survey behavior.

Sample Sources & Quality Procedures

The first line of defense in overall data quality is the sample source, and catching problems begins with examining the way panels are recruited. According to a variety of industry sources, pre-identified sample sources (versus Web intercepts using pop-up invitations or banner ads) now account for almost 80% of U.S. online research participants, and this proportion is growing. Examples include:

• Opt-in lists
• Customer databases
• National research panels
• Private communities

A common benefit of all of these sources is that they include a ready-to-use database from which a random or pre-defined sample can be selected and invited. In addition, pre-recruitment helps to solidify the evidence of opt-in permission for contact, or to more completely establish an existing business relationship; at least one of these is needed to meet the requirements for email contact under the federal CAN-SPAM Act of 2003.

In truth, panels of all kinds contain some level of bias driven by the way the recruitment strategy is managed. At Socratic we rely on panels that are recruited primarily through direct invitation, and we exclude sample sources recruited with a "get paid for taking surveys" pitch. This ensures that the people we invite to our surveys are not participating for strictly mercenary purposes, which has been shown to distort answers (for example, answering questions in a way intended to "please" the researcher in exchange for future monetary rewards).

In addition, we work with panel partners who undertake thorough profile verification and database cleaning procedures on an ongoing basis.
Panels and sample sources are like wine: if you start with poor grapes, no matter what the skill of the winemaker, the wine is still poor. How panels are recruited determines the long-run quality of the respondents they produce.

Our approved panel partners regularly scan their databases for:

• unlikely duplicated Internet server addresses
• series of similar email addresses (abc@hotmail, bcd@hotmail, cde@hotmail, etc.)
• replicated mailing addresses (for incentive checks)
• other data that might indicate multiple sign-ups by the same individual
• impossible changes to profiling information (e.g., a 34-year-old woman becoming an 18-year-old man)
• lack of responsiveness (most drop panelists who fail to respond to five invitations in a row)
• non-credible qualifications (e.g., persons who consistently report ownership of or experience with every screening option)
• a history of questionable survey behavior (see "Cheating Probability Score" later in this document)
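The scanning tools themselves are proprietary to each panel provider, but the logic behind two of the checks above, a series of similar email addresses and replicated mailing addresses, can be sketched in a few lines of Python. The record fields, similarity threshold, and function names below are illustrative assumptions rather than any partner's actual implementation.

    import difflib
    from collections import defaultdict

    def flag_duplicate_mailing_addresses(members):
        """Group panelists by normalized mailing address; more than one member at the
        same address for incentive checks is a classic sign of multiple sign-ups."""
        by_address = defaultdict(list)
        for m in members:
            key = " ".join(m["mailing_address"].lower().split())
            by_address[key].append(m["member_id"])
        return [ids for ids in by_address.values() if len(ids) > 1]

    def flag_similar_email_series(members, threshold=0.6):
        """Flag adjacent (sorted) local parts at the same mail domain that look like a
        generated series (e.g. abc@hotmail.com, bcd@hotmail.com, cde@hotmail.com)."""
        by_domain = defaultdict(list)
        for m in members:
            local, _, domain = m["email"].lower().partition("@")
            by_domain[domain].append((local, m["member_id"]))
        suspicious_pairs = []
        for domain, entries in by_domain.items():
            entries.sort()
            for (a, id_a), (b, id_b) in zip(entries, entries[1:]):
                if difflib.SequenceMatcher(None, a, b).ratio() >= threshold:
                    suspicious_pairs.append((id_a, id_b))
        return suspicious_pairs

In production such checks would run against millions of records and use stricter similarity rules, but the underlying idea is the same: sort, compare neighbors, and flag improbable clusters for human review.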
Socratic's Network of Global Panel Providers

The following table lists the panel partners (subject to change) to whom we regularly turn for recruitment on a global basis, covering North America, Latin America, Europe, the Middle East, Africa, and Asia-Pacific:

[Table: Vendor Name | Countries covered by each panel partner]

Anti-Cheating Protocols

Unlike other data collection modes, the server technology used in online surveys gives the researcher far more control over in-process problems related to cheating and bad behavior. As a first step in identifying and rejecting bad survey behavior, we need to differentiate between cheating and lazy behavior; the solutions Socratic uses differ by class of delinquency.

Cheaters attempt to enter a survey multiple times in order to:

• collect compensation
• sabotage results

Lazy respondents don't really think; they do the least amount of work necessary to complete:

• sometimes to get the compensation
• other times because of the burden, boredom, or fatigue of long, repetitious, difficult surveys

Many forms of possible cheating and lazy respondent behavior can be detected using server-based data and response-pattern-recognition technologies. In some cases, bad respondents are detected and rejected before they even begin the survey. This is critical for quality: because we don't accept or pay for "illegitimate" or "duplicated" respondents, the value of every completed interview increases. Other times, we allow people to enter the survey but then use pattern-recognition software to detect "answer sequences" that warrant "tagging and bagging." Note: while we inform cheaters that they've been caught and won't be getting any incentive, we don't tell them how they were caught!

One of our key tools in assessing the quality of a respondent is the Socratic Cheating Probability Score (CPS). A Cheating Probability Score looks at many possible problems and classifies the risk of accepting an interview as "valid and complete." However, we are careful not to use a medium probability score as an automatic disqualifier; just because the results are not what we expect doesn't mean they are wrong. Marginal scores are used to "flag" an interview, which is then reviewed before rejection. High scores are usually rejected mid-survey, before the respondent is qualified as having "completed."

Here are some examples of how we use technology to detect and reject common respondent problems.

Repeat Survey Attempts

Some cheaters simply attempt to retake surveys over and over again. These are the easiest to detect and reject. To avoid self-selection bias, most large surveys today are done "by customized invitation" [CAN-SPAM 2003], pre-registering individuals with verified profiling data to establish a double or triple opt-in status, and use a "handshake" protocol.

Cheaters Solutions: Handshake Protocols

A handshake protocol entails generating a unique URL suffix code, which is used for the link to the survey in the email invitation. It is tied to a specific individual's email address and/or panel member identification. Once it is marked as "complete" in the database, no other submissions are permitted on that person's account. An example of this random suffix code is as follows:

http://sotechsurvey.com/survey/?pid=wx54Dlo1
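Socratic's production handshake code is not published; the following Python sketch only illustrates the general mechanism under assumed names: issue a random single-use suffix code tied to one panel member, admit the respondent only while that code is still open, and close it once the interview is marked complete.

    import secrets

    # Hypothetical in-memory store; in production this would live in the panel database.
    invitations = {}  # suffix code -> {"member_id": ..., "status": "open" or "complete"}

    def issue_invitation(member_id, base_url="http://sotechsurvey.com/survey/"):
        """Generate a unique single-use suffix code tied to one panel member and
        return the personalized survey link for the email invitation."""
        code = secrets.token_urlsafe(6)  # short random suffix in the style of 'wx54Dlo1'
        invitations[code] = {"member_id": member_id, "status": "open"}
        return base_url + "?pid=" + code

    def may_enter(code):
        """Admit the respondent only if the code exists and has not already been completed."""
        record = invitations.get(code)
        return record is not None and record["status"] != "complete"

    def mark_complete(code):
        """Once marked complete, no further submissions are accepted on this account."""
        invitations[code]["status"] = "complete"

A real implementation would persist these records and also tie each code to the invitee's email address, as described above.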
Supplementing the invitation handshake, a cookie check is used. At the time a survey is finished (complete or terminated), a cookie carrying the survey_id is placed on the user's machine. At the start of every survey, Socratic looks for a cookie bearing that survey_id; if it is found, the user is not allowed to take the survey again. The respondent ID is also blocked immediately, so that even if the cookie is removed later, the respondent still won't be allowed back in.

But cookie checks by themselves are no longer sufficient to prevent multiple submission attempts; more advanced identification is needed. For this, Socratic uses an IP & Browser Configuration Check, a server-level test that is invisible to the respondent. Whenever a person's browser hits a Web site, it exchanges information with the Web server so that the Web pages (or survey pages) display correctly. For every survey response, a check can be made against multiple elements:

IP Address
The first level of validation comes from checking the IP address of the respondent's computer. IP addresses are usually allocated within a fairly well-defined geography, so if someone is supposed to be in California and their IP address indicates a China-based service, the attempt is flagged as potential cheating.

Browser String
Each browser sends a great deal of information about the user's system to the survey server. These strings are logged, and subsequent survey attempts are compared to determine whether exact matches are occurring. These are examples of browser strings that would be used to detect matches:

• Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; Advanced Searchbar; .NET CLR 1.1.4322; .NET CLR 1.0.3705; KelkooToolbar 1.0.0)
• Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Monzilla/4.0 (compatible; MSIE 6.1; Windows 95/98/NT/ME/2000/XP; 10290201-SDM); SV1; .NET CLR 1.0.3

Language Setting
Another browser-transmitted information set is the language configuration of the user's system. These settings too are logged and compared against subsequent survey attempts:

• en-us,x-ns1pG7BO_dHNh7,x-ns2U3
• en-us,en;q=0.8,en-gb;q=0.5,sv;
• zh-cn;q=1.0,zh-hk;q=0.9,zh-tw;
• en-us, ja;q=0.90, ja-jp;q=0.93

Internal Clock Setting
Finally, the user's computer has an internal timekeeping function that continuously tracks the time of day and date out to a number of decimal places. Each user's computer varies slightly, even within the same time zone or within the same company's network.

When these four measurements are taken together, the probability of two attempts showing exactly the same settings on all readable elements is extremely low.
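As a rough illustration of how the four readings above (IP address, browser string, language setting, and internal clock offset) might be combined into a duplicate-attempt check, here is a Python sketch. The field names and clock tolerance are assumptions, and the geographic IP check would additionally require a geolocation lookup that is omitted here.

    from dataclasses import dataclass

    @dataclass
    class ClientSignature:
        ip_address: str         # logged from the HTTP connection
        user_agent: str         # the browser string
        accept_language: str    # e.g. "en-us,en;q=0.8,en-gb;q=0.5"
        clock_offset_ms: float  # client clock minus server clock, reported by the survey page

    def same_machine(a, b, clock_tolerance_ms=50.0):
        """Treat two attempts as probable duplicates only when every readable element
        matches; any one element matching by chance is common, all four matching is not."""
        return (a.ip_address == b.ip_address
                and a.user_agent == b.user_agent
                and a.accept_language == b.accept_language
                and abs(a.clock_offset_ms - b.clock_offset_ms) <= clock_tolerance_ms)

    def flag_duplicate_attempts(attempts):
        """Compare each new attempt for a survey against all previously logged attempts."""
        flagged = []
        for i, current in enumerate(attempts):
            if any(same_machine(current, earlier) for earlier in attempts[:i]):
                flagged.append(current)
        return flagged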
Techno-Cheaters

Technology for cheating in online surveys has proliferated over the past ten years and in some areas of the world has become a cottage industry. However, with the correct server technology, Socratic can detect the profiles of cheating applications and thwart them in real time, before a survey is completed. Some cheaters are caught because they are trying to use technology to submit multiple surveys; form populators and key-stroke replicators are examples of such auto-fill technologies.

Techno-Cheaters Solutions

Total automation can be thwarted by using non-machine-readable code keys at the beginning of a survey to make sure a human being, rather than a computer "bot," is responding. We refer to this as a Handshake Code Key Protocol. One of the most popular Handshake Code Key Protocols is CAPTCHA [Source: UC Berkeley CAPTCHA Project: http://www.cs.berkeley.edu/~mori/gimpy/gimpy.html]. To prevent bots and other automated form completers from entering our surveys, a distorted image of a word or number can be displayed on the start screen of all Socratic projects. To gain access to the survey, the user has to type the word or number shown in the image into a text box; if the entry does not match the image, the user is not allowed to enter the survey. (Note: some dispensation and alternative forms of code keys are available for vision-impaired individuals.) As computers become more and more sophisticated in their ability to recognize patterns, the CAPTCHA distortions have become more complex. Examples are as follows:

[Figure: images that can be "read" by image-recognition bots (as of 2013)]
[Figure: images that cannot be "read" by image-recognition bots]

Lazy Respondent Behavior

Lazy behavior is far more prevalent as a survey problem than outright cheating, primarily because it is easier to defeat cheaters than people who simply aren't paying attention. With new, more sophisticated algorithms, however, it is now possible to limit the influence of lazy respondents in mid-survey.

A far more common problem with survey takers, across all modes of data collection, is people who just don't take the time and effort to answer questions carefully. This can result in rushed surveys or surveys with replicated-pattern issues. There are several reasons why respondents don't pay attention:

• Problem 1: Just plain lazy
• Problem 2: Survey design is torturous
  – too long
  – boring/repetitious
  – too difficult
  – not enough compensation
  – no affinity with the sponsor

But whatever the reason for lazy behavior, the symptoms are similar, and the preventative technologies are the same.

Speeders

In the case of rushed surveys ("speeders"), speed of submission can also be used to detect surveys completed too quickly. One statistical metric Socratic uses is the Minimum Survey Time Threshold. By adapting a normative formula for estimating the length of a survey based on the number of questions of various types, one can calculate an estimated time to completion and determine whether the actual time is significantly lower. This test is run at a predetermined point, before the survey has been completed: based on the time since starting, the number of closed-ended questions, and the number of open-ended questions, a determination is made as to whether the respondent has taken an adequate amount of time to answer the questions.

If (Time < (((# of CEs * secs/CE) + (# of OEs * secs/OE)) * 0.5)) Then FLAG
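The threshold formula above translates directly into code. In the Python sketch below, the per-question timing norms (seconds per closed-end and per open-end question) are illustrative values that a researcher would replace with their own normative estimates.

    def is_speeder(elapsed_secs, n_closed_end, n_open_end,
                   secs_per_closed=7.5, secs_per_open=30.0, cutoff=0.5):
        """Minimum Survey Time Threshold check, run at a predetermined point:
        If (Time < (((# of CEs * secs/CE) + (# of OEs * secs/OE)) * 0.5)) Then FLAG."""
        expected_secs = n_closed_end * secs_per_closed + n_open_end * secs_per_open
        return elapsed_secs < expected_secs * cutoff

    # Example: 40 closed-end and 2 open-end questions answered in 90 seconds gives an
    # expected time of 360 seconds, so 90 < 180 and the respondent is flagged.
    # is_speeder(90, 40, 2)  -> True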
Replicated Patterns

Another common problem caused by lazy behavior is the appearance of patterned answers throughout a survey (e.g., choosing the first answer for every question, or selecting a single rating point for all attributes). These are fairly easy to detect, and the respondent can be "intercepted" in mid-survey and asked to reconsider patterned sequences. Socratic uses Pattern Recognition Protocols within a survey to detect and correct these problems. Here are some of the logic solutions we apply for common patterning problems:

• XMas Treeing – identifies those who "zig-zag" their answers (e.g., 1,2,3,4,5,4,3,2,1).
  How to: when all attributes are completed, take the absolute value of each attribute-to-attribute difference; if the mean of these differences is close to 1, flag the respondent.
• Straight Lining – identifies those who straight-line their answers (e.g., taking the first choice on every answer set, or entering 4,4,4,4,4,4 on a matrix).
  How to: take the absolute value of each attribute-to-attribute (sub-question) difference in the same way; if the mean is 0, meaning every answer is identical, flag the respondent.

Random Answers

While these Pattern Recognition Protocols pick up many common problems, they cannot detect random answer submission (e.g., 1,5,3,2,5,4,3,1,1). For this we need another type of logic: convergent/divergent validity tests. This test relies on the assumption that similar questions should be answered in a similar fashion and that polar opposites should receive inverse reactions. For example, if someone strongly agrees that a product concept "is expensive," he or she should not also strongly agree that the same item "is inexpensive." When these types of tests are in place, the survey designer has the same flexibility to intercept a survey with validity issues and ask the respondent to reconsider their answers.

Cross-Survey Answer Block Sequences

Occasionally, other anti-cheating and anti-lazy-behavior protocols will fail to detect a well-executed illegitimate survey. For this purpose, Socratic also scans for repeated sequences using a Record Comparison Algorithm: questionnaires are continuously scanned, record to record, for major blocks of duplicated field contents (e.g., more than 65% identical answer sequences). Note: some discretion is needed on surveys for which great similarity of opinion or homogeneity in the target population is anticipated. Future development is also planned to scan open-ended comments for duplicated phrases and blocks of similar text within live surveys; currently, this can only be done post hoc.
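The flagging rules for zig-zag and straight-line answering, the convergent/divergent check, and the record-comparison scan can all be expressed compactly. The Python sketch below assumes each respondent's ratings arrive as an ordered list of integer scale points; the function names and the zig-zag tolerance are assumptions, while the 65% duplicate threshold is the figure cited above.

    def flag_zigzag(ratings, tolerance=0.1):
        """'XMas Treeing': mean absolute attribute-to-attribute difference close to 1
        (e.g. 1,2,3,4,5,4,3,2,1)."""
        if len(ratings) < 2:
            return False
        diffs = [abs(b - a) for a, b in zip(ratings, ratings[1:])]
        return abs(sum(diffs) / len(diffs) - 1.0) <= tolerance

    def flag_straightline(ratings):
        """Straight-lining: every attribute-to-attribute difference is 0, i.e. the same
        scale point was chosen for every attribute (e.g. 4,4,4,4,4,4)."""
        if len(ratings) < 2:
            return False
        return all(b == a for a, b in zip(ratings, ratings[1:]))

    def flag_inconsistent_opposites(rating_a, rating_b, scale_max=5):
        """Convergent/divergent validity: strong agreement with both a statement and its
        polar opposite (e.g. 'is expensive' and 'is inexpensive') is flagged for review."""
        return rating_a == scale_max and rating_b == scale_max

    def flag_duplicate_records(records, threshold=0.65):
        """Record Comparison: flag pairs of completed questionnaires whose answer
        sequences are more than 65% identical, position by position."""
        flagged_pairs = []
        for i, rec in enumerate(records):
            for j in range(i):
                other = records[j]
                matches = sum(a == b for a, b in zip(rec, other))
                if matches / max(len(rec), 1) > threshold:
                    flagged_pairs.append((j, i))
        return flagged_pairs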
The majority of problems related to data quality can be detected before a survey is completed. However, a variety of ongoing checks can add even more assurance that respondents are who they claim to be and are located where they claim to be. Panel cleaning is necessary for long-run viability.

Post-Survey Panel Cleaning and Detection

For the panels managed by Socratic Technologies, the quality assurance program extends beyond sample cleaning and mid-survey error testing; we also continuously monitor issues that can only be detected post hoc.

Address Verification

Every third or fourth incentive payment should be made by check or mailed notice to a physical address. If people want their reward, they have to drop any aliases or geographic pretext so that delivery can be completed, and cheaters can often be caught before an incentive is distributed. Duplicated addresses, P.O. boxes, and the like are an obvious give-away, but we also look for slight name derivatives not usually caught by banks, including:

• nicknames [Richard Smith and Dick Smith]
• use of initials [Richard Smith and R. Smith]
• unusual capitalization [Richard Smith and RiCHard SmiTH]
• small misspellings [Richard Smith and Richerd Smith]

Not all survey bad behavior is malicious; some is driven by poor and torturous survey design. Some discretion will always be a requirement of survey usability:

• writing screeners that don't telegraph qualification requirements
• keeping survey length and burden to a reasonable level
• minimizing the difficulty of compliance
• enhancing the engagement levels of boring tasks
• maximizing the communication that participation is worthwhile and appreciated

Conclusion

Many features and security checks are now available for assuring the validity of modern online research, spanning pre-survey panel quality, mid-survey cheating and lazy-behavior detection, and post-survey panel cleaning. While many of these techniques allow Socratic to "flag" possible cheating or lazy behavior, we believe the analyst should not automatically reject interviews but should examine marginal cases for possible validity. With these technologies in place, online research can now be more highly regulated than any other form of data collection.

CONTACT

San Francisco Headquarters
Socratic Technologies, Inc.
2505 Mariposa Street
San Francisco, CA 94110-1424
T 415-430-2200 (800-5-SOCRATIC)

Chicago Regional Office
Socratic Technologies, Inc.
211 West Wacker Drive, Suite 1500
Chicago, IL 60606-1217
T 312-727-0200 (800-5-SOCRATIC)

Contact Us: sotech.com/contact

Socratic Technologies, Inc. is a leader in the science of computer-based and interactive research methods. Founded in 1994 and headquartered in San Francisco, it is a research-based consultancy that builds proprietary, interactive tools that accelerate and improve research methods for the study of global markets. Socratic Technologies specializes in product development, brand articulation, and advertising research for the business-to-business and consumer products sectors.

Registered Trademarks, Servicemarks and Copyrights
The following product and service descriptors are protected and all rights are reserved: Configurator Analysis, ReportSafe, Site-Within-Survey, Socratic Browser, Socratic CardSort, Socratic ClutterBook, Socratic ColorModeler, Socratic CommuniScore, Socratic Forum, Socratic Perceptometer, Socratic ProductExhibitor, Socratic Site Diagnostic, SSD, Socratic Te-Scope, Socratic Usability Lab, Socratic VisualDifferentiator, Socratic Web Boards, Socratic Web Survey 2.0, SWS 2.0, Socratic WebComm Toolset, Socratic WebPanel Toolset.