Online Survey Sample and Data Quality Protocols
A Marketing Research Consultancy
sotech.com | 800-576-2728
Sample and Data Quality
Socratic Technologies, Inc., has developed sophisticated sample scanning and quality assessment programs to identify and correct problems that may lead to reduced data reliability and bias.
Historical Perspective
From the earliest days of research there have been problems with sample quality (poor recruiting, inaccurate screening, bias in sample pools, etc.). Potential respondents have attempted to submit multiple surveys (paper and pencil), lied to get into compensated studies (mall intercepts and focus groups), and displayed lazy answering habits (in all forms of data collection).
In the age of Internet surveying, this is
becoming a highly discussed topic
because we now have the technology to
measure sample problems and we can
detect exactly how many people are
involved in “bad survey behaviors.” While
this puts a keen spotlight on the nature of
problems, we also have the technology to
correct many of these issues in real time.
So, because we are now aware of potential
issues, we are better prepared than at
any time in the past to deal with threats to
data quality. This paper will detail the steps
and procedures that we use at Socratic
Technologies to ensure the highest data
quality by correcting problems in both
sample sourcing and bad survey behavior.
Sample Sources & Quality Procedures
The first line of defense in overall data
quality is the sample source. Catching
problems begins with examining the way
panels are recruited.
According to a variety of industry sources,
preidentified sample sources (versus Web
intercepts using pop-up invitations or banner ads) now account for almost 80% of
U.S. online research participants (and this
proportion is growing). Examples include:
• Opt-in lists
• Customer databases
• National research panels
• Private communities
A common benefit of all of these sources
is that they include a ready-to-use database from which a random or predefined
sample can be selected and invited. In
addition, prerecruitment helps to solidify
the evidence of an opt-in permission for
contact or to more completely establish
an existing business relationship, at
least one of which is needed to meet the
requirements of email contact under the
federal CAN-SPAM Act of 2003.
In truth, panels of all kinds contain some
level of bias driven by the way recruitment strategy is managed. At Socratic we
rely on panels that are recruited primarily
through direct invitation. We exclude
sample sources that are recruited using
a “get-paid-for-taking-surveys” approach.
This ensures that the people who we are
inviting to our surveys are not participating
for strictly mercenary purposes—which
has been shown to distort answers (i.e.,
answering questions in such a way as to
“please” the researcher in exchange for
future monetary rewards).
In addition, we work with panel partners
who undertake thorough profile verification and database cleaning procedures
on an ongoing basis.
Our approved panel partners regularly scan databases for the following (one such scan is sketched in code after this list):
• Unlikely duplicated Internet server
addresses
• Series of similar addresses (abc@hotmail, bcd@hotmail, cde@hotmail, etc.)
• Replicated mailing addresses (for
incentive checks)
• Other data that might indicate multiple
sign-ups by the same individual
• Lack of responsiveness (most drop
panelists if they fail to respond to five
invitations in a row)
• Non-credible qualifications (e.g., persons who consistently report ownership
or experience with every screening
option)
• A history of questionable survey behavior (see “Cheating Probability Score”
later in this document)
• Impossible changes to profiling information (e.g., a 34-year-old woman
becoming an 18-year-old man)
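For illustration, the following sketch flags two of the items above, duplicated mailing addresses and runs of look-alike email addresses. The field names are hypothetical and the rules are simplified; it is not any partner's production code.

import re
from collections import defaultdict

def normalize_mailing_address(addr):
    # Collapse case, punctuation and spacing so near-identical mailing
    # addresses (used for incentive checks) group together.
    return re.sub(r"[^a-z0-9]", "", addr.lower())

def scan_panel(panelists):
    # panelists: list of dicts with hypothetical keys "email" and "mail_addr".
    # Returns the set of emails flagged for a duplicated mailing address or
    # for sitting in a run of look-alike addresses (abc@, bcd@, cde@ ...).
    flagged = set()

    # 1. Replicated mailing addresses
    by_mail = defaultdict(list)
    for p in panelists:
        by_mail[normalize_mailing_address(p["mail_addr"])].append(p["email"])
    for emails in by_mail.values():
        if len(emails) > 1:
            flagged.update(emails)

    # 2. Series of similar email addresses on the same domain
    by_domain = defaultdict(set)
    for p in panelists:
        local, _, domain = p["email"].lower().partition("@")
        by_domain[domain].add(local)
    for domain, locals_ in by_domain.items():
        for name in locals_:
            # "abc" shifted by one character becomes "bcd"
            shifted = "".join(chr(ord(c) + 1) for c in name)
            if shifted in locals_:
                flagged.add(name + "@" + domain)
                flagged.add(shifted + "@" + domain)
    return flagged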
Figure 1: Socratic Technologies panel development Web site
Panels and sample sources are like wine: if you start with poor grapes, no matter what the skill of the winemaker, the wine is still poor. How panels are recruited determines the long-run quality of the respondents they produce.
Socratic’s Network of Global Panel Providers
The following list details the panel partners (subject to change) to whom we regularly turn for recruitment on a global basis.
VENDOR NAME: COUNTRIES
3D interactive.com: Australia
42 Market Research: France
Accurate Market Research: Mexico
Adperio: US
Advaith: Asia
AG3: Brazil, Argentina, Mexico, Chile
AIP Corporation: Asia
Alterecho: Belgium
Amry Research
ARC: Poland
Aurora: UK
Aussie Survey: UK & Australia
Authentic Response: US
Beep World: Austria, Switzerland, Germany
BestLife: LATAM
Blueberries: Israel
C&R Research Services, Inc.: US
Campus Fund Raiser: US
Cint: All Countries
Clear Voice / Oceanside: All Countries
Community View: India
Corpscan: India
Cotterweb: US
Data Collect: Czech Republic
Delvinia: Canada
EC Global Panel: LATAM, US
Eksen: Turkey
Embrain Co.: Asia
Empanel: US
Empathy Panel: Ireland
Empowered Comm.: Australia
ePanel Marketing Research: China
Erewards/ResearchNOW: All Countries
Esearch: US, Canada, UK
EuroClix B.V. / Panelclix: Netherlands
Flying Post: UK, France, Germany
Focus Forward: US
Gain: Japan
Garcia Research Associates: US
GMI
HRH: Greece
IID Interface in Design: Asia
Inquision: South Africa, Turkey
Insight CN: China, Hong Kong
Inzicht: Netherlands, Belgium, France
iPanelOnline: Asia
Ithink: US
Itracks: Canada
Ivox: Belgium
Lab 42: All Countries
Lightspeed Research (UK Kantar Group): Italy, Spain, Germany, Australia, New Zealand, Netherlands, France, Sweden, UK, Switzerland
Livra: LATAM
Luth Research: US
M3 Research: Nordics
Maktoob Research: Middle East
Market Intelligence: US, EU
Market Tools: US, UK, France, Canada, Australia
Market-xcel: India, Singapore
Masmi: Hungary, Russia, Ukraine
Mc Million: US
Mo Web: EU
My Points: US, Canada
Nerve planet: India, China, Japan
Net Intelligence & Research: Korea
Netquest: Portugal, Spain
ODC Service: Italy, France, Germany, Spain, UK
Offerwise: US
OMI: Russia, Ukraine
Opinion Health: US
Opinion Outpost/SSI: All Countries
Opinions: UAE, Saudi Arabia
Panel Base: UK
Panel Service Africa: South Africa
Panelbiz: EU
Panthera Interactive: All Countries
Precision Sample: US
Public Opinious: Canada
Pure Profile: UK, US, Australia
Quick Rewards: Russia, Ukraine, US, UK
Rakuten Research: Japan
Resulta: Asia
RPA: Asia
Sample Bus: Asia
Schlesinger Assoc.: US
Seapanels: Asia
Spec Span: US
Spider Metrix: Australia, UK, Canada, New Zealand, South Africa, US
STR Center: All Countries
Telkoma: South Africa
Testspin/WorldWide: All Countries
Think Now Research: US
TKL Interactive: US
TNS New Zealand: New Zealand
Toluna: All Countries
United Sample: All Countries
Userneeds: Nordics
uthink: Canada
WebMD Market Research Services
World One Research: US, France, Germany, Spain
YOC: Germany
YOUMINT: India
Zapera (You Gov): All Countries
Anti-Cheating Protocols
Unlike other data collection modes, the server technology used in online surveys gives the researcher far more control over in-process problems related to cheating and lazy behavior.
As a first step in identifying and rejecting bad survey behavior, we need to differentiate between cheating and lazy behavior. The solutions Socratic uses for handling each type of problem differ by class of delinquency.
Cheaters attempt to enter a survey multiple times in order to:
• Collect compensation
• Sabotage results
Lazy respondents don't really think; they do the least amount of work possible in order to:
• Receive compensation
• Avoid the burden, boredom or fatigue of long, repetitious, difficult surveys
Many forms of possible cheating and lazy
respondent behaviors can be detected
using server-based data and response
pattern recognition technologies. In some
cases, bad respondents are immediately
detected and rejected before they even
begin the survey. This is critical for quality, because “illegitimate” or “duplicated” respondents decrease the value of
every completed interview. Sometimes,
we allow people to enter the survey, but
then use pattern recognition software to
detect “answer sequences” that warrant
“tagging and bagging.” Note: While we
inform cheaters that they’re busted and
won’t be getting any incentive, we don’t
tell them how they were caught!
One of our key tools in assessing the
quality of a respondent is the Socratic
Cheating Probability Score (CPS). A CPS
looks at many possible problems and
classifies the risk associated with accepting an interview as “valid and complete.”
However, we also need to be careful not to use a "medium probability score" as an automatic disqualifier: just because the results are not what we expect doesn't mean they are wrong. Marginal scores should be used to "flag" an interview, which should then be reviewed before being rejected. Interviews with high scores are usually rejected mid-survey, before the respondent qualifies as having "completed."
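For illustration only, a CPS-style check might combine individual flags into a single risk classification. The flag names and weights below are hypothetical and do not reflect Socratic's actual scoring model.

# Hypothetical weights; the real CPS model is not published.
CPS_WEIGHTS = {
    "duplicate_fingerprint": 40,
    "geo_ip_mismatch": 25,
    "speeding": 20,
    "straight_lining": 15,
    "failed_validity_check": 15,
}

def cheating_probability_score(flags):
    # flags: set of flag names raised for one interview.
    # Marginal scores are reviewed by an analyst, not rejected automatically.
    score = sum(CPS_WEIGHTS.get(f, 0) for f in flags)
    if score >= 60:
        return score, "reject mid-survey"
    if score >= 30:
        return score, "flag for analyst review"
    return score, "accept"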
Here are some examples of how we use
technology to detect and reject common
respondent problems:
Repeat Survey Attempts
Some cheaters simply attempt to retake
surveys over and over again. These are
the easiest to detect and reject. To avoid
self-selection bias, most large surveys
today are done “by customized invitation”
(CAN-SPAM 2003) and use a "handshake" protocol, preregistering individuals with verified profiling data in order to establish double or triple opt-in status.
Cheaters Solutions: Handshake Protocols
A handshake protocol entails generating
a unique URL suffix-code, which is used
for the link to the survey in the email
invitation. It is tied to a specific individual’s email address and/or panel member
identification. Once it is marked as "complete" in the database, no other submissions are permitted on that person's account. An example of this random suffix code is as follows:
http://sotechsurvey.com/survey/?pid=wx54Dlo1
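A minimal sketch of how such a single-use suffix code might be generated and honored follows; the storage structure and field names are illustrative, not our production schema.

import secrets

invites = {}  # pid -> {"email": ..., "status": "open" | "complete"}

def issue_invite(email):
    # Generate a unique, unguessable suffix code tied to one panelist.
    pid = secrets.token_urlsafe(6)          # short code similar to "wx54Dlo1"
    invites[pid] = {"email": email, "status": "open"}
    return "http://sotechsurvey.com/survey/?pid=" + pid

def accept_submission(pid):
    # Allow at most one completed submission per invitation.
    record = invites.get(pid)
    if record is None or record["status"] == "complete":
        return False                        # unknown code, or already used
    record["status"] = "complete"
    return True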
Supplementing the invitation handshake,
a cookie check is utilized. At the time a survey is finished (completed or terminated), a cookie with the survey_id is placed on the user's machine and the respondent ID is blocked. At the start of all surveys, Socratic looks for a cookie bearing that survey_id; if it is found, the user is not allowed to take the survey again. Because the respondent ID is also blocked, even if the respondent removes the cookie later on, he or she still won't be allowed back in.
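In sketch form, and independent of any particular web framework, the check works roughly as follows; the cookie name and blocked-ID store are illustrative.

def start_survey(cookies, survey_id, respondent_id, blocked_ids):
    # cookies: dict of cookies sent by the browser.
    # Refuse entry if a completion cookie for this survey_id exists, or if
    # the respondent ID was blocked at an earlier completion.
    if cookies.get("survey_" + survey_id) == "done":
        return False
    if respondent_id in blocked_ids:
        return False        # cookie deleted later on, but the ID stays blocked
    return True

def finish_survey(set_cookie, survey_id, respondent_id, blocked_ids):
    # Called on completion or termination: drop the cookie and block the ID.
    set_cookie("survey_" + survey_id, "done")
    blocked_ids.add(respondent_id)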
But cookie checks are no longer sufficient by themselves to prevent multiple
submission attempts. More advanced
identification is needed.
For more advanced identity verification, Socratic utilizes an IP & Browser
Config Check. This is a server-level
test that is invisible to the respondent.
Whenever a person’s browser hits a Web
site, it exchanges information with the
Web server in order for the Web pages
(or survey pages) to display correctly. For
responses to all surveys, a check can be
made for multiple elements:
IP Address
The first level of validation comes from checking the IP address of the respondent's computer. IP addresses are usually assigned based on a tightly defined geography, so if someone is supposed to be in California and their IP address indicates a China-based service, this would be flagged as a potential cheating attempt.

Browser String
Each browser sends a great deal of information about the user's system to the survey server. These strings are then logged, and subsequent survey attempts are compared to determine whether exact matches are occurring. These are examples of browser strings:
• Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; Advanced Searchbar; .NET CLR 1.1.4322; .NET CLR 1.0.3705; KelkooToolbar 1.0.0)
• Mozilla/4.0 (compatible; MSIE 6.1; Windows 95/98/NT/ME/2000/XP; 10290201-SDM; SV1; .NET CLR 1.0.3)

Language Setting
Another browser-based information set that is transmitted consists of the language settings for the user's system. These, too, are logged and compared to subsequent survey attempts:
• en-us,x-ns1pG7BO_dHNh7,x-ns2U3
• en-us,en;q=0.8,en-gb;q=0.5,sv;
• zh-cn;q=1.0,zh-hk;q=0.9,zh-tw;
• en-us, ja;q=0.90, ja-jp;q=0.93

Internal Clock Setting
Finally, the user's computer has an internal time-keeping function that continuously tracks the date and time of day out to a number of decimal places. Each user's computer will vary slightly, even within the same time zone or within the same company's system.

When these four measurements are taken together, the probability of two different respondents producing exactly the same settings on all readable elements is extremely low.
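One way to operationalize this, sketched here under our own simplifying assumptions rather than as Socratic's proprietary implementation, is to hash the readable elements into a fingerprint and compare it with fingerprints already logged for the survey.

import hashlib

def config_fingerprint(ip_address, user_agent, accept_language, clock_offset_ms):
    # Combine the four readable elements into one comparable value.
    # clock_offset_ms: difference between the client clock (reported by the
    # survey page) and the server clock, rounded to the nearest 10 ms.
    raw = "|".join([
        ip_address,
        user_agent,
        accept_language,
        str(round(clock_offset_ms, -1)),
    ])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def is_probable_duplicate(fingerprint, seen_fingerprints):
    # An exact match on all four elements is very unlikely to occur by chance.
    return fingerprint in seen_fingerprints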
Techno-Cheaters
Some cheaters are caught because they
are trying to use technology to submit
multiple surveys. Form populators and
keystroke replicators are examples of
auto-fill technologies.
Technology for cheating in online surveys has proliferated over the past 10 years and in some areas of the world has become a cottage industry. However, with the correct server technology, Socratic can detect the profiles of cheating applications and thwart them in real time, before a survey is completed.
Techno-Cheaters Solutions
Total automation can be thwarted by
creating non-machine readable code keys
that are used at the beginning of a survey
to make sure a human being is responding versus a computer “bot.” We refer to
this as a Handshake Code Key Protocol.
One of the most popular Handshake
Code Key Protocols is CAPTCHA. To
prevent bots and other automated form
completers from entering our surveys, a
distorted image of a word or number can
be displayed on the start screen of all
Socratic projects. In order to gain access
to a survey, the user has to enter the
word or number shown in the image into
a text box; if the result does not match
the image, the user will not be allowed to
enter the survey. (Note: Some dispensation and use of alternative forms of code
keys are available for visually impaired
individuals.)
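Server-side, the gate itself is simple. The sketch below assumes the distorted image and its answer were generated earlier and stored against the respondent's session; it is a generic illustration rather than any specific CAPTCHA library.

def check_code_key(session, user_entry):
    # Compare the respondent's typed code against the answer stored for the
    # distorted image shown on the start screen; allow a few honest retries.
    expected = session.get("captcha_answer", "")
    attempts = session.get("captcha_attempts", 0)
    if attempts >= 3:
        return False                      # repeated failures: treat as a bot
    if user_entry.strip().lower() != expected.lower():
        session["captcha_attempts"] = attempts + 1
        return False
    return True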
As computers become more and more
sophisticated in their ability to detect
patterns, the CAPTCHA distortions have
become more complex.
Examples of CAPTCHA images that can be "read" by image recognition bots (as of 2013) and images that cannot. Images adapted from Mori & Malik, ca. 2003, Breaking a visual CAPTCHA, http://www.cs.berkeley.edu/~mori/gimpy/gimpy.html
Lazy Respondent Behavior
Lazy behavior is far more prevalent as a survey problem than outright cheating, primarily because it's easier to defeat cheaters than people who aren't paying attention. With new, more sophisticated algorithms, however, it is now possible to limit the influence of lazy respondents in mid-survey.
A far more common problem with survey takers across all modes of data collection is people who just don't take the time and effort to answer questions carefully. This can result in rushed surveys or surveys with replicated answer patterns.
There are several reasons why respondents don't pay attention:
• Problem 1: Just plain lazy
• Problem 2: Survey design is torturous
  - Too long
  - Boring/repetitious
  - Too difficult
  - Not enough compensation
  - No affinity with sponsor
But whatever the reason for lazy behavior,
the symptoms are similar, and the preventative technologies are the same.
Speeders
In the case of rushed survey respondents
(“speeders”), speed of submission can be
used to detect surveys completed too
quickly. One statistical metric that Socratic
uses is the Minimum Survey Time Threshold. By adapting a normative formula for
estimating the length of a survey based
on the number of various types of questions, one can calculate an estimated
time to completion and determine if actual time to completion is significantly lower.
This test is run at a predetermined point,
before the survey has been completed.
Based on the time since starting, the
number of closed-ended questions, and
the number of open-ended questions, a
determination will be made as to whether
the respondent has taken an adequate
amount of time to answer the questions.
If (Time < (((# of CEs * secs/CE) + (# of
OEs * secs/OE)) * 0.5)) Then FLAG.
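Expressed as code, the test looks like the sketch below; the per-question second allowances are placeholders, not Socratic's calibrated norms.

SECS_PER_CLOSED_END = 8    # illustrative norms only
SECS_PER_OPEN_END = 30

def is_speeder(elapsed_secs, num_closed_ends, num_open_ends):
    # Flag the interview if elapsed time is under half of the estimated
    # minimum time for the questions answered so far.
    estimated = (num_closed_ends * SECS_PER_CLOSED_END
                 + num_open_ends * SECS_PER_OPEN_END)
    return elapsed_secs < estimated * 0.5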
Replicated Patterns
Another common problem caused by lazy
behavior is the appearance of patterned
answers throughout a survey (e.g., choosing the first answer for every question, or
selecting a single rating point for all attributes). These are fairly easy to detect and
the respondent can be “intercepted” in
mid-survey and asked to reconsider patterned sequences. Socratic uses Pattern
Recognition Protocols within a survey to
detect and correct these types of problems.
Here are some of the logic-based solutions we apply for common patterning problems (a code sketch of both checks follows the list):
• XMas Treeing: This technique identifies those who "zig-zag" their answers (e.g., 1, 2, 3, 4, 5, 4, 3, 2, 1, etc.)
  - How to: When all attributes are completed, take the absolute value of each attribute-to-attribute difference. If the mean of these differences is close to 1, flag the respondent.
• Straight-Lining: This technique identifies those who straight-line their answers (e.g., taking the first choice on every answer set or entering 4, 4, 4, 4, 4, 4 on a matrix, etc.)
  - How to: Take the absolute value of each attribute-to-attribute difference. If the mean of these differences is 0, every answer is identical and the respondent should be flagged.
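A minimal sketch of both checks, run against one completed rating matrix (the tolerance value is illustrative):

def mean_abs_step(ratings):
    # Mean absolute attribute-to-attribute difference across a rating matrix.
    steps = [abs(b - a) for a, b in zip(ratings, ratings[1:])]
    return sum(steps) / len(steps)

def is_xmas_treeing(ratings, tolerance=0.1):
    # Zig-zag answering: consecutive ratings move by about one point each.
    return len(ratings) > 2 and abs(mean_abs_step(ratings) - 1.0) <= tolerance

def is_straight_lining(ratings):
    # Identical answer for every attribute: no movement at all.
    return len(ratings) > 1 and mean_abs_step(ratings) == 0

For example, is_xmas_treeing([1, 2, 3, 4, 5, 4, 3, 2, 1]) and is_straight_lining([4, 4, 4, 4, 4, 4]) both return True.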
The majority of problems related to data quality can be detected before a survey is completed. However, a variety of ongoing checks can add even more assurance that respondents are who they claim to be and are located where they claim to be. Panel cleaning is necessary for long-run viability.
Random Answers
While these Pattern Recognition Protocols
pick up many common problems, they
cannot detect random answer submission
(e.g., 1, 5, 3, 2, 5, 4, 3, 1, 1, etc.). For
this we need another type of logic: Convergent/Divergent Validity tests.
This type of test relies on the assumption
that similar questions should be answered in a similar fashion and polar opposites should receive inverse reactions.
For example, if someone strongly agrees
that a product concept is “expensive,” he
or she should not also strongly agree that
the same item is “inexpensive.” When
these types of tests are in place, the
survey designer has some flexibility to intercept a survey with “validity issues” and
request that the respondent reconsider
his or her answers.
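The sketch below shows one way such a test might be wired up for pairs of opposed attributes rated on a 1-5 agreement scale; the item names and the "strongly agree" cutoff are illustrative, not a prescribed rule.

# Hypothetical reverse-keyed pairs on a 1-5 agreement scale.
OPPOSED_PAIRS = [("expensive", "inexpensive"), ("easy_to_use", "hard_to_use")]

def validity_issues(answers, strong_agree=4):
    # Return pairs where the respondent strongly agreed with both an
    # attribute and its polar opposite: a convergent/divergent failure
    # that can trigger a mid-survey request to reconsider.
    issues = []
    for a, b in OPPOSED_PAIRS:
        if a in answers and b in answers:
            if answers[a] >= strong_agree and answers[b] >= strong_agree:
                issues.append((a, b))
    return issues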
Cross-Survey Answer Block Sequences
Occasionally, other anti-cheating/anti-lazy behavior protocols will fail to detect a
well-executed illegitimate survey. For this
purpose, Socratic also scans for repeated
sequences using a Record Comparison
Algorithm. Questionnaires are continuously scanned, record-to-record, for
major blocks of duplicated field contents
(e.g., >65% identical answer sequences).
Note: Some level of discretion will be
needed on surveys for which great similarities of opinion or homogeneity in the
target population are anticipated.
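A simple record-to-record comparison over the closed-ended fields might look like the following sketch; the 65% figure comes from the example above, and everything else is assumed.

def pct_identical(record_a, record_b, fields):
    # Share of closed-ended fields with exactly the same answer.
    same = sum(1 for f in fields if record_a.get(f) == record_b.get(f))
    return same / len(fields)

def find_duplicate_blocks(records, fields, threshold=0.65):
    # Pairwise scan for records sharing a major block of identical answers.
    suspects = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if pct_identical(records[i], records[j], fields) > threshold:
                suspects.append((i, j))
    return suspects

A production implementation would need blocking or hashing to keep the pairwise scan tractable on large samples.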
Future development is also planned to
scan open-ended comments for duplicated phrases and blocks of similar text
within live surveys. Currently, this can
only be done post hoc.
Post-Survey Panel Cleaning
Post-Survey Detection
For the panels managed by Socratic
Technologies, the quality assurance
program extends beyond the sample
cleaning and mid-survey error testing.
We also continuously monitor issues that
can only be detected post hoc.
Address Verification
Every third or fourth incentive payment should be made by check, or a notice mailed to a physical address. If people want their reward, they have to drop any aliases or geographic pretext in order for delivery to be completed, and oftentimes you can catch cheaters prior to distribution of an incentive. Of course, duplicated addresses, P.O. boxes, etc., are a giveaway. We also look for slight name derivatives not usually caught by banks (a sketch of this kind of matching follows the list), including:
• nicknames (Richard Smith and Dick
Smith)
• use of initials (Richard Smith and R.
Smith)
• unusual capitalization (Richard Smith
and RiCHard SmiTH)
• small misspellings (Richard Smith and
Richerd Smith)
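The sketch below illustrates this kind of matching with a small nickname table and a standard edit-distance ratio; it is not Socratic's actual matching logic, and the threshold is illustrative.

from difflib import SequenceMatcher

NICKNAMES = {"dick": "richard", "rich": "richard", "bob": "robert"}  # illustrative only

def canonical(name):
    # Lower-case the name, strip periods, and map known nicknames.
    parts = name.lower().replace(".", "").split()
    return " ".join(NICKNAMES.get(p, p) for p in parts)

def likely_same_person(name_a, name_b, threshold=0.85):
    # Catches nicknames, initials, odd capitalization and small misspellings.
    a, b = canonical(name_a), canonical(name_b)
    if not a or not b:
        return False
    if a == b:
        return True
    first_a, last_a = a.split()[0], a.split()[-1]
    first_b, last_b = b.split()[0], b.split()[-1]
    # "R. Smith" vs "Richard Smith": same surname, matching initial
    if last_a == last_b and first_a[0] == first_b[0] and 1 in (len(first_a), len(first_b)):
        return True
    # "Richerd Smith" vs "Richard Smith": small misspelling
    return SequenceMatcher(None, a, b).ratio() >= threshold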
Conclusion
Many features and security checks are
now available for assuring the validity of
modern online research. This includes
pre-survey panel quality, mid-survey
cheating and lazy behavior detection and
post-survey panel cleaning.
With these technologies in place, online
research can now be more highly regulated than any other form of data collection.
Not all survey bad behavior is malicious;
some is driven by poor survey design.
Some discretion will always be a requirement of survey usability:
• Writing screeners that don’t telegraph
qualification requirements
• Keeping survey length and burden to a
reasonable level
• Minimizing the difficulty of compliance
• Enhancing the engagement levels of
boring tasks
• Maximizing the communication that participation is worthwhile and appreciated
While Socratic’s techniques can flag possible cheating or lazy behavior, we believe
that the analyst should not just automatically reject interviews, but examine
marginal cases for possible validity.
CONTACT
San Francisco Headquarters
Socratic Technologies, Inc.
2505 Mariposa Street
San Francisco, CA 94110-1424
T 415-430-2200 (800-5-SOCRATIC)
Chicago Regional Office
Socratic Technologies, Inc.
211 West Wacker Drive, Suite 1500
Chicago, IL 60606-1217
T 312-727-0200 (800-5-SOCRATIC)
Contact Us
sotech.com/contact
Socratic Technologies, Incorporated, is a leader in the science of computer-based and
interactive research methods. Founded in 1994 and headquartered in San Francisco, it
is a research-based consultancy that builds proprietary, interactive tools that accelerate
and improve research methods for the study of global markets. Socratic Technologies
specializes in product development, brand articulation, and advertising research for the
business-to-business and consumer products sectors.
Registered Trademarks, Salesmarks and Copyrights
The following product and service descriptors are protected and all rights are reserved: Brand Power Rating, BPR, Brand Power Index, CA, Configurator Analysis, Customer Risk Quadrant Analysis, NCURA, ReportSafe, reSearch Engine, SABR, Site-Within-Survey, Socratic CollageBuilder, Socratic ClutterBook, Socratic Browser, Socratic BlurMeter, Socratic CardSort, Socratic ColorModeler, Socratic CommuniScore, Socratic Forum, Socratic CopyMarkup, Socratic Te-Scope, Socratic Perceptometer, Socratic Usability Lab, The Bruzzone Model, Socratic ProductExhibitor, Socratic Concept Highlighter, Socratic Site Diagnostic, Socratic VirtualMagazine, Socratic VisualDifferentiator, Socratic Web Boards, Socratic Web Survey 2.0, Socratic WebComm Toolset, SSD, Socratic WebPanel Toolset, SWS 2.0, Socratic Commitment Analysis, Socratic WebConnect, Socratic Advocacy Driver Analysis.
Socratic Technologies, Inc. © 1994–2014. Reproduction in whole or part without written permission is
prohibited. Federal law provides severe civil and criminal penalties for unauthorized duplication or use of
this material in physical or digital forms, including for internal use. ISSN 1084-2624.