KPMG Location Analytics

Jori van Lier
April 2, 2015
Intro: Me
Jori van Lier
[email protected]
Intro: Rest of the team
Location Analytics Overview
The gist
Measure WiFi data from smartphones (MAC addresses and signal strengths)
→ Reconstruct the location of each smartphone (visitor, shopper…)
→ Heatmaps, visitor counts, dwell times
Why?
Heatmaps
■ What is the most visited
area in a location?
Store Layout
Optimization
■ Based on customer
preferences and buying
patterns, what is the best
store layout?
Visits
Capture Rate
■ How many people come to a
location and how does this
vary over locations?
■ How many people who walk by a
location actually enter it?
Returning visits
Conversion
■ How often does the average
customer visit your store? Did
your latest marketing
campaign increase customer
retention?
■ How many customers that
enter the store actually buy
something?
Staffing Optimization
■ Based on visit patterns,
predict peak hours and
determine how much staff is
required where. For example:
ensure that the checkout area
is manned before the crowd
arrives.
Trends
■ Find the different trends
based on customer
behavior and make
decisions before problems
affect your sales.
Dwell Time
■ How long does an average
visitor stay inside a location?
Does the time spent convert
to sales?
Benchmarking
■ Benchmark and A/B test data
between different locations
and dates. Find out what
works and what doesn’t.
The beginning: “proof of concept” dashboard
App for employees
Standardized product: CrumbBase
Data Acquisition
Getting the data…
Data acquisition
Wi-Fi devices continually send WiFi probe packets
(802.11 type 0, subtype 4).

WiFi sensors have 2 antennas:
1. Monitor mode to measure traffic on the MAC layer
2. Mesh mode for communication among sensors

A small barebones computer:
• Aggregates the raw data
• Hashes the MACs, encrypts the rest
• Filters for “opt out”
• Forwards the data to the TTP
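The “802.11 type 0, subtype 4” designation refers to the frame control field at the start of every 802.11 frame: management frames are type 0, and probe requests are subtype 4. As a minimal illustration (not the sensors’ actual firmware), the relevant bits can be decoded in a few lines:

```python
def parse_frame_control(first_byte):
    """Split the first 802.11 frame-control byte into its fields.

    Bit layout (least significant first): 2 bits protocol version,
    2 bits type, 4 bits subtype.
    """
    version = first_byte & 0b11
    ftype = (first_byte >> 2) & 0b11
    subtype = (first_byte >> 4) & 0b1111
    return version, ftype, subtype

def is_probe_request(first_byte):
    """Probe requests are management frames: type 0, subtype 4."""
    _, ftype, subtype = parse_frame_control(first_byte)
    return ftype == 0 and subtype == 4

# A probe request's frame control field starts with byte 0x40.
print(is_probe_request(0x40))  # True
print(is_probe_request(0x80))  # False (type 0, subtype 8: a beacon)
```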
Anonymization & opt out process
No single party has all the information needed to extract personal information
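The slides say the sensor hashes the MACs before forwarding anything to the TTP. One way to do that, sketched below under the assumption of a keyed hash held only on the sensor side (the actual scheme and key handling are not described in the slides), is HMAC-SHA256:

```python
import hashlib
import hmac

def pseudonymize_mac(mac: str, secret_key: bytes) -> str:
    """Replace a MAC address with a keyed hash (HMAC-SHA256).

    The same device always maps to the same pseudonym, so visits and
    return visits can still be counted, but without the key the
    original MAC cannot be recovered from the forwarded data.
    """
    canonical = mac.lower().replace("-", ":").encode()
    return hmac.new(secret_key, canonical, hashlib.sha256).hexdigest()

# Assumed key handling: kept on the sensor, never shared with the TTP.
key = b"sensor-side secret"
print(pseudonymize_mac("AA:BB:CC:DD:EE:FF", key) ==
      pseudonymize_mac("aa-bb-cc-dd-ee-ff", key))  # True: stable pseudonym
```

Canonicalizing the address first (lowercase, `:` separators) ensures the same device yields the same pseudonym regardless of how the MAC was formatted.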
Big Data Platform
An enabler to analyze the data…
KAVE: KPMG Analytics & Visualization Environment
The Overview
✓ Horizontally Scalable
✓ Open Source
✓ Configurable
✓ Modular
✓ Secure
The Implementation
✓ Remote (or local) hosting
✓ Dedicated hardware
✓ Virtualized system
✓ Secure internal network
Exploratory data science
• Most of our analyses start off with a dataset, a “hunch”, and lots of
plots…
• Tooling: the Python data stack (NumPy, SciPy, pandas, matplotlib, scikit-learn, IPython Notebook)
Storm, for real-time analysis
• Fully developed analyses with a real-time character go into Storm
• Storm is a distributed real-time computation system
• Used by Twitter, Groupon, Flipboard, Yelp, Baidu…
[Diagram: a Storm topology — spouts emit tuples, bolts process them]
Spark, for batch analysis
• Spark on Hadoop for Batch processing
• From the Hadoop Ecosystem, we only use the Hadoop resource
manager (YARN) and the Hadoop Distributed File System (HDFS).
• Hadoop MapReduce = Slow, Spark = Fast! (In-memory)
• Scala, Python or Java
• Awesome functional programming style:
file = spark.textFile("hdfs://...")
counts = (file.flatMap(lambda line: line.split())
              .map(lambda word: (word, 1))
              .reduceByKey(lambda a, b: a + b))
Word count in Spark's Python API
Trilateration
From dBm measurements to X and Y coordinates…
Friis free space transmission equation
Looking for the intersection
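The idea on these slides: invert the Friis free-space relation to turn a dBm measurement into a distance from each sensor, then find the point where the distance circles intersect. A minimal sketch, assuming a log-distance path-loss model with free-space exponent n = 2 and an illustrative calibration value for the RSSI at 1 m (the actual calibration constants are not given in the slides):

```python
import math

def dbm_to_distance(rssi_dbm, rssi_at_1m=-40.0, path_loss_exp=2.0):
    """Invert the log-distance path-loss model (Friis-style):
    rssi(d) = rssi_at_1m - 10 * n * log10(d)."""
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10 * path_loss_exp))

def trilaterate(sensors):
    """Least-squares intersection of distance circles.

    sensors: list of (x, y, distance). Subtracting the first circle
    equation from the others linearizes the system, which is then
    solved via the 2x2 normal equations.
    """
    x0, y0, r0 = sensors[0]
    A, b = [], []
    for x, y, r in sensors[1:]:
        A.append((2 * (x - x0), 2 * (y - y0)))
        b.append(r0**2 - r**2 + x**2 - x0**2 + y**2 - y0**2)
    s_aa = sum(a[0] * a[0] for a in A)
    s_ab = sum(a[0] * a[1] for a in A)
    s_bb = sum(a[1] * a[1] for a in A)
    t_a = sum(a[0] * bi for a, bi in zip(A, b))
    t_b = sum(a[1] * bi for a, bi in zip(A, b))
    det = s_aa * s_bb - s_ab * s_ab
    return ((s_bb * t_a - s_ab * t_b) / det,
            (s_aa * t_b - s_ab * t_a) / det)

# Three sensors at known positions; simulate a device at (3, 4):
true_pos = (3.0, 4.0)
sensors = [(0, 0), (10, 0), (0, 10)]
meas = [(sx, sy, math.dist((sx, sy), true_pos)) for sx, sy in sensors]
x, y = trilaterate(meas)
print(round(x, 2), round(y, 2))  # 3.0 4.0
```

In practice the measured distances are noisy, so with more than three sensors the least-squares formulation gives a best-fit position rather than an exact intersection.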
Analytics
Now that we have X and Y coordinates we can turn the data into actionable
insights
Example Storm analysis
Our “WiFi Orientation Engine” Storm topology
[Topology diagram] Data flows from the Data Acquisition Server and the
Trusted Third Party into a DRPC Server, then through the topology:
WifiDrpcSpout → WifiDrpcDecryptSplit → WifiDrpcMonitor → WifiNormalization
→ WifiTrilaterationFitter → analysis bolts (VisitAnalysisBolt,
regionVisitAnalysisBolt, dwellTimeAnalysisBolt, dwellTimeDailyAnalysisBolt,
heatmapProducerBolt) → MongoPersistenceBolt (MongoDB) and
HadoopPersistenceBolt (Hadoop HDFS).
visitAnalysisBolt: incoming data
public final void execute(final Tuple tuple) {
    […]
    if (sourceComponent.equals("wifiNormalization")) {
        type = TupleType.SEEN;
    } else if (sourceComponent.equals(locationSourceBoltId)) {
        type = TupleType.VISIT;
    }
    […]
    Event event = (Event) tuple.getValueByField("event");
    addTupleToBucket(type, event.getTimestamp(), event.getSourceMac());
    […]
}
There are two buckets: one for
raw data and one for fitted data.
Incoming tuples are added to the
corresponding bucket.
visitAnalysisBolt: outgoing data
(daily cumulative visitor counter)
Every minute, Storm triggers a “tick” tuple, which is the signal to emit data for that minute.
private void emitDailyEvent(final Set<String> seenDevicesTotal,
        final Set<String> visitDevicesTotal, final Long timestamp) {
    int seenTotal = seenDevicesTotal.size();
    int visitTotal = visitDevicesTotal.size();
    int walkByTotal = seenTotal - visitTotal;
    VisitDailyEvent visitDailyEvent = new VisitDailyEvent();
    visitDailyEvent.setVisit(visitTotal, calcUncertainty(visitTotal));
    visitDailyEvent.setWalkBy(walkByTotal, calcUncertainty(walkByTotal));
    visitDailyEvent.setCaptureRate(seenTotal != 0
            ? Math.round((float) visitTotal / (float) seenTotal * 100.) : 0);
    visitDailyEvent.setMeasurementTimestamp(new Date(timestamp));
    visitDailyEvent.setApplication(application);
    visitDailyEvent.setLayer(RESLAYER_DAILY);
    this.outputCollector.emit(visitDailyEvent.toValues());
}
This event is picked up by the MongoPersistenceBolt which stores it into MongoDB
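The arithmetic in emitDailyEvent is easy to mirror in a few lines of Python. The slides do not show calcUncertainty(), so the Poisson counting error √N used below is an assumption, though it is consistent with the stored errors (1114 → ±33, 1901 → ±44):

```python
import math

def calc_uncertainty(count):
    """Poisson counting error, sqrt(N) -- an assumption; the bolt's
    actual calcUncertainty() is not shown in the slides."""
    return round(math.sqrt(count))

def daily_event(seen_total, visit_total):
    """Mirror of emitDailyEvent: derive walk-bys and the capture rate
    (percentage of seen devices that entered the location)."""
    walk_by = seen_total - visit_total
    capture_rate = round(visit_total / seen_total * 100) if seen_total else 0
    return {
        "visit": {"value": visit_total, "error": calc_uncertainty(visit_total)},
        "walkBy": {"value": walk_by, "error": calc_uncertainty(walk_by)},
        "captureRate": capture_rate,
    }

# 3015 devices seen in total, of which 1114 entered -> capture rate 37%.
print(daily_event(seen_total=3015, visit_total=1114))
```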
VisitAnalysisBolt: Result in MongoDB
{
    "_id" : ObjectId("551c1558e4b0583e96bf0c07"),
    "version" : 2,
    "processingTimestamp" : ISODate("2015-04-01T15:57:03.454Z"),
    "measurementTimestamp" : ISODate("2015-04-01T15:54:00Z"),
    "value" : {
        "visit" : { "value" : 1114, "error" : 33 },
        "walkBy" : { "value" : 1901, "error" : 44 },
        "captureRate" : 37
    },
    "history" : [ "Persisting" ]
}
This is emitted every minute. The counters reset at 0:00 local time.
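The cumulative-counter-with-midnight-reset behaviour can be sketched as follows. This is an assumed reconstruction of the bucketing logic, not the actual Java implementation: per-day sets of hashed MACs, so a device seen twice in one day still counts once.

```python
from datetime import datetime, timedelta

class DailyVisitorCounter:
    """Cumulative unique-device counter that resets at local midnight."""

    def __init__(self):
        self.day = None
        self.devices = set()

    def add(self, timestamp: datetime, mac_hash: str) -> int:
        if timestamp.date() != self.day:   # midnight rollover: reset
            self.day = timestamp.date()
            self.devices = set()
        self.devices.add(mac_hash)
        return len(self.devices)           # cumulative count for today

c = DailyVisitorCounter()
t = datetime(2015, 4, 1, 23, 59)
print(c.add(t, "aa"), c.add(t, "bb"), c.add(t, "aa"))  # 1 2 2
print(c.add(t + timedelta(minutes=2), "aa"))           # 1 (new day)
```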
Cumulative daily visitor count plot
(and calibration…)
Our approach & PoC results
Our approach (1/2)
• Scrum: an agile, iterative approach
• The business prioritizes our backlog. We develop analyses and
present the results in (bi-)weekly product demos.
Our approach (2/2)
• Once metrics have been developed and baselines have been set:
Experiments and A/B testing!
E.g.: Change something in the storefront window and determine if
more visitors came in than before (and if the difference is statistically
significant).
• If so: keep doing that!
• If not: stop wasting effort (money) on that activity!
This is similar to what websites have been doing all along to determine
the best layout. Brick-and-mortar stores can now start doing this as well.
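Deciding whether “more visitors came in than before” is statistically significant is a standard two-proportion comparison. A pure-stdlib sketch (in practice one might reach for scipy.stats or statsmodels; the day totals below are hypothetical):

```python
import math

def two_proportion_z_test(visits_a, seen_a, visits_b, seen_b):
    """Two-sided z-test on capture rates before (A) and after (B) a
    storefront change, using the pooled-proportion standard error."""
    p_a, p_b = visits_a / seen_a, visits_b / seen_b
    p = (visits_a + visits_b) / (seen_a + seen_b)      # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / seen_a + 1 / seen_b))
    z = (p_b - p_a) / se
    # Normal-tail p-value via the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical day totals: capture rate ~37% before vs ~40% after.
z, p = two_proportion_z_test(1114, 3015, 1240, 3100)
print(f"z = {z:.2f}, p = {p:.4f}")  # significant if p < 0.05
```

With these example numbers the difference is significant at the 5% level; with only a day or two of data, small shifts in capture rate usually are not, which is why baselines are set first.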
Results (1/3)
• Proof of Concept: Solution rolled out to a large retailer
• Done:
• 2 months stabilizing system (Dec/Jan)
• 2 months developing new / custom metrics (Feb/March)
• Next 2 months: experiments:
1. If we proactively inform visitors which areas are quiet, does this lead to less
congestion?
• Metric: Occupancy spread more evenly
2. If we send employees to the checkout area before the crowd arrives, based
on a short-term queue time prediction, will we see a reduction in queuing
time?
• Metric: Lower Dwell Time
3. If we offer an incentive to visit, will we see a larger % of people entering?
• Metric: Higher Capture Rate
Results (2/3)
Results (3/3)
Thanks!
Jori van Lier
[email protected]