How to Achieve a Single Customer View

How to Achieve a Single Customer View
1.0 Introduction
Clients want to obtain a Single Customer View of their contact database/CRM system to let them
understand the types of individuals/businesses that they are dealing with. Contact data is held on a
variety of databases, the quality of the data entered will vary and as we have seen from experience,
customers may appear in more than one database leading to multiple communications and wasted
money; or alternatively they may have moved address, again wasting communications.
Not only is there a cost for holding all this incorrect data it also leads to a distorted picture of the
members on the database e.g. if demographic profiling is used the results will show too many
customers and possibly skew the profiles.
2.0 The Business Case
Companies need to know how many customers/supporters/members they have, they want to
cleanse the data to save money on any marketing campaigns. Importantly they also need to be
assured that they are not in breach of the Data Protection Act for, ‘Personal data shall be accurate
and, where necessary, kept up to date’. The cost of handling returns and customer complaints can
also be reduced.
3.0 The Problem Statement
Companies have many databases containing details of customers/supporters/members. The chances
are that people could appear in more than one database and in some cases all of them. The task is to
merge the databases to present a unified view. Once the initial merge is completed then it is
important to implement a process to maintain the database by regularly cleansing the data.
Step 1
Agreeing Rules
A set of rules and processes need to be agreed on how to consolidate certain records and identify
others for further examination. This is to ensure that important records are kept and only to archive
duplicate or inaccurate data.
Once the specification are agreed and signed off, the rules and techniques detailed below can be
applied to create a programme to speed up the identification and merging of duplicate records.
Step2
Data cleansing and validation rules.
It is only possible to move onto de-duplication once the data cleansing is completed, prior to this the
addresses would be in a non standard format and it would be difficult to find any matches at all.
Standardising the data maximises the chances of finding a match.
How to Obtain a Single Customer View
Page 1
The cleansing procedures:
2.1 PAF Cleansing

This matches addresses on the databases against the Postcode Address File (PAF) from
the Royal Mail, automatically correcting errors where possible.
2.2 Telephone Validation

This initial validation checks if a number is callable based on factors such as, number of
digits and dialing codes etc. If required there are more detailed validation methods to
confirm that there is a live line.
2.3 Email Validation

Initially this will be validated at server level. This means that data8 checked that the mail
server for the domain is alive. Again there are more detailed checks that can be applied.
The levels of validation are chosen dependent on what the client wishes to achieve and the rules
agreed.
Step 3
De-duplication
Once the data is cleansed it is possible to look for duplicates, in order to identify duplicates certain
rules and assumptions needed to be made, based on this, bespoke algorithms are designed to
produce the required results.
Candidate Retrieval
This algorithm works by considering each record in the database and finding any suitable candidates
to be considered as duplicates with the currently considered record. The candidate retrieval process
can be:
a)
Records with the same Name and Address (where populated)
b)
Records with the same Name and Date of Birth (where D of B populated)
c)
Records with same Name and Email (where email populated)
The process then becomes recursive and each new candidate gets considered to build a full possible
chain of records that are similar and may be considered as duplicates.
How to Obtain a Single Customer View
Page 2
Fuzzy Matching & Formatting
The quality of data capture across the databases will vary, therefore it is necessary to develop a
custom set of fuzzy logic matching. See table below.
Field
Allowable Difference
DOB:
Differs by one corresponding digit
1st of the month date to date with same month and
year
US date format to UK. i.e. MM/DD/YYYY to
DD/MM/YYYY
Address:
Data8 PAF matching
Email:
Differs by one character
Telephone:
Data8 Telephone Formatting
Just using exact matching would allow
duplicates on the database, small
differences in the matching routine can be
allowed. For example is John Smith the
same as J Smith, if they are, the records
should be merged. Exact matching would
identify them as two different records,
doubling the marketing communications
that John Smith would receive.
Identification of Duplicates
data8 then create a candidate list of potentially similar records, and develops a set of rules which are
used to break the list into three categories



SAME - Records identified as the same and therefore merged
MANUAL - Records identified as possibly the same but a significant difference indicates a
manual check must be performed.
UNCHANGED - The remainder rejected from the duplication process and returned
unchanged.
The rules with which data8 allocated candidates vary here is an example of a recent de-duplication
exercise:
1)
SAME
The following details must match all records in this chain:




2)
Name must be the same (i.e. the name has to be populated and match at initial level)
Non conflicting Address (Allow blanks)
Non conflicting Date of Birth (Allow blanks)
Over18 flag must be the same (Allow blanks)
MANUAL
In the following situations, all records that meet the below criteria were put into the manual
investigation pot:


Name must be the same
Addresses will conflict (Not blank)
How to Obtain a Single Customer View
Page 3

Non conflicting Date of Birth (Allow blanks)
OR
Name must be the same


Non conflicting address (Allow blanks)
Conflicting Date of Birth (Not blank)
3)
UNCHANGED
Any records that are not matched in any of the above criteria were put into the ‘Unchanged’ pot and
returned without merging.
Step 4
Merge Rules
Once candidates are identified as members of the ‘SAME’ duplicate set, then rules to merge
fields have to be defined. What is illustrated is part of the merge priority of the fields.
Taking the merging of the ‘Firstname’ field as an example.
‘Firstname’
John
BLANK
Jon
BLANK
Jonathon
Jon
BLANK
This example assumes that the surname and address appears seven
times in the databases, however the first name associated with
surname and address varies and in some cases it is blank. The choice
therefore is to select from John, Jon or Jonathon, blank values are
ignored. The value to take would be Jon using the ‘Winner Takes All’
rule, it appears twice while John and Jonathon appear only once.
Step 5
Output Files
Two output files are generated to show merged record details and a further file where there were
possible duplicates requiring manual investigation. After a manual intervention any duplicates are
removed and the files merged.
1. The first output file contained the resultant merged and unchanged records based on the
previous rules. A column will also show any related records which contributed to the merge
process.
2. Another output file is created which shows all records merged and any associated records.
How to Obtain a Single Customer View
Page 4
3. A third output file shows records which were identified for manual investigation. This
includes a field showing all related records .
After checking the files requiring a manual check the data can be then all brought together as one
database.
Step 6
Goneaways, Movers and Deceased
Once all the data is one database it is then time to identify those people who have gone away and
where possible find an alternative address i.e. movers. Also it is important to identify those who
have died to avoid the embarrassment of sending out a marketing communication.
Step 7
Maintaining the data
In the longer term the data should be cleansed very couple of months, the costs will be minimal as
most of it is already correct. Alternatively by using web api it is possible to validate the contact data,
in real time, as it flows into the company, confirming address, telephone number and email as it is
input on a website or any business application including POS.
How to Obtain a Single Customer View
Page 5