Customer Case Study
Radius Intelligence
Benefits
• Dramatic increase in overall team effectiveness
• Improved codebase maintainability
• Focus on data science instead of performance optimizations
Summary
Radius is a Marketing Intelligence platform that processes billions of data points.
Previously, Radius used Amazon Elastic MapReduce to process data, which hampered
team effectiveness, code maintainability, and the ability to test new methods.
With Databricks, Radius reduced data processing time from days to hours and improved
team effectiveness tremendously.
databricks.com
Customer Case Study – Radius Intelligence
Business Background
Radius is a marketing intelligence platform that enables B2B marketers to acquire new
customers intelligently. By matching customer intelligence data to Radius’ weekly-updated
data set of 25 million businesses in the US, marketers can deploy targeted campaigns of
great leads (net-new and existing opportunities) to their Salesforce.com instance.
Radius provides two specific features, Segments and Insights, that use a combination
of customer data and external data to allow CMOs to predict their future marketing and
campaign success.
For truly targeted marketers, Radius offers an end-to-end software solution,
deeply integrated with customer data and existing platforms like Salesforce.com,
that enables marketers to discover hidden opportunities and maximize conversion
rates.
Challenges
On a daily basis, Radius processes billions of data points from customers and external data
sources. With 25 million canonical businesses and hundreds of millions of business listings
from various sources, the process of matching these business listings together through a
series of machine learning algorithms and heuristics proved to be a daunting task. The
Radius team initially implemented the process of matching and harmonizing their data sets
in Hadoop, specifically using Amazon Web Services’ Elastic MapReduce service.
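To make the matching problem concrete, the following is a minimal sketch of one possible heuristic: normalize business names, then accept the closest canonical name only if its similarity clears a threshold. It is plain Python using the standard library's difflib, not Radius' actual algorithm. The sample data, the suffix list, the 0.85 threshold, and the names normalize and best_match are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical canonical index: id -> business name (illustrative data only).
CANONICAL = {
    1: "Acme Coffee Roasters",
    2: "Bay Area Plumbing",
}

# Assumed list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "co", "corp", "ltd"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def best_match(listing: str, threshold: float = 0.85):
    """Return (canonical_id, score) for the closest name, or None if the
    best similarity score falls below the threshold."""
    target = normalize(listing)
    scored = [
        (SequenceMatcher(None, target, normalize(name)).ratio(), cid)
        for cid, name in CANONICAL.items()
    ]
    score, cid = max(scored)
    return (cid, score) if score >= threshold else None
```

With this sample data, best_match("ACME Coffee Roasters, Inc.") resolves to canonical business 1, while an unrelated name returns None. Comparing every listing against every canonical name this way does not scale to hundreds of millions of listings, which is one reason such work is typically distributed.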
For most companies processing terabytes of data, Amazon EMR is the go-to solution. But as
Radius added more of the data sources that make up its main index of 25 million businesses,
it became apparent that processing several terabytes of textual data from disparate sources
was incredibly slow and difficult to maintain. Additionally, Radius customers began
demanding weekly rather than monthly data updates, which made it clear that Hadoop was only
a temporary solution. By design, Hadoop maximizes throughput at the expense of speed
through a distributed processing workflow, but with these new demands, Radius needed a
solution that maximized both speed and throughput.
The codebase behind matching business listings was also growing so quickly that
maintaining machine learning algorithms in the MapReduce framework became nearly
impossible: onboarding new engineers to the Radius codebase took weeks instead of days.
Radius needed a solution that also let engineers write less verbose code and abstract
individual modules without increasing the number of Reduce steps at the end of a
specific Hadoop job.
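The idea of abstracting individual modules can be pictured as small, independent steps composed into a single pipeline. The sketch below is plain Python rather than Spark or Radius code. The record shape, the step names (clean, validate, dedupe), and the pipeline helper are hypothetical and chosen only to illustrate the style.

```python
from functools import reduce

# Hypothetical record shape: {"name": <str>, "zip": <str>}


def clean(listings):
    """Trim whitespace and lowercase each listing's name."""
    return ({**rec, "name": rec["name"].strip().lower()} for rec in listings)


def validate(listings):
    """Keep only listings with a five-digit ZIP code."""
    return (
        rec for rec in listings
        if len(rec.get("zip", "")) == 5 and rec["zip"].isdigit()
    )


def dedupe(listings):
    """Drop listings whose (name, zip) pair has already been seen."""
    seen = set()
    for rec in listings:
        key = (rec["name"], rec["zip"])
        if key not in seen:
            seen.add(key)
            yield rec


def pipeline(*steps):
    """Compose steps left to right into one callable."""
    return lambda listings: reduce(lambda data, step: step(data), steps, listings)


# Each module can be tested alone, then chained without touching the others.
build_index = pipeline(clean, validate, dedupe)
```

Calling list(build_index(records)) runs all three steps in order. Adding or removing a step changes one argument to pipeline rather than the structure of a whole job.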
Solution
Radius chose Apache Spark as its data processing framework to maximize throughput,
speed, and engineer productivity. At the core of Radius’ data processing efforts are
the following Spark components:
• Spark Core: Fast batch processing of specific modules for cleaning, matching, and
validating the quality of business listings
• MLlib: Importing machine learning algorithms on the fly for quick testing and data
quality verification
• GraphX: Visualization and understanding of the results of specific changes to the
Radius business listing index
• Spark SQL: Quick and easy access to the data from Radius’ front end services and
products
In order to fully harness the power of Spark, Radius deployed Databricks to maintain their
Spark infrastructure and to provide additional critical data processing components on
top of Spark, including:
• Fully managed Spark clusters in the cloud that help enterprises focus on
their data and not operations
• An interactive workspace for exploration and visualization so teams can
learn, work and collaborate in a single, easy-to-use environment
• An extensible platform that enables organizations to connect their
existing data applications with Spark to disseminate the power of big data
The combination of these components allowed the Radius team to maximize the
throughput and speed of data processing, enabling their engineering teams to acquire
new capabilities that were not previously possible with Hadoop on Cloudera or Amazon
EMR, such as:
• Iterating on running code in the Databricks interactive workspace and
receiving results in minutes or even seconds. With Hadoop’s batch model,
by contrast, they had to constantly write code, package it as a JAR, and
run it end to end on the server
• Visualizing results of changes to the core matching technology without
having to wait an entire day to receive results
• Collaborating with multiple teams on larger projects
Since implementing Spark on Databricks, Radius’ core data index build now takes a few
hours, compared to more than a day with a MapReduce-based system. Databricks has
enabled the Radius teams, data scientists and data engineers alike, to work together on
difficult problems that require a combination of quality development and the scientific
method. Testing hypotheses can now be done in real time, in a matter of minutes rather
than over the course of days.
Benefits
Since implementing Spark on Databricks, Radius has seen the following
company-wide benefits:
Dramatic increase in overall team effectiveness
• Explorations by the Radius data science team now can
be completed in under an hour; prior explorations were
expected to take days.
• Team collaboration has improved dramatically as
engineering, data science, product management, quality
assurance, and the founders can follow the process of
improving Radius’ core index together and in real time.
Improved codebase maintainability
• Individual prototypes can be tested in a vacuum before being exposed to the
entire team.
• Code can be abstracted easily into modules.
• Data science hypotheses, including proven ones, can be documented in
Databricks notebooks so the entire team can understand them and work
together on problems.
• Code is scalable from both a pure performance perspective and a team perspective.
“Using Databricks allows the whole Radius technical team to work faster and
more efficiently, and has improved collaboration across multiple teams working
on larger projects.”
– Darian Shirazi, CEO and Co-Founder, Radius Intelligence
Focus on data science instead of performance optimizations
• Data engineers previously focused on code performance
and on simply working within the MapReduce framework.
Now they can focus on cutting-edge solutions to data
problems and on data accuracy.
• Engineers can now work on developing new code and pass
performance issues to DevOps.
• The Spark API allows for more abstraction and lets more
engineers work on different problems simultaneously.
“The fact that explorations by our data science team now take less than an
hour, rather than days, has fundamentally changed how we ask questions and
visualize changes to the index.”
– Adrian Druzgalski, CTO and Co-Founder, Radius Intelligence
Evaluate Databricks with a trial account now.
databricks.com/registration
150417