Customer Case Study – Radius Intelligence

Benefits
• Dramatic increase in overall team effectiveness
• Improved codebase maintainability
• Focus on data science instead of performance optimizations

Summary
Radius is a Marketing Intelligence platform that processes billions of data points. Previously, Radius used Amazon Elastic MapReduce to process data, which hampered team effectiveness, code maintainability, and the ability to test new methods. With Databricks, Radius reduced data processing time from days to hours and improved team effectiveness tremendously.

Business Background
Radius is a marketing intelligence platform that enables B2B marketers to acquire new customers intelligently. By matching customer intelligence data to Radius’ weekly-updated data set of 25 million businesses in the US, marketers can deploy targeted campaigns of great leads (net-new and existing opportunities) to their Salesforce.com instance. Radius provides two specific features, Segments and Insights, that use a combination of customer data and external data to allow CMOs to predict their future marketing and campaign success. For true targeted marketers, Radius offers an end-to-end software solution, deeply integrated with customer data and existing platforms like Salesforce.com, that enables marketers to discover hidden opportunities and maximize conversion rates.

Challenges
On a daily basis, Radius processes billions of data points from customers and external data sources. With 25 million canonical businesses and hundreds of millions of business listings from various sources, matching these business listings together through a series of machine learning algorithms and heuristics proved to be a daunting task.
The Radius team initially implemented the process of matching and harmonizing their data sets in Hadoop, specifically using Amazon Web Services’ Elastic MapReduce service. For most companies processing terabytes of data, Amazon EMR is the go-to solution, but as Radius added more data sources to its main index of 25 million businesses, it became apparent that processing several terabytes of textual data from disparate sources had become incredibly slow and difficult to maintain. Additionally, Radius customers began demanding weekly data updates rather than monthly ones, which made it clear that Hadoop was only a temporary solution. By design, Hadoop is meant to maximize throughput at the expense of speed through a distributed process workflow, but with these new demands, Radius needed a solution that maximized both speed and throughput.

The codebase behind matching business listings was also growing so quickly that maintaining machine learning algorithms in the MapReduce framework was nearly impossible, to the point that onboarding new engineers to the Radius codebase took weeks instead of days. Radius also required a solution that made it possible to write less verbose code and abstract individual modules without increasing the number of Reduce steps at the end of a specific Hadoop job.

Solution
Radius chose Apache Spark as their data processing framework to maximize throughput, speed, and engineer productivity.
At the core of its data processing efforts, Radius uses the following Spark components:
• Spark Core: Fast batch processing of specific modules for cleaning, matching, and validating the quality of business listings
• MLlib: Importing machine learning algorithms on the fly for quick testing and data quality verification
• GraphX: Visualization and understanding of the results of specific changes to the Radius business listing index
• Spark SQL: Quick and easy access to the data from Radius’ front end services and products

In order to fully harness the power of Spark, Radius deployed Databricks to maintain their Spark infrastructure and to provide additional critical data processing components on top of Spark, including:
• Fully managed Spark clusters in the cloud that help enterprises focus on their data and not operations
• An interactive workspace for exploration and visualization so teams can learn, work, and collaborate in a single, easy-to-use environment
• An extensible platform that enables organizations to connect their existing data applications with Spark to disseminate the power of big data

The combination of these components allowed the Radius team to maximize the throughput and speed of data processing, giving their engineering teams new capabilities that were not previously possible with Hadoop on Cloudera or Amazon EMR, such as:
• Iterate on running code in the Databricks interactive workspace (as opposed to Hadoop’s batch model) and receive results in minutes or even seconds.
  In contrast, doing this with Hadoop’s batch model required them to constantly create code, package it as a JAR, and run it end to end on the server.
• Visualize results of changes to the core matching technology without having to wait an entire day to receive results
• Allow collaboration with multiple teams on larger projects

Since implementing Spark on Databricks, Radius’ core data index build now takes a few hours compared to more than a day with a MapReduce-based system. Databricks has enabled the Radius teams, data scientists and data engineers alike, to work together on difficult problems that require a combination of quality development and the scientific method. Testing hypotheses can now be done in a matter of minutes and in real time rather than over the course of days.

Benefits
Since implementing Spark on Databricks, Radius has seen the following company-wide benefits:

Dramatic increase in overall team effectiveness
• Explorations by the Radius data science team can now be completed in under an hour; prior explorations were expected to take days.
• Team collaboration has improved dramatically as engineering, data science, product management, quality assurance, and the founders can see the process of improving Radius’ core index together and in real time.

“Using Databricks allows the whole Radius technical team to work faster and more efficiently, and has improved collaboration across multiple teams working on larger projects.”
– Darian Shirazi, CEO and Co-Founder, Radius Intelligence

Improved codebase maintainability
• Individual prototypes can be tested in a vacuum before being exposed to the entire team.
• Code can be abstracted easily into modules.
• Data science hypotheses, both proposed and proven, can be documented in Databricks notebooks so the entire team can understand them and work together on problems.
• Code scales from both a pure performance perspective and a team perspective.

Focus on data science instead of performance optimizations
• Data engineers previously focused on code performance and on working within the MapReduce framework. Now engineers can focus on cutting-edge solutions to data problems and on data accuracy.
• Engineers can now work on developing new code and pass performance issues to DevOps.
• The Spark API allows for more abstraction and lets more engineers work on different problems simultaneously.

“The fact that explorations by our data science team now take less than an hour, rather than days, has fundamentally changed how we ask questions and visualize changes to the index.”
– Adrian Druzgalski, CTO and Co-Founder, Radius Intelligence

Evaluate Databricks with a trial account now.
databricks.com/registration

databricks.com Customer Case Study – Radius Intelligence 150417
© Copyright 2024