Customer Case Study
• Increased the amount of ad-hoc analysis done six-fold, leading
to better informed product design and quicker issue detection
and resolution.
• Reduced the load on the analytics engineering team by
quadrupling the number of people able to work with
the data directly.
• Increased collaboration and improved reproducibility and
repeatability of analyses.
• Reduced the cost of cloud infrastructure through faster and
easier management of Spark clusters.
• Celtra relied on data analytics to inform product design, troubleshoot anomalies, and
fine-tune the performance of its display advertising software platform capabilities.
• Celtra encountered difficulties in meeting the rising demand for data analysis due to the
large scale of the data, diversity of data sources, and small size of the analytics team.
• Celtra selected Databricks as its data processing platform, enabling teams from
Engineering, Product Management, and QA to directly work with data and perform the
required analysis.
Business Background
Celtra provides agencies, media suppliers and brand leaders alike with an integrated,
scalable HTML5 technology for brand advertising on smartphones, tablets and desktop.
The platform, AdCreator 4, gives clients such as MEC, Kargo,
Pepsi and Macy’s the ability to easily create, manage, and traffic
sophisticated data-driven dynamic ads, optimize them on the
go, and track their performance with insightful analytics.
A wide variety of data is collected by Celtra, including data related to internal company
processes, data based on the usage of the product by clients and, most importantly,
data focused on the engagements of consumers with their clients’ ads. In addition to
providing analytics to its clients, Celtra is constantly exploring new ways to leverage this
gathered information to improve their offering, for example:
• Product usage analysis: Analyzing feature adoption, usage patterns and
support cases to direct further development focus.
• Environment analysis: Assessing the feasibility of new product concepts and
detecting trends by analyzing the context in which Celtra’s ads run, such as
the publisher and device of choice.
• Technical performance: Monitoring load times of ads closely across multiple
dimensions, i.e., ad complexity, geography, connectivity and CDNs. Most
recently, Celtra has been evaluating the performance benefits of SPDY and
HTTP/2 for improved page load times.
• Quality control: Computing key performance metrics to catch issues at
deployment, enabling automatic anomaly detection that surfaces
regressions that would otherwise get lost in the averages.
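To illustrate how a regression can get lost in the averages, here is a minimal sketch rather than Celtra's actual method. The segment keys, load times, the 10% threshold, and the helper names (`relative_change`, `find_regressions`, `flatten`) are all hypothetical and invented for this example.

```python
from statistics import mean

# Hypothetical ad load times in milliseconds, grouped by (geography, CDN).
# All values are invented for illustration only.
baseline = {
    ("US", "cdn_a"): [400] * 8,
    ("EU", "cdn_a"): [500] * 4,
    ("EU", "cdn_b"): [600] * 2,
}
after_deploy = {
    ("US", "cdn_a"): [390] * 8,
    ("EU", "cdn_a"): [490] * 4,
    ("EU", "cdn_b"): [780] * 2,  # small segment with a large slowdown
}


def relative_change(before, after):
    """Fractional change in mean load time from `before` to `after`."""
    return (mean(after) - mean(before)) / mean(before)


def find_regressions(before, after, threshold=0.10):
    """Return segments whose mean load time rose by more than `threshold`."""
    changes = {
        segment: relative_change(before[segment], after[segment])
        for segment in before
        if segment in after
    }
    return {seg: chg for seg, chg in changes.items() if chg > threshold}


def flatten(samples_by_segment):
    """Pool the samples of every segment into one list."""
    return [x for samples in samples_by_segment.values() for x in samples]


# The pooled average moves only slightly, while one segment regresses sharply.
overall_change = relative_change(flatten(baseline), flatten(after_deploy))
regressions = find_regressions(baseline, after_deploy)

print(f"overall change: {overall_change:+.1%}")
for segment, change in regressions.items():
    print(f"regression in {segment}: {change:+.1%}")
```

With this made-up data, the pooled mean load time rises by less than 4%, which a 10% alert threshold would ignore, while the small ("EU", "cdn_b") segment slows down by 30% and is flagged once the metric is computed per segment.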
As Celtra’s business grew, it was challenged to meet the corresponding increase in demand
for analytics due to three reasons:
1. Diversity of data sources: The production and engineering data from Celtra’s
systems are scattered in different locations. Celtra did not have an easy way to
combine the data from these disparate data sources and perform the necessary
analysis in a single analytics platform.
2. Large scale of data: Celtra’s production systems generate tens of terabytes
of data monthly. While Celtra had been using Spark as its data processing platform
since its early days and had accumulated considerable expertise, this knowledge was
limited to the team working on the analytics portion of the product.
3. Small analytics team: The analytics team consisted of only four people, who
quickly became the bottleneck to service requests from Product Management,
Engineering, and QA.
To overcome these challenges, Celtra needed a powerful data platform that was capable of
integrating data from disparate data sources while being fast enough to support interactive
analysis at terabyte scale. The platform also had to be user-friendly enough to empower
teams outside of analytics to perform analysis themselves, and to remove the bottleneck
created by their small analytics team.
Celtra adopted Databricks as their centralized analytics platform because the key features
in Databricks could easily address all of Celtra’s needs:
• Zero-management Apache Spark: Spark is an open source big data processing
framework that was built for speed and scale. Databricks made Spark much
easier to deploy by combining the power of Spark with a zero-management
hosted platform on Amazon Web Services (AWS), allowing Celtra to take
advantage of Spark without the DevOps burdens typically associated with big
data infrastructure.
• Seamless connection to diverse data sources: Databricks provided built-in
APIs to access data from AWS S3 and relational databases. Since the full power
of Scala is available in Databricks, data from various web service APIs could be
accessed as well. Celtra could seamlessly connect its data by consolidating the
disparate sources in Databricks.
• Reuse of production code in ad-hoc analyses: Since Databricks is based
on Apache Spark, similar to Celtra’s production analytics pipeline, a lot of
production code could be reused as the foundation for ad-hoc analyses instead
of rewriting code in another framework.
• User-friendly interactive workspace: Databricks included an intuitive, multi-user
interactive workspace for real-time analysis and visualization, enabling
teams other than analytics to work with data directly in a single, easy to use
With the adoption of Databricks, Celtra has enabled teams from Engineering, Product
Management, and QA to perform complex data analysis on their own, leveraging the
massive production data to improve product design, address anomalies rapidly, and
fine-tune the performance of production systems.
The most important benefit Celtra gained from deploying Databricks is the ability to
remove the bottleneck within its analytics team to meet the surging demand for big data
analysis across the company. Since its introduction, Databricks has been broadly utilized
by over a third of the technical staff in Engineering, QA and Product Management. As a
result of empowering them to work with the data directly, many more questions have
been asked and hypotheses tested, leading to better informed product design and quicker
issue detection and resolution. In the first four months after adopting Databricks alone,
Celtra increased the amount of analysis done and insights obtained sixfold, and
increased the number of people working with its most valuable data fourfold.
Aside from dramatically boosting the amount of analytics done, Celtra also experienced
two additional benefits from using Databricks:
• Improved collaboration and reproducibility:
The self-documenting nature of notebooks in
Databricks meant that ad-hoc analysis code was
automatically stored in a centralized location.
This feature encouraged teams to leverage the
existing codebase instead of duplicating past
efforts in writing new code, eventually leading to
a maintainable collective codebase for ad-hoc
analysis. Additionally, by having all work stored by
default, past results could be easily reproduced in
cases where additional insight was needed.
• Reduced cloud infrastructure cost: The
faster and easier provisioning, resizing, and
deprovisioning of Spark clusters made Celtra engineers more comfortable with
shutting down unused clusters whenever possible. Agility in cluster management
also facilitated the use of Spot Instances by making their use less risky.
Combined with the “Jobs” feature of Databricks, this agility allowed Celtra to substantially
reduce the cost of its cloud infrastructure by scheduling long-running jobs that
automatically provision and deprovision clusters as needed.
“The notebooks feature in
Databricks encourages good
documentation by automatically
recording the code written during
an ad hoc analysis session. This
has had profound effects for us,
from increasing collaboration
and improving reproducibility
to making analysis more
approachable to a wider
audience, who can start off by
cloning someone else’s research.”
– Jaka Jančar
Chief Technology Officer at Celtra
“Databricks is used by over a third of our technical staff — from engineering to product management — to help
us make smart, data-driven decisions. After implementation, the amount of analysis performed has increased
sixfold, meaning more questions are being asked, more hypotheses tested.”
– Jaka Jančar
Chief Technology Officer at Celtra
