Apache Hadoop Innovation Summit

Don’t Be Afraid of the Elephant in the Room
February 12 & 13, 2015
Westin San Diego, San Diego, CA
#Hadoop15
Confirmed Speakers
• Enterprise Engineer, Google
• Big Data Engineer, Groupon
• Senior Director, Data Solutions, The New York Times
• Director, Consumer Science Engineering, Netflix
• Lead Research Scientist, eBay
• Director, Data Engineering, Wikia
• Data Scientist, Live Nation
• Software Architect, AOL
• Manager, Business Analytics, LinkedIn
• Enterprise Architect, Art.com
• Director, Big Data, Sears
• Data Informatics Leader, GE
• Engineering Lead, Twitter
• Engineering Manager, Etsy
• Senior Director, Data Management, Time Warner Cable
• Principal Architect, Schneider Electric
• Data Architect, Simmons Prepared Foods
• Vice President, Data Platforms, ESPN
• Architect, Salesforce.com
Who Will You Meet?
There is no question that IE. provides the
gold standard events in the industry and will
connect you with decision makers within the
analytics industry. You will be meeting
senior level executives from major
corporations and innovative small to
medium size companies.
Company Size Of Attendees
1000+ Employees: 56%
300-999 Employees: 25%
50-299 Employees: 11%
Less than 49 Employees: 8%
81% of attendees are from companies with at least 300 employees
Job Title Of Attendees
President/Principal: 3%
SVP/VP: 21%
C-Level: 12%
Snr. Director/Director: 42%
Global Head/Head: 13%
Snr. Manager/Manager: 8%
Academic: 1%
78% of attendees are at Director level or above
Past Delegates include
• Director, Analytics - Facebook
• Director, Insight - Red Bull
• Vice President - Google
• Senior Director - Coca-Cola
• Data Engineer - Blizzard Entertainment
• Senior Vice President - Samsung
About The Summit
In the cutting edge market of Big Data, modern businesses
are faced with the challenge of storage, management,
analysis, visualization and security. New technologies,
solutions and challenges are exploding outwards as Big
Data continues to grow exponentially. Hadoop, a huge piece of the puzzle, continues to present
both exciting opportunities and engineering challenges.
Can you become cloud native? What new alternative
paradigms are available with Hadoop? What are the
limitations of sole Hadoop use? How can you use it for
machine learning? What about integration? Corporate
accessibility? Ethics? These burning issues are what the
summit looks to address.
The Apache Hadoop Innovation Summit is an industry-led
event. In principle, this means that attendees are working
in engineering, architectural and data science roles. In
practice, this means fewer sales pitches and more in-depth
discussion on what like-minded professionals are doing
with their Big Data.
Confirmed Speaker Information
Sriram Krishnan
Big Data, Cloud, Distributed
Systems Engineering Leader
Twitter
Sriram is an Engineering Manager on the Data Platform
team at Twitter, where he leads a fantastic group of
engineers building core big data processing frameworks
such as Summingbird, Scalding, Spark, and Parquet. Prior
to that, he was the tech lead of the Big Data Platform
team at Netflix, where he built and open sourced Genie,
which is Netflix’s Hadoop Platform as a Service. Sriram has
a Ph.D. in Computer Science from Indiana University, and
spent several years at the San Diego Supercomputer
Center working on advanced cyberinfrastructures for
science and engineering applications.
Gopal Krishnan
Director, Consumer Science
Engineering
Netflix
Gopal Krishnan is Director of Consumer Science
Engineering at Netflix. He leads many aspects of the A/B
testing innovation to help personalize and improve the Netflix
experience. Previously, he spent over a decade at Yahoo
on high-scale infrastructure, including building the first
global Yahoo homepage.
Data Platform at Twitter - Enabling Realtime & Batch Analytics at Scale
The data platform at Twitter supports engineers and data
scientists running batch jobs on Hadoop clusters of
several thousand nodes, and real-time jobs on top of
systems such as Storm. In this presentation, I will discuss
the overall data platform stack at Twitter. In particular, I
will talk about Scalding, which is a Scala DSL for batch
jobs using MapReduce, Summingbird, which is a
framework for combined real-time and batch processing,
and Tsar, which is a framework for real-time time-series
aggregations. I will also discuss our experience with Spark,
and where it fits in the overall ecosystem.
Data Platform at Twitter - Enabling Realtime & Batch Analytics at Scale
Netflix is renowned for its use of big data to improve
personalization for our members.
Previously, our
personalization depended only on explicit user inputs like
star ratings, taste preference, plays, etc. We recently
incorporated additional implicit user signals such as
interactions on device like scrolling, navigation, and idle
time. This session will focus on the challenges of using
these new high volume data sources with billions of
events/day. What are the challenges of maintaining data
quality across hundreds of device types? How do we
scale efficient nearline systems to serve this data for
algorithmic consumption close to real time?
Arek Kaczmarek
Senior Director, Platform & Data
Solutions
The New York Times
Arek Kaczmarek is responsible for the company's data
platform and implementation of a new data platform
based on Big Data technologies. He previously worked at
Intel, as a Senior Big Data Solutions Architect at the Data
Center Group. His skills include, among others, knowledge
of the Big Data ecosystem, Hadoop/Hive/Pig, NoSQL, ELK
(ElasticSearch/Logstash/Kibana), Lambda architecture,
Oracle, data warehousing, ETL, BI Analytics, systems
architecture, PaaS and the cloud.
Thanigai Vellore
Enterprise Architect
Art.com
Thanigai is an enterprise architect, technologist and
innovator with over 14 years of progressive experience
specializing in building large, highly scalable software
systems. At Art.com, Thanigai is the lead architect
responsible for defining and driving the technology
roadmap initiatives for building the next generation
technology vision and platform for the company. Thanigai’s
interests and specialties include Hadoop/Big data, NoSQL,
Distributed Systems, Enterprise Architecture, Scalability,
etc. Prior to joining Art.com, Thanigai worked in
engineering roles at Sanmina and Flextronics.
Michael Lurye
Senior Director, Enterprise Data
Management
Time Warner Cable
Mike Lurye is Senior Director, Enterprise Data
Management for Time Warner Cable. He and his team are
responsible for shared data warehousing assets and
functions that benefit multiple Business Intelligence (BI)
teams and their customers. This includes creation of
enterprise data assets, BI architecture, quality assurance,
and data quality management. In addition, Mike and his
team are responsible for evaluation and adoption of Big
Data technologies. Prior to joining TWC, Mike held Product
Management and Product Marketing positions with
Amdocs, focused on decision automation, mobile content
and personalization solutions. Mike’s prior experience
includes senior roles at major analytical CRM & marketing
services companies.
The Next Enterprise Data Warehouse is a
Hadoop Data Lake
As the data volumes and data generation velocity start
growing, so does the value of all the enterprise data being
generated. At the New York Times, we have moved away
from the traditional Enterprise Data Warehouse based on
dimensional modeling and created a data lake where the
time to market for data solutions and applications is much
faster and much more robust than it ever was before. This presentation will provide an overview of the data lake
approach, how to get there, and why it makes sense for
companies with growing data volumes. The discussion will
focus on cost, architecture, and time to market solutions.
Leveraging Hadoop in Polyglot Architectures
At Art.com, we have a heterogeneous web stack (Java,
Node.js and .NET) to support our global brands and
multiple websites. In this session, I will share our
experience in leveraging the power of Hadoop to reach
multiple business goals. The talk will also focus on the tools
that help in addressing concerns related to polyglot
architectures such as interoperability, multi-tenancy,
schema evolution and standardization. I will also talk
about some frameworks and packages that help in
codifying best patterns and practices in integrating
Hadoop with other systems such as traditional Business
Intelligence systems, Web Analytics and other distributed
computing technologies like Apache Spark.
Offloading ELT Workloads to Hadoop - Time Warner Cable’s Journey
Shifting ELT workloads from the enterprise data
warehouse (EDW) to Hadoop is gaining traction for
reducing costs, incorporating new data faster, and freeing
up EDW capacity for user-facing analytics and BI
workloads. But where do you start and what’s the best
approach? This presentation outlines the framework and
processes that Time Warner Cable used to:
· Evaluate potential use cases and architectural options
for Hadoop
· Identify ELT offload as the first focus area
· Choose technology components for the next generation
enterprise data integration solution
· Apply best practices to configure Hadoop environment
for data integration
Weidong Zhang
Manager, Business Analytics
LinkedIn
Weidong Zhang earned his Ph.D. in computational fluid
dynamics. He has a natural passion for analytics,
research and data-driven decision-making. He has spent 10+
years in the data warehouse field, and leverages
his knowledge of business intelligence and
Hadoop's massive data processing capability to address
business needs. Currently, he works as a manager on the
Data Analytics Infrastructure team at LinkedIn and leads
the marketing and customer service data warehouse
vertical.
Nazli Dereli
Data Scientist
Live Nation
Nazli Dereli is currently a data scientist on the Live
Nation Userscoring team. She is working on
realtime classification of users and detection of abusive
actors that are stopping users from buying tickets by
holding the tickets. Before joining Live Nation, she
worked in the Data Mining and Bioinformatics Lab at the
University of California, Santa Barbara, focusing on mining
brain activity networks to discover insights on human
learning. Her interests include social and
biological network analysis, and interesting problems on
data and graph mining.
Beena Ammanath
Data Informatics Leader
GE
Beena is the Data Science Informatics Leader at GE. She
leads the data efforts to support data science at GE. She
works across the GE businesses to drive advanced
analytics development leveraging big data technologies.
She is passionate about using data and analytics to help
cross-functional teams derive data insights, articulate
questions they did not know they had, and view data in
more effective ways. Beena has over 20
years’ experience in the data arena with a number of
international organizations including British Telecom,
E*trade and Thomson Reuters. She holds a Master's in
Computer Science and an MBA in Finance.
Releasing the Power of Hadoop
As a data-driven company, LinkedIn has very strong
analytical teams, with many data engineers, data
scientists, business analysts and business users who focus
on different domains and businesses of the company. These
users have different usage types and needs. Making
them more productive and efficient is key to
the company's success. This talk covers the ecosystem
our Data Analytics Infrastructure (DAI) team built, which
releases the power of Hadoop and makes it easy to use. This
ecosystem contains several open-sourced products, such
as Pinot, Cubert, and Gobblin, for fast computation and
real-time reporting support, and some tools for automatic
report generation. I will also cover the roles of our data
warehouse team and our mission.
Detecting Abusive Actors in Hadoop
Ecosystem
Live Nation is the global event ticketing leader with
400,000,000 tickets sold and 180,000 events ticketed in
19 countries. However, there is always the threat of a
growing multibillion-dollar secondary market that
prevents users from buying primary tickets. This talk will
explain how to detect such abusive actors in the Hadoop
ecosystem using different approaches, from offline and
semi-online to online learning. We will go over the
process of building our system, starting with different
Hadoop-based approaches and leading to our final decision to
use Apache Storm for real-time classification built on top
of the Hadoop ecosystem.
Making Hadoop Relevant for the Industrial
Internet
Data management and advanced analytics are core to
GE’s recent success in delivering superior software-based
services to customers across aviation, power generation,
oil & gas, healthcare, and transportation. The torrent of
data generated from machines, networks, devices and
data centers in industry verticals provides challenges and
opportunities. The challenge is to make this machine data
meaningful and actionable to deliver on opportunities
around operational efficiencies. I will share real-world
case studies, leveraging Hadoop to demonstrate tangible
operational benefits - ranging from fuel savings to
improving productivity to reducing unscheduled
maintenance to enhancing on-time performance - by
tightly integrating machines, networked sensors,
industrial-strength data, and software to enable
intelligent insights and affect measurable outcomes.
Ben Jackson
Software Engineer
AOL
As an engineering leader, Ben is as comfortable with strategy documents and presentations as he is deep in
the code. He uses his understanding of the bigger picture to make the best tactical choices for his team in an
agile environment. Ben's specialties include: technical writing, Hadoop, SaaS applications, big data, parallel
algorithms, distributed computing, and high performance computing.
Ranjan Sinha
Lead Research Scientist
eBay Inc.
Ranjan Sinha is a Lead Data Scientist at eBay Inc. where he has led projects that significantly enhanced
consumers’ shopping experiences. Previously, Dr. Sinha was a research academic at the University of Melbourne and holds a Ph.D. in Computer Science from RMIT University, Australia. He has over 25 publications
in top-tier venues such as IEEE Big Data, VLDB Journal, and ACM SIGMOD. He was awarded the Sort
Benchmark medals for JouleSort and PennySort and was amongst WSJ’s Top-12 Asia-Pacific Young
Inventors. He is a regular speaker on Big Data and Data Science and co-organizes the popular Bay Area
Search Meetup.
Ameya Kantikar
Big Data Infrastructure Engineer
Groupon
Ameya is a lead engineer on Groupon’s deal relevance and personalization system working on big data
technologies such as Hadoop and HBase. Earlier he also built a scalable message bus system that now powers
Groupon's global service-oriented architecture, handling hundreds of millions of messages. Before Groupon, he
was a Sr. Software Engineer at LiveOps working with distributed systems. Ameya holds a master's in Information
Systems from Carnegie Mellon University and a master's in Computer Science from Pune University.
Valentino Tereshko
Enterprise Sales Engineer
Google
Valentino is a Solutions Architect with Google Cloud Platform, helping companies accelerate innovation.
Valentino focuses on Big Data and Cloud Computing use cases for large Enterprises. Prior to Google,
Valentino spent his time at several startups, ranging from Streaming Big Data to Cloud Monitoring and
Financial Analytics, and he began his career as a trader and quant developer at an options trading firm in
Chicago.
The Information
Apache Hadoop Innovation Summit
Date: February 12 & 13, 2015
Location: San Diego, California
Venue: Westin San Diego
Accommodation: Click here for online reservations
Registration Pricing
Silver Pass: $1495 (Early Bird Price before Dec 12: $1295)
Access to all sessions & networking events, 7 days access to presentations from the summit via ieOnDemand
Gold Pass: $1795 (Early Bird Price before Dec 12: $1595)
Access to all sessions, networking events & unlimited access to presentations from the summit via ieOnDemand
Diamond Pass: $1995 (Early Bird Price before Dec 12: $1795)
Access to all sessions, networking events, annual subscription to all content on the Big Data & Analytics channels via ieOnDemand
Access All Areas Pass: $2295
Access to all sessions of the Apache Hadoop Innovation Summit, Data Science Innovation Summit & Predictive Analytics Innovation Summit. Annual subscription to content on the Big Data & Analytics channels via ieOnDemand
1 Day Pass: $795
Full access to the sessions on your chosen day of the summit, 7 days access to presentations from the summit via ieOnDemand
On-Demand Pass: $600
Unlimited access to presentations from the summit via ieOnDemand, including presentations, interviews & the ability to contact speakers
Group Discount Offers
3 Silver Passes: $3000 ($1000 per attendee)
5 Silver Passes: $4500 ($900 per attendee)
3 Gold Passes: $3900 ($1300 per attendee)
5 Gold Passes: $6000 ($1200 per attendee)
3 Diamond Passes: $4500 ($1500 per attendee)
5 Diamond Passes: $7000 ($1400 per attendee)
For larger groups or special requests contact Bola by
calling +1 415 692 5378 or email
[email protected]
* Team discounts are applicable at the point of
registration only.
Ways to Register
Phone: +1 415 692 5378
Fax: +1 323 446 7673
Register Here
Registration Form
Apache Hadoop Innovation Summit
February 12 & 13, 2015 | Westin San Diego | San Diego, CA
For registration or more information on the program, please call Bola on +1 415 692 5378, or fax this registration form
to +1 (323) 446 7673
1. Delegate Information...
NAME OF EACH ATTENDEE
TITLE OF EACH ATTENDEE
DEPARTMENT
COMPANY
INDUSTRY
ADDRESS
CITY
STATE/PROVINCE
ZIP/POSTAL CODE
EMAIL OF EACH ATTENDEE
COUNTRY
BUSINESS PHONE NUMBER
2. Pass Types...
Early Bird Pass Options until December 12, 2014
Early Bird Silver: $1295 Attendees ____
Early Bird Gold: $1595 Attendees ____
Early Bird Diamond: $1795 Attendees ____
Early Bird One Day: $795 Attendees ____
Regular Pass Options after December 12, 2014
Silver Pass: $1495 Attendees ____
Gold Pass: $1795 Attendees ____
Diamond Pass: $1995 Attendees ____
One Day: $995 Attendees ____
Group Discount Pass Options
3 Silver Passes $3000 ($1000 per attendee)
5 Silver Passes $4500 ($900 per attendee)
3 Gold Passes $3900 ($1300 per attendee)
5 Gold Passes $6000 ($1200 per attendee)
3 Diamond Passes $4500 ($1500 per attendee)
5 Diamond Passes $7000 ($1400 per attendee)
For larger groups or special requests contact Bola by calling +1 415
692 5378 or email [email protected]
Group passes only available when all participants register together.
Pass Descriptions:
Silver Pass: Access to all sessions & networking events
Gold Pass: Access to all sessions, networking events & unlimited access to the summit presentations via ieOnDemand
Diamond Pass: Access to all sessions, networking events, annual subscription to all content on the Big Data & Analytics channels via
ieOnDemand
Access All Areas Pass: Access to all sessions of the Apache Hadoop Innovation Summit, Data Science Innovation Summit & Predictive Analytics
Innovation Summit, networking events, annual subscription to all content on the Big Data & Analytics channels via ieOnDemand
3. Payment Options...
Check (Make checks payable to The Innovation Enterprise Ltd)
Visa
Mastercard
American Express
Diners Club
Discover
Invoice me
CARD NUMBER
EXPIRATION DATE
SECURITY NO.
CARDHOLDER’S NAME
CARDHOLDER’S SIGNATURE
BILLING ADDRESS - (same as above)
INDUSTRY
Prices are exclusive of VAT. Places are transferable without any charge to another Summit occurring within 12 months of the original purchase. Team discounts
are applicable at the point of registration only. Any cancellations within a group registration will in turn incur an increase in registration fee for the remaining
group participants. Cancellations before January 12, 2015 incur an administrative charge of 50%. If you cancel your registration after January 12, 2015 you will be
charged the full fee. You must notify The Innovation Enterprise in writing of a cancellation, or you will be charged the full fee. The Innovation Enterprise reserve the
right to make changes to the program without notice. NB: FULL PAYMENT MUST BE RECEIVED BEFORE THE EVENT.
Schedule
Day One
February 12
Session One 08.30 - 10.00
Coffee Break 10.00 - 10.30
Session Two 10.30 - 12.00
Lunch 12.00 - 13.30
Session Three 13.30 - 15.00
Coffee Break 15.00 - 15.30
Session Four 15.30 - 17.00
Networking Drinks 17.00 - 19.00
Day Two
February 13
Session Five 08.30 - 10.00
Coffee Break 10.00 - 10.30
Session Six 10.30 - 12.00
Lunch 12.00 - 13.30
Session Seven 13.30 - 15.00
Coffee Break 15.00 - 15.30
Session Eight 15.30 - 17.00
Sponsors
Platinum Sponsor
Media Partner
For sponsorship information contact Giles Godwin-Brown
2015 Calendar
January
May
Big Data Innovation Summit
Big Data Innovation Summit
January 22 & 23, Las Vegas
May 13 & 14, London
Cloud Innovation
Expo
Big Data & Analytics in
Healthcare
January 22 & 23, Las Vegas
May 13 & 14, Philadelphia
February
Chief Data Officer Summit
September Continued
September 23 & 24, Boston
May 20 & 21, San Francisco
Data Science Innovation
Summit
February 12, San Diego
Apache Hadoop Innovation
Summit
February 12 & 13, San Diego
The Digital Oilfield
Innovation Summit
Big Data & Analytics
Innovation Summit
November
Big Data & Analytics
for Pharma
November 4 & 5, Philadelphia
June
Big Data & Marketing
Innovation Summit
Big Data & Analytics for
Pharma
Big Data for Finance
June 10 & 11, Philadelphia
Open Data Innovation
Summit
June 10 & 11, Boston
February 19 & 20, Houston
Big Data Innovation Summit
November 4 & 5, Miami
November 11 & 12, Boston
Data Visualization
Summit
November 11 & 12, London
Big Data & Analytics for
Retail Summit
June 17 & 18, Chicago
Chief Data Officer
Summit
November 11 & 12, London
February 27 & 28, Singapore
August
March
Big Data & Analytics
Innovation Summit
Big Data & Analytics
Innovation
March 25 & 26, Brazil
August 5 & 6
Kuala Lumpur
April
Big Data & Analytics
Innovation Summit
Big Data & Analytics
Innovation Summit
November 11 & 12, London
Big Data & Analytics
Innovation Summit
November 25 & 26, Beijing
August 19 & 20, Brazil
December
September
Big Data & Analytics in
Banking Summit
April 15 & 16, Santa Clara
Big Data & Analytics
Innovation Summit
December 2 & 3, New York
Data Visualization Summit
September 17 & 18, Sydney
Big Data Innovation Summit
April 15 & 16, Santa Clara
DataTalent
April 15 & 16, Santa Clara
Big Data Innovation Summit
April 23 & 24, Hong Kong
Chief Data Officer
Summit
December 2 & 3, New York
Data Visualization
Summit
September 23 & 24, Boston
Partnership Opportunities: Giles Godwin-Brown | [email protected] | +1 415 692 5498
Attendee Invitation: Sean Foreman | [email protected] | +1 415 692 5514