slides - Alexandros Labrinidis

The Big Data – Same Humans Problem
Alexandros Labrinidis
Advanced Data Management Technologies Lab
Department of Computer Science
University of Pittsburgh
CIDR 2015 – Gong Show – January 5, 2015
You know Big Data is an important problem if...
•  It is featured on the cover of Nature and the Economist!
2
You know Big Data is an even more important problem if...
•  It has a Dilbert cartoon!
3
What is Big Data?
Definition #1:
•  Big data is like teenage sex:
o 
o 
o 
o 
everyone talks about it,
nobody really knows how to do it,
everyone thinks everyone else is doing it,
so everyone claims they are doing it...
Definition #2:
•  Anything that Won't Fit in Excel!
Definition #3:
•  Using the Vs
4
The three Vs
If you have been living under a rock for the last few years
5
The three Vs
•  Volume - size does matter!
•  Velocity - data at speed, i.e., the data
“fire-hose”
•  Variety - heterogeneity is the rule
6
Five more Vs
•  Variability - rapid change of data characteristics
over time
•  Veracity - ability to handle uncertainty,
inconsistency, etc
•  Visibility – protect privacy and provide security
•  Value – usefulness & ability to find the right-needle
in the stack
•  Voracity - strong appetite for data!
7
Enter Moore’s Law
Moore'ʹs law is the observation that, over the history of computing hardware, the number of transistors in a dense integrated circuit doubles approximately every two years. The law is named after Gordon E. Moore, co-­‐‑founder of Intel Corporation, who described the trend in his 1965 paper.
Source: hZp://en.wikipedia.org/wiki/Moore'ʹs_law
[ Wikipedia Image ]
8
Enter Bezos’ Law
Bezos'ʹ law is the observation that, over the history of cloud, a unit of computing power price is reduced by 50% approximately every 3 years Source: hZp://blog.appzero.com/blog/futureofcloud
9
Photo: hZp://www.slashgear.com/google-­‐‑data-­‐‑center-­‐‑hd-­‐‑photos-­‐‑hit-­‐‑where-­‐‑the-­‐‑internet-­‐‑lives-­‐‑gallery-­‐‑17252451/
Storage capacity increase
HDD Capacity (GB)
7000
6000
5000
4000
3000
2000
1000
0
Insert other exponentially increasing graphs here
(e.g., data generation rates, world-­‐‑wide smartphone access rates, Internet of Things, …) [ Wikipedia Data ]
10
But
•  Human processing capacity remains
roughly the same!
11
We refer to this as the: Big Data – Same Humans Problem
12
Rethinking
Scalability
Systems point of view Human point of view
• 
• 
• 
• 
Response time
Throughput
Scale-­‐‑up
Scale-­‐‑out
•  Making sure humans do not get lost in a sea of data! 13
Rethinking Scalability
Example:
Data Stream Management System processing
1,000,000 events per second
Systems point of view Human point of view
•  Fantastic!
•  Terrible!
14
So now what?
•  In addition to “traditional” scalability, it is
important to consider “big data” research
in:
o Summarization
o Personalization
o Ranking
o Recommender Systems
o Visual Analytics
o Crowd-sourcing
o …
15
It is important!
16
17
Thank you
Further reading:
Big Data and Its Technical Challenges
http://bit.ly/bigdatachallenges
By Jagadish, Gehrke, Labrinidis, Papakonstantinou,
Patel, Ramakrishnan, and Shahabi,
Communications of the ACM, July 2014
18