Download Report

Distance Based Outlier Detection for Data
Streams
Luan Tran, Liyue Fan, Cyrus Shahabi
Integrated Media Systems Center
University of Southern California
Algorithms (cont.)
Introduction
Motivation: Discovering abnormal phenomena in data streams has
become more critical. There are many applications such as:
§  Micro-Cluster Based Algorithm [3]
A data point in each cluster is certainly an inlier
§  Credit card fraud detection
Stream
source
Based on abnormal transactions
Data point
Can be assigned to a cluster
§  Network intrusion prevention
Micro clusters
If not
Based on abnormal log activities
§  Stock investment tactical planning
Based on abnormal changes in price
§  Event detection in sensor network
Micro cluster
Micro cluster
Use event based approach
R/2
Based on abnormal measuring values
In this poster, we study the problem of distance-based outlier
detection for data streams and some of the most recent algorithms
proposed by the scientific community.
§ Lifespan - Aware Minimal Probing Algorithm (MESI) [4]
Only find enough k neighbors, instead of all neighbors, for each data
point, to prove it is an inlier or not.
Slide 1
Distance Based Outlier
Slide 2
Slide 3
Slide 4
Slide 5
Slide 6
W
§  Data point o is a distance-based outlier if less than k neighbors in
a window lie within distance R from o.
Preceding next
K=3, W=16 (window size)
K= 5
Succeeding first
Results
- Current window: t18 (dash lines)
o9 has 4 neighbors (o5, o10, o14, o15) à inlier
o11 has 4 neighbors à inlier
§ Windows PC with Intel core i5, 1.8GHz, 4GB RAM
§ Datasets:
- TAO (Tropical Atmosphere Ocean), real-world, outlier rate = 0.01%
- Gaussian, synthetic, outlier rate = 0.1%
- Consider window t22
o9 is an inlier, neighbors set = (o10, o14, o15)
o11 is an outlier neighbors set = (o13) :
o3, o4, o6 expired
§ CPU time (s)
§ Peak memory consumption (MB)
§  Succeeding neighbor of point o is a neighbor that comes after o.
§  Preceding neighbor of point o is a neighbor that comes before o.
§  Safe inlier is a point that never becomes an outlier: has at least k succeeding
neighbors
§  Naïve approach: For each data point, store a list of neighbors.
Preceding neighbor k
Preceding neighbor1
Succeeding neighbor 1
Succeeding neighbor K
Algorithms
§  Exact Storm [1]
For each data point, store number of succeeding neighbors and list of
preceding neighbors
Preceding neighbor k
Preceding neighbor2
Preceding neighbor 1
Number of succeeding neighbors
§  Abstract-C [2]
For each point, store total number of neighbors in every window that it
participates in. For red point : <W1: 3, W2: 3>
W2
Conclusion and Future Work
§  Observations:
- Micro-cluster based algorithm is the best at CPU time and memory consumption.
- MESI, Abstract-C are comparable when window size is small.
§  Future work:
W1
§ Event Based Approach (LUE) [3]
Use a priority queue to store un-safe inliers. Poll from event
queue when a data object expires.
Outlier list
If it is an outlier
Stream
source
Data point
- Complete evaluation with more real-world datasets and larger parameter ranges
- Identify the characteristics of real-world outliers and propose efficient methods
References
[1] Fabrizio Angiulli and Fabio Fassetti. Detecting distance-based outliers in streams of data. CIKM '07.
If it is a unsafe inlier
[2] Di Yang, Elke A. Rundensteiner, and Matthew O. Ward. Neighbor-based pattern detection for windows
over streaming data. EDBT '09.
[3] Maria Kontaki, Anastasios Gounaris, Apostolos N. Papadopoulos, Kostas Tsichlas, and Yannis
Manolopoulos. Continuous monitoring of distance-based outliers over data streams. ICDE’11.
Priority queue
[4] Lei Cao, Di Yang, Qingyang Wang, Yanwei Yu, Jiayuan Wang, and Elke A. Rundensteiner. Scalable
distance-based outlier detection over high-volume data streams. ICDE’14.
IMSC Retreat 2015

CHEST PAIN IN CHILDREN: “IS MY CHILD GOING TO DIE?”

Distance Based Outlier Detection for Data Streams

CHEST PAIN IN CHILDREN: “IS MY CHILD GOING TO DIE?”

abcdocz.com