How to find the shape of a banana? Vancouver, June 2009

Ivan Mizera, University of Alberta
Gratefully acknowledging the support of the NSERC of Canada
Prologue: boys and girls...
... or that quantiles are important
Univariate quantiles
standard and nonstandard applications: Parzen (2004)
quantile regression: Koenker (2005)
Multivariate quantiles
Serfling (2002), Koenker (2005), Wei (2008)
Depth
Hodges (1955), Tukey (1975), ... , Zuo and Serfling (2000)
Special case (often preceding the general): “medians”
... , Small (1990)
Some multivariate quantile proposals:
[Figure: Multivariate normal contours]
“The Choice of a Real Statistician”
Minimization of L1-norm-type functions
[Figure: two contour plots; horizontal axes from −1 to 1]
various adjustments needed,
in particular for affine equivariance
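A minimal sketch of one L1-norm-type minimization, assuming the objective is the sum of Euclidean distances to the data (the spatial median) and that it is solved by the Weiszfeld iteration. The function name `spatial_median` and its parameters are illustrative and do not come from the slides. The result behaves well under rotations but not under general affine maps, which is the kind of adjustment mentioned above.

```python
import math

def spatial_median(points, iterations=200, eps=1e-12):
    """Approximate the point minimizing the sum of Euclidean distances to `points`."""
    n = len(points)
    # Start from the coordinatewise mean.
    x = sum(p[0] for p in points) / n
    y = sum(p[1] for p in points) / n
    for _ in range(iterations):
        wsum = wx = wy = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < eps:
                # Simplification: skip a data point that coincides with the iterate.
                continue
            w = 1.0 / d
            wsum += w
            wx += w * px
            wy += w * py
        if wsum == 0.0:
            break
        # Weiszfeld update: weighted mean with weights 1 / distance.
        x, y = wx / wsum, wy / wsum
    return x, y

# Hypothetical data: four corners of the unit square and one distant outlier.
data = [(0, 0), (1, 0), (0, 1), (1, 1), (100, 100)]
print(spatial_median(data))  # stays inside the unit square despite the outlier
```

The coordinatewise mean of the same data is (20.4, 20.4), so the example also shows the robustness that motivates median-type proposals.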
Minimization of the volume of simplices
“Oja depth”
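A minimal sketch of the simplex-volume idea in the plane, where the simplices are triangles formed by a candidate point and pairs of data points. The names `triangle_area`, `oja_objective` and `oja_depth` are illustrative, and the normalisation 1 / (1 + average area) used to turn the objective into a depth is an assumption, one of several possible conventions.

```python
from itertools import combinations

def triangle_area(a, b, c):
    """Area of the triangle (2-dimensional simplex) with vertices a, b, c."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def oja_objective(x, points):
    """Sum of the areas of all triangles formed by x and a pair of data points."""
    return sum(triangle_area(x, p, q) for p, q in combinations(points, 2))

def oja_depth(x, points):
    """Assumed normalisation: 1 / (1 + average triangle area)."""
    n_pairs = len(points) * (len(points) - 1) / 2
    return 1.0 / (1.0 + oja_objective(x, points) / n_pairs)

# Hypothetical data: corners of a square with side 2.
square = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(oja_objective((1, 1), square), oja_objective((0, 0), square))
```

A point minimizing `oja_objective` plays the role of the median; in the square example the centre gives a smaller objective, and hence a larger depth, than a corner.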
Halfspace depth
[Figure: Envelope of directional quantile lines (p = 0.1); horizontal axis: log of weight (in kilograms); vertical axis: log of height (in centimeters)]
“Directional quantile envelopes”
Directional quantile envelopes
[Figure: Directional quantile envelopes (p = 0.1 x 2^(-5:2)); horizontal axis: log of weight (in kilograms); vertical axis: log of height (in centimeters)]
Halfspace (Tukey) depth contours
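The caption above pairs directional quantile envelopes with halfspace (Tukey) depth contours. A minimal sketch of that link, assuming the depth is approximated by scanning a finite grid of directions rather than computed exactly; `halfspace_depth`, `in_envelope`, `n_dirs` and `tol` are illustrative names, and the convention that a point is inside the level-p envelope when its depth is at least p is an assumption about how quantiles are defined.

```python
import math

def halfspace_depth(x, points, n_dirs=360, tol=1e-12):
    """Approximate Tukey depth of x: the smallest fraction of data points
    in a closed halfspace with x on its boundary, over scanned directions."""
    best = len(points)
    for k in range(n_dirs):
        theta = 2.0 * math.pi * k / n_dirs
        ux, uy = math.cos(theta), math.sin(theta)
        # Count the points on the side of the line through x that u points to.
        count = sum(1 for px, py in points
                    if ux * (px - x[0]) + uy * (py - x[1]) >= -tol)
        best = min(best, count)
    return best / len(points)

def in_envelope(x, points, p, n_dirs=360):
    """Assumed convention: x lies inside the level-p envelope if its depth is at least p."""
    return halfspace_depth(x, points, n_dirs) >= p

# Hypothetical data: corners of a square and its centre.
pts = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)]
print(halfspace_depth((1, 1), pts), halfspace_depth((0, 0), pts))
```

Nested contours then arise by letting p increase: each envelope is the set of points that no scanned halfspace can cut off with less than a fraction p of the data.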
Various desiderata
Old:
- connection to univariate case
- allow for analogs of medians, L-statistics, etc.
- equivariance properties (affine, orthogonal)
- ease of computation
New:
- interpretability
It’s always some nested contours - but what do they mean??
- introduction of covariates (multivariate quantile regression)
Wei (2008), Kong and Mizera (2008)
http://arxiv.org/abs/0805.0056v1
So, how about “curved” situations (banana)?
See 1., 2., 3., 4., 5., 6. below
1. Transform the coordinates
[Figure: Directional quantile envelopes (p = 0.0125, 0.025, 0.05, 0.1, 0.2, 0.4); horizontal axis labelled log of weight (in kilograms); vertical axis labelled log of height (in centimeters)]
(depth contours fitted to logs and then backtransformed)
[Figure: Original data; horizontal axis: nepal$weight; vertical axis: nepal$height]
“As They Were Created”
[Figure: And logged data; horizontal axis: log(nepal$weight); vertical axis: log(nepal$height)]
“Single Most Useful Transformation”
That’s it
Tukey’s ladder of transformations, Box-Cox maybe
- probably the approach that makes the most sense of them all
- but often hard to get
- and quite limited to the two-dimensional case
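A minimal sketch of the transform-and-backtransform step, assuming a Box-Cox family in which the exponent 0 is the logarithm. The names `box_cox`, `inverse_box_cox` and `backtransform_contour` are illustrative; the contour itself is supposed to have been fitted on the transformed scale by some other method.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive number; lam = 0 is the logarithm."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def inverse_box_cox(y, lam):
    """Inverse of box_cox for the same exponent lam."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)

def backtransform_contour(contour, lam_x=0.0, lam_y=0.0):
    """Map contour vertices fitted on the transformed scale back to the original scale."""
    return [(inverse_box_cox(u, lam_x), inverse_box_cox(v, lam_y)) for u, v in contour]

# Hypothetical contour vertices on the log scale.
log_contour = [(0.0, math.log(2.0)), (math.log(3.0), 0.0)]
print(backtransform_contour(log_contour))
```

Straight edges on the transformed scale become curves on the original scale, which is how a convex contour can follow a banana-shaped cloud; the difficulty noted above is finding exponents that work.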
2. Quantile regression in polar coordinates
Wei (2008):
JASA, 103(108), Figure B.1, page 409, right upper panel
- shows an (undesirable) sensitivity to the choice of the center
of the polar coordinates
3.
transparency intentionally left blank
4.
transparency intentionally left blank
5. “Delaunay depth”:
[Figure: plot with horizontal axis from −3 to 3 and vertical axis from −4 to 8]
Izem, Souvaine, Rafalin (2006)
- how far are we from the boundary?
[Figure: plot with horizontal axis from −3 to 3 and vertical axis from −4 to 8]
Problems: what is “boundary”? what is “far”?
Perspective: somewhat off...
[Figure: perspective plot; base axes from −3 to 3 and from −4 to 8, vertical axis from 0 to 3]
Trick: preliminary density estimation
[Figure: plot with horizontal axis from −3 to 3 and vertical axis from −4 to 12]
Add “undata”, and do the weighted version of the original
Isn’t it better?
[Figure: plot with horizontal axis from −3 to 3 and vertical axis from −4 to 8]
Really? If yes, what does it say?
New perspective
[Figure: perspective plot; base axes from −3 to 3 and from −4 to 8, vertical axis from 0 to 0.2]
And doesn’t it depend too much on the triangulation and undata?
6. α-shapes: erase all empty discs
Edelsbrunner, Kirkpatrick and Seidel (1983), ...
A survey: Edelsbrunner (200?); use: “persistent homology”
A related concept: Hall, Park and Turlach (2002)
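A brute-force sketch of "erase all empty discs", assuming a fixed disc radius `r`: a segment between two data points is kept as a boundary edge when some disc of radius `r` passing through both endpoints contains no other data point. This is a cubic-time illustration, not the algorithm of Edelsbrunner, Kirkpatrick and Seidel (1983); the name `alpha_shape_edges` and the tolerance `tol` are illustrative.

```python
import math
from itertools import combinations

def alpha_shape_edges(points, r, tol=1e-9):
    """Index pairs (i, j) joined by an edge for which a disc of radius r
    through both points has no other data point strictly inside."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        d = math.hypot(x2 - x1, y2 - y1)
        if d == 0.0 or d > 2.0 * r:
            continue  # no disc of radius r passes through both points
        mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        h = math.sqrt(max(r * r - (d / 2.0) ** 2, 0.0))
        nx, ny = -(y2 - y1) / d, (x2 - x1) / d  # unit normal to the segment
        for sign in (1.0, -1.0):
            cx, cy = mx + sign * h * nx, my + sign * h * ny
            if all(math.hypot(px - cx, py - cy) >= r - tol
                   for k, (px, py) in enumerate(points) if k not in (i, j)):
                edges.append((i, j))
                break
    return edges

# Hypothetical data: corners of the unit square and its centre.
pts = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
print(alpha_shape_edges(pts, r=2.0))  # large discs: the four sides of the square
print(alpha_shape_edges(pts, r=0.4))  # small discs: edges from each corner to the centre
```

The two calls show how strongly the recovered shape depends on the radius, which plays the role of a tuning parameter.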
Some other Greek letter
Adapted to quantilistic needs:
(i) Erase not discs, but “semi-infinite” extensions:
paraboloids, for instance
(ii) Not only empty, but containing less than a prescribed
proportion k/n of data mass
(not every boundary-seeking method suitable for this)
- “a curvilinear depth”
- recall depth contour: erase all halfspaces that contain less
than k = 0, 1, 2, . . . points
- for increasing k, we have nested contours
- in fact, it is “curved depth”, with all ensuing properties
- in particular, well-defined population analog
- hopefully with some meaning too...
- ...and efficient algorithm...
- however, some “bandwidth” selection inevitable
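One possible reading of the list above, as a sketch only: replace the halfspaces of halfspace depth by parabolic regions with vertex at the query point, and take the smallest data fraction over a grid of directions. The region shape, the `curvature` parameter (standing in for the “bandwidth”), the direction grid and the name `curved_depth` are all assumptions made for illustration, not the definition proposed in the slides. With `curvature=0` the regions are closed halfspaces and the function reduces to an approximate halfspace depth.

```python
import math

def curved_depth(x, points, curvature, n_dirs=72, tol=1e-12):
    """Smallest fraction of points in a parabolic region with vertex x,
    taken over a grid of axis directions."""
    best = len(points)
    for k in range(n_dirs):
        theta = 2.0 * math.pi * k / n_dirs
        ux, uy = math.cos(theta), math.sin(theta)  # axis of the parabola
        wx, wy = -uy, ux                           # direction orthogonal to the axis
        count = 0
        for px, py in points:
            dx, dy = px - x[0], py - x[1]
            along = ux * dx + uy * dy
            across = wx * dx + wy * dy
            # Inside the region: along >= curvature * across^2.
            if along - curvature * across ** 2 >= -tol:
                count += 1
        best = min(best, count)
    return best / len(points)

# Hypothetical banana-shaped data: three concentric half-circles.
banana = [(r * math.cos(math.pi * k / 20), r * math.sin(math.pi * k / 20))
          for r in (0.8, 1.0, 1.2) for k in range(21)]
hollow = (0.0, 0.3)  # inside the convex hull, but in the hollow of the banana
print(curved_depth(hollow, banana, curvature=0.0))
print(curved_depth(hollow, banana, curvature=1.0))
```

In this toy configuration the point in the hollow keeps a clearly positive halfspace depth, but receives depth 0 once the regions are curved, since a downward-opening parabola from it misses all the data.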
Science:
- the exploration of impasses
- to see which are not
References
Edelsbrunner, H., Kirkpatrick, D. G. and Seidel, R. (1983). On the shape of a set of points in the plane.
IEEE Trans. Inform. Theory 29 551–559.
Hall, P., Park, B. U. and Turlach, B. A. (2002). Rolling-ball method for estimating the boundary of the
support of a point-process intensity. Ann. I. H. Poincaré 6 959–971.
Hodges, J. L., Jr (1955). A bivariate sign test. Ann. Math. Statist. 26 523–527.
Koenker, R. (2005). Quantile regression. Cambridge University Press, Cambridge.
Parzen, E. (2004). Quantile probability and statistical data modeling. Statist. Sci. 19 652–662.
Serfling, R. (2002). A depth function and a scale curve based on spatial quantiles. In Statistical Data
Analysis Based on the L1-Norm and Related Methods (Y. Dodge, ed.) 25–38. Birkhäuser Verlag, Basel.
Small, C. G. (1990). A survey of multidimensional medians. International Statistical Review 58 263–277.
Tukey, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress
of Mathematicians (Vancouver, B. C., 1974), Vol. 2 523–531. Canad. Math. Congress, Quebec.
Wei, Y. (2008). An approach to multivariate covariate-dependent quantile contours with application to
bivariate conditional growth charts. J. Amer. Statist. Assoc. 103 397–409.
Zuo, Y. and Serfling, R. (2000). On the performance of some robust nonparametric location measures
relative to a general notion of multivariate symmetry. J. Statist. Plann. Inference 84 55–79.