Slides

Fast and flexible conversion of geohash codes to and from lat./long. coordinates
Roger Moussalli, Mudhakar Srivatsa, Sameh Asaad
FCCM 2015, 05 May 2015
c2b2qkded
© 2015 IBM Corporation
Spatiotemporal analytics
 Ubiquitous availability of location sensing
devices:
– GPS, RFID, mobile devices.
 The overwhelming majority of geospatial data is being produced by the
Internet of Things that Move; geospatial data is increasingly spatiotemporal data:
– Complex pattern queries can be posed on
objects moving in time and space.
 Opportunities for business intelligence:
– Traffic congestion prediction.
– Targeted advertising.
– Epidemic spread characterization.
– Insurance pricing.
– Crime pattern analysis.
[Figure callouts: 25+ TBs of daily logs; 100s of millions of GPS-enabled devices sold annually; 12+ TBs of daily tweet data; 1.75 billion smartphone users worldwide.]
 CPUs unable to keep up with online and offline processing of skyrocketing amounts of data.
Geohash: hierarchical geocoding
 Algorithm introduced in 2008:
– http://geohash.org (/c2b2qkded)
 Hierarchically divide space into grids.
 Geohash codes properties:
– Support for hierarchical regions.
– Arbitrary precision: remove characters from the
end of the code to reduce size and precision.
– Nearby places often (but not always) share similar
prefixes (simple proximity estimation).
[Figure: grid of cells 1 to 4; cell 2 is subdivided into 21 to 24, and cell 24 into 241 to 244.]
 Main uses of geohashes:
‒ Geotagging.
‒ Efficient database indexing.
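The prefix property listed above can be illustrated with a small decoder. This is a minimal sketch, not the presenters' implementation: it assumes the standard geohash base-32 alphabet and the usual convention that bits alternate between longitude and latitude, starting with longitude. The names `geohash_bbox` and `contains` are hypothetical helpers introduced only for this example.

```python
# Standard geohash base-32 alphabet (assumption: not stated on the slides).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_bbox(code):
    """Return ((lat_min, lat_max), (lon_min, lon_max)) for a geohash string."""
    lat = [-90.0, 90.0]
    lon = [-180.0, 180.0]
    is_lon = True  # bits alternate, starting with longitude
    for ch in code:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, MSB first
            bit = (value >> shift) & 1
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            is_lon = not is_lon
    return (lat[0], lat[1]), (lon[0], lon[1])

def contains(outer, inner):
    """True if the bounding box `inner` lies inside `outer`."""
    (olat, olon), (ilat, ilon) = outer, inner
    return (olat[0] <= ilat[0] and ilat[1] <= olat[1]
            and olon[0] <= ilon[0] and ilon[1] <= olon[1])

full = geohash_bbox("c2b2qkded")   # code shown on the title slide
coarse = geohash_bbox("c2b2q")     # same code with four characters removed
print(contains(coarse, full))      # the shorter code covers the longer one
```

Dropping characters from the end only enlarges the cell, which is why a shared prefix gives a cheap (but not exact) proximity estimate.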
Problem statement
 Geohash codes do not replace lat./long. coordinates:
– Format of disseminated data.
– Dependency of legacy algorithms on lat./long. coordinates.
 New algorithms are developed to operate in the geohash domain.
 The (bit-serial) conversion of geohash codes to and from lat./long. is a highly frequent
problem, and is computationally demanding (double-precision FP):
– MongoDB, Apache Accumulo, IBM Streams and SPSS (among others).
Geohash code generation
 Geohash latitude bits: 1, 0, 1
– Step 1: interval from -90 to +90, bit 1 (upper half).
– Step 2: interval from 0 to +90, bit 0 (lower half).
– Step 3: interval from 0 to +45, bit 1 (upper half).
 Geohash longitude bits: 1, 1, 1
 Geohash code: 111011
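The bit-by-bit generation above can be sketched in a few lines. This is an illustrative sketch only: the coordinates 30.0 and 170.0 are hypothetical values chosen because they reproduce the bits shown on the slide, and `to_bits` and `interleave` are names introduced for this example.

```python
def to_bits(value, lo, hi, nbits):
    """Bisect [lo, hi] nbits times; emit 1 for the upper half, 0 for the lower."""
    bits = []
    for _ in range(nbits):
        mid = (lo + hi) / 2
        if value >= mid:
            bits.append(1)
            lo = mid
        else:
            bits.append(0)
            hi = mid
    return bits

def interleave(lon_bits, lat_bits):
    """Alternate longitude and latitude bits, longitude first."""
    out = []
    for lon_bit, lat_bit in zip(lon_bits, lat_bits):
        out.extend([lon_bit, lat_bit])
    return "".join(str(b) for b in out)

lat_bits = to_bits(30.0, -90.0, 90.0, 3)     # hypothetical latitude
lon_bits = to_bits(170.0, -180.0, 180.0, 3)  # hypothetical longitude
print(lat_bits, lon_bits, interleave(lon_bits, lat_bits))
```

With these inputs the latitude bits are 1, 0, 1, the longitude bits are 1, 1, 1, and the interleaved code is 111011, matching the slide.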
Single step conversion block
 Latitude (or longitude) to geohash
• Input: latitude (or longitude); interval (min, mid, max).
• Output: geohash bit; new interval.
 Geohash to latitude (or longitude)
• Input: geohash bit; latitude (or longitude) value; interval (min, mid, max).
• Output: new interval; latitude (or longitude) value = new interval mid.
[Figure: two block diagrams. "lat. to geohash" takes lat. and (min, mid, max) and produces a geohash bit and (min, mid, max); "geohash to lat." takes a geohash bit, lat., and (min, mid, max) and produces (min, mid, max) and lat.]
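A software model of the two single step blocks might look as follows. This is a sketch under assumptions: the slide's blocks carry (min, mid, max), whereas this model carries only (min, max) and recomputes mid; the function names are hypothetical.

```python
def ll_to_geo_step(value, lo, hi):
    """One lat./long.-to-geohash step: return (geohash bit, new interval)."""
    mid = (lo + hi) / 2
    if value >= mid:
        return 1, (mid, hi)
    return 0, (lo, mid)

def geo_to_ll_step(bit, lo, hi):
    """One geohash-to-lat./long. step: return (new value, new interval).

    The new value is the midpoint of the new interval.
    """
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if bit else (lo, mid)
    return (lo + hi) / 2, (lo, hi)

print(ll_to_geo_step(30.0, -90.0, 90.0))  # bit and narrowed interval
print(geo_to_ll_step(1, -90.0, 90.0))     # value estimate and narrowed interval
```

Chaining one such step per geohash bit gives the bit-serial conversion described in the problem statement.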
Flexible, high-throughput converter
 Run-time flexibility:
– Mode: geo-to-ll, ll-to-geo.
– Geohash precision.
– Geohash transfer size.
 Design-time flexibility:
– Max geohash precision.
– Number of deployed stages (single step blocks).
– Supported transfer-side geohash sizes.
– Engineering details:
• Stream interface width.
• Further pipelining.
[Figure: input stream, input interface controller, chain of single step blocks carrying lat., long., and batch info, output interface controller, output stream.]
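The run-time options (mode and geohash precision) can be mimicked in software with a single routine that walks the same chain of bisection steps in either direction. This is only a behavioral sketch, not a model of the hardware pipeline: `convert` is a hypothetical function, the geohash is represented as a string of '0'/'1' characters, and longitude is assumed to take the even-numbered steps.

```python
def convert(mode, payload, precision):
    """mode: 'll-to-geo' (payload = (lat, lon)) or 'geo-to-ll' (payload = bit string).

    precision is the total number of geohash bits to process.
    """
    lat = [-90.0, 90.0]
    lon = [-180.0, 180.0]
    bits = []
    for i in range(precision):
        rng = lon if i % 2 == 0 else lat  # even steps refine longitude
        mid = (rng[0] + rng[1]) / 2
        if mode == "ll-to-geo":
            latitude, longitude = payload
            value = longitude if i % 2 == 0 else latitude
            bit = 1 if value >= mid else 0
            bits.append(bit)
        elif mode == "geo-to-ll":
            bit = int(payload[i])
        else:
            raise ValueError(f"unknown mode: {mode}")
        if bit:
            rng[0] = mid
        else:
            rng[1] = mid
    if mode == "ll-to-geo":
        return "".join(str(b) for b in bits)
    return (lat[0] + lat[1]) / 2, (lon[0] + lon[1]) / 2

print(convert("ll-to-geo", (30.0, 170.0), 6))
print(convert("geo-to-ll", "111011", 6))
```

Decoding returns the center of the final cell, so a round trip recovers the original point only to within the cell size set by the precision.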
Conversion by lookup
 Precomputed (offline) conversion table.
 Relaxes requirements on compute resources (DSPs).
 Table size = [equation not captured in the extracted text]
 Address in table = longitude geohash code + offset
 Offset = [equation not captured in the extracted text]
Table (longitude), as shown on the slide:
address  longitude geohash  length  min   mid     max
0        0                  1 bit   -180  -90     0
1        1                  1 bit   0     90      180
2        00                 2 bits  -180  -135    -90
3        01                 2 bits  0     45      90
4        10                 2 bits  -90   -45     0
5        11                 2 bits  90    135     180
6        000                3 bits  -180  -157.5  -135
7        001                3 bits  0     22.5    45
…
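A lookup table of this shape can be generated offline with a short script. The sketch below rests on two inferences that are not stated in the extracted text and should be treated as assumptions: the offset for codes of a given length is taken to be 2^length - 2 (which matches addresses 0, 2, and 6 for lengths 1, 2, and 3), and the first geohash bit is taken to sit in the least-significant position of the code (which matches rows such as 01 and 001). All function names are hypothetical.

```python
def build_lookup_table(max_bits, lo=-180.0, hi=180.0):
    """Flat list of (min, mid, max) rows for all codes of length 1..max_bits."""
    table = []
    for length in range(1, max_bits + 1):
        for code in range(2 ** length):
            a, b = lo, hi
            for step in range(length):
                bit = (code >> step) & 1  # assumption: first bit is the LSB
                mid = (a + b) / 2
                if bit:
                    a = mid
                else:
                    b = mid
            table.append((a, (a + b) / 2, b))
    return table

def offset(length):
    """Assumed start address of the block holding codes of this length."""
    return 2 ** length - 2

def lookup(table, code, length):
    """Address in table = code + offset."""
    return table[offset(length) + code]

table = build_lookup_table(3)
print(len(table))                 # rows for lengths 1, 2, and 3
print(lookup(table, 0b001, 3))    # row at address 7
```

Under these assumptions a table covering lengths 1 to X holds 2 + 4 + ... + 2^X rows, so its size roughly doubles with each extra bit of lookup.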
Combining lookup with compute
 Flexible, resource-friendly solution.
 Lookup tables process at most X geohash bits (each).
 The remaining N-X bits are processed by compute single step blocks:
– Multiple passes through the compute stages are allowed.
[Figure: the latitude and longitude geohash code bits (N each) are split into X bits for lookup-based conversion and N-X bits for back-to-back single step stages; the lookups yield the latitude and longitude intervals, and the stages, driven by the number of remaining bits, yield the latitude and longitude values.]
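The split between lookup and compute can be sketched for one coordinate as follows. This is a simplified software analogy under assumptions: the table is a dictionary keyed by the first X bits rather than an addressed memory, the number of compute stages (`stages`) is a hypothetical parameter, and extra passes are modeled as a loop over the same stages.

```python
from itertools import product

def refine(lo, hi, bit):
    """One single step: keep the upper half for bit 1, the lower half for bit 0."""
    mid = (lo + hi) / 2
    return (mid, hi) if bit else (lo, mid)

def build_prefix_table(x, lo, hi):
    """Precompute the interval reached after every possible x-bit prefix."""
    table = {}
    for prefix in product((0, 1), repeat=x):
        a, b = lo, hi
        for bit in prefix:
            a, b = refine(a, b, bit)
        table[prefix] = (a, b)
    return table

def decode_hybrid(bits, table, x, stages=4):
    """Decode N bits: first x by lookup, remaining N-x by compute stages.

    Returns (value, passes), where passes counts trips through the stages.
    Assumes len(bits) >= x.
    """
    lo, hi = table[tuple(bits[:x])]
    remaining = list(bits[x:])
    passes = 0
    while remaining:
        for bit in remaining[:stages]:
            lo, hi = refine(lo, hi, bit)
        remaining = remaining[stages:]
        passes += 1
    return (lo + hi) / 2, passes

lat_table = build_prefix_table(2, -90.0, 90.0)
print(decode_hybrid([1, 0, 1], lat_table, x=2))
print(decode_hybrid([1, 0, 1, 0, 0, 1, 1, 0, 1, 1], lat_table, x=2))
```

A larger X trades memory for fewer compute stages or passes; a smaller X does the opposite.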
Experimental evaluation
 Hardware platform:
– Pico M505 board.
– PCIe gen2 x8.
– Xilinx Kintex 7 325T @250MHz.
– Vivado 2014.2.
– End-to-end performance: host RAM -> FPGA -> host RAM.
 Software framework:
– Multi-threaded converter from IBM Streams and SPSS.
– Single socket 8-core x2 Hyper-Threads Intel Xeon @2.5GHz.
– 32GB RAM.
 Metric:
Design space exploration
[Figure: resource utilization (%) of LUTs, registers, and DSPs versus number of single step blocks.]
End-to-end throughput: PCIe gen2 x8 limitations
 64-bit geohashes.
 Batch of 100M conversions.
[Figure: throughput (MConversions/s) versus number of single step blocks (4, 8, 16, 32, 64); series: core, lat./long. to geohash (end-to-end), geohash to lat./long. (end-to-end).]
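A back-of-the-envelope bound shows why the link can become the bottleneck. The calculation below is a rough sketch, not a measurement from the talk: it uses the nominal PCIe gen2 figures (5 GT/s per lane, 8b/10b encoding), ignores packet and protocol overhead, and assumes latitude and longitude travel as two 64-bit doubles.

```python
# Nominal PCIe gen2 link parameters (standard values, not taken from the slides).
TRANSFERS_PER_S_PER_LANE = 5_000_000_000  # 5 GT/s per lane
LANES = 8
# 8b/10b encoding carries 8 payload bits per 10 transferred bits.
raw_bytes_per_s = TRANSFERS_PER_S_PER_LANE * 8 // 10 // 8 * LANES

GEOHASH_BYTES = 8      # 64-bit geohash, as on the slide
LATLONG_BYTES = 2 * 8  # assumption: two 64-bit doubles per point

def max_conversions_per_s(bytes_in, bytes_out):
    """Each link direction is limited separately; the larger payload dominates."""
    return raw_bytes_per_s / max(bytes_in, bytes_out)

bound = max_conversions_per_s(GEOHASH_BYTES, LATLONG_BYTES)
print(f"{bound / 1e6:.0f} MConversions/s upper bound")
```

Real links deliver less than this nominal figure because of packet headers and flow control, so measured end-to-end throughput is expected to sit below the bound.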
Speedup vs multi-threaded software: a socket-to-socket comparison
 8-core x2 Hyper-Threads Intel Xeon @2.5GHz.
 64-bit geohashes.
 100M conversions.
 Speedup starts at a batch of 100 conversions.
[Figure: speedup versus number of threads (1, 2, 4, 8, 16, 32); series: geohash to long/lat, long/lat to geohash; labeled values: 16.1 and 13.]
Conclusions
 First hardware implementation of a geohash conversion engine operating at wire speed.
 Flexibility:
– Design time.
– Run time.
 Performance:
– >13X vs multi-threaded software.
– Limited by the available PCIe bandwidth.
 Future work:
– Accelerating compute-intensive queries in the geohash domain.
BACKUP
End-to-end throughput
[Figure, two panels: geohash-to-lat./long. and lat./long.-to-geohash; each plots throughput (Mconversions/s) versus number of conversions (batch size); series: fpga_geo_8, fpga_geo_16, fpga_geo_32, fpga_geo_64, fpga_geo_128.]
Speedup vs single-threaded software
 Geohash-to-lat./long.
[Figure: speedup versus number of conversions (batch size: 10, 100, 1K, 10K, 100K, 1M, 10M, 100M); series: fpga_geo_8, fpga_geo_16, fpga_geo_32, fpga_geo_64, fpga_geo_128; an inset zooms in on small batch sizes (10 to 1K).]