Mori-san slides - Data Science SG

Data Scientist Meet-up
March 24th, 2015
Rakuten Institute of Technology
Global Head
Masaya Mori
http://rit.rakuten.co.jp/
Introduction
• Masaya Mori
• Executive Officer, Rakuten Inc.
• Global Head, RIT
• Responsible for managing R&D activities and IT strategies in the Rakuten Group
• Advisor for IPSJ
Twitter: @emasha
AR-HITOKE:
Augmented Reality Shopping Experience
With Purchase History, Product Data, Review Data
When you look at products in physical stores, you can get
information about them, such as their popularity and reputation.
You can also enjoy shopping together with SNS services.
How many of you know Rakuten?
Growing Faster
Rakuten’s Global Expansion
Expanding to 25 countries / regions
Providing EC Services in 12 countries.
Many businesses
EC
Electronic Money
Bank
Life Insurance
Credit Card
Travel
Media
Securities
Telecommunication
Professional Baseball team
Professional Soccer team (Football team)
We can provide a variety of data
to Data Scientists!
We are living in the Big Data era.
In 2013,
people were always online, always connected to the internet via smart devices.
Everybody can communicate with friends all over the world.
Everybody can upload photos quickly and share them.
People can always be connected to the internet
and shop freely with the support of smart devices.
Smart devices are giving people more and more freedom.
Smart Device Expansion
Shopping
Life
Leisure
Money
Rakuten's business is unique!
We provide a business platform that helps
merchants and consumers meet each other.
Diversifying Merchants
World generates Data
We are a Diversifying Network
Merchants are diversified.
A solar energy generation system
(1,620,000 SGD)
It is very expensive!
A golden Buddhist altar (金仏壇)
(777,000 SGD)
If you have the money,
you can buy this too.
RIT
Rakuten Institute of Technology (R.I.T.)
Masaya Mori, Global Head of R.I.T.
Rakuten Institute of Technology
Accelerate Contribution to Rakuten Asia
Prediction
O2O Research
Machine Learning
Physical Area Research
Recommender systems
Image Processing
Distributed File Systems
Rakuten Hub Categories
Product Clustering
Machine Translation
3 research groups in R.I.T.
• Reality: Interface for Connecting Users to Large Amount of Data
  (O2O, HCI, Multimedia Data Processing)
• Intelligence: Algorithm for Creating Values from Large Amount of Data
  (Data Mining, Natural Language Processing, Information Retrieval, Machine Learning)
• Power: HPC Infrastructure for Processing Large Amount of Data
  (Stream Processing, GPGPU, Distributed File System, Distributed KBS)
Examples
Rakuten SPDB
[Diagram: Rakuten Super DB]
Stages: Collecting, Analysis, Data Access, Applications
Data sources: Purchase History, Questionnaire, Credit Card Usage, Super Point, Coupon, Login, External Data
Attribute types: Demographic, Behavior, Psychographic, Geographic
Data access: DB, File
Applications: Personalization, Recommender, BTA, BI Tool, ....
Recommender Algorithms
Product Search Engine
Morphological Analysis
Item ID: 1234
Tokens: {Canon, EOS, 5d, mark, ….}
Indexing: Product Data (= Documents) → Inverted Index
Inverted Index
Token    Item IDs
Canon    xxxx, yyyy, 1234, zzzz, ….
EOS      aaaa, bbbb, 1234, ….
5d       hhhh
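The indexing flow on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not Rakuten's search engine: `tokenize` is a hypothetical stand-in (lowercasing and whitespace splitting) for real morphological analysis, and the product titles and item IDs other than 1234 are made up.

```python
from collections import defaultdict

def tokenize(text):
    # Hypothetical stand-in for morphological analysis:
    # lowercase the text and split on whitespace.
    return text.lower().split()

def build_inverted_index(products):
    # Map each token to the set of item IDs whose text contains it.
    index = defaultdict(set)
    for item_id, text in products.items():
        for token in tokenize(text):
            index[token].add(item_id)
    return index

def search(index, query):
    # AND search: return the items that contain every query token.
    tokens = tokenize(query)
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)

# Made-up product data, for illustration only.
products = {
    1234: "Canon EOS 5d mark body",
    2001: "Canon compact camera",
    2002: "EOS lens cap",
}
index = build_inverted_index(products)
print(sorted(search(index, "canon eos")))
```

Intersecting the posting sets is what lets a multi-word query narrow the results to items that contain all of the tokens.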
Search Keyword Trend Analysis
Finding related keywords with trend analysis
Keyword: Father’s Day
Keyword: steteko
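One simple way to find related keywords from trends is to compare how their search counts move over time. The sketch below assumes Pearson correlation over weekly counts. The slides do not say which method is actually used, and the counts, the `related_keywords` helper and the 0.8 threshold are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length series.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series has no trend to compare
    return cov / sqrt(var_x * var_y)

def related_keywords(target, series, threshold=0.8):
    # Rank the other keywords by how closely their trend follows the target's.
    base = series[target]
    scored = [(kw, pearson(base, counts))
              for kw, counts in series.items() if kw != target]
    return sorted((p for p in scored if p[1] >= threshold),
                  key=lambda p: -p[1])

# Synthetic weekly search counts (not Rakuten data).
series = {
    "father's day": [10, 12, 30, 80, 150, 20],
    "steteko": [5, 6, 14, 40, 70, 12],
    "umbrella": [50, 48, 52, 49, 51, 50],
}
related = related_keywords("father's day", series)
print([kw for kw, _ in related])
```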
Attribute Extraction
Item pages on Rakuten are created by merchants.
They contain lots of unstructured text.
For better service, we need structured data.
Unstructured text: hard to see a wine’s attributes
Structured data: easy to see a wine’s attributes
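To make the idea concrete, here is a minimal rule-based sketch that turns free text from a wine page into structured attributes. It is an assumption for illustration only: the slides do not describe the actual extraction method, and the attribute names, the regular expressions and the sample description are hypothetical.

```python
import re

# Hypothetical patterns: one capture group per attribute.
PATTERNS = {
    "vintage": re.compile(r"\b((?:19|20)\d{2})\b"),
    "volume_ml": re.compile(r"(\d+)\s*ml\b", re.IGNORECASE),
    "alcohol_pct": re.compile(r"(\d+(?:\.\d+)?)\s*%"),
    "color": re.compile(r"\b(red|white|rose|sparkling)\b", re.IGNORECASE),
}

def extract_attributes(text):
    # Return the first match of each pattern as a structured record.
    attrs = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            attrs[name] = match.group(1).lower()
    return attrs

# Made-up merchant description.
description = "Chateau Example 2010 red wine, 750ml, alcohol 13.5%, great with steak!"
attrs = extract_attributes(description)
print(attrs)
```

Hand-written rules like these are brittle across merchants, so they serve here only to show the input and output shape of the task.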
Machine Learning
•[Tokyo & NY] We have spread ML across Rakuten: economic prediction,
PIOP, Kireido Navi (which won an IT Award), etc.
•Recently, Javier built an active learning tool for Slice.
•We applied deep learning to the image filter for ICHIBA and to text
recognition (which achieved the No. 1 accuracy in academia).
Text detection ranked No. 1 in academia.
The CEO emphasized ML
Led to the IT Award
Machine Translation
•[NY] To support Bing translation for overseas shopping, we combined
our assets such as Raqumo, Dictionary, and RTagger, and improved
quality for ICHIBA. We bundled them into the RIT-Core package.
•Next year, we will finally start a machine translation project.
Automation
•Automation is a key to innovation.
•Yang-san implemented a bandit algorithm in the business coupon
strategy, and preference extraction in the fashion-style project,
which is the advanced version of Discover Pages.
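As an illustration of how a bandit algorithm can drive a coupon strategy, here is a minimal epsilon-greedy sketch. The slides do not say which bandit variant was used, so epsilon-greedy is an assumption, and the three coupon types with their redemption rates are simulated, hypothetical values.

```python
import random

class EpsilonGreedyBandit:
    def __init__(self, n_arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # times each coupon (arm) was offered
        self.values = [0.0] * n_arms  # running mean reward per coupon
        self.rng = rng if rng is not None else random.Random()

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        # Incremental update of the mean reward for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def simulate(true_rates, rounds=5000, seed=0):
    # Offer one coupon per round; reward is 1.0 if the simulated user redeems it.
    rng = random.Random(seed)
    bandit = EpsilonGreedyBandit(len(true_rates), epsilon=0.1, rng=rng)
    for _ in range(rounds):
        arm = bandit.select_arm()
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit

# Hypothetical redemption rates for three coupon types.
bandit = simulate([0.05, 0.10, 0.30])
print(bandit.counts, [round(v, 3) for v in bandit.values])
```

The appeal for automation is that the offer mix adapts as feedback arrives, without a separate manual A/B test cycle.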
Prediction
This is the Prediction team, built on Rakuten's Big Data and hiring
economists and market analysts.
They will predict trends in the stock market, the economy, product
demand, etc.
[Diagram] Rakuten’s Big Data (ICHIBA’s data) → Economist → Prediction
(of the stock market, of economic indices, of product demand, ….)
Impact on Finance & Management
Provide predictions for merchants
PIOP: Demand Prediction
We already have a prediction system, PIOP,
which stands for Price and Inventory Optimization Platform.
•It uses machine learning to predict
product demand precisely.
Economic Prediction
We have already tried using ICHIBA’s data to predict the CI (Composite
Index, 景気動向指数), as follows. The result is very good.
[Figure: Prediction of CI (景気動向指数) by month. Series: Actual, Predict (Training fit), Predict (Test fit). Y-axis range: 94 to 110.]
Training data: Dec. 2009 – Dec. 2012
Test data: Jan. 2013 – Apr. 2013
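The evaluation setup in the figure (fit on earlier months, test on later months) can be sketched as follows. This is an assumption-laden illustration: the slides do not state which model was used, so a one-feature least-squares line stands in for it, and both series (`sales_index`, `ci`) are synthetic, not Rakuten or government data.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a + b * x with a single feature.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mean_abs_error(a, b, xs, ys):
    # Mean absolute error of the fitted line on the given points.
    return sum(abs((a + b * x) - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic monthly series covering Dec. 2009 to Apr. 2013 (41 months).
n_months = 41
sales_index = [100 + 0.4 * t + (3 if t % 12 == 0 else 0) for t in range(n_months)]
ci = [60 + 0.4 * s + (0.5 if t % 2 == 0 else -0.5)
      for t, s in enumerate(sales_index)]

# Chronological split: the first 37 months (Dec. 2009 to Dec. 2012) for training,
# the last 4 months (Jan. 2013 to Apr. 2013) for testing. No shuffling.
n_train = 37
a, b = fit_line(sales_index[:n_train], ci[:n_train])
train_mae = mean_abs_error(a, b, sales_index[:n_train], ci[:n_train])
test_mae = mean_abs_error(a, b, sales_index[n_train:], ci[n_train:])
print(f"train MAE={train_mae:.3f}, test MAE={test_mae:.3f}")
```

Splitting by time rather than at random keeps future months out of the training data, which is what makes the test fit a meaningful check of predictive power.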
Rakuten Data Release
http://rit.rakuten.co.jp/opendata.html
• Rakuten Ichiba
  • All item data (approx. 156 million items)
  • Review data (approx. 64 million reviews)
• Rakuten Travel
  • Facility data (82,458 facilities)
  • Review data (approx. 4.7 million reviews)
• GORA (Rakuten’s golf service)
  • Facility data (1,669 facilities)
  • Review data (320,000 reviews)
• Rakuten Recipe
  • Recipe data (approx. 440,000 recipes)
  • Recipe images (approx. 440,000 images)
• Rakuten Auction
  • Evaluation data (approx. 12 million evaluations)