Improving Vector Space Word Representations Using Multilingual Correlation

+
Manaal Faruqui and Chris Dyer
Language Technologies Institute
Carnegie Mellon University
+
Distributional Semantics
“You shall know a word by the company it keeps”
(Harris, 1954; Firth, 1957)
…I will take what is mine with fire and blood…
…the end battle would be between fire and ice…
…My dragons are large and can breathe fire now…
…flame is the visible portion of a fire…
…take place whereby fires can sustain their own heat…
+
Translational Semantics
What other Information?
That plane can seat more than 300 people
तीन सौ से अधिक लोगों को बैठाने वाला वायुयान …
रूसी वायुयान बहुत बड़े हैं
Russian airplanes are huge
plane ≅ airplane (both are translated by the Hindi word वायुयान)
Multilingual Information!
(Bannard & Callison-Burch, 2005)
+
Outline
• Distributional Semantics: Monolingual context
• Translational Semantics: Multilingual context
• Better Semantic Representations: Using Distributional + Translational semantics
+
Word Vector Representations
How to encode such co-occurrences?

Rows are words, columns are contexts:

          day   night   …   cold
sleep       0      10   …      2
winter      3       3   …     50
…
the        10      12   …      9
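
A minimal sketch of how such a words × contexts count table could be built from raw text. The function name `cooccurrence_counts` and the symmetric window of 2 tokens are assumptions made for illustration; the slides do not state the window size used.

```python
from collections import Counter, defaultdict

def cooccurrence_counts(sentences, window=2):
    """Count how often each context word occurs within `window` tokens of each word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[word][tokens[j]] += 1
    return counts

corpus = [
    "the end battle would be between fire and ice",
    "flame is the visible portion of a fire",
]
counts = cooccurrence_counts(corpus, window=2)
print(counts["fire"])
```

Each row `counts[word]` plays the role of one row of the table above: a sparse vector of context counts for that word.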
+
Word Vector Representation
Latent Semantic Analysis
(Deerwester et al., 1990)
[Figure: a words × context matrix reduced by Singular Value Decomposition to a smaller matrix of word vectors]
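
A rough sketch of the LSA step using NumPy's SVD. The toy counts, the helper name `lsa_vectors`, and the log scaling of counts are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def lsa_vectors(cooc, dim):
    """Return `dim`-dimensional word vectors from a words x contexts count matrix."""
    # Dampen raw counts; log scaling is an illustrative choice, not from the slides.
    M = np.log1p(np.asarray(cooc, dtype=float))
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    # Keep the top `dim` singular directions, scaled by their singular values.
    return U[:, :dim] * S[:dim]

cooc = np.array([
    [0, 10, 2],    # sleep
    [3, 3, 50],    # winter
    [10, 12, 9],   # the
])
vectors = lsa_vectors(cooc, dim=2)
print(vectors.shape)  # (3, 2)
```

Each row of `vectors` is the reduced representation of one word; choosing `dim` trades compactness against how much of the original matrix is preserved.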
+
Multilingual Information
English: dragon
German: Drache
French: dragon
Spanish: dragón
Append the vectors of all languages. Problem?
+
Multilingual Information
Disadvantages of Vector Concatenation
• Vector size increases
• Idiosyncratic information
• What if a word is OOV?
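
To make two of these drawbacks concrete, here is a small sketch of naive concatenation. The vectors and the helper `concatenate` are hypothetical toy values, not data from the talk.

```python
def concatenate(word, translation, en_vectors, de_vectors):
    """Append the translation's vector to the English vector; None if either is missing."""
    if word not in en_vectors or translation not in de_vectors:
        return None  # the OOV problem: there is no vector to append
    return en_vectors[word] + de_vectors[translation]  # list concatenation

en_vectors = {"dragon": [0.2, 0.7], "fire": [0.9, 0.1]}  # hypothetical toy vectors
de_vectors = {"Drache": [0.3, 0.6, 0.1]}

print(len(concatenate("dragon", "Drache", en_vectors, de_vectors)))  # 5
print(concatenate("fire", "Feuer", en_vectors, de_vectors))          # None
```

The combined vector grows with every language added, and a word whose translation has no vector gets no combined representation at all.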
+
Multilingual Information
So, what can we do?
…I will take what is mine with fire and blood…
…the end battle would be between fire and ice…
…My dragons are large and can breathe fire now…
... Das Ende der Schlacht würde zwischen Feuer und Eis ...
... gesehen ist Feuer eine Oxidationsreaktion mit...
... Das Licht des Feuers ist eine physikalische Erscheinung…
Two Views: Canonical Correlation Analysis!
+
Canonical Correlation Analysis
(CCA)
Project two sets of vectors (of equal cardinality) into a space where they are maximally correlated
[Figure: CCA maps the vector sets Ω and Θ into a shared space where Ω ≅ Θ]
Convex Optimization Problem with Exact Solution!
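
For reference, the textbook CCA objective for the first pair of projection directions can be written as below. The symbols w, v and the covariance matrices C are notation introduced here, not taken from the slides; Ω and Θ are assumed to be centered, with rows paired one to one.

```latex
\begin{aligned}
(w, v) &= \arg\max_{w,\, v}\ \operatorname{corr}(\Omega w,\ \Theta v) \\
       &= \arg\max_{w,\, v}\ \frac{w^{\top} C_{\Omega\Theta}\, v}
          {\sqrt{w^{\top} C_{\Omega\Omega}\, w}\ \sqrt{v^{\top} C_{\Theta\Theta}\, v}}
\end{aligned}
```

Here C_{ΩΩ} and C_{ΘΘ} are the covariance matrices of the two views and C_{ΩΘ} is their cross-covariance. Later pairs of directions maximize the same quantity subject to being uncorrelated with the earlier ones, and the whole set can be obtained in closed form from a generalized eigenvalue problem.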
+
Canonical Correlation Analysis
(CCA)
W, V = CCA(Ω, Θ)
X [n1 × d1] · W [d1 × k] = X” [n1 × k]
Y [n2 × d2] · V [d2 × k] = Y” [n2 × k]
k = min(r(Ω), r(Θ))
X” and Y” are now maximally correlated!
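
A sketch of one standard way to compute W and V with NumPy (whitening each view, then taking an SVD of the whitened cross-covariance). This is not necessarily the implementation used in the talk; the function name `cca`, the small ridge term `reg`, and the synthetic data are assumptions for illustration, and both views are assumed to have full column rank.

```python
import numpy as np

def cca(omega, theta, reg=1e-8):
    """Return projection matrices W, V and the canonical correlations for paired rows."""
    A = omega - omega.mean(axis=0)  # center each view
    B = theta - theta.mean(axis=0)
    n = A.shape[0]
    # Covariances, with a small ridge term (an assumption) for numerical stability.
    Caa = A.T @ A / (n - 1) + reg * np.eye(A.shape[1])
    Cbb = B.T @ B / (n - 1) + reg * np.eye(B.shape[1])
    Cab = A.T @ B / (n - 1)
    # Whitening transforms: inverse transposed Cholesky factors.
    Wa = np.linalg.inv(np.linalg.cholesky(Caa)).T
    Wb = np.linalg.inv(np.linalg.cholesky(Cbb)).T
    # The SVD of the whitened cross-covariance yields the canonical directions.
    U, S, Vt = np.linalg.svd(Wa.T @ Cab @ Wb)
    k = min(A.shape[1], B.shape[1])  # assumes full-rank views
    return Wa @ U[:, :k], Wb @ Vt.T[:, :k], S[:k]

# Synthetic paired views that share a 2-dimensional latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
omega = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))
theta = latent @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(200, 3))

W, V, corrs = cca(omega, theta)
X_proj = (omega - omega.mean(axis=0)) @ W  # projected first view
Y_proj = (theta - theta.mean(axis=0)) @ V  # projected second view
print(W.shape, V.shape)  # (4, 3) (3, 3)
```

Because W and V are plain matrices, they can be applied to any vector from the corresponding space, including words that were not part of the paired sets used to fit them.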
+
Canonical Correlation Analysis
(CCA)
Problems Addressed?
• Vector Size Increases? Doesn’t increase
• Idiosyncratic Information? Lets you choose!
• What if word is OOV? Projection vectors for everyone!
+
Canonical Correlation Analysis
(CCA)
Ok, but equal cardinality sets Ω & Θ?
• The vocabularies can’t be of equal size!
• Get word alignments from a parallel corpus
  • Preserve only words in the original vocabulary
• For every word in English, select the best foreign word
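
The selection step could be sketched as follows. Treating "best" as "most frequently aligned" is an assumption, as are the helper name `best_translations` and the toy alignment pairs; in practice the pairs would come from a word aligner run over the parallel corpus.

```python
from collections import Counter, defaultdict

def best_translations(aligned_pairs, en_vocab, foreign_vocab):
    """Map each English word to its most frequently aligned foreign word."""
    counts = defaultdict(Counter)
    for en, foreign in aligned_pairs:
        # Preserve only words that have vectors in the original vocabularies.
        if en in en_vocab and foreign in foreign_vocab:
            counts[en][foreign] += 1
    return {en: c.most_common(1)[0][0] for en, c in counts.items()}

pairs = [("fire", "Feuer"), ("fire", "Feuer"), ("fire", "Brand"),
         ("ice", "Eis"), ("dragon", "Drache"), ("xyzzy", "Feuer")]
en_vocab = {"fire", "ice", "dragon"}
de_vocab = {"Feuer", "Eis", "Drache", "Brand"}

best = best_translations(pairs, en_vocab, de_vocab)
print(best)  # {'fire': 'Feuer', 'ice': 'Eis', 'dragon': 'Drache'}
```

The resulting one-to-one mapping gives two vector sets of equal cardinality, which is what CCA requires.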
+
Experimental Setup
LSA Word Vector Learning
Monolingual Data   English       German        French        Spanish
News Corpus        WMT-2011      WMT-2011      WMT 2011-12   WMT-2011
Tokens             360,000,000   290,000,000   263,000,000   164,000,000
Types              180,000       294,000       137,000       145,000

Tokenizer and Lowercasing: WMT scripts
+
Experimental Setup
LSA Word Vector Learning
Parallel Data           De-En         Fr-En         Es-En
News Comm + Europarl    WMT           WMT           WMT
Tokens                  128,000,000   138,000,000   134,000,000
Word pairs              37,000        38,000        38,000

Word Alignment Tool: fast_align (Dyer et al, 2013)
+
Experimental Setup
LSA Word Vector Learning
Corpus Preprocessing
...hello… …hello… …hello… …hello… …hello…
Context :
23.45, 21st, 10-20-2014, 0.5e10 → NUM
anchfgugsjh, wekjfbg, bhguyq → UNK
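
A minimal sketch of this kind of normalization. The exact rules are assumptions: here any token containing a digit becomes NUM and any token outside a given vocabulary becomes UNK; the slides do not spell out the criteria used.

```python
import re

HAS_DIGIT = re.compile(r"\d")

def normalize(tokens, vocab):
    """Lowercase tokens; map numeric-looking tokens to NUM and unknown tokens to UNK."""
    out = []
    for tok in tokens:
        tok = tok.lower()
        if HAS_DIGIT.search(tok):
            out.append("NUM")   # assumption: any token containing a digit
        elif tok not in vocab:
            out.append("UNK")   # assumption: anything outside the vocabulary
        else:
            out.append(tok)
    return out

vocab = {"hello", "the", "fire"}
tokens = ["Hello", "23.45", "21st", "10-20-2014", "0.5e10", "wekjfbg", "fire"]
print(normalize(tokens, vocab))
# ['hello', 'NUM', 'NUM', 'NUM', 'NUM', 'UNK', 'fire']
```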
+
Experimental Setup
Evaluation Benchmarks
• Word Similarity Evaluation
  • WS-353 (Finkelstein et al, 2001)
  • WS-353-SIM (Agirre et al, 2009)
  • WS-353-REL (Agirre et al, 2009)
  • RG-65 (Rubenstein and Goodenough, 1965)
  • MC-30 (Miller and Charles, 1991)
  • MTurk-287 (Radinsky et al, 2011)
• Word Relation Evaluation
  • Semantic Relations (Mikolov et al, 2013)
  • Syntactic Relations (Mikolov et al, 2013)
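
Word similarity benchmarks of this kind are commonly scored by ranking word pairs by cosine similarity and comparing that ranking with the human ratings using Spearman's correlation. The sketch below assumes that protocol; the toy vectors, the toy ratings, and the choice to skip pairs with missing words are illustrative assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranks(values):
    """Average 1-based ranks, so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def evaluate(vectors, gold):
    """gold: (word1, word2, human_score) triples; pairs with missing words are skipped."""
    model, human = [], []
    for w1, w2, score in gold:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return spearman(model, human)

vectors = {"fire": [1.0, 0.1], "flame": [0.9, 0.2], "ice": [0.1, 1.0], "day": [0.5, 0.5]}
gold = [("fire", "flame", 9.0), ("fire", "ice", 2.0), ("fire", "day", 5.0)]
print(evaluate(vectors, gold))
```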
+
Experimental Setup
Multilingual Vector Learning
• Monolingual Vector Length: 80
• Multilingual Vector Length: ?
• The length in projected space can be chosen: ‘k’
• Choose the best value of ‘k’ for WS-353
  k ∈ {0.1, 0.2, …, 1.0} (as a fraction of the original length)
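
One way to sketch this selection: keep only the leading fraction k of the projected dimensions and pick the k that scores best on a development benchmark. The helpers `keep_top_fraction` and `choose_ratio`, and the toy scoring function standing in for a WS-353 evaluation, are hypothetical; the sketch also assumes the projected dimensions are already ordered from most to least correlated.

```python
def keep_top_fraction(vectors, ratio):
    """Keep the leading `ratio` fraction of each projected vector's dimensions."""
    out = {}
    for word, vec in vectors.items():
        n = max(1, round(ratio * len(vec)))
        out[word] = vec[:n]
    return out

def choose_ratio(vectors, score,
                 ratios=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Return the ratio whose truncated vectors get the highest score."""
    return max(ratios, key=lambda r: score(keep_top_fraction(vectors, r)))

projected = {"fire": [0.9, 0.5, 0.3, 0.2, 0.1, 0.05, 0.0, 0.0, 0.0, 0.0]}

def toy_score(vecs):
    # Stand-in for a real benchmark score; it simply prefers 6 dimensions.
    return -abs(len(vecs["fire"]) - 6)

print(choose_ratio(projected, toy_score))  # 0.6
```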
+
Experimental Setup
Multilingual Vector Learning
[Figure: Spearman’s correlation versus number of dimensions. Performance on WS-353; k = 0.6]
+
Experimental Setup
Multilingual Vector Learning
[Figure: bar chart of Spearman’s correlation (axis 0 to 70) for Monolingual versus Multilingual vectors on WS-353, RG-65 and MTurk-287]
+
Experimental Setup
Multilingual Vector Learning
[Figure: bar chart of accuracy (axis 0 to 35) for Monolingual versus Multilingual vectors on Semantic and Syntactic relations]
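
Accuracy on word relation questions of the form "a is to b as c is to ?" is often computed with the vector offset method, sketched below. Whether the talk used exactly this scoring is an assumption, and the toy vectors are made up for illustration.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def solve_analogy(vectors, a, b, c):
    """Return the word closest to b - a + c, excluding the three question words."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

def accuracy(vectors, questions):
    """Percentage of (a, b, c, d) questions where the predicted word equals d."""
    correct = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in questions)
    return 100.0 * correct / len(questions)

vectors = {
    "man": [1.0, 0.0, 0.1], "woman": [1.0, 1.0, 0.1],
    "king": [1.0, 0.0, 1.0], "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, 0.0],
}
print(accuracy(vectors, [("man", "woman", "king", "queen")]))  # 100.0
```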
+
Experimental Setup
Multilingual Vectors: Neural Networks
• RNNLM (Mikolov et al, 2011)
  • Predict next word given the history
  • Neural language model
  • Recurrent hidden layer connections
• Skip-Gram, word2vec (Mikolov et al, 2013)
  • Predict context given the word
  • Removes hidden layer
  • Vocabulary represented in Huffman coding
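
To illustrate "predict context given the word", the sketch below lists the (center word, context word) training pairs a Skip-Gram model would see for one sentence. It covers only pair generation, not the model or its training; the helper name and the window size are assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """List (center, context) pairs for every token and its neighbors within `window`."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("my dragons breathe fire".split(), window=1)
print(pairs)
# [('my', 'dragons'), ('dragons', 'my'), ('dragons', 'breathe'),
#  ('breathe', 'dragons'), ('breathe', 'fire'), ('fire', 'breathe')]
```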
+
Experimental Setup
Multilingual Vector Learning
[Figure: two bar charts (axes 0 to 50 and 0 to 80) comparing Mono and Multi vectors for RNNLM and Skip-Gram]
+
Experimental Setup
Multilingual Vectors: Scaling
Spearman’s correlation on WS-353
+
Experimental Setup
Multilingual Vectors: Qualitative Analysis
Antonyms and Synonyms of “Beautiful”: Monolingual Setting
t-SNE tool (van der Maaten and Hinton, 2008)
+
Experimental Setup
Multilingual Vectors: Qualitative Analysis
Antonyms and Synonyms of “Beautiful”: Multilingual Setting
t-SNE tool (van der Maaten and Hinton, 2008)
+
Conclusion
• CCA: Easy to use tool in MATLAB
  • Take vectors from two languages and improve them.
• Multilingual Information is Important
  • Even if the problems are inherently monolingual.
• More Effective for Distributional Vectors
  • Semantics generalizes better than Syntax.
• Vectors available at: http://cs.cmu.edu/~mfaruqui
+
Related Work
• Document representation
  • Vinokourov et al, 2002
  • Platt et al, 2010
• Bilingual word vectors
  • Klementiev et al, 2012
  • Zou et al, 2013
• Synonymy and Paraphrasing
  • Bannard and Callison-Burch, 2005
  • Ganitkevitch et al, 2013
• Bilingual lexicon induction
  • Haghighi et al, 2008
  • Vulic and Moens, 2013
• Translation Models
  • Kalchbrenner & Blunsom, 2013
• Compositional Semantics
  • Hermann & Blunsom, 2014
+
Thanks!
Visit us at ACL-demo:
wordvectors.org