+ Improving Vector Space Word Representations Using Multilingual Correlation
Manaal Faruqui and Chris Dyer
Language Technologies Institute, Carnegie Mellon University

+ Distributional Semantics
“You shall know a word by the company it keeps” (Harris, 1954; Firth, 1957)
…I will take what is mine with fire and blood…
…the end battle would be between fire and ice…
…My dragons are large and can breathe fire now…
…flame is the visible portion of a fire…
…take place whereby fires can sustain their own heat…

+ Translational Semantics
What other information?
That plane can seat more than 300 people
तीन सौ से अधिक लोगों को बैठाने वाला वायुयान … ("an aircraft that seats more than three hundred people …")
रूसी वायुयान बहुत बड़े हैं ("Russian airplanes are very big")
Russian airplanes are huge
Both "plane" and "airplane" align to the Hindi word वायुयान, so plane ≅ airplane.
Multilingual information! (Bannard & Callison-Burch, 2005)

+ Outline
Distributional semantics (monolingual context) + Translational semantics (multilingual context)
Using both: better semantic representations from distributional + translational semantics

+ Word Vector Representations
How do we encode such co-occurrences? With a word × context count matrix:

            day   night   …   cold
sleep         0      10   …      2
winter        3       3   …     50
…
the          10      12   …      9

+ Word Vector Representation
Latent Semantic Analysis (Deerwester et al., 1990): factor the words × contexts matrix with singular value decomposition (SVD) to obtain low-dimensional word vectors.

+ Multilingual Information
English: dragon; German: Drache; French: dragon; Spanish: dragón
Problem: how should the vectors of these translations be combined? Append (concatenate) them?

+ Multilingual Information
Disadvantages of vector concatenation:
The vector size increases.
Idiosyncratic (language-specific) information is carried along.
What if the word is OOV in the other language? ✗

+ Multilingual Information
So, what can we do?
English contexts of "fire":
…I will take what is mine with fire and blood…
…the end battle would be between fire and ice…
…My dragons are large and can breathe fire now…
German contexts of "Feuer":
... Das Ende der Schlacht würde zwischen Feuer und Eis ... ("The end of the battle would be between fire and ice")
... gesehen ist Feuer eine Oxidationsreaktion mit... ("…seen, fire is an oxidation reaction with…")
... Das Licht des Feuers ist eine physikalische Erscheinung… ("The light of the fire is a physical phenomenon…")
Two views of the same word: Canonical Correlation Analysis!
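The LSA pipeline from the word-vector slides (count word-context co-occurrences, then factor the matrix with SVD) can be sketched in Python with NumPy. This is a minimal sketch, assuming raw counts in a symmetric window of hypothetical size `window` and no reweighting of the counts; the slides do not specify the context definition or weighting, and the experiments used 80-dimensional vectors learned from WMT news corpora rather than this toy corpus.

```python
import numpy as np


def cooccurrence_matrix(sentences, window=2):
    """Count how often each word occurs within `window` tokens of each context word."""
    vocab = sorted({w for sent in sentences for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[index[word], index[sent[j]]] += 1
    return vocab, counts


def lsa_vectors(counts, dim):
    """Truncated SVD: scale the top `dim` left singular vectors by their singular values."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :dim] * s[:dim]


# Toy corpus built from the "fire" contexts on the distributional-semantics slide.
sentences = [
    ["my", "dragons", "can", "breathe", "fire"],
    ["between", "fire", "and", "ice"],
    ["with", "fire", "and", "blood"],
]
vocab, counts = cooccurrence_matrix(sentences, window=2)
vectors = lsa_vectors(counts, dim=2)
print(vectors.shape)  # -> (10, 2)
```

Keeping `u * s` rather than `u` alone weights each dimension by its singular value; both variants appear in the LSA literature, and the slides do not say which one was used.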
+ Canonical Correlation Analysis (CCA)
Project two sets of vectors (of equal cardinality) into a space where they are maximally correlated: CCA maps Ω and Θ so that Ω ≅ Θ.
An optimization problem with an exact, closed-form solution!

+ Canonical Correlation Analysis (CCA)
$W, V = \mathrm{CCA}(\Omega, \Theta)$, where $\Omega \in \mathbb{R}^{n \times d_1}$ and $\Theta \in \mathbb{R}^{n \times d_2}$ hold the aligned word pairs, $W \in \mathbb{R}^{d_1 \times k}$, $V \in \mathbb{R}^{d_2 \times k}$, and $k = \min(r(\Omega), r(\Theta))$ with $r(\cdot)$ the rank.
Project the full vocabularies: $X'' = XW \in \mathbb{R}^{n_1 \times k}$ for $X \in \mathbb{R}^{n_1 \times d_1}$, and $Y'' = YV \in \mathbb{R}^{n_2 \times k}$ for $Y \in \mathbb{R}^{n_2 \times d_2}$.
$X''$ and $Y''$ are now maximally correlated!

+ Canonical Correlation Analysis (CCA)
Problems addressed?
Vector size increases → CCA doesn't increase it.
Idiosyncratic information → CCA lets you choose how much to keep.
What if a word is OOV? → Every word in the vocabulary gets a projected vector.

+ Canonical Correlation Analysis (CCA)
OK, but how do we get equal-cardinality sets Ω and Θ? The vocabularies can't be of equal size!
Get word alignments from a parallel corpus.
Keep only words that are in the original vocabularies.
For every English word, select the best foreign word.

+ Experimental Setup: LSA Word Vector Learning
Monolingual data:
                English     German      French         Spanish
News corpus     WMT-2011    WMT-2011    WMT-2011-12    WMT-2011
Tokens          360M        290M        263M           164M
Types           180K        294K        137K           145K
Tokenization and lowercasing: WMT scripts.

+ Experimental Setup: LSA Word Vector Learning
Parallel data (News Commentary + Europarl, WMT):
                De-En       Fr-En       Es-En
Tokens          128M        138M        134M
Word pairs      37K         38K         38K
Word alignment tool: fast_align (Dyer et al., 2013).

+ Experimental Setup: LSA Word Vector Learning
Corpus preprocessing:
Contexts are gathered from every occurrence of a word: …hello… …hello… …hello… …hello… …hello…
Number-like tokens such as 23.45, 21st, 10-20-2014, 0.5e10 → NUM
Junk tokens such as anchfgugsjh, wekjfbg, bhguyq → UNK

+ Experimental Setup: Evaluation Benchmarks
Word similarity evaluation:
WS-353 (Finkelstein et al., 2001)
WS-353-SIM (Agirre et al., 2009)
WS-353-REL (Agirre et al., 2009)
RG-65 (Rubenstein and Goodenough, 1965)
MC-30 (Miller and Charles, 1991)
MTurk-287 (Radinsky et al., 2011)
Word relation evaluation:
Semantic relations (Mikolov et al., 2013)
Syntactic relations (Mikolov et al., 2013)

+ Experimental Setup: Multilingual Vector Learning
Monolingual vector length: 80. Multilingual vector length: ?
The length in the projected space can be chosen: here 'k' is the fraction of the CCA dimensions kept.
Choose the best value of k on WS-353, with k ∈ {0.1, 0.2, …, 1.0}.

+ Experimental Setup: Multilingual Vector Learning
[Figure: Spearman's correlation vs. number of dimensions, performance on WS-353; chosen k = 0.6]

+ Experimental Setup: Multilingual Vector Learning
[Figure: Spearman's correlation of monolingual vs. multilingual vectors on WS-353, RG-65, and MTurk-287]

+ Experimental Setup: Multilingual Vector Learning
[Figure: accuracy of monolingual vs. multilingual vectors on semantic and syntactic relations]

+ Experimental Setup: Multilingual Vectors (Neural Networks)
RNNLM (Mikolov et al., 2011): a neural language model that predicts the next word given the history, with recurrent connections in the hidden layer.
Skip-gram, word2vec (Mikolov et al., 2013): predicts the context given the word, removes the non-linear hidden layer, and represents the vocabulary with Huffman coding.

+ Experimental Setup: Multilingual Vector Learning
[Figure: monolingual vs. multilingual results for RNNLM and Skip-Gram vectors]

+ Experimental Setup: Multilingual Vectors (Scaling)
[Figure: Spearman's correlation on WS-353]

+ Experimental Setup: Multilingual Vectors (Qualitative Analysis)
Antonyms and synonyms of "beautiful", monolingual setting.
[Figure: t-SNE visualization (van der Maaten and Hinton, 2008)]

+ Experimental Setup: Multilingual Vectors (Qualitative Analysis)
Antonyms and synonyms of "beautiful", multilingual setting.
[Figure: t-SNE visualization (van der Maaten and Hinton, 2008)]

+ Conclusion
CCA: an easy-to-use tool in MATLAB. Take vectors from two languages and improve them.
Multilingual information helps even if the problems are inherently monolingual: more information is important.
Effective for distributional vectors. Semantics generalizes better than syntax.
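CCA is described above as an easy-to-use tool in MATLAB. For readers working outside MATLAB, the projection step from the CCA slides (W, V = CCA(Ω, Θ) on the aligned pairs, then multiplying the full vocabularies X and Y by W and V) can be sketched in Python with NumPy. This is a minimal sketch of textbook CCA (whitening followed by an SVD), not the authors' code; the ridge term `reg`, the hypothetical helper `project`, and its `ratio` argument (standing in for the fraction k tuned on WS-353) are assumptions introduced for illustration, and the data are synthetic.

```python
import numpy as np


def inv_sqrt(cov, reg):
    """Inverse matrix square root of a symmetric covariance matrix plus a small ridge."""
    vals, vecs = np.linalg.eigh(cov + reg * np.eye(cov.shape[0]))
    return (vecs * vals ** -0.5) @ vecs.T


def cca(omega, theta, reg=1e-6):
    """Return W, V and the canonical correlations for aligned rows of omega and theta."""
    omega_c = omega - omega.mean(axis=0)
    theta_c = theta - theta.mean(axis=0)
    n = omega.shape[0]
    a = inv_sqrt(omega_c.T @ omega_c / (n - 1), reg)  # whitening for language 1
    b = inv_sqrt(theta_c.T @ theta_c / (n - 1), reg)  # whitening for language 2
    cross = omega_c.T @ theta_c / (n - 1)
    # SVD of the whitened cross-covariance gives the canonical directions.
    u, s, vt = np.linalg.svd(a @ cross @ b)
    # k = min(r(Omega), r(Theta)), as on the slides.
    k = min(np.linalg.matrix_rank(omega_c), np.linalg.matrix_rank(theta_c))
    return a @ u[:, :k], b @ vt.T[:, :k], s[:k]


def project(x, y, w, v, ratio):
    """Hypothetical helper: keep the top `ratio` fraction of the k directions, then project."""
    keep = max(1, int(round(ratio * w.shape[1])))
    return x @ w[:, :keep], y @ v[:, :keep]


rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 3))  # hidden signal common to both "languages"
omega = shared @ rng.normal(size=(3, 5)) + 0.1 * rng.normal(size=(200, 5))
theta = shared @ rng.normal(size=(3, 4)) + 0.1 * rng.normal(size=(200, 4))
w, v, corrs = cca(omega, theta)

x = rng.normal(size=(300, 5))  # full vocabulary, language 1 (d1 = 5)
y = rng.normal(size=(250, 4))  # full vocabulary, language 2 (d2 = 4)
x_proj, y_proj = project(x, y, w, v, ratio=0.6)
print(w.shape, v.shape, x_proj.shape, y_proj.shape)  # -> (5, 4) (4, 4) (300, 2) (250, 2)
```

Centering is applied only when estimating W and V; the full vocabularies are multiplied as-is, following the slides' XW and YV. Because W and V apply to any row, words without a translation still receive projected vectors.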
Vectors available at: http://cs.cmu.edu/~mfaruqui

+ Related Work
Document representation: Vinokourov et al., 2002; Platt et al., 2010
Bilingual word vectors: Klementiev et al., 2012; Zou et al., 2013
Synonymy and paraphrasing: Bannard and Callison-Burch, 2005; Ganitkevitch et al., 2013
Bilingual lexicon induction: Haghighi et al., 2008; Vulić and Moens, 2013
Translation models: Kalchbrenner & Blunsom, 2013
Compositional semantics: Hermann & Blunsom, 2014

+ Thanks!
Visit us at the ACL demo: wordvectors.org