Predicting Dialect Variation In Immigrant Contexts Using Light Verb Constructions A. Seza Doğruöz (NIAS) & Preslav Nakov (QCRI) [email protected] [email protected] Summary • Motivation: • Most NLP tools ignore immigrant varieties (NL-Turkish) and focus on the standard ones (TR-Turkish). Example (1) Misafir-ler-e O Guest-pl-dat arkadaș overplaat-sen yap-ıl-acakt-ı. replace-inf do-pass-fut-past • Our Objective: Distinguish between • Turkish spoken in the Netherlands (NLTurkish) vs. Hoca-m • • • We are the first to predict on-going dialect variation in immigrant contexts as opposed to studying established dialect variations. We are also the first to compare bilingual LVCs with monolingual ones across two dialects of the same language. Our comparison of grammatical versus contextual features reveals context to be much more important. We experiment with LVCs extracted from natural spoken data rather than relying on isolated occurrences out of context. TR-Turkish spoken corpus: 22 speakers from Turkey (28,731 words) Corpora #etmek NL-Turkish 449 TR-Turkish 527 Total 976 #yapmak 543 755 1298 #Total 992 1282 kadın doğum et-ti. (instead of “yap”) birth do-past. “The lady gave birth there.” Experiments 2. Case Marking (presence or absence) Do-part one thing exist.not. Features: (1) words from the LVC context, (2) type of the light verb (yapmak or etmek), (3) the nominal complements, (4) finiteness of the verb (finite/nonfinite), (5) case marking on the nominal complement (yes/no), (6) word order (VO/OV), (7) etymological origins of the nominal complement (Arabic/Dutch/ French/English/Persian/Turkish/mixed). Example (3) “There is nothing to do.” Classifier: SVM with linear kernel hitap edi-yo-z biz. address do-prog-1pl we 3. Word Order in LVCs (OV or VO) “We address (him) as the teacher.” Example (6) Yap-acak bir șey do-pres-past-1sg. “I used to prepare food sometimes. “ yok. Context yemek yap-ar-dı-m. Sometimes food • buy-past-1pl. Example (5) There lady diye Teacher-poss.1sg. as Bazen Contributions for “We bought it to serve the guests.” Orda Example (2) • Turkish spoken in Turkey (TR-Turkish). • Focus: Light Verb Constructions serve do-inf al-dı-k. 2. Type of the Light Verb Arabic Influence • Dialect-aware NLP tools ikram et-mek için “That friend would have been replaced” • Long-term goal: immigrant NL-Turkish spoken corpus: 46 speakers from the Netherlands (74,461 words) Example (4) Dutch That friend Data 1.Finiteness 1.Etymological Origin • Immigrant languages (e.g., Turkish) change due to contact with the language (e.g., Dutch) of the host communities. • Translations that take varieties into account. Verbal Features Nominal Features We also included the words surrounding the LVCs. Evaluation: 5-fold cross validation Two experiments: distinguish left/right context? Results Features Baseline Full Model No context No nominal complements No info about etymological origin No finiteness No case marking No word order No verb type Discussion Left vs. Right No Split 56.38 82.81 Also important: 84.30 70.67 82.19 82.10 83.03 82.76 82.89 82.94 Most important feature: Context 83.64 83.99 84.35 84.43 84.43 84.39 • nominal complements; • etymological origin. Future Work • other dialects in immigrant settings (e.g., Turkish spoken in Germany); • other MWEs (e.g., noun compounds).
© Copyright 2025