DIFFICULTIES FOR LEARNING PERSIAN AS A SECOND LANGUAGE Megerdoomian MITRE

DIFFICULTIES FOR LEARNING PERSIAN
AS A SECOND LANGUAGE
Karine Megerdoomian
MITRE
Approved for Public Release; Distribution Unlimited. Case number 08-0171 .
©2008 The MITRE Corporation. All rights reserved.
Source: American Educational Research Foundation, adapted from National Virtual Translation Center (2006), “Language Learning Difficulty for English Speakers”
Goals of this talk
1. What are the factors that place Persian in Category II of language difficulty for a speaker of English?
   i.e., more difficult than French, Swedish or German but easier than Arabic or Chinese
2. How can linguistics help in the classroom?
   • use linguistic patterns to explain difficult issues
   • computational tools to discover and practice language patterns
Language Relatedness

How close are Persian and English?
They are both members of the Indo-European
language family
Source: The American Heritage® Dictionary of
the English Language, Fourth Edition. Copyright
© 2000 by Houghton Mifflin Company.
Source: Circle of Ancient Iranian Studies, SOAS
Language Relatedness

Persian and English are both Indo-European
• They share certain linguistic features (verb conjugation, morphology patterns)
But their writing systems are very different
• Iranian and Dari Persian use an extended version of the Arabic script (4 additional letters)
• Tajiki Persian uses an extended version of the Cyrillic script
Writing System
• Writing goes from right to left
• Letters are completely unfamiliar for English speakers
• Letters must be connected together rather than being written separately
• Letters have up to 4 different shapes based on context in the word (ه هـ ـهـ ـه)
• Diacritics are generally not written
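The four contextual shapes can be inspected through Unicode's Arabic presentation forms. The sketch below is only an illustration: the choice of the letter he and the names `BASE_HE`, `HE_FORMS` and `describe_forms` are assumptions made for this example. Ordinary Persian text stores just the base letter and leaves the shaping to the font.

```python
import unicodedata

# Base letter he (U+0647) and its four presentation forms
# (isolated, final, initial, medial); illustrative choice of letter.
BASE_HE = "\u0647"
HE_FORMS = ["\uFEE9", "\uFEEA", "\uFEEB", "\uFEEC"]

def describe_forms(forms):
    """Return (Unicode name, base letter after NFKC folding) for each form."""
    return [(unicodedata.name(ch), unicodedata.normalize("NFKC", ch)) for ch in forms]

for name, base in describe_forms(HE_FORMS):
    # Each shape folds back to the same underlying letter.
    print(name, "->", unicodedata.name(base))
```

All four shapes fold back to one stored letter, which is why a learner has to recognize four glyphs for a single entry in the alphabet.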

Writing System: Diacritics


Tanvin
  اقلا (aqla) → aqalan
  اقلن به خودم مي گفت
Alef maksura
  حتی (hti) → hattâ
  رژیمی که حتا قادربه گرفتن دست دوستان خود نیست
Vowels (a,e,o)
  کرم (krm) → kerm (worm), korom (chrome), karm (vine), karám (generosity), káram (I’m deaf), kerem (cream)
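The effect of unwritten short vowels can be sketched in a few lines. This is a minimal illustration on the transliterations from the slide, not a transliteration standard; the names `STRESS`, `SHORT_VOWELS` and `skeleton` are assumptions made for the example.

```python
STRESS = str.maketrans("áéó", "aeo")  # stress marks used in the transliteration
SHORT_VOWELS = set("aeo")             # short vowels, normally left unwritten

def skeleton(translit):
    """Return what remains of a transliterated word once short vowels are dropped."""
    plain = translit.translate(STRESS)
    return "".join(ch for ch in plain if ch not in SHORT_VOWELS)

READINGS = ["kerm", "korom", "karm", "karám", "káram", "kerem"]

# Every reading collapses to the same written skeleton.
print({skeleton(reading) for reading in READINGS})
```

Because all six readings share one skeleton, the reader has to pick the intended word from context.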
Writing System: Ambiguous Letters
‘he’: /h/ after a vowel and /é/ after a consonant
  kuh کوه (mountain), sé سه (three)
  mâhâné ماهانه (monthly)
  به → beh (quince) vs. bé (to)
‘vav’: /v/ after a vowel and /u/, /ow/ or /o/ after consonants
  gâv گاو (cow), ravand روند (process)
  dur دور (far), dowr دور (around)
  sovashun سووشون (Sovashun)
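The rule of thumb for word-final he can be written as a small heuristic. This is a sketch under stated assumptions: the function name `final_he_reading` and the set `LONG_VOWEL_LETTERS` are hypothetical, and treating alef, vav and ye as "vowel" letters is a simplification, since vav and ye can also be consonants.

```python
# Letters that usually stand for a long vowel: alef, alef-madda, vav, ye.
LONG_VOWEL_LETTERS = {"\u0627", "\u0622", "\u0648", "\u06CC"}
HE = "\u0647"

def final_he_reading(word):
    """Guess the reading of a word-final he: 'h' after a vowel letter, 'é' otherwise."""
    if len(word) < 2 or word[-1] != HE:
        return None  # the rule only applies to words ending in he
    return "h" if word[-2] in LONG_VOWEL_LETTERS else "é"

KUH = "\u06A9\u0648\u0647"                 # kuh 'mountain'
SE = "\u0633\u0647"                        # sé 'three'
MAHANE = "\u0645\u0627\u0647\u0627\u0646\u0647"  # mâhâné 'monthly'

for word in (KUH, SE, MAHANE):
    print(word, final_he_reading(word))
```

A heuristic like this cannot separate beh (quince) from bé (to), which are spelled identically, so the ambiguity on the slide remains a matter of context.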
Writing System
Difficult things:
• Missing diacritics
• Ambiguous letters (he and vav)
• Sound to spelling correspondence (dictation) is ambiguous: e.g., /z/ can map to several letters
  ز ذ ض ظ = /z/   س ص ث = /s/   ه ح = /h/
• But simplified pronunciation
Writing System
Easy things:
• Alphabet system (unlike Chinese)
• Regular, one-to-one correspondence of letter to sound overall (more than French or English!)
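The contrast between easy reading and hard dictation can be made concrete with the letter groups listed under the difficult things. The sketch below is illustrative only; the table covers just those nine letters, and the names `LETTER_TO_SOUND` and `sound_to_letters` are assumptions.

```python
# Letter -> sound for the groups named on the slide (escapes keep the source ASCII-safe).
LETTER_TO_SOUND = {
    "\u0632": "z", "\u0630": "z", "\u0636": "z", "\u0638": "z",  # four letters read /z/
    "\u0633": "s", "\u0635": "s", "\u062B": "s",                 # three letters read /s/
    "\u0647": "h", "\u062D": "h",                                # two letters read /h/
}

def sound_to_letters(table):
    """Invert the reading table: each sound maps to every letter that can spell it."""
    inverse = {}
    for letter, sound in table.items():
        inverse.setdefault(sound, []).append(letter)
    return inverse

CANDIDATES = sound_to_letters(LETTER_TO_SOUND)
print({sound: len(letters) for sound, letters in CANDIDATES.items()})
```

Reading is a simple lookup (each letter has one sound), while spelling a heard /z/ means choosing among several candidates, so the correct letter has to be memorized per word.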
Writing System: Handwriting!!
• Forms of the letters can be quite different from the typeset (naskh) form
• Used very commonly in writing letters, manuscripts, class notes, but also street signs, ads, at grocery stores
Outline
Language Relatedness
Writing System
Lexicon
Morphology
Syntax
Social issues
Computational linguistics
Lexicon
Difficulties:
• Lack of cognates for English speakers (in contrast with French, Spanish)
• Distinctions that do not exist in English (compare Spanish)
  time: دفعه ‘occasion’ (vez) vs. وقت ‘time in general’ (tiempo) vs. ساعت ‘hour, clock time’ (hora)
  know: دانستن ‘know a fact’ (saber) vs. شناختن ‘be acquainted with’ (conocer)
Lexicon: Language Contact
Persian has been in contact with many languages historically and has many borrowings
• Other Iranian languages (e.g., Parthian)
  اژدها eždehâ (dragon), آژیر âžir (siren), چهره chehre (face), پیغام peyghâm (message), فرشته fereshte (angel)
• Arabic forms a large part of the vocabulary
  سعی کنیم از کلمات عربی استفاده نکنیم
  ‘Let us try not to use Arabic words’
• But watch out for false friends!
  جامعه jâme’e (society), جواب javâb (answer), کنیسا kenisâ (synagogue)
Lexicon: Language Contact
• French words entered the language in the late 19th/early 20th century
  دموکراسی، مونارشی، سندیکا، مرسی، تئاتر، سکسوالیته، نوستالژی، شوفاژ
  démocratie, monarchie, syndicat, merci, théâtre, sexualité, nostalgie, chauffage
• Most new technical words are English
  ایمیل، دانلود، آن لاین، فیلترینگ، کامپیوتر، کلیک کردن
  e-mail, download, online, filtering, computer, click …
  Note: Dari Persian has more English borrowings
• Most common, everyday words have to be learned as new words as they are usually not borrowed
Morphology
Easy Parts:
• Concatenative like English: use of prefixes and suffixes
  Ex. sâz from sâkhtan (‘to build’ or ‘to agree’)
  سازش sâzesh (agreement), نساز nasâz (uncompromising, disagreeing), سازگار sâzegâr (agreeable), ناسازگاری nâsâzegâri (disagreement), سازش پذیری sâzesh paziri (compatibility)
• No gender, case, dual number
• No agreement of adjective or demonstrative with noun
MorphoSyntax
But Persian is more free in suffixation → Words that equal a sentence
  کوچکترین‌هایشانند
  kuchek-tarin-hâ-yeshân-and
  small-Sup-Pl-Pron.3pl-Cop.3pl
  ‘They are the smallest ones of them.’

  پسرخاله‌هامونین
  pesar khâle-hâ-mun-in
  son aunt-Pl-Pron.1pl-Cop.2pl
  ‘You are our cousins.’
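The segmentation and gloss lines of the first example above can be paired up mechanically, which is a handy way to let learners check that each suffix has been accounted for. This is a minimal sketch; the helper name `align_gloss` is hypothetical.

```python
def align_gloss(segmented, gloss):
    """Pair each hyphen-separated morpheme with its gloss label."""
    morphemes = segmented.split("-")
    labels = gloss.split("-")
    if len(morphemes) != len(labels):
        raise ValueError("segmentation and gloss have different lengths")
    return list(zip(morphemes, labels))

pairs = align_gloss("kuchek-tarin-hâ-yeshân-and", "small-Sup-Pl-Pron.3pl-Cop.3pl")
for morpheme, label in pairs:
    print(f"{morpheme:8} {label}")
```

One stem followed by four suffixes yields a complete sentence, which is the point the slide makes about the freedom of Persian suffixation.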
Morphology
Difficult Parts:
• Many forms for the plural
  کتابها ketâbhâ (books), بینندگان binandegân (viewers), کلمات kalamât (words), مسافرین mosâferin (travelers), روحانیون ruhâniyun (clergy)
• Arabic morphology system (اشتقاق)
  Mostly plural and participial forms
  Based on a root template: شعر، شاعر، مشاعره، شعرا، اشعار
• But there are patterns that can be learned
Morphology Patterns: Plurals
Plurals with -ân → Only on Animate nouns
  Mothers     مادران
  Children    کودکان
  Residents   ساکنان
  Iranians    ایرانیان
  Students    دانش‌آموزان
  Animals     جانوران
Plurals with -hâ
  Books       کتابها
  Centuries   قرنها
  Opinions    عقیده‌ها
  Iranians    ایرانیها
  Houses      خانه‌ها
Morphology Patterns: Plurals
Plural ending -gân → nouns ending in ‘e’ (written as he)
  parandegân        پرندگان
  nevisandegân      نویسندگان
  fereshtegân       فرشتگان
  zelzele zadegân   زلزله‌زدگان
Plural ending -yân → nouns ending in vowel ‘â’ or ‘u’
  âshnâyân          آشنایان
  dâneshjuyân       دانشجویان
  binavâyân         بینوایان
  zurguyân          زورگویان
Plural ending -ân → nouns ending in consonant
  dustân            دوستان
  farzandân         فرزندان
  mehmânân          مهمانان
  zanân             زنان
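The three endings above follow from the last sound of the noun, so the choice can be stated as a short rule. The sketch below works on transliterations; the function name `animate_plural` is hypothetical, and the singular stems in the usage example are inferred from the plural forms on the slide.

```python
def animate_plural(noun):
    """Pick -gân, -yân or -ân from the final sound of a transliterated animate noun."""
    if noun.endswith("e"):
        return noun + "gân"        # final 'e' (written as he)
    if noun.endswith(("â", "u")):
        return noun + "yân"        # final vowel 'â' or 'u'
    return noun + "ân"             # final consonant

for singular in ("parande", "fereshte", "âshnâ", "dâneshju", "dust", "zan"):
    print(singular, "->", animate_plural(singular))
```

The rule only covers animate nouns taking the -ân family of endings; inanimate nouns take -hâ, and Arabic loans may follow other patterns.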
Morphology Patterns: Plurals
Arabic root template: xyz → axyaz (in the script, an alef is added before the first and before the last root letter: فکر → افکار; E stands for the letter ع)
Singular form                     Plural form
  thought      fkr    فکر         thoughts      afkar    افکار
  wave         mvj    موج         waves         amvaj    امواج
  individual   frd    فرد         individuals   afrad    افراد
  tribe        qvm    قوم         tribes        aqvam    اقوام
  poem         shEr   شعر         poems         ashEar   اشعار
  Arab         Erb    عرب         Arabs         aErab    اعراب
  goal         hdf    هدف         goals         ahdaf    اهداف
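The template xyz → axyaz can be applied mechanically to the three root consonants. This is a sketch of that one pattern only, using the slide's transliteration (E for ayn); the function name `broken_plural` is hypothetical, and many Arabic loans take other plural patterns that this does not cover.

```python
def broken_plural(root):
    """Apply the template xyz -> axyaz to a sequence of three root consonants."""
    if len(root) != 3:
        raise ValueError("expected three root consonants")
    x, y, z = root
    return f"a{x}{y}a{z}"

ROOTS = {
    "thought": ["f", "k", "r"],
    "wave": ["m", "v", "j"],
    "poem": ["sh", "E", "r"],   # 'sh' is one consonant, so roots are given as lists
    "Arab": ["E", "r", "b"],
}

for meaning, root in ROOTS.items():
    print(meaning, "".join(root), "->", broken_plural(root))
```

Passing the root as a list rather than a string keeps digraphs such as sh from being split into two consonants.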
Knowledge of Arabic
Can help in:
• Writing / Reading
• Vocabulary
• Morphology based on Arabic patterns
But Persian syntax is very different from Arabic
Syntax: Word Order
• Word order different from English: verb-final
• Scrambling allowed (free word order)
• Very long sentences in print
• Linking element (ezafe) is not written → makes it hard to identify boundaries
Source: Hamshahri
رئیس‌جمهوری که در شب عاشورای حسینی در جمع دانشجویان عزادار كوي دانشگاه تهران حضور یافته بود، در سخنانی که واحد مرکزی خبر، آن را مخابره کرده است، افزود: موضوع دیگري که آنها مي خواستند به ایران تحمیل کنند، این بود که در ایران هرگونه اقدامي براي حرکت در مسیر فناوري هسته اي مي‌بایست قبلا به اطلاع قدرتهاي غربي برسد و ملت ایران این کار را بدتر از قرارداد ترکمنچاي تشخیص داد و آن را رد کرد که این امر بدون یاري و قدرت الهي امکان نداشت.
The President, that on the night of Hosseini Ashura was present among the group of the mourning students of Tehran University Street, in a speech that the central news unit has broadcast added: Another matter that they wanted to impose on Iran was that any step in Iran moving towards nuclear technology must first be notified to the Western powers and the people of Iran recognized this issue as worse than the Treaty of Torkmanchay and rejected it, which could not have been possible without divine help and power.
Syntax: Word Order
رئیس‌جمهوری که در شب عاشورای حسینی در جمع دانشجویان عزادار كوي دانشگاه تهران حضور یافته بود، ...
president that on night Ashura-of Hosseini in group university-students mourning street university tehran presence had found, …
The President, that on the night of Hosseini Ashura was present among the group of the mourning students of Tehran University Street, …
Syntax: Word Order
Noun Phrase word order is quite structured
  این دو تا کتاب کهنه خودش رو بهم هدیه داد.
  this two CL book old his-own OBJ to-me gift gave
  ‘He gave me these two old books of his as a gift.’
- Det Num Cl Noun Adjectives Possessor
- Elements after the Noun are linked by the ezafe
- Elements before the Noun have no ezafe
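The linking behaviour of the ezafe can be sketched on transliterated words. This is a simplified illustration, not a full phonological rule: the function name `link_with_ezafe`, the vowel list, and the transliterations ketâb, kohne and Payâm are assumptions made for the example.

```python
# Simplified: the ezafe surfaces as -ye after these final vowels and as -e otherwise.
FINAL_VOWELS = ("â", "e", "o", "u")

def link_with_ezafe(words):
    """Join a noun and its following elements, marking the ezafe on all but the last."""
    linked = []
    for word in words[:-1]:
        suffix = "-ye" if word.endswith(FINAL_VOWELS) else "-e"
        linked.append(word + suffix)
    linked.append(words[-1])
    return " ".join(linked)

# Noun + adjective + possessor: 'Payam's old book'
print(link_with_ezafe(["ketâb", "kohne", "Payâm"]))
```

The markers inserted here are exactly what the script usually leaves unwritten, so a reader (or a parser) has to infer where each one belongs.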
Syntax: Ezafe
• Noun + modifiers + possessor
  کتاب کهنه پیام ‘Payam’s old book’
• Preposition + complement
  روی کتاب کهنه پیام ‘on Payam’s old book’
• Proper names
  احمد شاملو ‘Ahmad Shamlu’
• Geographic names
  دریای مازندران ‘Caspian Sea’
  کوه دماوند ‘Damavand Mountain’
Syntax: râ
Specificity, rather than definiteness, is marked
  دیشب کتاب خوندم ‘I read a book/books last night’
  دیشب کتاب رو خوندم ‘I read the book last night’
  من دیشب یه کتاب خوندم ‘I read a book last night’
  من دیشب یه کتاب رو خوندم ‘I read a (specific) book last night’
• Specific direct objects are marked by را ‘râ’ (رو ‘ro’)
• Absence of article does not mean indefinite or definite → bare nouns have ambiguous roles
Syntax: Light verb constructions
Persian sentence → Word for word translation (idiomatic meaning)
  آب به جوش آمد. → Water to boil came. (‘The water came to a boil.’)
  دوستش رو به گریه انداخت. → His friend to cry threw. (‘He made his friend cry.’)
  من پنجره را باز کردم. → I window open made. (‘I opened the window.’)
  سوپرمن اورا نجات داد. → Superman her rescue gave. (‘Superman rescued her.’)
  چقدر تو غصه میخوری! → How much you worry are eating! (‘How much you worry!’)
Syntax: Light verb constructions
keshidan ‘pull, drag’ → Focus on the duration of the event
  be in pain        درد کشیدن
  to take pains     زحمت کشیدن
  to wait           انتظار کشیدن
  to scream         داد کشیدن
  to be ashamed     خجالت کشیدن
  to yell           فریاد کشیدن
  to suffer         رنج کشیدن
  to last           طول کشیدن
khordan ‘eat, collide’ → Subject is affected (negatively)
  worry             غصه خوردن
  catch a cold      سرما خوردن
  be deceived       گول خوردن
  be slapped        سیلی خوردن
  be beaten         کتک خوردن
  be shot           تیر خوردن
  be defeated       شکست خوردن
zadan ‘hit’ → Repetitive event using an instrument
  comb              شانه زدن
  brush teeth       مسواک زدن
  sweep             جارو زدن
  whip              شلاق زدن
  stab              چاقو زدن
  pedal             پا زدن
  beat (w/ wood)    چوب زدن
  wax               واکس زدن
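Because each light verb carries a recurring shade of meaning, a learner's vocabulary list can be organized by verb rather than memorized item by item. The sketch below is a hedged illustration: the transliterations of the nonverbal elements are assumptions, only a few entries from the slide are included, and the names `LVC`, `TENDENCY` and `group_by_light_verb` are hypothetical.

```python
from collections import defaultdict

# (nonverbal element, light verb) -> gloss; transliterations are illustrative.
LVC = {
    ("dard", "keshidan"): "be in pain",
    ("tul", "keshidan"): "last",
    ("sarmâ", "khordan"): "catch a cold",
    ("shekast", "khordan"): "be defeated",
    ("mesvâk", "zadan"): "brush teeth",
    ("jâru", "zadan"): "sweep",
}

# Semantic tendency of each light verb, as stated on the slide.
TENDENCY = {
    "keshidan": "focus on the duration of the event",
    "khordan": "subject is affected (negatively)",
    "zadan": "repetitive event using an instrument",
}

def group_by_light_verb(entries):
    """Collect constructions under their light verb."""
    groups = defaultdict(list)
    for (element, verb), gloss in entries.items():
        groups[verb].append((element, gloss))
    return dict(groups)

for verb, items in group_by_light_verb(LVC).items():
    print(verb, "(" + TENDENCY[verb] + "):", items)
```

Keying the table on the pair rather than on a single string also reflects that the two parts of a construction can be separated in a sentence.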
Syntax: Interference from English
Interference: Transference of elements of one language to another
• Embedded questions (* marks an ungrammatical sentence)
  *میخوام بدونم اگه میتونی بیای.
  ‘I want to know if you can come’
• Preposition subcategorization
  Write it in English    به انگلیسی بنویسید
  I am on the phone      پا تلفنم
  We arrived on time     ما سر ساعت رسیدیم
  He used your book      از کتابت استفاده کرد
Syntax: Subjunctive mood
Used in more contexts than e.g. Romance languages
• Possibility, probability, necessity, ability
  ممکنه بخوابم ‘I may sleep’
  باید بخوابم ‘I must sleep’
  نمیتونم بخوابم ‘I can’t sleep’
• Desire, will, preference, hope, command
  میخواد بخوابم ‘He wants that I sleep’
  اجازه میدی بخوابم؟ ‘Would you allow me to sleep?’
• Doubt implied
  اگه بخوابم سرحالتر میشم ‘If I sleep I’ll feel better’
  فکر کنم خوابیده باشه ‘I think he may be asleep’
• Expressions of emotion
  میترسم بخوابم ‘I am afraid I might sleep’
• Adjectival clauses
  هرکسی که بخوابه کباب نمیگیره ‘Whoever sleeps will get no kabob’
• Purpose clause
  رفتم خانه که بخوابم ‘I went home to sleep’
• Deliberative interrogative
  آخه کی بخوابم؟ ‘Well, when should I sleep?’
• Temporal expressions
  قبل از اینکه بخوابم ... ‘Before I sleep …’
Social Issues: Taarof
Taarof: Honorific system is quite complex and affects morphology and syntax
• Pronoun usage
  2nd person: تو vs. شما
  3rd person: او vs. ایشان
• Verb choice
  come: بیاین ← تشریف بیارین
  as I told you: همونجور که بهتون گفتم ← همونجور که خدمت شما عرض کردم
Social Issues: Diglossia
Diglossia: Two distinct varieties of the language coexist in society
• Literary variant (used in newsprint, official documents, and literature)
• Conversational variant (used for everyday conversations, writing letters, in some weblogs, modern literature)
Most textbooks teach only the literary variant
What is the goal of the learner?
• To speak with people?
• To give lectures? To study literature?
• To read and analyze newsprint only? Blogs? Letters?
Teaching only the literary language has shortcomings:
1. Students learn obsolete things
                     Literary                       Conversational
  Future             خواهم رفت khâham raft          می رم miram               ‘I will go’
  (Simple past)      شام خوردم shâm khórdam         شام خوردم shâm khórdam    ‘I ate dinner’
  Present Perfect    شام خورده‌ام shâm khordé am     شام خوردم shâm khordám    ‘I have eaten dinner’
Teaching only the literary language has shortcomings:
2. Students don’t learn conversational forms
• Definite marker (Conversational): مرده marde ‘the man’
• Verb forms
  Literary                    Conversational
  بگویم beguyam               بگم begam            ‘I say’ [subj]
  می‌آیند mi-âyand             میان miân            ‘they are coming’
  می‌اندازد mi-andâzad         میندازه mindâze       ‘she’s throwing’
• Use of pronominal clitics:
  دادم بهش dâdam behesh ‘I gave it to him’
  ازش
• Free word order
  عین خر داره خون میاد
  هیچوقت حس استیصال بهم دست نداده بود تهران
Computational Linguistics
Study of language from a computational perspective
Components
• Analyze the word form so you can look up the word automatically in a dictionary or use it to search documents containing variants of that word
• Parse sentences by analyzing their structures and identifying their constituents
• Translate sentences into e.g., English → requires all of the above as well as semantic analysis to disambiguate word choices.
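The first component, analyzing a word form so that it can be looked up in a dictionary, can be sketched as simple suffix stripping on transliterated words. This is a toy illustration under stated assumptions, not the approach of any particular system: the suffix table, the three-word lexicon, and the names `SUFFIXES`, `LEXICON` and `analyze` are all made up for the example.

```python
# Toy suffix inventory and lexicon (illustrative, far from complete).
SUFFIXES = {"hâ": "Pl", "tarin": "Sup", "tar": "Comp", "ân": "Pl"}
LEXICON = {"kuchek": "small", "ketâb": "book", "dust": "friend"}

def analyze(word, lexicon=LEXICON, suffixes=SUFFIXES):
    """Strip known suffixes from the right until the remainder is a dictionary entry."""
    tags = []
    stem = word
    while stem not in lexicon:
        for suffix, tag in suffixes.items():
            if stem.endswith(suffix) and len(stem) > len(suffix):
                stem = stem[: -len(suffix)]
                tags.insert(0, tag)   # keep tags in left-to-right order
                break
        else:
            return None               # no suffix matched: no analysis found
    return stem, tags

for form in ("kuchektarinhâ", "ketâbhâ", "dustân"):
    print(form, "->", analyze(form))
```

Once the stem is recovered, the same entry can serve both dictionary lookup and searching documents for variants of the word.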
Computational Linguistics
Using computational linguistics in the language classroom
• Intelligent tutoring systems: automatically detect errors in web-based exercises and provide feedback to learners
• Finding language patterns: can be used by teachers or learners to detect language patterns in authentic media (e.g., conversational language in weblogs)
• Automatic classifiers: can help teachers find relevant material for a lesson
• Evaluation: is the learner’s writing improving?
Persian has high web presence and is among top 10 blog languages in the world
Issues of Persian for Computational Approaches
• Writing system
• Detecting noun phrase boundaries (where is the ezafe?)
• Morphological patterns
• Word order issues (free word order)
• Light verb constructions (especially when separated)
• Ambiguities in the lexicon
• Conversational Persian forms (very different from Literary variant)
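One simple way to cope with the last issue is to map conversational forms to their literary counterparts before dictionary lookup. The sketch below uses only the transliterated verb pairs shown in the talk; the table and the function name `to_literary` are assumptions, and a real system would need rules rather than a fixed list.

```python
# Conversational -> literary verb forms from the talk (transliterated).
CONVERSATIONAL_TO_LITERARY = {
    "begam": "beguyam",
    "miân": "mi-âyand",
    "mindâze": "mi-andâzad",
}

def to_literary(tokens, table=CONVERSATIONAL_TO_LITERARY):
    """Replace known conversational forms; leave every other token untouched."""
    return [table.get(token, token) for token in tokens]

print(to_literary(["begam", "ketâb", "mindâze"]))
```

Tokens missing from the table pass through unchanged, so the step can be added in front of an existing literary-language analyzer without breaking it.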
Conclusion
• Studied factors that make Persian more or less difficult for speakers of English
  writing system, morphology, lexicon, syntax, social issues
• Emphasis has been placed on communicative approaches to the detriment of explanations
• What may seem difficult (e.g., plurals, Arabic morphology, light verb constructions) often follows patterns that can be generalized → no need to learn by heart necessarily
• Computational linguistics can be used in the classroom and on authentic media to discover language patterns or to generate feedback