Building up Corpus of Technical Vocabulary – Strategies and Feasibility

Presenters: Dr. Aparna Palle, Preetha Anthony
GNITS, HYDERABAD
An overview of the presentation
• Introduction
• Theoretical premise
• Interfacing of ESP and Corpora
• Criteria for selection of words
• Web Tools
• The Corpus
• Classroom techniques
• Conclusion
What is a Corpus?
• Corpora or corpuses are simply large collections
or databases of language, incorporating stretches
of discourse ranging from a few words to entire
books. (Norbert Schmitt, 2000).
• A corpus is a collection of naturally occurring
texts that is usually stored on a computer. (Randi
Reppen, 2011).
• A corpus is a large collection or database of
machine-readable texts involving natural
discourse in diverse contexts. (Bernardini, 2000)
Definition
• A Corpus is an inventory of essential language
inputs drawn from authentic contexts using
web tools.
Why Corpus?
• Emphasis on the specific needs of the learners of
professional courses.
• Limited vocabulary to perform academic tasks.
• Lack of knowledge of specialised vocabulary.
• Corpus data provide descriptive insights relevant to
how people use language.
• Acts as a tool that enables students and instructors to
analyse both how people use different language
forms at various levels of formality and how language
fulfils multiple speech functions across contexts.
Why Corpus? (contd.)
• Learning activities centred on analysing corpus data
are consistent with current principles of language-learning
theory: students develop more autonomy when they receive
guidance about how to observe language and make generalizations.
• Such activities promote noticing and grammatical
consciousness raising (Schmidt 1990), which can
enhance second language learning and development.
Word-building criteria
• Frequency and Range
• Keyword in context
• Collocation
• Homonymy
• Word families
• Idioms and set expressions, etc.
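The first two criteria can be illustrated with a minimal Python sketch. It assumes that "frequency" means the total number of occurrences of a word and "range" means the number of texts in which it occurs; the function names (`tokenize`, `frequency_and_range`, `kwic`) and the sample sentences are hypothetical and are not taken from the presenters' corpus or from any of the web tools.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def frequency_and_range(texts):
    """Return (frequency, text_range) counters over a list of texts.

    frequency[w]  = total occurrences of w across all texts
    text_range[w] = number of texts in which w occurs at least once
    """
    frequency = Counter()
    text_range = Counter()
    for text in texts:
        tokens = tokenize(text)
        frequency.update(tokens)
        text_range.update(set(tokens))
    return frequency, text_range

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines with `width` words on either side."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

# Hypothetical sample "chapters"; a real corpus would be built from the source text.
chapters = [
    "A variable is a named location in memory. Each variable has a type.",
    "A loop repeats a statement. The loop variable controls the loop.",
]
freq, rng = frequency_and_range(chapters)
print("variable:", freq["variable"], "occurrences in", rng["variable"], "texts")
print("loop:", freq["loop"], "occurrences in", rng["loop"], "texts")
for line in kwic(chapters[1], "loop"):
    print(line)
```

In this toy sample "variable" and "loop" are equally frequent, but "variable" has the wider range, which is the kind of contrast the frequency and range criterion is meant to expose.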
Web tools
• AWL Highlighter
• British National Corpus (BNC)
• Collins Cobuild Corpus Concordance Sampler
• Compleat Lexical Tutor
• Corpus.BYU.edu
• Corpus of Contemporary American English (COCA)
• WordSmith
Source: Materials Development in Language Teaching, ed. Brian Tomlinson (1998)
AWL Highlighter
Corpus of Computer Programming Word List (CCPWL)
Source from which the Corpus was extracted
"C: The Complete Reference" by Herbert Schildt
Distinguishing Technical Vocabulary (Computing) from others
Category 1 (Purely technical): The word form appears rarely, if at all, outside this particular field.
Examples: debug, operand, recompile, loop
Category 2 (Homonyms, specialised): The word form is used both inside and outside this particular field, but not with the same meaning.
Examples: characters, flag, error, default, constants
Category 3 (Homonyms, general): The word form is used both inside and outside this particular field, but the majority of its uses with a particular meaning, though not all, are in this field. The specialised meaning it has in this field is readily accessible through its meaning outside the field.
Examples: variable, parameter, input, output, prefix, code
Category 4 (Literal meaning): The word form is more common in this field than elsewhere. There is little or no specialisation of meaning, though someone knowledgeable in the field would have a more precise idea of its meaning.
Examples: manuals, memory, application, functions
Filling Word Parts
Noun            Verb         Adjective     Adverb
Compatibility   Programme    Incremental   Variously
Cutting up complex words
Words: Decode, Encode, Debugging, Relocation
Meanings:
• a methodical process of finding and reducing the number of defects in a computer program or a piece of electronic hardware
• the process of assigning load addresses to various parts of a program and adjusting the code and data in the program to reflect the assigned addresses
• the process of putting a sequence of characters (letters, numbers, punctuation, and certain symbols) into a specialized format for efficient transmission or storage
• the conversion of an encoded format back into the original sequence of characters
Meanings of the Prefixes:
Re-: again
En-: also
De-: down, away, completely; removal, reversal
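Cutting a word into prefix and stem can be sketched in Python as follows. This is a naive illustration, not a morphological analyser: the prefixes are the three listed on the slide, while `KNOWN_STEMS` is a hypothetical hand-made list used only to avoid false splits such as "de" + "fault".

```python
# Prefixes taken from the slide; the stem list is a hypothetical stand-in
# for a checked list of base words.
PREFIXES = ("re", "en", "de")
KNOWN_STEMS = {"code", "bug", "bugging", "location", "compile"}

def split_prefix(word):
    """Return (prefix, stem) if the word is a listed prefix plus a known stem,
    otherwise (None, word)."""
    lowered = word.lower()
    for prefix in PREFIXES:
        stem = lowered[len(prefix):]
        if lowered.startswith(prefix) and stem in KNOWN_STEMS:
            return prefix, stem
    return None, lowered

for word in ["Decode", "Encode", "Debugging", "Relocation", "default"]:
    print(word, "->", split_prefix(word))
```

Checking the stem against a word list is a deliberate choice here: stripping prefixes by spelling alone would also cut words that merely begin with the same letters.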
Choosing the Correct Form
Learning C is similar and ____ (easy). Instead of
straight-away ____ (learn) how to write
programs, we must first know what alphabets,
numbers and special symbols are ____ (use) in C,
then how ____ (use) them constants, variables
and keywords are ____ (construct), and ____
(final) how are these ____ (combine) to form an
____ (instruct).
Strengthening the Form – Meaning Connection
Words: Manual, Syntax, Default
Definitions:
• a value automatically assigned
• a well-structured collection of information for reference
• the set of rules that defines the combinations of symbols
Answering questions
• Qn. Differentiate between a syntax error and a
semantic error.
• Ans. A syntax error is an error in the form of the code
or statement, one that breaks the rules of the language.
A semantic error basically means invalid logic: the code
follows the rules but does not do what was intended.
• Qn. What is the difference between a character
array and an integer array?
• Ans. A character array stores a sequence of characters,
whereas an integer array stores a sequence of
integers.
Defining in the second language
(a) Term (b) class (c) defining characteristics
(a) A character constant is (b) either a single alphabet,
a single digit or a single special symbol (c) enclosed
within single inverted commas.
(a) A variable in C is (b) a quantity which may vary (c)
during program execution.
(a) Keywords are (b) the words whose meaning has
already been explained (c) to the C compiler.
Conclusion
• Writing skills of the learners would be enhanced with the
appropriate use of technical vocabulary.
• Teaching of vocabulary becomes meaningful, enhancing the learners'
academic writing.
• The learners would be able to produce better answers using the
words from the corpus, which fulfils the end result from the examination point of view.
• Learner autonomy is enhanced.
• Learners become confident in their discourse with the professional community.