Verb Sense Disambiguation Tiago Manuel Paulo Travanca

Verb Sense Disambiguation
Tiago Manuel Paulo Travanca
Advisors: Prof. Nuno Mamede and Prof. Jorge Baptista
Instituto Superior T´ecnico
Spoken Language Systems Laboratory – INESC-ID Lisboa
Rua Alves Redol 9, 1000-029 Lisboa, Portugal
[email protected]
Abstract. This work addresses the problem of Verb Sense Disambiguation (VSD) for the European Portuguese. VSD is a sub-problem of the
Word Sense Desambiguation (WSD) problem, which tries to identify the
sense of a polysemic word used in a given sentence.
Two approaches to VSD were considered. The first, rule-based, approach
makes use of the lexical, syntactic and semantic descriptions of the verb
senses present in a lexical resource (ViPEr). The second approach uses
machine learning with a set of features commonly used in WSD.
Both approaches were tested in several scenarios to determine the impact
of different features and different combinations of methods. The baseline
accuracy of 84%, resulting from the Most Frequent Sense (MFS) for
each verb lemma, was both surprisingly high and hard to surpass. Still,
both approaches provided some improvement over this value. The best
combination of the three techniques yielded an accuracy of 87.2%, a gain
of 3.2% above the baseline.
1
Introduction
Nowadays, Natural Language Processing (NLP) has many applications: search
engines (semantic/topic search rather than word matching), automated speech
translation, automatic summarization, among others. Still, one problem that all
NLP systems have to deal with is ambiguity, i.e. the fact that a certain expression
can be interpreted in more than one way. Ambiguity appears at different stages
in the processing of a text or a sentence, such as, tokenization, sentence-splitting,
part-of-speech (POS) tagging, syntactic parsing and semantic processing.
This work addresses semantic ambiguity of verbs. Considering that the sentence has already been parsed and, even if its syntactic analysis (parse tree)
is unique and correct, some words may feature more than one meaning for the
grammatical category they were tagged with. For example, sentences (1a) and
(1b) show how the verb to count, used in the same position on both sentences,
means enumerate on the first sentence while it stands for rely on the second:
(1a) I am counting the number of students
(1b) I am counting on you to help me
2
In these examples, the meaning of count in (1a) results from its directtransitive construction and plural object, while in (1b) the preposition introducing the second argument and the human trait of that same argument of
count determine the verb’s sense.
To tackle this problem, two modules were developed: one, rule-based, using a
set of linguistic resources, specifically built for Portuguese (ViPEr); and another,
based on machine learning, which makes use of a set of features, commonly
adopted for the WSD problem, to correctly guess the verb sense. Both modules
were integrated in a fully-fledged NLP system, STRING [1].
2
Related Work
2.1
Lexical Resources
Many works on WSD [2,3,4] rely on existing linguistic resources as catalogues
of word senses (and their relations). For English, one can cite WordNet [5],
FrameNet [6,7,8] and VerbNet [9]. For (European) Portuguese, though other
resources have been built they are not yet freely available to the general public.
We thus considered using ViPEr [10], which features a classification based on
the syntactic-semantic properties of the most frequently occurring verbs.
In the particular case of VSD, the most important features to be considered
are formal features, such as the syntactic patterns of the sentence forms, and
the distributional constraints resulting from the selectional restrictions of the
verb on its arguments. Besides the resource here mentioned, there are other
known sources with linguistic information on verbal constructions of Portuguese.
Among others, the most relevant (and recent) are described in [11] and [12], each
developed in a different theoretical framework. However, none of these works
have a machine readable form, thus, they cannot be automatically processed.
2.2
Word Sense Disambiguation Approaches
An overview of the most used algorithms and features for WSD was conducted,
based on the systems evaluated at the SensEval31 [13] All-Words and Lexical
Sample tasks (2004). The most commonly used techniques [14] were the Naive
Bayes algorithm, Decision Lists, Vector Space Model (with cosine similarity function) and Support Vector Machines; other methods, such as Maximum Entropy
models, Adaptative boosting and Latent Semantic Analysis, were also used by
some systems.
The majority of systems proposed at SensEval3 divided the features used to
describe the instances into four different groups, as follows:
– Collocations: n-grams (usually bigrams or trigrams) around the target word,
composed by the lemma, word-form and part-of-speech tag of each word.
– Syntactic dependencies: dependencies connecting the surrounding words to the
target word. Typically, the relations of subject, object and modifier are used.
1
http://www.senseval.org/
3
– Surrounding context: words in a defined window size are collected and used in
a bag-of-words approach. Usually, the word form and lemma are extracted;
– Knowledge-Based information: some systems make use of information such as
WordNet domains, or FrameNet syntactic patterns, for example.
The results obtained by the best systems at the SensEval3 conference for the
English All-Words and the Lexical Sample tasks were 65.1% and 72.9% precision,
respectively.
3
Rule-based Disambiguation
Mappings
Verb
Lexicon
Rules
Rule
Sorting
Differences
Rule
Generation
Verb List
Difference
Finder
ViPEr
Pre
Processing
The large number of verbs involved and the complexity of the rule generation
task required the construction a specific module for rule generation, whose architecture is illustrated in Figure 1.
Disamb.
Rules
Feature
Priority
Fig. 1. Rule Generation Module Architecture
The first step in the rule generation process, the Pre-Processing, takes as its
input the lexical resource information (ViPEr) and converts the linguistic information to an internal structure that represents each verb as a set of meanings.
Each meaning, in turn, is represented as a collection of features and their possible values. This module is also responsible for producing the lexicon file (the
verb lemmas and their meanings) used by the grammar of XIP [15], the parsing
module of STRING.
After the parsing is complete, the system passes the processed information
to the next module: the Difference Finder. This module compares the different
meanings of the same lemma and generates a difference for each argument found
different between each pair of meanings.
The rule generation module is the last transformational step in the system.
The module transforms differences into rules, with each difference usually generating two rules, one for each verb meaning contained in that difference. This
step requires an additional parametrization file, mappings, to make the bridge
between the features used in ViPEr and those provided by XIP.
4
Finally, the system sorts the rules according to the feature/trait being tested
in each of them, since XIP executes the rules in the order they are present in
the file. This order favours, firstly, the properties related to structural patterns
in sentences, followed by several semantic/selectional restrictions, according to
their strength.
The base features considered by the system are the selectional restrictions on
the verb’s arguments (N0 to N3 ) and their respective prepositions, the property
vse, denoting intrinsically reflexive constructions (e.g. queixar-se ‘complain’) and
transformational properties regarding the most common two types of passives
(with auxiliary verbs ser and estar ‘be’).
Additionally, two other properties were added to ViPEr during this work and
are used by the rule generation system. The first, vdic property, indicates which
verb meanings (usually only one per lemma) allows the verbum dicendi construction pattern, with subject inversion. Then, the aNi=lhe feature, indicating the
possibility of a dative pronominalization for some of the verb arguments.
4
Machine Learning Disambiguation
The architecture of the supervised classification approach, presented in Figure 2,
uses MegaM [16] as the machine learning module, which is based on maximum
entropy models [17]. The typical feature extraction and classification modules
of supervised classification systems were both implemented inside XIP (that is,
inside STRING), to allow XIP rules to be built afterwards to solve other NLP
tasks, such as semantic role labeling. In the training phase, STRING takes annotated data (sentences) and extracts the features describing each verb instance
and its respective manually annotated verb class (label). Then, MegaM uses that
information to build a classification model for each verb lemma. In the prediction phase, STRING assigns a verb class (label) to each verb instance in the raw
data, using the models built by MegaM.
Label
STRING
Annotated
Data
MegaM
Classification
Model
Features
STRING
Raw
Data
Labelled
Data
Fig. 2. Supervised Machine Learning Architecture for VSD using STRING
5
4.1
Training Corpus
Due to the time required for manual annotation of the training data, only some
verbs were addressed by this approach during this project. The verbs chosen
were: explicar, falar, ler, pensar, resolver, saber and ver. This choice was based
on the higher number of instances left to disambiguate that these lemmas exhibited after the rule-based testing. Further analysis revealed that these verbs also
formed an heterogeneous group, with different MFS percentage values, different
number of senses and different ambiguity classes.
4.2
Features
The features used by the machine learning module follow those presented in
section 2.2, and can be organized into three groups, as follows:
– Local features, extracted by the system in a window of size 3 around the target
verb, resulting in a total of 6 tokens, with their respective indexes (-3 to +3).
Information regarding the POS tag and lemma of each token was considered.
– Syntactic features, regarding constituents with a direct relation with the verb.
The dependencies used in the system are those of SUBJ (subject), CDIR (direct
complement) and MOD (modifier). The POS tag and lemma of the head word
of each node in these relations is extracted, together with the dependency name.
– Semantic features are extracted for the head words of the nodes that have a
direct relation with the verb, as they act as the selectional restrictions on the
verb arguments. Each semantic trait found in the head word of the node is also
accompanied by the dependency name. The semantic features considered by the
system are those defined in ViPEr.
Additional features that do no fit exactly into any of those categories were
also used. These features concern the subject inversion pattern of verbum dicendi
constructions and information regarding the case of clitic personal pronouns.
5
Evaluation
To evaluate the two approaches presented in the previous sections, a manually
annotated corpus had to be built and validated. The corpus chosen for this
evaluation contains around 250 thousand words with, 38,827 verbs manually
disambiguated, from which 21,368 (about 55%) are full verbs. From the full verb
constructions, 13,030 correspond to ambiguous verbs.
The most frequent sense (MFS) for each verb lemma was adopted as the
baseline against which the developed modules were compared and evaluated.
Since the corpus was the only available annotated data, it had to be used both
for counting (to determine the MFS) and for evaluation. To avoid biased results,
the common 10-fold cross-validation method was used. An accuracy of 84% was
achieved by MFS. This is related to the conservative approach underlying the
ViPEr classification, leading to an average lower number of senses per verb when
compared to other lexical resources.
6
5.1
Rules+MFS
This experiment tested three different combinations of the rule-based approach
and the MFS (baseline). In this case, the MFS classifier was applied after the
rule-based module to the remaining non-fully disambiguated instances. In the
first experiment, the disambiguation rules were applied to all the verbs, regardless of their number of senses per lemma, obtaining an accuracy of 79.15%,
below the baseline of 84%. For the second scenario, a finer grained combination was performed, with rule-based disambiguation being applied only to verbs
with 8 or more meanings. This resulted in 84.41% accuracy, which was 0.41
percentage points over the baseline. The last experiment, considered the results
of the previous one, but also included other verb lemmas that had shown positive results with rules, but had fewer meanings. The final result obtained in
the rule-based disambiguation experiments reached the 86.46% accuracy (2.46
percentage points above the baseline).
5.2
Machine Learning Disambiguation
Accuracy
The first machine learning disambiguation experiment tested the impact of using
bias, a special feature calculated by the system during the training step, which
indicates the deviation of the model towards each class. Comparing the two scenarios (with and without bias) against the previously presented methods showed
that, whenever a verb has a high MFS, it is difficult for another approach to surpass it, and only ML-Bias can equal the MFS accuracy. All verbs with low MFS
exhibited some improvement in the results (using ML-NoBias) when compared
to the MFS baseline. Still, for the verb pensar, ML-NoBias is outperformed by
the combination of rules and MFS.
100
90
80
70
60
50
40
30
20
10
0
explicar
falar
MFS
ler
Rules+MFS
pensar
ML-NoBias
resolver
ML-Bias
saber
ver
Rules+ML
Fig. 3. Machine Learning Results Comparison
Looking at the set of classes associated with each lemma, both resolver and
ver share the same ambiguity set (06, 32C), while pensar has more senses (06, 08,
7
16, 32C, 35R, 36R). This suggests that ML might not be able to deal with verbs
that have many senses or that a specific set of classes is better disambiguated
by this technique, though further testing encompassing more verbs should be
performed to verify this finding.
Afterwards, another scenario was considered, to test the performance of ML
as a complementary technique to the rule-based disambiguation, similarly to the
scenario that combined rules and MFS, presented in section 5.1.
Globally, adding ML as a complementary technique to rules proved to be
worse than just using ML for the majority of the verbs here studied. This suggests that the instances being correctly classified by the rules are a subset of the
instances correctly disambiguated by ML, and that rules are incorrectly classifying the remaining instances, which ML would otherwise guess correctly.
This experiment showed that rules and ML were not very complementary,
and that choosing ML alone provides the best improvement to the system. For
the cases where Rules+ML surpasses ML alone, other approaches provide better
results. Further testing of the ML scenarios should be performed to confirm these
findings, namely, using annotated data regarding other verb lemmas and trying
out other combination of methods and other scenarios.
5.3
Rules+MFS+ML
With all the previous information regarding each verb lemma, a final combination using all the techniques was done, choosing the best method for each lemma.
This last experiment obtained the best overall results, settling a new maximum
accuracy of 87.2% for the system, improving 3.2 percentage points above the
baseline.
Finally, a performance test was conducted to assess the impact of the newly
added modules to the global performance of STRING. This test revealed a minimal increase in the global time required for processing (around 0.05%, for 5,000
sentences).
6
Conclusions
This work addressed the problem of verb sense disambiguation for the European Portuguese. Two approaches were considered when tackling this problem:
a ruled-based, and a machine learning disambiguation. The results obtained from
the baseline (MFS) were surprisingly high (about 84% accuracy) and proved to
be hard to surpass. The integration of both modules provided better results to
the system, achieving a final score of 87.2% accuracy, an improvement of 3.2%
above baseline. Certainly, there is still much work to be done in VSD, and the
system here presented can be improved. Nevertheless, we would like to think
that some valid contribution has been made to the domain.
The integral work here presented is available at http://www.inesc-id.pt/pt/
indicadores/Ficheiros/7279.pdf
8
References
1. Mamede, N.J., Baptista, J., Diniz, C., Cabarr˜
ao, V.: STRING: An Hybrid Statistical and Rule-Based Natural Language Processing Chain for Portuguese. In:
Propor 2012, Demo Session (2012) Paper available at http://www.propor2012.
org/demos/DemoSTRING.pdf
2. Buscaldi, D., Rosso, P., Masulli, F.: The upv-unige-CIAOSENSO WSD system.
In Mihalcea, R., Edmonds, P., eds.: Senseval-3: 3rd International Workshop on the
Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain, ACL
(2004) 77–82
3. Shi, L., Mihalcea, R.: Open Text Semantic Parsing using FrameNet and WordNet.
In: Demonstration Papers at HLT-NAACL 2004. HLT-NAACL–Demonstrations
’04, Stroudsburg, PA, USA, ACL (2004) 19–22
4. Brown, S.W., Dligach, D., Palmer, M.: VerbNet class assignment as a WSD task.
In: Proceedings of the 9th International Conference on Computational Semantics.
IWCS ’11, Stroudsburg, PA, USA, ACL (2011) 85–94
5. Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.: WordNet: An
On-line Lexical Database. International Journal of Lexicography 3 (1990) 235–244
6. Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet Project. In:
Proceedings of the 36th Annual Meeting of the ACL and 17th COLING - Volume
1. ACL ’98, Stroudsburg, PA, USA, ACL (1998) 86–90
7. Johnson, C.R., Fillmore, C.J., Wood, E.J., Ruppenhofer, J., Urban, M., Petruck,
M.R., Baker, C.F.: The FrameNet Project: Tools for Lexicon Building (2001)
8. Ruppenhofer, J., Ellsworth, M., Petruck, M.R., Johnson, C.R., Scheffczyk, J.:
FrameNet II: Extended Theory and Practice. International Computer Science
Institute, Berkeley, California (2006) Distributed with the FrameNet data.
9. Kipper, K., Dang, H.T., Schuler, W., Palmer, M.: Building a Class-Based Verb
Lexicon using TAGs. In: TAG+5 5th Workshop on Tree Adjoining Grammars and
Related Formalisms, Paris, France (2000)
10. Baptista, J.: ViPEr: A Lexicon-Grammar of European Portuguese Verbs. In: 31st
International Conference on Lexis and Grammar, Nov´e Hrady, Czech Republic
(2012) 10–16
11. Borba, F.: Dicion´
ario Gramatical de Verbos do Portuguˆes Contemporˆ
aneo do
Brasil. UNESP, S˜
ao Paulo (1990)
12. Busse, W.: Dicion´
ario Sint´
actico de Verbos Portugueses. Almedina, Coimbra
(1994)
13. Mihalcea, R., Edmonds, P., eds.: Senseval-3: 3rd International Workshop on the
Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain, ACL
(2004)
14. Witten, I.H., Frank, E., Hall, M.A.: Data Mining: Practical Machine Learning
Tools and Techniques. 3rd edn. Morgan Kaufmann, Amsterdam (2011)
15. Ait-Mokhtar, S., Chanod, J., Roux, C.: Robustness beyond shallowness: incremental dependency parsing. Natural Language Engineering 8(2/3) (2002) 121–144
16. Daum´e, H.: Notes on CG and LM-BFGS Optimization of Logistic Regression. Paper available at http://pub.hal3.name#daume04cg-bfgs, implementation available at http://hal3.name/megam/ (2004)
17. Berger, A.L., Pietra, S.A.D., Pietra, V.J.D.: A Maximum Entropy approach to
Natural Language Processing. Computational Linguistics 22 (1996) 39–71