Automatic Labeling of Semantic Roles
By Daniel Gildea and Daniel Jurafsky
Presented by Kino Coursey

Outline
• Their Goals
• Semantic Roles
• Related Work
• Methodology
• Results
• Their Conclusions

Their Goals
• To create a system that can identify the semantic relationships, or semantic roles, filled by the syntactic constituents of a sentence and place them into a semantic frame.
• Lexical and syntactic features are derived from parse trees and used to train statistical classifiers on hand-annotated training data.

Potential users
• Shallow semantic analysis would be useful in a number of NLP tasks
• Domain-independent starting point for information extraction
• Word sense disambiguation based on the current semantic role
• Intermediate representation for translation and summarization
• Adding semantic roles could improve parser and speech recognition accuracy

Their Approach
• Treat the role assignment problem like other tagging problems
• Use recent successful methods in probabilistic parsing and statistical classification
• Use the hand-labeled FrameNet database to provide training data: over 50,000 sentences from the BNC
• FrameNet roles define the tag set

Semantic Roles
• Historically two types of roles
    • Very abstract, like AGENT and PATIENT
    • Verb-specific, like EATER and EATEN for “eat”
• FrameNet defines an intermediate, schematic representation of situations, with participants, props and conceptual roles.
• A frame, being a situation description, can be activated by multiple verbs or other constituents

Frame Advantages
• Avoids the difficulty of trying to find a small set of universal, abstract or thematic roles
• Has as many roles as necessary to describe the situation with minimal loss of information or discrimination
• Abstract roles can be defined as high-level roles of abstract frames such as “action” or “motion” at the top of the hierarchy

Example Domains and Frames

Examples of Semantic Roles

Example FrameNet Markup
<CORPUS CORPNAME="bnc" DOMAIN="motion" FRAME="removing" LEMMA="take.v">
<S TPOS="80499932">
<T TYPE="sense1"></T>
<C FE="Agt" PT="NP" GF="Ext">Pioneer/VVB European/AJ0</C> settlers/NN2 used/VVD-VVN several/DT0 methods/NN2 to/TO0
<C TARGET="y"> take/VVI</C>
<C FE="Thm" PT="NP" GF="Obj">land/NN1</C>
<C FE="Src" PT="PP" GF="Comp">from/PRP indigenous/AJ0 people/NN0</C> ./PUN
</S>

Related Work
• Traditional parsing and understanding systems rely on hand-developed grammars
    • Must anticipate the way semantic roles are realized through syntax
    • Time-consuming to develop
    • Limited coverage (human proscriptive recall problem)

Related Work
• Others have used data-driven approaches for template-based semantic analysis in “shallow” systems
• Miller (1996): Air Travel Information System; probability of a constituent filling slots in frames. Each node could have both semantic and syntactic elements
• Data-driven information extraction by Riloff.
  Automatically derived case frames for words in a domain

Related Work
• Blaheta and Charniak used a statistical algorithm for assigning Penn Treebank function tags, with an F-measure of 87%, rising to 99% when ‘no tag’ is a valid choice

Methodology
• Two-part strategy
    • Identify the boundaries of the frame elements in the sentence
    • Given the boundaries, label each with the correct role
• Statistics-based: train a classifier on a labeled training set, then test on an unlabeled test set

Methodology
• Training
    • Trained on 37,000 sentences parsed with the Collins parser
    • Match annotated frame elements to parse constituents
    • Extract various features from the string of words and the parse tree
• Testing
    • Run the parser on test sentences and extract the same features
    • Probability for each semantic role r is computed from the features

Features used
• Phrase Type: standard syntactic type (NP, VP, S)
• Grammatical Function
    • Relation to the rest of the sentence (subject of verb, object of verb…)
    • Limited to NPs
• Position
    • Before or after the predicate defining the frame
    • Correlated with grammatical function
    • Redundant backup information
• Voice: used 10 passive-identifying patterns for active/passive classification
• Head Word: head word of each constituent

Parsed Sentence with FrameNet role assignments

Testing
• FrameNet corpus test set
    • 10% of the sentences for each target word -> test set
    • 10% of the sentences for each target word -> tuning set
    • Target words with fewer than 10 sentences ignored
    • Average number of sentences per target word = 34 [Too SPARSE !!!]
    • Average number of sentences per frame = 732

Sparseness Problem
• Problem: data is too sparse to directly calculate probabilities on the full set of features
• Approach: build classifiers by combining probabilities from distributions conditioned on combinations of features
• Additional problem: FrameNet data was selected to show prototypical examples of semantic frames, not as a random sample for each frame
• Approach: collect more data in the future

Results: Probability Distributions
• Coverage = % of test data seen in training
• Accuracy = % of covered test data correctly predicted (similar to precision)
• Performance = overall % of test data for which the correct role is predicted (similar to recall)

Results: Simple Probabilities
Used simple empirical distributions

Results: Linear Interpolation

Results: Geometric mean in the log domain

Results: Combining data
• Schemes giving more weight to distributions with more data did not have a significant effect
• Role assignments depended only on relative ranking, so fine-tuning makes little difference
• Backoff combination: use less specific data only if more specific data is missing

Results: Linear Backoff was the best
• Final system performance: 80.4%, up from the 40.9% baseline
• Linear Backoff scored 80.4% on the development set and 76.9% on the test set
• Baseline scored 40.9% on the development set and 40.6% on the test set

Results: Their Discussions
• Constituent position relative to target word + active/passive info (78.8%) performed about as well as reading grammatical functions off the parse tree (79.2%)
• Using active/passive info can improve performance from 78.8% to 80.5%. 5% of examples were passives
• Lexicalization via head words, when available, is good
    • P(role|head,target) is available for only 56.0% of data
    • P(role|head,target) is 86.7% correct without using any syntactic features.
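
The two combination schemes named in the Results slides above (linear interpolation and backoff) can be illustrated with a small sketch. This is not the authors' implementation: the class `RoleDistribution`, the helper functions, the interpolation weights and the four toy training examples are all assumptions made for illustration, and only two conditioning sets, P(role|head,target) and P(role|pt,target), are modeled.

```python
from collections import Counter, defaultdict


class RoleDistribution:
    """Empirical P(role | features), estimated from raw counts (hypothetical helper)."""

    def __init__(self, feature_names):
        self.feature_names = feature_names
        self.counts = defaultdict(Counter)

    def _key(self, example):
        return tuple(example[name] for name in self.feature_names)

    def observe(self, example, role):
        self.counts[self._key(example)][role] += 1

    def probs(self, example):
        """Return {role: probability}, or None if this condition was never seen."""
        seen = self.counts.get(self._key(example))
        if not seen:
            return None
        total = sum(seen.values())
        return {role: n / total for role, n in seen.items()}


def interpolate(distributions, weights, example):
    """Linear interpolation: weighted sum of every distribution that has data."""
    combined = Counter()
    for dist, weight in zip(distributions, weights):
        p = dist.probs(example)
        if p is not None:
            for role, value in p.items():
                combined[role] += weight * value
    return dict(combined)


def backoff(distributions, example):
    """Backoff: use the first (most specific) distribution that has data."""
    for dist in distributions:
        p = dist.probs(example)
        if p is not None:
            return p
    return {}


def best_role(probs):
    """Only the relative ranking of roles matters for the final assignment."""
    return max(probs, key=probs.get) if probs else None


# Toy training data (invented, loosely modeled on the "take" markup example).
training = [
    ({"head": "settlers", "target": "take", "pt": "NP"}, "Agt"),
    ({"head": "land", "target": "take", "pt": "NP"}, "Thm"),
    ({"head": "farm", "target": "take", "pt": "NP"}, "Thm"),
    ({"head": "from", "target": "take", "pt": "PP"}, "Src"),
]
specific = RoleDistribution(["head", "target"])  # accurate but sparse
general = RoleDistribution(["pt", "target"])     # less accurate, better coverage
for example, role in training:
    specific.observe(example, role)
    general.observe(example, role)

seen = {"head": "land", "target": "take", "pt": "NP"}
unseen = {"head": "village", "target": "take", "pt": "NP"}
print(best_role(backoff([specific, general], seen)))
print(best_role(backoff([specific, general], unseen)))
print(interpolate([specific, general], [0.7, 0.3], seen))
```

For the unseen head word, the head-word distribution has no data, so backoff falls through to the phrase-type distribution. This is the coverage versus accuracy trade-off the slides describe for P(role|head,target).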
Results: Lexical Clustering
• Since head words performed so well but are so sparse, try clustering to improve coverage
• Compute soft clusters for nouns, using only frame elements with noun head words from the BNC
• P(r | h, nt, t) = Σ_c P(r | c, nt, t) · P(c | h), summed over the clusters c that h belongs to
• Unclustered data is 87.6% correct but covers only 43.7%
• Clustered head words are 79.9% correct for the 97.9% of nominal head words in the vocabulary
• Adding clustering of NP constituents improved performance from 80.4% to 81.2%
• (Question: Would other lexical semantic resources help?)

Automatic Identification of Frame Element Boundaries
• Original experiments used hand-annotated frame element boundaries
• Used features of the sentence parse tree to find constituents likely to be frame elements
• System is given the human-annotated target word and frame
• Main feature used: path from the target word through the parse tree to the constituent, using upward and downward links
• Used P(fe|path), P(fe|path,target) and P(fe|head,target)

Automatic Identification of Frame Element Boundaries
• P(fe|path,target) performs relatively poorly, since there are only about 30 sentences for each target word
• P(fe|head,target) alone is not a useful classifier, but helps with linear interpolation
• Can only identify frame elements that have a constituent in the parse tree, but partial matching can help
• With relaxed matching, 86% agreement with hand annotations
• When correctly identified FEs are fed into the previous role labeler, 79.6% are correct, in the same range as with human data
• (Question: If it is correctly identified, shouldn’t this be the case?)

Their Conclusions
• Their system can label roles with some accuracy
• Lexical statistics on constituents’ head words were the most important feature used
• The problem is that, while very accurate, they are very sparse
• Key to high overall performance was combining features
• The combined system was more accurate than any feature alone; the specific combination method was less important
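
The path feature described under Automatic Identification of Frame Element Boundaries (the route from the target word up through the parse tree and down to a candidate constituent) can be sketched as below. This is a minimal illustration, not the authors' code: the `Node` class, the arrow notation for upward and downward links, and the hand-built toy parse of "settlers take land" are all assumptions, and a real system would read trees from a parser.

```python
UP, DOWN = "\u2191", "\u2193"  # arrows marking upward and downward links (assumed notation)


class Node:
    """A parse-tree node that knows its parent (hypothetical minimal structure)."""

    def __init__(self, label, children=(), word=None):
        self.label = label
        self.word = word
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return the chain [node, parent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def path_feature(target, constituent):
    """Labels from the target up to the lowest common ancestor, then down to the constituent."""
    up = ancestors(target)
    down = ancestors(constituent)
    down_ids = {id(n) for n in down}
    # First node above the target that also dominates the constituent.
    top = next(i for i, n in enumerate(up) if id(n) in down_ids)
    lca = up[top]
    up_labels = [n.label for n in up[: top + 1]]
    # Nodes strictly below the common ancestor on the constituent's side, top-down.
    below = down[: next(i for i, n in enumerate(down) if n is lca)]
    down_labels = [n.label for n in reversed(below)]
    path = UP.join(up_labels)
    if down_labels:
        path += DOWN + DOWN.join(down_labels)
    return path


# Hand-built toy parse: (S (NP (NNS settlers)) (VP (VB take) (NP (NN land))))
subject = Node("NP", [Node("NNS", word="settlers")])
verb = Node("VB", word="take")
obj = Node("NP", [Node("NN", word="land")])
sentence = Node("S", [subject, Node("VP", [verb, obj])])

print(path_feature(verb, subject))
print(path_feature(verb, obj))
```

In this toy tree the subject and the object get different paths from the same target verb. That difference is what lets a distribution such as P(fe|path) separate constituents that are likely frame elements from those that are not.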