An Automatic Classification Approach to Business Stakeholder Analysis on the Web

Wingyan Chung, Hsinchun Chen,
Edna O. F. Reid
January 16, 2003
Agenda
• Introduction
• Literature Review
• Research Questions
• Research Approach and Testbed
• Evaluation Methodology
• Experimental Results and Discussion
• Conclusions and Future Directions
2
Introduction
Current Business Environment
• Networked business environment facilitates
information sharing
• Collaborative commerce integrates business
processes among partners through electronic
sharing of information
– Sales support, vendor management, planning and
scheduling, demand planning, etc.
• Knowledge sharing about stakeholder
relationships through a company’s Web sites
and pages
– Textual content or annotated hyperlinks
4
Problems
• Information overload on the Web
– Hinders analysis of stakeholder relationships
• Knowledge hidden in interconnected Web
resources
– Posing challenges to identifying and classifying
various business stakeholders
• e.g., a company’s managers may not know who is using
the company’s Web resources
– Problem of traditional stakeholder analysis
– The emergence of electronic commerce
5
An Automatic Classification
Approach
• Need better approaches to uncovering such
knowledge
– Enhance understanding of business stakeholders
– Enhance understanding of competitive
environments
• We propose an automatic classification
approach to business stakeholder analysis
– Human knowledge + machine-learned information
• We will review related areas in stakeholder
analysis and Web page classification
techniques
6
Literature Review
Stakeholder Analysis
• Stakeholder theories evolve over time as
the view of the firm changes
– Production view (19th century): Suppliers and
Customers
– Managerial view (20th century): + Owners,
Employees
– Stakeholder view (1960-80s) (Freeman, 1984): +
Competitors, Governments, News Media,
Environmentalists, …
– E-commerce view (1990s - now): + International
partners, Online communities, Multinational
employees, …
8
Summary of stakeholder types
Research: Stakeholder Types
Reid, 2003: Partners/suppliers, customer, employee, investor, education institutions, media, portal, public, recruiter, reviewer, competitor, unknown
Elias & Cavana, 2000: Owners, community, unions, employees, government, consumer advocates, competitors, financial community, media, customers, SIG, suppliers
Agle et al., 1999: Shareholders, employees, customers, government, communities
Donaldson & Preston, 1995: Investors, government, suppliers, trade associations, employees, communities, customers, political groups
Clarkson, 1995: Employees, shareholders, customers, suppliers, public stakeholders
• These types, ordered by their relevance to those
appearing on the Web, are important for practical
understanding of stakeholders of firms
9
Comparing Stakeholder Types* Used
[Table: rows are Reid, 2003; Elias & Cavana, 2000; Agle et al., 1999; Donaldson & Preston, 1995; Clarkson, 1995. Columns are P E C S U M G R V O T F I N. Cell marks not preserved.]
* P = Partners/suppliers, E = Employees/Unions, C = Customers,
S = Shareholders/investors, U = Education/research institutions,
M = Media/Portals, G = Public/government, R = Recruiters,
V = Reviewers, O = Competitors, T = Trade associations,
F = Financial institutions, I = Political groups, N = SIG/Communities
(Note that a class “Unknown” is not included here)
10
Comments on Stakeholder
Research
• Existing studies have strong explanatory power but
are weak at practical classification of stakeholders
• Conclusions drawn from old data
• Previous research rarely considers the many
opportunities offered by the Web for
stakeholder analysis, e.g.,
– Business intelligence, which is obtained from the
business environment, is likely to help in
stakeholder activities
– Tools have been developed to exploit business
intelligence but not yet applied to stakeholder
analysis
11
BI and Stakeholder Analysis
• Advanced BI tools often rely on Web mining
techniques to discover patterns on the Web
automatically (Etzioni 1996; Kosala & Blockeel 2000), e.g.,
– PageRank (Brin & Page 1998), HITS (Kleinberg
1999), Web IF (Ingwersen 1998)
– External links mirror social communication
phenomena (e.g., stakeholder relationships)
• Tools and approaches exploit Web content
and link structure information
– Ong et al. 2001; Tan et al. 2002; Reiterer et al.
2000; Chung et al. 2003; Reid 2003; Byrne 2003
12
Information on the Web
• Structural and textual content
• But commercial BI tools lack analysis
capability (Fuld et al. 2002)
• Need to automate stakeholder
classification, a primary step in
stakeholder analysis
– Automatic classification of Web pages is a
promising way to alleviate the problem
13
Web Page Classification
• The process of assigning pages to predefined
categories
– Helps to discover companies’ stakeholders on the
Web and enables companies to understand the
competitive environment better
• Major approaches include k-nearest neighbor,
neural network, Support Vector Machines,
and Naïve Bayesian network (Chen & Chau 2004)
• Previous work
– Kwon and Lee 2003; Mladenic 1998; Furnkranz
1999; Lee et al. 2002; Glover et al. 2002
14
Feature selection in Web Page
Classification
• Features considered
– Page textual content: full text, page title, headings
– Link related textual content: anchor text, extended
anchor text, URL strings
– Page structural information: number of words, number of
page outlinks, inbound outlinks (i.e., links that point to the
page's own company site), outbound outlinks (i.e., links that
point to external Web sites)
• Methods for selection
– Human judgment / Use of domain lexicon
– Feature ratios and thresholding
– Frequency counting / MI
15
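The "Frequency counting / MI" selection method can be illustrated with a small sketch. It assumes MI stands for the mutual information between a term's presence in a page and a stakeholder class, estimated from page counts. The toy pages, labels, and the function name `mutual_information` are hypothetical and not taken from the study.

```python
import math

def mutual_information(docs, labels, term, target):
    """MI (in bits) between 'term occurs in page' and 'page has class target'."""
    n = len(docs)
    # Joint counts over the four (term present?, class == target?) cells.
    counts = {(t, c): 0 for t in (0, 1) for c in (0, 1)}
    for text, label in zip(docs, labels):
        t = int(term in text.lower().split())
        c = int(label == target)
        counts[(t, c)] += 1
    mi = 0.0
    for (t, c), n_tc in counts.items():
        if n_tc == 0:
            continue  # an empty cell contributes nothing (0 * log 0 taken as 0)
        n_t = counts[(t, 0)] + counts[(t, 1)]  # pages with this term value
        n_c = counts[(0, c)] + counts[(1, c)]  # pages with this class value
        mi += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return mi

# Hypothetical toy data: four pages, two stakeholder types.
docs = ["our partner network", "press release news",
        "partner program", "daily news digest"]
labels = ["partner", "media", "partner", "media"]

mi_partner = mutual_information(docs, labels, "partner", "partner")
mi_our = mutual_information(docs, labels, "our", "partner")
```

In this toy data the term "partner" separates the two types perfectly and so scores higher than "our"; a threshold on such scores is one plausible way to keep only the more indicative terms.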
Research Questions
Research Gaps
• Stakeholder research provides rich theoretical
background but rarely considers the
tremendous opportunities offered by the Web
for stakeholder analysis
– Conclusions drawn from old data may not reflect
rapid development in e-commerce
• Existing BI tools lack stakeholder analysis
capability
• Automatic Web page classification techniques
are well developed but have not yet been
applied to business stakeholder classification
17
Research Questions
• How can we develop an automated approach
to business stakeholder analysis on the Web?
• How can Web page textual content and
structural information be used in such an
approach?
• What are the effectiveness (measured by
accuracy) and efficiency (measured by time
requirement) of such an approach for
business stakeholder classification on the
Web?
18
Research Approach and
Testbed
Automatic Classification Approach
• Purpose: To automatically classify the stakeholders of
businesses on the Web in order to facilitate
stakeholder analysis
• Rationale
– Business stakeholders should have identifiable clues that can
be used to distinguish their types
– The Web content and structural information is important for
understanding the clues for stakeholder classification
• Two generic steps:
– Creation of a domain lexicon that contains key textual
attributes for identifying stakeholders
– Automatic classification of Web pages (stakeholders) linking
to selected companies based on textual and structural
content of Web pages
20
Building a Research Testbed
• Business stakeholders of the KM World top
100 KM companies (McKellar 2003)
• Used backlink search function of the Google
search engine to search for Web pages
having hyperlinks pointing to the companies’
Web sites
• For each host company, we considered only
the first 100 results returned
– Removed self links and extra links from same sites
– After filtering, we obtained 3,713 results in total
– Randomly selected the results of 9 companies as
training examples (414 → 283 pages stored in DB)
21
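The filtering step on this slide (first 100 results, self links and extra links from the same site removed) could look roughly like the sketch below. It assumes that "same site" means the same hostname after dropping a leading "www."; the function name `filter_backlinks` and the example URLs other than the ClearForest one are hypothetical.

```python
from urllib.parse import urlparse

def filter_backlinks(results, company_host, limit=100):
    """Keep the first `limit` results, dropping self links and repeated sites."""
    def site(url):
        host = (urlparse(url).hostname or "").lower()
        return host[4:] if host.startswith("www.") else host

    own = site("http://" + company_host)
    seen, kept = set(), []
    for url in results[:limit]:
        host = site(url)
        if not host or host == own or host.endswith("." + own):
            continue  # self link: the company's own site or a subdomain of it
        if host in seen:
            continue  # extra link from a site that was already kept
        seen.add(host)
        kept.append(url)
    return kept

# Hypothetical backlink results for one host company.
results = [
    "http://www.clearforest.com/about",
    "http://blog.example.org/post1",
    "http://blog.example.org/post2",
    "http://news.example.com/story",
]
kept = filter_backlinks(results, "www.clearforest.com")
```

Here the company's own page and the second page from blog.example.org are dropped, leaving one page per linking site.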
Creation of a Domain Lexicon
• Manually read through all the Web pages of the nine
companies’ business stakeholders to identify one-,
two-, and three-word terms that were indicative of
business stakeholder types
• Extracted a total of 329 terms (67 one-word terms,
84 two-word terms, and 178 three-word terms)
22
Automatic Stakeholder
Classification
• Three steps: manual tagging → feature selection →
automatic classification
23
Manual Tagging
• Manually classified each of the stakeholder pages of
the nine selected companies into one of the 11
stakeholder types (based on our review on slides 9-10)
24
Feature Selection
• Structural content features: binary variables
indicating whether certain lexicon terms are
present in the structural content
– A term could be one, two, or three words long
– Considered occurrences in title, extended anchor
text, and full text
• Textual content features: frequencies of
occurrences of the extracted features
– The first set of features was selected based on
human knowledge, while the second was selected
based on statistical aggregation, thereby
combining both kinds of knowledge
25
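A minimal sketch of the two kinds of feature described on this slide: binary presence of lexicon terms, and occurrence frequencies, computed over the title, extended anchor text, and full text. The lexicon terms, the page fields, and the function names are hypothetical. For brevity the same terms are used for both feature sets, whereas the slide describes the second set as selected by statistical aggregation.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def count_term(tokens, term):
    """Count occurrences of a one-, two-, or three-word term in a token list."""
    words = term.lower().split()
    k = len(words)
    return sum(tokens[i:i + k] == words for i in range(len(tokens) - k + 1))

def page_features(page, lexicon):
    """Return (binary presence features, frequency features) for one page."""
    binary, freq = [], []
    for field in ("title", "anchor", "full_text"):
        tokens = tokenize(page.get(field, ""))
        for term in lexicon:
            n = count_term(tokens, term)
            binary.append(1 if n > 0 else 0)  # is the term present in this field?
            freq.append(n)                    # how often does it occur there?
    return binary, freq

# Hypothetical lexicon and page; real lexicon terms are not reproduced here.
lexicon = ["partner", "press release", "search tool"]
page = {
    "title": "Search and Discovery",
    "anchor": "ClearForest, a company that provides tools",
    "full_text": "It is truly the search tool for the era. A search tool demo.",
}
binary, freq = page_features(page, lexicon)
```

Each page yields one value per (field, term) pair, so three fields and three terms give vectors of length nine that could be passed to a classifier.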
An Example (a media type)
• Link to the host company (ClearForest)
• HTML hyperlink and extended anchor text
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>David Schatsky: Search and Discovery in the Post-Cold War Era</title> ...
<p>I just saw a demo by <a href = "http://www.clearforest.com"> ClearForest, </a> a company
that provides tools for analyzing unstructured textual
information. It's truly amazing, and truly the search tool
for the post-Cold War era. ... </p> ...
</body>
</html>
26
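The pieces highlighted in the example (page title, hyperlink to the host company, anchor text, and extended anchor text) can be pulled out with Python's standard `html.parser`. This is only a sketch: the class name `PageParser` is hypothetical, the HTML is a condensed copy of the example, and treating the enclosing paragraph as the "extended anchor text" is an assumption rather than the definition used in the study.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class PageParser(HTMLParser):
    """Collect the title, each link's anchor text, and paragraph text."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []       # [href, anchor text] per <a>
        self.paragraphs = []  # text of each <p> (assumed extended anchor text)
        self._stack = []      # currently open tags of interest

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "a", "p"):
            self._stack.append(tag)
        if tag == "a":
            self.links.append([dict(attrs).get("href", ""), ""])
        if tag == "p":
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Close the most recently opened tag of this name.
            idx = len(self._stack) - 1 - self._stack[::-1].index(tag)
            del self._stack[idx]

    def handle_data(self, data):
        if "title" in self._stack:
            self.title += data
        if "a" in self._stack:
            self.links[-1][1] += data
        if "p" in self._stack:
            self.paragraphs[-1] += data

# Condensed copy of the example page above.
html = """<html><head><title>David Schatsky: Search and Discovery in the Post-Cold
War Era</title></head><body>
<p>I just saw a demo by <a href="http://www.clearforest.com"> ClearForest, </a> a company
that provides tools for analyzing unstructured textual information.</p>
</body></html>"""

parser = PageParser()
parser.feed(html)
title = " ".join(parser.title.split())
href, anchor = parser.links[0]
anchor = anchor.strip()
extended = " ".join(parser.paragraphs[0].split())
links_to_host = (urlparse(href).hostname or "").endswith("clearforest.com")
```

The extracted strings are the raw material for the title, anchor text, and extended anchor text features; how wide a window counts as "extended" is a design choice left open here.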
Automatic Classification
• A feedforward/backpropagation neural
network (Lippmann 1987) and a Support Vector
Machine (SVM) (Joachims 1998) were used due to
their robustness in automatic classification
– Train the algorithms using the stakeholder pages
of the 9 training companies and obtain a model or
sets of weights for classification
– Test the algorithms on sets of stakeholder pages
of 10 companies different from training examples
27
Evaluation Methodology
Experimental Design
• Consisted of algorithm comparison, feature
comparison, and a user evaluation study
– Compared the performance of a neural network
(NN), SVM, a baseline method (random
classification), and human judgment
– Compared structural content features, textual
content features, and a combination of the two
sets of features
– 36 University of Arizona business students performed
manual stakeholder classification and provided
comments on the approach
29
Performance Measures
• Effectiveness:
– Overall accuracy
– Within-class accuracy
• Efficiency: time used (in minutes)
• User subjective ratings and comments
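The two effectiveness measures can be sketched as follows. The slides do not define within-class accuracy, so this assumes it is the fraction of pages of a given true stakeholder type that were assigned that type; the labels and function names are hypothetical.

```python
from collections import defaultdict

def overall_accuracy(true_labels, predicted):
    """Fraction of all pages whose predicted type matches the true type."""
    correct = sum(t == p for t, p in zip(true_labels, predicted))
    return correct / len(true_labels)

def within_class_accuracy(true_labels, predicted):
    """Fraction of pages of each true type that were assigned that type."""
    total, correct = defaultdict(int), defaultdict(int)
    for t, p in zip(true_labels, predicted):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in total}

# Hypothetical labels for five stakeholder pages.
true_labels = ["media", "media", "partner", "customer", "partner"]
predicted = ["media", "partner", "partner", "customer", "media"]

overall = overall_accuracy(true_labels, predicted)
per_class = within_class_accuracy(true_labels, predicted)
```

Reporting both measures matters when types are unevenly distributed, since a high overall accuracy can hide poor results on rare types.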
User Study
• Each subject was introduced to stakeholder
analysis and was asked to use our system
named “Business Stakeholder Analyzer (BSA)”
to browse companies’ stakeholder lists
• We randomly selected three companies
(Intelliseek, Siebel, and WebMethods) from
testing companies to be the targets of
analysis
31
Hypotheses (1)
• H1: NN and SVM would achieve similar
effectiveness when the same set of
features was used
– Both techniques were robust
– Procedure: created 30 sets of stakeholder
pages by randomly selecting groups of 5
stakeholder pages of each of the 10 testing
companies
32
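The sampling procedure for H1 can be sketched as below: 30 sets, each made of 5 randomly chosen stakeholder pages from each of the 10 testing companies, so 50 pages per set. The slide does not say whether pages were drawn with or without replacement; this sketch assumes no repeats within a set and independent draws across sets. The company and page names are placeholders.

```python
import random

def make_test_sets(pages_by_company, n_sets=30, per_company=5, seed=0):
    """Draw n_sets samples, each with per_company pages from every company."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    sets = []
    for _ in range(n_sets):
        sample = []
        for company in sorted(pages_by_company):
            # random.sample draws without replacement within one company
            sample.extend(rng.sample(pages_by_company[company], per_company))
        sets.append(sample)
    return sets

# Placeholder data: 10 testing companies with 20 stakeholder pages each.
pages_by_company = {
    f"company{c}": [f"company{c}/page{i}" for i in range(20)]
    for c in range(10)
}
test_sets = make_test_sets(pages_by_company)
```

Each of the 30 sets can then be classified by both algorithms, giving paired accuracy figures for a statistical comparison.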
Hypotheses (2)
• H2: NN and SVM would perform better than
the baseline method
– Incorporated human knowledge and machine
learning capability into the classification
• H3: Human judgment in stakeholder
classification would achieve effectiveness
similar to that of machine learning but would
be less efficient
– Humans could make use of the Web page’s textual
and structural content in classifying stakeholders
– Humans might spend more time on it
33
Hypotheses (3)
• H4 & H5 examined the use of different
types of features in automatic
stakeholder classification
– H4: structural = textual
– H5: combined > structural or textual alone
34
Experimental Results and
Discussion
Algorithm Comparison
• H1 not confirmed
• NN and SVM performed significantly differently
when the same set of features was used
– NN performed significantly better than SVM when
structural content features were used
– SVM performed significantly better than NN when
textual content features or a combination of both
feature sets were used
– More studies would be needed to identify optimal
feature sets for each algorithm
36
Effectiveness of the Approach
• H2 confirmed
• The use of any combination of features and
techniques in automatic stakeholder
classification outperformed the baseline
method significantly
– Our approach integrated human knowledge
with machine-learned information related to
stakeholder types
– It was significantly better than random
guessing
37
Comparing with Human
Judgment
• H3b and H3d (efficiency) confirmed
– Human: 22 minutes (average), varied
– Algorithms: 1 – 30 seconds (average)
– This shows the high efficiency of the automatic
approach in facilitating stakeholder analysis
• H3a and H3c (effectiveness) not confirmed
– Humans were significantly more effective than NN
or SVM
– They could rely on more clues in performing
classification
– Experience in Internet browsing and searching
helped narrow down choices
38
However, the algorithms achieved better
within-class accuracies than humans in
frequently occurring types …
39
Use of Features
• To our surprise, hypotheses H4a-b, H5a-b,
and H5d were not confirmed
– Different feature sets yielded different
performances of the algorithms
• Structural features enabled NN to achieve better
effectiveness than textual ones
• Textual and combined features enabled SVM to achieve
better effectiveness than structural ones
– The reason for this is not yet clear
– Future research: studying the effect of features
and the nature of algorithms
• H5c was confirmed: structural content features
did not add value to the performance of SVM
40
Subjects’ Comments
• Overwhelmingly positive
• “It would be very helpful!”
• “That’s cool!”
• “I want to use it.”
Conclusions and Future
Directions
Conclusions
• Proposed an automatic classification
approach to business stakeholder analysis on
the Web
– Integrated human expert knowledge + machine-learned information
– Promising in terms of effectiveness and efficiency
• A strong potential to use the approach to
augment traditional stakeholder classification
• Could potentially facilitate business analysts’
interaction with automated stakeholder
analysis systems in today’s networked
enterprises
43
Future Directions
• To automate the next steps of business
stakeholder analysis
– With more expert participation and more Web
page data
• Type-specific stakeholder analysis
– e.g., partner relationships are often important in
developing business strategies
• Automating cross-regional business
stakeholder analysis
– Study multinational business partnerships and
cooperation and related HCI issues
44