Metadata and Taxonomies: The Best of Both Worlds

Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
■ Taxonomy Good, Metadata Bad
– To Metadata or not to Metadata
– Issues and Approaches to Metadata
■ Taxonomies, Browse, Facets, and Metadata
– Strengths and Weaknesses
– Uses and Value of Each
■ Knowledge Architecture Solutions
– Putting the Pieces Together: Why, Who, How
– Deep Personalization and Other Advanced Applications
■ Conclusion – How do I get there from here?
Metadata about Metadata: Two Sources
■ Global Corporate Circle DCMI 2003 Workshop
– Importance of Metadata
– Difficulty of implementation and justification
■ KAPS Group Experience
– Consulting, Taxonomy & Metadata, Strategy
– Knowledge architecture audit
– Partners – Inxight, Convera, etc.
– Intellectual infrastructure for organizations
• Knowledge organization, technology, people and processes
• Search, CM, portals, collaboration, KM, e-learning, etc.
■ EContent October Article – To Metadata or not to Metadata
Taxonomy Good, Metadata Bad
■ To Metadata or not to Metadata
■ That is the Question
■ Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous search results
■ Or to take up metadata against a sea of irrelevance
■ And by organizing them find them?
To Metadata or not to Metadata?
■ Why Not Metadata?
– Costly – $200K to set up, plus maintenance costs
– Difficult to do
• Missing, incorrect, confusing, inconsistent
• Poor quality metadata can make search worse
■ Why Metadata?
– Not doing metadata is more expensive
• $8,200 an employee a year
– Ways to lower the cost – not all custom jobs
– Need more sophisticated ROI – stories, business needs, requirements
Metadata Approaches: 4 Not So Good Alternatives
■ Metadata, we don’t need no stinking metadata
– Condemned to wander search results lists forever
– Need to answer these people
■ KA Team – Consultants
– Costly, still need to maintain
■ Automatic metadata (clustering & categorization)
– Uneven, poor quality
■ Author-generated metadata
– Uneven quality, inconsistent
– Cultural – getting authors to want to do it
Knowledge Architecture Solutions: The Right Context
■ No one solution
– Can’t answer content questions from the perspective of content alone
– Need to understand users, activities, and the organization
■ Context – understanding your context
– Match amount of metadata to value
– Match type of metadata to content and use
– Lower the cost and increase the value
■ The problem is not that metadata initiatives have been too complex; it’s that they have been too simple.
– Metadata is more than adding keywords as an afterthought
■ For the same or less effort, you can go from metadata that makes search worse to a set of solutions
Taxonomies, Browse, Facets, and Metadata
Variety of Structures
■ A hierarchy does not a taxonomy make
– Thesaurus (Broader Terms (BT), Narrower Terms (NT), Related Terms), Controlled Vocabulary
– Catalog, Index, Site Map, Partonomy, Ontology, Classification, Semantic Network
– Knowledge Map, Topic Maps, Paradigm, Prototype
■ 4 Basic Structures
– Formal Taxonomy – Aristotle & Linnaeus
• Concept of Species, Is-A-Kind-Of (Part)
– Browse Taxonomy
• Yahoo – hierarchical classifications
– Metadata
• Dublin Core – Titles, Descriptions, Keywords, +
– Facets/Entities
• Products, Companies, People, Events, Geography
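The thesaurus relationships listed above go beyond a plain hierarchy. As a minimal sketch, assuming a tiny in-memory vocabulary, the hypothetical `Thesaurus` class below records Broader Terms (BT), Narrower Terms (NT), and symmetric Related Terms (RT). It then uses the NT links to expand a general concept into its more specific terms. The class, its methods, and the sample terms are illustrative only and are not taken from any standard or product.

```python
from collections import defaultdict


# Hypothetical sketch: names and sample terms are illustrative only.
class Thesaurus:
    """Each term has at most one broader term (BT), any number of
    narrower terms (NT), and symmetric related terms (RT)."""

    def __init__(self):
        self.broader = {}                 # term -> its BT
        self.narrower = defaultdict(set)  # term -> set of its NTs
        self.related = defaultdict(set)   # term -> set of its RTs

    def add_narrower(self, parent, child):
        # Recording the BT on the child implies the NT on the parent.
        self.broader[child] = parent
        self.narrower[parent].add(child)

    def add_related(self, a, b):
        # Related Terms are symmetric, unlike the directional BT/NT links.
        self.related[a].add(b)
        self.related[b].add(a)

    def expand(self, term):
        """Return the term plus all transitively narrower terms,
        e.g. to widen a search on a general concept."""
        found, stack = set(), [term]
        while stack:
            t = stack.pop()
            if t not in found:
                found.add(t)
                stack.extend(self.narrower.get(t, ()))
        return found

    def path(self, term):
        """Return the chain of broader terms, from the top down."""
        chain = [term]
        while chain[-1] in self.broader:
            chain.append(self.broader[chain[-1]])
        return list(reversed(chain))


t = Thesaurus()
t.add_narrower("Animals", "Mammals")
t.add_narrower("Mammals", "Whales")
t.add_related("Whales", "Oceans")
print(sorted(t.expand("Animals")))  # ['Animals', 'Mammals', 'Whales']
print(t.path("Whales"))             # ['Animals', 'Mammals', 'Whales']
```

BT/NT links are directional and RT links are symmetric, so the sketch stores them separately. A fuller thesaurus would also record non-preferred terms (Use / Used For).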
Taxonomies, Browse, Facets, and Metadata
Four Basic Structures
■ Units of Organization
– Taxonomy – Concepts
– Browse Taxonomy – web site or content collections
– Facets – Entities
– Metadata – variety of values
■ Metadata – After or About Data
– Not just documents – objects, art works, events, etc.
– Characteristics about the objects
– Characterization of content (meaning) within the object
■ It’s All Metadata to Me!
– Browse – reverse metadata
– Facets – metadata fields or sub-domains of Keywords
– Taxonomy – Controlled Vocabulary
Taxonomies, Browse, Facets, and Metadata
Strengths and Weaknesses
■ Formal Taxonomy Strengths
– Fixed Resource – little or no maintenance
– Communication – share ideas, build on others
– Infrastructure Resource
• Controlled vocabulary and keywords
• Indexing – conceptual relationships
■ Weaknesses
– Difficult to develop and customize
– Don’t reflect users’ perspective
• Users have to adapt to the taxonomy’s language
Taxonomies, Browse, Facets, and Metadata
Types of Taxonomies – Yahoo Browse
Taxonomies, Browse, Facets, and Metadata
Strengths and Weaknesses
■ Browse Taxonomy Strengths
– Browse better than search
• Context and discovery
– Easiest Structure to Develop
■ Browse Taxonomy Weaknesses
– Mix of Organization
• Catalogs, Alphabetical listings, Inventories
– Vocabulary and Nomenclature Issues
– Difficult to maintain
– Poor granularity and little relationship between parts
• Web Site unit of organization
– No foundation for standards
Taxonomies, Browse, Facets, and Metadata
Strengths and Weaknesses
■ Metadata Strengths
– Variety of fields supports a variety of applications and user behaviors
– Well-developed best practices
■ Metadata Weaknesses
– High cost of implementing
– Inconsistent values
– Studies show little value in search
• Have to do it completely and correctly to get any value
Taxonomies, Browse, Facets, and Metadata
Strengths and Weaknesses
■ Facets Strengths
– Orthogonal Categories – easier to understand what goes in what bin and why
– Combination of formal (partonomy) and browse
– Automatic software works
■ Facets Weaknesses
– High Cost – adding structure to facets
– Can be overwhelming – 30 or more facets
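Orthogonal facets reduce filtering to combining independent selections. A minimal sketch, assuming each document already carries one value per facet. The hypothetical `filter_docs` applies the user's selections across facets. The hypothetical `facet_counts` produces the per-value counts a faceted display typically shows next to each choice. The facet names echo the Products/Companies/Geography examples earlier in this deck, and the documents are invented for illustration.

```python
from collections import Counter

# Hypothetical sketch: facet names and documents are illustrative only.
# Each document is tagged with one value per orthogonal facet.
DOCS = [
    {"id": 1, "product": "Search", "company": "Acme", "geography": "Europe"},
    {"id": 2, "product": "Portal", "company": "Acme", "geography": "Asia"},
    {"id": 3, "product": "Search", "company": "Globex", "geography": "Europe"},
]


def filter_docs(docs, selections):
    """Keep documents that match every selected facet value (AND across facets)."""
    return [d for d in docs
            if all(d.get(facet) == value for facet, value in selections.items())]


def facet_counts(docs, facet):
    """Count documents per value of one facet, for display beside each choice."""
    return Counter(d[facet] for d in docs if facet in d)


europe = filter_docs(DOCS, {"geography": "Europe"})
print([d["id"] for d in europe])  # [1, 3]
print(facet_counts(europe, "company"))
```

Because the facets are independent, a selection in one facet narrows the counts shown for all the others. That is what lets users see "what goes in what bin" before they click.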
Knowledge Architecture Solutions
Metadata
■ Look beyond authors adding keywords to influence search results
■ Value from All Fields
– Titles and Descriptions – balance of system and description
– Publisher and author – automated and easy
– Document/Object type – FAQs, Policy Doc – supports user behavior
– Audience – target information, agents – no need for search
– Facets – additional fields to support multiple uses
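Getting value from all fields depends on those fields being filled in, and filled in consistently. As a hedged sketch, the hypothetical `DocMetadata` record below loosely follows the fields on this slide: title, description, publisher, author (as `creator`), document type, and audience. The hypothetical `quality_report` counts empty fields and document types that fall outside an assumed controlled list. A central team might run this kind of measurement. The field names and the `DOC_TYPES` list are assumptions, not a Dublin Core schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: field names and DOC_TYPES are assumptions.
# DOC_TYPES is an assumed controlled list for the document/object type field.
DOC_TYPES = {"FAQ", "Policy", "Procedure", "Report"}
FIELDS = ("title", "description", "publisher", "creator", "doc_type", "audience")


@dataclass
class DocMetadata:
    title: str = ""
    description: str = ""
    publisher: str = ""
    creator: str = ""
    doc_type: str = ""
    audience: list = field(default_factory=list)


def quality_report(records):
    """Count empty fields and out-of-vocabulary document types."""
    missing = {name: 0 for name in FIELDS}
    invalid_type = 0
    for r in records:
        for name in FIELDS:
            if not getattr(r, name):  # empty string or empty list
                missing[name] += 1
        if r.doc_type and r.doc_type not in DOC_TYPES:
            invalid_type += 1
    return {"records": len(records), "missing": missing,
            "invalid_type": invalid_type}


records = [
    DocMetadata(title="Travel Policy", publisher="HR", creator="J. Doe",
                doc_type="Policy", audience=["employees"]),
    DocMetadata(title="Benefits", doc_type="Memo"),
]
report = quality_report(records)
print(report["missing"]["description"], report["invalid_type"])  # 2 1
```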
Knowledge Architecture Solutions
Metadata
■ Keywords – most difficult
• Common terms, unique terms, aboutness terms
• Need to do it right and completely to get real value
■ Keywords – Need Taxonomy, Controlled Vocabulary
– Enhances quality and consistency
– Supports author-generated metadata
■ Value from other applications
– Alerts and a variety of personalization schemas
– Data and Text Mining
– Inter-application communication
■ Controlled Vocabularies
– Form, Format, Language, Audience, etc.
– Structured – taxonomies
– Multiple subjects = multiple taxonomies
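A controlled vocabulary lets author-supplied keywords be mapped onto consistent preferred terms. The sketch below assumes a small hypothetical vocabulary of preferred terms and their variants. The hypothetical `normalize_keywords` maps each author keyword to its preferred term. Keywords it does not recognize are set aside for review rather than silently dropped. This illustrates the idea only and does not describe any specific product's behavior.

```python
# Hypothetical sketch: the vocabulary and function names are illustrative.
# Controlled vocabulary: preferred term -> accepted variants.
VOCAB = {
    "content management": {"cm", "cms", "content management system"},
    "search": {"retrieval", "information retrieval"},
    "taxonomy": {"taxonomies", "classification scheme"},
}

# Map every lower-cased variant (and each preferred term) to its preferred term.
LOOKUP = {pref: pref for pref in VOCAB}
for pref, variants in VOCAB.items():
    for variant in variants:
        LOOKUP[variant] = pref


def normalize_keywords(raw_keywords):
    """Map author-supplied keywords onto preferred vocabulary terms.

    Returns (accepted, rejected): accepted is a sorted list of unique
    preferred terms; rejected keeps inputs not found in the vocabulary
    so a central team can review them as candidate new terms."""
    accepted, rejected = set(), []
    for kw in raw_keywords:
        term = LOOKUP.get(kw.strip().lower())
        if term:
            accepted.add(term)
        else:
            rejected.append(kw)
    return sorted(accepted), rejected


print(normalize_keywords(["CMS", "Taxonomies", "blockchain", "Content Management"]))
# (['content management', 'taxonomy'], ['blockchain'])
```

In an authoring tool, the same lookup could back a pick list, which fits the point that selecting from a list is easier for authors than typing free keywords.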
Knowledge Architecture Solutions
Metadata
■ Tools
– Content Management, Metadata Management
■ People
– Central – evaluate and select taxonomies
• Facilitate use of controlled vocabulary taxonomies
• Monitor and measure use of metadata and taxonomies
– Authors – selecting from a list is better and easier
• Automated support and work flow
Knowledge Architecture Solutions
Taxonomies
■ General Intellectual Resource
– Powerful Vocabulary, Glossary, Index
– Standards, Naming Conventions – Communication Tool
■ Pre-defined Taxonomies vs. Custom Taxonomies
– Pre-defined – Cross-Organization Communication
– Custom – specialized vocabularies
– Best – Standard, pre-defined taxonomies that are customized according to a set of established best practices
■ Value from Taxonomies
– Indexing documents – to a very granular level – automatic
– Cross-application communication – exchange meaning, not just bits
– Dynamic Classification – structured search results
• Works even where advanced search does not
• Not Browsing
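Dynamic classification, as described above, presents the same search results grouped by taxonomy category instead of as one flat ranked list. A minimal sketch, under two assumptions: each document already has taxonomy category paths such as `Finance/Tax` (a hypothetical format), and the search engine returns ids in relevance order. The hypothetical `classify_results` groups hits by top-level category and keeps relevance order within each group.

```python
from collections import defaultdict


# Hypothetical sketch: the category path format and names are illustrative.
def classify_results(ranked_ids, doc_categories):
    """Group a flat, ranked result list under top-level taxonomy categories.

    ranked_ids: document ids in relevance order, as a search engine returns them.
    doc_categories: doc id -> list of category paths such as "Finance/Tax".
    Relevance order is preserved inside each group; documents with no
    category are grouped under "Unclassified"."""
    groups = defaultdict(list)
    for doc_id in ranked_ids:
        for path in doc_categories.get(doc_id) or ["Unclassified"]:
            top = path.split("/")[0]
            if doc_id not in groups[top]:  # avoid listing a doc twice in one group
                groups[top].append(doc_id)
    return dict(groups)


cats = {"d1": ["Finance/Tax"],
        "d2": ["HR/Benefits", "Finance/Payroll"],
        "d3": ["Finance/Tax", "Finance/Audit"]}
print(classify_results(["d3", "d1", "d2", "d4"], cats))
# {'Finance': ['d3', 'd1', 'd2'], 'HR': ['d2'], 'Unclassified': ['d4']}
```

A document can appear in more than one group because it may be indexed under several categories. The user still gets structure without having to write an advanced search.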
Knowledge Architecture Solutions
Browse Taxonomies
■ Limited Depth (users set the limit)
– Navigation to collections of content, web sites
– Limited Content – single web site or section of web site
• Best for homogeneous audience, common vocabulary, view
■ Limited Rigor
– Search and Browse together better than either alone
– Broad, multiply defined categories give poor results
■ Combine with Facets and Taxonomies
– Categories as clusters of taxonomy levels
Knowledge Architecture Solutions
Facets
■ Combine Browse and Search
– Structured results, not advanced search
– More flexible than navigation browse
– Still limited depth – combine with classifications
■ Combine with Taxonomies
– Added structure, especially subject areas
■ Selection of Facets – Ontology, Personalization
■ See Flamenco Project
– http://bailando.sims.berkeley.edu/flamenco.html
Knowledge Architecture Solutions
Facets
Knowledge Architecture Solutions
Integration: It’s All Metadata to Me!
■ Metadata is the framework for getting value from Taxonomy and Facets
■ Metadata, Taxonomies, and Facets add value and structure to search
■ Taxonomy adds structure to Facets and Metadata
■ Facets add formal extensibility to Taxonomy
■ Facets add structure to Metadata and Browse Taxonomies
■ Integrated solution – the right mix for a variety of applications
Knowledge Architecture Solutions: The Right Context
■ Content – structured & unstructured, external & internal
– Publishing Policy and Procedures
– Metadata, taxonomies and controlled vocabularies
• Standards and Best Practices
■ Business processes and requirements
■ Technologies – search, portals, CM, applications
– CM is the right time for adding metadata
• Automation, distributed work flow
– Analytics based on meaning, not clicks
– Look at the entire range of applications
Knowledge Architecture Solutions: People
■ Communities of users and information behaviors
■ Variety of authors, subject matter experts, publishers
■ Central Team supported by software and offering services
– Creating, acquiring, and evaluating taxonomies, metadata standards, vocabularies
– Input into technology decisions and design – content management, portals, search
– Socializing the benefits of metadata, creating a content culture
– Evaluating metadata quality, facilitating author metadata
– Analyzing the results of using metadata and how communities are using it
– Researching metadata theory, user-centric metadata
– Designing a content value structure – more nuanced than good / poor content
Knowledge Architecture Solutions: Why?
■ Metadata as an add-on to a search engine purchase will fail
■ Most cost-effective way to produce valuable metadata
■ Needed to implement any alternative approach
– Justification for metadata – measure and present realistic ROI
– Supplement consultants
– Integrate automated and author-supplied metadata
– Integrate content tiers into broader context
■ Needed for tailoring solutions to organizations
Knowledge Architecture Solutions: Why?
■ Increase the value of creating metadata
– Better quality metadata
• Categorization experts and subject matter experts
– Beyond search and relevance ranking
• Dynamic classification – intersection of 2 subjects
• Applications – integrated metadata for portals, agents, etc.
– Beyond content – people metadata:
• Community personalization, information behaviors
• Community categorization
■ Decrease the cost of creating metadata
– Start with Standards, Distributed System and Cost
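Dynamic classification by the intersection of two subjects can be sketched as a cross-tabulation. This assumes documents are tagged from two separate taxonomies, the hypothetical `topic` and `industry` below. The hypothetical `cross_tab` counts how many documents fall in each (topic, industry) cell. A display could then link each non-empty cell to its documents. All names and data are illustrative.

```python
from collections import Counter
from itertools import product


# Hypothetical sketch: taxonomy names and documents are illustrative only.
def cross_tab(docs, facet_a, facet_b):
    """Count documents at each intersection of two subject taxonomies.

    docs: list of dicts mapping a taxonomy name -> set of assigned terms.
    Returns a Counter keyed by (term_a, term_b)."""
    counts = Counter()
    for d in docs:
        # Every pairing of this document's terms is one intersection cell.
        for a, b in product(d.get(facet_a, ()), d.get(facet_b, ())):
            counts[(a, b)] += 1
    return counts


docs = [
    {"topic": {"Metadata", "Search"}, "industry": {"Pharma"}},
    {"topic": {"Metadata"}, "industry": {"Pharma", "Finance"}},
    {"topic": {"Search"}},  # no industry term, so it lands in no cell
]
cells = cross_tab(docs, "topic", "industry")
print(cells[("Metadata", "Pharma")])  # 2
```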
Knowledge Architecture Solutions: What if I can’t get there from here?
■ First Step – Create an infrastructure strategic vision
– Including metadata standards
■ KA Team – can be part time, needs official recognition
■ Content Management is essential
■ Don’t start with keywords
■ Buy and customize taxonomies, controlled vocabularies
■ Relevance ranking as last resort
– Best bet metadata
– Browse and dynamic classifications
– Faceted Displays
■ Think Big, Start Small, Scale Fast
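Best bet metadata is listed above as something to use before falling back on relevance ranking. It can be as simple as a curated table of common queries. The sketch below assumes a hypothetical `BEST_BETS` table maintained by the KA team. It places the table's entries ahead of whatever ranked list the search engine returns and removes duplicates. It does not represent any particular search engine's best-bets feature.

```python
# Hypothetical sketch: the table, ids, and function names are illustrative.
# Best-bet table: normalized query -> curated document ids.
BEST_BETS = {
    "expense policy": ["policy-017"],
    "benefits": ["hr-benefits-overview", "hr-benefits-faq"],
}


def normalize(query):
    # Lower-case and collapse runs of whitespace so trivial variants match.
    return " ".join(query.lower().split())


def search_with_best_bets(query, ranked_results):
    """Show curated best bets first, then the engine's ranking without
    repeating them. ranked_results stands in for any engine's output."""
    bets = BEST_BETS.get(normalize(query), [])
    return bets + [r for r in ranked_results if r not in bets]


print(search_with_best_bets("Expense  Policy", ["doc-9", "policy-017", "doc-2"]))
# ['policy-017', 'doc-9', 'doc-2']
```

This fits "start small": the table can begin with a handful of the most frequent queries and grow as search logs show where relevance ranking falls short.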
Questions?
Tom Reamy
[email protected]
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com