Implementing a Taxonomy in a Content Management Portal Content Week 2005 Miami, Florida

Implementing a Taxonomy in a Content
Management Portal
Content Week 2005
Miami, Florida
Monday, January 31, 2005
Workshop H
2:45pm – 4:45 pm
Marjorie M.K. Hlava
Access Innovations, Inc.
505-998-0800
[email protected]
www.accessinn.com
Introductions
• Name
• Project
• Expectations for these two short hours
• Please fill in the sign up sheet
• Would you like
– 1. Copy of this presentation?
– 2. Sample software?
– 3. Other information?
What will we talk about this afternoon?
• 1. Definitions
• 2. Where taxonomy fits in the Information Circle
• 3. Where to use a taxonomy
• 4. Taxonomies for Communities of Practice
• 5. Surrounding theories and applications
• 6. How to build and maintain
• 7. How taxonomies are used in enterprise information
Copyright © 2005 Access Innovations, Inc.
Implementing a Taxonomy in a Content Management Portal
[Diagram: Thesaurus Master; Data Feed; MAI to add metadata; Database Management System (add metadata using MAI); Inverted File]
1. Definitions
What is a taxonomy?
• A hierarchical thesaurus with authority terms
applied at the final node
• A browse-able web interface
• A Linnaean System
• A browse-able list with the term instance at
the final leaf
Types of Taxonomies
• Naming and organizing things into groups that share
similar characteristics
• 1. Flat – just a list
• 2. Hierarchical
– Taxonomic view
• 3. Faceted
– Sorted by a single characteristic
– Metadata - Dublin Core
– COSATI - GILS
• 4. Thesaurus
– Term records
– Database backend
– Easier to modify and maintain
Taxonomy in meta data
• Definition
– Taxonomy is a thesaurus in its hierarchical view
with the authority files applied at the final nodes
– It allows the browse-able front end to a portal
– It provides keyword and name access to the
content in the portal
Taxonomy definition
• A taxonomy is a thesaurus in hierarchical
view with authority file terms added at the
final nodes
• Thesaurus
• Authority file
• Hierarchical form
• Final nodes
Thesaurus
• Concepts
• Methods
• Procedures
• Cognitive approach
• The knowledge capture piece
• The topics or subjects
Authority file
• People
• Places
• Things
• The tangible approach
• Concrete Entities
Hierarchical view
• Gives the Portal view
• The view of all the preferred terms in
categorized order
• An outline of the thesaurus
Final Nodes
• The last position on the hierarchical tree
– Taxonomy
• concept
– narrower terms
» final node - people, place or thing term
» document instance
» Letter to George Wiesman Dec 12, 2003
» Technical report number TR-1039
» Museum artifact 1706 wooden wagon wheel
Term Records – the Database Part
• Associative terms
– Related terms
• Equivalence terms
– Preferred and non-preferred
– Use and used for
– Synonyms
• Hierarchical terms
– Broader and narrower terms
– Parent and child
Other term record fields
• Scope notes
• Cross references
• History
• Term Status
• Category
• User defined
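The term record described on the last two slides can be sketched as a small data structure. This is only an illustration, not the Thesaurus Master schema: the class name, field names, and sample terms are assumptions. The helper checks one common consistency rule, that every broader-term link has a matching narrower-term link on the parent.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """One thesaurus term record; field names are illustrative."""
    term: str
    broader: list[str] = field(default_factory=list)    # hierarchical: parent
    narrower: list[str] = field(default_factory=list)   # hierarchical: child
    related: list[str] = field(default_factory=list)    # associative
    used_for: list[str] = field(default_factory=list)   # equivalence: non-preferred
    scope_note: str = ""
    status: str = "approved"


def missing_reciprocals(records: dict[str, TermRecord]) -> list[tuple[str, str]]:
    """Return (child, parent) pairs where the parent does not list the child."""
    problems = []
    for rec in records.values():
        for parent in rec.broader:
            parent_rec = records.get(parent)
            if parent_rec is None or rec.term not in parent_rec.narrower:
                problems.append((rec.term, parent))
    return problems


# Hypothetical sample data: "Wagons" points up to "Vehicles",
# but "Vehicles" does not point back down to "Wagons".
records = {
    "Vehicles": TermRecord("Vehicles", narrower=["Automobiles"]),
    "Automobiles": TermRecord("Automobiles", broader=["Vehicles"], used_for=["Cars"]),
    "Wagons": TermRecord("Wagons", broader=["Vehicles"]),
}
print(missing_reciprocals(records))
```

Keeping the relationships in a record like this, rather than in a flat list, is what makes the thesaurus form easier to modify and maintain.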
2. Where does a taxonomy fit in
the information circle?
Information Circle - Overview
[Diagram: a cycle through Content, Taxonomy, Output, and User]
Content
• Web Pages
• White Papers
• Research Reports
• Licensed Data Feeds
• Intranet
• Internal Reports
• Lotus Notes files
• Databases
• Public Relations Documents/Press Releases
• Market Research Reports
• Customer Relationship Management (CRM)
• HR Files
• Accounting/Financial Records
• Legal Documents
• Patents
• Museum artifacts
Content – cont’d
Content Creation:
HTML – Meta name / Keywords
DB – Field / Meta tag / Element
XML – Entity table for valid values
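For the HTML case, a minimal sketch of writing taxonomy terms into a meta keywords element could look like the following. The function name and the sample terms are assumptions made for illustration; only the `name="keywords"` convention comes from the slide.

```python
from html import escape


def keywords_meta_tag(terms: list[str]) -> str:
    """Build an HTML meta keywords element from a list of indexing terms."""
    content = ", ".join(terms)
    # Escape &, <, > and quotes so the terms are safe inside the attribute.
    return f'<meta name="keywords" content="{escape(content, quote=True)}">'


tag = keywords_meta_tag(["Automobiles", "Research & development"])
print(tag)
```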
Taxonomy
Taxonomy is applied to new and existing content:
Meta Tags
Rule Base
Thesaurus Terms
Authority Terms
Date
Author
Description
etc.
Taxonomy – cont’d
Index data
- Manually
- Automatically
Suggest new candidate terms
Review
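As a rough illustration of automatic indexing, the sketch below suggests preferred terms by looking for entry phrases in the text. It is a deliberately simple whole-word match, not how MAI or any rule base actually works; the function name, the entry-term table, and the sample sentence are assumptions.

```python
import re


def suggest_terms(text: str, entry_terms: dict[str, str]) -> list[str]:
    """Return preferred terms whose entry phrases occur in the text as whole words."""
    found = set()
    for phrase, preferred in entry_terms.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(preferred)
    return sorted(found)


# Hypothetical table: each entry phrase points at its preferred term.
entry_terms = {
    "automobiles": "Automobiles",
    "cars": "Automobiles",
    "personal computer": "Microcomputer",
}
suggested = suggest_terms("New cars ship with a personal computer on board.", entry_terms)
print(suggested)
```

Suggestions produced this way would still go through the review step before being accepted.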
Output
Searchable Data
- Internal Data
- External Data
User
Web Browsing/Searching
Database Browsing/Searching
Query Resolution
User – cont’d
User Input
- Suggested Candidate Terms
- New Documents
Reports Based on User Search
- Search Logs
- Null Hits
(These will also suggest new candidate terms)
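One way to turn null hits into candidate terms is to count the queries that returned nothing. The sketch below assumes a search log already reduced to (query, hit count) pairs; that log shape, the function name, and the threshold are assumptions, not features of any particular search engine.

```python
from collections import Counter


def candidate_terms_from_log(log: list[tuple[str, int]], min_count: int = 2) -> list[str]:
    """Return queries that produced zero hits at least min_count times."""
    null_hits = Counter(query.strip().lower() for query, hits in log if hits == 0)
    return [query for query, count in null_hits.most_common() if count >= min_count]


# Hypothetical log entries: (query text, number of results returned).
search_log = [
    ("biometrics", 0),
    ("Biometrics", 0),
    ("border security", 14),
    ("rfid", 0),
]
candidates = candidate_terms_from_log(search_log)
print(candidates)
```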
New Content
The cycle begins again
3. Where to use a taxonomy
• Link the Taxonomy and Indexing
• Always in sync with the industry
• Keep up to date with terminology
• Automatically index the old data
• Filter newsfeeds
• Search using the Taxonomy
• File using the taxonomy
• Spell check using the taxonomy
• Link to translation system
• Catalog using the taxonomy
• Index a book
Thesaurus Master
Portal Searching
[Diagram: Database Management System (add metadata using MAI); database records, each with many elements; record locator Accessinn.com/12345/demofile/recid15; Inverted File (Aardvark, Alligator, Apple, Advantage, …. Zebra). Many databases can be reached.]
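The inverted file in the diagram can be sketched as a mapping from each term to the record locators that carry it. This is a minimal illustration under assumed data: the function name is made up, and the second locator (`recid16`) is hypothetical.

```python
from collections import defaultdict


def build_inverted_file(records: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each indexing term to the sorted record locators that carry it."""
    inverted = defaultdict(set)
    for locator, terms in records.items():
        for term in terms:
            inverted[term.lower()].add(locator)
    # Sort the terms alphabetically, as in a printed inverted file.
    return {term: sorted(locators) for term, locators in sorted(inverted.items())}


# Hypothetical records keyed by record locator.
records = {
    "demofile/recid15": ["Apple", "Zebra"],
    "demofile/recid16": ["Alligator", "Apple"],
}
inverted = build_inverted_file(records)
print(inverted["apple"])
```

A portal search for a term then becomes a single lookup that returns the locators to fetch, whichever database holds the records.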
4. Taxonomies for
Communities of Practice
Taxonomies in a Community of Practice
• Nature of Communities of Practice (CoP)
• Taxonomies in context
• Value of taxonomies
• Creating a taxonomy
• Applying the taxonomy
Nature of CoPs
• Free flowing, loosely structured
• Simple, ad hoc categorization
• Active CoPs need organization
• Search tends to be hit-or-miss
Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
Taxonomies in Context
A taxonomy aspires to be:
• a correlation of the different functional, regional
and (possibly) national languages used by a
community of practice
• a support mechanism for navigation
• a support tool for search engines and knowledge
maps
• an authority for tagging documents and other
information objects
• a knowledge base in its own right
Reference: “Taxonomies: the vital tool of information architecture”, www.tfpl.com
Value of Taxonomies
• Improves organization & structure
• Facilitates navigation
• Facilitates knowledge discovery
• Reduces effort
• Saves time
“Taxonomies are better created by professional
indexers or librarians than by domain experts.”
Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
Naval Postgraduate School’s Homeland Security Taxonomy (1)
Naval Postgraduate School’s Homeland Security Taxonomy (2)
IBM Insight graphical view
Applying a Taxonomy (1)
Manually
• Add terms into meta data fields
• Design navigation & site indexes with taxonomy hierarchy
Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
Incorporating Hierarchical Classification from a Taxonomy
Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
Applying a Taxonomy (2)
System integration
• Search & retrieval systems
• Auto-assignment of metadata
• Categorization systems
Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
Applying the Taxonomy to a Digital Library
[Diagram: sources (public Internet, library catalogs, locally held documents, public repositories, commercial data sources, agency data sources); search engines; search engine spiders; filtered content; Meta-Search Tool; automated categorization; Web portal]
Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
5. Surrounding theories and
applications
Other Vocabulary types
• Uncontrolled lists
• Classification System
• Subject headings
• Controlled vocabulary
– usually synonyms and spelling
• Authority files
• Thesaurus
• Taxonomy
Uncontrolled list - defined
• Add terms as they occur
• No cross references
• Simple flat structure
Controlled term lists - defined
• State the preferred terms
• Provide allowed term entry
• Heavily cross referenced
• Not generally hierarchical
• Popular
• Easy to create
Controlled term list - format
• Cars
– use Automobiles
• Personal Computer
– use Microcomputer
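A listing in this format can be read into a lookup table that sends each entry term to its preferred term. The sketch below assumes the listing arrives as plain lines, with each "use" line following the entry it belongs to; the function name is an assumption.

```python
def parse_use_references(lines: list[str]) -> dict[str, str]:
    """Map each non-preferred entry term (lowercased) to its preferred term."""
    mapping = {}
    entry = None
    for raw in lines:
        line = raw.strip()
        if line.lower().startswith("use "):
            if entry is not None:
                mapping[entry.lower()] = line[4:].strip()
        elif line:
            entry = line
    return mapping


listing = ["Cars", "  use Automobiles", "Personal Computer", "  use Microcomputer"]
use_map = parse_use_references(listing)
# Fall back to the term as typed when there is no "use" reference.
print(use_map.get("cars", "Cars"))
```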
Classification vs Subject Headings
• Classification
– single spot or placement
– browse physical list
– often a numbering system
– clear hierarchy
– no or few cross references
• Subject headings
– generic search
– hidden classification system
– related terms and cross references in heavy use
– Usually the inverted form
• cells, electric
– Alphabetic access
Authority systems - defined
• Lists of terms in the preferred format for use
• Frequently have cross references
• Widely available
• Frequently coded lists
• Brand names
Authority lists - examples
• ISO Country Name and Code
– International Organization for Standardization
• ISO Language list
• NAICS (SIC)
– Standard Industrial Classification (SIC) code
– Replaced by the North American Industry Classification System (NAICS)
What is a thesaurus?
• Jessica L. Milstead. All Rights Reserved
• “For writers, it is a tool like Roget’s, one with words grouped
and classified to help select the best word to convey a specific
nuance of meaning.
• For indexers and searchers, it is an information storage and
retrieval tool: a listing of words and phrases authorized for
use in an indexing system, together with relationships,
variants and synonyms, and aids to navigation through the
thesaurus”
• www.jelem.com
Thesaurus - defined
• For information retrieval 1960’s
– indexing either intellectual or automatic
– in searching
– searching but not indexing
– indexing but not searching
– hierarchical view for searching
Thesaurus - defined
• Monolingual - standard
– British – English – ISO 2788
– American – English – ANSI/NISO Z39.19
• Multilingual – standard ISO 5964
– concept mapping
– Eurovoc
• Discipline or Mission based - ad hoc
Thesaurus -standard format
• Main Entries
• Top Terms - TT
• Broader Terms - BT
• Narrower Terms - NT
• Related Terms - RT
• Scope Notes - SN
• History - HI
• Date term added/changed - DA
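A main entry in this format can be printed from its relationship codes. The sketch below is only an illustration: the function name, the two-space indent, and the sample terms are assumptions, and the code order simply follows the slide.

```python
def format_entry(term: str, relations: dict[str, list[str]]) -> str:
    """Render one main entry with its relationship codes in a fixed order."""
    order = ["TT", "BT", "NT", "RT", "SN", "HI", "DA"]
    lines = [term]
    for code in order:
        for value in relations.get(code, []):
            lines.append(f"  {code} {value}")
    return "\n".join(lines)


# Hypothetical entry; the terms are made up for illustration.
entry = format_entry(
    "Automobiles",
    {"BT": ["Vehicles"], "NT": ["Sports cars"], "RT": ["Highways"], "TT": ["Transportation"]},
)
print(entry)
```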
Standards
• Monolingual
– NISO / ANSI – Z39.19
– ISO 2788
• Multilingual
– ISO 5964
ISO Standards
• Set up already - easy to adopt
• Multiple broader terms
• The standards outline procedures
– ISO - better for implementation
– NISO - much better reading
Why do we index?
• Improve precision
– define scope of terms
• Improve recall
– different terms for same concept
• Guide to a field of expertise
• Learning tool
• Richer expression
Uses?
• Indexing*
– …process by which subject terms or classification symbols
are assigned to concepts in documents
– A thesaurus is also known as an indexing language
– * not the building of the inverted file in the computer sense of
indexing
What are we controlling?
• Synonyms
– different terms same concept
• Polysemes or Homonyms
– same word different meanings
– Lead
– Reading
How?
• Meaning
– delineation of scope of a term
• Term equivalence
– linking of synonyms
• Disambiguation of homonyms
– lead (metal)
– lead (element)
– lead (management)
Precision options
• Language specificity
• Coordination
• Compound terms - level of precoordination
• Homographs and scope notes
• Word distance indication
Precision options
• Structural relationships
• Links and roles
• Treatment and aspect codes
• Weighting
Disambiguation
Bill (Invoice)
Bill (Legislative)
Bill (Sport)
Bill (Person)
Disambiguation
Bills
PT Invoices
NT Bills
BT Legislation
RT Bill
RT Animal
NT Bill
BT Person
6. How to build and maintain a
taxonomy
How to build a taxonomy
• Collect the terms
• Pull out authority terms
• Organize into arrays
• Choose top terms
• Organize hierarchically
• Flesh out term records
• Test, review, and edit
Or said another way …
• Define scope
• Collect terms and relationships
• Identify existing taxonomies
• Identify resources
• Create & refine taxonomy
• Apply taxonomy
• Review and update
Maintain
• Steady stream of terms
– Web logs
– Null sets
– New announcements
– Indexing team
– Library
– Records managers
– Etc.
• Candidate terms
• An out-of-date taxonomy is nearly useless
Best Results Measures
• Accuracy
• Productivity
• Hits, Misses and Noise
• Precision and Recall
• Relevance
• Ease of set up
• Time to production
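Hits, misses, and noise lead directly to precision and recall. The sketch below assumes the usual reading: a hit is a relevant item that was retrieved, a miss is a relevant item that was not retrieved, and noise is an irrelevant item that was retrieved. The function name and the sample counts are made up for illustration.

```python
def precision_recall(hits: int, misses: int, noise: int) -> tuple[float, float]:
    """Compute precision and recall from hit, miss, and noise counts."""
    retrieved = hits + noise    # everything the system returned
    relevant = hits + misses    # everything it should have returned
    precision = hits / retrieved if retrieved else 0.0
    recall = hits / relevant if relevant else 0.0
    return precision, recall


precision, recall = precision_recall(hits=80, misses=20, noise=10)
print(f"precision={precision:.2f} recall={recall:.2f}")
```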
Integration
• Thesaurus
– full featured
– multiple views
– multiple versions
– multiple languages
• Automatic indexing
– filtering
– assisted
• Data Harmony MAI and Thesaurus Master
Visual Taxonomy
• Ways to look
– Hierarchical
– Alphabetic – by term
– Ring diagrams
– Topic maps
– Related terms
API to Many Systems for CMS
Apply to the meta data
• Automatic application?
• Spider setting internally
• External web crawls – use all aliases
• Filter data
• Enhance search experience
Meta data
• The fields
• The elements
– Class codes
– Title
– Author
– Plaintiff
– Product
– Subject / topic
• Meta Name Keywords in HTML
7. How Taxonomies are used in
Enterprise Information
Brand is repeated in several spots and tied to search as well
Another way of listing brands
Category list from taxonomy is tied to brand list and product list
Category code from the taxonomy is tied to the brand list and the product list
Enterprise Taxonomy Management
• Consistent application across entire site
• Synonyms are used interchangeably
• User doesn’t need to know the taxonomy
• Pop up view is helpful
• Site map for construction and browsing
• Allows hidden sections for internal use
Taxonomies
• Form the basis for knowledge sharing
• Add value to discussion
• Allow deeper retrieval
• Are straightforward to create
• Require on-going maintenance
Your Taxonomy
• There is too much information to pile it on the floor.
• It fits in many places in the information flow.
Thank you for your time!
Questions?
Marjorie M.K. Hlava
Access Innovations, Inc.
505-998-0800
[email protected]
www.accessinn.com