The Challenges of Building Enterprise Content Taxonomies and the Role of Classification Technologies in Maintaining Their Effectiveness

Reginald J. Twigg, Ph.D. ([email protected])
Capture, Classification and Taxonomy, IBM ECM
Information Management Software | Enterprise Content Management
© 2007 IBM Corporation

Agenda
• The Challenge of Unstructured Content
• Key Concepts and Terms
• Taxonomy, Classification and ECM Adoption
• Classification Technologies for ECM

The Challenge of Managing Unstructured Content

80% of Enterprise Data is Unstructured
[Figure: Databases]
• Billing statements
• Claims images
• Customer correspondence
• Mortgage docs
• Contracts
• Signed BOLs
• Healthcare EOBs
• Marketing collateral
• Website content
• Voice authorizations
• Signature cards
• Credit enrollments
• Material Safety Data Sheets
• ISO 9000 docs
• Plant schematics
• Product images
• Spec sheets
• ...and much more!

What is Enterprise Content?

Where do I start?
Organizing the explosion of unstructured content becomes critical:
• We’ve got 600 GB of content from basic content services all over the enterprise. How can we get this content efficiently mapped into our ECM taxonomy?
• We’ve been managing our content without classifying it for a few years now. How can our users navigate amongst this existing content in a way that’s intuitive for our business?
• The lawyers have to review 400,000 electronic documents for their case. How can we make sure they don’t waste their time?

Business Value of Classification for ECM
Key Business Drivers
1 ECM Taxonomy and Classification: Increase accessibility of content under management
  – Automated, High Scale Classification
  – Classify at ingestion and/or re-classify over time
  – Taxonomy Evolution Tools
  – Enhanced Accessibility
  – Taxonomy Proposer
2 Compliance, Records, Legal Discovery: Increase legal discovery review effectiveness while reducing risk
  – Legal Discovery Prioritization and Workflow Assignment
  – Records Classification and Exception Handling
  – Storage and Retention Policy Assignment
3 In Process Classification: Increase worker productivity and automate content related decisions
  – Ad Hoc Category Suggestion
  – Content-Based Workflow Selection
  – Content Based Decision Making
4 Message Tagging, Classification and Monitoring: Reduce inquiry costs, automate message routing and increase customer satisfaction
  – Email, Chat Routing
  – Agent Response Suggestion
  – Email Supervision and Monitoring
  – Automatic Customer Response

Ability to Structure Content with Databases
[Chart: Percent of corporate information value managed in traditional databases. Labels: Data Creation And Demand; Structured Data; Unstructured Data; Application Types, from OLTP and BI (narrow scope) to Compliance, Competitive Intelligence (wide scope). Source: Gartner]

Multiple Repositories Make Access Difficult
[Pie chart; labels and values in extracted order: 1 repository, 5%, Don't know, 17%, More than 15 repositories, 36%, 25%, 2-5 repositories, 14%, 10-15 repositories, 4%, 6-10 repositories]
Base: 81 North American decision-makers (multiple responses accepted)
Source: “The Future of Content in the Enterprise,” Connie Moore and Robert Markham

And Then There’s SharePoint, File Shares and . . .

Key Concepts and Terms

Key Concepts
• Metadata: a means of describing, locating, cataloging, and activating content as objects in a software ecosystem (literally, data about data).
• Enterprise Catalog: a centralized and normalized metadata model for unstructured content for the purposes of providing consistent services across all ECM applications.
• Taxonomy: a hierarchical structure of information components, any part of which can be used to classify a content item in relation to other items in the structure.
• Classification: a coding of content items as members of a group for the purposes of cataloging them or associating them with a taxonomy.

Taxonomy Is . . .
• Not turning animals into trophies
• A system for organizing the corpus of business content

Taxonomy and Classification in ECM
• Classification Examples:
  – Document Classing
  – Foldering
• Taxonomy Examples:
  – Enterprise Content Catalog
  – Industry Standard Document Taxonomies (ISO, XMI)
• Methods:
  – Rules-Based: Applies pre-determined rules for ‘if, then’ classification of text and properties
  – Analytics-Based: Applies algorithms to interpret classes in order to apply classification rules to them

ECM Taxonomy Illustrated

Taxonomy, Classification and ECM Adoption

Drive New Business Value from Content
[Diagram: Content Classification Solutions, with three outcomes: Organize Unstructured Content; Improve Content Access; Derive Business Insight]

Business Drivers for ECM Taxonomy Management
• Proliferating departmental solutions
  – Content Management
  – Collaboration (SP, Quickr, Team Rooms, Wikis)
• User-based classification and high workforce turnover
  – Productivity declines as knowledge disappears
  – Legal discovery is a secondary concern
• Mergers and Acquisitions – need to reconcile disparate content management practices, repositories and processes

Classification is Hard Work
Key Business Challenges
1 ECM Taxonomy and Classification: Increase accessibility of content under management
• Most organizations face content taxonomy pain – especially as they standardize around ECM
  – Mapping content to taxonomy during ingestion
  – Reclassifying content under management
  – Evolving taxonomies as new types of content emerge
  – Integrating folksonomies (SharePoint) into a master taxonomy
[Diagram labels: Automated, High Scale Classification; Classify at ingestion and/or re-classify over time; Taxonomy Evolution Tools; Enhanced Accessibility; Taxonomy Proposer]

Organization is the Root Cause
• Most organizations face content taxonomy barriers – especially as they standardize around ECM
  – Assigning categories en masse
  – Reclassifying existing content as taxonomies evolve
  – Merging taxonomies
  – Integrating the wisdom of folksonomies

Challenges and Impacts of Merging Taxonomies
• Misclassification – change is constant, and master taxonomies must manage multiple custom taxonomies for each content source
• “Folksonomies” from departmental collaboration solutions are created by users and unmanaged by ECM standards
• Impact:
  – Unreliable Metadata – Inconsistencies lose or mislabel content
  – Process Misfires – Poor metadata triggers incorrect events and workflows
Scale is the Challenge – Automation is Essential

Lessons Learned From ERP Adoption
• Getting Classification Right: ‘Garbage in = garbage out’ is often used in metadata management projects to describe the problem of building a metadata model on inconsistent sources.
• Driving Process on Taxonomies: ERP systems depend on 3 master taxonomies: material, vendor and customer. These taxonomies drive events, workflow definition and the development of transaction-centric business process applications.
• Mastering Metadata: The ability to deploy new enterprise applications depends upon the re-usability, scalability and integrity of the metadata model.
• System of Record is Required for Standardization:
  – Establishes an enterprise standard that can be audited
  – Forms the foundation for building demonstrable best practices
  – Enforces consistency of data capture and output

Customer Lessons for Mastering ECM Taxonomies
• ‘Master’ taxonomy of record required for
  – Compliance
  – Business process applications
• Merged master taxonomies become large and unwieldy
  – Multiple taxonomies require integration and translation
  – Centralized, decentralized, or hybrid?
• Intelligent classification is increasingly used to manage:
  – Taxonomy merging from multiple use cases
  – Taxonomy/folksonomy translation from distributed content sources

A Look at ECM Classification Technologies

State of Classification Management Technologies
• ECM Classification/Taxonomy is an emerging discipline
  – Industry standard taxonomies:
    • Focus on business function or transaction types
    • Have not reached the enterprise level
  – Classification best practices:
    • Content ingestion
    • Application development reclassification
• Classification software focuses on content ingestion:
  – Electronic content (email, Office documents, free-form text)
  – Paper content (document images) requires OCR
• Search is not enough – must drive value in the business process

Criteria For ECM Classification Management Solutions
• Integrate with and support the ECM metadata model
• Interpret a highly-federated content ecosystem
• Go beyond search to catalog and manage content
• Build on advanced analytic technologies – rules alone are not enough
  – Interpret content to extract meaningful (meta)data
  – Employ multiple methods (engines) for classification
  – Integrate teaching/learning

Common Platform for Electronic Content Classification
[Diagram: a common Classification Platform serving four uses: ECM Taxonomy and Classification; Compliance, Records, Legal Discovery; In Process Classification; Email Queue Classification and Monitoring]

IBM Classification Module for Electronic Content
Organize your ECM content
• Automated classification and filtering
• Combines text analytics understanding with rules
• Acquires domain specificity from your own content
• Unique learning technology for adaptive classification
• Suggests new categories or even seeds an entirely new taxonomy
• Rectifies conflicting taxonomies
• Market proven, scalable platform

Understanding Content with Text Analytics
[Diagram labels: Corpus (Categorized) with categories A, B and C; Training (Teach); Classification Engine; Matching; Audit; Feedback; Categories list and Relevancies (Scores)]
• Sample corpus texts: “IP is essential”; “Legal is currently requiring full approval”; “Engineering requires clear requirements”; “Strategy is Important to the marketing team”; “The core market for this new product has been defined as such by IBM”
• Example input: “The strategic value of this market is paramount to IBM”
• Example output: C: 97%, B: 54%, A: 12%

Classification Workflow: Accelerating Content Organization
[Diagram: Existing Unclassified Managed Content, File System and Basic Content Services feed the Classifier, which will automatically categorize the majority of content, filter out documents, pass documents to the Classification Review Tool, or send them to the taxonomy proposer]
Reference: Integration Components
• Classifier (Runtime Application)
• Classification Review (UI)
• Taxonomy Proposer (UI)
• Content Extractor (training based on P8)

Components of the Solution for Text Classification
• Classifier
  – Automatically classifies and filters out documents
  – Moves some documents for manual review
• Classification Review Tool
  – Allows user to manually review documents
• Content Extractor
  – Extracts content from the ECM system for training
• Taxonomy Proposer
  – User workflow to identify and name new categories or apply existing taxonomy from P8

Classification for Paper Documents
• Classification of paper documents occurs in capture process
• Use cases for paper document classification
  – Recognition using OCR/ICR
  – Classification to associate to folders or doc class
  – Separation to reduce costs and improve process

Three Primary Types of Images – The Document Recognition Problem
[Figure: image types on a scale from Less Advanced to More Advanced: Structured, Semi-Structured, Un-Structured]

The Document Separation Problem in Image Capture
• Separation of documents is a significant expense for a high-volume capture system
  – Typical ‘structured’ recognition technologies are not applicable
  – Manual insertion of separator sheets is the primary workaround today
  – 50% of document preparation labor is spent sorting documents and inserting separator pages (source: TAWPI)
• Where does one document stop and the next begin? Here? Here? Here? Here?

Classification Methods for Paper Content (Images)
• Image Classification – based on the overall layout and structure of a document
  – Includes lines, boxes, logos and placement of text
• Text Classification – based on detailed analysis of the text content of a page
• Rules-Based Classification – performed by searching for specific data or keywords, independent of layout
• Templated Classification – determined by the presence of one or more marks, barcodes or items of text in pre-defined locations

Waterfall Approach to Classification and Separation
• Two-pass system:
  – 1st pass: Classification – optimizes performance by using fastest classification techniques first
  – Advanced Text Classification is the final “catch-all”
[Figure: pages 1 to 8 are passed through four techniques in order: Barcode Recognition (1 ms), Image Classification (20 ms), Rules Based (200 ms), Text Classification (1000 ms). Cells mark each page as “?”, “N/A”, or the First, Middle or Last page of Form X, Y or Z]

Why Invest in Automated Classification?
• Accelerate the time to value in your investment in ECM
• Free up your subject matter experts
• Ensure more accurate content catalogs
• Make your content easier to find and leverage

Summary
1. Accelerate ECM Standardization
   Poor content classification undermines ECM value – maximize your ECM potential and time-to-value with automated classification
2. Automating Classification Always Pays
   Typical employees spend 10 hours/week searching for information – slash that time and increase productivity
3. Classification Technologies Automate Classification to Drive Development of Best Practices
   IBM Classification Module for IBM FileNet P8: automatically organizing your content by understanding it

Contact
Reggie Twigg ([email protected]) for more information or to arrange a demonstration
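
Appendix sketch: the waterfall approach to classification described in the slides above (fastest technique first, text classification as the final catch-all) can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the IBM product's implementation: the page fields (`barcode`, `layout`, `text`), the stage functions and the form labels are all hypothetical stand-ins for real barcode, image, rules and text engines.

```python
from typing import Callable, Optional

# A page is modeled as a plain dict; the keys used here are hypothetical.
Page = dict
Stage = tuple[str, Callable[[Page], Optional[str]]]


def by_barcode(page: Page) -> Optional[str]:
    # Cheapest check: a known barcode value maps straight to a form type.
    return {"X-001": "Form X"}.get(page.get("barcode", ""))


def by_layout(page: Page) -> Optional[str]:
    # Stand-in for image classification on layout features.
    return {"two-column-logo": "Form Y"}.get(page.get("layout", ""))


def by_rules(page: Page) -> Optional[str]:
    # Keyword rule, independent of layout.
    return "Form Z" if "claim number" in page.get("text", "").lower() else None


def by_text(page: Page) -> Optional[str]:
    # Stand-in for the text classifier acting as the final catch-all.
    return "Correspondence" if page.get("text") else None


# Ordered from fastest to slowest, as in the waterfall.
STAGES: list[Stage] = [
    ("barcode", by_barcode),
    ("image", by_layout),
    ("rules", by_rules),
    ("text", by_text),
]


def classify(page: Page) -> tuple[Optional[str], Optional[str]]:
    """Return (label, stage name) from the first stage that recognizes the page."""
    for name, stage in STAGES:
        label = stage(page)
        if label is not None:
            return label, name
    return None, None  # unrecognized: a candidate for manual review
```

Because the loop returns at the first stage that yields a label, slower stages only run on pages the cheaper stages could not recognize, which is the performance argument the waterfall slide makes. For example, `classify({"barcode": "X-001"})` stops at the barcode stage and never touches the text classifier.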