Architecting a Corporate Metadata Repository at the U.S. Bureau of Census

Gail Wright
CMR Program Manager
Technical Director
Oracle Corporation
[email protected]

Agenda
- Why a CMR?
- What to include in a CMR?
- Architecting a CMR
- Leveraging a CMR

Why a Corporate Metadata Repository (CMR)?

Metadata Technology Continuum
Stages, from low to high integration:
1. Buried, Inaccessible Metadata
2. Tool-based Data Dictionaries
3. Defined Application Models
4. Autonomous Repositories
5. Integrated Vertical/Inter-Dept Metadata
6. Integrated Corporate Enterprise Metadata
7. Integrated Global Enterprise Metadata
Along the continuum:
- low integration to high integration
- low share/reuse to high share/reuse
- few open standards to many open standards
- low interoperability to high interoperability
[Diagram: EMR, CMR, and FedStats are marked as points on the continuum.]

BOC Current Business Process
Does not include an Integrated Metadata Business Process
[Diagram: the programs (Census 2000, American Community Survey, Demographic Surveys, Econ Census, Econ Surveys) pass through four phases: Design, Collect, Process, and Share. The phases are served by a mix of internally developed systems, customized commercial systems, a variety of programming languages, and individual tools of choice. Systems and tools named: CASES, GIDS, CATI, CAPI, Mail, PAPI, OCS, ICM, CADE, CSAQ, OCR, TDE, PFIRS, SAS, DEVSURV, DADS/AFF, CENSAS, FERRET, COBOL, FORTRAN, DECForms, Econ DW, StEPS, Internet, CD-ROM, ISS. Two entries are marked "(future)".]

What are the problems with the current Business Process?
- Difficult to:
  - meet customer demands for quick turnaround of surveys, and customized products
  - re-use and share metadata within the BOC
  - maintain consistent standards
  - compile and format metadata needed by dissemination systems
  - share metadata with external agencies, participate in Virtual Statistical Agencies, etc.
  - meet new metadata requirements like FGDC's CSDGM content standard
  - perform time series or cross dataset comparisons
- Metadata integrity and quality can be compromised

BOC Goal: An Integrated Metadata Process
[Diagram: in each phase (Census and Survey Design, Data Collection, Data Processing, Data Dissemination) the 1998 Annual Survey is copied to create the 1999 Annual Survey. All phases rest on the Corporate Metadata Repository.]

What to include in a Corporate Metadata Repository?

What is Metadata?
- "Data about data"
  - Information about "raw" data that gives it meaning, context or enhances understanding
  - Data about the Content, Quality, Condition, and other characteristics about data
- Every informational asset that's not data
  - Requirements, Data Models, Business Models, Screen Layouts
  - Data Mappings and transformations
  - Hierarchies, Aggregation rules, Formulas
  - Rules for comparison of data sets and historical meaning
  - Security access controls, operational schedules, code, ...

What is a Repository?
Data Dictionary:    Name, Definition, Format
Data Directory:     adds Source, Destination, Legacy
Data Registry:      adds Owner, Authority, Standard
Data Encyclopedia:  adds Application, System, Model
Data Repository:    adds everything else

Factors for determining CMR content
- Strategic to BOC Enterprise
- Opportunity for sharing and reuse of:
  - Metadata
  - Meta-Model
- Generic vs. Application specific

CMR Meta-Models
- Data Element Registry (ISO/IEC 11179 Standard)
  Data Elements, Value Domains, Valid Values, Data Element Concepts,…
- Data Set Registry (Supports FGDC CSDGM Geospatial Metadata Standard)
  A Data Set is a collection of Data Elements.
- Product Registry (Supports FGDC CSDGM Geospatial Metadata Standard & Dublin Core)
  A Data Product may be a file/document, website/URL, or physical object.
- Data Store (OMG CWM Standard)
  Metadata for the physical data store. (Supports Relational, Multi-Dimensional, and Flat File stores)
- Business Rule Registry
- Survey Registry
  Surveys, Survey Instances, Universes, Frames, Sample, Questionnaires, Questions,…
- Classification Schemes (ISO/IEC 11179 Standard)
  Taxonomies, Keywords
- Security Framework
- Configuration Mgmt Framework
- Workflow Framework

Basic CMR Meta-Model Relationships
[Diagram relating: Survey, Survey Instance, Questionnaire, Question, Product, Data Set, Data Store, Data Element.]

Definitions
- Administered Component
  - An object requiring naming, identification, configuration, security, and optionally, registration
  - Has one or more designations (names)
  - Has one or more definitions
- Classified Component
  - An object that may be classified as a part of a classification scheme

CMR Meta-Model High Level
[Diagram: Administered Component and Classified Component shown with the Basic CMR Meta-Model Relationships entities: Survey, Survey Instance, Questionnaire, Question, Product, Data Set, Data Store, Data Element.]

Generating a Census Bureau Taxonomy
+ Census Bureau Information
  + Demographic
    + Census
      - 1990 Census
      + 2000 Census
        - Questionnaires
        - Products
        + Datasets
          - Public Use Microdata Sample
          - 100% Edited Detail File
          + Sample Edited Detail File
            - Data Elements
            - Related Information
    - Survey
  - Economic
  - Geographic

+ Data Elements
  + Basic Demographic
    - Relationship
    + Sex
      - Alternative Designations
      - Alternative Definitions
      - Data Element Concept
      - Conceptual Domain
      - Value Domain
      - Related Data Elements
      - Related Information
    - Age
    - Race
    - Marital Status
  - Occupation/Employment
  - Housing

Architecting a CMR

CMR Component Based Architecture
- Flexible, functional, open, standards-based, component-based architecture
- Reuse Components
- Swap Components
- Minimize change impacts
[Diagram components: Browser User Interface; COTS Integrated Products; Admin Tools; Browsing Tools; External Systems; Metadata Interchange Load/Unload; Object Layer; Metadata Repository Physical Storage Layer; Security Framework.]

Proposed Technical/Software Architecture
Four Ways an Application Can Use CMR Metadata
(ordered from Tightly Coupled with CMR to Loosely Coupled with CMR)
1. Application written against CMR - uses it directly for metadata access and maintenance.
2. Application uses same CMR core physical model - can replicate metadata from CMR.
3. Application communicates with CMR through an API to exchange metadata.
4. Application communicates with CMR using a standard XML-based metadata interchange.
[Diagram: CMR Tools (Web-enabled Administration Tools, Web-enabled Browsing Tools, Open Java API, Open XML Interchange, Integrated Portal Web Site Builder) over the Corporate Metadata Repository and the CMR Core Meta-Models.]

CMR Extensibility
[Diagram: the same CMR Tools (Web-enabled Administration Tools, Web-enabled Browsing Tools, Open Java API, Open XML Interchange, Integrated Portal Web Site Builder), Corporate Metadata Repository, and CMR Core Meta-Models, extended with CMR Extended Tools, API, Interchange and a CMR Extended Meta-Model.]

S/W Requirements
- Scalable
- Provides for open API and Interchange
- Implements Standards
  - ISO/IEC 11179
  - FGDC CSDGM
  - Dublin Core
- COTS preferred, if meets requirements
- High productivity development tools
- Self-documenting, easy to maintain app

CMR S/W for Deployment & Development
Software: Used for
- Oracle8i EE V8.1.6: CMR Physical Repository
- interMedia: Structured and Full-text Metadata
- WebDB V2.2 (upgrading to Oracle 9i Portal): CMR Web Portal
- OAS V4.0.8.1 (upgrading to iAS): CMR Web Server
- Rational Rose 2000: CMR UML Modeling
- Designer6i: CMR Server Modeling. CMR Web Application Generation plus some PL/SQL coding.
- JDeveloper V3.1: CMR Java API and XML application development (BC4Js & JSPs)
- Oracle XDK & MS Notepad: CMR XML generation, parsing, processing, & upload/download from database tables

Designer Generated CMR Tools
[Diagram:
- Logical Models: Functional Requirements, Use Cases, UML Object Model
- Physical Models: Server Model, Web Modules
- Server Tier Deployment: CMR Repository, TAPI (PL/SQL), Application View Layer, Application Code (PL/SQL generating HTML & JS)
- Middle Tier Deployment: OAS Environment w/ PL/SQL Cartridge & HTTP Listeners, connected over Net8
- Client Tier Deployment: Web Browser, receiving HTML over HTTP
- Legend: Created/Generated using Oracle Designer; Hand coded; Created/Generated using Rational Rose]

JDeveloper Generated Java API
[Diagram:
- Logical Models (Rational Rose): Functional Requirements, Use Cases, UML Object Model
- Physical Models: Server Model (Rational Rose Generated. Oracle Designer Maintained.)
- Server Tier Deployment: CMR Repository, CMR View Layer (Designer Generated)
- Middle Tier Deployment (OAS): Java Object Layer (BC4J) over JDBC, CMR Open API, BOC Java Server Pages, DER XML Application (JDeveloper Generated)
- Client Tier Deployment: BOC Java Applet or Application, over HTTP]

Leveraging a CMR

Metadata for Dissemination

Data:
ID  WGT  SEX  AGE  MARITAL
1   5    0    45   2
2   7    1    5    0
3   2    1    90   5
4   2    0    0    0
5   7    1    23   1
6   3    0    37   4
7   4    0    14   0
8   2    0    75   2
...

Metadata:
Survey/Census: 1990 Decennial Census
Source: Bureau of the Census
Dataset: 1990 Public Use Microdata Sample (PUMS)
Description: The PUMS dataset has basic demographic information about persons and housing in the U.S. This information comes from the 1990 Decennial Census long form which is randomly sent to 1 in every 7 households. This dataset is for public use and does not compromise the confidentiality of individuals.
Data Elements:
  ID - Record Identifier - A unique id for a record. Each record identifies 1 or more persons having the same demographic characteristics. (See WGT)
  WGT - Person Weight - A weight given to a record to represent the 1 or more persons with the same demographic characteristics. Valid values: 1..9
  SEX - Person Gender - Valid values (0: male, 1: female)
  AGE - Person Age in Years - Valid values (0-90). Persons over 90 years of age are top-coded with an age of 90 for confidentiality reasons.
  MARITAL - Person Marital Status - Valid values (0: not applicable, 1: single, 2: married, 3: separated, 4: divorced, 5: widowed). Universe: Persons over 15 years of age. Those 15 and under are given a value of 0.
For more information: Related Datasets and Publications, Sampling Errors and Techniques, etc.

CMR Support for American FactFinder
[Diagram: AFF Metadata Providers supply Data Elements, Data Sets, and Data Products; flows are labeled "XML CMR File" into the CMR and "ASCII AFF File" from the CMR to AFF.]

AFF Metadata-Driven Architecture
- Add metadata and data for new dataset -> AFF can automatically search and query the new dataset
- Geography Trees, Datasets, Subjects, Report topics, etc. are all generated at runtime, by accessing the metadata
- Business metadata is linked to technical metadata such that user selections are used to generate SQL statements to query the data
[Diagram: AFF Application Code makes RunTime Calls to the CMR/AFF Business & Technical Metadata and Produces the AFF Metadata-Driven, Dynamic Application.]

CMR Support for Econ 2002 Census
[Diagram: Econ Metadata Providers, EMR, and 450 Econ Questionnaires feed the CMR; consumers shown are AFF and GIDS. File labels shown: Econ FGDC/ACSD Files, XML CMR File, XML Survey File, ASCII AFF File.]

Activating the CMR
[Diagram: the CMR Meta-Models (Data Element Registry, Data Set Registry, Product Registry, Data Store, Business Rule Registry, Survey Registry, Classification Schemes, and the Security, Configuration Mgmt, and Workflow Frameworks) annotated with the functions they drive:
- Data Quality Inspection
- Product Generation
- Data Set Query Generation
- Taxonomy Tree Generation
- Survey Instrument Generation]

Metadata: A core enabling component of any Information technology
- Enterprise Information Portal
- Knowledge Mgmt & Business Intelligence
- Digital Libraries
- e-Business
- ERP
- Data Integration
- Application/Tool Integration
- Legacy Migration
- Data Warehousing & Decision Support
- Data Query and Search

Leveraging the CMR Data Element Registry

Government Vision Data Element Registry
- BOC: Demographic, Economic, Geographic Data
- BLS: Economic Data
- USGS: Geographic Data
- HUD: Housing Data
- EPA: Environmental Data
- FAA: Air Safety Data
- NASA: Aircraft Data
- CDC: Health Data
- NCI: Health Data
- HCFA: Health Data
[Diagram: the agencies connect through an Integration Layer. Legend: Global Standardized Data Elements; Agency Standardized Data Elements; Non-Standardized Data Elements.]

DER Integration Technology
[Diagram: the DER and Metadata Repository sits at the center of the data flow.
- Sources: Legacy Data Exports, OLTP DB, Source Flat Files (FF), External Data Sources
- Steps: Extract, Transform, Quality Check, Load
- Targets: Staging DB, Data Warehouse, Data Marts, Multi-Dimensional Cubes
- Uses: Legacy Migration, DW and Analytics, Web Deployment, Information Portals, E-Commerce Apps]

Questions
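
Supplementary sketch: metadata-driven query generation

The AFF Metadata-Driven Architecture slide says that business metadata is linked to technical metadata so that user selections can be turned into SQL. The Python sketch below illustrates that idea using the data elements from the 1990 PUMS example. It is not the AFF or CMR implementation: the names `DataElement`, `build_query`, `weighted_count`, and the table name `pums_1990` are hypothetical, SQLite stands in for the actual database, and the sample rows are a best-effort reading of the slide's flattened data table, included only for illustration.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """One data element: business metadata linked to a physical column."""
    column: str                                        # technical metadata
    label: str                                         # business metadata
    valid_values: dict = field(default_factory=dict)   # code -> meaning


# Data elements as described on the 1990 PUMS example slide.
ELEMENTS = {
    "SEX": DataElement("SEX", "Person Gender", {0: "male", 1: "female"}),
    "MARITAL": DataElement(
        "MARITAL", "Person Marital Status",
        {0: "not applicable", 1: "single", 2: "married",
         3: "separated", 4: "divorced", 5: "widowed"},
    ),
}


def build_query(table, selections, elements=ELEMENTS):
    """Translate user selections (element name -> meaning) into SQL.

    Returns (sql, params). Selected values are bound as parameters; the
    table name is assumed to come from repository metadata, not the user.
    """
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    clauses, params = [], []
    for name, meaning in selections.items():
        element = elements[name]
        # Invert the valid-value list: business meaning -> stored code.
        code_for = {m: c for c, m in element.valid_values.items()}
        if meaning not in code_for:
            raise ValueError(f"{meaning!r} is not a valid value for {name}")
        clauses.append(f"{element.column} = ?")
        params.append(code_for[meaning])
    where = " AND ".join(clauses) if clauses else "1 = 1"
    # WGT is the number of persons a record represents, so the weighted
    # person count is the sum of WGT over the matching records.
    return f"SELECT SUM(WGT) FROM {table} WHERE {where}", params


def weighted_count(conn, table, selections):
    """Run the generated query and return the weighted person count."""
    sql, params = build_query(table, selections)
    return conn.execute(sql, params).fetchone()[0]


# Illustrative rows (ID, WGT, SEX, AGE, MARITAL); hypothetical table name.
ROWS = [
    (1, 5, 0, 45, 2), (2, 7, 1, 5, 0), (3, 2, 1, 90, 5), (4, 2, 0, 0, 0),
    (5, 7, 1, 23, 1), (6, 3, 0, 37, 4), (7, 4, 0, 14, 0), (8, 2, 0, 75, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pums_1990 (ID, WGT, SEX, AGE, MARITAL)")
conn.executemany("INSERT INTO pums_1990 VALUES (?, ?, ?, ?, ?)", ROWS)

# A user picks "male" and "married" in the interface; the codes 0 and 2
# come from the metadata, not from application code.
print(weighted_count(conn, "pums_1990", {"SEX": "male", "MARITAL": "married"}))
```

Because the valid-value lists live in metadata, supporting a new dataset in this sketch would mean registering its data elements rather than changing `build_query`, which is the property the slide attributes to the metadata-driven design.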