Dublin Core for Museums Day 1 CIMI Paul Miller

Dublin Core for Museums
Day 1
Paul Miller
UK Office for Library & Information Networking
John Perkins
CIMI
Thomas Hofmann
Australian Museums OnLine
Overview for Thursday March 25
 Introduction to Metadata
 Introducing the Dublin Core
 CIMI DC Guidelines - Dublin Core for Museums
 Break
 DC for museums continued...
 Lunch
 Practicalities of Implementing DC
 Break
 Introduction to MICI
What’s the Problem?
 Need to serve a Web audience
 Demand for content
 Uncertain quality
 Expectations for rapid easy access
 Need to be visible on the Web
 Two million web sites
 Half a billion addressable pages
 Many communities with the same problem
What’s the Problem?
 Manage and organise interconnected data
 Different types
 Different repositories
 Packages
 Interoperate with other communities
 Interoperate with other applications
 Need a way to:
 Express meanings in rich and complex data
 Express the structure of our data
 Encode the transfer of data
What’s the Solution?
 Communities address their own needs
 Do so in a way that works across communities
 Standards based
 Collaborative
What is a Community?
 A resource description community is
characterised by agreed semantic,
structural and syntactic conventions for
exchange of descriptive information
Libraries: MARC, AACR2
Museums: SPECTRUM, MICI
Based on a slide by Stu Weibel
Communities working together
Home Pages
Scientific Databases
Libraries
Geo
Commerce
‘Internet Commons’
Museums
Whatever...
Based on a slide by Stu Weibel
Communities working together
[Diagram: Museums shown among several communities, each labelled Metadata]
Based on a slide by Stu Weibel
What is Metadata?
 Meaningless jargon
 or
a fashionable term for what we’ve always done
 or
“a means of turning data into information”
 and
“data about data”
 and
the name of a film director (‘Luc Besson’)
 and
the title of a book (‘The Lord of the Flies’).
What is Metadata?
 Metadata exists for almost anything
 People
 Places
 Objects
 Concepts
 Databases
 Web pages
What is Metadata?
 Metadata fulfils three main functions:
 description of resource content
 “What is it?”
 description of resource form
 “How is it constructed?”
 description of issues behind resource use
 “Can I afford it?”.
What is Metadata?
 Many structures have evolved at different
levels, and to meet different requirements...
MICI
For human communication we need...
Semantic Interoperability: standardisation of content
“Let’s talk English”
“cat milk sat drank mat”
Structural Interoperability: standardisation of form
“Here’s how to make a sentence”
“Cat sat on mat. Drank milk.”
Syntactic Interoperability: standardisation of expression
“These are the rules of grammar”
“The cat sat on the mat. It drank some milk.”
Challenges
Opportunities
 Many flavours of metadata
 which one do I use?
 Managing change
 new varieties, and evolution of
existing forms
 Tension between functionality and simplicity,
extensibility and interoperability
Functions, features, and cool stuff
Simplicity and interoperability
Introducing the Dublin Core
 An attempt to improve resource discovery
on the Web
 now adopted more broadly
 Building an interdisciplinary consensus about a
core element set for resource discovery
 simple and intuitive
 cross–disciplinary
 international
 flexible.
Introducing the Dublin Core
 15 elements of descriptive metadata
 All elements optional
 All elements repeatable
 The whole is extensible
 offering a starting point for semantically richer
descriptions
 Interdisciplinary
 libraries, museums, government, education...
 International
 available in 20 languages, with more on the way.
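The "all optional, all repeatable" rules above can be sketched as a small data structure. This is a minimal illustration only: the class name DCRecord and its methods are hypothetical, and the sample values are borrowed from the HTML example later in these slides.

```python
from collections import defaultdict

# The 15 element names are taken from the slides; everything else is a sketch.
DC_ELEMENTS = {
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}


class DCRecord:
    """Hypothetical container for one simple Dublin Core record."""

    def __init__(self):
        # Each element maps to a list, so it may be absent or repeated.
        self._values = defaultdict(list)

    def add(self, element, value):
        if element not in DC_ELEMENTS:
            raise ValueError(f"{element!r} is not one of the 15 core elements")
        self._values[element].append(value)
        return self  # returning self allows chained calls

    def get(self, element):
        # An unused (optional) element simply yields an empty list.
        return list(self._values.get(element, []))


record = DCRecord()
record.add("Title", "My Web Page")
record.add("Subject", "Computers").add("Subject", "Metadata")
```

Nothing is mandatory in this sketch: a record with no elements at all is still valid, which mirrors the "all elements optional" rule.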
Introducing the Dublin Core
 Title
 Creator
 Subject
 Description
 Publisher
 Contributor
 Date
 Type
 Format
 Identifier
 Source
 Language
 Relation
 Coverage
 Rights
http://purl.org/dc/
Extending DC (semantic refinement)
Improve descriptive precision by adding
sub–structure (subelements and schemes)
Element qualifier
Value qualifier
Greater precision = lesser interoperability
Should ‘dumb down’ gracefully
Creator: First Name, Surname, Affiliation, Contact Info
Based on a slide by Stu Weibel
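One way to picture "dumbing down gracefully" is a function that strips qualifiers and keeps only the base element. The sketch below assumes the dotted naming used in the HTML examples later in these slides (for example DC.Date.Created); the function names are hypothetical, and "DC.Creator.Surname" is an illustrative qualified name, not one defined by the slides.

```python
DC_ELEMENTS = {
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}


def dumb_down(qualified_name):
    """Reduce a dotted name such as 'DC.Date.Created' to its base element."""
    parts = qualified_name.split(".")
    if parts[0].upper() == "DC":
        parts = parts[1:]  # drop the 'DC' prefix if present
    if not parts:
        raise ValueError("no element name given")
    base = parts[0].capitalize()
    if base not in DC_ELEMENTS:
        raise ValueError(f"{qualified_name!r} does not map to a core element")
    return base


def dumb_down_record(pairs):
    """Collapse (qualified name, value) pairs into simple DC, keeping all values."""
    simple = {}
    for name, value in pairs:
        simple.setdefault(dumb_down(name), []).append(value)
    return simple


qualified = [
    ("DC.Date.Created", "1999-03-01"),
    ("DC.Date.Modified", "1999-03-25"),
    ("DC.Creator.Surname", "Miller"),  # illustrative qualifier
]
simple = dumb_down_record(qualified)
```

The values survive, but the distinction between "created" and "modified" is lost: that is the trade between precision and interoperability named on the slide.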
Extending DC (a modular approach)
 Modular extensibility...
 additional elements to support local needs
 complementary packages of metadata
 …but only if we get the building blocks right
Terms & Conditions
Description
Archival Management
Based on a slide by Stu Weibel
Extending DC?
 DC offers a semantic framework
 through use of further substructure,
meaning can often be clarified
<Creator> “John”: John Inc.? John xyz? xyz John?
<Creator> <fore name> “John”: John Inc., John xyz, xyz John.
Extending DC?
 DC offers a semantic framework
 Use of domain–specific schemes greatly
increases precision
<Coverage> “Washington”: Washington State? Washington DC? Washington monument?
<Coverage> <TGN> “Washington”: Washington State, Washington DC, Washington monument
“North and Central America, United States, Washington”
http://gii.getty.edu/tgn_browser/
Dublin Core in the physical world
 Dublin Core originally designed
with electronic resources in mind
 Physical resources are fundamentally
different
 Issues of surrogacy become more important
 Genre, Type, and Format models vary greatly
 Difficult to remember what is being described, and
which characteristics of the resource and its
surrogates are ‘correct’.
Introducing Physical Objects
 Aspects of the real world are key
to much of what museums do
 Physical objects have dimensions
 23 x 46 cm
 12 x 52 x 18 in
 18.6 cm³
 823 pages
 Physical objects have a form
 oil on canvas
 Tadcaster limestone
 stainless steel.
Introducing Physical Objects
 Physical objects change over
time
 constructed between AD524
and 873
 repaired in AD1270
 incorporated into ornamental arch in AD1320
 Physical objects move
 cast in Beijing
 used in Shanghai
 taken to Hong Kong
 on display in Macau.
Introducing Physical Objects
 Physical objects are associated with people
 written by William Shakespeare
 acquired by Lord Elgin
 decreed by the Emperor Hadrian
 associated with Prince Charles Edward
Stuart
 Physical objects are contextualised
 fired at the Battle of Trafalgar
 carried on Apollo 11 from the moon
 printed on the first printing press
 salvaged from the Titanic.
Introducing Collections
 Museum objects, whether original or
surrogate, are normally part of a
collection
 Collections may be ‘real’...
 the Sutton Hoo hoard
 the Terracotta Warriors
 ...an aspect of the process by which objects enter the
museum...
 the Burrell Collection
 Solomon Guggenheim’s art collection
 …or simply practical
 coins at the British Museum
 the Tate Gallery’s collection of works by Da Vinci.
Introducing Surrogacy
 Many of the resources we describe are,
in reality, surrogates for something else
 a photograph of King Tutankhamen’s
death mask
 a photograph of a statue of
George Washington
 a film of President Kennedy’s assassination
 a sound recording of Neil Armstrong’s “One
small step for man…” speech on the moon
 a copy of the Mona Lisa
 a model of the Great Wall of China
 a reproduction of the Terracotta warriors.
Issues of Surrogacy
 Many of the resources we describe are,
in reality, surrogates for something else
 we need to be clear whether we are
describing the resource or its surrogate
 the sculptor of a statue is often not the
person who made its photographic surrogate
 the model of the Forbidden City is unlikely
to have been created at the same date as
the Forbidden City itself
 the format of a computer image of the Mona
Lisa (image/jpeg?) is not the same as the
format of the original painting (oil on canvas?).
Other Museum Issues
 Museums need to describe real objects
and surrogates in a similar manner
 guidelines/standards therefore need to encompass
both, despite their differences
 Resource descriptions will often be drawn from
existing collection management systems in the first
instance, rather than created afresh
 guidelines therefore need to respect existing practices
within established systems
 There is often no ‘right’ answer
 so practices need to allow for approximate dates,
multiple possible creators, etc.
Introducing the 1:1 Principle
 The broader Dublin Core community is
tackling some of the problems relevant to museums
 Their work on the ‘1:1 Principle’ is especially useful in
resolving museum issues over original versus
surrogate and item versus collection:
 each Dublin Core ‘record’ should describe only one
resource, whether surrogate or original. Associated
resources should be linked together by means of the
Relation element in Dublin Core.
Introducing the 1:1 Principle
 In a record describing a photo of the Mona
Lisa on a web page, for example…
 Leonardo da Vinci is not the creator of the image
 The image was not created during the Renaissance
 …but you might include these as Subject terms, and
you could usefully provide a link to the record
describing the real painting via Dublin Core’s Relation
element
 Equally, in describing the painting itself…
 http://www.louvre.fr/…/monalisa.jpg is not the Identifier
of the painting
 but you might link to this image via Relation, just to
show people what the painting looks like.
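The Mona Lisa example can be sketched as two separate records joined by Relation. This is an illustration under stated assumptions: the painting's identifier "urn:example:mona-lisa" and the helper function names are hypothetical, and the image URL is left elided exactly as on the slide.

```python
# One record per resource: the painting and its photographic surrogate.
painting = {
    "Identifier": ["urn:example:mona-lisa"],  # hypothetical identifier
    "Title": ["Mona Lisa"],
    "Creator": ["Leonardo da Vinci"],
    "Format": ["oil on canvas"],
    "Relation": ["http://www.louvre.fr/…/monalisa.jpg"],
}

image = {
    "Identifier": ["http://www.louvre.fr/…/monalisa.jpg"],
    "Title": ["Photo of the Mona Lisa"],
    # Leonardo appears as a Subject of the photo, not as its Creator.
    "Subject": ["Mona Lisa", "Leonardo da Vinci"],
    "Format": ["image/jpeg"],
    "Relation": ["urn:example:mona-lisa"],
}


def index_by_identifier(records):
    """Build a lookup from every Identifier value to its record."""
    index = {}
    for record in records:
        for identifier in record.get("Identifier", []):
            index[identifier] = record
    return index


def related_records(record, index):
    """Follow Relation values to other records held in the same index."""
    return [index[r] for r in record.get("Relation", []) if r in index]


index = index_by_identifier([painting, image])
originals = related_records(image, index)
```

A user who finds the image record can reach the painter by following Relation to the painting record, without the image record ever claiming Leonardo as its Creator.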
The primacy of ‘Type’
 In describing museum objects,
it is often most useful to first decide what
you are describing and why, rather than
beginning with ‘who made it’ and
‘what is it called’, as is often the case with books
 if you know you’re describing a surrogate of the Mona
Lisa, then you know Leonardo da Vinci is not the
Creator; whoever made the surrogate is
 if you know you’re describing a collection of 20th
century paintings, then you know that Picasso,
Hockney et al are not the Creators; the collector is.
The primacy of ‘Type’
 if you know you’re describing the
Sutton Hoo helmet, then the fact that
it was added to a particular museum
collection in 1939 perhaps doesn’t matter;
that information is better placed in the collection record
 if you know you’re describing a natural specimen, then
perhaps it has no Creator; there may be a ‘creator’
associated with its identification or collection, though.
Dublin Core for Museums: Assumptions
 In applying Dublin Core to museums, we are
making certain basic assumptions, many of
which were tested by CIMI
 DC is appropriate for use in describing both physical
and digital resources
 DC is easy to learn and simple to use
 Information can be meaningfully and efficiently
extracted from existing museum systems in order to
populate DC records
 the creation of a DC record to describe a museum
object is cost–effective, and aids the discovery of
resources more than simply allowing access to the
underlying Collection Management system might.
Practicalities of Implementing
Dublin Core
Paul Miller
UK Office for Library & Information Networking
Thomas Hofmann
Australian Museums On-Line
Overview
 Creation and Maintenance
 Harvesting and Distribution
 Retrieval
 Implementation Models
 Case Study
Dublin Core - Refresher
 15 simple elements
 Focus on Resource Discovery not Resource
Description
 One Dublin Core record per resource
 Interoperable across communities
 Can be easily populated from existing
databases
 Can be formatted in XML/RDF or HTML
When should I use Dublin Core?
 You have a rich standard but need a simpler one
 You want to disclose your data to other
communities using commonly understood
semantics
 You want to provide unified access to
databases with different underlying schemas
 You need core description semantics and don’t
feel compelled to invent them anew
Considerations
 Creation and Maintenance
 tools
 educate
 Harvesting/Distribution
 tools
 Retrieval
 tools
 consensus
 interface design
Creating and Maintaining
Dublin Core Metadata

Encoding Dublin Core
 HTML
 Unqualified
 Easy
 Qualified
 Overloaded Content (HTML 3.2)
 Additional Attribute (HTML 4)
 RDF
 Based on XML
 Sophisticated
 More complex
Encoding Dublin Core - Unqualified
<HEAD>
<META
NAME="DC.Title"
CONTENT="My Web Page">
<META
NAME="DC.Subject"
CONTENT="Computers,Metadata">
</HEAD>
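Unqualified META tags like these are simple enough to generate from a database export. The following is a minimal sketch, assuming records are held as a mapping from element name to a list of values; the function name dc_meta_tags is hypothetical. It emits one tag per value rather than a comma-separated CONTENT.

```python
from html import escape


def dc_meta_tags(record):
    """Render a simple DC record as unqualified HTML META tags."""
    lines = []
    for element, values in record.items():
        for value in values:
            # Escape quotes and angle brackets so the attribute stays well formed.
            content = escape(value, quote=True)
            lines.append(f'<META NAME="DC.{element}" CONTENT="{content}">')
    return "\n".join(lines)


tags = dc_meta_tags({
    "Title": ["My Web Page"],
    "Subject": ["Computers", "Metadata"],
})
```

Repeating the Subject tag keeps each term separate, which avoids having to guess later whether a comma is a separator or part of a term.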
Encoding Dublin Core - Qualified (HTML 3.2)
<HEAD>
<META
NAME="DC.Subject"
CONTENT="(SCHEME=AAT)(LANG=EN) Statue, Granite">
</HEAD>
Encoding Dublin Core - Qualified (HTML 4)
<HEAD>
<META
NAME="DC.Subject"
SCHEME="AAT"
LANG="EN"
CONTENT="Statue, Granite">
</HEAD>
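A harvester that meets the HTML 3.2 "overloaded content" form has to split the qualifiers back out of the CONTENT string. The sketch below assumes the qualifiers always appear as leading (NAME=value) groups, as in the slide example; the function name parse_overloaded is hypothetical.

```python
import re

# One leading "(NAME=value)" group, e.g. "(SCHEME=AAT)".
_QUALIFIER = re.compile(r"\(\s*([A-Za-z]+)\s*=\s*([^)]*?)\s*\)")


def parse_overloaded(content):
    """Split '(SCHEME=AAT)(LANG=EN) Statue, Granite' into qualifiers and value."""
    text = content.lstrip()
    qualifiers = {}
    pos = 0
    while True:
        match = _QUALIFIER.match(text, pos)
        if match is None:
            break
        qualifiers[match.group(1).upper()] = match.group(2)
        pos = match.end()
        # Skip any whitespace between one qualifier group and the next.
        while pos < len(text) and text[pos].isspace():
            pos += 1
    return qualifiers, text[pos:].strip()


qualifiers, value = parse_overloaded("(SCHEME=AAT)(LANG=EN) Statue, Granite")
```

The result carries the same information as the HTML 4 form, where SCHEME and LANG are separate attributes and CONTENT holds only the value.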
Encoding Dublin Core - Sub-Elements
<HEAD>
<META
NAME="DC.Date.Created"
CONTENT="(SCHEME=ISO8601) 1999-03-01">
<META
NAME="DC.Date.Modified"
SCHEME="ISO8601"
CONTENT="1999-03-25">
</HEAD>
Encoding Dublin Core - RDF
...
<?xml:namespace href="http://iso.ch/8601/" as="ISO"?>
<RDF:RDF>
<RDF:Description …>
<DC:Date>
<RDF:Description>
<ISO:date>1999-03-25</ISO:date>
</RDF:Description>
</DC:Date>
</RDF:Description>
</RDF:RDF>
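For comparison, an RDF/XML description can be produced with a standard XML library. Note the substitutions in this sketch: it uses the namespace URIs of the later W3C RDF syntax and Dublin Core 1.1 rather than the draft syntax shown on the slide, it writes the date as a plain literal instead of a nested ISO description, and the resource URI and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the later RDF and Dublin Core 1.1 specifications
# (an assumption: the slide predates them and uses a draft syntax).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_to_rdf(about, record):
    """Serialise a simple DC record as one rdf:Description element."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for element, values in record.items():
        for value in values:
            child = ET.SubElement(description, f"{{{DC_NS}}}{element.lower()}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = dc_to_rdf("http://example.org/resource", {"Date": ["1999-03-25"]})
```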
Example Tool: DC Dot
 http://www.ukoln.ac.uk/metadata/dcdot/
 Semi-automated generation of Dublin Core
 Cut and paste into document
 Conversions to HTML, SOIF, XML, WHOIS++,
USMARC, GILS
Example Tool: DC Dot
Screenshot of http://www.ukoln.ac.uk/metadata/dc-dot/
Example Tool: DC Dot
Screenshots of DC Dot output
Example Tool: Reggie
 http://metadata.net
 Generic creation tool for any metadata schema
published to metadata.net
 Currently supports: Dublin Core in 5 languages
 Syntax: HTML META tags (V3.2 and 4.0), RDF
Example Tool: Reggie
Screenshot of Reggie
Example Tool: Site Generator
 http://www.dstc.edu.au/RDU/MetaWeb/
 Tool which parses local web site and automatically
creates Dublin Core metadata
 Syntax: HTML
 Java-based tool which requires JDK 1.1
Further Information - Creation and Maint.
 Metadata Creation Tools
General metadata page at UKOLN
http://www.ukoln.ac.uk/metadata/software-tools/
MetaWeb
http://www.dstc.edu.au/RDU/MetaWeb/
TagGen SE
http://www.hisoftware.com/fact_sheetcc.htm
 User Guides
Official User Guide for Simple Dublin Core
http://purl.org/dc/core/documents/working_drafts/wd-guide-current.htm
CIMI Guide to Best Practice: Dublin Core
Harvesting and Distributing
Dublin Core Metadata

Harvesting / Distribution
 Tools
 Z39.50 Gateway
 Metadata Harvester
 Full-text Search Engine
 Resources
 Indexing, harvesting tools
http://www.searchenginewatch.com/
http://www.searchtools.com/
http://www.ukoln.ac.uk/metadata/software-tools/
http://www.dstc.edu.au/RDU/MetaWeb/
 Z39.50
http://www.ilrt.bris.ac.uk/discovery/z3950/resources/
http://www.ukoln.ac.uk/dlis/z3950/resources/
Searching and Retrieving
Dublin Core Metadata


Retrieval
 Tools
 HTML - search forms
 HTML - predefined queries
 Z39.50 clients/ Java applets
 Standalone applications
 Interface design
 Assist users:
-help them to understand what they are looking for
-give them an idea what terminologies you are using
-use commonly understood design language
Bringing it all together:
Implementation Models




Implementation Models
 Harvesting DC into a repository (database)
 Distributed Database Search
 Full-text indexing with metadata extraction
Implementation Models
 Harvesting DC into a repository (database)
Query
Repository
Harvester
retrieve resource



HTML
XML
Other types
Dynamic document
creation from database
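The harvester step in this model can be sketched with Python's standard HTML parser: fetch a page, pull out its DC META tags, and store them in a repository keyed by URL. The class and function names, the in-memory dictionary standing in for the repository, and the example URL are all hypothetical; fetching the page over the network is left out.

```python
from html.parser import HTMLParser


class DCMetaHarvester(HTMLParser):
    """Collect (name, content) pairs from META tags whose name starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.upper().startswith("DC."):
            self.elements.append((name, attributes.get("content") or ""))


def harvest(html_text):
    parser = DCMetaHarvester()
    parser.feed(html_text)
    parser.close()
    return parser.elements


# A dictionary stands in for the repository database in this sketch.
repository = {}
page = (
    '<HEAD><META NAME="DC.Title" CONTENT="My Web Page">'
    '<META NAME="DC.Subject" CONTENT="Computers, Metadata">'
    '<META NAME="keywords" CONTENT="ignored"></HEAD>'
)
repository["http://example.org/page.html"] = harvest(page)
```

Queries then run against the repository rather than the original pages, and the stored URL is used to retrieve the resource itself.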
Implementation Models
 Distributed Database Search
Z39.50 Server
Query
Z39.50 Gateway
Z39.50 Server
Z39.50 Server
retrieve resource
Implementation Models
 Full-text indexing with metadata extraction
Query
Index DB
Indexer
retrieve resource



HTML
XML
Other types
Dynamic document
creation from database
Questions before implementation
 Do I really need Dublin Core?
 What is my budget?
 What type of resources do I want to describe?
 Which encoding format for which resource?
 Do I have community support?
 Can I provide creation tools?
Challenges of implementing Dublin Core
 Intellectual
 Education of information creators
 Community consensus
 Resistance against sharing information
 Technical
 Efficient tools
 Infrastructure
 Economical
 Automatic generation vs. manual creation
 Cost of training
 Cost of tools
Dublin Core for Masses?
Dublin Core for the masses
Why Dublin Core hasn’t hit the consumer market yet
 No killer application
 Lack of standardisation
 No support in public search engines
 No support in mass market applications
 Non-transparent applications
 Inefficient handling in HTML
Further Information
 Projects
Official Dublin Core web site
http://purl.oclc.org/dc/projects/index.htm
 Mailing lists
Dublin Core Implementors workgroup Mailing list
http://www.mailbase.ac.uk/lists/dc-implementors/
Case Study: AMOL
Case Study AMOL (1)
 Gateway to Australian Museums and Galleries
 Initial idea: One central access point for all Australian
collections
 Creation of AMOL standard record for object data due to
lack of common standards
 8 basic fields with focus on resource discovery and easy
deployment from within existing databases
 Fields: Object Title, Object Name, Creator, Description,
Item ID, KeySearchTerms, Date/DateRange, Associated
Places
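Moving a record like this towards Dublin Core is largely a field-mapping exercise. The table below is an assumption made for illustration, not AMOL's published mapping: in particular, placing Object Name under Type and Associated Places under Coverage are guesses, and the sample values are invented.

```python
# Hypothetical crosswalk from the 8 AMOL fields to Dublin Core elements.
AMOL_TO_DC = {
    "Object Title": "Title",
    "Object Name": "Type",          # assumption
    "Creator": "Creator",
    "Description": "Description",
    "Item ID": "Identifier",
    "KeySearchTerms": "Subject",
    "Date/DateRange": "Date",
    "Associated Places": "Coverage",  # assumption
}


def amol_to_dc(amol_record):
    """Map one AMOL-style record (field -> value) to simple Dublin Core."""
    dc = {}
    for field, value in amol_record.items():
        element = AMOL_TO_DC.get(field)
        if element is None or not value:
            continue  # skip unknown fields and empty values
        dc.setdefault(element, []).append(value)
    return dc


# Invented sample values, for illustration only.
dc_record = amol_to_dc({
    "Object Title": "Granite statue",
    "Item ID": "A1234",
    "KeySearchTerms": "Statue, Granite",
    "Date/DateRange": "",
})
```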
Case Study AMOL (2)
AMOL search/ system architecture - current system
User queries search
engine and gets records
delivered to web browser
Mapped metadata exported

HTML documents
Legacy DB
AMOL index server
Remote web server
storing HTML documents
Case Study AMOL (3)
Lessons Learned
 Data and technology related
 Lack of consistent use of controlled vocabularies, quality of
data recorded
 Performance of indexing software, lack of metadata support
in public search engines
 High administration effort
 Intellectual
 Users have problems with “empty text box” approach
 Limited information in record to see context with larger picture
 General
 Large institutions: bureaucratic machinery, complex collection
systems designed without interoperability in mind
 Small institutions: concerned about security issues,
fear of larger institutions
Case Study AMOL (4)
New perspectives
 New resource types: Information about institutions,
Images, Video, Audio, general HTML pages - goes
beyond capabilities of standard AMOL record
 Need to provide easier access for users
 New cross community projects require interoperable
metadata standards for cross domain searching
 Strong move in Australia towards Dublin Core based
metadata schemas driven by government
 Strong move towards interpretation of objects through
stories
Search Architecture and extended AMOL metadata
standard
Case Study AMOL (5)
NEW AMOL search/ system architecture
User queries search
engine and gets records
delivered to web browser
Legacy databases
AV resources
Textual resources



Information mapped
to DC based
metadata plus index
text, images
AMOL index server
Remote web server
Providing dynamic access
to ODBC databases
Case Study AMOL (6)
Future Directions
 Implementation of RDF for dynamically served
databases and text style resources
 Consensus of community: Metadata Forum
 Further education of users: Metadata
Workshops
 Creation of multi-type metadata schema
based on Dublin Core
 Creation of mapping tools for easier database
implementation
Case Study AMOL (7)
Recommendations
 Prepare good user guides
 Run workshops and educate museum professionals
 Get consensus from community
 Plan with interoperability in mind
 Evaluate tools and plan for future additions
Biggest Problem still remaining:
 what is the benefit to the individual institution other
than being interoperable for networked resources
http://www.cimi.org/
For machine communication we need...
Semantic Interoperability: standardisation of content
“Let’s talk Resource Description”
“Creator, Publisher...”
Structural Interoperability: standardisation of form
“Let’s use MICI”
“Field # 1 Element Name”
Syntactic Interoperability: standardisation of expression
“Here’s how to say it in HTML”
“<Meta name= Element Name= “….”>”