Interoperable Annotation: Perspectives from the Open Annotation Collaboration

<http://www.openannotation.org/>
Robert Sanderson – [email protected], [email protected]
Herbert Van de Sompel – [email protected], [email protected]
Digital Library Research and Prototyping Team
Los Alamos National Laboratory, USA
Acknowledgements: Tim Cole, Bernhard Haslhofer, Jane Hunter, Ray Larson, Cliff Lynch, Michael Nelson, Doug Reside
This research was funded by the Andrew W. Mellon Foundation.
Interoperable Annotation
Rob Sanderson, Herbert Van de Sompel
CNI Fall Task Force Meeting, Dec 14-15, Washington DC
Overview
• The Collaboration and Project
• Interoperability:
• Status
• Basic Principles
• Current Data Model
• Protocol-less Approach
• Demo
• Summary
The Collaboration
• Los Alamos National Laboratory (Data Modeling and Interoperability)
• Herbert Van de Sompel (PI)
• Robert Sanderson
• University of Illinois at Urbana-Champaign
• Tim Cole (PI)
• Allen Renear
• Carole Palmer
• Tom Habing
• University of Queensland
• Jane Hunter (PI)
• Anna Gerber
• Ron Chernich
• University of Maryland
• Neil Fraistat (PI)
• Douglas Reside
• George Mason University
• Dan Cohen
• Sean Takats
The Project
• Aims
• Facilitate a Web-centric interoperable annotation environment
• Demonstrate the proposed environment for scholarly use-cases
• Seed adoption by deployment of high-visibility production systems
• Phase I
• Exploration of Existing Systems, Requirements and Use Case analysis
• Initial Interoperability Specification
• Integration of AXE and Zotero
• Mellon Foundation Funded
• Alerted to possibility through JSTOR (unfunded partner)
• 14 Months for Phase I
• Hope to proceed to Phases II and III
Advisory Board
• Members
• Maristella Agosti (University of Padua)
• Geoffrey Bilder (CrossRef)
• John Bradley (King's College London)
• Gregory Crane (Tufts University)
• Paul Eggert (Australian Scholarly Editions Centre)
• Julia Flanders (Brown University)
• Cliff Lynch, Chair (Coalition for Networked Information)
• Cathy Marshall (Microsoft Research)
• Martin Mueller (Northwestern University)
• Geoffrey Rockwell (University of Alberta)
• David Ruddy (Cornell University Library)
• Joyce Rudinsky (University of North Carolina, Chapel Hill)
• Mackenzie Smith (MIT Libraries)
• Amanda Ward (Nature Publishing Group)
• John Wilbanks (Science Commons)
Technical Group
• Open List
• Anyone can join and discuss
• Discussions are visible, not held behind the closed doors of a committee
• http://groups.google.com/group/oac-tech
• Current Members:
• Bruce D'Arcus (University of Ohio)
• Anna Gerber (University of Queensland)
• Tom Habing (UIUC)
• Bernhard Haslhofer (University of Vienna)
• Michael Nelson (Old Dominion University)
• Rob Sanderson (LANL)
• Ed Summers (Library of Congress)
• Herbert Van de Sompel (LANL)
• Please join, or have your techie join!
Interoperability: Status
• Requirements and Use Cases
• Gathered from scholars, mainly humanities
• Gathered from scholarly literature
• Initial Design
• First part of Phase I, Thread I (LANL)
• Discussed face-to-face at UC Berkeley (Collaborators plus invited experts)
• Feedback
• Difficulty of defining what an 'annotation' actually is
• Model too generic!
• Risk of Everything being an Annotation
• Reaction
• We (Rob and Herbert) tend to agree
We Want Your Feedback!
Interoperability: Basic Principles
• Effort focuses on Interoperability to allow annotation sharing
• This is what we were funded to do!
• Many MANY non-interoperable annotation systems already
• Existing interoperability mechanisms (e.g. Annotea) need updating
• Interoperability approach is based on the Architecture of the Web
• Communication is increasingly online
• Resources of interest are increasingly online
• Maximize chance of adoption by not being domain-centric
• Semantic Web principles and Linked Data guidelines important
• Entities within the model must be identified by HTTP URIs
• … when possible
• From Linked Data guidelines
• Globally unique identifiers without central system overhead
• Locator as well as Identifier: can retrieve representation
Current Data Model
• Alpha Data Model
• Not set in stone, please provide feedback!
• Not published as a specification, just guidelines and 'current thinking'
• Implemented as demonstrator, not production system
• Informed by previous work and expert contributions
• 4th internal iteration of model
• Guided by Requirements and Use Cases
• Will build up in steps with appropriate justification
• Conflicting requirements possible (e.g. simple yet powerful and flexible)
• Helps to ensure adoption and understanding by minimizing unnecessary or
unused features
• Keeps specification authors in touch with real users and needs
Data Model: Step 1
Requirement:
Users must be able to create an annotation with some content about a target
resource
Principle:
An annotation is an event at a moment in time, initiated by an agent, with a
source of content and a target. There is an implicit or explicit relationship
between the source and target expressed by the annotation.
Step 1: Baseline Model
The intuitive model is that we have a Source of Content (S-1), which annotates a
Target resource (T-1). We add an Annotation node (A-1) that represents the
annotation event.
The Source of Content must have some relationship to the Target. By default, it
should be somehow 'about' the Target for it to be considered an Annotation.
Step 1: Baseline Model
As entities in the baseline model, both Content and Target are resources on the web.
Thus they can be of any type, format, language (or not language at all), location, …
http://www.youtube.com/watch?v=fgg2tpUVbXQ
http://commons.wikimedia.org/wiki/File:Hubble_ultra_deep_field.jpg
Step 1b: Properties and Relationships
Properties and relationships can now be
attached to the Annotation.
This diagram shows the other information
from the initial use case: who created the
annotation (U-1), when they created it
(datetime) and the relationship (called a
predicate, P-1) between Content and
Target.
Diagram Style Note:
• Entities are circles
• Values are ovals with data type
• Relationships/Properties are named lines
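The entities on this slide can be sketched as a small data structure. This is only an illustration: the class and field names are hypothetical, and pairing the video as Content with the Hubble image as Target is an assumption based on the two example URLs shown earlier. Only the oac:annotates predicate name comes from the slides.

```python
from dataclasses import dataclass
from datetime import datetime

OAC_ANNOTATES = "oac:annotates"  # predicate name as shown on the slides


@dataclass(frozen=True)
class Annotation:
    """Hypothetical in-memory form of the annotation event (A-1)."""
    content: str       # URI of the Source of Content (S-1)
    target: str        # URI of the Target (T-1)
    creator: str       # URI of the agent who created the annotation (U-1)
    created: datetime  # when the annotation was created
    predicate: str = OAC_ANNOTATES  # relationship (P-1) between Content and Target

    def statement(self) -> tuple:
        """Return the Content, predicate, Target relationship as a triple."""
        return (self.content, self.predicate, self.target)


# Assumed pairing of the example resources; not confirmed by the slides.
a1 = Annotation(
    content="http://www.youtube.com/watch?v=fgg2tpUVbXQ",
    target="http://commons.wikimedia.org/wiki/File:Hubble_ultra_deep_field.jpg",
    creator="http://public.lanl.gov/rsanderson",
    created=datetime(2009, 12, 3, 9, 30, 0),
)
print(a1.statement())
```

Keeping the predicate as a field rather than hard-coding it leaves room for relationships other than plain "about".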
Step 1b: Properties and Relationships
[Diagram: worked example. Relationship: oac:annotates. Creator: http://public.lanl.gov/rsanderson. Created: 2009-12-03 09:30:00.]
Step 1b: Properties and Relationships
[Diagram: worked example. Relationship: oac:annotates. Creator: http://public.lanl.gov/rsanderson. Created: 2009-12-03 12:06:00.]
http://twitter.com/azaroth42/status/6312196800
Data Model: Step 2
Requirement:
The model must support systems
which only support a user-entered string as the content of
the annotation.
Principle:
Annotations must allow for
content and target of any media
type and format.
Fab4 Client: http://bodoni.lib.liv.ac.uk/fab4/
(Extensive functionality, but only string content)
Step 2: String Content
To allow for this common case, we can
assign a unique, non-resolvable URI
called a URN. This is an exception to
the "HTTP URIs for everything" principle.
(We will come back to this later)
The text is captured in the oac:body
property.
We define a Note class which can be
used with either a protocol or
non-protocol URI, as a hint to the client
not to bother dereferencing the URI and
to look in oac:body instead.
Step 2: String Content
urn:uuid:1234-567…
"The Hubble Deep
Field image is very
impressive!"
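A minimal sketch of how a string-only client might mint such a Note, assuming a UUID-based URN as in the example above. The Note class and helper names are hypothetical; uuid.uuid4().urn is the Python standard library call that produces a urn:uuid: identifier.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """Hypothetical string-only content resource."""
    uri: str   # may be a non-resolvable URN or an HTTP URI
    body: str  # the text carried in the oac:body property


def new_note(text: str) -> Note:
    # uuid4().urn yields a string of the form "urn:uuid:xxxxxxxx-xxxx-..."
    return Note(uri=uuid.uuid4().urn, body=text)


def content_text(note: Note) -> str:
    # A client that recognises a Note reads the body directly
    # instead of trying to dereference the URI.
    return note.body


note = new_note("The Hubble Deep Field image is very impressive!")
print(note.uri)
print(content_text(note))
```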
Data Model: Step 3
Requirement:
The model must enable the user to select a part of the resource as the target for their
annotation, not just the entire resource.
Pliny Client: http://pliny.cch.kcl.ac.uk/
Flickr: http://www.flickr.com/photos/raindog/75675947/
Step 3: Segments of Resources
W3C Media Fragment URIs allow us to create a URI that identifies a segment of a
resource for common cases.
e.g:
http://…/foo.jpg#xywh=160,120,320,240
… identifies a 320 by 240 box, at 160,120
in the image foo.jpg
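As an illustration, a client could recover the rectangle from such a URI along these lines. This sketch handles only the plain integer form shown above (it does not handle unit prefixes), and the function and class names are hypothetical. The host example.org stands in for the elided part of the URI.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urldefrag


class Box(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def parse_xywh(uri: str) -> Optional[Box]:
    """Return the rectangle named by an xywh fragment, or None if absent."""
    fragment = urldefrag(uri).fragment
    values = parse_qs(fragment).get("xywh")
    if not values:
        return None
    parts = values[0].split(",")
    if len(parts) != 4:
        return None
    x, y, w, h = (int(p) for p in parts)  # raises ValueError on non-integers
    return Box(x, y, w, h)


box = parse_xywh("http://example.org/foo.jpg#xywh=160,120,320,240")
print(box)
```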
Step 3: Segments of Resources
http://commons.wikimedia.org/wiki/File:Hubble_ultra_deep_field.jpg
http://twitter.com/azaroth42/status/6312261983
http://commons.wikimedia.org/wiki/File:Hubble_ultra_deep_field.jpg#xywh=400,80,100,100
Data Model: Step 4
Requirement:
It is important to be able to express non-rectangular regions of a resource for
situations where a rectangle cannot unambiguously delineate the region of interest.
Principle:
Content and Target must be able to
be arbitrary parts or segments of a
resource.
http://dme.arcs.ac.at/image-annotation-frontend/
Step 4: Complex Segments
In this case, we use a Context node (C-1)
with a Segment Description (SD-1). This
can be of any format, and will be
dependent on the properties of the Target.
A segment description might be an SVG
path for an image, an XPath for XML, or
a speaker in an audio track.
Work exists on this topic for common
cases, such as in MPEG-7.
Segments of complex objects such as
databases, datasets, or other
non-traditional media will require more
research.
Step 4: Complex Segments
http://twitter.com/azaroth42/status/6312304007
SVG
http://annotation.lanl.gov/docs/0.1/examples/4.svg
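One way to picture the Context node is as a target paired with a description of the segment. The sketch below is hypothetical throughout: the class and field names are invented for illustration, and using the Hubble image as the target of the SVG example is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentDescription:
    """Hypothetical SD-1: how to locate the region inside the target."""
    media_type: str  # format of the description, e.g. "image/svg+xml"
    value: str       # the description itself, or a URI where it can be fetched


@dataclass(frozen=True)
class Context:
    """Hypothetical C-1: a target resource narrowed by a segment description."""
    target: str
    segment: SegmentDescription


# Assumed target for the SVG example; the slides do not state it in text.
svg_region = Context(
    target="http://commons.wikimedia.org/wiki/File:Hubble_ultra_deep_field.jpg",
    segment=SegmentDescription(
        media_type="image/svg+xml",
        value="http://annotation.lanl.gov/docs/0.1/examples/4.svg",
    ),
)
print(svg_region.segment.media_type)
```

Carrying the format alongside the description lets a client decide whether it knows how to interpret the segment before fetching anything.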
Data Model: Step 5
Requirement:
The annotation should be able to
have more than one target resource
or segment, when the annotation
concerns multiple resources or
creates a relationship between
resources.
Principle:
The model must support multiple
contents and targets of annotation.
Pseudo multiple targets in Adobe Acrobat
Step 5: Number of Targets
This is modeled in a very predictable way.
Note that the relationship from the Content applies to all of the Targets.
Step 5: Number of Targets
[Diagram labels: oac:annotates]
Step 5b: Number of Content Sources
Although somewhat exotic, multiple content sources also can be modeled.
Use Cases:
• Same comment expressed in different
formats (txt, MathML)
• Same comment expressed in different
media (txt, mp3)
• Same comment later translated to
different language, format, media,
locations
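A short sketch of what "the relationship applies to all" could mean in practice: every content source is related to every target by the same predicate. The function name and the placeholder identifiers are hypothetical.

```python
from itertools import product


def relationships(contents, targets, predicate="oac:annotates"):
    """Pair every content source with every target under one predicate."""
    return [(c, predicate, t) for c, t in product(contents, targets)]


# One comment in two formats, applied to two targets (placeholder names).
for triple in relationships(["comment.txt", "comment.mp3"], ["T-1", "T-2"]):
    print(triple)
```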
Step 5b: Number of Content Sources
[Diagram labels: oac:annotates, MP3]
Data Model: Step 6
Requirement:
The annotation should be robust across time with respect to changing resources,
such that the annotation is applied correctly to the version of the resource which was
originally annotated. This will prevent misinterpretation of the annotation and take
steps towards the digital preservation of the resources involved as well as the
annotation.
Please also come to hear about Memento tomorrow morning!
Step 6: Robust Annotations in Time
http://news.bbc.co.uk/
http://twitter.com/azaroth42/status/6314856966
Step 6: Robust Annotations in Time
Solution 1:
Unless the version is specified explicitly, the created timestamp of the Annotation
should be used as the date/time of the version of the content and target resources.
2009-12-01 12:00:00
But what about "Sad story yesterday …" sort
of annotations?
Step 6: Robust Annotations in Time
Solution 2:
The Context node can record the
timestamp of when the annotation applies
to the target, if not at the creation time of
the annotation. Storing other information
such as digests is also possible.
Solution 3:
Annotate an appropriate archived copy of
the resource (e.g. in Internet Archive),
and relate it back to the original to enable
appropriate discovery and rendering.
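Solutions 1 and 2 combine into a simple rule, sketched below with hypothetical function names: use the timestamp on the Context node when there is one, otherwise fall back to the creation time of the annotation. The digest helper assumes SHA-256 as one possible choice; the slides do not name an algorithm.

```python
import hashlib
from datetime import datetime
from typing import Optional


def effective_time(annotation_created: datetime,
                   context_time: Optional[datetime] = None) -> datetime:
    """Prefer the Context timestamp; otherwise use the annotation's creation time."""
    return context_time if context_time is not None else annotation_created


def digest(representation: bytes) -> str:
    """Fingerprint of the target's bytes, so that later changes can be detected."""
    return hashlib.sha256(representation).hexdigest()


created = datetime(2009, 12, 1, 12, 0, 0)
day_before = datetime(2009, 11, 30, 12, 0, 0)  # a "Sad story yesterday" annotation

print(effective_time(created))              # falls back to the creation time
print(effective_time(created, day_before))  # the Context timestamp wins
```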
Data Model: Step 7
Requirement:
A description of the annotation must be made available for dissemination in an
interoperable method, following the Web Architecture and Linked Data guidelines.
http://code.google.com/apis/sidewiki/docs/2.0/developers_guide_protocol.html
http://www.w3.org/2001/Annotea/User/Protocol.html
Step 7: Annotation "Transcription"
We follow the same conventions as Linked Data (and OAI-ORE) and introduce a
'transcription' of the annotation as a resource on the web. This can be retrieved using
HTTP and includes all of the data and metadata concerning the annotation and other
objects in the model.
http://annotation.lanl.gov/docs/0.1/examples/7.rdf
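To give a feel for what a transcription might contain, the sketch below writes one annotation as N-Triples by hand. It is not the project's serialization: the oac namespace URI is a placeholder, hasContent and hasTarget are hypothetical property names, the use of Dublin Core terms for creator and created is an assumption, and so is the pairing of the example tweet with the Hubble image. The literal helper escapes only backslashes and double quotes.

```python
DCTERMS = "http://purl.org/dc/terms/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
OAC = "http://example.org/oac#"  # placeholder, NOT the project's real namespace


def iri(value):
    return f"<{value}>"


def literal(value, datatype=""):
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def transcribe(annotation, content, target, creator, created):
    """Serialize one annotation and its metadata as N-Triples lines."""
    a = iri(annotation)
    triples = [
        (a, iri(OAC + "hasContent"), iri(content)),  # hypothetical property
        (a, iri(OAC + "hasTarget"), iri(target)),    # hypothetical property
        (iri(content), iri(OAC + "annotates"), iri(target)),
        (a, iri(DCTERMS + "creator"), iri(creator)),
        (a, iri(DCTERMS + "created"), literal(created, XSD_DATETIME)),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


doc = transcribe(
    annotation="http://example.org/annotations/1",  # hypothetical URI
    content="http://twitter.com/azaroth42/status/6312196800",
    target="http://commons.wikimedia.org/wiki/File:Hubble_ultra_deep_field.jpg",
    creator="http://public.lanl.gov/rsanderson",
    created="2009-12-03T12:06:00",
)
print(doc)
```

Because the result is an ordinary document, it can be published at its own HTTP URI like any other web resource.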
Alpha Data Model
Background for Data Model
Background Research:
• Annotea
• LEMO
• DiLAS
• Fab4
• Pliny
• Google SideWiki
• Flickr Annotations
• Richard Newman's Tag Model
• … plus 3 extensions
• Common Tag Model
• Henry Story's Tag Model
• Many online web annotation systems
We believe that our data model covers everything that we have seen as well as the
requirements and use cases expressed.
Protocol-less Approach
Existing systems are tightly coupled:
• The client sends the annotation to the server to store
• The server sends the annotation to clients on request
Annotea is a REST protocol and Google Sidewiki uses Atom plus
extensions; most others are proprietary.
We believe this is a hindrance to interoperability … any protocol
that ties servers and clients together is a hindrance to
interoperability from the Linked Data perspective.
We recommend no protocol, as opposed to not recommending
a protocol.
. . o O (This from key players in OAI-PMH and SRU… have
they totally lost it??)
[Diagram labels, overlaid on two repeats of the slide above: Google SideWiki, Google Protocol, Google Interface; then Google Mail, Google Docs, Google DNS, Google Wave, HTTPS, Google Chromium OS, Google Droid, Google Chrome]
Protocol-less Approach
Breaking This Apart Promotes Interoperability:
• The client sends the annotation somewhere to
store (or multiple places)
• The server retrieves the annotation
• … using regular discovery/harvesting
techniques (Pull)
• … on demand from the client
(Pull on demand)
• … by being one of the places the client
sends the annotation to (Push)
• The server is just one service that can send the
annotation to clients on request
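The three routes above can be simulated in memory. This is a toy sketch with hypothetical class names and example.org hosts; a real deployment would use ordinary HTTP requests against whatever storage the client chose.

```python
class Store:
    """Hypothetical stand-in for any place that can hold a web resource."""

    def __init__(self, base_uri):
        self.base_uri = base_uri
        self.resources = {}

    def put(self, name, document):
        uri = f"{self.base_uri}/{name}"
        self.resources[uri] = document
        return uri

    def get(self, uri):
        return self.resources.get(uri)


class AnnotationService:
    """Hypothetical server that learns about annotations in three ways."""

    def __init__(self):
        self.known = {}

    def harvest(self, store):          # Pull
        self.known.update(store.resources)

    def fetch(self, store, uri):       # Pull on demand
        document = store.get(uri)
        if document is not None:
            self.known[uri] = document

    def receive(self, uri, document):  # Push
        self.known[uri] = document


blog = Store("http://blog.example.org")
web = Store("http://www.example.org")
doc = "<annotation description>"
uri_blog = blog.put("anno-1", doc)  # the client stores the annotation in two places
uri_web = web.put("anno-1", doc)

service_a, service_b = AnnotationService(), AnnotationService()
service_a.harvest(blog)           # Pull
service_b.fetch(web, uri_web)     # Pull on demand
service_b.receive(uri_blog, doc)  # Push
print(sorted(service_b.known))
```

Neither service is tied to the client or to a particular store, which is the decoupling the slide argues for.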
[Diagram labels: Preferred Interface, Harvester, Blog, Twitter, Web Server]
Protocol-less Approach
Consequences:
• Multiple servers, aggregators or other applications can access the annotation
• The client can use whatever protocol is needed by the storage server(s)
• Annotations are regular web resources by necessity
• Access control is just like any other access control on the web
• Services can be used to extend information in annotation
• Add extra information for robustness over time
• Add extra information for robustness of segment location
• Text Mining, Data Mining services
• Graph/Relationship Mining across other annotations
•…
• Servers can replace URNs with real URIs
• Multiple servers can do this, and will deduplicate with original identifier
• Use well known owl:sameAs predicate for this
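A sketch of the URN replacement idea, under the assumption that each server mints its own HTTP URI and records an owl:sameAs link back to the original URN. The minting rule, function names and example.org hosts are hypothetical; the owl:sameAs URI is the standard one.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def mint_http_uri(base, urn):
    """Hypothetical minting rule: reuse the last part of the URN as the path."""
    return f"{base.rstrip('/')}/{urn.rsplit(':', 1)[-1]}"


def same_as_triples(servers, urn):
    """Each server relates its own HTTP URI back to the original identifier."""
    return [(mint_http_uri(server, urn), OWL_SAME_AS, urn) for server in servers]


def group_by_original(triples):
    """Map each original identifier to every URI declared equivalent to it."""
    groups = {}
    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            groups.setdefault(obj, set()).add(subject)
    return groups


triples = same_as_triples(
    ["http://a.example.org/notes/", "http://b.example.org/n"],
    "urn:uuid:1234",
)
print(group_by_original(triples))
```

Grouping on the original identifier is what lets a consumer recognise that two servers are describing the same annotation content.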
Demonstration!
Let's See This Working!
(Or at least a screen capture of it working, given the poor network here)
Summary
Collaboration:
• Up and working
Data Model:
• Alpha version ready for feedback:
• Not a specification document
• Covers use cases, requirements and previous work
• Based on Linked Data and the Web Architecture
• Protocol-less interactions between distributed servers and clients
Thank You!
Questions?
Pointers:
• http://www.openannotation.org/
• http://groups.google.com/group/oac-tech
• [email protected] ; [email protected]
• This presentation is being videoed and will be posted on the CNI website
• http://www.slideshare.com/azaroth42/oac-presentation-at-cni-09-fall-forum
• http://www.youtube.com/watch?v=n_7rgsQHuHA